Merged
@@ -203,7 +203,7 @@
},
{
"cell_type": "markdown",
"source": "# Configuration settings for scaling to larger data\n\n## Number and size of nodes in our Kubernetes cluster\nWe can control the number and size of nodes in our Kubernetes cluster via the `--node-vm-size` and `--node-count` switches of the `az aks create` command:\n\n`az aks create --name mycluster --resource-group myrg --generate-ssh-keys --node-vm-size Standard_DS14_v2 --node-count 3 --kubernetes-version 1.10.9`\n\nMore information is available [here](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-on-aks?view=sqlallproducts-allversions#create-a-kubernetes-cluster).\n\n## Number of Spark pods\nWe can control the number of Spark pods via the CLUSTER_STORAGE_POOL_REPLICAS environment variable used by `mssqlctl create cluster`:\n\nSET CLUSTER_STORAGE_POOL_REPLICAS=2\n\n## YARN scheduler memory and cores\nWe can control the YARN scheduler memory and cores via the following environment variables used by `mssqlctl create cluster`:\n\n- YARN_SCHEDULER_MAX_MEMORY\n- YARN_SCHEDULER_MAX_VCORES\n- YARN_NODEMANAGER_RESOURCE_MEMORY\n- YARN_NODEMANAGER_RESOURCE_VCORES\n\nFurther information regarding mssqlctl environment variables is available [here](https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-guidance?view=sqlallproducts-allversions#define-environment-variables).\n\n## Livy timeout\nThe Livy timeout sets a limit on the runtime of a cell in a PySpark3 Jupyter notebook. In SQL Server 2019 Big Data CTP 2.1, the Livy timeout defaults to 1 hour. In CTP 2.2, it defaults to 24 days. 
One can modify this as follows:\n\n- Log into the mssql-master-pool-0 pod using this command (requires permission to run kubectl):\n\n```\nkubectl exec -it mssql-master-pool-0 -n <your-cluster-name> -- /bin/bash\n```\n- To set the Livy timeout to 24 days, run the following command or edit /livy/conf/livy.conf accordingly:\n\n```\necho 'livy.server.session.timeout = 24d' | cat >> /livy/conf/livy.conf \n```\n- Then restart the Livy server by running the following command:\n\n```\nsupervisorctl restart livy\n```",
"source": "# Configuration settings for scaling to larger data\n\n## Number and size of nodes in our Kubernetes cluster\nWe can control the number and size of nodes in our Kubernetes cluster via the `--node-vm-size` and `--node-count` switches of the `az aks create` command:\n\n`az aks create --name mycluster --resource-group myrg --generate-ssh-keys --node-vm-size Standard_DS14_v2 --node-count 3 --kubernetes-version 1.10.9`\n\nMore information is available [here](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-on-aks?view=sqlallproducts-allversions#create-a-kubernetes-cluster).\n\n## Number of Spark pods\nWe can control the number of Spark pods via the CLUSTER_STORAGE_POOL_REPLICAS environment variable used by `mssqlctl create cluster`:\n\nSET CLUSTER_STORAGE_POOL_REPLICAS=2\n\n## YARN scheduler memory and cores\nWe can control the YARN scheduler memory and cores via the following environment variables used by `mssqlctl create cluster`:\n\n- YARN_SCHEDULER_MAX_MEMORY\n- YARN_SCHEDULER_MAX_VCORES\n- YARN_NODEMANAGER_RESOURCE_MEMORY\n- YARN_NODEMANAGER_RESOURCE_VCORES\n\nFurther information regarding mssqlctl environment variables is available [here](https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-guidance?view=sqlallproducts-allversions#define-environment-variables).\n\nIn CTP 2.5 and later, these environment variables are replaced by similarly named properties in a JSON file. See [Custom configurations](https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-guidance?view=sqlallproducts-allversions#customconfig).\n\n## Livy timeout\nThe Livy timeout sets a limit on the runtime of a cell in a PySpark3 Jupyter notebook. In SQL Server 2019 Big Data CTP 2.1, the Livy timeout defaults to 1 hour. In CTP 2.2, it defaults to 24 days. 
One can modify this as follows:\n\n- Log into the mssql-master-pool-0 pod using this command (requires permission to run kubectl):\n\n```\nkubectl exec -it mssql-master-pool-0 -n <your-cluster-name> -- /bin/bash\n```\n- To set the Livy timeout to 24 days, run the following command or edit /livy/conf/livy.conf accordingly:\n\n```\necho 'livy.server.session.timeout = 24d' | cat >> /livy/conf/livy.conf \n```\n- Then restart the Livy server by running the following command:\n\n```\nsupervisorctl restart livy\n```",
"metadata": {}
},
{