diff --git a/samples/features/sql-big-data-cluster/spark/config-install/installpackage_Spark.ipynb b/samples/features/sql-big-data-cluster/spark/config-install/installpackage_Spark.ipynb
index 2f1d39ac8b..bd9b8dedf6 100644
--- a/samples/features/sql-big-data-cluster/spark/config-install/installpackage_Spark.ipynb
+++ b/samples/features/sql-big-data-cluster/spark/config-install/installpackage_Spark.ipynb
@@ -16,76 +16,125 @@
"cells": [
{
"cell_type": "markdown",
- "source": "# Packaging in Spark\r\n",
- "metadata": {}
+ "source": [
+ "\n",
+ "# **Spark Package Management in SQL Server 2019 Big Data Clusters**\n",
+ "This guide covers installing packages and submitting jobs to a SQL Server 2019 Big Data Cluster using Spark.\n",
+ "* Built-in Tools\n",
+ "* Install Packages from a Maven Repository onto the Spark Cluster at Runtime\n",
+ "* Import .jar from HDFS for use at runtime\n",
+ "* Import .jar at runtime through Azure Data Studio notebook cell configuration\n",
+ "* Install Python Packages at Runtime for use with PySpark\n",
+ "* Submit local .jar or Python file\n",
+ ""
+ ],
+ "metadata": {
+ "azdata_cell_guid": "cbc8ced8-8931-4302-b252-7e7e478a16d4"
+ }
},
{
"cell_type": "markdown",
- "source": "## Use Case 1: I can have key packages in boxed\r\n - All pacakges that come with spark and hadoop distribution\r\n - Python3.5 and Python 2.7\r\n - Pandas, Sklearn and several other supporting ml packages\r\n - R and supporting pacakges as part of MRO\r\n - sparklyr\r\n\r\n \r\n ",
- "metadata": {}
+ "source": [
+ "# Built-in Tools\n",
+ "* Spark and Hadoop base packages\n",
+ "* Python 3.5 and Python 2.7\n",
+ "* Pandas, Sklearn, NumPy, and other data processing packages\n",
+ "* R and MRO packages\n",
+ "* Sparklyr\n",
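+ "\n",
+ "These come pre-installed on the cluster, so no setup step is needed before using them. As a quick check of what is available, a minimal sketch like the following can be run in a PySpark notebook cell (the reported versions will vary by release):\n",
+ "\n",
+ "``` Python\n",
+ "import sys\n",
+ "import numpy, pandas, sklearn\n",
+ "\n",
+ "# Print the interpreter and package versions shipped with the cluster.\n",
+ "print(sys.version)\n",
+ "print(\"numpy\", numpy.__version__)\n",
+ "print(\"pandas\", pandas.__version__)\n",
+ "print(\"scikit-learn\", sklearn.__version__)\n",
+ "```\n",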
+ ""
+ ],
+ "metadata": {
+ "azdata_cell_guid": "2fc8a069-115e-4d9b-bedc-5c55f79466b1"
+ }
},
{
"cell_type": "markdown",
- "source": "## Use Case 2: I can install pacakges from maven repo to my spark cluster\r\nMaven central is a source of lot of packages. A lot of spark ecosystem pacakges are availble there. These pacakages can be installed to your spark cluster using notebook cell configuration at the start of your spark session.\r\n",
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": "%%configure -f\n{\"conf\": {\"spark.jars.packages\": \"com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.1\"}}",
- "metadata": {
- "language": "scala"
- },
- "outputs": [
- {
- "output_type": "display_data",
- "data": {
- "text/plain": "",
- "text/html": "Current session configs: {'conf': {'spark.jars.packages': 'com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.50'}, 'kind': 'spark'}"
- },
- "metadata": {}
- },
- {
- "output_type": "display_data",
- "data": {
- "text/plain": "",
- "text/html": "No active sessions."
- },
- "metadata": {}
- }
+ "source": [
+ "# Install Packages from a Maven Repository onto the Spark Cluster at Runtime\r\n",
+ "Maven Central hosts many Spark ecosystem packages. These can be installed onto your Spark cluster through notebook cell configuration. Before starting a Spark session in Azure Data Studio, run the following cell:\r\n",
+ "\r\n",
+ "```\r\n",
+ "%%configure -f\r\n",
+ "{\"conf\": {\"spark.jars.packages\": \"com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.1\"}}\r\n",
+ "```\r\n",
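+ "\r\n",
+ "Once the Spark session starts, the downloaded package is available on the session classpath. For example, the Event Hubs connector configured above can then be imported in a Scala cell with `import com.microsoft.azure.eventhubs._`.\r\n",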
+ ""
],
- "execution_count": 3
+ "metadata": {
+ "azdata_cell_guid": "a0fecc05-f094-4dda-9afe-0de8ddad87eb"
+ }
},
{
- "cell_type": "code",
- "source": "import com.microsoft.azure.eventhubs._",
- "metadata": {},
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "import com.microsoft.azure.eventhubs._\n"
- }
+ "cell_type": "markdown",
+ "source": [
+ "# Import .jar from HDFS for use at runtime\n",
+ "\n",
+ "A custom .jar that has been uploaded to HDFS can be referenced by its HDFS path at the start of the Spark session through notebook cell configuration:\n",
+ "\n",
+ "```\n",
+ "%%configure -f\n",
+ "{\"conf\": {\"spark.jars\": \"/jar/mycodeJar.jar\"}}\n",
+ "```\n",
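+ "\n",
+ "The path given to `spark.jars` is the HDFS path of the uploaded file, so the .jar must be copied into HDFS first, for example with `hdfs dfs -put mycodeJar.jar /jar/` or through the HDFS browser in Azure Data Studio (the `/jar/` folder is simply the example location used above).\n",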
+ ""
],
- "execution_count": 5
+ "metadata": {
+ "azdata_cell_guid": "c5e65fa2-faf0-4e22-aac1-69d7ff8c9989"
+ }
},
{
"cell_type": "markdown",
- "source": "## Use Case 3: I have a local jar that i want to run in the spark cluster\r\nAs a user you may build your own customer pacakges that want to run as part of your spark jobs. These pacakges can be uploaded as HDFS and using a notebook configuration spark can consume these pacakges in a jar.\r\n\r\n\r\n",
- "metadata": {}
+ "source": [
+ "# Import .jar at runtime through Azure Data Studio notebook cell configuration\n",
+ "\n",
+ "Run the following cell before starting the Spark session so that the .jar is added to the session configuration:\n",
+ "\n",
+ "```\n",
+ "%%configure -f\n",
+ "{\"conf\": {\"spark.jars\": \"/jar/mycodeJar.jar\"}}\n",
+ "```\n",
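+ "\n",
+ "After the session starts, classes packaged in the .jar can be imported as usual, for example `import com.my.mycodeJar._` in a Scala cell.\n",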
+ ""
+ ],
+ "metadata": {
+ "azdata_cell_guid": "6fc4085f-e142-4355-b215-148dbf6c5b86"
+ }
},
{
- "cell_type": "code",
- "source": "%%configure -f\r\n {\"conf\": {\"spark.jars\": \"/jar/mycodeJar.jar\"}}",
- "metadata": {},
- "outputs": [],
- "execution_count": 0
+ "cell_type": "markdown",
+ "source": [
+ "# Install Python Packages at Runtime for use with PySpark\n",
+ "\n",
+ "The following code can be used to install packages on each executor node at runtime. \\\n",
+ "**Note**: This installation is temporary and must be repeated each time a new Spark session is started.\n",
+ "\n",
+ "``` Python\n",
+ "import subprocess\n",
+ "\n",
+ "# Install TensorFlow\n",
+ "stdout = subprocess.check_output(\n",
+ " \"pip3 install tensorflow\",\n",
+ " stderr=subprocess.STDOUT,\n",
+ " shell=True).decode(\"utf-8\")\n",
+ "print(stdout)\n",
+ "```\n",
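+ "\n",
+ "If the install succeeds, the package is importable in the same session. A minimal check, assuming the `pip3 install tensorflow` call above completed without errors, could be:\n",
+ "\n",
+ "``` Python\n",
+ "import tensorflow as tf\n",
+ "\n",
+ "# Confirm the package installed above is importable and report its version.\n",
+ "print(tf.__version__)\n",
+ "```"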
+ ],
+ "metadata": {
+ "azdata_cell_guid": "07944b55-7266-4fcd-8e9b-9fd6cb8cfef5"
+ }
},
{
- "cell_type": "code",
- "source": "import com.my.mycodeJar._",
- "metadata": {},
- "outputs": [],
- "execution_count": 0
+ "cell_type": "markdown",
+ "source": [
+ "# Submit local .jar or Python file\r\n",
+ "One of the key scenarios for SQL Server 2019 Big Data Clusters is the ability to submit Spark jobs. The Spark job submission feature lets you submit a local .jar or .py file, together with its references, to the cluster, and it also lets you execute .jar or .py files that are already located in the HDFS file system. The links below cover submission from each tool; a minimal sketch of a programmatic submission through the cluster's Livy endpoint follows them.\r\n",
+ "\r\n",
+ "* [Submit Spark jobs on SQL Server Big Data Clusters in Azure Data Studio](https://docs.microsoft.com/en-us/sql/big-data-cluster/spark-submit-job?view=sqlallproducts-allversions)\r\n",
+ "* [Submit Spark jobs on SQL Server Big Data Clusters in IntelliJ](https://docs.microsoft.com/en-us/sql/big-data-cluster/spark-submit-job-intellij-tool-plugin?view=sqlallproducts-allversions)\r\n",
+ "* [Submit Spark jobs on SQL Server big data cluster in Visual Studio Code](https://docs.microsoft.com/en-us/sql/big-data-cluster/spark-hive-tools-vscode?view=sqlallproducts-allversions)\r\n",
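+ "\r\n",
+ "As a rough sketch of what a programmatic submission looks like, the cluster's Livy batch endpoint accepts a JSON job description. The gateway URL, credentials, and class name below are placeholders and assumptions; use the values for your own deployment (the tools linked above take care of this for you):\r\n",
+ "\r\n",
+ "``` Python\r\n",
+ "import requests\r\n",
+ "\r\n",
+ "# Placeholder values: substitute your cluster's gateway endpoint and credentials.\r\n",
+ "livy_url = \"https://<knox-gateway-endpoint>:30443/gateway/default/livy/v1/batches\"\r\n",
+ "job = {\r\n",
+ "    \"file\": \"/jar/mycodeJar.jar\",      # HDFS path of the .jar to run\r\n",
+ "    \"className\": \"com.my.MainClass\"    # hypothetical entry-point class\r\n",
+ "}\r\n",
+ "\r\n",
+ "# verify=False is shown only because test clusters often use self-signed certificates.\r\n",
+ "resp = requests.post(livy_url, json=job, auth=(\"<user>\", \"<password>\"), verify=False)\r\n",
+ "print(resp.status_code, resp.text)\r\n",
+ "```\r\n",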
+ ""
+ ],
+ "metadata": {
+ "azdata_cell_guid": "7d1b55c0-1961-45f7-8449-a24a913106e4"
+ }
}
]
}
\ No newline at end of file