## Use AML Pipeline for Databricks Deployment

This example uses a DatabricksStep to construct an Azure Machine Learning (AML) pipeline that deploys a score script to Databricks.

In this notebook you will learn how to:
 1. Create an Azure Machine Learning Workspace object.
 2. Create a Databricks compute target.
 3. Construct a DatabricksStep for the Python score script in Databricks.
 4. Submit and publish an AML pipeline to deploy the score script.

Before running this notebook:
 1. Install the azureml-sdk in your conda environment.
 2. Prepare the Python score script for deployment. You need to write your own score script; a simple example is available at https://github.com/linyli001/databricks_score/blob/master/score.py for reference. You can download the example score.py to a local path, or upload it to any DBFS path in your Databricks workspace.


### Check the Azure ML Core SDK Version to Validate Your Installation

In [1]:
import os
import azureml.core
from azureml.core.runconfig import JarLibrary
from azureml.core.compute import ComputeTarget, DatabricksCompute
from azureml.exceptions import ComputeTargetException
from azureml.core import Workspace, Run, Experiment
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import DatabricksStep
from azureml.core.datastore import Datastore
from azureml.data.data_reference import DataReference

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.0.6


# Initialize an Azure ML Workspace

An Azure ML workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. To create or access an Azure ML workspace, import the Azure ML library and specify the following information:
  1. **workspace_name** - A name for your workspace. You can choose one.
  2. **subscription_id** - Your subscription id. Use the `id` value from the output of the `az account show` command.
  3. **resource_group** - The resource group name. The resource group organizes Azure resources and provides a default region for the resources in the group. The resource group will be created if it doesn't exist. Resource groups can be created and viewed in the Azure portal.
  4. **workspace_region** - Supported regions include eastus2, eastus, westcentralus, southeastasia, westeurope, australiaeast, westus2, southcentralus.
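
Before calling the SDK, it can help to confirm that the placeholder values were actually replaced. The check below is only a minimal sketch: `check_workspace_settings` is a hypothetical helper (not part of azureml), and the region set is copied from the list above rather than queried from Azure.

```python
# Hypothetical helper, not part of the Azure ML SDK.
# The region set mirrors the list in this notebook and may be out of date.
SUPPORTED_REGIONS = {
    "eastus2", "eastus", "westcentralus", "southeastasia",
    "westeurope", "australiaeast", "westus2", "southcentralus",
}


def _is_placeholder(value):
    """True for empty values or unreplaced "<...>" template strings."""
    return not value or (value.startswith("<") and value.endswith(">"))


def check_workspace_settings(subscription_id, resource_group,
                             workspace_name, workspace_region):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    settings = [
        ("subscription_id", subscription_id),
        ("resource_group", resource_group),
        ("workspace_name", workspace_name),
        ("workspace_region", workspace_region),
    ]
    for key, value in settings:
        if _is_placeholder(value):
            problems.append("{} is still a placeholder".format(key))
    if (not _is_placeholder(workspace_region)
            and workspace_region not in SUPPORTED_REGIONS):
        problems.append(
            "{} is not in the region list above".format(workspace_region))
    return problems
```

Calling it with the four variables defined in the next cell and printing the result gives a quick sanity check before any Azure call is made.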

In [None]:
subscription_id = "<Your SubscriptionId>" #you should be owner or contributor
resource_group = "<Resource group - new or existing>" #you should be owner or contributor
workspace_name = "<workspace to be created>" #your workspace name
workspace_region = "<azureregion>" #your region

In [None]:
# Import the Workspace class and create the workspace (or reuse it if it already exists).
from azureml.core import Workspace

ws = Workspace.create(name = workspace_name,
                      subscription_id = subscription_id,
                      resource_group = resource_group, 
                      location = workspace_region,                      
                      exist_ok=True)
ws.get_details()

In [None]:
from azureml.core import Workspace

ws = Workspace(workspace_name = workspace_name,
               subscription_id = subscription_id,
               resource_group = resource_group)

# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.
ws.write_config()

# Attach Databricks compute target
Next, add your Databricks workspace to Azure Machine Learning as a compute target and give it a name. You will use this name to refer to your Databricks workspace compute target inside Azure Machine Learning.
1. **Compute Target** - Any name you choose to identify the compute target
2. **Resource Group** - The resource group name of your Databricks workspace  
3. **Databricks Workspace Name** - The workspace name of your Databricks workspace  
4. **Databricks Access Token** - The access token you created in Azure Databricks

**The Databricks workspace needs to be in the same subscription as your AML workspace.**

In [None]:
db_compute_name = "<DATABRICKS_COMPUTE_NAME>"  # Databricks compute name
db_resource_group = "<DATABRICKS_RESOURCE_GROUP>" # Databricks resource group
db_workspace_name = "<DATABRICKS_WORKSPACE_NAME>" # Databricks workspace name
db_access_token = "<DATABRICKS_ACCESS_TOKEN>" # Databricks access token
 
try:
    databricks_compute = ComputeTarget(workspace=ws, name=db_compute_name)
    print('Compute target {} already exists'.format(db_compute_name))
except ComputeTargetException:
    print('Compute not found, will use below parameters to attach new one')
    print('db_compute_name {}'.format(db_compute_name))
    print('db_resource_group {}'.format(db_resource_group))
    print('db_workspace_name {}'.format(db_workspace_name))
    # The access token is a secret, so it is deliberately not printed.
 
    config = DatabricksCompute.attach_configuration(
        resource_group = db_resource_group,
        workspace_name = db_workspace_name,
        access_token = db_access_token)
    databricks_compute = ComputeTarget.attach(ws, db_compute_name, config)
    databricks_compute.wait_for_completion(True)
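
Hard-coding the access token in a notebook makes it easy to leak. One option, sketched below, is to read it from an environment variable and only ever display a masked form. The variable name `DATABRICKS_ACCESS_TOKEN` and the helper `mask_secret` are assumptions made for illustration, not SDK features.

```python
import os


def mask_secret(secret, visible=4):
    """Hide all but the last `visible` characters of a secret."""
    if not secret:
        return "<not set>"
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Assumed environment variable name; an empty string means it was not set.
db_access_token = os.environ.get("DATABRICKS_ACCESS_TOKEN", "")
print("db_access_token {}".format(mask_secret(db_access_token)))
```

The resulting `db_access_token` can then be passed to `DatabricksCompute.attach_configuration` exactly as in the cell above.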

# Use Databricks from Azure Machine Learning Pipeline

To use Databricks as a compute target from an Azure Machine Learning Pipeline, use a DatabricksStep. A DatabricksStep can run a notebook, a Python script, or a JAR. To learn more about the parameters, see https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb. This example uses the Python script option.

You can now create your own score.py in a local path, or download the example score.py from the GitHub link given earlier. Note that to use the example score.py, you must first upload your test data and your prepared model to Azure Blob storage; score.py then loads the data and the model for prediction.

The configuration parameters below run the example score.py from a local path.

1. **databricks_step_name** - Name of this Databricks step (required), passed as `name`.
2. **spark_version, node_type, num_workers, spark_env_variables** - Settings used to create a new cluster in Databricks. To use an existing cluster, pass **existing_cluster_id** instead.
3. **python_script_name** - For a local script, the script name relative to source_directory. In this case, it is "score.py".
4. **source_directory** - The local directory that contains the Python script.
5. **databricks_compute** - The Databricks compute target attached above, passed as `compute_target`.
6. **run_name** - Name in Databricks for this run.
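
The choice between a new cluster and an existing one can be made explicit with a small helper that builds the cluster-related keyword arguments. This is a sketch only: `cluster_kwargs` is a hypothetical function, it assumes a fixed-size cluster (`num_workers`) when no existing cluster is given, and the values in the usage comment are examples rather than recommendations.

```python
def cluster_kwargs(existing_cluster_id=None, spark_version=None,
                   node_type=None, num_workers=None,
                   spark_env_variables=None):
    """Build the cluster-related keyword arguments for a DatabricksStep.

    Hypothetical helper: returns either the existing-cluster argument
    or the new-cluster settings, never a mix of both.
    """
    if existing_cluster_id:
        # Reuse a running cluster; no new-cluster settings are passed.
        return {"existing_cluster_id": existing_cluster_id}

    required = (("spark_version", spark_version),
                ("node_type", node_type),
                ("num_workers", num_workers))
    missing = [name for name, value in required if value is None]
    if missing:
        raise ValueError(
            "missing new-cluster settings: {}".format(", ".join(missing)))

    kwargs = {
        "spark_version": spark_version,
        "node_type": node_type,
        "num_workers": int(num_workers),  # worker count must be an integer
    }
    if spark_env_variables:
        kwargs["spark_env_variables"] = dict(spark_env_variables)
    return kwargs


# Usage (example values only):
# DatabricksStep(name="score", python_script_name="score.py", ...,
#                **cluster_kwargs(existing_cluster_id="<cluster id>"))
```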

**PS**: **python_script_params** holds the command-line arguments passed to the Python script. The run parameters below are the ones the example score.py requires. If you use a customized score script, the parameters will differ.

1. **asb_account** - Azure Blob location that holds the test data and models, in the form wasbs://[your-container-name]@[your-storage-account-name].blob.core.windows.net/[your-directory-name].
2. **asb_key_name** - Azure Blob key configuration name, in the form fs.azure.account.key.[your-storage-account-name].blob.core.windows.net.
3. **asb_key** - Private key used to access your blob storage.
4. **data_path** - Relative path to the directory in Azure Blob storage where the test data is stored. It must not start with "/".
5. **model_path** - Relative path to the directory in Azure Blob storage where the model is stored. It must not start with "/". The model format is PipelineModel.
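
On the script side, these flags arrive as ordinary command-line arguments. The example score.py is not reproduced here, so the following is only a sketch of how such a script might read them with `argparse`. The names `parse_score_args` and `full_path` are hypothetical, and joining the blob location with the relative path using "/" is an assumption about how the paths are combined.

```python
import argparse


def parse_score_args(argv=None):
    """Parse the flags passed through python_script_params."""
    parser = argparse.ArgumentParser(description="Batch scoring script")
    parser.add_argument("--asb_account", required=True,
                        help="wasbs:// URL of the blob directory")
    parser.add_argument("--asb_key_name", required=True,
                        help="Spark config key for the storage account")
    parser.add_argument("--asb_key", required=True,
                        help="storage account access key")
    parser.add_argument("--data_path", required=True,
                        help="relative path to the test data")
    parser.add_argument("--model_path", required=True,
                        help="relative path to the saved PipelineModel")
    return parser.parse_args(argv)


def full_path(account_url, relative_path):
    """Join the blob directory URL with a relative path."""
    if relative_path.startswith("/"):
        raise ValueError("relative path must not start with '/'")
    return account_url.rstrip("/") + "/" + relative_path
```

Inside a script, `parse_score_args()` with no argument reads `sys.argv`, so the same function works both under Databricks and in a local test.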

In [None]:
# Run the example Python script score.py from a local path

dbNbStep = DatabricksStep(
    name = "<Databricks Step Name>",
    spark_version = "<Spark version>",
    node_type = "<Node Type>",
    num_workers = "<Num Workers>",  # replace with an integer, not a string
    spark_env_variables = {'PYSPARK_PYTHON': '/databricks/python3/bin/python3'},
    python_script_name = "score.py",
    source_directory = "<Local Script Path>",
    python_script_params = ["--asb_account", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>",
                          "--asb_key_name", "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
                          "--asb_key", "<Private Key to Azure Blob>",
                          "--data_path", "<Relative data Path to Blob Directory>",
                          "--model_path", "<Relative model Path to Blob Directory>"
                         ],
    run_name = "<Run name for Databricks>",
    compute_target = databricks_compute,
    allow_reuse = False
)
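
Because the `--asb_account` and `--asb_key_name` values follow fixed patterns, the argument list can also be assembled from its parts. The helper below is a sketch under that assumption; `build_script_params` is a hypothetical name, not an SDK function.

```python
def build_script_params(container, storage_account, directory,
                        storage_key, data_path, model_path):
    """Assemble the python_script_params list for the example score.py."""
    for label, path in (("data_path", data_path), ("model_path", model_path)):
        if path.startswith("/"):
            raise ValueError("{} must not start with '/'".format(label))

    host = "{}.blob.core.windows.net".format(storage_account)
    return [
        "--asb_account", "wasbs://{}@{}/{}".format(container, host, directory),
        "--asb_key_name", "fs.azure.account.key.{}".format(host),
        "--asb_key", storage_key,
        "--data_path", data_path,
        "--model_path", model_path,
    ]
```

The returned list can be passed directly as `python_script_params` when constructing the step.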


### Construct the Pipeline with the DatabricksStep and Submit It

In [None]:
steps = [dbNbStep]
pipeline = Pipeline(workspace=ws, steps=steps)
pipeline_run = Experiment(ws, 'db_run_score').submit(pipeline)
pipeline_run.wait_for_completion()

### View the Run Details

In [None]:
from azureml.widgets import RunDetails
RunDetails(pipeline_run).show()

# Publish the pipeline

After you're satisfied with the outcome of the run, publish the pipeline so you can run it with different input values later. Publishing a pipeline gives you a REST endpoint, which lets you invoke the pipeline with the set of parameters you have already incorporated by using PipelineParameter.

In [None]:
published_pipeline = pipeline_run.publish_pipeline(
    name="databricks_scoring", 
    description="Batch scoring using pipeline model", 
    version="1.0")
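
The published pipeline exposes its REST endpoint (in the SDK, through `published_pipeline.endpoint`). As a rough sketch, a call to that endpoint is a POST request with a bearer token and a JSON body naming the experiment and any parameter assignments. The helper `build_pipeline_request` is hypothetical, the body field names follow the Azure ML documentation as understood here, and obtaining the Azure Active Directory token is left out.

```python
import json
import urllib.request


def build_pipeline_request(endpoint, aad_token, experiment_name,
                           parameters=None):
    """Build (but do not send) a POST request that triggers the pipeline."""
    body = {"ExperimentName": experiment_name}
    if parameters:
        # Values for any PipelineParameter objects defined in the pipeline.
        body["ParameterAssignments"] = parameters
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + aad_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately makes the payload easy to inspect first.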