# Visual Designer - Scoring Pipeline

In this exercise we will build a pipeline in Azure Machine Learning using the [Visual Designer](https://docs.microsoft.com/azure/machine-learning/concept-designer). The Visual Designer is typically used for training and deploying models. Here we will use it to build a batch scoring pipeline for a registered model trained in the earlier modules of the workshop: the diabetes Logistic Regression model. Below you can see a final picture of the scoring pipeline that will be built in this exercise.

The pipeline will use the output file from the <u>Visual Designer Data Prep Pipeline</u> exercise. It reads the diabetes.csv file from the <b>/1-bronze</b> folder, scores the dataset against the diabetes ML model, and then writes the resulting dataset to the <b>/2-silver</b> folder in the data lake.

![Picture of final scoring pipeline](./img/vdscorefinal.png)



## Step 1: Create new pipeline
In this step we will create the new pipeline.

In the Azure ML studio, navigate to <b>Designer</b> and press the <b>+</b> button under <b>New pipeline</b>.

![Screenshot of AML Studio highlighting the steps described to create a new pipeline](./img/vdnewpipeline.png)

1. In <b>Settings</b> change the compute type to <b>Compute cluster</b> and select the appropriate compute cluster.
1. Name the pipeline in the <b>Draft name</b> field using the convention "pipeline-score-diabetes-\<userid\>-prod".

![Settings pane with compute settings and draft name fields highlighted](./img/vdnewpipelinescore.png)

1. Open <b>Data Input and Output</b> from the components menu.
2. Drag <b>Import Data</b> onto the canvas.
3. Change the <b>Data source</b> to <b>Datastore</b>.
1. Select the \<workshop-datastore\>.
1. Enter the storage path to the <b>diabetes.csv</b> file in the <b>/1-bronze</b> folder of the data lake.
1. Validate by pressing <b>Preview schema</b>.

![Import Data component settings](./img/vdimportbronzescore.png)

1. Open <b>Python Language</b> from the components menu.
1. Drag <b>Execute Python Script</b> onto the canvas.
1. Connect <b>Import Data</b> with the <b>Dataset1: DataFrameDirectory</b> input.
1. Copy and paste the following code into the <b>Python script</b> window. Replace \<userid\> in <b>Model.get_model_path()</b> with your userid.

```python
import joblib
from azureml.core import Model

# Feature columns passed to the model, in the order used for scoring.
FEATURES = ['Pregnancies', 'PlasmaGlucose', 'DiastolicBloodPressure',
            'TricepsThickness', 'SerumInsulin', 'BMI', 'DiabetesPedigree', 'Age']

# The entry point function MUST have two input arguments.
# If the input port is not connected, the corresponding
# dataframe argument will be None.
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):

    # Locate the registered model and load it (replace <userid> with your userid).
    model_path = Model.get_model_path('diabetes_model_<userid>')
    model = joblib.load(model_path)

    # Build the feature matrix from the imported bronze dataset.
    x = dataframe1[FEATURES].values

    # Score every row and store the prediction in a new 'Diabetes' column.
    yhat = model.predict(x)
    dataframe1['Diabetes'] = yhat

    return dataframe1
```
![Execute Python Script component settings](./img/vdpythonscriptscore.png)
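
Before pasting the script into the designer, it can help to dry-run the scoring logic locally. The following is a minimal sketch under stated assumptions: `StubModel` is a hypothetical stand-in for the registered model (its threshold rule is arbitrary and is not the trained classifier), `score` is an illustrative helper that mirrors the body of `azureml_main` with the model passed in, and the sample rows and the `PatientID` column are made up for the demonstration.

```python
import pandas as pd

FEATURES = ['Pregnancies', 'PlasmaGlucose', 'DiastolicBloodPressure',
            'TricepsThickness', 'SerumInsulin', 'BMI', 'DiabetesPedigree', 'Age']


class StubModel:
    """Hypothetical stand-in for the registered model; NOT the real classifier."""

    def predict(self, x):
        # Arbitrary illustration rule: flag rows whose PlasmaGlucose exceeds 140.
        glucose_idx = FEATURES.index('PlasmaGlucose')
        return [1 if row[glucose_idx] > 140 else 0 for row in x]


def score(dataframe1, model):
    """Same scoring steps as azureml_main, with the model passed in."""
    x = dataframe1[FEATURES].values
    dataframe1['Diabetes'] = model.predict(x)
    return dataframe1


# Made-up sample rows; the real input is diabetes.csv from /1-bronze.
sample = pd.DataFrame({
    'PatientID': [1, 2],
    'Pregnancies': [0, 3],
    'PlasmaGlucose': [100, 170],
    'DiastolicBloodPressure': [80, 92],
    'TricepsThickness': [30, 35],
    'SerumInsulin': [25, 180],
    'BMI': [22.5, 41.0],
    'DiabetesPedigree': [0.2, 1.1],
    'Age': [23, 54],
})

scored = score(sample, StubModel())
print(scored[['PatientID', 'Diabetes']])
```

If a required feature column is missing from the input, the column selection raises a `KeyError`, which is a quick way to catch schema mismatches before submitting the pipeline.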


1. Open <b>Data Input and Output</b> from the components menu.
1. Drag <b>Export Data</b> onto the canvas.
1. Connect <b>Execute Python Script</b> output <b>Result dataset: DataFrameDirectory</b> with the <b>Export Data</b> input.
1. Select the \<workshop\> datastore and use the following path for the exported file: "/2-silver/diabetes/\<userid\>/diabetes.csv"
1. Select <b>csv</b> for the <b>File format</b>.

![Export Data component settings](./img/vdexportscores.png)
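
The names used in this exercise all follow the same userid-based convention, so a typo in one place can break the run. As a small sketch, the hypothetical helper below (`workshop_names` is not part of the workshop materials) derives the pipeline name, model name, and export path from a userid; `jdoe` is a made-up example value.

```python
def workshop_names(userid):
    """Build the userid-based names used in this exercise (illustrative helper)."""
    return {
        # Pipeline draft name convention from Step 1.
        "pipeline_name": f"pipeline-score-diabetes-{userid}-prod",
        # Registered model name referenced in the Python script.
        "model_name": f"diabetes_model_{userid}",
        # Path entered in the Export Data component.
        "export_path": f"/2-silver/diabetes/{userid}/diabetes.csv",
    }


names = workshop_names("jdoe")  # "jdoe" is a made-up userid
for key, value in names.items():
    print(f"{key}: {value}")
```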

## Step 2: Submit and Publish pipeline
First, submit the pipeline and ensure it runs as expected. Then publish the pipeline endpoint.

1. Press <b>Submit</b>.
2. Choose <b>Create New</b> for Experiment.
3. Name the new experiment using this convention: "pipeline-score-diabetes-\<userid\>-prod"
4. Press the <b>Submit</b> button.
5. Monitor the run for completion.

![Set up pipeline run settings](./img/vdsubmitscore.png)

1. Verify that <b>diabetes.csv</b> was created in the <b>/2-silver</b> folder after the pipeline run.

![screenshot of Storage Explorer showing output files from pipeline run](./img/vdstorageexplorerscoredata.png)

1. Preview the diabetes.csv file and verify the <b>Diabetes</b> column is present with scores.

![Preview of diabetes.csv score file with Diabetes column highlighted](./img/vdscoredatapreview.png)
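
The same check can be scripted against a downloaded copy of the file. This is a minimal sketch using only the Python standard library; `check_scored_csv` is a hypothetical helper, and the inline sample rows are made up so the snippet runs without access to the data lake.

```python
import csv
import io


def check_scored_csv(text_stream):
    """Return the number of scored rows, or raise if the Diabetes column is missing or empty."""
    reader = csv.DictReader(text_stream)
    if 'Diabetes' not in (reader.fieldnames or []):
        raise ValueError("Diabetes column is missing")
    labels = [row['Diabetes'] for row in reader]
    if any(label is None or label.strip() == '' for label in labels):
        raise ValueError("some rows have no score")
    return len(labels)


# Made-up sample content standing in for the exported diabetes.csv.
sample_csv = io.StringIO("PatientID,Age,Diabetes\n1,21,0\n2,43,1\n")
rows = check_scored_csv(sample_csv)
print(f"{rows} scored rows")
```

For a real file, open it with `open("diabetes.csv", newline="")` and pass the file object to `check_scored_csv`.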

1. Open the pipeline and press the <b>Publish</b> button.
2. Choose <b>Create new</b> and name the pipeline endpoint the same as the pipeline draft.
3. Press the <b>Publish</b> button.

![The Set up published pipeline menu in the AML Studio Visual Designer](./img/vdpublishscorepipeline.png)
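
Once published, the pipeline can be triggered with an authenticated HTTP POST to its REST endpoint, which is how an external orchestrator would typically call it. The sketch below only builds the request and does not send it. The endpoint URL and token are placeholders (assumptions, not real values): copy the actual REST endpoint from the published pipeline's details page and obtain an Azure AD bearer token separately. `build_trigger_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_trigger_request(rest_endpoint, aad_token, experiment_name):
    """Build (but do not send) a POST request that triggers a published pipeline."""
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    return urllib.request.Request(
        rest_endpoint,
        data=body,
        headers={
            "Authorization": f"Bearer {aad_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_trigger_request(
    rest_endpoint="https://example.invalid/published-pipeline-endpoint",
    aad_token="<aad-token>",
    experiment_name="pipeline-score-diabetes-<userid>-prod",
)
print(request.get_method(), request.full_url)
```

To actually submit the run, pass the request to `urllib.request.urlopen(request)` after substituting real values.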

## The End

This scoring pipeline will be orchestrated using Azure Data Factory, together with the data prep and training pipelines that are published in Module 3.