# Azure Machine Learning - AutoML Pipeline Sample
## 03a - AutoML Regression Scoring / Local Explanations

This notebook shows how to score a new dataset as part of a batch process using a previously trained AutoML model. It also uses the SHAP library to generate local explanations, which show which features contributed positively or negatively to each predicted value (median home price).

The pipeline defined below takes in the name of the registered model, the training dataset (for feature explanation purposes), and a dataset to be scored. Following scoring and explainability analysis, this pipeline saves a result set to a CSV which is stored in the AML-linked blob store.
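
To make the idea of a local explanation concrete, here is a minimal sketch that does not use SHAP or the AutoML model at all. It assumes a hypothetical linear model with independent features, for which each feature's contribution is its weight times the feature's deviation from the background mean. The feature names, weights, and data are illustrative assumptions. The point is only the additive property: the base value plus the contributions equals the prediction for that row.

```python
from statistics import mean


def linear_local_explanation(weights, intercept, background, row):
    """Return (base_value, contributions) for one row of a linear model.

    Assumes independent features, so each contribution is
    weight * (value - background mean of that feature).
    """
    feature_means = {name: mean(r[name] for r in background) for name in weights}
    # Base value: the model's prediction at the background feature means.
    base_value = intercept + sum(weights[n] * feature_means[n] for n in weights)
    contributions = {n: weights[n] * (row[n] - feature_means[n]) for n in weights}
    return base_value, contributions


# Hypothetical toy model and data, for illustration only.
weights = {"MedInc": 0.5, "HouseAge": 0.01}
intercept = 0.2
background = [
    {"MedInc": 2.0, "HouseAge": 20.0},
    {"MedInc": 4.0, "HouseAge": 40.0},
]
row = {"MedInc": 5.0, "HouseAge": 10.0}

base_value, contributions = linear_local_explanation(weights, intercept, background, row)
prediction = intercept + sum(weights[n] * row[n] for n in weights)

# Positive contributions push the prediction above the base value,
# negative contributions pull it below.
for name, value in contributions.items():
    direction = "raises" if value > 0 else "lowers"
    print(f"{name} {direction} the prediction by {abs(value):.2f}")
print(f"base {base_value:.2f} + contributions = prediction {prediction:.2f}")
```

In this toy case the higher-than-average `MedInc` raises the prediction and the lower-than-average `HouseAge` lowers it. SHAP generalizes the same additive breakdown to models that are not linear.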

### Import required packages

In [None]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input, command, Output
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.constants import AssetTypes, InputOutputModes

### Get connection to Azure ML workspace

Update the variables below to reflect your Azure ML workspace.

In [None]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()
    
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)
ml_client

# Fill in your workspace details; they are used to build the datastore URI for the scoring data
subscription_id = ''
resource_group = ''
workspace_name = ''

datastore = ml_client.datastores.get_default() 

### Get handles to datastore path containing unscored data, along with the registered dataset used for model training

These are used as inputs to the pipeline step defined below.

In [None]:
scoring_data_path = f"azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}/workspaces/{workspace_name}/datastores/{datastore.name}/paths/data/housing_scoring_data/"

raw_dataset = ml_client.data.get("Regression_HousingData", label="latest")
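
If any of the workspace variables is left empty, the f-string above still produces a string, just not a usable datastore URI. As a minimal sketch, a small helper can build the same `azureml://` path and fail early on missing values. The helper name `build_datastore_uri` and the placeholder values are assumptions for illustration and are not part of the Azure ML SDK.

```python
def build_datastore_uri(subscription_id, resource_group, workspace_name,
                        datastore_name, relative_path):
    """Build an azureml:// datastore URI, rejecting empty components."""
    components = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
        "datastore_name": datastore_name,
        "relative_path": relative_path,
    }
    missing = [name for name, value in components.items() if not value]
    if missing:
        raise ValueError(f"Missing values for: {', '.join(missing)}")
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace_name}"
        f"/datastores/{datastore_name}"
        f"/paths/{relative_path.lstrip('/')}"
    )


# Placeholder values, for illustration only.
example_uri = build_datastore_uri(
    "00000000-0000-0000-0000-000000000000",
    "my-resource-group",
    "my-workspace",
    "workspaceblobstore",
    "data/housing_scoring_data/",
)
print(example_uri)
```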

### Define pipeline

Here, our pipeline consists of a single step which performs the following activities:
- Load unscored data and training data
- Load model from Azure ML registry by name
- Generate predictions for unscored data and append to dataset
- Generate local model explanations for these predictions and append to dataset
- Save final scored dataset to CSV in Azure ML-linked blob store

In [None]:
# Define pipeline
@pipeline(
    description="AutoML Regression Scoring Pipeline",
)
def automl_regression_scoring():
    # Define the command step that scores the data and generates local explanations
    scoring_command_func = command(
        inputs=dict(
            training_data=Input(path=raw_dataset.id,
              type=AssetTypes.URI_FILE,
              mode=InputOutputModes.RO_MOUNT
              ),
            scoring_data=Input(path=scoring_data_path, type=AssetTypes.URI_FOLDER, mode=InputOutputModes.RO_MOUNT),
            model_name='HomePricePredictionModel',
            target_column='MedHouseVal'
        ),
        outputs=dict(
            scored_data=Output(type=AssetTypes.URI_FOLDER),
        ),
        code="./scoring_regression.py",
        command="python scoring_regression.py "
        + "--training_data ${{inputs.training_data}} "
        + "--scoring_data ${{inputs.scoring_data}} "
        + "--model_name ${{inputs.model_name}} "
        + "--target_column ${{inputs.target_column}} "
        + "--scored_data ${{outputs.scored_data}} ",
        environment="evaluation-environment@latest",
        display_name='Score and Save Data'
    )
    scoring_node = scoring_command_func()

pipeline_scoring = automl_regression_scoring()

# set pipeline level compute
pipeline_scoring.settings.default_compute = "cpucluster"

### Submit pipeline job and await completion

Once the pipeline below completes, the results can be reviewed in the Azure ML-linked blob store.

In [None]:
# submit the pipeline job
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_scoring, experiment_name="AutoML_Scoring_Test"
)
pipeline_job

# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)

### Explore results

From the pipeline run, we can navigate to the outputs and review our saved scored data:

#### Run Summary (see Outputs, scored_data):
![Pipeline Results 1](img/aml_results1.png "Pipeline Run Summary")

#### Output Dataset (CSV):
![Pipeline Results 2](img/aml_results2.png "Saved Data")