# Challenge 1 - Basics of Azure ML

As part of this challenge you will get familiar with the basic concepts of [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning/). Relevant links are provided in the notebook to help you solve the tasks.

Generally, a very good source of information is the [Python SDK reference](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py) for Azure Machine Learning.

## 1. Import Azure ML Python SDK

In [None]:
import azureml.core
print("SDK version:", azureml.core.VERSION)

## 2. Authentication and initializing Azure Machine Learning Workspace

As a first step you have to authenticate against the Azure [Machine Learning Workspace](https://ml.azure.com/). This can be achieved in different ways:

1. **Interactive Login Authentication:** The interactive authentication is suitable for local experimentation on your own computer.
2. **Azure CLI Authentication:** Azure CLI authentication is suitable if you are already using Azure CLI for managing Azure resources, and want to sign in only once.
3. **Managed Service Identity (MSI) Authentication:** The MSI authentication is suitable for automated workflows that run on an Azure resource with a managed identity, such as an Azure Virtual Machine, so that no credentials need to be stored in code.
4. **Service Principal Authentication:** The Service Principal authentication is suitable for automated workflows, for example as part of an Azure DevOps build.

For now, we will use the interactive authentication, which is the default mode when using Azure ML SDK. When you connect to your workspace using `Workspace.from_config`, you will get an interactive login dialog.

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()

Note that the user you're authenticated as must have access to the subscription and resource group. If you receive an error like
```
AuthenticationException: You don't have access to xxxxxx-xxxx-xxx-xxx-xxxxxxxxxx subscription. All the subscriptions that you have access to = ...
```
check that you used the correct login and entered the correct subscription ID.

Alternatively, you can also specify the details of your workspace.

In [None]:
'''
# Alternative login method

from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication()

ws = Workspace(subscription_id='<your-subscription-id>',
               resource_group='<your-resource-group-name>',
               workspace_name='<your-workspace-name>',
               auth=interactive_auth)
'''

After we have logged in, we can print the workspace details.

**TASK**: Print the workspace details below. See here for the workspace object reference: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py

In [None]:
print("Workspace name: " + ws.name, 
      "Azure region: " + ws.location, 
      "Subscription id: " + ws.subscription_id, 
      "Resource group: " + ws.resource_group, sep = '\n')

## 3. Upload and register data

Every workspace comes with a default [datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) (and you can register more) which is backed by the Azure Blob Storage account associated with the workspace. We can use it to transfer data from the local machine to the cloud and create a Dataset from it. We will now upload the Iris data to the default datastore (blob) within your workspace.

By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred.

In [None]:
# List all datastores registered in the current workspace
datastores = ws.datastores
for name, datastore in datastores.items():
    print(name, datastore.datastore_type)

For this challenge we will use the [default datastore](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data#get-datastores-from-your-workspace) that comes with the Azure Machine Learning Workspace.

**TASK**: Retrieve the default datastore for this workspace.

Hint: Same link as in the previous hint.

In [None]:
# get the default datastore
datastore = ws.get_default_datastore()
print(datastore.name, datastore.datastore_type, datastore.account_name, datastore.container_name, sep="\n")

Before we upload data, take a minute to familiarize yourself with the folder structure of the workshop. Switch back to the home page of the Jupyter environment (it should still be open in a previous browser tab) and look through the folders.

For instance, next to this notebook file, you will find a folder `train_dataset` which contains the training data that we will upload in the next step. Also look into the folders for the other challenges to see what there is.

**TASK**: Upload the file `./train_dataset/iris.csv` to the target path `train_dataset/tabular/` on the default datastore.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.azure_storage_datastore.azureblobdatastore

In [None]:
datastore.upload_files(files = ['./train_dataset/iris.csv'],
                       target_path = 'train_dataset/tabular/',
                       overwrite = True,
                       show_progress = True)

Then we will create and register a TabularDataset pointing to the path in the datastore. You can also create a Dataset from multiple paths. [Learn more](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets)

A Dataset can reference single or multiple files in your datastores or public urls. The files can be of any format. Dataset provides you with the ability to download or mount the files to your compute. By creating a dataset, you create a reference to the data source location. The data remains in its existing location, so no extra storage cost is incurred. [Learn More](https://aka.ms/azureml/howto/createdatasets)

In [None]:
from azureml.core import Dataset

tabular_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 'train_dataset/tabular/iris.csv')])

**TASK**: Register the dataset in our workspace.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.dataset%28class%29?view=azure-ml-py#methods

In [None]:
tabular_dataset = tabular_dataset.register(workspace=ws,
                                           name='iris_tabular',
                                           description='tabular iris training data',
                                           create_new_version = True)

**TASK**: Get and preview 3 rows from the dataset.

Hint: This is very similar to working with any other dataframe in python

In [None]:
# get and preview 3 rows of the dataset
tabular_dataset.take(3).to_pandas_dataframe()

Now we will register a dataset in the Azure Machine Learning Workspace as a file dataset. A file dataset can be mounted to the compute engine. When you mount a file system, you attach that file system to a directory (mount point) and make it available to the system. Because mounting loads files at the time of processing, it is usually faster than downloading.
Note: Mounting is only available for Linux-based compute (DSVM/VM, AmlCompute, HDInsight).

In [None]:
from azureml.core import Dataset

file_dataset = Dataset.File.from_files(path = [(datastore, 'train_dataset/tabular/iris.csv')])
file_dataset = file_dataset.register(workspace=ws,
                                     name='iris_file',
                                     description='file iris training data',
                                     create_new_version = True)

file_dataset.to_path()

## 3. Create Compute Engine

In this sample, we want to train a simple scikit-learn model on a remote compute engine on Azure. To do so, we first must create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target).

In this challenge, we want to use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Once this is created, you are ready to train on your remote compute.

**TASK**: Create a machine learning compute target.

Create an Azure Machine Learning Compute cluster by following steps one to four:
1. Check whether the cluster with the given name already exists.
2. Create the configuration (this step is local and only takes a second). Use the SKU `STANDARD_D2_V2` and a maximum of 4 nodes.
3. Create the cluster (this step will take about 20 seconds)
4. Provision the VMs to bring the cluster to the initial size. This step will take about 3-5 minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.computetarget?view=azure-ml-py

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
cluster_name = "cpucluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', 
                                                           max_nodes=4,
                                                           idle_seconds_before_scaledown=1800)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it uses the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

## 4. Create a project directory 

Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.

In [None]:
TRAIN_FOLDER_NAME = 'train'
TRAIN_FILE_NAME = 'train.py'

In [None]:
import os
os.makedirs(os.path.join(".", TRAIN_FOLDER_NAME), exist_ok=True)

## 5. Create a training script 

Now you will need to create your training scripts in your project folder. This will be done in the next step. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.

If you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script.

In `train.py`, we will log some metrics to our Azure ML run. To do so, we will access the Azure ML Run object within the script:

```python
from azureml.core.run import Run
run = Run.get_context()
```

Further within `train.py`, we log the kernel and penalty parameters, and the accuracy the model achieves on the test set:

```python
run.log('Kernel type', str(args.kernel))
run.log('Penalty', float(args.penalty))

run.log('Accuracy', float(accuracy))
```

These run metrics will become particularly important when we begin hyperparameter tuning our model in the "Tune model hyperparameters" section.

**TASK**: The training script below is missing a few of the metric logging calls. Find the `???` and complete the script.

Hint: Be careful when retrieving the metrics. The default is `average='binary'` and might not be the right fit.
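To make the hint concrete: the Iris target has three classes, and scikit-learn raises a `ValueError` for multiclass targets when `average='binary'` is left at its default. The sketch below recomputes what `average='weighted'` means for precision in plain Python, using a small hand-made set of labels (hypothetical data, not taken from the workshop files): each class's precision is weighted by the number of true samples of that class.

```python
from collections import Counter


def weighted_precision(y_true, y_pred):
    """Per-class precision, averaged with weights equal to each class's support."""
    support = Counter(y_true)  # number of true samples per class
    total = 0.0
    for label, n_true in support.items():
        # true labels of all samples that were predicted as `label`
        predicted_as_label = [t for t, p in zip(y_true, y_pred) if p == label]
        if predicted_as_label:
            precision = predicted_as_label.count(label) / len(predicted_as_label)
        else:
            precision = 0.0  # class never predicted
        total += n_true * precision
    return total / len(y_true)


# Hypothetical labels for illustration only
y_true = ['setosa', 'setosa', 'setosa', 'versicolor', 'versicolor', 'virginica']
y_pred = ['setosa', 'setosa', 'versicolor', 'versicolor', 'versicolor', 'virginica']

score = weighted_precision(y_true, y_pred)
print('Weighted precision: {:.3f}'.format(score))
```

Note that the function takes the true labels first and the predictions second, which is also the argument order scikit-learn's metric functions expect.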

In [None]:
%%writefile $TRAIN_FOLDER_NAME/$TRAIN_FILE_NAME

import argparse
import os

# importing necessary libraries
import numpy as np
import pandas as pd
import joblib

from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

from azureml.core import Dataset, Run

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--kernel', type=str, default='rbf', help='Kernel type to be used in the algorithm')
    parser.add_argument('--penalty', type=float, default=1.0, help='Penalty parameter of the error term')
    parser.add_argument('--modelname', type=str, default='model.pkl', help='Name of the model file')
    args = parser.parse_args()
    
    run = Run.get_context()
    # log plain Python types; the np.str / np.float aliases were removed from recent NumPy releases
    run.log('Kernel type', str(args.kernel))
    run.log('Penalty', float(args.penalty))

    # loading the iris dataset
    dataset = run.input_datasets['iris']
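    try:
        # a TabularDataset input can be loaded directly into a pandas DataFrame
        df = dataset.to_pandas_dataframe()
    except Exception:
        # a mounted FileDataset input arrives as a path on the compute target
        print('Dataset path: ', str(dataset))
        df = pd.read_csv(os.path.join(dataset))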
    
    # split dataset
    x_col = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
    y_col = ['species']
    x_df = df.loc[:, x_col]
    y_df = df.loc[:, y_col]
    
    #dividing X,y into train and test data
    x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=223)
    data = {'train': {'X': x_train, 'y': y_train},
            'test': {'X': x_test, 'y': y_test}}
    
    # training a SVM classifier
    svm_model = SVC(kernel=args.kernel, C=args.penalty, gamma='scale').fit(data['train']['X'], data['train']['y'])
    svm_predictions = svm_model.predict(data['test']['X'])

    # model accuracy for X_test
    accuracy = svm_model.score(data['test']['X'], data['test']['y'])
    print('Accuracy of SVM classifier on test set: {:.2f}'.format(accuracy))
    run.log('Accuracy', float(accuracy))
    
    # precision for X_test
    precision = precision_score(data["test"]["y"], svm_predictions, average='weighted')
    print('Precision of SVM classifier on test set: {:.2f}'.format(precision))
    run.log('precision', precision)
    
    # recall for X_test
    recall = recall_score(data["test"]["y"], svm_predictions, average='weighted')
    print('Recall of SVM classifier on test set: {:.2f}'.format(recall))
    run.log('recall', recall)
    
    # f1-score for X_test
    f1 = f1_score(data["test"]["y"], svm_predictions, average='weighted')
    print('F1-Score of SVM classifier on test set: {:.2f}'.format(f1))
    run.log('f1-score', f1)
    
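    # creating a confusion matrix; rows and columns follow the order of the class labels
    class_labels = svm_model.classes_.tolist()
    cm = confusion_matrix(y_test, svm_predictions, labels=class_labels)
    
    cm_json =   {
       "schema_type": "confusion_matrix",
       "schema_version": "v1",
       "data": {
           "class_labels": class_labels,
           "matrix": cm.tolist()
       }
    }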
    print(cm_json)
    run.log_confusion_matrix('confusion matrix', cm_json)
    
    # files saved in the "outputs" folder are automatically uploaded into run history
    os.makedirs('outputs', exist_ok=True)
    joblib.dump(svm_model, 'outputs/' + args.modelname)
    run.log('Model Name', str(args.modelname))

if __name__ == '__main__':
    main()


## 6. Create an experiment

An *Experiment* is a logical container in an Azure ML Workspace that represents a collection of trials (individual model runs). It hosts run records which can include run metrics and output artifacts from your experiments.

**TASK**: Fill in the missing values below to create a new experiment in your workspace

In [None]:
from azureml.core import Experiment
exp = Experiment(workspace=ws, name='ch1-sklearn_sample')

## 7. Create Estimator

An estimator object is used to submit the run. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as a generic Estimator. Create a generic estimator by specifying:

- The name of the estimator object, est
- The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution.
- The training script name, `train.py`
- The input Dataset for training
- The compute target. In this case you will use the AmlCompute you created
- The environment definition for the experiment

**TASK**: Complete the estimator creation below.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py

In [None]:
from azureml.train.estimator import Estimator

script_params = {
    '--kernel': 'linear',
    '--penalty': 1.0
}

est = Estimator(source_directory=TRAIN_FOLDER_NAME, 
                entry_script=TRAIN_FILE_NAME,
                script_params=script_params,
                inputs=[tabular_dataset.as_named_input('iris')],
                compute_target=compute_target,
                pip_packages=['azureml-dataprep[pandas,fuse]', 'scikit-learn', 'pandas']) 

In [None]:
'''
# Alternatives
from azureml.train.sklearn import SKLearn

# with mounting the dataset
est = Estimator(source_directory=TRAIN_FOLDER_NAME, 
                entry_script=TRAIN_FILE_NAME,
                script_params=script_params,
                inputs=[file_dataset.as_named_input('iris').as_mount('tmp/dataset')],
                compute_target=compute_target,
                pip_packages=['azureml-dataprep[pandas,fuse]', 'scikit-learn', 'pandas'])

# with specific estimator for scikit-learn
est = SKLearn(source_directory=TRAIN_FOLDER_NAME,
              script_params=script_params,
              compute_target=compute_target,
              entry_script=TRAIN_FILE_NAME,
              pip_packages=['azureml-dataprep[pandas,fuse]', 'pandas'])
'''

## 8. Submit the job

Submit the estimator to the Azure ML experiment to kick off the execution.

**TASK**: Submit the experiment as a new run. 

While the experiment is running (after the Docker image has been built and pushed), you can take a look at the compute target in the AzureML UI. Initially it will have 0 nodes running. You will then see the cluster scale up to 1 node to execute the experiment.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment%28class%29?view=azure-ml-py#methods

In [None]:
run = exp.submit(est)
run.wait_for_completion(show_output=True, wait_post_processing=True)

To cancel a run, you can call `run.cancel()`. Because `run.wait_for_completion()` only returns once the run has completed or been cancelled, you first need to interrupt the Jupyter kernel; interrupting the kernel does not cancel the run itself.
Alternatively, you can cancel the run from the AzureML UI. Click on the link that is printed in the first output line of the above command to open the portal. There you will find a Cancel button as well.

In [None]:
#run.cancel()

Once the run is finished, we first get its status:

In [None]:
run.status

You now have a model trained on a remote cluster. Retrieve all the metrics logged during the run, including the accuracy of the model:

**TASK**: Retrieve the metrics of the run.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run%28class%29?view=azure-ml-py#methods

In [None]:
run.get_metrics()

## 9. Tune model hyperparameters

Now that we've seen how to do a simple Scikit-learn training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities.

First, we will define the hyperparameter space to sweep over. Let's tune the `kernel` and `penalty` parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, `Accuracy`.
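As a rough illustration of what random sampling over this space means, the sketch below draws six configurations with Python's standard `random` module. It is only a plain-Python stand-in for the idea, not how HyperDrive is implemented; the kernel choices, the penalty range and the count of six mirror the values used in the next cell, and the helper name `sample_configuration` is made up for this example.

```python
import random

# Search space mirroring the HyperDrive definition in the next cell
# (assumption: plain-Python stand-ins for choice(...) and uniform(...))
KERNELS = ['linear', 'rbf', 'poly', 'sigmoid']
PENALTY_RANGE = (0.5, 1.5)


def sample_configuration(rng):
    """Draw one hyperparameter configuration independently of all others."""
    return {
        '--kernel': rng.choice(KERNELS),
        '--penalty': rng.uniform(*PENALTY_RANGE),
    }


rng = random.Random(0)  # fixed seed so the draws are repeatable
configurations = [sample_configuration(rng) for _ in range(6)]
for config in configurations:
    print(config)
```

Each configuration corresponds to one child run, which receives the sampled values as script arguments and reports `Accuracy` back as the primary metric.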

In [None]:
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.parameter_expressions import choice, uniform
from azureml.train.hyperdrive.policy import MedianStoppingPolicy

param_sampling = RandomParameterSampling({
    "--kernel": choice('linear', 'rbf', 'poly', 'sigmoid'),
    "--penalty": uniform(0.5, 1.5)
    })

hyperdrive_run_config = HyperDriveConfig(estimator=est,
                                         hyperparameter_sampling=param_sampling, 
                                         primary_metric_name='Accuracy',
                                         primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                         max_total_runs=6,
                                         max_concurrent_runs=4,
                                         policy=MedianStoppingPolicy())

Finally, launch the hyperparameter tuning job.

**TASK**: Submit the hyperdrive run

Hint: It is very similar to the experiment submission before.

In [None]:
hyperdrive_run = exp.submit(hyperdrive_run_config)
hyperdrive_run.wait_for_completion(show_output=True, wait_post_processing=True)

**TASK**: While the hyperdrive is running, you can use the time to take a look at the newly created Azure Container Registry in the Azure Portal. It was created when you submitted the first remote run.
During the experiment submission, after the Docker image was built, it was pushed up into that registry. 

Try to find the pushed image in the registry and take a look at it.

Often times, finding the best hyperparameter values for your model can be an iterative process, needing multiple tuning runs that learn from previous hyperparameter tuning runs. Reusing knowledge from these previous runs will accelerate the hyperparameter tuning process, thereby reducing the cost of tuning the model and will potentially improve the primary metric of the resulting model. When warm starting a hyperparameter tuning experiment with Bayesian sampling, trials from the previous run will be used as prior knowledge to intelligently pick new samples, so as to improve the primary metric. Additionally, when using Random or Grid sampling, any early termination decisions will leverage metrics from the previous runs to determine poorly performing training runs. 

Azure Machine Learning allows you to warm start your hyperparameter tuning run by leveraging knowledge from up to 5 previously completed hyperparameter tuning parent runs. 

Additionally, there might be occasions when individual training runs of a hyperparameter tuning experiment are cancelled due to budget constraints or fail due to other reasons. It is now possible to resume such individual training runs from the last checkpoint (assuming your training script handles checkpoints). Resuming an individual training run will use the same hyperparameter configuration and mount the storage used for that run. The training script should accept the "--resume-from" argument, which contains the checkpoint or model files from which to resume the training run. You can also resume individual runs as part of an experiment that spends additional budget on hyperparameter tuning. Any additional budget, after resuming the specified training runs, is used for exploring additional configurations.
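On the script side, accepting the `--resume-from` argument could look roughly like the sketch below. It only shows the argument parsing with `argparse`; how a checkpoint is actually loaded depends on your training code, and the path `/tmp/checkpoints` is a made-up value for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the training arguments, including an optional checkpoint location."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--kernel', type=str, default='rbf')
    parser.add_argument('--penalty', type=float, default=1.0)
    # argparse exposes '--resume-from' as the attribute 'resume_from'
    parser.add_argument('--resume-from', type=str, default=None,
                        help='Location of checkpoint or model files to resume from')
    return parser.parse_args(argv)


# A fresh run: no checkpoint location is passed
fresh = parse_args(['--kernel', 'linear'])
# A resumed run: the checkpoint location is passed (hypothetical path)
resumed = parse_args(['--penalty', '0.8', '--resume-from', '/tmp/checkpoints'])

for args in (fresh, resumed):
    if args.resume_from is None:
        print('Starting training from scratch')
    else:
        print('Resuming training from', args.resume_from)
```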

For more information on warm starting and resuming hyperparameter tuning runs, please refer to the [Hyperparameter Tuning for Azure Machine Learning documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters) 

When all jobs finish, we can find out the one that has the highest accuracy.

**TASK**: Get the best run from the hyperdrive experiment

Hint: https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.hyperdriverun

In [None]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
print(best_run.get_details()['runDefinition']['arguments'])

## 10. Register model

The last step in the training script wrote the file `model.pkl` in a directory named `outputs` in the VM of the cluster where the job is executed. `outputs` is a special directory in that all content in this  directory is automatically uploaded to your workspace.  This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace.

You can see files associated with that run.

**TASK**: Get all the file names associated with the best run.

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run%28class%29

In [None]:
best_run.get_file_names()

Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model.

**TASK**: Fill in the missing values below to register the model

In [None]:
from azureml.core import Model
from azureml.core.resource_configuration import ResourceConfiguration

model = best_run.register_model(model_name='sample-model',
                                model_path='outputs/model.pkl',
                                model_framework=Model.Framework.SCIKITLEARN,
                                model_framework_version='0.22.1',
                                datasets=[('Training dataset', tabular_dataset)],
                                resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5),
                                description='SVC classification for iris dataset.',
                                tags={'area': 'iris', 'type': 'svc'})

print('Name: ' + model.name, 'ID: ' + model.id, 'Version: ' + str(model.version), sep='\n')

Now, your model is ready for deployment.

## 11. Deployment

No-code model deployment is currently in preview and supports various frameworks and model types including the TensorFlow SavedModel format, ONNX models and scikit-learn models. No-code model deployment is supported for all built-in scikit-learn model types.

The deployment will take a few minutes and will take place on an Azure Container Instance.

**TASK**: Fill in the missing values to deploy the model as a no-code webservice

In [None]:
import datetime
dt = datetime.datetime.now().strftime("d%H%M%S")

service_no_code = Model.deploy(workspace=ws,
                               name='ch1-service-nocode-' + dt,
                               models=[model],
                               overwrite=True)
service_no_code.wait_for_deployment(show_output=True)

In [None]:
# If deployment fails, then retry with:
#service_no_code.update(models=[model])

**TASK**: Get the logs for the web service

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice%28class%29

In [None]:
service_no_code.get_logs()

Convert this Webservice object into a JSON serialized dictionary, which lists all the details of the webservice.

In [None]:
service_no_code.serialize()

## 12. Test Service

The following code is an example of a Python client that can be used with the container.

**TASK**: Fill in the missing part to execute the request against the web service

Hint: Look in the same class as for the previous hint

In [None]:
import json

# Two sets of data to score, so we get two results back.
data = {'data':
        [
            [ 1,2,3,4 ],
            [ 10,9,8,7 ]
        ]
        }
# Convert to JSON string.
input_data = json.dumps(data)

# Make the request and display the response.
resp = service_no_code.run(input_data)
print(resp)

## 13. Package model to custom Docker image

In some cases, you might want to create a Docker image without directly deploying the model (if, for example, you plan to deploy to Azure App Service). Or you might want to download the image and run it on a local Docker installation. You might even want to download the files used to build the image, inspect them, modify them, and build the image manually.

Model packaging enables you to do these things. It packages all the assets needed to host a model as a web service and allows you to download either a fully built Docker image or the files needed to build one. There are two ways to use model packaging:

- Download a packaged model: Download a Docker image that contains the model and other files needed to host it as a web service.

- Generate a Dockerfile: Download the Dockerfile, model, entry script, and other assets needed to build a Docker image. You can then inspect the files or make changes before you build the image locally.

Both packages can be used to get a local Docker image.

First, we need to create a custom scoring script. Such a script is usually named something like `score.py`.

In [None]:
SCORE_FOLDER_NAME = 'deployment'
SCORE_FILE_NAME = 'score.py'

In [None]:
import os
os.makedirs(os.path.join(".", SCORE_FOLDER_NAME), exist_ok=True)

Take a moment to read through the script and the comments to understand the logic.
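The scoring script below follows an `init()` / `run()` contract: `init()` loads the model once from the directory named by the `AZUREML_MODEL_DIR` environment variable, and `run()` is called per request. The following sketch exercises that contract locally with a stand-in "model" (just a pickled dictionary holding a threshold, purely hypothetical) and without the schema decorators or data collectors, so `run()` here parses the raw JSON string itself.

```python
import json
import os
import pickle
import tempfile

model = None


def init():
    """Load the model once, as the web service would at start-up."""
    global model
    model_path = os.path.join(os.environ['AZUREML_MODEL_DIR'], 'model.pkl')
    with open(model_path, 'rb') as f:
        model = pickle.load(f)


def run(raw_data):
    """Score one request; raw_data is a JSON string with a 'data' list of rows."""
    rows = json.loads(raw_data)['data']
    # Stand-in prediction logic (assumption): compare the row sum to a threshold
    labels = ['large' if sum(row) > model['threshold'] else 'small' for row in rows]
    return {'predict': labels}


# Simulate the model directory that the service would provide
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, 'model.pkl'), 'wb') as f:
        pickle.dump({'threshold': 20}, f)
    os.environ['AZUREML_MODEL_DIR'] = model_dir
    init()

response = run(json.dumps({'data': [[1, 2, 3, 4], [10, 9, 8, 7]]}))
print(response)
```

Testing the two functions this way before packaging can catch path and payload mistakes without waiting for a Docker build.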

In [None]:
%%writefile $SCORE_FOLDER_NAME/$SCORE_FILE_NAME

import joblib
import numpy as np
import os

from azureml.monitoring import ModelDataCollector
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType

# The init() method is called once, when the web service starts up.
# Typically you would deserialize the model file, as shown here using joblib,
# and store it in a global variable so your run() method can access it later.
def init():
    global model
    global inputs_dc, prediction_dc
    # The AZUREML_MODEL_DIR environment variable indicates
    # a directory containing the model file you registered.
    model_filename = 'model.pkl'
    model_path = os.path.join(os.environ['AZUREML_MODEL_DIR'], model_filename)
    model = joblib.load(model_path)
    inputs_dc = ModelDataCollector("sample-model", designation="inputs", feature_names=["feat1", "feat2", "feat3", "feat4"])
    prediction_dc = ModelDataCollector("sample-model", designation="predictions", feature_names=["prediction"])

# The run() method is called each time a request is made to the scoring API.
#
# Shown here are the optional input_schema and output_schema decorators
# from the inference-schema pip package. Using these decorators on your
# run() method parses and validates the incoming payload against
# the example input you provide here. This will also generate a Swagger
# API document for your web service.
@input_schema('data', NumpyParameterType(np.array([[0.1, 1.2, 2.3, 3.4]])))
@output_schema(StandardPythonParameterType({'predict': [['Iris-virginica']]}))
def run(data):
    # Use the model object loaded by init().
    result = model.predict(data)
    inputs_dc.collect(data) # this call saves our input data into Azure Blob
    prediction_dc.collect(result) # this call saves our prediction data into Azure Blob

    # You can return any JSON-serializable object.
    return { 'predict': result.tolist() }

Apart from the scoring script, we also need to specify an Environment. This contains all the required pip or conda packages, that the `score.py` script needs to run properly. Those dependencies will be installed during the docker build step later.

In [None]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

environment = Environment(name='ch1-service-environment')
environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[
    'azureml-defaults',
    'joblib',
    'numpy',
    'scikit-learn',
    'inference-schema',
    'inference-schema[numpy-support]',
    'azureml-monitoring'
])
environment.register(workspace=ws)

To bundle the scoring script and the Environment, we now create an inference config:

**TASK**: Fill in the missing values to create the inference config

In [None]:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script=SCORE_FILE_NAME,
                                   source_directory=SCORE_FOLDER_NAME,
                                   description='SVC classification for iris dataset.',
                                   environment=environment)

And lastly, we package the model. This builds the Docker image and pushes it into our private container registry.

**TASK**: Fill in the missing values to package the model

Hint: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model

In [None]:
package = Model.package(ws, [model], inference_config)
package.wait_for_creation(show_output=True)

In [None]:
acr = package.get_container_registry()
print("Address:", acr.address)

Pull the package output to the local machine. This can only be used with a Docker image package. Docker must be running on the machine.

In [None]:
package.pull()

## 14. Deployment with inference_config

**TASK**: Fill in the missing values to deploy the model with the inference config from above

In [None]:
dt = datetime.datetime.now().strftime("d%H%M%S")

service_custom = Model.deploy(workspace=ws,
                       name='ch1-service-custom-' + dt,
                       models=[model],
                       inference_config=inference_config)
service_custom.wait_for_deployment(show_output=True)

In [None]:
# If deployment fails, then retry with:
#service_custom.update(models=[model], inference_config=inference_config)

In [None]:
service_custom.get_logs()

In [None]:
service_custom.serialize()

In [None]:
import json

# Two sets of data to score, so we get two results back.
data = {'data':
        [
            [ 1,2,3,4 ],
            [ 10,9,8,7 ]
        ]
        }
# Convert to JSON string.
input_data = json.dumps(data)

# Make the request and display the response.
resp = service_custom.run(input_data)
print(resp)

In [None]:
#service_custom.update(enable_app_insights=True)

## 15. Cleanup

To clean up, we now delete the two web service endpoints we created. Skip this step for now if you still want to do the bonus challenges below!

In [None]:
service_no_code.delete()

In [None]:
service_custom.delete()

## 16. Bonus
In case you still have time left, here are a few more optional things you can try to implement in the notebook above:

- Create a new blob storage account in the Azure Portal. Then, register that new account as a new datastore in your AzureML workspace. [Hint](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data)
- In the web service test (the part where you call the web services with sample data using `service.run(input_data)`), try to use a standard HTTP request instead to call the service. Hint: Use the python `requests` package and the URL of the web service (`service.scoring_uri`).
- Find the cell above with the line `service_custom.update(enable_app_insights=True)`. Find out what it will do. Once you have executed it, look at the outputs it creates in the Azure Application Insights service. [Hint](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-enable-app-insights)
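As a starting point for the second bonus item, the sketch below builds the HTTP request by hand. It uses the standard library's `urllib.request` so that it runs without extra packages; with the `requests` package the equivalent call would be `requests.post(uri, data=body, headers=headers)`. The URL is a placeholder (in the notebook you would pass `service.scoring_uri`), the helper name `build_scoring_request` is made up for this example, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows):
    """Build a POST request carrying the rows as a JSON payload."""
    body = json.dumps({'data': rows}).encode('utf-8')
    headers = {'Content-Type': 'application/json'}
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method='POST')


# Placeholder URL (assumption); replace with service.scoring_uri in the notebook
request = build_scoring_request('http://localhost:5001/score',
                                [[1, 2, 3, 4], [10, 9, 8, 7]])
print(request.get_method(), request.full_url)
print(request.data.decode('utf-8'))

# To actually call a deployed service you would send the request, for example:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read()))
```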