# Azure Machine Learning Service example
### Deploy and publish batch scoring pipeline

*This notebook shows you how to:*
- Create a batch scoring dummy experiment
- Publish the experiment as a pipeline
- Deploy the pipelie as a web service with HTTP endpoint (asynchronous API)
- Submit a job request via the HTTP endpoint with basic authentication

These is another similar [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-batch-scoring-classification) published in Microsoft documentation.  

Difference between the existing batch scoring example and this example:

- This version focus in demonstrating the code for the pipeline and uses dummy dataset and a dummy model instead of a real model.  
- It is much easier to register and configure the assets (e.g. Dataset, Compute) through the Azure Machine Learning Service web interface, so the assumption is that they have already been created and the pipeline code will use **get** instead of **register** to get references to those assets in your workspace.
- This pipeline uses *PythonScriptStep* (no mini-batching) instead of *ParallelRunConfig* and *ParallelRunStep*

## Setup workspace
- Workspace created in Azure Machine Learning service - [howto](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace?tabs=python)
- Updated the cell below with your workspace details and run the cell to connect to workspace

In [1]:
import json
import os
config_file = '_config.json'

subscription_id='xxxxx' # Subscription ID of the workspace
resource_group='xxxxx' # Resource group of the workspace
workspace_name = "xxxxxx" # Name of your workspace

if os.path.isfile(config_file):
    with open(config_file, 'r') as f:
        configs = json.load(f)
        subscription_id = configs['subscription_id']
        resource_group = configs['resource_group']
        workspace_name = configs['workspace_name']

from azureml.core import Workspace

ws = Workspace.get(name=workspace_name,
               subscription_id=subscription_id,
               resource_group=resource_group)


### Connect to Workspace

### Initialise
- Updated the asset names in the cell below (optional)
- Execute to initialise variables, create temp directory for storing files for upload

In [2]:
# Asset names 
model_dir = 'simple_model'
model_name = model_dir
model_file = 'model_v{}.json'
experiment_name = "pipeline_simple"
compute_name = "DS3-4c-14G"
environment_name = "test_minimal"
dataset_parent_path = "data"
dataset_name = experiment_name

# Temp local directory for storing created files that will be uploaded to AML for the pipeline
local_temp_dir = './_temp'

import os 
local_path_to_model = os.path.join(local_temp_dir, model_dir)
if not os.path.isdir(local_path_to_model):
    os.makedirs(local_path_to_model)
    
import azureml.core
print("SDK version:", azureml.core.VERSION)



SDK version: 1.20.0


### Register Dataset
1. Generate sample input dataset csv file by executing the next cell

In [3]:
%%writefile ./_temp/pipeline_simple_input.csv
id,gender,age
1,F,30
2,F,50
3,M,55
4,M,23

Overwriting ./_temp/pipeline_simple_input.csv


2. Use the Azure ML web interface to register the dataset
    - Load your AML workspace in the web browser
    - Click on 'Dataset' option
    - Use the register action to register this input csv as a dataset in your workspace, use the name 'pipeline_simple' as the dataset name

### Get reference to input dataset

In [4]:
from azureml.core.dataset import Dataset 
input_data = Dataset.get_by_name(ws, dataset_name)

### Create and register model 

In [5]:
import json

# Create model
class MyModel:
    
    def load_model(file_path):
        with open(file_path, 'r') as f:
            loaded_params = json.load(f)
            return MyModel(loaded_params)
    
    def __init__(self, params):
        self._params = params
    
    def predict(self, data):
        data['prediction'] = data.age < self._params['age_average']
        return data[['id', 'prediction']]

    

params = {'age_average':40}
local_model_path = os.path.join(local_path_to_model, model_file)

with open(local_model_path, 'w') as f:
    json.dump(params, f)
# Test model code
mymodel = MyModel.load_model(local_model_path)
print(str(mymodel._params))



{'age_average': 40}


### Get compute target

In [6]:
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.exceptions import ComputeTargetException

# checks to see if compute target already exists in workspace, else create it
try:
    compute_target = ComputeTarget(workspace=ws, name=compute_name)
except ComputeTargetException as e:
    config = AmlCompute.provisioning_configuration(vm_size="STANDARD_NC6",
                                                   vm_priority="lowpriority", 
                                                   min_nodes=0, 
                                                   max_nodes=1)

    compute_target = ComputeTarget.create(workspace=ws, name=compute_name, provisioning_configuration=config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

### Define environment
- Create requirements file with dependencies require for running your model.

In [7]:
%%writefile ./_temp/requirements.txt
pandas
numpy
azureml-core
azureml-dataset-runtime

Overwriting ./_temp/requirements.txt


In [9]:
from azureml.core import Environment
# Pipeline doesn't like using curated environment, create a new one 
env = Environment.from_pip_requirements(name = environment_name, file_path = os.path.join(local_temp_dir, 'requirements.txt'))

### Build & run pipeline
#### 1. Create scoring file
This is a python script for making prediction based on your model.  This [section](https://docs.microsoft.com/en-au/azure/machine-learning/how-to-deploy-existing-model#entry-script-scorepy) in Azure documentation explains how this file works.  Note how this scoring script uses azureml.core modules, hence we need the azureml-core azure-datatime-runtime included as dependencies when defining the environment previously

In [10]:
%%writefile ./_temp/batch_scoring.py

from azureml.core import Run
from azureml.core.model import Model
from azureml.core.dataset import Dataset

import os
from datetime import datetime
import argparse


import pandas as pd
import json


def init():
    global model
    
    class MyModel:

        def load_model(file_path):
            with open(file_path, 'r') as f:
                loaded_params = json.load(f)
                return MyModel(loaded_params)

        def __init__(self, params):
            self._params = params

        def predict(self, data):
            data['prediction'] = data.age < self._params['age_average']
            return data[['id', 'prediction']]
        
    # Read from parameters from argument, e.g. which model to use
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_name', dest="model_name", required=True)
    args, _ = parser.parse_known_args()

    
    model_dir = 'simple_model'
    model_name = args.model_name
    print(str(datetime.now()) + ': init()')
    # Get the path where the deployed model can be found.
    print('model_name=' + str(model_name))
    
    #model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), model_name)
    model_path = Model.get_model_path(model_dir) + '/' + model_name + '.json'
    print('model_path=' + model_path)
    print('found_model_file=' + str(os.path.isfile(model_path)))
    model = MyModel.load_model(model_path)
    print(str(datetime.now()) + ': Model loaded')


def run(data):
    print(str(datetime.now()) + ': run()')
    output = model.predict(data)
    print(str(datetime.now()) + ': run completed()')
    return output

def local_test():
    init()
    input_data = pd.read_csv(os.path.join('_temp/pipeline_simple_input.csv'))
    print(run(input_data))


Writing ./_temp/batch_scoring.py


#### 2. Define the pipeline
- Here, we are building a pipeline with only one step.  A pipeline can contain multiple steps that perform post-processing.

In [11]:
from azureml.core import Datastore
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline, PipelineParameter

def_data_store = Datastore.get_default(ws)

def parallel_run_step():
    from azureml.pipeline.steps import ParallelRunConfig
    from azureml.pipeline.core import PipelineData

    parallel_run_config = ParallelRunConfig(
        environment=env,
        entry_script=os.path.join(local_temp_dir, "batch_scoring.py"),
        source_directory=".",
        output_action="append_row",
        append_row_file_name="parallel_run_step.txt",
        compute_target=compute_target,
        error_threshold=1,
        node_count=1,
        process_count_per_node=2
    )

    from azureml.pipeline.steps import ParallelRunStep
    from datetime import datetime

    parallel_step_name = "batchscoring-" + datetime.now().strftime("%Y%m%d%H%M")
    output_dir = PipelineData(name=dataset_name, datastore=def_data_store)
    model_name_param = PipelineParameter(name="model_arg", default_value=model_name)

    batch_score_step = ParallelRunStep(
        name=parallel_step_name,
        inputs=[input_data.as_named_input("input")],
        output=output_dir,
        arguments=["--model_name", model_name_param],
        parallel_run_config=parallel_run_config,
        allow_reuse=False
    )
    return batch_score_step
    
batch_score_step = parallel_run_step()
pipeline = Pipeline(workspace=ws, steps=[batch_score_step])

ParallelRunStep requires azureml-dataset-runtime[fuse,pandas] for tabular dataset.
Please add relevant package in CondaDependencies.


#### 3. Submit the pipeline as an experiment 

In [None]:
import time
t0 = time.time()
pipeline_run = Experiment(ws, experiment_name).submit(pipeline)
print(pipeline_run)
pipeline_run.wait_for_completion(show_output=True)
print('Completed in {}'.format(time.time() - t0))

Created step batchscoring-202101151600 [7d1d18b8][6cf11253-38c3-4064-8892-1fc492853955], (This step will run and generate new outputs)
Submitted PipelineRun 0fc7480b-06ae-4643-a05f-d8e9ad0d18c7


### Publish pipeline as a REST endpoint

In [None]:
published_pipeline = pipeline_run.publish_pipeline(
    name=experiment_name, description="Batching scoring", version="1.0")

published_pipeline

### Start a job via REST endpoint

In [None]:
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()

In [None]:
import requests

rest_endpoint = published_pipeline.endpoint
response = requests.post(rest_endpoint, 
                         headers=auth_header, 
                         json={"ExperimentName": experiment_name,
                               "ParameterAssignments": {"model_name": 6}})
try:
    response.raise_for_status()
except Exception:    
    raise Exception("Received bad response from the endpoint: {}\n"
                    "Response Code: {}\n"
                    "Headers: {}\n"
                    "Content: {}".format(rest_endpoint, response.status_code, response.headers, response.content))

run_id = response.json().get('Id')
print('Submitted pipeline run: ', run_id)