# Scoring Pipeline

In this notebook we create a pipeline for scoring the 12,000 models that we built in the Training Pipeline. We set up the pipeline for batch scoring and again utilize the [ParallelRunStep](https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-steps/azureml.contrib.pipeline.steps.parallel_run_step.parallelrunstep?view=azure-ml-py) to parallelize the process. 

Batch inference (or batch scoring) provides cost-effective inference, with unparalleled throughput for asynchronous applications. Batch prediction pipelines can scale to perform inference on terabytes of production data. Batch prediction is optimized for high throughput, fire-and-forget predictions for a large collection of data.

# Prerequisites 

This example runs on an Azure Machine Learning Notebook VM. We call models that have already been trained and registered to the Workspace. If you have already run the Environment Setup and Training Pipeline notebooks, or you have an AML Notebook VM set up with models registered to the Workspace, you are all set. 

## Set up the Workspace, Datastore, Experiment and Compute

As we did in the Training Pipeline notebook, we need to call the Workspace and set up an Experiment. We also want to create variables for the datastore and compute cluster. 

### Connect to the workspace

Create a workspace object. `Workspace.from_config()` reads the file `config.json` and loads the details into an object named `ws`. 
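`Workspace.from_config()` fails if `config.json` is missing or incomplete, so it can help to check the file first. The sketch below assumes the usual three keys of an Azure ML workspace config (`subscription_id`, `resource_group`, `workspace_name`). The helper `missing_config_keys` and the placeholder values are hypothetical and are not part of the SDK.

```python
import json
import os
import tempfile

# Keys a workspace config file is assumed to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(path):
    """Return the required keys that are absent from a workspace config file."""
    with open(path) as handle:
        config = json.load(handle)
    return [key for key in REQUIRED_KEYS if key not in config]


# Write a placeholder config to a temporary folder and check it.
example = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w") as handle:
    json.dump(example, handle, indent=2)

print(missing_config_keys(config_path))  # prints []
```

In practice you would point the helper at the `config.json` next to this notebook rather than at a temporary file.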

In [None]:
from azureml.core import Workspace 

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

#ws = Workspace(subscription_id="bbd86e7d-3602-4e6d-baa4-40ae2ad9303c", resource_group="ManyModelsSA", workspace_name="ManyModelsSAv1")
#ws.get_details()

### Create or Attach existing compute resource
By using Azure Machine Learning Compute, a managed service, data scientists can train and score machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as the environment that runs the scoring pipeline. The code below creates the compute cluster for you if it doesn't already exist in your workspace.

**Creation of compute takes approximately 5 minutes. If the AmlCompute with that name is already in your workspace the code will skip the creation process.**

In [None]:
# define the compute cluster and the data store
import os
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpu-cluster")
# environment variables are strings, so cast the node counts to int
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))

# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_D2_V2")


if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
    
    # can poll for a minimum number of nodes and for a specific timeout.
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    
    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())

# the rest of the notebook refers to the cluster as `compute`
compute = compute_target

### Call the Datastore containing the Orange Juice sales data
In the Generate Data notebook, we uploaded the CSV files for each Store and Brand combination. Use `ws.get_default_datastore()` to get the datastore we uploaded the files into. 

In [None]:
dstore = ws.get_default_datastore()

### Create a FileDataset


In [None]:
# This is the code to run on all 12,000 models 
#allfiledst = Dataset.get_by_name(ws, name='Allfiledatasets') 
#allfiledstinput = allfiledst.as_named_input('trainallmodels')

In [None]:
# This cell reads in 3 datasets 
from azureml.core import Dataset
from azureml.pipeline.core import Pipeline, PipelineData

dataset1 = Dataset.File.from_files(path = (dstore, '3modelsdata/Store2_dominicks.csv'))
dataset2 = Dataset.File.from_files(path = (dstore, '3modelsdata/Store5_tropicana.csv'))
dataset3 = Dataset.File.from_files(path = (dstore, '3modelsdata/Store8_minute.maid.csv'))

output_dir = PipelineData(name="3_models", 
                          datastore=dstore, 
                          output_path_on_compute="3models/")


## Using Registered Models to make batch predictions
To use the model to make batch predictions, you need an **entry script** and a list of **dependencies**:

#### An entry script
This script accepts requests, scores the requests by using the model, and returns the results.
- __init()__ - Typically this function loads the model into a global object. It runs only once per worker node/process, at the start of batch processing. The init method can make use of the following environment variable (ParallelRunStep input):
    1.	AZUREML_BI_OUTPUT_PATH – output folder path
- __run(mini_batch)__ - The method to be parallelized. Each invocation receives one mini-batch.<BR>
__mini_batch__: Batch inference invokes the run method and passes either a list or a Pandas DataFrame as the argument. Each entry in mini_batch is a file path if the input is a FileDataset, or a Pandas DataFrame if the input is a TabularDataset.<BR>
__run__ method response: The run() method should return a Pandas DataFrame or an array. For the append_row output_action, the returned elements are appended to the common output file. For summary_only, the contents of the elements are ignored. For all output actions, each returned element indicates one successful inference of an input element in the mini-batch.
    Make sure the inference result includes enough data to map each input to its inference. The output is written to the output file and is not guaranteed to be in order, so include a key in the output that maps it back to the input.
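
The contract above can be sketched as a minimal entry script. This is an illustration only, not the `score.py` used later in this notebook. It assumes a FileDataset input, so each mini-batch entry is a file path. The row-counting "inference", the `state` dictionary, and the `file,rows` result format are placeholders.

```python
import os

# Global state populated once per worker process by init().
state = {}


def init():
    # A real script would load the model here; this sketch only records a flag.
    state["ready"] = True


def run(mini_batch):
    # For a FileDataset input, mini_batch is a list of file paths.
    results = []
    for file_path in mini_batch:
        with open(file_path) as handle:
            row_count = sum(1 for _ in handle)  # placeholder for real inference
        # Include the file name as a key so each result maps back to its input.
        results.append(f"{os.path.basename(file_path)},{row_count}")
    # One returned element per successfully processed input.
    return results
```

Because every element carries the input file name, the results can be matched to their inputs even when the output file is not in order.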
    

#### Dependencies
Helper scripts or Python/Conda packages required to run the entry script or model.

The deployment configuration for the compute target that hosts the deployed model. This configuration describes things like memory and CPU requirements needed to run the model.

These items are encapsulated into an inference configuration and a deployment configuration. The inference configuration references the entry script and other dependencies. You define these configurations programmatically when you use the SDK to perform the deployment. You define them in JSON files when you use the CLI.


## Build and Run the batch inference pipeline
Now that the data, models, and compute resources are set up, we can put together a pipeline for scoring. 
### Set up the environment to run the script
Specify the conda dependencies for your script. This will allow us to install packages and configure the environment. 

In [3]:
from azureml.core import Environment
from azureml.core.runconfig import CondaDependencies, DEFAULT_CPU_IMAGE

# set up the batch environment settings
# 'scikit-learn' is the PyPI package name; the 'sklearn' alias is deprecated
batch_conda_deps = CondaDependencies.create(pip_packages=['scikit-learn','pmdarima'])

batch_env = Environment(name="manymodels_environment")
batch_env.python.conda_dependencies = batch_conda_deps
batch_env.docker.enabled = True
batch_env.docker.base_image = DEFAULT_CPU_IMAGE

### Create the configuration to wrap the inference script 
In the [ParallelRunConfig](https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-steps/azureml.contrib.pipeline.steps.parallelrunconfig?view=azure-ml-py), you will want to determine the number of workers and nodes appropriate for your use case. The _workercount_ is based on the number of cores on the VM. The _nodecount_ determines the number of nodes to use. Increasing the node count should help to speed up the process. 
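
To get a feel for how these settings interact, here is a rough back-of-the-envelope sketch. It assumes each worker process handles one mini-batch at a time and ignores scheduling and startup overhead. The function `estimate_waves` and the notion of a "wave" (one round in which every worker processes a mini-batch) are hypothetical helpers for illustration, not part of the SDK.

```python
import math


def estimate_waves(n_files, mini_batch_size, node_count, process_count_per_node):
    """Return (total_workers, n_mini_batches, n_waves) for a parallel run."""
    total_workers = node_count * process_count_per_node
    n_mini_batches = math.ceil(n_files / mini_batch_size)
    n_waves = math.ceil(n_mini_batches / total_workers)
    return total_workers, n_mini_batches, n_waves


# 12,000 files, 1 node, 3 workers per node, 1 file per mini-batch.
print(estimate_waves(n_files=12000, mini_batch_size=1, node_count=1, process_count_per_node=3))
# prints (3, 12000, 4000)

# Same workload on 4 nodes.
print(estimate_waves(n_files=12000, mini_batch_size=1, node_count=4, process_count_per_node=3))
# prints (12, 12000, 1000)
```

Multiplying the number of waves by the typical scoring time of one file gives a crude lower bound on the run time for a given cluster size.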

In [12]:
from azureml.contrib.pipeline.steps import ParallelRunStep, ParallelRunConfig 

workercount = 3
nodecount = 1
timeout = 3000

tags1 = {}
tags1['nodes'] = nodecount
tags1['workers-per-node'] = workercount
tags1['timeout'] = timeout 

parallel_run_config = ParallelRunConfig(
    source_directory = './scripts',
    entry_script = 'score.py',
    mini_batch_size = '1',
    run_invocation_timeout = timeout, 
    error_threshold = 10,
    output_action = 'summary_only', 
    environment = batch_env, 
    process_count_per_node = workercount, 
    compute_target = compute, 
    node_count = nodecount
)

### Create the ParallelRunStep
This is where we will call the entry script, environment configuration, and parameters. This [ParallelRunStep](https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-steps/azureml.contrib.pipeline.steps.parallel_run_step.parallelrunstep?view=azure-ml-py) is the main step in our pipeline. 

In [13]:
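# Note the inputs are set up for running 3 models currently. 
from azureml.core.model import Model

datasetname = 'store'

# Registered models attached to the step for logging.
# Assumption: each model was registered as 'arima_<csv file name without extension>',
# the name that score.py looks up; adjust if your training pipeline used other names.
model_list = [Model(ws, name=model_name) for model_name in
              ('arima_Store2_dominicks', 'arima_Store5_tropicana', 'arima_Store8_minute.maid')]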
output_dir = PipelineData(name = 'scoringOutput', 
                         datastore = dstore, 
                         output_path_on_compute = 'scoringOutput/')

parallelrun_step = ParallelRunStep(
    name="many-models-scoring",
    parallel_run_config=parallel_run_config,
    inputs=[dataset1.as_named_input(datasetname), dataset2.as_named_input(datasetname), dataset3.as_named_input(datasetname)],  
    output=output_dir,
    models= model_list, # this is just for logging
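    # score.py parses --n_test_set and --timestamp_column; also pass
    # '--timestamp_column' with the name of the date column in your CSV files
    arguments=['--n_test_set', 6],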
    allow_reuse = False
)

### Submit and Run the Pipeline
Create an Experiment to track the runs of the pipeline. Then, you can run the pipeline and review the output. 

In [2]:
# set up the experiment
from azureml.core import Experiment

experiment = Experiment(ws, 'scoring-pipeline-AP')

pipeline = Pipeline(workspace = ws, steps=[parallelrun_step])

run = experiment.submit(pipeline, tags=tags1)

## Review the Output from the Pipeline
This pipeline returns a dataframe with 8 weeks of predictions for each Store and Brand combination. You can view the results of that dataframe with the following code. The entry script also contains code to upload the CSV files individually to Blob storage as each process runs. 

In [None]:
import os 
import pandas as pd

prediction_run = next(run.get_children())
# the output name must match the PipelineData name passed to the ParallelRunStep
prediction_output = prediction_run.get_output_data("scoringOutput")
prediction_output

prediction_output.download(local_path="training_results")


for root, dirs, files in os.walk("training_results"):
    for file in files:
        if file.endswith('parallel_run_step.txt'):
            result_file = os.path.join(root,file)
            
df = pd.read_csv(result_file, delimiter=" ", header=None) 
df.head()

## Cleanup compute resources 
For recurring jobs, keeping the compute resources running may be beneficial. The compute nodes will scale down to 0 when not in use. For a single-run job, we want to clean up the compute resources. 

In [None]:
# uncomment below and run if compute resources are no longer needed 
# compute.delete()

## Scoring Script 

In [18]:
%%writefile ./scripts/score.py


import pandas as pd
import os
import uuid
import argparse
import datetime
import numpy as np
# sklearn.externals.joblib was removed from recent scikit-learn releases; use joblib directly
import joblib
from joblib import dump, load
import pmdarima as pm
import time
from datetime import timedelta
from sklearn.metrics import mean_squared_error, mean_absolute_error 
import pickle
import logging 

# Import the AzureML packages 
from azureml.core.model import Model
from azureml.core import Experiment, Workspace, Run
from azureml.core import ScriptRunConfig

# Import the helper script 
from entry_script_helper import EntryScriptHelper


# Get the information for the current Run
thisrun = Run.get_context()

# Set the log file name
LOG_NAME = "user_log"

# Parse the arguments passed in the PipelineStep through the arguments option 
parser = argparse.ArgumentParser("split")
parser.add_argument("--n_test_set", type=int, help="input number of predictions")
parser.add_argument("--timestamp_column", type=str, help="name of the timestamp column")

args, unknown = parser.parse_known_args()

print("Argument 1(n_test_set): %s" % args.n_test_set)
print("Argument 2(timestamp_column): %s" % args.timestamp_column)


def init():
    EntryScriptHelper().config(LOG_NAME)
    logger = logging.getLogger(LOG_NAME)
    output_folder = os.path.join(os.environ.get("AZ_BATCHAI_INPUT_AZUREML", ""), "temp/output")
    logger.info(f"{__file__}.output_folder:{output_folder}")
    logger.info("init()")    
    return

def run(data):
    print("begin run ")
    logger = logging.getLogger(LOG_NAME)
    os.makedirs('./outputs', exist_ok=True)
    
    predictions = pd.DataFrame()
    
    logger.info('making predictions...')
    
    for file in data: 
    #for idx, file in enumerate(data): # add the enumerate for the 12,000 files 
        u1 = uuid.uuid4()
        mname='arima'+str(u1)[0:16]

        with thisrun.child_run(name=mname) as childrun:
            for w in range(0,5):
                thisrun.log(mname,str(w))
            
            date1=datetime.datetime.now()
            logger.info('starting ('+file+') ' + str(date1))
            childrun.log(mname,'starttime-'+str(date1))
            
            # 0. Unpickle Model 
            # model name is 'arima_' + the CSV file name without its extension
            model_name = 'arima_' + os.path.splitext(os.path.basename(file))[0]
            print(model_name)
            model_path = Model.get_model_path(model_name)         
            model = joblib.load(model_path)
            
            # 1. Make Predictions 
            prediction_list, conf_int = model.predict(args.n_test_set, return_conf_int = True)
            print("MAKING PREDICTIONS")
            
             
            # 2. Split the data for test set 
            # use a separate name so the mini-batch argument `data` is not overwritten
            df = pd.read_csv(file,header=0)
            df = df.set_index(args.timestamp_column)             
            max_date = datetime.datetime.strptime(df.index.max(),'%Y-%m-%d')
            split_date = max_date - timedelta(days=7*args.n_test_set)
            df.index = pd.to_datetime(df.index)
            # copy so that adding a column does not modify a view of df
            test = df[df.index > split_date].copy()
                
            test['Predictions'] = prediction_list
            print(test.head())
            
            # 3. Calculating Accuracy Metrics            
            metrics = []
            mse = mean_squared_error(test['Quantity'], test['Predictions'])
            rmse = np.sqrt(mse)
            mae = mean_absolute_error(test['Quantity'], test['Predictions'])
            act, pred = np.array(test['Quantity']), np.array(test['Predictions'])
            mape = np.mean(np.abs((act - pred)/act)*100)

            metrics.append(mse)
            metrics.append(rmse)
            metrics.append(mae)
            metrics.append(mape)

            print(metrics)
            # add in a log for accuracy metrics 
            logger.info('accuracy metrics')
            logger.info(metrics)
            
            # 4. Save the output back to blob storage 
            ws1 = childrun.experiment.workspace
            output_path = os.path.join('./outputs/', model_name)
            test.to_csv(path_or_buf=output_path+'.csv', index = False)
            dstore = ws1.get_default_datastore()
            dstore.upload_files([output_path+'.csv'], target_path='oj_predictions', overwrite=False, show_progress=True)
            
            # 5. Append the predictions to return a dataframe if desired 
            # DataFrame.append was removed in pandas 2.0; pd.concat works across versions
            predictions = pd.concat([predictions, test])
            
            # 6. Return metrics for logging
            date2=datetime.datetime.now()
            logger.info('ending ('+str(file)+') ' + str(date2))

            childrun.log(mname,'endtime-'+str(date2))
            childrun.log(mname,'auc-1')
        
    return predictions

Overwriting ./scripts/score.py
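
Steps 2 and 3 of the scoring script (selecting the test window and computing accuracy metrics) can be tried in isolation on a toy series. The sketch below uses plain Python lists instead of pandas and scikit-learn so it runs anywhere. The function names and the made-up quantities are illustrative assumptions. Unlike the script, this version skips points whose actual value is zero when computing MAPE, since the percentage error is undefined there.

```python
import math
from datetime import datetime, timedelta


def split_test_window(rows, n_test_set):
    """Keep (date, quantity) rows dated after (latest date - n_test_set weeks)."""
    max_date = max(date for date, _ in rows)
    split_date = max_date - timedelta(days=7 * n_test_set)
    return [(date, qty) for date, qty in rows if date > split_date]


def accuracy_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equal-length lists."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    # MAPE is undefined when an actual value is zero, so skip those points.
    pct = [abs(e / a) * 100 for e, a in zip(errors, actual) if a != 0]
    mape = sum(pct) / len(pct) if pct else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape}


# Five weekly observations with made-up quantities.
start = datetime(2020, 1, 2)
rows = [(start + timedelta(weeks=i), qty) for i, qty in enumerate([100, 110, 120, 100, 200])]

# Hold out the last two weeks and compare them with two made-up predictions.
test = split_test_window(rows, n_test_set=2)
actual = [qty for _, qty in test]
print(actual)  # prints [100, 200]
print(accuracy_metrics(actual, predicted=[110, 180]))
```

For this toy input the errors are -10 and 20, so the MSE is 250, the MAE is 15, and the MAPE is 10 percent.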
