<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Local AutoML Model with ACI Deployment for Predicting Sentence Similarity

This notebook demonstrates how to use Azure AutoML locally to automate machine learning model selection and tuning, and how to deploy the resulting model with Azure Container Instances (ACI). We use the STS Benchmark dataset to predict sentence similarity and rely on AutoML's text preprocessing features.

## Table of Contents
1. [Introduction](#1.-Introduction)  
    * 1.1 [What is Azure AutoML?](#1.1-What-is-Azure-AutoML?)  
    * 1.2 [Modeling Problem](#1.2-Modeling-Problem)  
    
    
2. [Data Preparation](#2.-Data-Preparation)  


3. [Create AutoML Run](#3.-Create-AutoML-Run)    
    * 3.1 [Link to or create a Workspace](#3.1-Link-to-or-create-a-Workspace)  
    * 3.2 [Create AutoMLConfig object](#3.2-Create-AutoMLConfig-object)
    * 3.3 [Run Experiment](#3.3-Run-Experiment)
    
    
4. [Deploy Sentence Similarity Model](#4.-Deploy-Sentence-Similarity-Model)  
    * 4.1 [Retrieve the Best Model](#4.1-Retrieve-the-Best-Model)  
    * 4.2 [Register the Fitted Model for Deployment](#4.2-Register-the-Fitted-Model-for-Deployment)  
    * 4.3 [Create Scoring Script](#4.3-Create-Scoring-Script)  
    * 4.4 [Create a YAML File for the Environment](#4.4-Create-a-YAML-File-for-the-Environment)  
    * 4.5 [Create a Container Image](#4.5-Create-a-Container-Image)  
    * 4.6 [Deploy the Image as a Web Service to Azure Container Instance](#4.6-Deploy-the-Image-as-a-Web-Service-to-Azure-Container-Instance)  
    * 4.7 [Test Deployed Model](#4.7-Test-Deployed-Model)  

# 1. Introduction

### 1.1 What is Azure AutoML?

Automated machine learning (AutoML) is a capability of Microsoft's Azure Machine Learning service. The goal of AutoML is to "improve the productivity of data scientists and democratize AI" [1] by allowing for the rapid development and deployment of machine learning models. To achieve this goal, AutoML automates the process of selecting and tuning an ML model. The user only needs to provide a dataset (suitable for a classification, regression, or time-series forecasting problem) and a metric to optimize when choosing the model and hyperparameters. The user can also set time and cost constraints on model selection and tuning.

[1] https://azure.microsoft.com/en-us/blog/new-automated-machine-learning-capabilities-in-azure-machine-learning-service/

![](https://nlpbp.blob.core.windows.net/images/automl.PNG)

The AutoML model selection and tuning process can be tracked through the Azure portal or directly in Python notebooks using widgets. AutoML selects a high-quality machine learning model tailored to your prediction problem. In this notebook, we walk through the steps of preparing data, setting up an AutoML experiment, and evaluating the results of our best model. More information about running AutoML experiments in Python can be found [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train). 

### 1.2 Modeling Problem

The regression problem we will demonstrate is predicting sentence similarity scores on the STS Benchmark dataset. The [STS Benchmark dataset](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark#STS_benchmark_dataset_and_companion_dataset) contains a selection of English datasets that were used in Semantic Textual Similarity (STS) tasks 2012-2017. The dataset contains 8,628 sentence pairs, each with a human-annotated similarity score ranging from 0 (no meaning overlap) to 5 (meaning equivalence). The sentence pairs will be embedded using AutoML's built-in preprocessing, so we'll pass the sentences directly into the model.

In [1]:
# Set the environment path to find NLP
import sys
sys.path.append("../../")
import time
import os
import pandas as pd
import shutil
import numpy as np
import torch
from scipy.stats import pearsonr
from scipy.spatial import distance
from sklearn.externals import joblib
import json

# Import utils
from utils_nlp.azureml import azureml_utils
from utils_nlp.dataset import stsbenchmark
from utils_nlp.dataset.preprocess import (
    to_lowercase,
    to_spacy_tokens,
    rm_spacy_stopwords,
)
from utils_nlp.common.timer import Timer

# Tensorflow dependencies for Google Universal Sentence Encoder
import tensorflow as tf
import tensorflow_hub as hub
tf.logging.set_verbosity(tf.logging.ERROR) # reduce logging output

# AzureML packages
import azureml as aml
import logging
from azureml.telemetry import set_diagnostics_collection
set_diagnostics_collection(send_diagnostics=True)
from azureml.train.automl import AutoMLConfig
from azureml.core.experiment import Experiment
from azureml.widgets import RunDetails
from azureml.train.automl.run import AutoMLRun
from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.image import ContainerImage
from azureml.core.conda_dependencies import CondaDependencies

print("System version: {}".format(sys.version))
print("Azure ML SDK Version:", aml.core.VERSION)
print("Pandas version: {}".format(pd.__version__))
print("Tensorflow Version:", tf.VERSION)



Turning diagnostics collection on. 
System version: 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Azure ML SDK Version: 1.0.43
Pandas version: 0.23.4
Tensorflow Version: 1.13.1


In [2]:
BASE_DATA_PATH = '../../data'

# 2. Data Preparation

## STS Benchmark Dataset

As described above, the STS Benchmark dataset contains 8.6K sentence pairs, each with a human-annotated score for how similar the two sentences are. We will load the training, development (validation), and test sets provided by STS Benchmark and preprocess the data (lowercase the text, drop irrelevant columns, and rename the remaining columns) using the utils contained in this repo. Each dataset will ultimately have three columns: _sentence1_ and _sentence2_, which contain the text of the two sentences in the pair, and _score_, which contains the human-annotated similarity score of the pair.
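
As a rough sketch of what these preprocessing steps amount to, the snippet below applies them to a tiny hand-made dataframe with pandas. The raw column names (`genre`, `raw_score`, `s1`, `s2`) and the helper `clean_and_lowercase` are hypothetical stand-ins chosen for illustration; the notebook itself relies on the repo's `stsbenchmark.clean_sts` and `to_lowercase` utilities, whose internals may differ.

```python
import pandas as pd

# Hypothetical raw layout: the real STS Benchmark files use different column names.
raw = pd.DataFrame(
    {
        "genre": ["main-captions", "main-news"],
        "raw_score": [5.0, 2.6],
        "s1": ["A plane is taking off.", "Three men are playing chess."],
        "s2": ["An air plane is taking off.", "Two men are playing chess."],
    }
)


def clean_and_lowercase(df):
    """Drop metadata columns, rename the rest, and lowercase the sentence text."""
    out = df.drop(columns=["genre"]).rename(
        columns={"raw_score": "score", "s1": "sentence1", "s2": "sentence2"}
    )
    for col in ["sentence1", "sentence2"]:
        out[col] = out[col].str.lower()
    return out


clean = clean_and_lowercase(raw)
print(clean)
```

The result has exactly the three columns described above, with the score left untouched and both sentences lowercased.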

In [3]:
# Load in the raw datasets as pandas dataframes
train_raw = stsbenchmark.load_pandas_df(BASE_DATA_PATH, file_split="train")
dev_raw = stsbenchmark.load_pandas_df(BASE_DATA_PATH, file_split="dev")
test_raw = stsbenchmark.load_pandas_df(BASE_DATA_PATH, file_split="test")

100%|██████████████████████████████████████████████████████████████████████████████████| 401/401 [00:01<00:00, 258KB/s]


Data downloaded to ../../data\raw\stsbenchmark


100%|██████████████████████████████████████████████████████████████████████████████████| 401/401 [00:01<00:00, 294KB/s]


Data downloaded to ../../data\raw\stsbenchmark


100%|██████████████████████████████████████████████████████████████████████████████████| 401/401 [00:01<00:00, 252KB/s]


Data downloaded to ../../data\raw\stsbenchmark


In [4]:
# Clean each dataset by removing irrelevant columns
# and renaming the remaining columns
train_clean = stsbenchmark.clean_sts(train_raw)
dev_clean = stsbenchmark.clean_sts(dev_raw)
test_clean = stsbenchmark.clean_sts(test_raw)

In [5]:
# Convert all text to lowercase
train = to_lowercase(train_clean)
dev = to_lowercase(dev_clean)
test = to_lowercase(test_clean)

In [6]:
print("Training set has {} sentence pairs".format(len(train)))
print("Development set has {} sentence pairs".format(len(dev)))
print("Testing set has {} sentence pairs".format(len(test)))

Training set has 5749 sentence pairs
Development set has 1500 sentence pairs
Testing set has 1379 sentence pairs


In [7]:
train.head(5)

| | score | sentence1 | sentence2 |
|---|---|---|---|
| 0 | 5.0 | a plane is taking off. | an air plane is taking off. |
| 1 | 3.8 | a man is playing a large flute. | a man is playing a flute. |
| 2 | 3.8 | a man is spreading shreded cheese on a pizza. | a man is spreading shredded cheese on an uncoo... |
| 3 | 2.6 | three men are playing chess. | two men are playing chess. |
| 4 | 4.25 | a man is playing the cello. | a man seated is playing the cello. |


# 3. Create AutoML Run

AutoML can be used for classification, regression, or time-series forecasting experiments. Each experiment type has corresponding machine learning models and metrics that can be optimized (see [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train)); the options relevant to this notebook are described below. As a first step, we connect to an existing workspace or create one if it doesn't exist.

## 3.1 Link to or create a Workspace

In [8]:
ws = azureml_utils.get_or_create_workspace(
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
    workspace_region="<WORKSPACE_REGION>"
)

Performing interactive authentication. Please follow the instructions on the terminal.




Interactive authentication successfully completed.


In [None]:
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

## 3.2 Create AutoMLConfig object
Next, we specify the parameters for the AutoMLConfig class. 

**task**  
We use the regression task, for which AutoML supports the following base learners: Elastic Net, Light GBM, Gradient Boosting, Decision Tree, K-nearest Neighbors, LARS Lasso, Stochastic Gradient Descent, Random Forest, Extremely Randomized Trees, XGBoost, DNN Regressor, Linear Regression. AutoML also supports two kinds of ensemble methods: voting (a weighted average of the outputs of multiple base learners) and stacking (training a second "metalearner" that uses the base learners' predictions to predict the target variable). Specific base learners can be included or excluded through the `whitelist_models` and `blacklist_models` parameters of the AutoMLConfig class, and the voting and stacking ensembles can be toggled with `enable_voting_ensemble` and `enable_stack_ensemble`.

**preprocess**  
AutoML also has advanced preprocessing methods, eliminating the need for users to perform this manually. Data is automatically scaled and normalized but an additional parameter in the AutoMLConfig class enables the use of more advanced techniques including imputation, generating additional features, transformations, word embeddings, etc. (full list found [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-create-portal-experiments#preprocess)). Note that algorithm-specific preprocessing will be applied even if preprocess=False. 

**primary_metric**  
The following regression metrics are available: Spearman correlation (`spearman_correlation`), normalized RMSE (`normalized_root_mean_squared_error`), normalized MAE (`normalized_mean_absolute_error`), and R2 score (`r2_score`).

**Constraints:**  
The `cost_mode` parameter sets the cost prediction mode (see the options [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlconfig?view=azure-ml-py)). Several parameters constrain run time: `experiment_exit_score` (the experiment exits once this target score is reached), `experiment_timeout_minutes` (maximum amount of time for all iterations combined), and `iterations` (total number of different algorithm and parameter combinations to try).
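
The optional parameters named above are not used in the settings cell that follows. As a minimal sketch of how they could be combined with the basic settings, the plain dictionary below collects them without calling the SDK. The variable names, the model name strings (`"ElasticNet"`, `"LightGBM"`), and the numeric thresholds are illustrative assumptions; check the AutoMLConfig reference for the exact values accepted by your SDK version.

```python
import logging

# Illustrative values only: model names and thresholds are assumptions,
# not settings used elsewhere in this notebook.
optional_settings = {
    "whitelist_models": ["ElasticNet", "LightGBM"],  # only try these base learners
    "enable_voting_ensemble": True,   # allow a weighted-average ensemble
    "enable_stack_ensemble": False,   # skip the stacking metalearner
    "experiment_exit_score": 0.80,    # stop once the primary metric reaches this
    "experiment_timeout_minutes": 60, # cap on the total experiment time
}

base_settings = {
    "task": "regression",
    "primary_metric": "spearman_correlation",
    "iterations": 50,
    "preprocess": True,
    "verbosity": logging.ERROR,
}

# Later keys win, so optional settings could override base settings if they overlapped.
automl_settings_sketch = {**base_settings, **optional_settings}
for key, value in sorted(automl_settings_sketch.items()):
    print("{}: {}".format(key, value))
```

A dictionary like this would be unpacked into `AutoMLConfig` with `**`, in the same way the notebook passes `automl_settings`.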

In [9]:
automl_settings = {
    "task": 'regression',  # type of task: classification, regression or forecasting
    "debug_log": 'automated_ml_errors.log',
    "path": './automated-ml-regression',
    "iteration_timeout_minutes": 15,  # how long each iteration can take before moving on
    "iterations": 50,  # number of algorithm and parameter combinations to try
    "primary_metric": 'spearman_correlation',  # metric to optimize
    "preprocess": True,  # whether dataset preprocessing should be applied
    "verbosity": logging.ERROR,
}

In [10]:
X_train = train.drop("score", axis=1).values
y_train = train['score'].values.flatten()
X_validation = dev.drop("score", axis=1).values
y_validation = dev['score'].values.flatten()

# local compute
automated_ml_config = AutoMLConfig(
     X = X_train,
     y = y_train,
     X_valid = X_validation,
     y_valid = y_validation,
     **automl_settings)

## 3.3 Run Experiment

Run the experiment locally and inspect the results using a widget.

In [11]:
experiment = Experiment(ws, 'automated-ml-regression')
local_run = experiment.submit(automated_ml_config, show_output=True)

Running on local machine
Parent Run ID: AutoML_ad20c29f-7d03-4079-8699-3133d24d3631
Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************

 ITERATION   PIPELINE                                       DURATION      METRIC      BEST
         0 

The results of the completed run can be visualized in two ways. First, by using a RunDetails widget as shown in the cell below. Second, by accessing the [Azure portal](https://portal.azure.com), selecting your workspace, clicking on _Experiments_ and then selecting the name and run number of the experiment you want to inspect. Both these methods will show the results and duration for each iteration (algorithm tried), a visualization of the results, and information about the run including the compute target, primary metric, etc.

In [None]:
# Inspect the run details using the provided widget
RunDetails(local_run).show()

![](https://nlpbp.blob.core.windows.net/images/autoMLwidget.PNG)

# 4. Deploy Sentence Similarity Model

## 4.1 Retrieve the Best Model
Now we can identify the model that maximized performance on a given metric (Spearman correlation in our case) using the `get_output` method, which returns the best run and fitted model across all iterations. Overloads on `get_output` allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration. The object returned by AutoML is a Pipeline, which chains together multiple steps in a machine learning workflow in order to provide a "reproducible mechanism for building, evaluating, deploying, and running ML systems" (see [here](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb) for additional information about Pipelines). 

The different steps that make up the pipeline can be accessed through `fitted_model.named_steps`, and information about data preprocessing is available through `fitted_model.named_steps['datatransformer'].get_featurization_summary()`.
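
`named_steps` is the attribute that scikit-learn's `Pipeline` exposes for looking up steps by name. Assuming the fitted model behaves like such a pipeline, the sketch below shows the access pattern on a small stand-in. The step names `scaler` and `regressor` and the estimators are illustrative choices, not the steps AutoML produces.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in pipeline: step names and estimators are illustrative only.
toy_pipeline = Pipeline(
    [("scaler", StandardScaler()), ("regressor", LinearRegression())]
)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # y = 2x + 1
toy_pipeline.fit(X, y)

# named_steps maps each step name to its fitted estimator.
step_names = list(toy_pipeline.named_steps.keys())
print(step_names)
print(toy_pipeline.named_steps["regressor"].coef_)
```

Each entry in `named_steps` is the fitted object itself, so step-specific methods (such as `get_featurization_summary()` on the AutoML data transformer) are called on the looked-up step.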

In [13]:
best_run, fitted_model = local_run.get_output()

## 4.2 Register the Fitted Model for Deployment
If neither metric nor iteration is specified in the `register_model` call, the iteration with the best primary metric is registered.

In [14]:
description = 'AutoML Model'
tags = {'area': "nlp", 'type': "sentence similarity automl"}
name = 'automl'
model = local_run.register_model(description = description, tags = tags)

print(local_run.model_id) 

Registering model AutoMLad20c29f7best
AutoMLad20c29f7best


## 4.3 Create Scoring Script

In [15]:
%%writefile score.py
import pickle
import json
import numpy
import azureml.train.automl
from sklearn.externals import joblib
from azureml.core.model import Model


def init():
    global model
    model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)

def run(rawdata):
    try:
        data = json.loads(rawdata)['data']
        data = numpy.array(data)
        result = model.predict(data)
    except Exception as e:
        result = str(e)
        return json.dumps({"error": result})
    return json.dumps({"result":result.tolist()})

Overwriting score.py


In [16]:
# Substitute the actual model id in the script file.
script_file_name = 'score.py'

with open(script_file_name, 'r') as cefr:
    content = cefr.read()

with open(script_file_name, 'w') as cefw:
    cefw.write(content.replace('<<modelid>>', local_run.model_id))

## 4.4 Create a YAML File for the Environment

To ensure that the deployed model behaves consistently with training, the SDK dependency versions must match those of the environment that trained the model. The following cells create a file, automlenv.yml, that specifies the dependencies from the run.

In [17]:
experiment = Experiment(ws, 'automated-ml-regression')
ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)

In [18]:
best_iteration = int(best_run.id.split("_")[-1]) #get the appended iteration number for the best model
dependencies = ml_run.get_run_sdk_dependencies(iteration = best_iteration)

No issues found in the SDK package versions.
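
The iteration lookup above relies on the child run ID ending in an underscore followed by the iteration number. A minimal sketch of that parsing step, using a hypothetical helper `iteration_from_run_id` and a made-up iteration suffix appended to this notebook's parent run ID:

```python
def iteration_from_run_id(run_id):
    """Return the integer suffix after the last underscore of a child run ID."""
    return int(run_id.rsplit("_", 1)[-1])


# Hypothetical child run ID: parent run ID followed by "_<iteration>".
example_run_id = "AutoML_ad20c29f-7d03-4079-8699-3133d24d3631_12"
print(iteration_from_run_id(example_run_id))
```

If the ID does not end in a number, `int` raises a `ValueError`, which is a useful early signal that the run ID has a different shape than assumed.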


In [19]:
dependencies

{'azureml-train-automl': '1.0.43.1',
 'azureml-automl-core': '1.0.43',
 'azureml': '0.2.7',
 'azureml-widgets': '1.0.43.1',
 'azureml-train': '1.0.43',
 'azureml-train-restclients-hyperdrive': '1.0.43',
 'azureml-train-core': '1.0.43',
 'azureml-telemetry': '1.0.43',
 'azureml-sdk': '1.0.43',
 'azureml-pipeline': '1.0.43',
 'azureml-pipeline-steps': '1.0.43',
 'azureml-pipeline-core': '1.0.43',
 'azureml-dataprep': '1.1.5',
 'azureml-dataprep-native': '13.0.0',
 'azureml-core': '1.0.43.1',
 'azureml-contrib-brainwave': '1.0.33'}

Add the dependencies from the cell above to the YAML file. You must specify the version of "azureml-sdk[automl]" when creating the YAML file.
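
One way to avoid typing the version by hand is to derive the pin from the dependency dictionary. The sketch below does this for a subset of the values shown above; `sdk_pin` is a hypothetical helper written for this illustration, not part of the Azure ML SDK.

```python
# Subset of the dependency dictionary shown above.
sdk_dependencies = {
    "azureml-train-automl": "1.0.43.1",
    "azureml-sdk": "1.0.43",
    "azureml-core": "1.0.43.1",
}


def sdk_pin(dependencies, package="azureml-sdk", extra="automl"):
    """Build a pip requirement matching any sub-release of the recorded version."""
    # Keep the first three version components and wildcard the rest.
    base_version = ".".join(dependencies[package].split(".")[:3])
    return "{}[{}]=={}.*".format(package, extra, base_version)


print(sdk_pin(sdk_dependencies))
```

The resulting string has the same form as the `pip_packages` entry used in the next cell.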

In [20]:
myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],
                                 pip_packages=['azureml-sdk[automl]==1.0.43.*'], 
                                 python_version = '3.6.8')

conda_env_file_name = 'automlenv.yml'
myenv.save_to_file('.', conda_env_file_name)

'automlenv.yml'

## 4.5 Create a Container Image

In [21]:
image_config = ContainerImage.image_configuration(execution_script = script_file_name,
                                                  runtime = "python",
                                                  conda_file = conda_env_file_name,
                                                  description = "Image with automl model",
                                                  tags = {'area': "nlp", 'type': "sentencesimilarity automl"})

image = ContainerImage.create(name = "automl-image",
                              # this is the model object
                              models = [model],
                              image_config = image_config,
                              workspace = ws)

image.wait_for_creation(show_output = True)

Creating image
Running..................................................
Succeeded
Image creation operation finished for image automl-image:8, operation "Succeeded"


If the step above fails, use the command below to get the URI of the image build log.

In [None]:
print(image.image_build_log_uri) 

## 4.6 Deploy the Image as a Web Service to Azure Container Instance

In [22]:
#Set the web service configuration
aci_config = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 8)

In [23]:
# deploy image as web service
aci_service_name ='aci-automl-service'
aci_service = Webservice.deploy_from_image(workspace = ws, 
                                           name = aci_service_name,
                                           image = image,
                                           deployment_config = aci_config)

aci_service.wait_for_deployment(show_output = True)
print(aci_service.state)

Creating service
Running.......................
SucceededACI service creation operation finished, operation "Succeeded"
Healthy


Fetch the logs to debug in case of failure.

In [None]:
print(aci_service.get_logs())

## 4.7 Test Deployed Model
We test the web service by passing it data. The run method expects input in JSON format and retrieves API keys behind the scenes to make sure that the call is authenticated. 
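
To make the request and response format concrete, here is a minimal local sketch that mirrors the contract of score.py without a web service. `DummyModel` and `run_locally` are hypothetical stand-ins: the dummy scores a pair by word overlap and has nothing to do with the deployed AutoML model.

```python
import json


class DummyModel:
    """Hypothetical stand-in for the deployed model: scores a pair by word overlap."""

    def predict(self, pairs):
        scores = []
        for s1, s2 in pairs:
            a, b = set(s1.split()), set(s2.split())
            scores.append(5.0 * len(a & b) / max(len(a | b), 1))
        return scores


def run_locally(rawdata, model):
    """Parse {"data": [[s1, s2], ...]} and return {"result": [...]} or {"error": ...}."""
    try:
        data = json.loads(rawdata)["data"]
        return json.dumps({"result": list(model.predict(data))})
    except Exception as e:
        return json.dumps({"error": str(e)})


payload = json.dumps(
    {"data": [["a man is playing a flute.", "a man is playing a flute."]]}
)
response = json.loads(run_locally(payload, DummyModel()))
print(response)
```

The payload is a JSON object whose `data` key holds a list of sentence pairs, and the response carries either a `result` list with one score per pair or an `error` message.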

In [24]:
test_y = test['score'].values.flatten()
test_x = test.drop("score", axis=1).values.tolist()

data = {'data': test_x}
data = json.dumps(data)

In [25]:
# Set up a Timer to see how long the model takes to predict
t = Timer()

In [26]:
t.start()
score = aci_service.run(input_data = data)
t.stop()
print("Time elapsed: {}".format(t))

result = json.loads(score)
try:
    output = result["result"]
    print('Number of samples predicted: {0}'.format(len(output)))
except KeyError:
    # score.py returns an "error" key instead of "result" when prediction fails
    print(result['error'])

Time elapsed: 2.7085
Number of samples predicted: 1379


Finally, we'll calculate the Pearson correlation on the test set.

**What is Pearson Correlation?**

Our evaluation metric is Pearson correlation ($\rho$) which is a measure of the linear correlation between two variables. The formula for calculating Pearson correlation is as follows:  

$$\rho_{X,Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

This metric takes a value in [-1,1] where -1 represents a perfect negative correlation, 1 represents a perfect positive correlation, and 0 represents no correlation. We use the Pearson correlation metric because it is the metric that [SentEval](http://nlpprogress.com/english/semantic_textual_similarity.html), a widely used toolkit for evaluating sentence representations, uses for the STS Benchmark dataset.
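
As a small worked example of the formula, the sketch below computes the correlation directly from the definition using only the standard library. The function name `pearson` and the sample scores are made up for illustration; the notebook itself uses `scipy.stats.pearsonr`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation computed directly from the definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance: E[(X - mu_X)(Y - mu_Y)]
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Population standard deviations sigma_X and sigma_Y
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Made-up scores for illustration; these are not model outputs.
predicted = [4.8, 3.1, 3.9, 2.0, 4.4]
gold = [5.0, 3.8, 3.8, 2.6, 4.25]
print(round(pearson(predicted, gold), 4))
```

Because the same normalization by `n` appears in the covariance and in both standard deviations, it cancels out, so using sample rather than population statistics would give the same correlation.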

In [27]:
print(pearsonr(output, test_y)[0])

0.6038286237427414
