# Sentiment Analysis 

This notebook demonstrates how to build a sentiment classifier using the fastAI NLP pipeline on the Azure ML service.

Let's import the required Azure ML packages and define the needed constants.

In [None]:
%load_ext autoreload
%autoreload 2

import os
import json

from azureml.core import (Workspace, 
                          Experiment,
                          RunConfiguration,
                          VERSION)

from azureml.core.compute import ComputeTarget,AmlCompute
from azureml.core.compute_target import ComputeTargetException

from azureml.core.conda_dependencies import CondaDependencies
from azureml.widgets import RunDetails
from azureml.train.dnn import PyTorch
from azureml.train.estimator import Estimator

from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import LocalWebservice, AciWebservice

print("SDK version:", VERSION)

In [None]:
SUBSCRIPTION_ID = ""
RESOURCE_GROUP = ""
WORKSPACE_NAME = ""

EXPERIMENT_NAME = "SentimentAnalysis"
CLUSTER_NAME = "gpucluster"

PROJECT_DIR = os.getcwd()
DATASET_DIR = os.path.join(PROJECT_DIR,'data')
TRAIN_DIR = os.path.join(PROJECT_DIR,'code','train')
INFERENCE_DIR = os.path.join(PROJECT_DIR,'code','score')
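
The later cells assume that the `data`, `code/train` and `code/score` directories already exist under the project directory. A minimal sketch of an early check follows; `missing_directories` is a hypothetical helper introduced here for illustration and is not part of the Azure ML SDK.

```python
import os


def missing_directories(*paths):
    """Return the paths that do not point to an existing directory."""
    return [p for p in paths if not os.path.isdir(p)]


# Layout assumed from the constants defined in the cell above.
project_dir = os.getcwd()
expected = [
    os.path.join(project_dir, "data"),
    os.path.join(project_dir, "code", "train"),
    os.path.join(project_dir, "code", "score"),
]

missing = missing_directories(*expected)
if missing:
    print("Missing directories:", missing)
else:
    print("All expected directories are present")
```

Running a check like this before uploading data or saving the conda file tends to surface path mistakes earlier than a failed remote run would.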

## Initialize Azure ML workspace

We initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object that connects to the Azure ML workspace and save its configuration locally.

In [None]:
ws = Workspace(subscription_id=SUBSCRIPTION_ID, 
               resource_group=RESOURCE_GROUP, 
               workspace_name=WORKSPACE_NAME
              )
    
ws.write_config()

## Upload dataset to datastore

To make data accessible for remote training, we'll upload the dataset to the [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data).

In [None]:
default_store = ws.datastores["workspaceblobstore"]

data_reference = default_store.upload(src_dir=DATASET_DIR,
                     target_path='sentiment_analysis', 
                     overwrite=True,
                     show_progress=True)
print(data_reference)

## Initialize Azure ML compute

Here we set the remote compute that will be used for training. If the cluster name provided is not already provisioned in the workspace, it will be created.

In [None]:
try:
    cluster = ComputeTarget(ws, CLUSTER_NAME)
    print(CLUSTER_NAME, "found")
    
except ComputeTargetException:
    print(CLUSTER_NAME, "not found, provisioning....")
    provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                                max_nodes=2)
    cluster = ComputeTarget.create(ws, CLUSTER_NAME, provisioning_config)

cluster.wait_for_completion(show_output=True)

## Initialize training estimator 

- We initialize an estimator for the PyTorch training job and configure the script parameters with the arguments the training script expects.
- We define the conda environment file with the fastAI library.

Note that we save the conda file twice, once for each directory (training and scoring).

In [None]:
cd = CondaDependencies()
#cd.add_pip_package('matplotlib')
#cd.add_channel(channel = 'pytorch')
#cd.add_channel(channel = 'fastai')
cd.add_pip_package('fastai')

cd.save_to_file(conda_file_path='env.yml',
                base_directory=TRAIN_DIR)
cd.save_to_file(conda_file_path='env.yml',
                base_directory=INFERENCE_DIR)

In [None]:
script_params = {'--input_dir': data_reference,
                 '--lm_lr': 5e-3,
                 '--clf_lr': 1e-5,
                 '--momentum_1': 0.9,
                 '--momentum_2': 0.7}

estimator = Estimator(source_directory=TRAIN_DIR,
                      script_params=script_params,
                      conda_dependencies_file='env.yml',
                      compute_target=cluster,
                      entry_script='train.py',
                      use_gpu=True)
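
The `train.py` script itself is not shown in this notebook. As a rough sketch, and assuming it reads its settings with `argparse`, the script parameters above could be consumed as follows. The argument names mirror the keys of `script_params`; the help texts, the defaults and the sample path are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments passed by the estimator."""
    parser = argparse.ArgumentParser(description="Train the sentiment classifier")
    parser.add_argument("--input_dir", type=str, required=True,
                        help="Directory where the dataset is available")
    parser.add_argument("--lm_lr", type=float, default=5e-3,
                        help="Learning rate for the language model")
    parser.add_argument("--clf_lr", type=float, default=1e-5,
                        help="Learning rate for the classifier")
    parser.add_argument("--momentum_1", type=float, default=0.9,
                        help="First momentum value")
    parser.add_argument("--momentum_2", type=float, default=0.7,
                        help="Second momentum value")
    return parser.parse_args(argv)


# Simulate the command line built from script_params (hypothetical path).
args = parse_args(["--input_dir", "/tmp/sentiment_analysis", "--lm_lr", "5e-3"])
print(args.input_dir, args.lm_lr, args.clf_lr, args.momentum_1, args.momentum_2)
```

Declaring the numeric arguments with `type=float` matters because every value arrives as a string on the command line.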

## Create experiment and submit run for execution

Now we are ready to start training the model.

In [None]:
experiment = Experiment(ws, name=EXPERIMENT_NAME)
run = experiment.submit(estimator)
RunDetails(run).show()

## Download & register model to workspace

In the training script, we save the model to the built-in *outputs* folder, which Azure ML automatically uploads to the run.

Here we download the model and register it in the workspace.

In [None]:
model_path = os.path.join('outputs', 'classifier.pth')
run.download_file(model_path, output_file_path=model_path)
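
For context, the training-side half of this step could look like the sketch below. It only illustrates the folder convention: `save_to_outputs` is a hypothetical helper, and the bytes written are a placeholder rather than real model weights (the actual script presumably uses the fastAI or PyTorch save routines).

```python
import os
import tempfile


def save_to_outputs(payload: bytes, filename: str = "classifier.pth",
                    base_dir: str = ".") -> str:
    """Write payload to <base_dir>/outputs/<filename> and return the path."""
    outputs_dir = os.path.join(base_dir, "outputs")
    # Create the folder if the run has not created it yet.
    os.makedirs(outputs_dir, exist_ok=True)
    path = os.path.join(outputs_dir, filename)
    with open(path, "wb") as f:
        f.write(payload)
    return path


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_to_outputs(b"placeholder weights", base_dir=tmp)
    print("Saved:", os.path.basename(saved), "exists:", os.path.isfile(saved))
```

Because the file name matches the one used in `model_path` above, the download cell can find it under the same relative path.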

In [None]:
model = Model.register(workspace=ws,
                       model_name='sa_classifier', 
                       model_path=model_path,
                       description = "Sentiment analysis classifier")
print(model.name, model.version, sep = '\t')

## Deploy Web service for inference

Now we are ready to operationalize the model. Azure ML builds a Docker image that contains the score.py file to serve predictions and the conda environment file for the package dependencies, then deploys the web service endpoint to Azure Container Instances.

For more information on operationalization in Azure ML, see https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where

In [None]:
inference_config = InferenceConfig(runtime="python",
                                   entry_script=os.path.join(INFERENCE_DIR, "score.py"),
                                   conda_file=os.path.join(INFERENCE_DIR, "env.yml"))

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(workspace=ws, 
                       name="sentiment-analysis-image", 
                       models=[model], 
                       inference_config=inference_config, 
                       deployment_config=deployment_config
                      )

service.wait_for_deployment(True)
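
The `score.py` entry script is not shown in this notebook. Azure ML entry scripts expose an `init()` function, called once when the container starts, and a `run()` function, called for every request. The sketch below follows that shape but swaps the real classifier for a keyword-matching placeholder so that it can run anywhere. The placeholder predictor and the response format are assumptions, not the author's implementation.

```python
import json

model = None  # set by init()


def init():
    """Called once at container start; a real script would load the registered model here."""
    global model
    # Placeholder predictor (assumption): simple keyword matching instead of
    # the trained fastAI classifier.
    positive_words = {"awesome", "great", "good", "love"}

    def predict(text):
        words = set(text.lower().replace(",", " ").replace("!", " ").split())
        return "positive" if words & positive_words else "negative"

    model = predict


def run(raw_data):
    """Called per request with the JSON payload as a string."""
    try:
        texts = json.loads(raw_data)["data"]
        return {"predictions": [model(text) for text in texts]}
    except Exception as exc:
        # Return the error in the response instead of crashing the service.
        return {"error": str(exc)}


init()
sample = json.dumps({"data": ["That was an awesome experience, I will watch it again!"]})
print(run(sample))
```

Catching exceptions inside `run()` keeps a malformed request from bringing down the service and lets the caller see what went wrong.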

## Test deployed web service

In [None]:
import json

test_sample = json.dumps({'data': ["That was an awesome experience, I will watch it again!"]})
service.run(test_sample)
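
The deployed endpoint can also be called over plain HTTP, without the SDK. The sketch below only builds the request and does not send it. The URL is a hypothetical placeholder; in this notebook the real address would come from `service.scoring_uri`. `build_scoring_request` is a helper introduced here for illustration.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, texts):
    """Build a JSON POST request carrying the same payload shape as above."""
    body = json.dumps({"data": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical URL; replace it with service.scoring_uri for a real call.
request = build_scoring_request(
    "http://localhost:8890/score",
    ["That was an awesome experience, I will watch it again!"],
)
print(request.get_method(), request.full_url)

# To actually send it (requires a reachable endpoint):
# response = urllib.request.urlopen(request).read()
```

This is the form a client application outside the notebook would typically use, since it depends only on the endpoint address.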