# Deploy as web service

Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in Azure Container Instances (ACI).

To build the correct environment for ACI, provide the following:

- A scoring script to show how to use the model
- An environment file to show what packages need to be installed
- A configuration file to build the ACI
- The model you trained before

In the previous module, we trained a machine learning model.

Now we're ready to deploy the model as a web service in the cloud, using Microsoft Azure Container Instances (ACI). A web service is packaged as an image, in this case a Docker image, that encapsulates the scoring logic and the model itself.

### Deployment workflow

The process of deploying a model is similar for all compute targets:

1. Train a model.
2. Register the model.
3. Create an image configuration.
4. Create the image.
5. Deploy the image to a compute target.
6. Test the deployment.


The following code is based on the official Microsoft Azure Machine Learning documentation tutorial, https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-deploy-models-with-aml

We first need to set up our development environment in Azure by following the steps listed in the official documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#workspace) and [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started).

In [None]:
import azureml
from azureml.core import Workspace, Run
from azureml.core.model import Model
from azureml.core.image import ContainerImage
from azureml.core.conda_dependencies import CondaDependencies 
from azureml.core.webservice import Webservice
from azureml.core.webservice import AciWebservice

In [None]:
# display the core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

You can set up your workspace either directly from the Azure Portal or by running the code below.

In [None]:
azure_subscription_id = 'YOUR-SUBSCRIPTION-ID'
azure_resource_group  = 'ps-fastai-rg'
azure_mlworkspace_name  = 'ps-fastai'

In [None]:
# Create Azure Machine Learning Workspace
ws = Workspace.create(name=azure_mlworkspace_name,
                      subscription_id=azure_subscription_id, 
                      resource_group=azure_resource_group,
                      create_resource_group=True,
                      location='westeurope' # Or other supported Azure region   
                     )

# Save the configuration file
ws.write_config()

If you created the Workspace from the Azure Portal, you can get a reference to it by running the following cell:

In [None]:
try:
    ws = Workspace(subscription_id=azure_subscription_id,
                   resource_group=azure_resource_group,
                   workspace_name=azure_mlworkspace_name)
    ws.write_config()
    print('Library configuration succeeded')
except Exception as e:
    # Avoid a bare except so KeyboardInterrupt and SystemExit still propagate
    print('Workspace not found:', e)

In [None]:
ws.get_details()

### Connect to workspace
Once you have the configuration file, the workspace can be loaded using the following code:

In [None]:
# load workspace configuration from the config.json file in the current folder.
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, ws.subscription_id, sep = '\t')
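
As a rough illustration of what loading from a configuration file involves, the sketch below writes and reads back a small JSON file holding the three identifiers used above. It assumes the file stores the keys `subscription_id`, `resource_group`, and `workspace_name`; the exact location and layout can differ between SDK versions, and both helper functions are hypothetical, not part of the Azure ML SDK.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}

def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Hypothetical helper: save the workspace identifiers as config.json in `folder`."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(folder) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path

def read_workspace_config(path):
    """Hypothetical helper: load the identifiers back and check that none are missing."""
    config = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"config.json is missing: {sorted(missing)}")
    return config

# Round-trip the values from the cells above through a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = write_workspace_config(tmp, "YOUR-SUBSCRIPTION-ID", "ps-fastai-rg", "ps-fastai")
    loaded_config = read_workspace_config(cfg_path)

print(loaded_config["workspace_name"])
```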

### Register the model

In [None]:
model_path = '../datasets/20news/models/final_for_prod.pth'
model_name = "ps-fastai-nlp-classification"

In [None]:
model = Model.register(model_path = model_path,
                       model_name = model_name,
                       tags = {"key": "0.1"},
                       description = "Pluralsight Fast.AI NLP Classification Model",
                       workspace = ws)

### Retrieve the model

In [None]:
model = Model.list(ws, name=model_name)[0]

### Create scoring script

Create the scoring script, called `score.py`. The web service calls this script to run the model.

The scoring script must define two functions:

The `init()` function, which typically loads the model into a global object. This function is run only once when the Docker container is started.

The `run(input_data)` function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.
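
Before looking at the full script, the `init()`/`run()` contract can be sketched in isolation. This is a minimal illustration, not the scoring script itself: the "model" here is a placeholder function that counts words, and the payload key `text` is an assumption chosen for the example.

```python
import json

model = None  # populated once by init()

def init():
    """Load the model into a global object (here a stand-in that counts words)."""
    global model
    model = lambda text: {"n_words": len(text.split())}  # placeholder, not a real model

def run(raw_data):
    """Deserialize the JSON request, call the model, return a JSON-serializable result."""
    try:
        payload = json.loads(raw_data)
    except json.JSONDecodeError:
        return {"error": "invalid JSON"}
    if not isinstance(payload, dict) or "text" not in payload:
        return {"error": "invalid data"}
    return model(payload["text"])

# The hosting container calls init() once at startup, then run() for each request
init()
result = run(json.dumps({"text": "hello from the web service"}))
print(result)
```

Keeping the model in a global means the expensive load happens once at container start rather than on every request.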

The following script is based on https://github.com/fastai/fastai/blob/master/courses/dl2/imdb_scripts/predict_with_classifier.py

In [None]:
%%writefile ./score.py
import json
from fastai.text import *
from azureml.core.model import Model
from html.parser import HTMLParser

class HTMLTextExtractor(HTMLParser):
    def __init__(self):
        super(HTMLTextExtractor, self).__init__()
        self.result = [ ]

    def handle_data(self, d):
        self.result.append(d)

    def get_text(self):
        return ''.join(self.result)
    
    def error(self, message):
        return

def html_to_text(html):
    s = HTMLTextExtractor()    
    try:
        s.feed(html)
        return s.get_text()
    except:
        return html

def custom_tagstrip(x:str) -> str:
    "Remove all html tags in `x`."
    return html_to_text(x)

def load_model(classifier_filename):
    """Load the classifier and related metadata"""
    
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    
    if torch.cuda.is_available():
        print('USING CUDA-GPU')
    else:
        print('USING CPU')
    
    state = torch.load(Path(classifier_filename).open('rb'), map_location=device)
    
    if set(state.keys()) == {'model', 'model_params', 'vocab', 'classes'}:
        model_state = state['model']
        model_params = state['model_params']
        itos = state['vocab']
        classes = state['classes']
    else:
        raise RuntimeError("Invalid model provided.")
        
    # Turn it into a string to int mapping (which is what we need)
    stoi = collections.defaultdict(lambda:0, {str(v):int(k) for k,v in enumerate(itos)})
    
    # Get model reference from parameters (even if they are not used at runtime)
    model = get_rnn_classifier(bptt=model_params['bptt'],
                               max_seq=model_params['max_len'],
                               #model_params['n_class'],#removed in 1.0.41
                               vocab_sz=model_params['vocab_size'], 
                               emb_sz=model_params['emb_sz'],
                               n_hid=model_params['nh'],
                               n_layers=model_params['nl'],
                               pad_token=model_params['pad_token'],
                               layers=model_params['layers'],
                               drops=model_params['ps'],
                               input_p=model_params['dps'][0],
                               weight_p=model_params['dps'][1],
                               embed_p=model_params['dps'][2],
                               hidden_p=model_params['dps'][3],
                               qrnn=model_params['qrnn'])

    # Load the trained classifier
    model.load_state_dict(model_state)
    
    # Put the classifier into evaluation mode
    model.reset()
    model.eval()

    return stoi, classes, model

def predict_text(stoi, model, lang, text):
    """Do the actual prediction on the text using the model and mapping files passed"""

    # Predictions are done on arrays of input.
    # We only have a single input, so turn it into a 1x1 array
    texts = [text]

    # Tokenize using the FastAI wrapper around spaCy
    pre_rules = [custom_tagstrip] + defaults.text_pre_rules
    tokens = Tokenizer(lang=lang, pre_rules=pre_rules, n_cpus=1).process_all(texts)

    # Turn into integers for each word
    encoded = np.array([[stoi[o] for o in p] for p in tokens], dtype=np.int64)
    
    # Turn this array into a tensor
    data = torch.from_numpy(encoded)

    # Do the predictions
    predictions = model(data)
    
    # Get class probability from classifier predictions
    res = F.softmax(predictions[0], -1).detach().cpu().numpy()
    
    return res[0]

def init():
    global stoi
    global classes
    global model
    
    # Retrieve the path to the model file using the model name
    model_path = Model.get_model_path(model_name='ps-fastai-nlp-classification')
    stoi, classes, model = load_model(model_path)

def run(raw_data):
    deser_obj = json.loads(raw_data)
    
    if not set(deser_obj.keys()) == {'lang', 'text' }:
        return { "error": "invalid data" }
    
    lang = deser_obj['lang']
    text = deser_obj['text']
    
    # Make prediction  
    scores = predict_text(stoi, model, lang, text)
    pred_class = np.argmax(scores)
    
    # You can return any data type as long as it is JSON-serializable
    # We have to cast numpy data types (non-serializable) to standard types
    # See: https://stackoverflow.com/questions/26646362/numpy-array-is-not-json-serializable
    return { "label": classes[pred_class], "label_index": int(pred_class), "label_score": float(scores[pred_class]), "all_scores": scores.tolist() }
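
The vocabulary lookup used by `load_model` and `predict_text` can be illustrated on its own. The sketch below uses a small made-up vocabulary (the real `itos` list comes from the saved model state) and shows how tokens missing from the vocabulary fall back to index 0.

```python
import collections

# Hypothetical vocabulary: the position in the list is the token id,
# and id 0 is assumed to be the "unknown" token
itos = ["xxunk", "xxpad", "the", "model", "works"]

# Invert the list into a token -> id mapping; missing tokens fall back to id 0
stoi = collections.defaultdict(lambda: 0, {str(tok): int(idx) for idx, tok in enumerate(itos)})

tokens = ["the", "model", "rocks"]  # "rocks" is not in the vocabulary
encoded = [stoi[t] for t in tokens]
print(encoded)
```

Because the mapping is a `defaultdict`, the scoring script never raises a `KeyError` on unseen words; they are all encoded as the unknown token.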

### Create environment file

Next, create an environment file, called `myenv.yml`, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs `pytorch`, `fastai` and `azureml-sdk`.

In [None]:
myenv = CondaDependencies()
myenv.set_python_version("3.6.6")
myenv.add_pip_package("torch==1.0.0")
#myenv.add_pip_package("https://download.pytorch.org/whl/cu100/torch-1.0.0-cp36-cp36m-linux_x86_64.whl")
myenv.add_pip_package("torchvision==0.2.1")
myenv.add_pip_package("fastai==1.0.42")

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())

To test the conda environment locally, run the following in Anaconda Prompt:

`> conda env create -f nbs\myenv.yml`

### Create an image configuration

Deployed models are packaged as an image. The image contains the dependencies needed to run the model.

For Azure Container Instances or Azure Kubernetes Service, the `azureml.core.image.ContainerImage` class is used to create an image configuration. The image configuration is then used to create a new Docker image.

For details, see: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.image.containerimage?view=azure-ml-py

In [None]:
image_config = ContainerImage.image_configuration(execution_script = "score.py",
                                                  runtime = "python",
                                                  conda_file = "myenv.yml",
                                                  docker_file="Dockerfile",
                                                  enable_gpu=True,
                                                  description = "Image with Fast.AI NLP classification model with GPU",
                                                  tags = {"data": "20newsgroup", "type": "classification"}
                                                 )

### Create the image

Once you have created the image configuration, you can use it to create an image. This image is stored in the container registry for your workspace.

Once created, you can deploy the same image to multiple services. Images are versioned automatically when you register multiple images with the same name. For example, the first image registered as `myimage` is assigned an ID of `myimage:1`. The next time you register an image as `myimage`, the ID of the new image is `myimage:2`.
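
The versioning behaviour described above can be mimicked with a toy registry. This is only an illustration of the naming scheme; the class below is hypothetical and is not how the Azure ML service is implemented.

```python
from collections import defaultdict

class ToyImageRegistry:
    """Toy stand-in for a registry that versions images by name (not the Azure API)."""

    def __init__(self):
        self._latest = defaultdict(int)  # image name -> last version handed out

    def register(self, name):
        """Return the id for a newly registered image, bumping the version per name."""
        self._latest[name] += 1
        return f"{name}:{self._latest[name]}"

registry = ToyImageRegistry()
first_id = registry.register("myimage")
second_id = registry.register("myimage")
print(first_id, second_id)
```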

In [None]:
%%time
# Create the image from the image configuration
image = ContainerImage.create(name = "myimage", 
                              models = [model], #this is the model object
                              image_config = image_config,
                              workspace = ws
                              )
image.wait_for_creation(show_output=True)

In case of errors, you can get logs with the following code:

In [None]:
# if you already have the image object handy
print(image.image_build_log_uri)

If you only know the name of the image:

In [None]:
ws.images

In [None]:
# if you only know the name of the image (note there might be multiple images with the same name but different version number)
print(ws.images['myimage'].image_build_log_uri)

### Deploy the image

The deployment process differs slightly depending on the compute target you deploy to. This walkthrough deploys to Azure Container Instances.

### Deploy in ACI

Estimated time to complete: about 7-8 minutes

Configure the image and deploy. The following code goes through these steps:

1. Build an image using:
   - The scoring file (score.py)
   - The environment file (myenv.yml)
   - The model file
2. Register that image under the workspace.
3. Send the image to the ACI container.
4. Start up a container in ACI using the image.
5. Get the web service HTTP endpoint.


In [None]:
aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 1, 
                                               tags = {"data": "20newsgroup", "type": "classification"}, 
                                               description = 'Fast.AI NLP Classification GPU')

In [None]:
image = ws.images["myimage"]

In [None]:
print(image)

In [None]:
%%time
service_name = 'aci-fastai-1'
service = Webservice.deploy_from_image(deployment_config = aciconfig,
                                       image = image,
                                       name = service_name,
                                       workspace = ws)
service.wait_for_deployment(show_output = True)
print(service.state)

In [None]:
service.get_logs()

In [None]:
print(image.image_location)

Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application.

In [None]:
print(service.scoring_uri)
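
A client can call the endpoint with any HTTP library. The sketch below uses only the Python standard library and builds a POST request with the `lang` and `text` fields that `score.py` expects. The URI is a placeholder, the helper name is hypothetical, and the request is built but not sent, so the cell runs without a deployed service.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, lang, text):
    """Build (but do not send) a POST request carrying the JSON payload score.py expects."""
    body = json.dumps({"lang": lang, "text": text}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder URI: substitute the value printed by service.scoring_uri
request = build_scoring_request("http://example.invalid/score", "en", "Some text to classify")
print(request.get_method(), request.full_url)

# To actually call the deployed service, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```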