# Deploy as Web Service

There are various ways to put a model in production. Here we'll see how to make it available as a web service hosted in the cloud, leveraging **Microsoft Azure Machine Learning Services** and **Azure Container Instances (ACI)**. 

The following code is based on the official Microsoft Azure Machine Learning documentation tutorial:  
https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-deploy-models-with-aml

## Prepare Model for Production

In [1]:
from fastai.text import *

In [2]:
DATA_PATH = Path('../datasets/20news')
DATA_PATH.mkdir(exist_ok=True)

In [3]:
bs = 32

In [4]:
drop_mult = 0.5

In [5]:
data_clas = TextClasDataBunch.load(DATA_PATH, 'tmp_clas', bs=bs)

In [6]:
learn = text_classifier_learner(data_clas, drop_mult=drop_mult)
learn.load('final')

RNNLearner(data=TextClasDataBunch;

Train: LabelList
y: CategoryList (11314 items)
[Category alt.atheism, Category alt.atheism, Category alt.atheism, Category alt.atheism, Category alt.atheism]...
Path: .
x: TextList (11314 items)
[Text xxbos xxmaj from : mathew < mathew@mantis.co.uk > 
 xxmaj subject : xxmaj alt . xxmaj atheism xxup faq : xxmaj atheist xxmaj resources 
 xxmaj summary : xxmaj books , addresses , music -- anything related to atheism 
 xxmaj keywords : xxup faq , atheism , books , music , fiction , addresses , contacts 
 xxmaj expires : xxmaj thu , 29 xxmaj apr 1993 xxunk xxup gmt 
 xxmaj distribution : world 
 xxmaj organization : xxmaj mantis xxmaj consultants , xxmaj cambridge . xxup uk . 
 xxmaj supersedes : < xxunk > 
 xxmaj lines : 290 

 xxmaj archive - name : atheism / resources 
 xxmaj alt - atheism - archive - name : resources 
 xxmaj last - modified : 11 xxmaj december 1992 
 xxmaj version : 1.0 

  xxmaj atheist xxmaj resources 

  xxmaj addresses of xxmaj at

The following parameters must match the ones used to train the model.  
*In this specific example* most of them are fastai default values, so we can take them from the fastai library source code, specifically from the internal implementation of the `fastai.text` learner and its related modules.

In [7]:
drop_mult=0.5
dps = default_dropout['classifier'] * drop_mult
bptt=70
emb_sz=400
nh=1150
nl=3
pad_token=1
qrnn=False
max_len=70*20
lin_ftrs = [50]
ps = [0.1]
vocab_size = len(data_clas.vocab.itos)
n_class = data_clas.c
layers = [emb_sz*3] + lin_ftrs + [n_class]
ps = [dps[4]] + ps
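
To make the derived values concrete, the arithmetic in the cell above can be reproduced without fastai. This is only a sketch: the dropout list is assumed to match `default_dropout['classifier']` in fastai 1.0.x (check your installed version), and `n_class = 20` is assumed from the 20 Newsgroups dataset rather than read from `data_clas.c`.

```python
# Assumed values: fastai 1.0.x default_dropout['classifier'] and 20 classes
assumed_classifier_dropout = [0.4, 0.5, 0.05, 0.3, 0.4]
drop_mult = 0.5
emb_sz = 400
bptt = 70
lin_ftrs = [50]
n_class = 20

# Every default dropout is scaled by drop_mult
dps = [p * drop_mult for p in assumed_classifier_dropout]

# The classifier head receives three concatenated vectors of size emb_sz
# (last hidden state, max pooling, average pooling), hence emb_sz * 3
layers = [emb_sz * 3] + lin_ftrs + [n_class]

# First head dropout is the last scaled default; the second is fixed at 0.1
ps = [dps[4]] + [0.1]

# Maximum sequence length kept by the classifier
max_len = bptt * 20

print(layers)   # [1200, 50, 20]
print(ps)       # [0.2, 0.1]
print(max_len)  # 1400
```

If the model was trained with a different `drop_mult` or a different head, these numbers change and the saved parameters must change with them.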

In [8]:
# Save the weights together with everything needed to rebuild the model at inference time
torch.save(
    { "model": learn.model.state_dict(), 
      "model_params": {
          "drop_mult": drop_mult,
          "dps": dps,
          "bptt": bptt,
          "emb_sz": emb_sz,
          "nh": nh,
          "nl": nl,
          "pad_token": pad_token,
          "qrnn": qrnn,
          "max_len": max_len,
          "lin_ftrs": lin_ftrs,
          "ps": ps,
          "vocab_size": vocab_size,
          "n_class": n_class,
          "layers": layers},
      "vocab": data_clas.vocab.itos,
      "classes": data_clas.classes
    }, DATA_PATH/'models'/'final_for_prod.pth')

In fastai v1.0, there is a built-in way to perform a similar production export for supported learners.

In [None]:
learn.export()

In [None]:
learn = load_learner(path)

In [None]:
pred_class, pred_idx, outputs = learn.predict("text to predict")

## Scoring script

In [9]:
%%writefile ./score_cmd.py
import collections
import sys

from fastai.text import *
from html.parser import HTMLParser

class HTMLTextExtractor(HTMLParser):
    def __init__(self):
        super(HTMLTextExtractor, self).__init__()
        self.result = [ ]

    def handle_data(self, d):
        self.result.append(d)

    def get_text(self):
        return ''.join(self.result)
    
    def error(self, message):
        return

def html_to_text(html):
    s = HTMLTextExtractor()    
    try:
        s.feed(html)
        return s.get_text()
    except Exception:
        return html

def custom_tagstrip(x:str) -> str:
    "Remove all html tags in `x`."
    return html_to_text(x)

def load_model(classifier_filename):
    """Load the classifier and related metadata"""
    
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    
    state = torch.load(Path(classifier_filename).open('rb'), map_location=device)
    
    if set(state.keys()) == {'model', 'model_params', 'vocab', 'classes'}:
        model_state = state['model']
        model_params = state['model_params']
        itos = state['vocab']
        classes = state['classes']
    else:
        raise RuntimeError("Invalid model provided.")
            
    # Turn it into a string to int mapping (which is what we need)
    stoi = collections.defaultdict(lambda:0, {str(v):int(k) for k,v in enumerate(itos)})
    
    # Get model reference from parameters (even if they are not used at runtime)
    model = get_rnn_classifier(bptt=model_params['bptt'],
                               max_seq=model_params['max_len'],
                               vocab_sz=model_params['vocab_size'], 
                               emb_sz=model_params['emb_sz'],
                               n_hid=model_params['nh'],
                               n_layers=model_params['nl'],
                               pad_token=model_params['pad_token'],
                               layers=model_params['layers'],
                               drops=model_params['ps'],
                               input_p=model_params['dps'][0],
                               weight_p=model_params['dps'][1],
                               embed_p=model_params['dps'][2],
                               hidden_p=model_params['dps'][3],
                               qrnn=model_params['qrnn'])

    # Load the trained classifier
    model.load_state_dict(model_state)
    
    # Put the classifier into evaluation mode
    model.reset()
    model.eval()

    return stoi, classes, model

def predict_text(stoi, model, lang, text):
    """Do the actual prediction on the text using the model and mapping files passed"""

    # Predictions are done on arrays of input.
    # We only have a single input, so turn it into a 1x1 array
    texts = [text]

    # Tokenize using the fastai wrapper around spaCy
    pre_rules = [custom_tagstrip] + defaults.text_pre_rules
    tokens = Tokenizer(lang=lang, pre_rules=pre_rules, n_cpus=1).process_all(texts)

    # Turn into integers for each word
    encoded = np.array([[stoi[o] for o in p] for p in tokens], dtype=np.int64)
    
    # Turn this array into a tensor
    data = torch.from_numpy(encoded)

    # Do the predictions
    predictions = model(data)
    
    # Get class probability from classifier predictions
    res = F.softmax(predictions[0], -1).detach().cpu().numpy()
    
    return res[0]

def init():
    global stoi
    global classes
    global model
    
    # Local path to the model file saved in the previous section
    model_path = "../datasets/20news/models/final_for_prod.pth"
    stoi, classes, model = load_model(model_path)

def run(raw_data):
    deser_obj = raw_data
    lang = deser_obj['lang']
    text = deser_obj['text']
    
    # Make prediction  
    scores = predict_text(stoi, model, lang, text)
    pred_class = np.argmax(scores)
    
    print(f"Class: {classes[pred_class]} ({scores[pred_class]})")
    
    # You can return any data type as long as it is JSON-serializable
    # We have to cast numpy data types (non-serializable) to standard types
    return { "label": classes[pred_class], "label_index": int(pred_class), "label_score": float(scores[pred_class]), "all_scores": scores.tolist() }

if __name__ == '__main__':
    init()
    run({"lang": sys.argv[1], "text": sys.argv[2]})

Writing ./score_cmd.py
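
The `stoi` mapping built in `load_model` relies on `collections.defaultdict`, so tokens missing from the vocabulary fall back to index 0. A minimal sketch with a made-up five-token vocabulary (the real `itos` comes from `data_clas.vocab.itos`, and index 0 is assumed to be the unknown token):

```python
import collections

# Hypothetical toy vocabulary; index 0 is assumed to be the unknown token
itos = ["xxunk", "xxpad", "xxbos", "hello", "world"]

# Invert it: token -> index, defaulting to 0 for anything unseen
stoi = collections.defaultdict(
    lambda: 0, {str(tok): idx for idx, tok in enumerate(itos)}
)

tokens = ["xxbos", "hello", "azure", "world"]
encoded = [stoi[tok] for tok in tokens]
print(encoded)  # [2, 3, 0, 4]
```

The out-of-vocabulary token `azure` is encoded as 0 instead of raising a `KeyError`, which is what lets the service score arbitrary input text.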


We can test it by launching from the command line:

`python score_cmd.py en "Example text to classify"`
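
The dictionary returned by `run` is built from the softmax of the classifier's raw outputs. The sketch below reproduces that post-processing with the standard library only; the logits and the three class names are invented for illustration and do not come from the trained model.

```python
import json
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical classes and raw model outputs
classes = ["alt.atheism", "comp.graphics", "sci.space"]
logits = [0.5, 2.0, -1.0]

scores = softmax(logits)
# Index of the highest probability (plain-Python equivalent of np.argmax)
pred_class = max(range(len(scores)), key=scores.__getitem__)

response = {
    "label": classes[pred_class],
    "label_index": pred_class,
    "label_score": scores[pred_class],
    "all_scores": scores,
}
print(json.dumps(response))
```

Plain Python floats and ints serialize directly, which is why the scoring script casts the NumPy values with `int(...)`, `float(...)` and `.tolist()` before returning them.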

## Setup Azure ML Workspace

In [10]:
import azureml
from azureml.core import Workspace, Run
from azureml.core.model import Model
from azureml.core.image import ContainerImage
from azureml.core.conda_dependencies import CondaDependencies 
from azureml.core.webservice import Webservice
from azureml.core.webservice import AciWebservice

In [11]:
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.0.17


You can set up your workspace directly from the Azure Portal, or by running the code below.

In [12]:
azure_subscription_id = ''
azure_resource_group  = 'ps-fastai-rg'
azure_mlworkspace_name  = 'ps-fastai'

In [13]:
# Create Azure Machine Learning Workspace
ws = Workspace.create(name=azure_mlworkspace_name,
                      subscription_id=azure_subscription_id, 
                      resource_group=azure_resource_group,
                      create_resource_group=True,
                      location='westeurope' # Or other supported Azure region   
                     )

# Save the configuration file
ws.write_config()

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.


Performing interactive authentication. Please follow the instructions on the terminal.


Note, we have launched a browser for you to login. For old experience with device code, use "az login --use-device-code"
You have logged in. Now let us find all the subscriptions to which you have access...


Interactive authentication successfully completed.




Deploying KeyVault with name psfastaikeyvaultvkxfwkqb.
Deploying ContainerRegistry with name psfastaiacrfevijamb.
Deployed ContainerRegistry with name psfastaiacrfevijamb.
Deploying AppInsights with name psfastaiinsightsepvuqiau.
Deployed AppInsights with name psfastaiinsightsepvuqiau.
Deploying StorageAccount with name psfastaistoragedwmtfiys.
Deployed KeyVault with name psfastaikeyvaultvkxfwkqb.
Deployed StorageAccount with name psfastaistoragedwmtfiys.
Deploying Workspace with name ps-fastai.
Deployed Workspace with name ps-fastai.
Wrote the config file config.json to: D:\Users\Gianni\Projects\ps-fastai\nbs\aml_config\config.json


In [None]:
ws = Workspace.from_config()

If you created the Workspace from the Azure Portal, you can get a reference to it by running the following cell:

In [None]:
try:
    ws = Workspace(subscription_id = azure_subscription_id,
                   resource_group = azure_resource_group,
                   workspace_name = azure_mlworkspace_name)
    ws.write_config()
    print('Library configuration succeeded')
except Exception as e:
    # Report the underlying error instead of silently hiding it
    print('Workspace not found:', e)

In [None]:
ws.get_details()

### Connect to Workspace

In [14]:
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep = '\t')

Found the config file in: D:\Users\Gianni\Projects\ps-fastai\nbs\aml_config\config.json
ps-fastai	westeurope	ps-fastai-rg


### Register the Model

In [15]:
model_path = '../datasets/20news/models/final_for_prod.pth'
model_name = "ps-fastai-nlp-classification"

In [16]:
model = Model.register(model_path = model_path,
                       model_name = model_name,
                       tags = {"key": "0.1"},
                       description = "Pluralsight Fast.AI NLP Classification Model",
                       workspace = ws)

Registering model ps-fastai-nlp-classification


### Retrieve the Model

In [None]:
model = Model.list(ws, name=model_name)[0]

## Create Scoring Script for AML Services

In [17]:
%%writefile ./score.py
import collections
import json

from fastai.text import *
from azureml.core.model import Model
from html.parser import HTMLParser

class HTMLTextExtractor(HTMLParser):
    def __init__(self):
        super(HTMLTextExtractor, self).__init__()
        self.result = [ ]

    def handle_data(self, d):
        self.result.append(d)

    def get_text(self):
        return ''.join(self.result)
    
    def error(self, message):
        return

def html_to_text(html):
    s = HTMLTextExtractor()    
    try:
        s.feed(html)
        return s.get_text()
    except Exception:
        return html

def custom_tagstrip(x:str) -> str:
    "Remove all html tags in `x`."
    return html_to_text(x)

def load_model(classifier_filename):
    """Load the classifier and related metadata"""
    
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    
    if torch.cuda.is_available():
        print('USING CUDA-GPU')
    else:
        print('USING CPU')
    
    state = torch.load(Path(classifier_filename).open('rb'), map_location=device)
    
    if set(state.keys()) == {'model', 'model_params', 'vocab', 'classes'}:
        model_state = state['model']
        model_params = state['model_params']
        itos = state['vocab']
        classes = state['classes']
    else:
        raise RuntimeError("Invalid model provided.")
        
    # Turn it into a string to int mapping (which is what we need)
    stoi = collections.defaultdict(lambda:0, {str(v):int(k) for k,v in enumerate(itos)})
    
    # Get model reference from parameters (even if they are not used at runtime)
    model = get_rnn_classifier(bptt=model_params['bptt'],
                               max_seq=model_params['max_len'],
                               vocab_sz=model_params['vocab_size'], 
                               emb_sz=model_params['emb_sz'],
                               n_hid=model_params['nh'],
                               n_layers=model_params['nl'],
                               pad_token=model_params['pad_token'],
                               layers=model_params['layers'],
                               drops=model_params['ps'],
                               input_p=model_params['dps'][0],
                               weight_p=model_params['dps'][1],
                               embed_p=model_params['dps'][2],
                               hidden_p=model_params['dps'][3],
                               qrnn=model_params['qrnn'])

    # Load the trained classifier
    model.load_state_dict(model_state)
    
    # Put the classifier into evaluation mode
    model.reset()
    model.eval()

    return stoi, classes, model

def predict_text(stoi, model, lang, text):
    """Do the actual prediction on the text using the model and mapping files passed"""

    # Predictions are done on arrays of input.
    # We only have a single input, so turn it into a 1x1 array
    texts = [text]

    # Tokenize using the fastai wrapper around spaCy
    pre_rules = [custom_tagstrip] + defaults.text_pre_rules
    tokens = Tokenizer(lang=lang, pre_rules=pre_rules, n_cpus=1).process_all(texts)

    # Turn into integers for each word
    encoded = np.array([[stoi[o] for o in p] for p in tokens], dtype=np.int64)
    
    # Turn this array into a tensor
    data = torch.from_numpy(encoded)

    # Do the predictions
    predictions = model(data)
    
    # Get class probability from classifier predictions
    res = F.softmax(predictions[0], -1).detach().cpu().numpy()
    
    return res[0]

def init():
    global stoi
    global classes
    global model
    
    # Retrieve the path to the model file using the model name
    model_path = Model.get_model_path(model_name='ps-fastai-nlp-classification')
    stoi, classes, model = load_model(model_path)

def run(raw_data):
    deser_obj = json.loads(raw_data)
    
    if not set(deser_obj.keys()) == {'lang', 'text' }:
        return { "error": "invalid data" }
    
    lang = deser_obj['lang']
    text = deser_obj['text']
    
    # Make prediction  
    scores = predict_text(stoi, model, lang, text)
    pred_class = np.argmax(scores)
    
    # You can return any data type as long as it is JSON-serializable
    # We have to cast numpy data types (non-serializable) to standard types
    return { "label": classes[pred_class], "label_index": int(pred_class), "label_score": float(scores[pred_class]), "all_scores": scores.tolist() }

Writing ./score.py
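
Because the service receives the request body as a string, `run` first parses it and rejects anything that does not contain exactly the `lang` and `text` keys. A standalone sketch of that check follows; the helper name `parse_request` is hypothetical and not part of `score.py`, and the extra `isinstance` guard is an addition that also covers payloads that are valid JSON but not objects.

```python
import json

def parse_request(raw_data):
    """Return (lang, text), or None when the payload is not the expected object."""
    deser_obj = json.loads(raw_data)
    if not isinstance(deser_obj, dict) or set(deser_obj) != {"lang", "text"}:
        return None
    return deser_obj["lang"], deser_obj["text"]

print(parse_request('{"lang": "en", "text": "Example text to classify"}'))
print(parse_request('{"text": "missing the language"}'))
```

The check is strict on purpose: an extra key is rejected just like a missing one, so clients get an explicit error instead of a silently ignored field.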


## Create Environment Files

In [18]:
myenv = CondaDependencies()
myenv.set_python_version("3.6.6")
myenv.add_pip_package("torch==1.0.0")
#myenv.add_pip_package("https://download.pytorch.org/whl/cu100/torch-1.0.0-cp36-cp36m-linux_x86_64.whl")
myenv.add_pip_package("torchvision==0.2.1")
myenv.add_pip_package("fastai==1.0.42")

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())

In [19]:
%%writefile ./Dockerfile
ARG buildtime_scoring_var=30000
ENV SCORING_TIMEOUT_MS=$buildtime_scoring_var
RUN apt-get -y update && apt-get install -y gcc

Writing ./Dockerfile


## Create an Image Configuration

For details, see: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.image.containerimage?view=azure-ml-py

In [20]:
image_config = ContainerImage.image_configuration(execution_script = "score.py",
                                                  runtime = "python",
                                                  conda_file = "myenv.yml",
                                                  docker_file="Dockerfile",
                                                  enable_gpu=False,
                                                  description = "Image with Fast.AI NLP classification model",
                                                  tags = {"data": "20newsgroups", "type": "classification"}
                                                 )

## Create the Image

In [21]:
%%time
image = ContainerImage.create(name = "myimage", 
                              models = [model],
                              image_config = image_config,
                              workspace = ws
                              )
image.wait_for_creation(show_output=True)

Creating image
Running...............................................................................
SucceededImage creation operation finished for image myimage:1, operation "Succeeded"
Wall time: 7min 40s


In [26]:
print(image.image_build_log_uri)

https://psfastaistoragedwmtfiys.blob.core.windows.net/azureml/ImageLogs/3c345850-aea3-43ac-a33e-24bfba43f887/build.log?sv=2017-04-17&sr=b&sig=zdFz3yRHKFYKPf0DdyKveMheGbvOIc4bxlSZ3YrWDAQ%3D&st=2019-03-03T22%3A55%3A23Z&se=2019-04-02T23%3A00%3A23Z&sp=rl


In [None]:
ws.images

In [None]:
print(ws.images['myimage:1'].image_build_log_uri)

## Deploy the image in ACI

In [22]:
aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 1, 
                                               tags = {"data": "20newsgroups", "type": "classification"}, 
                                               description = 'fastai NLP Classification')

In [None]:
image = ws.images["myimage"]

In [None]:
image = ContainerImage(ws, id="myimage:1")

In [23]:
print(image)

ContainerImage(workspace=<azureml.core.workspace.Workspace object at 0x000002009EE44550>, name=myimage, id=myimage:1, tags={'data': '20newsgroups', 'type': 'classification'}, properties={}, version=1)


In [24]:
%%time
service_name = 'aci-fastai-1'
service = Webservice.deploy_from_image(deployment_config = aciconfig,
                                            image = image,
                                            name = service_name,
                                            workspace = ws)
service.wait_for_deployment(show_output = True)
print(service.state)

Creating service
Running..................................
SucceededACI service creation operation finished, operation "Succeeded"
Healthy
Wall time: 3min 4s


In [25]:
print(service.scoring_uri)

http://40.119.155.183:80/score
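
Once the service is healthy, any HTTP client can call the scoring URI. The sketch below builds the request with the standard library only. It assumes the endpoint accepts a JSON body sent with `Content-Type: application/json`, and the helper name `build_scoring_request` is hypothetical. Replace the URI with the value of `service.scoring_uri` for your own deployment.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, lang, text):
    """Build a POST request carrying the payload expected by score.py."""
    body = json.dumps({"lang": lang, "text": text}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_scoring_request(
    "http://40.119.155.183:80/score", "en", "Example text to classify"
)

# Uncomment to actually call the deployed service:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     print(json.loads(resp.read().decode("utf-8")))
```

From inside the notebook, the SDK's `service.run(input_data=...)` with the same JSON string should be an equivalent way to send a test request.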


### Troubleshooting

In [27]:
log = service.get_logs()

In [28]:
log

'2019-03-03T23:04:27,593708017+00:00 - iot-server/run \n2019-03-03T23:04:27,593856821+00:00 - gunicorn/run \n2019-03-03T23:04:27,593309507+00:00 - rsyslog/run \n2019-03-03T23:04:27,593251005+00:00 - nginx/run \nok: run: gunicorn: (pid 13) 0s\nok: run: nginx: (pid 12) 0s\nok: run: rsyslog: (pid 14) 0s\nok: run: rsyslog: (pid 14) 0s\nok: run: rsyslog: (pid 14) 0s\nEdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...\n2019-03-03T23:04:28,243427425+00:00 - iot-server/finish 1 0\n2019-03-03T23:04:28,244947663+00:00 - Exit code 1 is normal. Not restarting iot-server.\n{"timestamp": "2019-03-03T23:04:28.922907Z", "message": "Starting gunicorn 19.6.0", "host": "wk-caas-b24fe58052c54fe9b89f241c88e4a7d2-12a73d597d9972c695213d", "path": "/opt/miniconda/lib/python3.6/site-packages/gunicorn/glogging.py", "tags": "%(module)s, %(asctime)s, %(levelname)s, %(message)s", "level": "INFO", "logger": "gunicorn.error", "msg": "Starting gunicorn %s", "stack_info": null}\n{"timestamp": "

In [29]:
log.rstrip().split('\n')

['2019-03-03T23:04:27,593708017+00:00 - iot-server/run ',
 '2019-03-03T23:04:27,593856821+00:00 - gunicorn/run ',
 '2019-03-03T23:04:27,593309507+00:00 - rsyslog/run ',
 '2019-03-03T23:04:27,593251005+00:00 - nginx/run ',
 'ok: run: gunicorn: (pid 13) 0s',
 'ok: run: nginx: (pid 12) 0s',
 'ok: run: rsyslog: (pid 14) 0s',
 'ok: run: rsyslog: (pid 14) 0s',
 'ok: run: rsyslog: (pid 14) 0s',
 'EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...',
 '2019-03-03T23:04:28,243427425+00:00 - iot-server/finish 1 0',
 '2019-03-03T23:04:28,244947663+00:00 - Exit code 1 is normal. Not restarting iot-server.',
 '{"timestamp": "2019-03-03T23:04:28.922907Z", "message": "Starting gunicorn 19.6.0", "host": "wk-caas-b24fe58052c54fe9b89f241c88e4a7d2-12a73d597d9972c695213d", "path": "/opt/miniconda/lib/python3.6/site-packages/gunicorn/glogging.py", "tags": "%(module)s, %(asctime)s, %(levelname)s, %(message)s", "level": "INFO", "logger": "gunicorn.error", "msg": "Starting gunicorn %s",
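
The log mixes plain-text lines from the process supervisor with JSON records emitted by gunicorn. A small helper can separate the two, which makes it easier to filter by level. This is a sketch using only the standard library; `split_log` is a hypothetical name, and the sample lines are shortened from the output above.

```python
import json

def split_log(log):
    """Split a raw service log into (plain_lines, json_records)."""
    plain, records = [], []
    for line in log.rstrip().split("\n"):
        try:
            parsed = json.loads(line)
        except ValueError:
            # Not JSON: keep it as a plain-text line
            plain.append(line)
            continue
        if isinstance(parsed, dict):
            records.append(parsed)
        else:
            plain.append(line)
    return plain, records

sample_log = (
    "ok: run: gunicorn: (pid 13) 0s\n"
    '{"level": "INFO", "message": "Starting gunicorn 19.6.0"}\n'
)
plain, records = split_log(sample_log)
print(plain)
print([r["message"] for r in records if r["level"] == "INFO"])
```

With a real deployment, pass the string returned by `service.get_logs()` instead of `sample_log`.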

In [30]:
print(image.image_location)

psfastaiacrfevijamb.azurecr.io/myimage:1


If the service logs are not enough, the logs of the underlying container can also be retrieved with the Azure CLI:

`az container logs --resource-group <resource-group> --name <containergroup> --container-name <container>`

For more details, see:  
https://docs.microsoft.com/en-us/azure/container-instances/container-instances-get-logs  
https://docs.microsoft.com/en-us/azure/container-instances/container-instances-troubleshooting