<center>
    <img src="https://cellstrat2.s3.amazonaws.com/PlatformAssets/bluewhitelogo.svg" alt="logo" width="200"/>
    <h1>⚡CellStrat Hub API</h1>
    <h2>🧰MLOps Hands-On Workshop🔧</h2>
    <h3>🚀Deploying a Zero-Shot Text Classifier Model📜</h3>
</center>

# Building the Model

In [1]:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

### Load Pretrained Model

In [3]:
# Load the pretrained model from the internet (by default it is cached locally)
tokenizer = AutoTokenizer.from_pretrained('valhalla/distilbart-mnli-12-3')
model = AutoModelForSequenceClassification.from_pretrained('valhalla/distilbart-mnli-12-3')

### Preprocess Inputs

In [5]:
text = """Last week I upgraded my iOS version and ever since then my phone has been overheating 
whenever I use your app.
"""

In [6]:
classes = ['mobile', 'website', 'billing', 'account access']

In [8]:
def preprocess(text, classes, hypothesis_template="This text is about {}"):
    '''Preprocesses a single input text to align with each class'''
    # create the hypotheses for each class
    hypotheses = [hypothesis_template.format(c) for c in classes]
    
    # preprocess the inputs
    inputs = tokenizer(
        [text] * len(classes), 
        hypotheses, 
        return_tensors='pt',
        truncation='only_first',
        padding=True
    )['input_ids']
    
    return inputs

In [9]:
# hypothesis_template is not defined at notebook level, so rely on the function's default
inputs = preprocess(text, classes)
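
To see what the tokenizer actually receives, the pairing step inside `preprocess` can be sketched in plain Python. This is only an illustration: `build_pairs` is a hypothetical helper name that is not part of the notebook, and the sample text is shortened.

```python
def build_pairs(text, classes, hypothesis_template="This text is about {}"):
    """Return one (premise, hypothesis) pair per candidate class."""
    # Fill the template once per class, exactly as preprocess() does
    hypotheses = [hypothesis_template.format(c) for c in classes]
    # The same input text is repeated so each hypothesis is judged against it
    return list(zip([text] * len(classes), hypotheses))


pairs = build_pairs("My phone overheats when I use your app.", ["mobile", "billing"])
for premise, hypothesis in pairs:
    print(f"{premise!r} -> {hypothesis!r}")
```

Each pair becomes one row of the batch, which is why the logits tensor below has one row per class.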

### Make Predictions

In [14]:
with torch.inference_mode():
    logits = model(inputs).logits
logits

tensor([[-0.8225,  1.8000, -0.6673],
        [ 0.5615,  2.5067, -2.6900],
        [ 1.1835,  2.4916, -3.3967],
        [-0.1443,  2.9384, -2.6310]])

In [20]:
def post_process(logits, classes):
    '''Post-processes the model output to get the entailment logits and get the class prediction'''
    # get the index of entailment
    idx = model.config.label2id['entailment']
    # apply softmax over the entailment logits
    probabilities = torch.softmax(logits[:, idx], dim=0).tolist()
    
    output = []
    for i, prob in enumerate(probabilities):
        output.append(
            (classes[i], round(prob, 4))
        )
    
    return output

In [21]:
post_process(logits, classes)

[('mobile', 0.7474),
 ('website', 0.0989),
 ('billing', 0.0488),
 ('account access', 0.1049)]
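
The post-processing step can be checked by hand without PyTorch. The sketch below assumes the entailment logits are the last column of the tensor printed above (index 2), which is consistent with the scores shown. The values are copied from that rounded printout, so the results match only to within rounding.

```python
import math

classes = ['mobile', 'website', 'billing', 'account access']
# Last column of the logits printed above (assumed to be the entailment column)
entailment_logits = [-0.6673, -2.6900, -3.3967, -2.6310]


def softmax(values):
    """Plain-Python softmax over a list of floats."""
    # Subtracting the maximum keeps the exponentials numerically stable
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax(entailment_logits)
best_class, best_prob = max(zip(classes, probabilities), key=lambda pair: pair[1])
print(best_class, round(best_prob, 3))
```

Because the softmax runs across classes rather than across the three NLI labels, the scores sum to 1 and behave like a single-label classification.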

### Save Model and Tokenizer for Offline Inference

Save a local copy of the model and tokenizer so they can be loaded during deployment without downloading them again.

In [None]:
model.save_pretrained('model_files')
tokenizer.save_pretrained('model_files')

# Deployment

### 1. Initialize Hub API Project
Open a terminal and run the following command:
```
hub init zero-shot-text-clf
```

![hub init](https://cellstrat-public.s3.amazonaws.com/workshop-files/hub-init-zstc.png)

Let's look at each of those files:
1. `Dockerfile` - Every Hub API deployment package is essentially a Docker image that, once built and deployed, contains the source code and the required libraries. It is set up automatically for almost all use cases, so you generally don't need to change anything. _If you are already familiar with Docker, you can modify the container and optimize it further._
2. `hub_config.json` - This contains basic project configuration such as the project name and version. This file is what defines a project as a Hub API project. You don't need to change anything here either.
3. `model/` - This folder is where all model weights and other large files go. Its contents are stored in separate network storage and are not part of the Docker image, which keeps the image as small as possible. The path to this folder is available in your source code through the `MODEL_DIR` environment variable.
4. `src/` - This is where all your source code goes. It already contains a few generated boilerplate files.
    - `main.py` - This is the main Python file, which gets executed whenever a request is made to your model. We will explore it more as we go further.
    - `requirements.txt` - This is where you list the libraries your model needs to run.
    - `utils.py` - This module contains utility methods for common operations, such as working with base64-encoded images. You can add your own utilities to this file.
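
As a minimal sketch of how source code might locate the `model/` folder, the snippet below reads `MODEL_DIR` and falls back to a local path. The fallback value `"model"` and the helper name `resolve_model_dir` are assumptions for local testing, not something the Hub API tooling provides.

```python
import os
from pathlib import Path


def resolve_model_dir(env=None, default="model"):
    """Return the model folder from MODEL_DIR, or a local fallback if it is unset."""
    # env is injectable so the function can be exercised without touching os.environ
    env = os.environ if env is None else env
    return Path(env.get("MODEL_DIR", default))


model_dir = resolve_model_dir()
print(f"Loading model files from: {model_dir}")
```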

### 2. Integration

#### i. Copy the files from `model_files/` to `zero-shot-text-clf/model/` folder in Hub API project
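
If you prefer to script this step instead of copying by hand, a sketch using only the standard library could look like the following. `copy_model_files` is a hypothetical helper, and the default paths assume the notebook and the project folder sit side by side.

```python
import shutil
from pathlib import Path


def copy_model_files(src="model_files", dst="zero-shot-text-clf/model"):
    """Copy everything written by save_pretrained into the project's model folder."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"Source folder not found: {src_path}")
    # dirs_exist_ok needs Python 3.8+; files with the same name are overwritten
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    # Return the copied names so the result is easy to inspect
    return sorted(p.name for p in dst_path.iterdir())
```

Calling `copy_model_files()` from the folder that contains both directories returns the list of file names now present in `zero-shot-text-clf/model/`.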

#### ii. Add your prediction code to the `zero-shot-text-clf/src/main.py` file
```python
import os
from hub import hub_handler
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = os.getenv("MODEL_DIR")
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)

def preprocess(text, classes, hypothesis_template="This text is about {}"):
    '''Preprocesses a single input text to align with each class'''
    # create the hypotheses for each class
    hypotheses = [hypothesis_template.format(c) for c in classes]
    
    # preprocess the inputs
    inputs = tokenizer(
        [text] * len(classes), 
        hypotheses, 
        return_tensors='pt',
        truncation='only_first',
        padding=True
    )['input_ids']
    
    return inputs

def post_process(logits, classes):
    '''Post-processes the model output to get the entailment logits and get the class prediction'''
    # get the index of entailment
    idx = model.config.label2id['entailment']
    # apply softmax over the entailment logits
    probabilities = torch.softmax(logits[:, idx], dim=0).tolist()
    
    output = []
    for i, prob in enumerate(probabilities):
        output.append(
            (classes[i], round(prob, 4))
        )
    
    return output

@hub_handler
def inference_handler(inputs, _):
    '''The main inference function which gets triggered when the API is invoked
    Args:
        inputs (dict): The payload the model receives, in the following format:
                {
                    "text": "The text to be classified here",
                    "classes": ["class 1", "class 2", "and so on..."]
                }
    Returns:
        list: One (class, probability) tuple per class, in the following format:
                [('class 1', 0.7474),
                 ('class 2', 0.1989),
                 ('class 3', 0.0488)]
    '''

    # Preprocess inputs
    model_inputs = preprocess(inputs['text'], inputs['classes'])
    
    # Make predictions
    with torch.inference_mode():
        logits = model(model_inputs).logits
        
    # Postprocess the predictions
    output = post_process(logits, inputs['classes'])

    return output

```
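
The handler above indexes `inputs['text']` and `inputs['classes']` directly, so a malformed payload would surface as a `KeyError`. One option is to validate the payload first. The sketch below is an assumption-laden illustration: `validate_payload` is a hypothetical helper, and how Hub API reports an exception raised inside the handler is not covered here.

```python
def validate_payload(inputs):
    """Return (text, classes) or raise ValueError with a readable message."""
    if not isinstance(inputs, dict):
        raise ValueError("Payload must be a JSON object")
    text = inputs.get("text")
    classes = inputs.get("classes")
    # The text must be a non-blank string
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' must be a non-empty string")
    # The classes must be a non-empty list containing only strings
    if (not isinstance(classes, list) or not classes
            or not all(isinstance(c, str) for c in classes)):
        raise ValueError("'classes' must be a non-empty list of strings")
    return text, classes


text, classes = validate_payload({"text": "My app crashes", "classes": ["mobile", "billing"]})
print(text, classes)
```

Inside `inference_handler`, the returned `text` and `classes` could then be passed to `preprocess` and `post_process` in place of the raw dictionary lookups.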

#### iii. Add the libraries in `zero-shot-text-clf/src/requirements.txt`
```
torch
transformers
```

### 3. Build and Deploy

In the terminal, change directory into the `zero-shot-text-clf` project folder and run the following commands:
```bash
hub build
hub deploy
```

### 4. Test the Deployed API

In [33]:
import json
import requests

# Paste your API key here
API_KEY = "YOUR API KEY"

# The API endpoint for your Hub API project of format https://api.cellstrathub.com/{USERNAME}/{API_NAME}
endpoint = "YOUR ENDPOINT"

headers = {
  "x-api-key": API_KEY,
  "Content-Type": "application/json"
}
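
The endpoint follows the `https://api.cellstrathub.com/{USERNAME}/{API_NAME}` format mentioned in the comment above, so it can also be assembled from its parts. In this sketch, `build_endpoint` is a hypothetical helper and `"your-username"` is a placeholder, not a real account.

```python
from urllib.parse import quote


def build_endpoint(username, api_name, base_url="https://api.cellstrathub.com"):
    """Assemble the endpoint URL from the username and the Hub API project name."""
    # quote() escapes characters that are not safe inside a URL path segment
    return f"{base_url}/{quote(username, safe='')}/{quote(api_name, safe='')}"


example_endpoint = build_endpoint("your-username", "zero-shot-text-clf")
print(example_endpoint)
```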

We will start by making a `GET` request to load the model into memory. This request takes at least 20 seconds, a fixed delay that gives the model enough time to load asynchronously. You only need to run it once per session.

In [26]:
requests.get(endpoint, headers=headers).json()

{'statusCode': 200,
 'headers': {'Content-Type': 'application/json',
  'Access-Control-Allow-Origin': '*'},
 'body': 'Model Loaded in Memory'}

In [29]:
text = """Tesla's autonomous driving capability has inspired hair-raising antics on the road. 
Now the company is deploying an algorithm to determine whether customers have shown sufficiently 
sound judgement to use its “Full Self-Driving” software. What's new: Starting this week, the beta-test 
version of Tesla's latest self-driving update will be available only to drivers who have 
demonstrated safe driving. The beta program previously was open to about 2,000 drivers.
"""

In [30]:
classes = ['technology', 'finance', 'sports', 'business']

Now we will make a `POST` request for inference where we send our inputs in the body and then get the response back with the predictions.

In [32]:
payload = {
    'text': text,
    'classes': classes
}

# Send the POST request
response = requests.post(endpoint, headers=headers, data=json.dumps(payload)).json()

if response.get('statusCode') == 200:
    # Parse the output
    output = response['body']['output']
else:
    output = response

output

[['technology', 0.4171],
 ['finance', 0.1195],
 ['sports', 0.1811],
 ['business', 0.2823]]
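
The API returns one `[label, score]` pair per class, so picking the winning label is a one-liner. The sketch below uses the output shown above. `top_prediction` and its `threshold` parameter are hypothetical additions for illustration.

```python
example_output = [['technology', 0.4171],
                  ['finance', 0.1195],
                  ['sports', 0.1811],
                  ['business', 0.2823]]


def top_prediction(output, threshold=0.0):
    """Return (label, score) for the highest score, or None if it is below threshold."""
    label, score = max(output, key=lambda pair: pair[1])
    return (label, score) if score >= threshold else None


print(top_prediction(example_output))  # ('technology', 0.4171)
```

A threshold can be useful when none of the candidate classes fits the text well. For example, `top_prediction(example_output, threshold=0.5)` returns `None` here.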