# Running a Hugging Face Model on Raspberry Pi
In this notebook, we download a model from **[Hugging Face](https://huggingface.co)** using the `transformers` library, save it locally, and automatically containerize it with **[Chassis.ml](https://chassis.ml)**.

Then, we show two ways to make gRPC API calls to the containerized model. In the first approach, we download the Docker container directly to a Raspberry Pi, use the model's protofile to generate client-side gRPC Python code, and execute that code from this notebook (this requires SSH remote port forwarding; instructions are included below). In the second approach, we use **[Modzy's](https://modzy.com)** edge feature to deploy the model to the Pi and call it through the Modzy inference APIs.

### Environment Set Up

1. Create a virtual environment (venv, conda, or another tool you prefer) with Python 3.6 or newer.
2. Install at least the following packages with pip:

```pip install torch transformers[torch] numpy chassisml modzy-sdk```
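
After installing, a quick lookup of the expected modules can catch a broken environment before any model is downloaded. The sketch below assumes the import names `torch`, `transformers`, `numpy`, `chassisml`, and `modzy` (the last being the module that `modzy-sdk` is assumed to install); it only locates the modules and does not load them.

```python
import importlib.util

# Import names assumed to correspond to the pip packages listed above.
REQUIRED_MODULES = ["torch", "transformers", "numpy", "chassisml", "modzy"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```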

### Hardware

This tutorial has two parts: (1) containerizing a Hugging Face model, and (2) making API calls to the container. Part 1 only requires running the top half of this notebook, which you can do on your laptop, Google Colab, a cloud environment, or any other environment you prefer. In part 2, we download the model container directly to a Raspberry Pi. To follow along, use your own Raspberry Pi, or run the model container on your laptop or other infrastructure if no Pi or edge device is available.

### Additional Resources

Below are links to the Python libraries, the Hugging Face model, and the prebuilt Docker container used in this tutorial.
* **[Chassis.ml](https://chassis.ml)**
* **[Transformers](https://huggingface.co/docs/transformers/main/en/installation)**
* **[TinyBERT HF Model](https://huggingface.co/gokuls/BERT-tiny-emotion-intent?text=I+like+you.+I+love+you)**
* **[TinyBERT Docker Container](https://hub.docker.com/repository/docker/modzy/tinybert-arm)**

# 1. Containerize Hugging Face Model
In the first section of this notebook, we will download a [TinyBERT](https://huggingface.co/gokuls/BERT-tiny-emotion-intent?text=I+like+you.+I+love+you) model from Hugging Face and use the Chassis Python library to automatically build a machine learning model container.

## Hugging Face
Download and test the model with the `transformers` library

In [1]:
# import packages
import os
import torch
import chassisml
import numpy as np
from transformers import BertTokenizer, BertForSequenceClassification

In [2]:
# download TinyBERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained("gokuls/BERT-tiny-emotion-intent")
model = BertForSequenceClassification.from_pretrained("gokuls/BERT-tiny-emotion-intent")

In [4]:
# save model locally so we can use/access it with Chassisml package
tokenizer.save_pretrained("./tiny-bert-model")
model.save_pretrained("./tiny-bert-model")

In [7]:
# define labels from model config
labels = model.config.id2label

In [8]:
# run sample inference on TinyBERT model
text = "I cannot wait to go skiing this winter!"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
    softmax = torch.nn.functional.softmax(logits, dim=1).detach().cpu().numpy()
indices = np.argsort(softmax)[0][::-1]
results = {
    "data": {
        "result": {
            "classPredictions": [{"class": labels[i], "score": softmax[0][i]} for i in indices]
        }
    }
}
print(results)

{'data': {'result': {'classPredictions': [{'class': 'LABEL_1', 'score': 0.99906284}, {'class': 'LABEL_2', 'score': 0.00026847195}, {'class': 'LABEL_0', 'score': 0.0002184094}, {'class': 'LABEL_3', 'score': 0.00019786197}, {'class': 'LABEL_5', 'score': 0.00013340256}, {'class': 'LABEL_4', 'score': 0.00011909042}]}}}
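
The post-processing in the cell above turns raw logits into probabilities with softmax and then orders the classes from most to least likely. A dependency-free sketch of the same arithmetic follows; the logits are made up for illustration and are not TinyBERT output.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def rank_classes(logits, id2label):
    """Return class predictions sorted by descending probability."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [{"class": id2label[i], "score": probs[i]} for i in order]


# Hypothetical logits for a three-class model.
example = rank_classes([0.5, 3.0, -1.0], {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"})
print(example[0]["class"])
```

The largest logit always yields the largest probability, so the ranking can be read off the logits directly; softmax matters when the scores themselves are reported.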


## Chassisml
Automatically build a container from this model

In [11]:
# load model to memory
tinybert_tokenizer = BertTokenizer.from_pretrained("./tiny-bert-model")
tinybert_model = BertForSequenceClassification.from_pretrained("./tiny-bert-model")
mapped_labels = {"LABEL_0": 'sadness',"LABEL_1": 'joy',"LABEL_2": 'love',"LABEL_3": 'anger',"LABEL_4": 'fear',"LABEL_5": 'surprise'}
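
The model config only exposes generic `LABEL_n` names, so `mapped_labels` supplies the human-readable emotion for each one. As a small illustration, the hypothetical helper below (not part of any library) applies that mapping to the first two predictions from the sample inference shown earlier.

```python
# Mapping copied from the cell above.
mapped_labels = {
    "LABEL_0": "sadness",
    "LABEL_1": "joy",
    "LABEL_2": "love",
    "LABEL_3": "anger",
    "LABEL_4": "fear",
    "LABEL_5": "surprise",
}


def rename_classes(predictions, mapping):
    """Return a copy of `predictions` with each class name replaced via `mapping`."""
    return [{"class": mapping[p["class"]], "score": p["score"]} for p in predictions]


# First two entries of the sample inference output printed earlier.
raw_predictions = [
    {"class": "LABEL_1", "score": 0.99906284},
    {"class": "LABEL_2", "score": 0.00026847195},
]
print(rename_classes(raw_predictions, mapped_labels))
```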

In [12]:
# define process function that will serve as our inference function
def process(input_bytes):
    # decode and preprocess data bytes
    text = input_bytes.decode()
    inputs = tinybert_tokenizer(text, return_tensors="pt")
    
    # run preprocessed data through model
    with torch.no_grad():
        logits = tinybert_model(**inputs).logits
        softmax = torch.nn.functional.softmax(logits, dim=1).detach().cpu().numpy()
        
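    # postprocess: rank classes by descending probability and map them to emotion names
    id2label = tinybert_model.config.id2label  # read labels from the reloaded model, not an earlier cell
    indices = np.argsort(softmax)[0][::-1]
    predictions = [
        {"class": mapped_labels[id2label[int(i)]], "score": float(softmax[0][i])}
        for i in indices
    ]
    results = {"data": {"result": {"classPredictions": predictions}}}

    return results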
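
Because the container returns whatever `process` produces, a structural check of its output can catch mistakes before starting a build. The helper below is a hypothetical addition, not part of Chassis; it assumes the nested `data` / `result` / `classPredictions` layout used in this notebook, with scores sorted in descending order.

```python
def check_prediction_schema(results):
    """Raise ValueError if `results` does not follow the layout built by `process`."""
    try:
        predictions = results["data"]["result"]["classPredictions"]
    except (KeyError, TypeError) as exc:
        raise ValueError("expected results['data']['result']['classPredictions']") from exc
    if not predictions:
        raise ValueError("classPredictions is empty")

    scores = []
    for entry in predictions:
        if not isinstance(entry, dict) or not isinstance(entry.get("class"), str):
            raise ValueError(f"bad prediction entry: {entry!r}")
        score = float(entry["score"])  # also accepts NumPy scalar types
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        scores.append(score)

    if scores != sorted(scores, reverse=True):
        raise ValueError("predictions are not sorted by descending score")
    return True


# Hand-written example in the same shape as the notebook's output.
sample = {"data": {"result": {"classPredictions": [
    {"class": "joy", "score": 0.9},
    {"class": "love", "score": 0.1},
]}}}
print(check_prediction_schema(sample))
```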

In [17]:
# initialize Chassis client
chassis_client = chassisml.ChassisClient(os.getenv("CHASSIS_URL"))
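
This notebook reads `CHASSIS_URL`, `DOCKER_USER`, and `DOCKER_PASS` from the environment, and `os.getenv` silently returns `None` when a variable is unset. A minimal sketch of failing early instead is shown below; `require_env` is a hypothetical helper, and the URL in the demonstration is a placeholder rather than a real Chassis endpoint.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for each variable, or raise if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Demonstration with a stand-in mapping; omit `environ` to read the real environment.
demo = require_env(["CHASSIS_URL"], environ={"CHASSIS_URL": "http://localhost:5000"})
print(demo)
```

In the notebook itself, the call would be `require_env(["CHASSIS_URL", "DOCKER_USER", "DOCKER_PASS"])` before creating the client.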

In [18]:
# create Chassis model
chassis_model = chassis_client.create_model(process_fn=process)

# test Chassis model locally (can pass filepath, bufferedreader, bytes, or text here):
sample_filepath = './input.text'
results = chassis_model.test(sample_filepath)
print(results)

b'{"data":{"result":{"classPredictions":[{"class":"joy","score":0.9988540410995483},{"class":"sadness","score":0.0006223577074706554},{"class":"love","score":0.00022895698202773929},{"class":"surprise","score":0.0001073237945092842},{"class":"anger","score":0.0001029252671287395},{"class":"fear","score":8.438550867140293e-05}]}}}'
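
The local test returns JSON-encoded bytes rather than a Python dictionary, so the result needs decoding before use. A short sketch, using a shortened copy of the bytes printed above (only the first two predictions are kept):

```python
import json

# Shortened copy of the test output above.
raw = (
    b'{"data":{"result":{"classPredictions":['
    b'{"class":"joy","score":0.9988540410995483},'
    b'{"class":"sadness","score":0.0006223577074706554}]}}}'
)


def top_prediction(payload):
    """Decode a JSON payload and return (class, score) for the highest-scoring entry."""
    predictions = json.loads(payload)["data"]["result"]["classPredictions"]
    best = max(predictions, key=lambda p: p["score"])
    return best["class"], best["score"]


label, score = top_prediction(raw)
print(f"{label}: {score:.4f}")
```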


In [19]:
# Optional: manually define the conda environment passed to the Chassis build job
# so that only the required packages are installed in the container
env = {
    "name": "huggingface-chassis",
    "channels": ['conda-forge'],
    "dependencies": [
        "python=3.8.5",
        {
            "pip": [
                "chassisml",
                "torch",
                "transformers[torch]",
                "numpy"
            ] 
        }
    ]
}

In [20]:
# publish model to Docker Hub
response = chassis_model.publish(
    model_name="TinyBERT ARM",
    model_version="1.0.0",
    registry_user=os.getenv("DOCKER_USER"),
    registry_pass=os.getenv("DOCKER_PASS"),
    conda_env=env,
    arm64=True
)

Starting build job... Ok!


In [None]:
# wait for job to complete and print out final status
job_id = response.get('job_id')
final_status = chassis_client.block_until_complete(job_id)
print(final_status)
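
`block_until_complete` hides the waiting behind a single call. For readers curious about the general pattern, the sketch below shows a generic poll-until-done loop with a timeout. It is not how Chassis implements the call; `fetch_status` and `is_done` are hypothetical callables that you would supply.

```python
import time


def wait_until_done(fetch_status, is_done, timeout=600.0, interval=5.0, sleep=time.sleep):
    """Poll `fetch_status()` until `is_done(status)` is true or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(interval)


# Simulated job that finishes on the third poll (no real service involved).
fake_statuses = iter([{"done": False}, {"done": False}, {"done": True}])
final = wait_until_done(
    lambda: next(fake_statuses),
    lambda status: status["done"],
    sleep=lambda seconds: None,  # skip real sleeping in this demonstration
)
print(final)
```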

# 2. Make gRPC API Calls to Container
In this section, we will make inference calls to our model container using two approaches:
* In the first approach, we will simply run the model container and leverage this **[gRPC tutorial](https://grpc.io/docs/languages/python/quickstart/)** to generate client-side Python code to make gRPC calls to our Docker container.
* In the second approach, we will leverage **[Modzy's](https://modzy.com)** Edge feature and APIs to deploy our model to our Raspberry Pi and make inference calls.
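
For the first approach, the client code is generated from the model's protofile. The linked quickstart does this with `grpc_tools.protoc`, which ships in the `grpcio-tools` package. The sketch below wraps that call; the proto path `protos/model.proto` and output directory `auto_generated` are hypothetical and should be replaced with the locations you actually use.

```python
from pathlib import Path


def protoc_arguments(proto_file, out_dir):
    """Build the argument list for grpc_tools.protoc (first item is the program name)."""
    proto_path = Path(proto_file)
    return [
        "grpc_tools.protoc",
        f"-I{proto_path.parent.as_posix()}",
        f"--python_out={out_dir}",
        f"--grpc_python_out={out_dir}",
        proto_path.as_posix(),
    ]


def generate_client(proto_file, out_dir):
    """Generate *_pb2.py and *_pb2_grpc.py files; requires `pip install grpcio-tools`."""
    from grpc_tools import protoc  # imported lazily so this sketch loads without it

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return protoc.main(protoc_arguments(proto_file, out_dir))


# Hypothetical paths, shown only to illustrate the resulting arguments.
print(protoc_arguments("protos/model.proto", "auto_generated"))
```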

## Python gRPC API direct to container
Use auto-generated gRPC client code to make API calls directly to a running container
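
The client code in this section connects to `localhost:45000`, so the container's port has to be reachable from the machine running the notebook. One common setup, assuming the notebook runs on a laptop and the container listens on port 45000 of the Pi, is an SSH tunnel. The sketch below only builds the command; the user `pi` and host `raspberrypi.local` are placeholders, and your network may call for a different forwarding direction.

```python
import shlex


def ssh_forward_command(pi_host, pi_user="pi", local_port=45000, remote_port=45000):
    """Build an ssh command that forwards a local port to the same port on the Pi."""
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{pi_user}@{pi_host}",
    ]


command = ssh_forward_command("raspberrypi.local")
print(" ".join(shlex.quote(part) for part in command))
```

Run the printed command in a separate terminal and leave it open while executing the cells below.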

In [19]:
# import auto generated Python client code to make gRPC API calls to running container
import json
import logging
from typing import Dict
import grpc
from auto_generated.model2_template.model_pb2 import InputItem, RunRequest, RunResponse, StatusRequest
from auto_generated.model2_template.model_pb2_grpc import ModzyModelStub
logging.basicConfig(level=logging.INFO)
LOGGER = logging.getLogger(__name__)

In [20]:
# define run function that wraps the auto-generated RPC calls into a single function call
HOST = "localhost"

def run(model_input):
    def create_input(input_text: Dict[str, bytes]) -> InputItem:
        input_item = InputItem()
        for input_filename, input_contents in input_text.items():
            input_item.input[input_filename] = input_contents
        return input_item

    def unpack_and_report_outputs(run_response: RunResponse):
        for output_item in run_response.outputs:
            if "error" in output_item.output:
                output = output_item.output["error"]
            else:
                output = output_item.output["results.json"]
            LOGGER.info(f"gRPC client received: {json.loads(output.decode())}")

    port = 45000
    LOGGER.info(f"Connecting to gRPC server on {HOST}:{port}")
    with grpc.insecure_channel(f"{HOST}:{port}") as grpc_channel:
        grpc_client_stub = ModzyModelStub(grpc_channel)
        try:
            grpc_client_stub.Status(StatusRequest())  # Initialize the model
        except Exception:
            LOGGER.error(
                f"It appears that the Model Server is unreachable. Did you ensure it is running on {HOST}:{port}?"
            )
            return

        LOGGER.info("Sending single input.")
        run_request = RunRequest(inputs=[create_input(model_input)])
        single_response = grpc_client_stub.Run(run_request)
        unpack_and_report_outputs(single_response)

In [21]:
# generate inputs and send them through run function
text1 = b"Today is a beautiful day"
text2 = b"I am very sad"
text3 = b"The haunted house was terrifying"

for text in [text1, text2, text3]:
    test_inputs = {"input.txt": text}
    run(test_inputs)

INFO:__main__:Connecting to gRPC server on localhost:45000
INFO:__main__:Sending single input.
INFO:__main__:gRPC client received: {'data': {'result': {'classPredictions': [{'class': 'joy', 'score': 0.9952616691589355}, {'class': 'sadness', 'score': 0.003636266803368926}, {'class': 'love', 'score': 0.0004556376370601356}, {'class': 'fear', 'score': 0.00033212138805538416}, {'class': 'anger', 'score': 0.00017305587243754417}, {'class': 'surprise', 'score': 0.00014126564201433212}]}, 'explanation': None, 'drift': None}}
INFO:__main__:Connecting to gRPC server on localhost:45000
INFO:__main__:Sending single input.
INFO:__main__:gRPC client received: {'data': {'result': {'classPredictions': [{'class': 'sadness', 'score': 0.9991483688354492}, {'class': 'anger', 'score': 0.00029894441831856966}, {'class': 'joy', 'score': 0.0002292738063260913}, {'class': 'fear', 'score': 0.00018330290913581848}, {'class': 'love', 'score': 7.628439198015258e-05}, {'class': 'surprise', 'score': 6.3891959143802

## Modzy Edge APIs
Leverage Modzy's Edge feature and inference APIs to scale production inferencing across multiple Pi devices
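
The cell below strips a label prefix from each line of `test.ft.txt` before sending the text to the model. The slicing it uses suggests lines shaped like `__label__<digit> <text>`; that format and the sample line here are assumptions, since the file itself is not shown. A regex-based sketch of the same cleaning step:

```python
import re

# Matches a leading "__label__<id>" token and the whitespace after it.
LABEL_PREFIX = re.compile(r"^__label__\S+\s+")


def clean_review(line):
    """Drop a leading `__label__<id> ` prefix and any trailing newline."""
    return LABEL_PREFIX.sub("", line).rstrip("\n")


# Made-up line in the assumed format.
sample_line = "__label__2 Great CD: my favourite album of the year\n"
print(clean_review(sample_line))
```

Unlike the fixed `[2:]` slice, the regex also handles label ids longer than one character. Note too that `random.choices` samples with replacement, so the same review can be drawn more than once; `random.sample` avoids repeats.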

In [22]:
import random
from modzy.edge.client import EdgeClient
client = EdgeClient("localhost", 55000)

MODEL_ID = "uxchm260wz"
MODEL_VERSION = "1.0.0"

with open("test.ft.txt", "r", encoding="utf-8") as f:
    text = f.readlines()[:2500] 
    
# clean reviews before feeding to model
text_cleaned = [t.split("__label__")[-1][2:].replace("\n", "") for t in text]
print(f"{len(text_cleaned)} reviews pulled")

# randomly select 10 reviews (random.choices samples with replacement)
reviews = random.choices(text_cleaned, k=10)
print(f"Randomly sampled {len(reviews)} reviews")

# submit n jobs for random baseline
for review in reviews:
    job = client.submit_text(MODEL_ID, MODEL_VERSION, {"input.txt": review})
    final_job_details = client.block_until_complete(job)
    results = client.get_results(job)
    print(f"Job {job} complete")
    print(results)

1000 reviews pulled
Randomly sampled 10 reviews
Job job-2HMlhHIfsvYVOocLzr1ScFLAXhh complete
{'jobIdentifier': 'job-2HMlhHIfsvYVOocLzr1ScFLAXhh', 'accountIdentifier': 'local', 'submittedAt': '2022-11-10T17:55:50.267273022Z', 'total': 1, 'completed': 1, 'finished': True, 'results': {'job': {'status': 'SUCCESSFUL', 'startTime': '2022-11-10T17:55:50.267273022Z', 'updateTime': '2022-11-10T17:55:50.933817805Z', 'endTime': '2022-11-10T17:55:50.931753650Z', 'results.json': {'data': {'drift': None, 'explanation': None, 'result': {'classPredictions': [{'class': 'joy', 'score': 0.9976446032524109}, {'class': 'love', 'score': 0.0014795303577557206}, {'class': 'surprise', 'score': 0.0004638918035198003}, {'class': 'sadness', 'score': 0.0001844268263084814}, {'score': 0.0001166945367003791, 'class': 'fear'}, {'score': 0.0001107103016693145, 'class': 'anger'}]}}}}}}
Job job-2HMlhMivGYFiSJN0RFVS9IwVvR8 complete
{'jobIdentifier': 'job-2HMlhMivGYFiSJN0RFVS9IwVvR8', 'accountIdentifier': 'local', 'submit