<a href="https://colab.research.google.com/github/plaban1981/MLOPS_Tools/blob/main/AIbro_Inference_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Welcome to AIbro Inference Demo!**
In this demo, we will show how you can deploy an AI model in 2 minutes. All you need is a formatted ML model repo and an ML application scenario.

<img src="https://drive.google.com/uc?export=view&id=1Tp4w9bd3Yf3_e1gf1_CdY5aNwZ48mwvm" width="600" height="500" />

## Step 1: Install AIbro

In [1]:
!pip install aibro
!sudo apt-get -o Dpkg::Options::="--force-confmiss" install --reinstall netbase # only needed if you hit the error "OSError: protocol not found", which happens on Colab
!apt-get install python3.7-dev python3.7-venv # only needed on Colab

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting aibro
  Downloading aibro-1.1.5-py3-none-any.whl (33 kB)
Collecting attrs==21.2.0
  Downloading attrs-21.2.0-py2.py3-none-any.whl (53 kB)
Collecting colorama==0.4.4
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting build==0.3.1.post1
  Downloading build-0.3.1.post1-py2.py3-none-any.whl (13 kB)
Collecting tqdm==4.60.0
  Downloading tqdm-4.60.0-py2.py3-none-any.whl (75 kB)
Collecting websocket-client==0.58.0
  Downloading websocket_client-0.58.0-py2.py3-none-any.whl (61 kB)
Collecting pytest-mock==3.6.1
  Downloading pytest_mock-3.6.1-py3-none-any.whl (12 kB)
Collecting args==0.1.0
  Downloading args-0.1.0.tar.gz (3.0 kB)
Collecting bleach==3.3.0
  Downloading bleach-3.

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  netbase
0 upgraded, 1 newly installed, 0 to remove and 49 not upgraded.
Need to get 12.7 kB of archives.
After this operation, 45.1 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 netbase all 5.4 [12.7 kB]
Fetched 12.7 kB in 0s (117 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: una

## Step 2: Prepare a formatted model repo

Source: [https://github.com/AIpaca-Inc/Aibro-examples](https://github.com/AIpaca-Inc/Aibro-examples).

The repo should be structured in the following format:

repo <br/>
&nbsp;&nbsp;&nbsp;&nbsp;|\_\_&nbsp;[predict.py](#predict-py)<br/>
&nbsp;&nbsp;&nbsp;&nbsp;|\_\_&nbsp;[model](#39-model-39-and-39-data-39-folders)<br/>
&nbsp;&nbsp;&nbsp;&nbsp;|\_\_&nbsp;[data](#39-model-39-and-39-data-39-folders)<br/>
&nbsp;&nbsp;&nbsp;&nbsp;|\_\_&nbsp;[requirement.txt](#requirement-txt)<br/>
&nbsp;&nbsp;&nbsp;&nbsp;|\_\_&nbsp;[other artifacts](#other-artifacts)<br/>

### **predict.py**

This is the entry point of AIbro.

predict.py should contain two methods:

1. _load_model()_: this method should load and return your machine learning model from the "model" folder. A transformer-based Portuguese-to-English translator is used in this example repo.

```python
import tensorflow as tf

def load_model():
    # Load the Portuguese-to-English translator from the "model" folder
    translator = tf.saved_model.load('model')
    return translator
```

2. _run()_: this method takes the model as input, loads data from the "data" folder, runs the prediction, and returns the inference result.

```python
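import json

def run(model):
    # Read the input sentence from the "data" folder
    with open("./data/data.json", "r") as fp:
        data = json.load(fp)
    sentence = data["data"]

    # Translate, then decode the tensor output into a UTF-8 string
    result = {"data": model(sentence).numpy().decode("utf-8")}
    return result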
```

**Test tip**: predict.py should be able to return an inference result when you call:

```python
run(load_model())
```
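The test tip can be tried without TensorFlow by swapping in a stand-in model. The sketch below is only an illustration of the `load_model()` / `run()` contract: the uppercase "model", the inline payload, and the `check_contract` helper are all hypothetical, and the JSON-serializability check is an assumption about what a deployable result should look like, not a documented AIbro requirement.

```python
import json


def load_model():
    # Stand-in for tf.saved_model.load("model"): any callable works for this check.
    return lambda sentence: sentence.upper()


def run(model):
    # The real run() reads ./data/data.json; an inline payload keeps the sketch self-contained.
    data = {"data": "olá"}
    return {"data": model(data["data"])}


def check_contract(load_model_fn, run_fn):
    """Call run(load_model()) and confirm the result can be serialized to JSON."""
    result = run_fn(load_model_fn())
    json.dumps(result)  # raises TypeError if the result is not JSON-serializable
    return result


print(check_contract(load_model, run))
```

If the call raises, the problem is in predict.py itself rather than in the deployment step.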

### **"model" and "data" folders**

There is no format restriction on the "model" and "data" folders as long as the input and output of load_model() and run() from predict.py are correct.

### **requirement.txt**

Before the model is deployed, the packages listed in requirement.txt are installed to set up the environment.

### **Other Artifacts**

All other files/folders.
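Before deploying, it can help to check that a repo matches the layout described above. The following is a minimal sketch, not AIbro's own validation (the dryrun in Step 3 does that): `missing_entries` is a hypothetical helper, and treating all four named entries as mandatory is an assumption based on the structure listed in this step.

```python
from pathlib import Path

# Entries named in the layout above; "other artifacts" are optional and not checked.
REQUIRED_FILES = ("predict.py", "requirement.txt")
REQUIRED_DIRS = ("model", "data")


def missing_entries(repo_path):
    """Return the required files and folders that are absent from repo_path."""
    repo = Path(repo_path)
    missing = [name for name in REQUIRED_FILES if not (repo / name).is_file()]
    missing += [name for name in REQUIRED_DIRS if not (repo / name).is_dir()]
    return missing


# An empty list means every expected entry was found.
print(missing_entries("./Aibro-examples/tensorflow_transformer"))
```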


In [2]:
!git clone https://github.com/AIpaca-Inc/Aibro-examples

Cloning into 'Aibro-examples'...
remote: Enumerating objects: 106, done.
remote: Counting objects: 100% (87/87), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 106 (delta 27), reused 76 (delta 17), pack-reused 19
Receiving objects: 100% (106/106), 20.93 MiB | 32.43 MiB/s, done.
Resolving deltas: 100% (28/28), done.


## Step 3: Test the Repo by Dryrun

Dryrun validates the repo structure locally and tests whether an inference result can be returned successfully.

In [28]:
from aibro.inference import Inference
Inference.deploy(
    "./Aibro-examples/tensorflow_transformer",
    dryrun=True,
)

DRYRUN TEST: passed


'DRYRUN TEST: passed'

## Step 4: Create an inference API with one-line code
With a formatted model repo in place (here, "./Aibro-examples/tensorflow_transformer"), we can now use it to create an inference job. The model name must be unique among all current [active inference jobs](https://aipaca.ai/inference_jobs) under your profile.

In this example, we deploy a public custom model called "my_fancy_transformer" from "./Aibro-examples/tensorflow_transformer" on machine type "c5.large.od" and use an access token for authentication.

Once the deployment finishes, an API URL is returned with the syntax: <br/>

- **https://api.aipaca.ai/v1/{username}/{client_id}/{model_name}/predict** <br/>

**{client_id}**: if your inference job is public, **{client_id}** is "public". Otherwise, **{client_id}** should be one of your [client IDs](#add-clients).
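The URL syntax above can be filled in programmatically. This is a small sketch under stated assumptions: `build_predict_url` is a hypothetical helper, not part of the aibro package, and the `https` root is copied from the syntax line (the deployment log in this demo prints the same path with `http`).

```python
API_ROOT = "https://api.aipaca.ai/v1"  # root taken from the URL syntax above


def build_predict_url(username, model_name, client_id=None):
    """Fill in the URL template; client_id=None means a public job."""
    segment = client_id if client_id else "public"
    return f"{API_ROOT}/{username}/{segment}/{model_name}/predict"


# Public job, then the same job addressed through a specific client ID.
print(build_predict_url("Plaban81", "my_fancy_transformer"))
print(build_predict_url("Plaban81", "my_fancy_transformer", client_id="client_1"))
```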

In [29]:
from aibro.inference import Inference

In [30]:
api_url = Inference.deploy(
    model_name = "my_fancy_transformer",
    machine_id_config = "c5.large.od",
    artifacts_path = "./Aibro-examples/tensorflow_transformer",
    client_ids = [] # if no clients are specified, the inference job becomes public
)

Already authenticated!
Please open https://aipaca.ai/inference_jobs to track job status.
[LAUNCHING]: Starting public inference job: inf_8db4b49e-e5eb-47f7-aad6-79ab24841f37
[LAUNCHING]: Requesting {'standby': 'c5.large.od'} to be ready...
[LAUNCHING]: Started a standby c5.large.od.
[LAUNCHING]: c5.large.od server successfully requested, launching and building...
[LAUNCHING]: [ Getting server ready in around: 1 minute ]
Your {'standby': 'c5.large.od'} instances are now ready 🎉

[SENDING]: Serializing your artifacts...
[SENDING]: |>>>>>>>>> | 98.66 % 18.74 / 19.00 MiB [avg: 3.2MiB/s]

Your Inference API URL: http://api.aipaca.ai/v1/Plaban81/public/my_fancy_transformer/predict


In [32]:
Inference.list_clients("my_fancy_transformer")

Already authenticated!
Current client ids: []


[]

In [33]:
api_url

'http://api.aipaca.ai/v1/Plaban81/public/my_fancy_transformer/predict'

## How to Test an AIbro API

In [38]:
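import requests

# Payload keyed by "data", matching the data.json format read by run()
review = {"data": "India"}

# Passing a dict via `data=` sends it form-encoded
prediction = requests.post(
   "http://api.aipaca.ai/v1/Plaban81/public/my_fancy_transformer/predict",
   data=review,
)

# The response body holds the inference result as text
result = prediction.text

print(result)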

{'data': 'this is the first book i did .'}


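## Step 5: Test an AIbro API with curl
Replace the `{{api_url}}` placeholder in the commands below with your own API URL; left as-is, curl rejects it with a "nested brace" globbing error. For instance, my `api_url` is http://api.aipaca.ai/v1/yuqil725/public/my_fancy_transformer/predict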

The syntax when using `curl` depends on the file type in the `data` folder.

| Data Type | Syntax                                                                                                       |
| --------- | ------------------------------------------------------------------------------------------------------------ |
| json      | curl -X POST {{api_url}} -d '{"your": "data"}'<br/>curl -X POST {{api_url}} -F file=@'path/to/json/file'     |
| txt       | curl -X POST {{api_url}} -d 'your data'<br/>curl -X POST {{api_url}} -F file=@'path/to/txt/file'             |
| csv       | curl -X POST {{api_url}} -F file=@'path/to/csv/file'                                                         |
| others    | curl -X POST {{api_url}} -F file=@'path/to/zip/file'                                                         |

The syntax lookup table above follows a few patterns:

- If the data type is `json` or `txt`, you can use the `-d` flag to post the string data directly.
- If the data type is one of `json`, `txt`, or `csv`, you can use the `-F` flag to post the data file by path.
- If the data type is not one of `json`, `txt`, or `csv`, you can zip the entire `data` folder and post the zip file by path.

_Tips_: if your inference time is over one minute, it is recommended to either reduce the data size or increase the `--keepalive-time` value when using `curl`.
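The lookup table can also be expressed as a small function that assembles the `curl` arguments. This is an illustrative sketch only: `curl_args` is a hypothetical helper, and for data types other than `json`, `txt`, or `csv` it assumes the caller has already zipped the `data` folder and passes the zip path.

```python
import shlex


def curl_args(api_url, data_type, inline=None, path=None):
    """Return curl arguments following the lookup table above."""
    base = ["curl", "-X", "POST", api_url]
    if inline is not None:
        # The table lists -d only for json and txt payloads.
        if data_type not in ("json", "txt"):
            raise ValueError("inline data is only listed for json and txt")
        return base + ["-d", inline]
    if path is None:
        raise ValueError("provide either inline data or a file path")
    # json, txt, and csv files are posted as-is; other types as a zip file.
    return base + ["-F", f"file=@{path}"]


cmd = curl_args(
    "http://api.aipaca.ai/v1/yuqil725/public/my_fancy_transformer/predict",
    "json",
    inline='{"data": "Olá"}',
)
print(shlex.join(cmd))  # shell-quoted command line, ready to paste into a terminal
```

Returning an argument list rather than a single string leaves shell quoting to `shlex.join`, which avoids hand-escaping the JSON payload.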

In [36]:
!curl -X POST {{api_url}} -d '{"data": "Olá"}'

curl: (3) [globbing] nested brace in column 2


## Step 6: Limit API Access to Specific Clients (Optional)

As the API owner, you probably don't want to receive an overwhelming number of API requests from everywhere. To avoid this, you can give every client a unique client ID, which is then used in the API endpoint (see the URL syntax in Step 4). If no client ID is added, the inference job is public by default.

In [40]:
from aibro.inference import Inference
Inference.update_clients(
    job_id = "inf_ec49d03f-67ba-44e8-ac3c-c6bc81ca630c",
    add_client_ids = ["client_3", "client_4"]
)

Already authenticated!
Update client succeeded!
Current client ids: ['client_1', 'client_2', 'client_3', 'client_4']


['client_1', 'client_2', 'client_3', 'client_4']

In [41]:
api_url

'http://api.aipaca.ai/v1/Plaban81/public/my_fancy_transformer/predict'

### Run a Private Prediction

You can now fill in {client_id} with any of the registered client IDs, such as "client_1" or "client_2". "public" no longer works.

In [None]:
!curl -d '{"data": "Olá"}' -X POST {{api_url}}

{"data":"hello , hello , hello ,"}


## Step 7: Complete Job

Once the inference job is no longer needed, remember to close it with `Inference.complete()` to avoid unnecessary cost.

In [47]:
Inference.complete(job_id="inf_ec49d03f-67ba-44e8-ac3c-c6bc81ca630c")

Already authenticated!
Inference job inf_ec49d03f-67ba-44e8-ac3c-c6bc81ca630c, with model my_fancy_transformer, successfully completed.


In [48]:
Inference.complete(job_id="inf_8db4b49e-e5eb-47f7-aad6-79ab24841f37")

Already authenticated!
Inference job inf_8db4b49e-e5eb-47f7-aad6-79ab24841f37, with model my_fancy_transformer, successfully completed.


In [49]:
Inference.complete(job_id="inf_35a39b71-49d7-4561-87db-aca2ab5042c1")

Already authenticated!
Inference job inf_35a39b71-49d7-4561-87db-aca2ab5042c1, with model my_fancy_transformer, successfully completed.
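One way to avoid forgetting this step is to wrap the job in a context manager so the completion call runs even if an error occurs in between. The sketch below is an assumption-laden illustration, not part of the aibro package: `inference_job` is a hypothetical helper, `"inf_example"` is a made-up job ID, and the stand-in callable takes the place of `Inference.complete`, which this demo calls with a `job_id` keyword.

```python
from contextlib import contextmanager


@contextmanager
def inference_job(job_id, complete_fn):
    """Yield job_id and always call complete_fn(job_id=...) on exit."""
    try:
        yield job_id
    finally:
        complete_fn(job_id=job_id)


# Demo with a stand-in callable; in a notebook you would pass Inference.complete.
closed = []
with inference_job("inf_example", lambda job_id: closed.append(job_id)) as job:
    print("using", job)
print("completed:", closed)
```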
