# MLServer Quick-Start Guide

This guide will help you get started creating machine learning microservices with MLServer 
in about 10 minutes.

## Installation

The first step is to install `mlserver`, the `spacy` library with the language model it will use, and `wikipedia-api`, which we'll use to fetch some sample text.

In [None]:
!pip install mlserver spacy wikipedia-api
!python -m spacy download en_core_web_lg

## Set Up

![setup](../images/mlserver_setup.png)

At its core, MLServer requires three things from you: a `model-settings.json` file with 
information about the model, a `settings.json` file with information about the server you 
are about to set up, and a `.py` file with the load-predict recipe for your model (as shown in the 
picture above). Later, when you are ready to package all of the components of your 
server, you will also need to provide a `requirements.txt` file listing the dependencies 
of your server.
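
As a rough illustration of the layout described above, the following sketch checks a model folder for the files this guide ends up creating. The helper name `missing_files` and the exact file list are assumptions made for this example; MLServer itself does not ship such a function.

```python
from pathlib import Path
import tempfile

# Files this guide places in the model folder. The list is an assumption
# made for illustration: requirements.txt only matters at packaging time.
EXPECTED_FILES = ["model-settings.json", "settings.json", "my_model.py", "requirements.txt"]


def missing_files(model_dir: str) -> list:
    """Return the expected file names that are not present in model_dir."""
    folder = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


# Demo on a throwaway folder that only contains model-settings.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model-settings.json").write_text("{}")
    still_missing = missing_files(tmp)
    print(still_missing)
```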

Let's create a directory for our model.

In [None]:
!mkdir -p ../models_hub/quick-start/similarity_model

Next, we'll create a service that allows us to compare the similarity between two documents. Before 
we do so, let's develop our use case using `spacy`, a natural language processing library built with 
production in mind.

In [None]:
import spacy

In [None]:
nlp = spacy.load("en_core_web_lg")

Now that we have our model loaded, let's use the Wikipedia API to fetch the summaries of the two 
[Barbenheimer](https://en.wikipedia.org/wiki/Barbenheimer) films and see how similar they actually are.

In [None]:
import wikipediaapi

In [None]:
wiki_wiki = wikipediaapi.Wikipedia('MyMovieEval (example@example.com)', 'en')

In [None]:
barbie = wiki_wiki.page('Barbie_(film)').summary
oppenheimer = wiki_wiki.page('Oppenheimer_(film)').summary

print(barbie)
print()
print(oppenheimer)

Now that we have our two summaries, let's compare them using spacy.

In [None]:
doc1 = nlp(barbie)
doc2 = nlp(oppenheimer)

In [None]:
doc1.similarity(doc2)

Notice that both summaries have information about the other movie, about "films" in general, 
and about their release date (which is the same). The reality is that the model hasn't seen 
either of these movies, so it might be generalizing to the context of each article, "movies," 
rather than their content, "dolls as humans and the atomic bomb."
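
For intuition about the number returned above: with vector-based pipelines such as `en_core_web_lg`, spaCy's default `Doc.similarity` is the cosine similarity between the two document vectors, and each document vector is by default the average of its token vectors. The sketch below reproduces that computation with NumPy on made-up 3-dimensional vectors (the real `en_core_web_lg` vectors have 300 dimensions). The function names are hypothetical.

```python
import numpy as np


def doc_vector(token_vectors: np.ndarray) -> np.ndarray:
    """Average token vectors (one per row) into a single document vector."""
    return token_vectors.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / norm) if norm else 0.0


# Toy token vectors, invented for this example.
doc_a = doc_vector(np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
doc_b = doc_vector(np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]))
doc_c = doc_vector(np.array([[0.0, 0.0, 1.0]]))

sim_ab = cosine_similarity(doc_a, doc_b)  # vectors point the same way
sim_ac = cosine_similarity(doc_a, doc_c)  # vectors are orthogonal
print(sim_ab, sim_ac)
```

Because averaging blends every token into one vector, two long articles on the same broad topic can score high even when their specifics differ, which is consistent with the observation above.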

You should play around with different pages and see if what you get back is consistent with 
what you would expect.

Time to create a machine learning API for our use-case. 😎

## Building a Service

In [None]:
%%writefile ../models_hub/quick-start/similarity_model/my_model.py

from mlserver.codecs import decode_args
from mlserver import MLModel
from typing import List
import numpy as np
import spacy

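class MyKulModel(MLModel):

    async def load(self) -> bool:
        # Load the spaCy pipeline once, when the server loads the model.
        self.model = spacy.load("en_core_web_lg")
        # Report that loading succeeded.
        return True

    @decode_args
    async def predict(self, docs: List[str]) -> np.ndarray:
        # `decode_args` turns the incoming "docs" input into a list of strings.
        doc1 = self.model(docs[0])
        doc2 = self.model(docs[1])

        # Wrap the similarity score in an array so it can be encoded in the response.
        return np.array(doc1.similarity(doc2))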

Now that our model code is in place, the last piece of the puzzle is to tell MLServer a bit 
about the model. In particular, it needs to know the model's name and where to find its implementation. The former can be anything 
you want (it will be part of the URL of your API), and the latter follows the pattern `name_of_py_file.class_with_your_model`. Let's 
add this file to the directory with our model.
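
To make the `name_of_py_file.class_with_your_model` convention concrete, here is a minimal sketch of how a dotted path can be resolved to a class with the standard library. It is an illustration only, not MLServer's actual loading code, and `load_class` is a hypothetical helper. A standard-library class stands in for `my_model.MyKulModel` so the snippet runs anywhere.

```python
import importlib


def load_class(dotted_path: str):
    """Split 'module.ClassName' at the last dot and import the class."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# With my_model.py on the import path, "my_model.MyKulModel" would
# resolve the same way.
cls = load_class("collections.OrderedDict")
print(cls.__name__)
```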

In [None]:
%%writefile ../models_hub/quick-start/similarity_model/model-settings.json

{
    "name": "doc-sim-model",
    "implementation": "my_model.MyKulModel"
}

Now that everything is in place, we can start serving predictions locally to test how things would play 
out for our future users. We'll initiate our server via the command line, and later on we'll see how to 
do the same via Python files. Here's where we are at right now in the process of developing microservices 
with MLServer.

![start](../images/start_service.png)

Open up a terminal and run the following command.
```bash
mlserver start models_hub/quick-start/similarity_model/
```

Note: Make sure to activate your environment before you run the command above. If you run the command above 
from your notebook (e.g. `!mlserver start ../models_hub/quick-start/similarity_model/`), you will have to send 
the request below from another notebook or terminal.
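
Before sending requests, it can help to confirm that the server is up. The Open Inference Protocol defines a server readiness endpoint at `/v2/health/ready`; the sketch below polls it using only the standard library. The helper names, the default host and port, and the retry settings are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def ready_url(host: str = "localhost", port: int = 8080) -> str:
    """Build the server readiness URL defined by the V2 protocol."""
    return f"http://{host}:{port}/v2/health/ready"


def wait_until_ready(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll url until it answers with HTTP 200 or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet, try again after a pause
        time.sleep(delay)
    return False


print(ready_url())
```

Calling `wait_until_ready(ready_url())` after starting the server should return `True` once the readiness endpoint responds with a 200 status.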

## Testing our Service

Time to test our service. We'll use the `requests` library and set up the payload we'll send to our service.

In [None]:
import requests

In [None]:
inference_request = {
    "inputs": [
        {
          "name": "docs",
          "shape": [2],
          "datatype": "BYTES",
          "parameters": {
              "content_type": "str"
            },
          "data": [barbie, oppenheimer]
        }
    ]
}

In [None]:
r = requests.post('http://localhost:8080/v2/models/doc-sim-model/infer', json=inference_request)

In [None]:
r.json()

In [None]:
round(r.json()['outputs'][0]['data'][0] * 100, 4)
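
The indexing in the last cell relies on the response layout defined by the protocol: a top-level `outputs` list whose entries each carry a `name`, a `shape`, a `datatype`, and a flat `data` list. A small sketch of pulling the first value out more defensively follows. The `sample_response` dictionary and its score are made up for illustration and are not real output from the service.

```python
def first_output_value(response_body: dict) -> float:
    """Return the first element of the first output's data list."""
    outputs = response_body.get("outputs", [])
    if not outputs or not outputs[0].get("data"):
        raise ValueError("response has no output data")
    return float(outputs[0]["data"][0])


# Hypothetical response body, shaped like a V2 inference response.
sample_response = {
    "model_name": "doc-sim-model",
    "outputs": [
        {"name": "output-0", "shape": [1], "datatype": "FP64", "data": [0.5]}
    ],
}

score = first_output_value(sample_response)
print(round(score * 100, 4))  # 50.0
```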

Let's decompose what just happened.

The `URL` for our service might seem a bit odd if you've never heard of the V2/Open Inference Protocol. This 
protocol is a set of specifications that allows machine learning models to be shared and deployed in a 
standardized way. This protocol enables the use of machine learning models on a variety of platforms and 
devices without requiring changes to the model or its code. The V2 protocol is useful because it allows for 
faster and more efficient deployment of machine learning models, making it easier to integrate machine 
learning into a wide range of applications.

All URLs you create with MLServer will have the same structure.

![v2](../images/urlv2.png)
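
The same structure can be expressed as a small helper. Under the V2 protocol, the inference path is `/v2/models/<model name>/infer`, with an optional `/versions/<version>` segment before `/infer`. The function name and its defaults are assumptions made for this sketch.

```python
from typing import Optional


def infer_url(model_name: str, host: str = "localhost", port: int = 8080,
              version: Optional[str] = None) -> str:
    """Build the V2 inference URL for a model, optionally pinned to a version."""
    base = f"http://{host}:{port}/v2/models/{model_name}"
    if version is not None:
        base += f"/versions/{version}"
    return base + "/infer"


print(infer_url("doc-sim-model"))
print(infer_url("doc-sim-model", version="v1"))
```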

This kind of protocol is neither good nor bad but rather a standard to keep everyone on the same page. Think 
about driving: each country sets a standard for which side of the road to drive on, and this ensures 
everyone stays on the left (or the right, depending on where you are). This saves you and everyone 
around you from having to wonder where the next driver will come from and, instead, lets you focus 
on getting where you're going without much worry.

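Lastly, the `inference_request` we created follows the Open Inference Protocol (OIP), so it has to be built in a specific format. Each entry in `inputs` has the following fields:
- `name`: the name of the input. With `decode_args`, it matches the argument name of `predict()` (`docs` in our case).
- `shape`: the shape of the input tensor (`[2]` here, for our two documents).
- `datatype`: the protocol data type of the elements (`BYTES` is used for strings).
- `parameters`: optional extra information about the input. Here, `content_type` tells MLServer to decode the bytes as `str`.
- `data`: the input values themselves.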
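
Putting those fields together, the request from earlier can be produced by a small helper. This is a minimal sketch: `build_request` is a hypothetical name, and it assumes every input is a plain Python string.

```python
from typing import List


def build_request(docs: List[str], input_name: str = "docs") -> dict:
    """Wrap a list of strings in a V2 inference request with a single input."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(docs)],  # one dimension, one element per document
                "datatype": "BYTES",
                "parameters": {"content_type": "str"},
                "data": docs,
            }
        ]
    }


payload = build_request(["first document", "second document"])
print(payload["inputs"][0]["shape"])  # [2]
```

Deriving `shape` from `len(docs)` keeps the declared shape and the actual data from drifting apart when the inputs change.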

## Creating Model Replicas

Say you need to meet the demand of a high number of users and one model will not suffice, or the model is not using 
all of the resources of the instance it was allocated to. In this case, we can create multiple 
replicas of our model to increase its throughput. To do so, we turn to the `settings.json` file and 
set the `parallel_workers` parameter to the number of independent models we want, for example `"parallel_workers": 3`.
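
As a minimal sketch of the setting described above, the snippet below writes a `settings.json` containing only `parallel_workers`. The value `3` comes from the text. Writing to a temporary folder rather than the model directory, and omitting every other server setting, are choices made for this example.

```python
import json
import tempfile
from pathlib import Path

# Only the replica count is set here; other server settings are left out.
settings = {"parallel_workers": 3}

with tempfile.TemporaryDirectory() as tmp:
    settings_path = Path(tmp) / "settings.json"
    settings_path.write_text(json.dumps(settings, indent=4))
    # Read the file back to confirm it is valid JSON.
    written = json.loads(settings_path.read_text())

print(written)
```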

In [None]:
%%write

## Packaging our Service

![](../images/serving_2.png)

For the next step, we are going to package our model and service into a Docker image that we can reuse or
share with colleagues. This step requires that you have Docker installed and running on your machine, so 
set that up first if you haven't already.

In [None]:
%%writefile ../models_hub/quick-start/similarity_model/requirements.txt

mlserver
spacy==3.6.0
https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.6.0/en_core_web_lg-3.6.0-py3-none-any.whl

In [None]:
!mlserver build ../models_hub/quick-start/similarity_model/ -t 'fancy_ml_service'

In [None]:
!docker images

Let's test that our image works as intended with the following command. Make sure you have closed your 
previous server by using `CTRL + C` in your terminal.

```bash
docker run -it --rm -p 8080:8080 fancy_ml_service
```

Now that you have a fully functioning microservice, you can deploy your container through the 
offerings of different cloud providers, on your company's Kubernetes cluster if they 
have one up and running, or anywhere else you can run a Docker image.

To learn more about MLServer and the different ways in which you can use it, head over to the examples section 
or the user guide, and to keep up to date with what we are up to at Seldon, make sure you join our Slack community.