# MLServer Quick-Start Guide

This guide will help you get started creating machine learning microservices with MLServer 
in about 10 minutes. Our use case is a service that measures how similar two documents are. 
Whenever you are deciding which book, news article, blog post, or tutorial (not to sound meta) 
to read next, wouldn't it be great to compare it with similar ones you have already liked, 
without having to rely on a recommendation system? That's what we'll focus on in this tutorial: 
a document similarity service. 📜 + 📃 = 😎👌🔥

## 01 Dependencies

The first step is to install `mlserver`, the `spacy` library, and the language model `spacy` will need 
for our use case. We will also install the Wikipedia API library (`wikipedia-api`) to test our use case.

```python
!pip install mlserver spacy wikipedia-api
!python -m spacy download en_core_web_lg
```

Note that the two commands above can be run in a notebook, hence the exclamation mark `!` at the beginning. If you 
are working from the command line, make sure you remove the `!`.
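Before moving on, you may want to confirm that everything installed correctly. A minimal sketch using only the standard library checks whether each package can be found in the current environment; `missing_packages` is a hypothetical helper, and it assumes the import names `mlserver`, `spacy`, `wikipediaapi`, and `en_core_web_lg` (spaCy models install as regular Python packages).

```python
import importlib
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in the current environment."""
    # Clear import caches so packages installed earlier in this session are picked up
    importlib.invalidate_caches()
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means every package (and the spaCy model) is importable
print(missing_packages(["mlserver", "spacy", "wikipediaapi", "en_core_web_lg"]))
```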

## 02 Set Up

![setup](../images/mlserver_setup.png)

At its core, MLServer requires that users give it 3 things: a `model-settings.json` file with 
information about the model, an (optional) `settings.json` file with information related to the server you 
are about to set up, and a `.py` file with the load-predict recipe for your model (as shown in the 
picture above). Later, when you are ready to package all of the components of your 
server into a Docker image, you will also need to provide a `requirements.txt` file listing 
the dependencies of your server, but you don't need one to test your server 
locally. We'll get there in a few minutes.
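Because MLServer expects these files in a single folder, a quick check can save some head-scratching later. The sketch below is a hypothetical helper (`check_model_dir` is not part of MLServer) that only verifies the files exist; it does not validate their contents.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of problems with the files MLServer needs in `model_dir`.

    Hypothetical helper: it checks that files exist, not that they are valid.
    """
    path = Path(model_dir)
    problems = []
    # Required: metadata about the model (name and implementation)
    if not (path / "model-settings.json").is_file():
        problems.append("model-settings.json is missing")
    # Required: a Python file with the load/predict recipe
    if not any(path.glob("*.py")):
        problems.append("no .py file with a load/predict recipe found")
    # settings.json is optional, so its absence is not reported
    return problems
```

Once you've written the files in the next steps, `check_model_dir("../models_hub/quick-start/similarity_model")` should return an empty list.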

Let's create a directory for our model.

```python
!mkdir -p ../models_hub/quick-start/similarity_model
```

Before we create a service that allows us to compare the similarity between two documents (our use case 
for this tutorial), it is good practice to test that our solution works first, especially if, as in 
our use case, we're using a pre-trained model and/or pipeline. To test our use case, we'll be using 
[`spacy`](https://spacy.io/), a natural language processing library built with production in mind, and 
then we'll move on to building a microservice with MLServer.

```python
import spacy

# Load the large English pipeline, which includes the word vectors used for similarity
nlp = spacy.load("en_core_web_lg")
```

Now that we have our model loaded, let's use the Wikipedia API to compare the summaries of the two films 
behind [Barbenheimer](https://en.wikipedia.org/wiki/Barbenheimer) and see how similar these two movies actually are.

To do this, we will be using the Wikipedia API Python library to find the summary for each 
of the movies. The main requirement of the API is that we pass to its main class, `Wikipedia()`, 
a user agent (a project name and an email) and the language we want information to be returned in. 
After that, we can search for the movie summaries we want by passing the title of the movie to the 
`.page()` method and accessing the summary part with the `.summary` attribute.

Feel free to change the movies for other documents or topics you are interested in.

```python
import wikipediaapi

# First argument: user agent (project name plus contact email); second: language
wiki_wiki = wikipediaapi.Wikipedia('MyMovieEval (example@example.com)', 'en')

# Fetch each page by title and keep only its summary (the introduction)
barbie = wiki_wiki.page('Barbie_(film)').summary
oppenheimer = wiki_wiki.page('Oppenheimer_(film)').summary

print(barbie)
print()
print(oppenheimer)
```

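Now that we have our two summaries, let's compare them using `spacy`.

```python
# Run both summaries through the spaCy pipeline
doc1 = nlp(barbie)
doc2 = nlp(oppenheimer)

# Similarity score between the two documents (higher means more similar)
doc1.similarity(doc2)
```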

Notice that both summaries mention the other movie, "films" in general, and the date each was 
released (which is the same). By default, spaCy's `Doc.similarity` computes the cosine similarity 
between the averages of each document's word vectors, and those vectors know nothing specific about 
these movies. Shared vocabulary about the context of each article, "movies," can therefore pull the 
score up more than their content, "dolls as humans and the atomic bomb," pulls it down.
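To see why shared context words matter, here is a minimal sketch of the computation described above, using NumPy and made-up 3-dimensional vectors instead of the real 300-dimensional ones in `en_core_web_lg`. The helper names `doc_vector` and `cosine_similarity` are hypothetical, and the toy vectors are assumptions chosen so that the topic words ("doll", "bomb") have nothing in common.

```python
import numpy as np


def doc_vector(word_vectors):
    """Average the word vectors of a document into a single document vector."""
    return np.mean(word_vectors, axis=0)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy "word vectors" (made up for illustration)
film = np.array([1.0, 0.0, 0.0])
doll = np.array([0.0, 1.0, 0.0])
bomb = np.array([0.0, 0.0, 1.0])

# Two documents that share two context words and differ in one topic word
doc_a = doc_vector([film, film, doll])
doc_b = doc_vector([film, film, bomb])

print(cosine_similarity(doll, bomb))    # the topic words alone: 0.0
print(cosine_similarity(doc_a, doc_b))  # the documents: roughly 0.8
```

Even though the two topic words are unrelated, the documents score about 0.8 because two of their three words are the shared "film" vector.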

You should play around with different pages and see if what you get back is coherent with 
what you would expect.

Time to create a machine learning API for our use-case. 😎

## 03 Building a Service

In the context of Software as a Service (SaaS), a "service" refers to a software application 
or platform that is delivered over the internet, typically through a web browser or mobile 
app. The service provides a set of features and functionality that can be accessed by users 
on demand, and MLServer allows us to do that for machine learning models by leveraging the 
functionalities of libraries such as `asyncio`, `multiprocessing`, and `FastAPI`, among others. 

A "client," on the other hand, refers to the individuals or organizations (you, me, us) that 
use the service. Clients typically pay a subscription fee to access the service and use 
it for their own purposes, but we'll leave the commercial bit to you for after you complete 
this tutorial. 😎

To create a service with MLServer, we will define a class with two async methods: `load()`, which 
loads the model, and `predict()`, which runs inference. The former will load the `spacy` model we 
tested in the last section, and the latter will take in a list with the two documents we want to 
compare and return a `numpy` array with a single value, our similarity score. We'll write the file 
to our `similarity_model` directory and call it `my_model.py`. 

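```python
%%writefile ../models_hub/quick-start/similarity_model/my_model.py

from mlserver.codecs import decode_args
from mlserver import MLModel
from typing import List
import numpy as np
import spacy

class MyKulModel(MLModel):

    async def load(self):
        # Called when MLServer loads the model: load the spaCy pipeline
        self.model = spacy.load("en_core_web_lg")

    # decode_args decodes the request input named "docs" into a List[str]
    # and encodes the returned array back into a response
    @decode_args
    async def predict(self, docs: List[str]) -> np.ndarray:
        doc1 = self.model(docs[0])
        doc2 = self.model(docs[1])

        # Wrap the single similarity score in a 1-element array
        return np.array([doc1.similarity(doc2)])
```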

Now that we have written our model class, the last piece of the puzzle is to tell MLServer a bit about 
the model. In particular, it needs to know the name of the model and which class implements it. The former 
can be anything you want (and it will be part of the URL of your API), and the latter follows the recipe 
`name_of_py_file_with_your_model.class_with_your_model`.
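The `implementation` string is a dotted path: everything before the last dot is the module (the `.py` file without its extension) and the last part is the class name. The sketch below illustrates that convention with the standard library's `importlib`; `resolve_implementation` is a hypothetical helper, not MLServer's own loader.

```python
import importlib


def resolve_implementation(path):
    """Resolve a "module.ClassName" string to the class it names.

    Hypothetical illustration of the naming convention, not MLServer's loader.
    """
    # Split on the LAST dot so nested modules ("pkg.module.Class") also work
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

For example, assuming `my_model.py` is importable from your working directory, `resolve_implementation("my_model.MyKulModel")` would return the `MyKulModel` class.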

Let's create the `model-settings.json` file MLServer is expecting inside our `similarity_model` directory 
and add the name and the implementation of our model to it.

```python
%%writefile ../models_hub/quick-start/similarity_model/model-settings.json

{
    "name": "doc-sim-model",
    "implementation": "my_model.MyKulModel"
}
```

Now that everything is in place, we can start serving predictions locally to test how things would play 
out for our future users. We'll initiate our server via the command line, and later on we'll see how to 
do the same via Python files. Here's where we are at right now in the process of developing microservices 
with MLServer.

![start](../images/start_service.png)

As you can see in the image, our server will be initialized with three entry points, one for HTTP requests, 
another for gRPC, and another for the metrics. To learn more about the powerful metrics feature of MLServer, 
please visit the relevant docs page [here](https://mlserver.readthedocs.io/en/latest/user-guide/metrics.html). 
To learn more about gRPC, please see this tutorial [here](https://realpython.com/python-microservices-grpc/).

To start our service, open up a terminal in the directory that contains `models_hub` and run the following command.

```bash
mlserver start models_hub/quick-start/similarity_model/
```

Note: If this is a fresh terminal, make sure you activate your environment before you run the command above. 
If you run the command above from your notebook (e.g. `!mlserver start ../models_hub/quick-start/similarity_model/`), 
you will have to send the request below from another notebook or terminal since the cell will continue to run 
until you turn it off.

## 04 Testing our Service

Time to become a client of our service and test it. For this, we'll set up the payload we'll send 
to our service and use the `requests` library to [POST](https://www.baeldung.com/cs/http-get-vs-post) our request.

```python
import requests

inference_request = {
    "inputs": [
        {
            "name": "docs",
            "shape": [2],
            "datatype": "BYTES",
            "parameters": {
                "content_type": "str"
            },
            "data": [barbie, oppenheimer]
        }
    ]
}
```

```python
r = requests.post('http://0.0.0.0:8080/v2/models/doc-sim-model/infer', json=inference_request)
```

```python
r.json()
```

```python
print(f"Our movies are {round(r.json()['outputs'][0]['data'][0] * 100, 4)}% similar")
```

Let's decompose what just happened.

The URL for our service might seem a bit odd if you've never heard of the V2/Open Inference Protocol (OIP). This 
protocol is a set of specifications for how clients and servers exchange inference requests and responses: it 
defines a common set of REST and gRPC endpoints along with the structure of their payloads. Because every server 
that implements the OIP exposes the same endpoints, the same client code can talk to any compliant server, changing 
only the model name and inputs. The OIP is useful because it allows us to integrate machine learning into a wide 
range of applications in a standard way.

All URLs you create with MLServer will have the same structure.

![v2](../images/urlv2.png)
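The same structure can be sketched in code. Under the OIP, the inference endpoint is `v2/models/{model_name}/infer`, optionally with `/versions/{version}` before `/infer` when a model version is specified. `infer_url` below is a hypothetical helper for illustration.

```python
def infer_url(host, model_name, version=None):
    """Build an Open Inference Protocol inference URL.

    Pattern: {host}/v2/models/{model_name}[/versions/{version}]/infer
    """
    base = f"{host.rstrip('/')}/v2/models/{model_name}"
    if version is not None:
        base += f"/versions/{version}"
    return base + "/infer"
```

`infer_url("http://0.0.0.0:8080", "doc-sim-model")` produces the same URL we used with `requests.post` above.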

This kind of protocol is neither good nor bad but rather a standard to keep everyone on the same page. Think 
about driving: each country adopts a standard for which side of the road to drive on, and this ensures everyone 
stays on the left (or the right, depending on where you are). Thanks to this, you don't have to wonder where the 
next driver is going to come from when you go out to run an errand; instead, you can focus on getting where 
you're going without much worry.

Let's describe what each of the components of our `inference_request` does.
- `name`: this maps one-to-one to the name of the parameter in your `predict()` function (here, `docs`).
- `shape`: the shape of the tensor in `data`. In our case, it is a one-dimensional tensor with `[2]` elements, one per document.
- `datatype`: the OIP tensor data type of the elements in `data`, e.g., `BOOL`, `INT64`, `FP32`, or `BYTES` (used for strings and raw bytes).
- `parameters`: allows us to specify a `content_type` on top of the data type, which tells MLServer how to decode 
the input (here, `str` turns the `BYTES` elements into Python strings). Higher-level types such as NumPy arrays or 
pandas DataFrames are also expressed through content types.
- `data`: the inputs to our predict function. These will be passed on automatically to the matching parameter 
when we use the `@decode_args` decorator on top of our `.predict()` function.
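Putting the fields together, a minimal sketch of a hypothetical helper that builds this payload for any list of strings might look like the following. It assumes, as in our service, a single input whose content type is `str`.

```python
def make_str_request(name, texts):
    """Build an OIP inference request with a single BYTES input holding strings."""
    texts = list(texts)
    return {
        "inputs": [
            {
                "name": name,                            # must match the predict() argument name
                "shape": [len(texts)],                   # 1-D tensor, one element per string
                "datatype": "BYTES",                     # OIP tensor data type for strings/bytes
                "parameters": {"content_type": "str"},   # ask MLServer to decode into str
                "data": texts,
            }
        ]
    }
```

For example, `requests.post(url, json=make_str_request("docs", [barbie, oppenheimer]))` sends the same payload as `inference_request` above.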

To learn more about the OIP and how MLServer content types work, please have a look at the 
[docs page here](https://mlserver.readthedocs.io/en/latest/user-guide/content-type.html).

## 05 Creating Model Replicas

Say you need to meet the demand of a high number of users and one model might not be enough, or a single model 
is not using all of the resources of the instance it was allocated on. In this case, we can create multiple 
replicas of our model to increase the throughput of incoming requests. This can be particularly useful 
at the peak times of our server. To do this, we need to tweak the settings of our server via the `settings.json` 
file. In it, we'll set the parameter `"parallel_workers"` to the number of independent model replicas we want, e.g. `"parallel_workers": 3`.

Let's stop our server, change the settings of it, start it again, and test it.

```python
%%writefile ../models_hub/quick-start/similarity_model/settings.json

{
    "parallel_workers": 3
}
```

![multiplemodels](../images/multiple_models.png)

As you can see in the output of the terminal, we now have 3 models running in parallel. You might see the model 
reported as loaded 4 times rather than 3 because, by default, MLServer logs the initialized model once and then 
once more for each model replica specified in the settings.

Let's get a few more [twin film examples](https://en.wikipedia.org/wiki/Twin_films) to test our server. Get 
as creative as you'd like. 💡

```python
deep_impact    = wiki_wiki.page('Deep_Impact_(film)').summary
armageddon     = wiki_wiki.page('Armageddon_(1998_film)').summary

antz           = wiki_wiki.page('Antz').summary
a_bugs_life    = wiki_wiki.page("A_Bug's_Life").summary

the_dark_night = wiki_wiki.page('The_Dark_Knight').summary
mamma_mia      = wiki_wiki.page('Mamma_Mia!_(film)').summary
```

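```python
def get_sim_score(movie1, movie2):
    # Send both summaries to the service and return the similarity score
    response = requests.post(
        'http://0.0.0.0:8080/v2/models/doc-sim-model/infer',
        json={
            "inputs": [
                {
                    "name": "docs",
                    "shape": [2],
                    "datatype": "BYTES",
                    "parameters": {
                        "content_type": "str"
                    },
                    "data": [movie1, movie2]
                }
            ]
        })
    # Fail loudly on HTTP errors instead of on a missing "outputs" key
    response.raise_for_status()
    return response.json()['outputs'][0]['data'][0]
```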

Let's first test that the function works as intended.

```python
get_sim_score(deep_impact, armageddon)
```

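Now let's score three pairs with `map`. Keep in mind that `map` sends the POST requests one after another 
rather than at the same time, so on its own it won't keep all three replicas busy.

```python
results = list(
    map(get_sim_score, (deep_impact, antz, the_dark_night), (armageddon, a_bugs_life, mamma_mia))
)
results
```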

```python
for movie1, movie2 in zip((deep_impact, antz, the_dark_night), (armageddon, a_bugs_life, mamma_mia)):
    print(get_sim_score(movie1, movie2))
```
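If you want the three replicas to actually work at the same time, the requests have to be sent concurrently. A minimal sketch, assuming the server from the previous step is running, uses a thread pool from the standard library; `score_pairs_concurrently` is a hypothetical helper name, and `max_workers=3` simply mirrors the `parallel_workers` value we set.

```python
from concurrent.futures import ThreadPoolExecutor


def score_pairs_concurrently(pairs, score_fn, max_workers=3):
    """Call score_fn(doc_a, doc_b) for every pair using a pool of threads.

    Results come back in the same order as `pairs`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map submits all pairs up front and yields results in input order
        return list(pool.map(lambda pair: score_fn(*pair), pairs))
```

For example, `score_pairs_concurrently(zip((deep_impact, antz, the_dark_night), (armageddon, a_bugs_life, mamma_mia)), get_sim_score)` returns the three scores in the same order as the pairs. Threads are a reasonable fit here because each call spends most of its time waiting on the network.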

## 06 Packaging our Service

![serving3](../images/serving_2.png)

For the last step of this quick start guide, we are going to package our model and service into a 
Docker image that we can reuse in another project or share with colleagues immediately. This step 
requires that we have Docker installed and configured on our machines, so if you need to set that up 
you can do so by following the documentation [here](https://docs.docker.com/get-docker/).

The first step is to create a `requirements.txt` file with all of our dependencies and add it to 
the directory we've been using for our service (`similarity_model`).

```python
%%writefile ../models_hub/quick-start/similarity_model/requirements.txt

mlserver
spacy==3.6.0
https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.6.0/en_core_web_lg-3.6.0-py3-none-any.whl
```

The next step is to build a docker image with our model, its dependencies and our server. If you've never heard 
of docker images before, here's a short description.

> A Docker image is a lightweight, standalone, and executable package that includes everything needed to run a piece of software, including code, libraries, dependencies, and settings. It's like a carry-on bag for your application, containing everything it needs to travel safely and run smoothly in different environments. Just as a carry-on bag allows you to bring your essentials with you on a trip, a Docker image enables you to transport your application and its requirements across various computing environments, ensuring consistent and reliable deployment.

```python
!mlserver build ../models_hub/quick-start/similarity_model/ -t 'fancy_ml_service'
```

We can check that our image was successfully built not only by looking at the logs of the previous 
command but also with the `docker images` command.

```python
!docker images
```

Let's test that our image works as intended with the following command. Make sure you have closed your 
previous server by using `CTRL + C` in your terminal.

```bash
docker run -it --rm -p 8080:8080 fancy_ml_service
```
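Since the container takes a few seconds to load the spaCy model, it can help to wait until the server reports that it is ready before sending requests. The sketch below polls the Open Inference Protocol readiness endpoint `v2/health/ready` using only the standard library; the helper name `wait_until_ready`, the `localhost:8080` default, and the timeout values are assumptions you may need to adjust.

```python
import time
import urllib.request


def wait_until_ready(url="http://localhost:8080/v2/health/ready", timeout_s=60.0, interval_s=1.0):
    """Poll an OIP readiness endpoint until it answers HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts and HTTP errors (e.g. 503) all land here
            pass
        time.sleep(interval_s)
    return False
```

Once `wait_until_ready()` returns `True`, calling `get_sim_score(deep_impact, armageddon)` should give the same score as before, now served from inside the container.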

Now that you have a packaged and fully functioning microservice with our model, you could deploy the container 
via the diverse set of offerings available through different cloud providers (e.g. AWS Lambda, Google Cloud Run, 
etc.), on your company's Kubernetes cluster (if they have one up and running), or anywhere else you 
can run a Docker image, such as a virtual machine.

To learn more about MLServer and the different ways in which you can use it, head over to the 
[examples](https://mlserver.readthedocs.io/en/latest/examples/index.html) section 
or the [user guide](https://mlserver.readthedocs.io/en/latest/user-guide/index.html). To learn about 
some of the deployment options available, head over to the docs [here](https://mlserver.readthedocs.io/en/stable/user-guide/deployment/index.html).

To keep up to date with what we are up to at Seldon, make sure you join our 
[Slack community](https://join.slack.com/t/seldondev/shared_invite/zt-vejg6ttd-ksZiQs3O_HOtPQsen_labg).