## Preparing Models

Save and version trained ML models with BentoML's model store

### Save a Trained Model

To serve a trained ML model with BentoML, first save the model instance with the BentoML API. In most cases, this is a single line added to your model training pipeline: a `save_model` call.

```python
import bentoml

from sklearn import svm
from sklearn import datasets

# Load the training data set
iris = datasets.load_iris()
x, y = iris.data, iris.target

# Train the model
clf = svm.SVC(gamma='scale')
clf.fit(x, y)

# Save the trained model to the local BentoML model store
saved_model = bentoml.sklearn.save_model("iris_clf", clf)
print(f"Model saved: {saved_model}")
```

```
Model saved: Model(tag="iris_clf:deocjxszyw3cllg6")
```


We recommend **always saving the model with BentoML as soon as it finishes training and validation**. If you put the `save_model` call at the end of your training pipeline, all your finalized models are managed in one place and are ready for inference.

You can optionally attach custom `labels`, `metadata`, or `custom_objects`, which are saved alongside your model in the model store. For example:

```python
import bentoml

# `trained_model`, `acc`, `cv_stats` and `tokenizer_object` are placeholders
# for objects produced by your own training pipeline.
bentoml.pytorch.save_model(
    "demo_mnist",     # model name in the local model store
    trained_model,    # model instance being saved
    labels={          # user-defined labels for managing models in Yatai
        "owner": "nlp_team",
        "stage": "dev",
    },
    metadata={        # user-defined additional metadata
        "acc": acc,
        "cv_stats": cv_stats,
        "dataset_version": "20220101",
    },
    custom_objects={  # save additional user-defined Python objects
        "tokenizer": tokenizer_object,
    },
)
```

- `labels`: user-defined labels for managing models, e.g. `team=nlp`, `stage=dev`.

- `metadata`: user-defined metadata for storing model training context or model evaluation metrics, e.g. dataset version, training parameters, model scores.

- `custom_objects`: user-defined additional Python objects, e.g. a tokenizer instance, a preprocessor function, or a model configuration JSON. Custom objects are serialized with cloudpickle.

### Retrieve a Saved Model

To load the model instance back into memory, use the framework-specific `load_model` method.

```python
import bentoml
from sklearn.base import BaseEstimator

model: BaseEstimator = bentoml.sklearn.load_model("iris_clf:latest")
```

To retrieve the model's metadata or custom objects, use `bentoml.models.get`:

```python
import bentoml

bento_model: bentoml.Model = bentoml.models.get("iris_clf:latest")

print(bento_model.tag)
print(bento_model.path)
print(bento_model.custom_objects)
print(bento_model.info.metadata)
print(bento_model.info.labels)

my_runner: bentoml.Runner = bento_model.to_runner()
```

```
iris_clf:deocjxszyw3cllg6
/Users/yjkim/bentoml/models/iris_clf/deocjxszyw3cllg6
{}
{}
{}
```


`bentoml.models.get` returns a `bentoml.Model` instance, which is a reference to a saved model entry in the BentoML model store. The `bentoml.Model` instance gives access to the model info and to the `to_runner` API, which creates a Runner instance from the model.

### Managing Models

Saved models are stored in BentoML’s model store, which is a local file directory maintained by BentoML. Users can view and manage all saved models via the `bentoml models` CLI command:

List

```bash
> bentoml models list

Tag                        Module           Size        Creation Time        Path
iris_clf:2uo5fkgxj27exuqj  bentoml.sklearn  5.81 KiB    2022-05-19 08:36:52  ~/bentoml/models/iris_clf/2uo5fkgxj27exuqj
iris_clf:nb5vrfgwfgtjruqj  bentoml.sklearn  5.80 KiB    2022-05-17 21:36:27  ~/bentoml/models/iris_clf/nb5vrfgwfgtjruqj
```

Get
```bash
> bentoml models get iris_clf:latest

name: iris_clf
version: 2uo5fkgxj27exuqj
module: bentoml.sklearn
labels: {}
options: {}
metadata: {}
context:
    framework_name: sklearn
    framework_versions:
      scikit-learn: 1.1.0
    bentoml_version: 1.0.0
    python_version: 3.8.12
signatures:
    predict:
      batchable: false
api_version: v1
creation_time: '2022-05-19T08:36:52.456990+00:00'
```

Delete
```bash
> bentoml models delete iris_clf:latest -y

INFO [cli] Model(tag="iris_clf:2uo5fkgxj27exuqj") deleted
```
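Because the model store is a local directory, it can help to see how its layout maps to the tags printed by `bentoml models list`. The sketch below is not part of BentoML. It assumes the `<home>/models/<name>/<version>/` layout visible in the paths above, with `~/bentoml` as the default home. It also approximates `:latest` by picking the most recently modified version directory, which may differ from how BentoML itself resolves `latest`. To manage models in practice, use the CLI or the Python API.

```python
from pathlib import Path
from typing import List, Optional


def _models_dir(home: Optional[Path]) -> Path:
    # Assumption: models live under <home>/models, with ~/bentoml as the default home
    return (home or Path.home() / "bentoml") / "models"


def list_model_tags(home: Optional[Path] = None) -> List[str]:
    """Return "name:version" tags found under <home>/models (assumed layout)."""
    models_dir = _models_dir(home)
    if not models_dir.is_dir():
        return []
    tags = []
    for name_dir in sorted(p for p in models_dir.iterdir() if p.is_dir()):
        # Only directories count as versions; any other files are ignored
        for version_dir in sorted(p for p in name_dir.iterdir() if p.is_dir()):
            tags.append(f"{name_dir.name}:{version_dir.name}")
    return tags


def guess_latest_version(name: str, home: Optional[Path] = None) -> Optional[str]:
    """Approximate ':latest' as the most recently modified version directory."""
    name_dir = _models_dir(home) / name
    if not name_dir.is_dir():
        return None
    versions = [p for p in name_dir.iterdir() if p.is_dir()]
    if not versions:
        return None
    return max(versions, key=lambda p: p.stat().st_mtime).name
```

Calling `list_model_tags()` with no arguments inspects the default store location. Pass a `home` path if your store lives elsewhere.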

### Model Import and Export
Models saved with BentoML can be exported to a standalone archive file outside of the model store, for sharing models between teams or moving models between different build stages. For example:

```bash
> bentoml models export iris_clf:latest .

Model(tag="iris_clf:2uo5fkgxj27exuqj") exported to ./iris_clf-2uo5fkgxj27exuqj.bentomodel
```

```bash
> bentoml models import ./iris_clf-2uo5fkgxj27exuqj.bentomodel

Model(tag="iris_clf:2uo5fkgxj27exuqj") imported
```

### Push and Pull with Yatai

**Yatai** provides a centralized model repository with flexible APIs and a Web UI for managing all models (and **Bentos**) created by your team. It can be configured to store model files on cloud blob storage such as AWS S3, MinIO, or GCS.

Once your team has Yatai set up, you can use the `bentoml models push` and `bentoml models pull` commands to upload models to Yatai and download them from it:

```bash
> bentoml models push iris_clf:latest

Successfully pushed model "iris_clf:2uo5fkgxj27exuqj"
```
```bash
> bentoml models pull iris_clf:latest

Successfully pulled model "iris_clf:2uo5fkgxj27exuqj"
```

### Model Management API
Besides the CLI commands, BentoML also provides equivalent **Python APIs** for managing models:

```python
import bentoml

# Get
bento_model: bentoml.Model = bentoml.models.get('iris_clf:latest')

print(bento_model.path)
print(bento_model.info.metadata)
print(bento_model.info.labels)
```

```
/Users/yjkim/bentoml/models/iris_clf/deocjxszyw3cllg6
{}
{}
```


```python
# List
models = bentoml.models.list()
print(models)
```

```
[Model(tag="iris_clf:deocjxszyw3cllg6", path="/Users/yjkim/bentoml/models/iris_clf/deocjxszyw3cllg6"), Model(tag="iris_clf:qfy5hzsrc6g5c6cp", path="/Users/yjkim/bentoml/models/iris_clf/qfy5hzsrc6g5c6cp")]
```


```python
# Import/Export
path = '/Users/yjkim/bentoml/practice/'

# Export
bentoml.models.export_model('iris_clf:latest', path + 'iris.bentomodel')

# Import
bentoml.models.import_model(path + 'iris.bentomodel')
```

```
'/Users/yjkim/bentoml/practice/iris.bentomodel'
```

```python
# Push/Pull
bentoml.models.push("iris_clf:latest")
bentoml.models.pull("iris_clf:latest")
```

```python
# Delete
bentoml.models.delete('iris_clf:latest')
```

### Using Model Runner

Inside a `bentoml.Service`, model inference runs through a Runner. The Runner abstraction gives BentoServer more flexibility in how it schedules inference computation and dynamically batches inference calls. It also lets BentoServer make better use of all available hardware resources.

```python
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
```

You can then use the runner instance to create a `bentoml.Service`:

```python
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result
```

To test out the runner interface before writing the Service API callback function, you can create a local runner instance outside of a Service:

```python
# Create a Runner instance
iris_clf_runner = bentoml.sklearn.get('iris_clf:latest').to_runner()

# Initialize the runner in the current process; this is meant for development and testing only
iris_clf_runner.init_local()

# This should yield the same result as the loaded model
iris_clf_runner.predict.run([[5.9, 3., 5.1, 1.8]])
```

```
'Runner.init_local' is for debugging and testing only.

array([2])
```

### Model Signatures

A model signature represents a method on a model object that can be called. This information is used when creating BentoML runners for this model.

In the example above, the `iris_clf_runner.predict.run` call passes its input through to the model's `predict` method, which runs in a remote runner process.

In many other ML frameworks, the model object's inference method may not be called `predict`. You can customize this by specifying the model signature in `save_model`:

```python
import bentoml

# `trained_model` is a placeholder for a model object that defines a `classify` method
bentoml.pytorch.save_model(
    "demo_mnist",   # model name in the local model store
    trained_model,  # model instance being saved
    signatures={    # model signatures for runner inference
        "classify": {
            "batchable": False,
        }
    },
)

runner = bentoml.pytorch.get("demo_mnist:latest").to_runner()
runner.init_local()
runner.classify.run(MODEL_INPUT)  # MODEL_INPUT: placeholder for your model's input
```

Python's magic method `__call__` is a special case. Following the Python language convention, a call to `runner.run` is applied to the model's `__call__` method:

```python
bentoml.pytorch.save_model(
    "demo_mnist",
    trained_model,
    signatures={
        "__call__": {
            "batchable": False,
        },
    },
)

runner = bentoml.pytorch.get("demo_mnist:latest").to_runner()
runner.init_local()
runner.run(MODEL_INPUT)  # dispatched to the model's `__call__` method
```
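The following toy runner runs in-process and shows how signatures map to calls. It is a hypothetical illustration, not BentoML's `Runner` class. It only demonstrates two things: a signature name such as `classify` becomes `runner.classify.run(...)`, and the `__call__` signature is exposed as `runner.run(...)`. A real runner also handles remote execution, batching and resource scheduling, which this sketch leaves out. `SketchRunner` and `ToyModel` are made-up names.

```python
from typing import Any, Callable, Dict


class _SketchRunnerMethod:
    """Wraps one model method so it is invoked via `.run(...)`."""

    def __init__(self, fn: Callable[..., Any]) -> None:
        self._fn = fn

    def run(self, *args: Any, **kwargs: Any) -> Any:
        # A real runner may forward this call to a separate worker process;
        # this sketch simply calls the model method in the current process.
        return self._fn(*args, **kwargs)


class SketchRunner:
    """Hypothetical stand-in that maps signature names to model methods."""

    def __init__(self, model: Any, signatures: Dict[str, dict]) -> None:
        self._methods: Dict[str, _SketchRunnerMethod] = {}
        for name in signatures:
            method = _SketchRunnerMethod(getattr(model, name))
            self._methods[name] = method
            if name != "__call__":
                # e.g. signature "classify" -> runner.classify.run(...)
                setattr(self, name, method)

    def run(self, *args: Any, **kwargs: Any) -> Any:
        # runner.run(...) maps to the model's `__call__` signature
        if "__call__" not in self._methods:
            raise AttributeError("no '__call__' signature was saved for this model")
        return self._methods["__call__"].run(*args, **kwargs)


class ToyModel:
    """Hypothetical model whose inference method is not called `predict`."""

    def classify(self, xs):
        return [1 if x >= 0.5 else 0 for x in xs]

    def __call__(self, xs):
        return self.classify(xs)


runner = SketchRunner(ToyModel(), signatures={
    "classify": {"batchable": False},
    "__call__": {"batchable": False},
})
print(runner.classify.run([0.2, 0.9]))  # [0, 1]
print(runner.run([0.7]))                # [1]
```

Only methods listed in `signatures` become reachable through the runner. This mirrors why you declare signatures at save time.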

### Batching

If a model's inference call supports batch input, we recommend enabling batching for that model signature. `runner.run` calls made from multiple Service workers can then be dynamically merged into a larger batch and run as one inference call in the runner worker. Here's an example:

```python
bentoml.pytorch.save_model(
    "demo_mnist",
    trained_model,
    signatures={
        "__call__": {
            "batchable": True,
            "batch_dim": 0,
        }
    },
)

runner = bentoml.pytorch.get("demo_mnist:latest").to_runner()
runner.init_local()
runner.run(MODEL_INPUT)
```
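Dynamic batching works by merging inputs from several concurrent `run` calls along the batch dimension, running the model once, and splitting the outputs back to each caller. The function below is a simplified, synchronous sketch of that merge-and-split step for `batch_dim=0` using plain lists. It is not BentoML's implementation, which also decides when to flush a batch and how large it may grow. `run_merged_batch` and `toy_model` are hypothetical names.

```python
from typing import Callable, List, Sequence


def run_merged_batch(
    model_fn: Callable[[list], list],
    requests: Sequence[list],
) -> List[list]:
    """Merge per-caller batches along dim 0, call the model once, split results."""
    sizes = [len(batch) for batch in requests]
    merged = [row for batch in requests for row in batch]  # concatenate along dim 0
    outputs = model_fn(merged)  # one inference call for all callers
    if len(outputs) != len(merged):
        raise ValueError("a batchable model must return one output per input row")
    results, start = [], 0
    for size in sizes:
        results.append(outputs[start:start + size])  # slice this caller's rows back out
        start += size
    return results


calls = []


def toy_model(batch: list) -> list:
    """Hypothetical model: records each call's batch size, returns row sums."""
    calls.append(len(batch))
    return [sum(row) for row in batch]


# Two "callers": one sends 1 row, the other sends 2 rows.
print(run_merged_batch(toy_model, [[[1, 2]], [[3, 4], [5, 6]]]))  # [[3], [7, 11]]
print(calls)  # [3]: the model ran once on 3 rows
```

The splitting step only works if the model returns exactly one output per input row along the batch dimension. That is the property you assert when you mark a signature as `batchable`.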

The `batch_dim` parameter specifies the dimension(s) along which multiple inputs are stacked when they are passed to this `run` method. When left unspecified, `batch_dim` defaults to 0.

For example, suppose you want to run prediction on two inputs, `[1, 2]` and `[3, 4]`. If the array you pass to the predict method is `[[1, 2], [3, 4]]`, the batch dimension is 0. If the array you pass is `[[1, 3], [2, 4]]`, the batch dimension is 1.
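A small helper that stacks equal-length inputs along dimension 0 or 1 reproduces the two layouts from this example. It uses plain Python lists so it runs without extra dependencies, and `stack_inputs` is a hypothetical name. With NumPy arrays, `numpy.stack(inputs, axis=batch_dim)` produces the same layouts.

```python
from typing import List, Sequence


def stack_inputs(inputs: Sequence[Sequence[int]], batch_dim: int = 0) -> List[List[int]]:
    """Stack equal-length 1-D inputs into a 2-D list along `batch_dim` (0 or 1)."""
    if len({len(x) for x in inputs}) > 1:
        raise ValueError("all inputs must have the same length")
    if batch_dim == 0:
        return [list(x) for x in inputs]            # one row per input
    if batch_dim == 1:
        return [list(col) for col in zip(*inputs)]  # one column per input
    raise ValueError("this sketch only supports batch_dim 0 or 1")


print(stack_inputs([[1, 2], [3, 4]], batch_dim=0))  # [[1, 2], [3, 4]]
print(stack_inputs([[1, 2], [3, 4]], batch_dim=1))  # [[1, 3], [2, 4]]
```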

For online serving workloads, adaptive batching is a critical contributor to overall performance. If throughput and latency are important to you, learn more about other Runner options and batching configurations in the Using Runners and Adaptive Batching docs.