# Using Runners

In BentoML, a Runner represents a unit of computation that can be executed on a remote Python worker and scaled independently.

A Runner allows a `bentoml.Service` to parallelize multiple instances of a `bentoml.Runnable` class, each on its own Python worker. When a BentoServer is launched, a group of runner worker processes is created, and `run` method calls made from the `bentoml.Service` code are scheduled among those runner workers.

Runners also support Adaptive Batching. For a `bentoml.Runnable` configured with batching, multiple `run` method invocations made from other processes can be dynamically grouped into one batch execution in real time. This is especially beneficial for compute-intensive workloads such as model inference, where batching improves performance through vectorization or multi-threading.
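
To make the idea of grouping calls concrete, here is a minimal, framework-independent sketch of batching along dimension 0. It is not BentoML's implementation: `run_as_one_batch` and `double_all` are hypothetical names used only for illustration, and plain Python lists stand in for tensors.

```python
from typing import Callable, List, Sequence


def run_as_one_batch(
    batch_fn: Callable[[List[float]], List[float]],
    requests: Sequence[List[float]],
) -> List[List[float]]:
    """Merge several requests into one batch call, then split the results."""
    # Concatenate every request's rows along the first dimension (batch_dim=0).
    merged = [row for request in requests for row in request]

    # A single call handles all rows at once; this is where a real model
    # would benefit from vectorization.
    merged_out = batch_fn(merged)

    # Slice the merged output back into one result per original request.
    results, start = [], 0
    for request in requests:
        end = start + len(request)
        results.append(merged_out[start:end])
        start = end
    return results


def double_all(rows: List[float]) -> List[float]:
    # Hypothetical stand-in for a batched model inference function.
    return [2 * value for value in rows]


outputs = run_as_one_batch(double_all, [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])
print(outputs)
```

Three separate requests are served by a single call to `double_all`, and each caller still receives only its own slice of the output.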



## Pre-built Model Runners

BentoML provides pre-built Runners for each supported ML framework. These Runners are configured to work well with their framework: they use the GPU when one is available, set the number of threads and workers automatically, and convert the model signatures to the corresponding Runnable methods.

```python
import bentoml

# `train()` stands in for your own training code and returns a PyTorch model.
trained_model = train()

bentoml.pytorch.save_model(
    "demo_mnist",    # model name in the local model store
    trained_model,   # model instance being saved
    signatures={     # model signatures for runner inference
        "predict": {
            "batchable": True,
            "batch_dim": 0,
        }
    }
)

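runner = bentoml.pytorch.get("demo_mnist:latest").to_runner()
runner.init_local()  # run the runner in the current process, for local testing
runner.predict.run(MODEL_INPUT)  # MODEL_INPUT is a placeholder for an input batch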
```

## Custom Runner

### Creating a Runnable

A Runner can be created from a `bentoml.Runnable` class. By implementing a `Runnable` class, users can create Runner instances that run custom logic.

```python
import time
import typing as t
from typing import TYPE_CHECKING
from statistics import mean

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

import bentoml
from bentoml.io import JSON
from bentoml.io import Text

if TYPE_CHECKING:
    from bentoml._internal.runner.runner import RunnerMethod
    
    class RunnerImpl(bentoml.Runner):
        is_positive: RunnerMethod

inference_duration = bentoml.metrics.Histogram(
    name="inference_duration",
    documentation='Duration of inference',
    labelnames=['nltk_version', 'sentiment_cls'],
    buckets=(0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75,
             1.0, 2.5, 5.0, 7.5, 10.0, float("inf")),
)

polarity_counter = bentoml.metrics.Counter(
    name='polarity_total',
    documentation='Count total number of analysis by polarity scores',
    labelnames=['polarity']
)

class NLTKSentimentAnalysisRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cpu",)
    SUPPORTS_CPU_MULTI_THREADING = False

    def __init__(self):
        self.sia = SentimentIntensityAnalyzer()

    @bentoml.Runnable.method(batchable=False)
    def is_positive(self, input_text: str) -> bool:
        start = time.perf_counter()
        scores = [self.sia.polarity_scores(sentence)["compound"] for sentence 
                  in nltk.sent_tokenize(input_text)]
        inference_duration.labels(
            nltk_version=nltk.__version__, 
            sentiment_cls=self.sia.__class__.__name__
        ).observe(time.perf_counter() - start)
        
        return mean(scores) > 0

nltk_runner = t.cast(
    "RunnerImpl", bentoml.Runner(NLTKSentimentAnalysisRunnable, name="nltk_sentiment")
)

svc = bentoml.Service("sentiment_analyzer", runners=[nltk_runner])

@svc.api(input=Text(), output=JSON())
async def analysis(input_text: str) -> dict[str, bool]:
    is_positive = await nltk_runner.is_positive.async_run(input_text)
    polarity_counter.labels(polarity=is_positive).inc()
    return {"is_positive": is_positive}
```

The `bentoml.Runnable.method` decorator creates a `RunnableMethod`: the decorated method is exposed as part of the runner interface and can be called remotely. A `RunnableMethod` can be configured with a signature, which is defined in the same way as a Model Signature.
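
As a rough illustration of how a decorator can mark methods for exposure and attach a signature to them, here is a simplified sketch. It does not reproduce BentoML's internals: `runnable_method`, `exposed_methods`, `EchoRunnable`, and the `_signature` attribute are hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict


def runnable_method(batchable: bool = False, batch_dim: int = 0) -> Callable:
    """Hypothetical decorator that tags a method with a signature dict."""

    def decorator(func: Callable) -> Callable:
        # Store the signature on the function object so it can be found later.
        func._signature = {"batchable": batchable, "batch_dim": batch_dim}
        return func

    return decorator


def exposed_methods(cls: type) -> Dict[str, Dict[str, Any]]:
    """Collect the name and signature of every tagged method on a class."""
    return {
        name: attr._signature
        for name, attr in vars(cls).items()
        if callable(attr) and hasattr(attr, "_signature")
    }


class EchoRunnable:
    @runnable_method(batchable=True, batch_dim=0)
    def predict(self, rows):
        return rows

    def helper(self):
        # Not decorated, so it is not part of the exposed interface.
        return None


print(exposed_methods(EchoRunnable))
```

Only `predict` appears in the result, together with its `batchable` and `batch_dim` settings; the undecorated `helper` stays private to the class.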

### Reusable Runnable

A Runnable class can also take `__init__` parameters to customize its behavior for different scenarios. The same Runnable class can be used to create multiple runners, which can all be used in the same service.

```python
import bentoml
import torch

class MyModelRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu",)
    SUPPORTS_CPU_MULTI_THREADING = True
    
    def __init__(self, model_file):
        self.model = torch.load(model_file)
        
    @bentoml.Runnable.method(batchable=True, batch_dim=0)
    def predict(self, input_tensor):
        return self.model(input_tensor)
    
my_runner_1 = bentoml.Runner(
    MyModelRunnable,
    name='my_runner_1',
    runnable_init_params={
        "model_file": "./saved_model_1.pt",
    }
)

my_runner_2 = bentoml.Runner(
    MyModelRunnable,
    name='my_runner_2',
    runnable_init_params={
        "model_file": "./saved_model_2.pt",
    }
)

svc = bentoml.Service(__name__, runners=[my_runner_1, my_runner_2])
```
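
One way to picture why init parameters are passed as a dictionary rather than as an already constructed object: the runner records only the class and its arguments, and the instance is built later, where the work actually runs. The sketch below assumes that deferred-construction model. `LazyRunner` and `GreeterRunnable` are hypothetical and not part of BentoML.

```python
from typing import Any, Dict, Optional, Type


class GreeterRunnable:
    """Hypothetical runnable whose behavior depends on an init parameter."""

    def __init__(self, greeting: str):
        self.greeting = greeting

    def greet(self, name: str) -> str:
        return f"{self.greeting}, {name}!"


class LazyRunner:
    """Records a class and its init params; builds the instance on demand."""

    def __init__(
        self,
        runnable_class: Type,
        name: str,
        runnable_init_params: Optional[Dict[str, Any]] = None,
    ):
        self.runnable_class = runnable_class
        self.name = name
        self.runnable_init_params = runnable_init_params or {}
        self._instance = None  # nothing is constructed yet

    def init_local(self) -> None:
        # Instantiate the runnable with the stored keyword arguments.
        self._instance = self.runnable_class(**self.runnable_init_params)

    def call(self, method: str, *args: Any) -> Any:
        if self._instance is None:
            raise RuntimeError(f"runner {self.name!r} is not initialized")
        return getattr(self._instance, method)(*args)


# The same class backs two independently configured runners.
runner_en = LazyRunner(GreeterRunnable, "greeter_en", {"greeting": "Hello"})
runner_fr = LazyRunner(GreeterRunnable, "greeter_fr", {"greeting": "Bonjour"})
for runner in (runner_en, runner_fr):
    runner.init_local()

print(runner_en.call("greet", "Ada"))
print(runner_fr.call("greet", "Ada"))
```

Because each runner keeps its own parameters, the two instances never share state even though they come from one class.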