(air-serving-guide)=

# Serve Ray AIR Predictor with `ModelWrapper`

[Ray Serve](rayserve) is the recommended tool to deploy models trained with AIR.

The core concept is `ModelWrapper`. `ModelWrapper` takes a [predictor](ray.ml.predictor.Predictor) class and a [checkpoint](ray.ml.checkpoint) and turns them into a live HTTP endpoint.

We'll start with a quick start demo that shows where `ModelWrapper` fits in Ray AIR.

Let's first make sure Ray AIR is installed. For the quick start, we'll also use Ray AIR to train and serve a very simple XGBoost model.

```bash
pip install "ray[air]" xgboost scikit-learn
```

The preprocessor and trainer used below are introduced in the [key concepts walk-through](air-key-concepts).

```python
import ray
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from ray.ml.train.integrations.xgboost import XGBoostTrainer
from ray.ml.preprocessors import StandardScaler

data_raw = load_breast_cancer()
dataset_df = pd.DataFrame(data_raw["data"], columns=data_raw["feature_names"])
dataset_df["target"] = data_raw["target"]
train_df, test_df = train_test_split(dataset_df, test_size=0.3)
train_dataset = ray.data.from_pandas(train_df)
valid_dataset = ray.data.from_pandas(test_df)
test_dataset = ray.data.from_pandas(test_df.drop("target", axis=1))

# Define preprocessor
columns_to_scale = ["mean radius", "mean texture"]
preprocessor = StandardScaler(columns=columns_to_scale)

# Define trainer
trainer = XGBoostTrainer(
    scaling_config={"num_workers": 1},
    label_column="target",
    params={
        "tree_method": "approx",
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "max_depth": 2,
    },
    datasets={"train": train_dataset, "valid": valid_dataset},
    preprocessor=preprocessor,
    num_boost_round=5,
)

result = trainer.fit()
```



| Trial name | status | loc | iter | total time (s) | train-logloss | train-error | valid-logloss |
|---|---|---|---|---|---|---|---|
| XGBoostTrainer_b5f1b_00000 | TERMINATED | 127.0.0.1:54313 | 5 | 7.12531 | 0.191591 | 0.035176 | 0.220995 |


[2m[33m(raylet)[0m 2022-06-02 18:44:31,942	INFO context.py:70 -- Exec'ing worker with command: exec /Users/simonmo/miniconda3/bin/python /Users/simonmo/Desktop/ray/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=53715 --object-store-name=/tmp/ray/session_2022-06-02_18-44-24_587688_53629/sockets/plasma_store --raylet-name=/tmp/ray/session_2022-06-02_18-44-24_587688_53629/sockets/raylet --redis-address=None --storage=None --temp-dir=/tmp/ray --metrics-agent-port=64635 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:59662 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=2087164853
[2m[33m(raylet)[0m 2022-06-02 18:44:35,216	INFO context.py:70 -- Exec'ing worker with command: exec /Users/simonmo/miniconda3/bin/python /Users/simonmo/Desktop/ray/ray/python/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=53715 --object-store-name=/tmp/ray/session_2022-06-

```text
Result for XGBoostTrainer_b5f1b_00000:
  date: 2022-06-02_18-44-41
  done: false
  experiment_id: 10970b9dd33c42daae667a98674efb6c
  hostname: Simons-MacBook-Pro.local
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 54313
  should_checkpoint: true
  time_since_restore: 7.0694568157196045
  time_this_iter_s: 7.0694568157196045
  time_total_s: 7.0694568157196045
  timestamp: 1654220681
  timesteps_since_restore: 0
  train-error: 0.050251
  train-logloss: 0.483589
  training_iteration: 1
  trial_id: b5f1b_00000
  valid-error: 0.070175
  valid-logloss: 0.497381
  warmup_time: 0.004261970520019531

(GBDTTrainable pid=54313) 2022-06-02 18:44:41,273	INFO main.py:1506 -- [RayXGBoost] Finished XGBoost training on training data with total N=398 in 4.28 seconds (1.10 pure XGBoost training time).

Result for XGBoostTrainer_b5f1b_00000:
  date: 2022-06-02_18-44-41
  done: true
  experiment_id: 10970b9dd33c42daae667a98674efb6c
  experiment_tag: '0'
  hostname: Simons-MacBook-Pro.local
  iterations_since_restore: 5
  node_ip: 127.0.0.1
  pid: 54313
  should_checkpoint: true
  time_since_restore: 7.125314712524414
  time_this_iter_s: 0.011556863784790039
  time_total_s: 7.125314712524414
  timestamp: 1654220681
  timesteps_since_restore: 0
  train-error: 0.035176
  train-logloss: 0.191591
  training_iteration: 5
  trial_id: b5f1b_00000
  valid-error: 0.05848
  valid-logloss: 0.220995
  warmup_time: 0.004261970520019531

2022-06-02 18:44:42,282	INFO tune.py:753 -- Total run time: 11.81 seconds (11.41 seconds for the tuning loop).
```


The following cell serves the trained model from its checkpoint using the built-in `XGBoostPredictor`.

```python
from ray.ml.predictors.integrations.xgboost import XGBoostPredictor
from ray import serve
from ray.serve.model_wrappers import ModelWrapperDeployment
from ray.serve.http_adapters import pandas_read_json


# Start Serve; detached=True keeps it running until serve.shutdown() is called.
serve.start(detached=True)
deployment = ModelWrapperDeployment.options(name="XGBoostService")

# pandas_read_json parses the JSON request body into a DataFrame.
deployment.deploy(
    XGBoostPredictor, result.checkpoint, http_adapter=pandas_read_json
)
```

[2m[36m(ServeController pid=54360)[0m INFO 2022-06-02 18:45:18,255 controller 54360 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
[2m[36m(ServeController pid=54360)[0m INFO 2022-06-02 18:45:18,257 controller 54360 http_state.py:115 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
[2m[36m(HTTPProxyActor pid=54369)[0m INFO:     Started server process [54369]
[2m[36m(ServeController pid=54360)[0m INFO 2022-06-02 18:45:20,610 controller 54360 deployment_state.py:1217 - Adding 1 replicas to deployment 'XGBoostService'.


Let's send a request through HTTP.

```python
import requests

# Take one row of test features and send it as a one-record JSON list.
sample_input = test_dataset.take(1)
sample_input = dict(sample_input[0])

output = requests.post(deployment.url, json=[sample_input]).json()
print(output)
```

```text
[{'predictions': 0.8807213306427002}]
```


[2m[36m(HTTPProxyActor pid=54369)[0m INFO 2022-06-02 18:45:25,289 http_proxy 127.0.0.1 http_proxy.py:320 - POST /XGBoostService 307 3.9ms
[2m[36m(XGBoostService pid=54373)[0m INFO 2022-06-02 18:45:25,288 XGBoostService XGBoostService#yYNdzY replica.py:483 - HANDLE __call__ OK 0.3ms
[2m[36m(HTTPProxyActor pid=54369)[0m INFO 2022-06-02 18:45:25,332 http_proxy 127.0.0.1 http_proxy.py:320 - POST /XGBoostService 200 40.0ms
[2m[36m(XGBoostService pid=54373)[0m INFO 2022-06-02 18:45:25,330 XGBoostService XGBoostService#yYNdzY replica.py:483 - HANDLE __call__ OK 36.9ms



It works! As you can see, `ModelWrapper` is one of the core components in Ray AIR: it deploys a trained checkpoint as a live endpoint. You can find more end-to-end examples for your specific framework on the [examples](air-examples-ref) page.
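The quick start uses the `requests` library. If you would rather stick to the Python standard library, a request of the same shape can be built with `urllib.request`. The sketch below only builds the request object; `build_json_post` is a hypothetical helper, and the row it sends is a placeholder (a real request to `XGBoostService` needs every feature column of the dataset). It posts to the URL with a trailing slash, like the later examples in this guide.

```python
import json
import urllib.request


def build_json_post(url: str, payload) -> urllib.request.Request:
    """Build (without sending) a POST request whose body is `payload` as JSON."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical placeholder row; real requests need all feature columns.
req = build_json_post(
    "http://localhost:8000/XGBoostService/",
    [{"mean radius": 1.0, "mean texture": 2.0}],
)

# With Serve running, send it and decode the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```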

This tutorial is geared towards an in-depth understanding of `ModelWrapper`. In particular, we'll demonstrate:
- How to serve a predictor accepting array input.
- How to serve a predictor accepting dataframe input.
- How to serve a predictor accepting custom input that can be transformed to array or dataframe.
- How to configure micro-batching to enhance performance.

But before that, let's review the key concepts:
- [`Checkpoint`](ray.ml.checkpoint) represents a trained model stored in memory, in a file, or at a remote URI.
- [`Predictor`](ray.ml.predictor.Predictor)s know how to perform model inference given a checkpoint and the model definition. Ray AIR comes with predictors for each supported framework.
- [`Deployment`](serve-key-concepts-deployment) is a Ray Serve construct that represents an HTTP endpoint along with a scalable pool of model replicas.
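To make the relationship between checkpoints and predictors concrete, here is a plain-Python sketch of the two methods this guide relies on: a predictor is built from a checkpoint with `from_checkpoint` and then called with `predict`. `ToyCheckpoint`, `ToyPredictor`, and `ToyAdder` are hypothetical stand-ins, not Ray AIR classes; the real `Checkpoint` also supports files and remote URIs.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ToyCheckpoint:
    """Hypothetical in-memory stand-in for a checkpoint built from a dict."""

    def __init__(self, data: Dict[str, Any]):
        self._data = dict(data)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyCheckpoint":
        return cls(data)

    def to_dict(self) -> Dict[str, Any]:
        return dict(self._data)


class ToyPredictor(ABC):
    """Hypothetical interface mirroring the two predictor methods used here."""

    @classmethod
    @abstractmethod
    def from_checkpoint(cls, ckpt: ToyCheckpoint) -> "ToyPredictor":
        ...

    @abstractmethod
    def predict(self, data: Any) -> Any:
        ...


class ToyAdder(ToyPredictor):
    """Adds a constant, read from the checkpoint, to every input value."""

    def __init__(self, increment: int):
        self.increment = increment

    @classmethod
    def from_checkpoint(cls, ckpt: ToyCheckpoint) -> "ToyAdder":
        return cls(ckpt.to_dict()["increment"])

    def predict(self, data):
        return [x + self.increment for x in data]


predictor = ToyAdder.from_checkpoint(ToyCheckpoint.from_dict({"increment": 2}))
print(predictor.predict([40]))  # [42]
```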


## 1. Predictor accepting NumPy array
We'll use a simple predictor implementation that adds a scalar to the input array.

```python
import numpy as np

from ray.ml.predictor import Predictor
from ray.ml.checkpoint import Checkpoint

class AdderPredictor(Predictor):
    """Dummy predictor that increments input by a static value."""
    def __init__(self, increment: int):
        self.increment = increment

    @classmethod
    def from_checkpoint(cls, ckpt: Checkpoint):
        """Create predictor from checkpoint.

        Args:
          ckpt: The AIR checkpoint representing a single dictionary. The dictionary
              should have key `increment` and an integer value.
        """
        return cls(ckpt.to_dict()["increment"])

    def predict(self, inp: np.ndarray) -> np.ndarray:
        return inp + self.increment
```

Let's first test it locally.

```python
local_checkpoint = Checkpoint.from_dict({"increment": 2})
local_predictor = AdderPredictor.from_checkpoint(local_checkpoint)
# Compare arrays element-wise; raises AssertionError on any mismatch.
np.testing.assert_array_equal(local_predictor.predict(np.array([40])), np.array([42]))
```

It worked! Now let's serve it behind HTTP. In Ray Serve, the core unit of an HTTP service is called a [`Deployment`](serve-key-concepts-deployment). It turns a Python class into a queryable HTTP endpoint. For Ray AIR, Serve provides `ModelWrapperDeployment` to make this simpler: you don't need to implement any Python classes, you just pass in your predictor and checkpoint.
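Conceptually, the wiring that `ModelWrapperDeployment` saves you from writing looks like the plain-Python sketch below: in this sketch the predictor is built once from the checkpoint, and each request passes through an HTTP adapter before `predict` is called. `Multiplier`, `json_to_list`, and `make_handler` are hypothetical names, the checkpoint is a plain dict, and a real deployment also handles HTTP serving, replicas, and batching.

```python
import json
from typing import Any, Callable, Dict


class Multiplier:
    """Hypothetical predictor: multiplies every input value by a factor."""

    def __init__(self, factor: int):
        self.factor = factor

    @classmethod
    def from_checkpoint(cls, ckpt: Dict[str, Any]) -> "Multiplier":
        return cls(ckpt["factor"])

    def predict(self, data):
        return [x * self.factor for x in data]


def json_to_list(body: bytes):
    """Hypothetical adapter: read the "array" field from a JSON request body."""
    return json.loads(body)["array"]


def make_handler(predictor_cls, checkpoint: Dict[str, Any],
                 http_adapter: Callable[[bytes], Any]) -> Callable[[bytes], str]:
    # Build the predictor once, when the "deployment" starts ...
    predictor = predictor_cls.from_checkpoint(checkpoint)

    def handle(body: bytes) -> str:
        # ... then adapt, predict, and serialize on every request.
        return json.dumps(predictor.predict(http_adapter(body)))

    return handle


handler = make_handler(Multiplier, {"factor": 2}, json_to_list)
print(handler(b'{"array": [21]}'))  # [42]
```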

`ModelWrapperDeployment` takes several arguments, two of which are required:
- `predictor_cls (Type[Predictor] | str)`: The predictor Python class. Typically you can use a built-in integration from Ray AIR such as `TorchPredictor`. Alternatively, you can pass the import path of the predictor class as a string, such as `"ray.ml.predictors.integrations.torch.TorchPredictor"`.
- `checkpoint (Checkpoint | str)`: A checkpoint instance, or a URI to load the checkpoint from.
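Since `predictor_cls` accepts either a class or an import path string, the string has to be resolved to a class at some point. This guide doesn't show how Serve does it; a common approach is `importlib`, sketched below with the hypothetical helper `resolve_class` (stdlib classes stand in for predictor classes).

```python
import importlib
from typing import Union


def resolve_class(cls_or_path: Union[type, str]) -> type:
    """Return a class, importing it first if given a dotted path string."""
    if isinstance(cls_or_path, type):
        return cls_or_path  # already a class, nothing to import
    module_name, _, attr = cls_or_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'package.module.ClassName', got {cls_or_path!r}")
    return getattr(importlib.import_module(module_name), attr)


print(resolve_class("collections.OrderedDict"))  # <class 'collections.OrderedDict'>
```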

The following cell shows how to create a deployment with our `AdderPredictor`.

For more about the Ray Serve framework, check out [its documentation](rayserve).

```python
from ray import serve
from ray.serve.model_wrappers import ModelWrapperDeployment

# Create Ray Serve instance
serve.start()

# Deploy the model behind HTTP endpoint
ModelWrapperDeployment.options(name="Adder").deploy(
    predictor_cls=AdderPredictor,
    checkpoint=local_checkpoint
)
```



After the model has been deployed, let's send an HTTP request.

```python
import requests

resp = requests.post("http://localhost:8000/Adder/", json={"array": [40]})
resp.raise_for_status()
resp.json()
```

```text
[42.0]
```

[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:05,461 http_proxy 127.0.0.1 http_proxy.py:320 - POST /Adder 200 17.0ms
[2m[36m(Adder pid=66741)[0m INFO 2022-05-20 11:31:05,459 Adder Adder#vorDbO replica.py:483 - HANDLE __call__ OK 12.9ms


Nice! We sent `[40]` as our input and got `[42]` as our output in JSON format.

You can also send a multi-dimensional array in the JSON payload, as well as `dtype` and `shape` fields that control how the payload is converted into an array. For more information about the array input schema, see [Ndarray](serve-ndarray-schema).
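To make the `shape` and `dtype` fields concrete, here is a stdlib-only sketch of how such a payload could be turned into nested lists. It assumes the `array` field is flat whenever `shape` is given, and it only understands `float*` and `int*` dtype names; Serve's own adapter builds a NumPy array instead, and `reshape` and `parse_ndarray_payload` are hypothetical names.

```python
import json
from typing import Any, List, Optional


def reshape(flat: List[Any], shape: List[int]) -> Any:
    """Turn a flat list into nested lists with the given row-major shape."""
    total = 1
    for dim in shape:
        total *= dim
    if total != len(flat):
        raise ValueError(f"cannot reshape {len(flat)} values into {shape}")
    if len(shape) == 1:
        return list(flat)
    step = total // shape[0]  # number of values in each top-level slice
    return [reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def parse_ndarray_payload(body: str):
    payload = json.loads(body)
    values = payload["array"]  # assumed flat in this sketch
    dtype: Optional[str] = payload.get("dtype")
    if dtype is not None and dtype.startswith("float"):
        values = [float(v) for v in values]
    elif dtype is not None and dtype.startswith("int"):
        values = [int(v) for v in values]
    shape = payload.get("shape")
    return reshape(values, shape) if shape else values


print(parse_ndarray_payload('{"array": [1, 2, 3, 4, 5, 6], "shape": [2, 3], "dtype": "float32"}'))
# [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```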
 
That's it for arrays! Let's take a look at tabular input.

## 2. Predictor accepting Pandas DataFrame
Now let's look at a predictor that accepts DataFrame input. We'll perform a simple column-wise transformation on the input data.

```python
import pandas as pd


class DataFramePredictor(Predictor):
    """Dummy predictor that multiplies `base` by `multiplier`, then adds an increment."""
    def __init__(self, increment: int):
        self.increment = increment

    @classmethod
    def from_checkpoint(cls, ckpt: Checkpoint):
        return cls(ckpt.to_dict()["increment"])

    def predict(self, inp: pd.DataFrame) -> pd.DataFrame:
        inp["prediction"] = inp["base"] * inp["multiplier"] + self.increment
        return inp

local_df_predictor = DataFramePredictor.from_checkpoint(local_checkpoint)
```

Just like with `AdderPredictor`, we'll use the same `ModelWrapperDeployment` approach to make it queryable over HTTP.

You might notice a small addition this time: we pass `http_adapter=pandas_read_json` as a keyword argument. This tells Serve how to parse an incoming JSON request into a DataFrame. The `pandas_read_json` adapter accepts:
- [Pandas-parsable JSON](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html) in the HTTP body.
- Optional keyword arguments to the [`pandas.read_json`](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html) function, passed as HTTP URL query parameters.
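The sketch below mimics, in plain Python, the two inputs that `pandas_read_json` combines: the JSON body, and keyword arguments taken from the URL query string. It is not the Serve implementation; it only handles `orient=records` and returns a column dictionary rather than a DataFrame. `records_to_columns` and `read_json_with_query_kwargs` are hypothetical names.

```python
import json
from urllib.parse import parse_qs, urlsplit


def records_to_columns(records):
    """Convert [{"a": 1}, {"a": 2}] into {"a": [1, 2]} (column-oriented).

    Assumes every record has the same keys, as rows of a table would.
    """
    columns = {}
    for row in records:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def read_json_with_query_kwargs(url: str, body: str):
    """Parse `body`, using query-string parameters as keyword arguments."""
    # e.g. "?orient=records" becomes {"orient": "records"}
    kwargs = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    data = json.loads(body)
    if kwargs.get("orient") == "records":
        return records_to_columns(data)
    raise ValueError(f"this sketch only supports orient=records, got {kwargs}")


columns = read_json_with_query_kwargs(
    "http://localhost:8000/DataFramePredictor/?orient=records",
    '[{"base": 1, "multiplier": 2}, {"base": 3, "multiplier": 4}]',
)
print(columns)  # {'base': [1, 3], 'multiplier': [2, 4]}
```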

To learn more, see [HTTP Adapters](serve-http-adapters).

```{note}
You might wonder why the previous array predictor doesn't need to specify an HTTP adapter. This is because Serve defaults to a built-in adapter called [`json_to_ndarray`](ray.serve.http_adapters.json_to_ndarray)!
```

```python
from ray.serve.http_adapters import pandas_read_json

ModelWrapperDeployment.options(name="DataFramePredictor").deploy(
    predictor_cls=DataFramePredictor,
    checkpoint=local_checkpoint,
    http_adapter=pandas_read_json
)
```

[2m[36m(ServeController pid=66733)[0m INFO 2022-05-20 11:31:09,317 controller 66733 deployment_state.py:1217 - Adding 1 replicas to deployment 'DataFramePredictor'.


Let's send a request to our endpoint. 

```python
resp = requests.post(
    "http://localhost:8000/DataFramePredictor/",
    json=[{"base": 1, "multiplier": 2}, {"base": 3, "multiplier": 4}],
    params={"orient": "records"},
)
resp.raise_for_status()
resp.text
```

```text
'[{"base":1,"multiplier":2,"prediction":4},{"base":3,"multiplier":4,"prediction":14}]'
```

[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:15,388 http_proxy 127.0.0.1 http_proxy.py:320 - POST /DataFramePredictor 200 28.6ms
[2m[36m(DataFramePredictor pid=66765)[0m INFO 2022-05-20 11:31:15,387 DataFramePredictor DataFramePredictor#lcGDjS replica.py:483 - HANDLE __call__ OK 24.4ms


Great! You can see that the input JSON has been converted to a dataframe, so our predictor can work with pure dataframes instead of raw HTTP requests.

But what if you want to control how the HTTP request is parsed? You can do that as well.

## 3. Accepting custom inputs via `http_adapter`

The `http_adapter` field accepts any type-annotated callable. You can also use additional types accepted by FastAPI's dependency injection framework; learn more in [HTTP Adapters](serve-http-adapters). In the following example, instead of using the pandas adapter that Serve provides, we'll implement our own adapter that works with plain HTTP query parameters instead of JSON.
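Type annotations are what let the framework turn raw query-string values (which always arrive as strings) into typed arguments. FastAPI's dependency injection does much more (validation, defaults, request bodies); the sketch below only illustrates the idea with `inspect.signature`, using the hypothetical helpers `call_with_query_params` and `rows_from_params`.

```python
import inspect
from typing import Any, Callable, Dict


def call_with_query_params(adapter: Callable[..., Any], query: Dict[str, str]) -> Any:
    """Call `adapter`, converting each string query value with its annotation."""
    kwargs = {}
    for name, param in inspect.signature(adapter).parameters.items():
        raw = query[name]  # this sketch treats every parameter as required
        annotation = param.annotation
        kwargs[name] = raw if annotation is inspect.Parameter.empty else annotation(raw)
    return adapter(**kwargs)


def rows_from_params(base: int, multiplier: int):
    """Hypothetical adapter with the same signature as the one in this section."""
    return [{"base": base, "multiplier": multiplier}]


print(call_with_query_params(rows_from_params, {"base": "10", "multiplier": "4"}))
# [{'base': 10, 'multiplier': 4}]
```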

```python
def our_own_http_adapter(base: int, multiplier: int):
    # The annotated parameters are read from the HTTP query string.
    return pd.DataFrame([{"base": base, "multiplier": multiplier}])
```

Let's deploy it. Because we reuse the name `DataFramePredictor`, Serve replaces the previous version of the deployment.

```python
ModelWrapperDeployment.options(name="DataFramePredictor").deploy(
    predictor_cls=DataFramePredictor,
    checkpoint=local_checkpoint,
    http_adapter=our_own_http_adapter,
)
```

Let's now send a request. Note that the new adapter reads its input from HTTP query parameters.

The equivalent curl request would be `curl -X POST "http://localhost:8000/DataFramePredictor/?base=10&multiplier=4"`.

```python
resp = requests.post(
    "http://localhost:8000/DataFramePredictor/",
    params={"base": 10, "multiplier": 4},
)
resp.raise_for_status()
resp.text
```

```text
'[{"base":10,"multiplier":4,"prediction":42}]'
```
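When a client builds this query string programmatically rather than by hand, `urllib.parse.urlencode` takes care of joining and escaping the parameters. The snippet below builds the same URL as the curl command above; in a shell, keep that URL in quotes, because an unquoted `&` ends the command.

```python
from urllib.parse import urlencode

base_url = "http://localhost:8000/DataFramePredictor/"
# urlencode keeps the dict's insertion order and escapes special characters.
url = f"{base_url}?{urlencode({'base': 10, 'multiplier': 4})}"
print(url)  # http://localhost:8000/DataFramePredictor/?base=10&multiplier=4
```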

[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:25,950 http_proxy 127.0.0.1 http_proxy.py:320 - POST /DataFramePredictor 200 24.8ms
[2m[36m(DataFramePredictor pid=66771)[0m INFO 2022-05-20 11:31:25,949 DataFramePredictor DataFramePredictor#abtzzn replica.py:483 - HANDLE __call__ OK 20.8ms


## 4. `ModelWrapper` performs microbatching to improve performance

Many machine learning models accept a batch of inputs for prediction. Common ML frameworks use vectorized instructions, so inference on a batch of requests can be almost as fast as on a single request.

In Serve's `ModelWrapperDeployment`, incoming requests are automatically batched.

When multiple clients send requests at the same time, Serve combines them into a single batch (array or DataFrame) and runs `predict` only once for all of them. Let's look at a predictor that returns each row's content, the batch size, and the batch group.
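The core idea (run `predict` once on the combined batch, then hand each caller back its own result) can be sketched without Serve. `predict_batch` and `fake_model` are hypothetical; the sketch assumes every request contributes exactly one row and that `predict` returns one output per input row, matching the `BatchSizePredictor` in this section.

```python
from typing import Callable, List, Sequence


def predict_batch(predict: Callable[[List], Sequence], inputs: List) -> List:
    """Run `predict` once over all inputs and return one result per input."""
    batch = list(inputs)            # combine: one row per request
    outputs = predict(batch)        # a single (vectorized) model call
    if len(outputs) != len(batch):
        raise ValueError("predict must return one output per input row")
    return list(outputs)            # scatter: result i belongs to request i


calls = []


def fake_model(batch):
    calls.append(len(batch))        # record how many rows each call saw
    return [x * 2 for x in batch]


print(predict_batch(fake_model, [1, 2, 3]))  # [2, 4, 6]
print(calls)                                 # [3]: one call for three inputs
```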

```python
import time


class BatchSizePredictor(Predictor):
    @classmethod
    def from_checkpoint(cls, _: Checkpoint):
        return cls()

    def predict(self, inp: np.ndarray):
        time.sleep(0.5)  # simulate model inference
        # One output per input row: (row, batch size, whole batch).
        return [(i, len(inp), inp) for i in inp]
```

```python
ModelWrapperDeployment.options(name="BatchSizePredictor").deploy(
    predictor_cls=BatchSizePredictor,
    checkpoint=local_checkpoint,
)
```

[2m[36m(ServeController pid=66733)[0m INFO 2022-05-20 11:31:29,789 controller 66733 deployment_state.py:1217 - Adding 1 replicas to deployment 'BatchSizePredictor'.


Let's use a thread pool executor to send ten requests at the same time, simulating multiple clients.

```python
from concurrent.futures import ThreadPoolExecutor, wait

with ThreadPoolExecutor() as pool:
    futs = [
        pool.submit(
            requests.post,
            "http://localhost:8000/BatchSizePredictor/",
            json={"array": [i]},
        )
        for i in range(10)
    ]
    wait(futs)
for fut in futs:
    i, batch_size, batch_group = fut.result().json()
    print(f"Request id: {i} is part of batch group: {batch_group}, with batch size {batch_size}")
```

[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:35,233 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 525.8ms
[2m[36m(BatchSizePredictor pid=66778)[0m INFO 2022-05-20 11:31:35,229 BatchSizePredictor BatchSizePredictor#pvZtwj replica.py:483 - HANDLE __call__ OK 520.0ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:35,742 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 1036.9ms
[2m[36m(BatchSizePredictor pid=66778)[0m INFO 2022-05-20 11:31:35,738 BatchSizePredictor BatchSizePredictor#pvZtwj replica.py:483 - HANDLE __call__ OK 1015.5ms
[2m[36m(BatchSizePredictor pid=66778)[0m INFO 2022-05-20 11:31:36,244 BatchSizePredictor BatchSizePredictor#pvZtwj replica.py:483 - HANDLE __call__ OK 1013.9ms
[2m[36m(BatchSizePredictor pid=66778)[0m INFO 2022-05-20 11:31:36,244 BatchSizePredictor BatchSizePredictor#pvZtwj replica.py:483 - HANDLE __call__ OK 1013.4ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-2

```text
Request id: [0.0] is part of batch group: [[0.0]], with batch size 1
Request id: [1.0] is part of batch group: [[6.0], [1.0], [9.0], [5.0]], with batch size 4
Request id: [2.0] is part of batch group: [[4.0], [2.0]], with batch size 2
Request id: [3.0] is part of batch group: [[3.0]], with batch size 1
Request id: [4.0] is part of batch group: [[4.0], [2.0]], with batch size 2
Request id: [5.0] is part of batch group: [[6.0], [1.0], [9.0], [5.0]], with batch size 4
Request id: [6.0] is part of batch group: [[6.0], [1.0], [9.0], [5.0]], with batch size 4
Request id: [7.0] is part of batch group: [[8.0], [7.0]], with batch size 2
Request id: [8.0] is part of batch group: [[8.0], [7.0]], with batch size 2
Request id: [9.0] is part of batch group: [[6.0], [1.0], [9.0], [5.0]], with batch size 4
```


[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:37,256 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 2541.8ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:37,257 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 2549.0ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:37,257 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 2541.7ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:37,258 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 2542.5ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:37,258 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 2541.8ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:37,259 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 2543.1ms
[2m[36m(BatchSizePredictor pid=66778)[0m INFO 2022-05-20 11:31:37,255 BatchSizePredictor BatchSizePredictor#p

As you can see, some of the requests are part of a bigger group that's run together.

You can also configure the exact details of batching parameters:
- `max_batch_size (int)`: the maximum batch size that will be executed in one call to `predict`.
- `batch_wait_timeout_s (float)`: the maximum duration to wait for `max_batch_size` elements before running the predict call.
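The interplay of the two parameters can be illustrated with a small asyncio sketch: each caller awaits a future, and a background task flushes queued items into one `predict` call once `max_batch_size` items are waiting or `batch_wait_timeout_s` has elapsed since the batch's first item. `MicroBatcher` is a hypothetical class written for illustration, not Serve's implementation, and it assumes `predict` returns one output per input item.

```python
import asyncio
from typing import Any, Callable, List


class MicroBatcher:
    """Hypothetical micro-batcher grouping concurrent calls into one predict call.

    Create it inside a running event loop (e.g. from an `async def`).
    """

    def __init__(self, predict: Callable[[List[Any]], List[Any]],
                 max_batch_size: int, batch_wait_timeout_s: float):
        self.predict = predict
        self.max_batch_size = max_batch_size
        self.batch_wait_timeout_s = batch_wait_timeout_s
        self.queue = asyncio.Queue()  # holds (item, future) pairs
        self._worker = None

    async def submit(self, item: Any) -> Any:
        """Enqueue one item and wait for its share of the batch result."""
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until a first item arrives
            deadline = loop.time() + self.batch_wait_timeout_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            items = [item for item, _ in batch]
            try:
                outputs = list(self.predict(items))  # one call for the whole batch
                if len(outputs) != len(items):
                    raise ValueError("predict must return one output per item")
            except Exception as exc:  # propagate the failure to every caller
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), out in zip(batch, outputs):
                if not fut.done():  # skip callers that were cancelled
                    fut.set_result(out)


async def main():
    batch_sizes = []

    def predict(batch):
        batch_sizes.append(len(batch))  # record the size of every call
        return [x * 2 for x in batch]

    batcher = MicroBatcher(predict, max_batch_size=4, batch_wait_timeout_s=0.05)
    results = await asyncio.gather(*(batcher.submit(i) for i in range(10)))
    return results, batch_sizes


results, batch_sizes = asyncio.run(main())
print(results)      # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(batch_sizes)  # typically [4, 4, 2] for this run
```

A larger `batch_wait_timeout_s` gives late-arriving calls more time to join a batch, at the cost of extra latency for the first caller in each batch.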

Let's set `max_batch_size` to 10, with a `batch_wait_timeout_s` of 5 seconds, so that all ten requests can land in the same batch.

```python
# Batch up to 10 requests, waiting at most 5 seconds to fill a batch.
ModelWrapperDeployment.options(name="BatchSizePredictor").deploy(
    predictor_cls=BatchSizePredictor,
    checkpoint=local_checkpoint,
    batching_params={"max_batch_size": 10, "batch_wait_timeout_s": 5}
)
```

[2m[36m(ServeController pid=66733)[0m INFO 2022-05-20 11:31:40,048 controller 66733 deployment_state.py:1176 - Stopping 1 replicas of deployment 'BatchSizePredictor' with outdated versions.
[2m[36m(ServeController pid=66733)[0m INFO 2022-05-20 11:31:42,214 controller 66733 deployment_state.py:1217 - Adding 1 replicas to deployment 'BatchSizePredictor'.


Let's call it again! You should see that all ten requests are now part of the same group.

```python
from concurrent.futures import ThreadPoolExecutor, wait

with ThreadPoolExecutor() as pool:
    futs = [
        pool.submit(
            requests.post,
            "http://localhost:8000/BatchSizePredictor/",
            json={"array": [i]},
        )
        for i in range(10)
    ]
    wait(futs)
for fut in futs:
    i, batch_size, batch_group = fut.result().json()
    print(f"Request id: {i} is part of batch group: {batch_group}, with batch size {batch_size}")
```

```text
Request id: [0.0] is part of batch group: [[0.0], [3.0], [6.0], [1.0], [2.0], [5.0], [7.0], [8.0], [4.0], [9.0]], with batch size 10
Request id: [1.0] is part of batch group: [[0.0], [3.0], [6.0], [1.0], [2.0], [5.0], [7.0], [8.0], [4.0], [9.0]], with batch size 10
Request id: [2.0] is part of batch group: [[0.0], [3.0], [6.0], [1.0], [2.0], [5.0], [7.0], [8.0], [4.0], [9.0]], with batch size 10
Request id: [3.0] is part of batch group: [[0.0], [3.0], [6.0], [1.0], [2.0], [5.0], [7.0], [8.0], [4.0], [9.0]], with batch size 10
Request id: [4.0] is part of batch group: [[0.0], [3.0], [6.0], [1.0], [2.0], [5.0], [7.0], [8.0], [4.0], [9.0]], with batch size 10
Request id: [5.0] is part of batch group: [[0.0], [3.0], [6.0], [1.0], [2.0], [5.0], [7.0], [8.0], [4.0], [9.0]], with batch size 10
Request id: [6.0] is part of batch group: [[0.0], [3.0], [6.0], [1.0], [2.0], [5.0], [7.0], [8.0], [4.0], [9.0]], with batch size 10
Request id: [7.0] is part of batch group: [[0.0], [3.0], [6.0], [1.0]
```

[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:47,562 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 540.4ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:47,563 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 533.7ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:47,564 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 527.2ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:47,564 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 531.9ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:47,564 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 529.9ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:47,564 http_proxy 127.0.0.1 http_proxy.py:320 - POST /BatchSizePredictor 200 526.7ms
[2m[36m(HTTPProxyActor pid=66737)[0m INFO 2022-05-20 11:31:47,565 http_proxy 127.0.0.1 http_proxy.py:320 - POST /Ba

The batching behavior is well defined:
- When batching arrays, they are all concatenated into a new array with a batch dimension added.
- When batching DataFrames, they are all concatenated row-wise.

You can also turn off this behavior by setting `batching_params=False`.
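In plain-Python terms, with nested lists standing in for NumPy arrays and lists of row dicts standing in for DataFrames, the two merge rules look roughly like the sketch below. The helpers are hypothetical, and splitting DataFrame results back by each request's row count is an assumption made for illustration. The stacked shape `(10, 1)` for ten single-element requests matches the `batch group` values printed in this section.

```python
from typing import Any, Dict, List


def shape(nested) -> tuple:
    """Shape of a rectangular nested list, e.g. [[1], [2]] -> (2, 1)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


def stack_arrays(per_request: List[List[float]]) -> List[List[float]]:
    """Arrays: add a leading batch dimension, so n inputs of shape (k,) become (n, k)."""
    if len({len(arr) for arr in per_request}) > 1:
        raise ValueError("all arrays in a batch must have the same shape")
    return [list(arr) for arr in per_request]


def concat_rows(per_request: List[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """DataFrames (as lists of row dicts): append rows, no new dimension."""
    return [row for rows in per_request for row in rows]


def split_rows(rows: List[Dict[str, Any]], counts: List[int]) -> List[List[Dict[str, Any]]]:
    """Undo concat_rows: give each request back as many rows as it sent."""
    out, start = [], 0
    for n in counts:
        out.append(rows[start:start + n])
        start += n
    return out


print(shape(stack_arrays([[float(i)] for i in range(10)])))  # (10, 1)
```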