(air-serving-guide)=

# Deploying Predictors with Serve

[Ray Serve](rayserve) is the recommended tool to deploy models trained with AIR.

After training a model with Ray Train, you can serve a model using Ray Serve. In this guide, we will cover how to use Ray AIR's `PredictorDeployment`, `Predictor`, and `Checkpoint` abstractions to quickly deploy a model for online inference.

But before that, let's review the key concepts:
- [`Checkpoint`](ray.air.checkpoint) represents a trained model stored in memory, on disk, or at a remote URI.
- [`Predictor`](ray.train.predictor.Predictor)s know how to run model inference given a checkpoint and the model definition. Ray AIR comes with predictors for each supported framework.
- [`Deployment`](serve-key-concepts-deployment) is a Ray Serve construct that represents an HTTP endpoint backed by a scalable pool of model replicas.

The core concept for model deployment is the `PredictorDeployment`. The `PredictorDeployment` takes a [predictor](ray.train.predictor.Predictor) class and a [checkpoint](ray.air.checkpoint) and transforms them into a live HTTP endpoint. 
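Conceptually, a `PredictorDeployment` replica builds the predictor once from the checkpoint and then, for every request, runs an HTTP adapter followed by `predict`. The pure-Python sketch below mimics only that flow, without Ray or HTTP. `ToyCheckpoint`, `ToyPredictor`, and `ToyPredictorDeployment` are hypothetical stand-ins, not Ray APIs; the real deployment also handles HTTP parsing, batching, and replica scaling.

```python
from typing import Any, Callable, Dict, List


class ToyCheckpoint:
    """Hypothetical stand-in for an AIR checkpoint backed by a dict."""

    def __init__(self, data: Dict[str, Any]):
        self._data = dict(data)

    def to_dict(self) -> Dict[str, Any]:
        return dict(self._data)


class ToyPredictor:
    """Hypothetical predictor: adds a stored increment to each input value."""

    def __init__(self, increment: int):
        self.increment = increment

    @classmethod
    def from_checkpoint(cls, ckpt: ToyCheckpoint) -> "ToyPredictor":
        return cls(ckpt.to_dict()["increment"])

    def predict(self, values: List[float]) -> List[float]:
        return [v + self.increment for v in values]


class ToyPredictorDeployment:
    """Hypothetical sketch: build the predictor once, then adapt and predict per request."""

    def __init__(self, predictor_cls, checkpoint: ToyCheckpoint,
                 http_adapter: Callable[[Dict[str, Any]], Any]):
        # Loading happens once per replica, not once per request.
        self.predictor = predictor_cls.from_checkpoint(checkpoint)
        self.http_adapter = http_adapter

    def __call__(self, request_json: Dict[str, Any]) -> Any:
        # The adapter turns the raw request body into model input.
        model_input = self.http_adapter(request_json)
        return self.predictor.predict(model_input)


deployment = ToyPredictorDeployment(
    ToyPredictor,
    ToyCheckpoint({"increment": 2}),
    http_adapter=lambda body: body["array"],
)
print(deployment({"array": [40]}))  # [42]
```

Keeping the adapter separate from the predictor is what lets the same predictor class be reused with different request formats.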

We'll start with a simple quick-start demo showing how you can use the `PredictorDeployment` to deploy your model for online inference.

Let's first make sure Ray AIR is installed. The quick-start also needs XGBoost and scikit-learn, since it uses Ray AIR to train and serve an XGBoost model.

In [None]:
!pip install "ray[air]" xgboost scikit-learn

The preprocessor and trainer used below are explained in the [key concepts walk-through](air-key-concepts).

In [1]:
import ray
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from ray.train.xgboost import XGBoostTrainer
from ray.air.config import ScalingConfig
from ray.data.preprocessors import StandardScaler

data_raw = load_breast_cancer()
dataset_df = pd.DataFrame(data_raw["data"], columns=data_raw["feature_names"])
dataset_df["target"] = data_raw["target"]
train_df, test_df = train_test_split(dataset_df, test_size=0.3)
train_dataset = ray.data.from_pandas(train_df)
valid_dataset = ray.data.from_pandas(test_df)
test_dataset = ray.data.from_pandas(test_df.drop("target", axis=1))

# Define preprocessor
columns_to_scale = ["mean radius", "mean texture"]
preprocessor = StandardScaler(columns=columns_to_scale)

# Define trainer
trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(num_workers=1),
    label_column="target",
    params={
        "tree_method": "approx",
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "max_depth": 2,
    },
    datasets={"train": train_dataset, "valid": valid_dataset},
    preprocessor=preprocessor,
    num_boost_round=5,
)

result = trainer.fit()

2022-08-02 17:34:14,200	INFO worker.py:1481 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265.


| Trial name | status | loc | iter | total time (s) | train-logloss | train-error | valid-logloss |
|---|---|---|---|---|---|---|---|
| XGBoostTrainer_00999_00000 | TERMINATED | 127.0.0.1:50560 | 6 | 4.92941 | 0.184756 | 0.0175879 | 0.214631 |


[2m[36m(XGBoostTrainer pid=50560)[0m 2022-08-02 17:34:17,955	INFO main.py:980 -- [RayXGBoost] Created 1 new actors (1 total actors). Waiting until actors are ready for training.
[2m[36m(_RemoteRayXGBoostActor pid=50569)[0m   File "/Users/cindyz/ray/python/ray/_private/workers/default_worker.py", line 237, in <module>
[2m[36m(_RemoteRayXGBoostActor pid=50569)[0m     ray._private.worker.global_worker.main_loop()
[2m[36m(_RemoteRayXGBoostActor pid=50569)[0m   File "/Users/cindyz/ray/python/ray/_private/worker.py", line 754, in main_loop
[2m[36m(_RemoteRayXGBoostActor pid=50569)[0m     self.core_worker.run_task_loop()
[2m[36m(_RemoteRayXGBoostActor pid=50569)[0m   File "/Users/cindyz/ray/python/ray/_private/function_manager.py", line 674, in actor_method_executor
[2m[36m(_RemoteRayXGBoostActor pid=50569)[0m     return method(__ray_actor, *args, **kwargs)
[2m[36m(_RemoteRayXGBoostActor pid=50569)[0m   File "/Users/cindyz/ray/python/ray/util/tracing/tracing_helper.py"

Result for XGBoostTrainer_00999_00000:
  date: 2022-08-02_17-34-21
  done: false
  experiment_id: 8cfa843c192c4f4899681672b1519889
  hostname: Cindys-MacBook-Pro-16
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 50560
  time_since_restore: 4.092304706573486
  time_this_iter_s: 4.092304706573486
  time_total_s: 4.092304706573486
  timestamp: 1659486861
  timesteps_since_restore: 0
  train-error: 0.0678391959798995
  train-logloss: 0.48957502931805713
  training_iteration: 1
  trial_id: 00999_00000
  valid-error: 0.05847953216374269
  valid-logloss: 0.48565153385463516
  warmup_time: 0.008213996887207031
  


[2m[36m(XGBoostTrainer pid=50560)[0m 2022-08-02 17:34:21,106	INFO main.py:1516 -- [RayXGBoost] Finished XGBoost training on training data with total N=398 in 3.17 seconds (1.66 pure XGBoost training time).


Result for XGBoostTrainer_00999_00000:
  date: 2022-08-02_17-34-21
  done: true
  experiment_id: 8cfa843c192c4f4899681672b1519889
  experiment_tag: '0'
  hostname: Cindys-MacBook-Pro-16
  iterations_since_restore: 6
  node_ip: 127.0.0.1
  pid: 50560
  time_since_restore: 4.9294068813323975
  time_this_iter_s: 0.77937912940979
  time_total_s: 4.9294068813323975
  timestamp: 1659486861
  timesteps_since_restore: 0
  train-error: 0.01758793969849246
  train-logloss: 0.1847562152124829
  training_iteration: 6
  trial_id: 00999_00000
  valid-error: 0.05847953216374269
  valid-logloss: 0.2146313324657797
  warmup_time: 0.008213996887207031
  


2022-08-02 17:34:21,996	INFO tune.py:758 -- Total run time: 6.87 seconds (6.45 seconds for the tuning loop).


The following block serves a Ray AIR model from a [checkpoint](ray.air.checkpoint), using the built-in [`XGBoostPredictor`](ray.train.xgboost.XGBoostPredictor).

In [2]:
from ray.train.xgboost import XGBoostPredictor
from ray import serve
from ray.serve import PredictorDeployment
from ray.serve.http_adapters import pandas_read_json

deployment = PredictorDeployment.options(name="XGBoostService")

app = deployment.bind(
    XGBoostPredictor, result.checkpoint, http_adapter=pandas_read_json
)
serve.run(app)

[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:34:25,154 controller 50578 http_state.py:123 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-652b28d09d7147099a4960028c241a6ab6d4dfa630a4d2358a620705' on node '652b28d09d7147099a4960028c241a6ab6d4dfa630a4d2358a620705' listening on '127.0.0.1:8000'
[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:34:25,670 controller 50578 deployment_state.py:1232 - Adding 1 replicas to deployment 'XGBoostService'.
[2m[36m(HTTPProxyActor pid=50580)[0m INFO:     Started server process [50580]


RayServeSyncHandle(deployment='XGBoostService')

Let's send a request through HTTP.

In [3]:
import requests

sample_input = test_dataset.take(1)
sample_input = dict(sample_input[0])

output = requests.post("http://localhost:8000/", json=[sample_input]).json()
print(output)

[{'predictions': 0.11467155814170837}]


[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:34:33,084 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 17.0ms
[2m[36m(ServeReplica:XGBoostService pid=50583)[0m INFO 2022-08-02 17:34:33,082 XGBoostService XGBoostService#QOfQEr replica.py:482 - HANDLE __call__ OK 10.8ms



It works! As you can see, you can use the `PredictorDeployment` to deploy checkpoints trained in Ray AIR as live endpoints. You can find more end-to-end examples for your specific framework on the [examples](air-examples-ref) page.

The rest of this tutorial takes a closer look at `PredictorDeployment`. In particular, it demonstrates:
- How to serve a predictor that accepts array input.
- How to serve a predictor that accepts DataFrame input.
- How to serve a predictor that accepts custom input, which an HTTP adapter transforms into an array or DataFrame.
- How to configure microbatching to improve performance.



## 1. Predictor accepting NumPy array
We'll use a simple predictor implementation that adds an increment to an input array.

In [4]:
import numpy as np

from ray.train.predictor import Predictor
from ray.air.checkpoint import Checkpoint

class AdderPredictor(Predictor):
    """Dummy predictor that increments input by a static value."""
    def __init__(self, increment: int):
        self.increment = increment
    
    @classmethod
    def from_checkpoint(cls, ckpt: Checkpoint):
        """Create predictor from checkpoint.
        
        Args:
          ckpt: The AIR checkpoint representing a single dictionary. The dictionary
              should have key `increment` and an integer value.
        """
        return cls(ckpt.to_dict()["increment"])
    
    def predict(self, inp: np.ndarray) -> np.ndarray:
        return inp + self.increment

Let's first test it locally.

In [5]:
local_checkpoint = Checkpoint.from_dict({"increment": 2})
local_predictor = AdderPredictor.from_checkpoint(local_checkpoint)
assert np.array_equal(local_predictor.predict(np.array([40])), np.array([42]))

It works! Now let's serve it over HTTP. In Ray Serve, the core unit of an HTTP service is a [`Deployment`](serve-key-concepts-deployment), which turns a Python class into a queryable HTTP endpoint. For Ray AIR, Serve provides `PredictorDeployment` to simplify this step: instead of writing a deployment class yourself, you pass in your predictor class and checkpoint.

`PredictorDeployment` accepts several arguments, two of which are required:
- `predictor_cls (Type[Predictor] | str)`: The predictor class. Typically this is one of Ray AIR's built-in integrations, such as `TorchPredictor`. Alternatively, you can pass the import path of a predictor class as a string, such as `"ray.train.torch.TorchPredictor"`.
- `checkpoint (Checkpoint | str)`: A checkpoint instance, or a URI to load the checkpoint from.
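When `predictor_cls` is given as a string, the class has to be imported from its dotted path. A common Python idiom for that is `importlib.import_module` plus `getattr`. The helper below is a hypothetical sketch of the technique, not Serve's own code, and it resolves standard-library names so it runs without Ray installed.

```python
import importlib
from typing import Any


def import_from_path(path: str) -> Any:
    """Resolve a dotted path such as "package.module.ClassName" to the object it names."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted path like 'module.Class', got {path!r}")
    # Import the module part, then look up the final attribute on it.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


cls = import_from_path("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```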

The following cell shows how to create a deployment with our `AdderPredictor`.

To learn more about Ray Serve, check out [its documentation](rayserve).

In [6]:
from ray import serve
from ray.serve import PredictorDeployment

# Deploy the model behind HTTP endpoint
app = PredictorDeployment.options(name="Adder").bind(
    predictor_cls=AdderPredictor,
    checkpoint=local_checkpoint
)
serve.run(app)

[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:34:39,865 controller 50578 deployment_state.py:1232 - Adding 1 replicas to deployment 'Adder'.
[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:34:40,892 controller 50578 deployment_state.py:1257 - Removing 1 replicas from deployment 'XGBoostService'.


RayServeSyncHandle(deployment='Adder')

After the model has been deployed, let's send an HTTP request.

In [7]:
import requests
resp = requests.post("http://localhost:8000/", json={"array": [40]})
resp.raise_for_status()
resp.json()

[42.0]

[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:34:47,445 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 10.4ms
[2m[36m(ServeReplica:Adder pid=50615)[0m INFO 2022-08-02 17:34:47,443 Adder Adder#YbCZvH replica.py:482 - HANDLE __call__ OK 4.5ms


Nice! We sent `[40]` as input and got `[42.0]` back in JSON format.

You can also send multi-dimensional arrays in the JSON payload, along with optional `dtype` and `shape` fields that control how the payload is converted into an array. For more information about the array input schema, see [Ndarray](serve-ndarray-schema).
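As a rough illustration of those fields, the sketch below converts an ndarray-style payload with NumPy. It assumes the payload keys are `array`, `dtype`, and `shape`, and that `shape` is applied as a reshape after the list is converted. `payload_to_ndarray` is a hypothetical helper, so check the linked schema for the exact semantics Serve uses.

```python
from typing import Any, Dict

import numpy as np


def payload_to_ndarray(payload: Dict[str, Any]) -> np.ndarray:
    """Hypothetical helper: build an array from an ndarray-style JSON payload."""
    # dtype=None lets NumPy infer the type when the field is absent.
    arr = np.array(payload["array"], dtype=payload.get("dtype"))
    if payload.get("shape") is not None:
        arr = arr.reshape(payload["shape"])
    return arr


payload = {"array": [1, 2, 3, 4], "dtype": "float32", "shape": [2, 2]}
arr = payload_to_ndarray(payload)
print(arr.shape, arr.dtype)  # (2, 2) float32
```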
 
That's it for arrays! Let's take a look at tabular input.

## 2. Predictor accepting Pandas DataFrame
Let's now take a look at a predictor accepting dataframe inputs. We'll perform some simple column-wise transformations on the input data.

In [8]:
import pandas as pd


class DataFramePredictor(Predictor):
    """Dummy predictor that multiplies `base` by `multiplier`, then adds an increment."""
    def __init__(self, increment: int):
        self.increment = increment

    @classmethod
    def from_checkpoint(cls, ckpt: Checkpoint):
        return cls(ckpt.to_dict()["increment"])

    def predict(self, inp: pd.DataFrame) -> pd.DataFrame:
        inp["prediction"] = inp["base"] * inp["multiplier"] + self.increment
        return inp

local_df_predictor = DataFramePredictor.from_checkpoint(local_checkpoint)

Just like the `AdderPredictor`, we'll use the same `PredictorDeployment` approach to make it queryable with HTTP. 

Note the `http_adapter=pandas_read_json` keyword argument in the deployment below. It tells Serve how to convert incoming JSON requests into a DataFrame. The `pandas_read_json` adapter accepts:
- [Pandas-parsable JSON](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html) in the HTTP body.
- Optional keyword arguments to the [`pandas.read_json`](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html) function, passed as URL query parameters.

To learn more, see [HTTP Adapters](serve-http-adapters).
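To preview what the adapter hands to the predictor, you can call `pandas.read_json` locally on the same body that this section's request sends, with `orient="records"` mirroring the URL parameter. This is a local approximation of the adapter's parsing, not a call into Serve. Wrapping the string in `io.StringIO` avoids the deprecation warning that newer pandas versions emit for literal JSON strings.

```python
import io
import json

import pandas as pd

body = json.dumps([{"base": 1, "multiplier": 2}, {"base": 3, "multiplier": 4}])

# orient="records" means the JSON is a list of {column: value} objects, one per row.
df = pd.read_json(io.StringIO(body), orient="records")
print(df)
```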

```{note}
You might wonder why the previous array predictor didn't need to specify an HTTP adapter. This is because Ray Serve defaults to a built-in adapter, [`json_to_ndarray`](ray.serve.http_adapters.json_to_ndarray).
```

In [9]:
from ray.serve.http_adapters import pandas_read_json

app = PredictorDeployment.options(name="DataFramePredictor").bind(
    predictor_cls=DataFramePredictor,
    checkpoint=local_checkpoint,
    http_adapter=pandas_read_json
)
serve.run(app)

[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:34:52,432 controller 50578 deployment_state.py:1232 - Adding 1 replicas to deployment 'DataFramePredictor'.
[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:34:53,460 controller 50578 deployment_state.py:1257 - Removing 1 replicas from deployment 'Adder'.


RayServeSyncHandle(deployment='DataFramePredictor')

Let's send a request to our endpoint. 

In [10]:
resp = requests.post(
    "http://localhost:8000/",
    json=[{"base": 1, "multiplier": 2}, {"base": 3, "multiplier": 4}],
    params={"orient": "records"},
)
resp.raise_for_status()
resp.text

'[{"base":1,"multiplier":2,"prediction":4},{"base":3,"multiplier":4,"prediction":14}]'

[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:34:58,470 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 10.7ms
[2m[36m(ServeReplica:DataFramePredictor pid=50682)[0m INFO 2022-08-02 17:34:58,467 DataFramePredictor DataFramePredictor#HZHIao replica.py:482 - HANDLE __call__ OK 4.9ms


Great! You can see that the input JSON has been converted to a dataframe, so our predictor can work with pure dataframes instead of raw HTTP requests.

What if you need more control over how the HTTP request is parsed? You can supply your own adapter.

## 3. Accepting custom inputs via `http_adapter`

The `http_adapter` argument accepts any type-annotated callable. It can also use additional types supported by FastAPI's dependency injection system. For more detail, see [HTTP Adapters](serve-http-adapters). In the following example, instead of using the pandas adapter that Serve provides, we'll implement our own adapter that reads its input from HTTP query parameters rather than a JSON body.
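Type annotations matter here because the web framework uses them to turn query-string text into typed arguments. The sketch below approximates that idea with the standard library only: it reads the adapter's signature with `inspect` and converts each query parameter with its annotated type. `call_with_query` is hypothetical and far simpler than FastAPI's dependency injection (no validation errors, defaults, or request bodies), and the adapter returns a list of dicts instead of a DataFrame so the sketch stays dependency-free.

```python
import inspect
from typing import Any, Callable, Dict, List
from urllib.parse import parse_qs


def adapter(base: int, multiplier: int) -> List[Dict[str, int]]:
    """Same logic as our_own_http_adapter, minus pandas."""
    return [{"base": base, "multiplier": multiplier}]


def call_with_query(func: Callable[..., Any], query: str) -> Any:
    """Hypothetical: coerce query-string values using func's annotations, then call it."""
    raw = parse_qs(query)  # e.g. {"base": ["10"], "multiplier": ["4"]}
    kwargs = {}
    for name, param in inspect.signature(func).parameters.items():
        value = raw[name][0]  # use the first value given for each parameter
        # Convert the string with the annotated type, e.g. int("10") -> 10.
        kwargs[name] = param.annotation(value)
    return func(**kwargs)


print(call_with_query(adapter, "base=10&multiplier=4"))  # [{'base': 10, 'multiplier': 4}]
```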

In [11]:
def our_own_http_adapter(base: int, multiplier: int):
    return pd.DataFrame([{"base": base, "multiplier": multiplier}])

Let's deploy it.

In [12]:
app = PredictorDeployment.options(name="DataFramePredictor").bind(
    predictor_cls=DataFramePredictor,
    checkpoint=local_checkpoint,
    http_adapter=our_own_http_adapter
)
serve.run(app)

[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:35:01,433 controller 50578 deployment_state.py:1189 - Stopping 1 replicas of deployment 'DataFramePredictor' with outdated versions.
[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:35:03,585 controller 50578 deployment_state.py:1232 - Adding 1 replicas to deployment 'DataFramePredictor'.


RayServeSyncHandle(deployment='DataFramePredictor')

Let's now send a request. Note that the deployment now reads its input from HTTP query parameters rather than a JSON body.

The equivalent curl request would be `curl -X POST "http://localhost:8000/?base=10&multiplier=4"`.

In [13]:
resp = requests.post(
    "http://localhost:8000/",
    params={"base": 10, "multiplier": 4}
)
resp.raise_for_status()
resp.text

'[{"base":10,"multiplier":4,"prediction":42}]'

[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:06,321 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 13.4ms
[2m[36m(ServeReplica:DataFramePredictor pid=50687)[0m INFO 2022-08-02 17:35:06,319 DataFramePredictor DataFramePredictor#qyLYRT replica.py:482 - HANDLE __call__ OK 7.0ms


## 4. `PredictorDeployment` performs microbatching to improve performance

Most machine learning models accept a batch of inputs for prediction. Common ML frameworks use vectorized instructions, so inference on a batch of requests is often nearly as fast as inference on a single request.

In Serve's `PredictorDeployment`, incoming requests are batched automatically.

When multiple clients send requests at the same time, Serve combines the requests into a single batch (an array or a DataFrame) and calls `predict` once on the entire batch. Let's look at a predictor that returns, for each row, the row's content, the batch size, and the batch contents.

In [14]:
import time


class BatchSizePredictor(Predictor):
    """Dummy predictor that reports how requests were grouped into batches."""
    @classmethod
    def from_checkpoint(cls, _: Checkpoint):
        return cls()
    
    def predict(self, inp: np.ndarray):
        time.sleep(0.5) # simulate model inference.
        return [(i, len(inp), inp) for i in inp]

In [15]:
app = PredictorDeployment.options(name="BatchSizePredictor").bind(
    predictor_cls=BatchSizePredictor,
    checkpoint=local_checkpoint,
)
serve.run(app)

[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:35:09,305 controller 50578 deployment_state.py:1232 - Adding 1 replicas to deployment 'BatchSizePredictor'.
[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:35:10,334 controller 50578 deployment_state.py:1257 - Removing 1 replicas from deployment 'DataFramePredictor'.


RayServeSyncHandle(deployment='BatchSizePredictor')

Let's use a thread pool executor to send ten requests at the same time, simulating multiple clients.

In [16]:
from concurrent.futures import ThreadPoolExecutor, wait

with ThreadPoolExecutor() as pool:
    futs = [
        pool.submit(
            requests.post,
            "http://localhost:8000/",
            json={"array": [i]},
        )
        for i in range(10)
    ]
    wait(futs)
for fut in futs:
    i, batch_size, batch_group = fut.result().json()
    print(f"Request id: {i} is part of batch group: {batch_group}, with batch size {batch_size}")

[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:13,858 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 530.1ms
[2m[36m(ServeReplica:BatchSizePredictor pid=50691)[0m INFO 2022-08-02 17:35:13,854 BatchSizePredictor BatchSizePredictor#fxLfdf replica.py:482 - HANDLE __call__ OK 518.4ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:14,361 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 1033.0ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:14,363 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 1036.3ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:14,364 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 1035.9ms
[2m[36m(ServeReplica:BatchSizePredictor pid=50691)[0m INFO 2022-08-02 17:35:14,358 BatchSizePredictor BatchSizePredictor#fxLfdf replica.py:482 - HANDLE __call__ OK 1018.3ms
[2m[36m(ServeReplica:BatchSizePredictor pid=50691)[0m INFO 2022-08-02 17:35:14,359 BatchSizePredictor BatchSizePredictor#fxLfdf

Request id: [0.0] is part of batch group: [[0.0], [3.0], [5.0]], with batch size 3
Request id: [1.0] is part of batch group: [[1.0], [7.0], [6.0], [4.0], [9.0]], with batch size 5
Request id: [2.0] is part of batch group: [[2.0]], with batch size 1
Request id: [3.0] is part of batch group: [[0.0], [3.0], [5.0]], with batch size 3
Request id: [4.0] is part of batch group: [[1.0], [7.0], [6.0], [4.0], [9.0]], with batch size 5
Request id: [5.0] is part of batch group: [[0.0], [3.0], [5.0]], with batch size 3
Request id: [6.0] is part of batch group: [[1.0], [7.0], [6.0], [4.0], [9.0]], with batch size 5
Request id: [7.0] is part of batch group: [[1.0], [7.0], [6.0], [4.0], [9.0]], with batch size 5
Request id: [8.0] is part of batch group: [[8.0]], with batch size 1
Request id: [9.0] is part of batch group: [[1.0], [7.0], [6.0], [4.0], [9.0]], with batch size 5


[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:15,375 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 2046.9ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:15,377 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 2048.3ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:15,378 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 2049.6ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:15,379 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 2050.9ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:15,381 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 2051.0ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:15,382 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 2052.4ms
[2m[36m(ServeReplica:BatchSizePredictor pid=50691)[0m INFO 2022-08-02 17:35:15,373 BatchSizePredictor BatchSizePredictor#fxLfdf replica.py:482 - HANDLE __call__ OK 1517.6ms


As the output shows, Serve grouped several of the concurrent requests into larger batches, each handled by a single `predict` call.

You can also configure the exact details of batching parameters:
- `max_batch_size (int)`: the maximum batch size that will be executed in one call to `predict`.
- `batch_wait_timeout_s (float)`: the maximum time to wait for `max_batch_size` requests to arrive before running `predict` on a partial batch.
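The interaction between these two parameters can be sketched with `asyncio`: callers enqueue single inputs, and a background loop collects them until it has `max_batch_size` items or `batch_wait_timeout_s` has elapsed since the first item arrived, then calls `predict` once and hands each caller its own result. `MicroBatcher` is a hypothetical, simplified illustration of the technique, not Serve's batching implementation.

```python
import asyncio
from typing import Any, Callable, List, Tuple


class MicroBatcher:
    """Hypothetical sketch of request micro-batching (not Serve's implementation)."""

    def __init__(self, predict: Callable[[List[Any]], List[Any]],
                 max_batch_size: int, batch_wait_timeout_s: float):
        self.predict = predict
        self.max_batch_size = max_batch_size
        self.batch_wait_timeout_s = batch_wait_timeout_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded so the grouping can be inspected

    async def submit(self, item: Any) -> Any:
        """Called once per request; resolves when that request's result is ready."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request is waiting.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.batch_wait_timeout_s
            # Keep collecting until the batch is full or the wait timeout expires.
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            # A single predict call for the whole batch, then route each output back.
            outputs = self.predict([item for item, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


async def main() -> Tuple[List[Any], List[int]]:
    batcher = MicroBatcher(lambda xs: [x + 2 for x in xs],
                           max_batch_size=10, batch_wait_timeout_s=0.5)
    worker = asyncio.create_task(batcher.run())
    # Ten concurrent "clients", each submitting one input.
    results = await asyncio.gather(*(batcher.submit(i) for i in range(10)))
    worker.cancel()
    return list(results), batcher.batch_sizes


results, batch_sizes = asyncio.run(main())
print(results, batch_sizes)
```

Because all ten inputs are queued before the loop resumes, the sketch should serve them with one `predict` call; a smaller `max_batch_size` or a shorter timeout would split them into more, smaller batches.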

Let's set `max_batch_size` to 10, with a generous `batch_wait_timeout_s` of 5 seconds, so that all ten requests are grouped into the same batch.

In [17]:
app = PredictorDeployment.options(name="BatchSizePredictor").bind(
    predictor_cls=BatchSizePredictor,
    checkpoint=local_checkpoint,
    batching_params={"max_batch_size": 10, "batch_wait_timeout_s": 5}
)
serve.run(app)

[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:35:19,427 controller 50578 deployment_state.py:1189 - Stopping 1 replicas of deployment 'BatchSizePredictor' with outdated versions.
[2m[36m(ServeController pid=50578)[0m INFO 2022-08-02 17:35:21,572 controller 50578 deployment_state.py:1232 - Adding 1 replicas to deployment 'BatchSizePredictor'.


RayServeSyncHandle(deployment='BatchSizePredictor')

Let's send the requests again. This time, all ten should run as part of the same batch.

In [18]:
from concurrent.futures import ThreadPoolExecutor, wait

with ThreadPoolExecutor() as pool:
    futs = [
        pool.submit(
            requests.post,
            "http://localhost:8000/",
            json={"array": [i]},
        )
        for i in range(10)
    ]
    wait(futs)
for fut in futs:
    i, batch_size, batch_group = fut.result().json()
    print(f"Request id: {i} is part of batch group: {batch_group}, with batch size {batch_size}")

Request id: [0.0] is part of batch group: [[0.0], [2.0], [1.0], [3.0], [4.0], [6.0], [7.0], [5.0], [9.0], [8.0]], with batch size 10
Request id: [1.0] is part of batch group: [[0.0], [2.0], [1.0], [3.0], [4.0], [6.0], [7.0], [5.0], [9.0], [8.0]], with batch size 10
Request id: [2.0] is part of batch group: [[0.0], [2.0], [1.0], [3.0], [4.0], [6.0], [7.0], [5.0], [9.0], [8.0]], with batch size 10
Request id: [3.0] is part of batch group: [[0.0], [2.0], [1.0], [3.0], [4.0], [6.0], [7.0], [5.0], [9.0], [8.0]], with batch size 10
Request id: [4.0] is part of batch group: [[0.0], [2.0], [1.0], [3.0], [4.0], [6.0], [7.0], [5.0], [9.0], [8.0]], with batch size 10
Request id: [5.0] is part of batch group: [[0.0], [2.0], [1.0], [3.0], [4.0], [6.0], [7.0], [5.0], [9.0], [8.0]], with batch size 10
Request id: [6.0] is part of batch group: [[0.0], [2.0], [1.0], [3.0], [4.0], [6.0], [7.0], [5.0], [9.0], [8.0]], with batch size 10
Request id: [7.0] is part of batch group: [[0.0], [2.0], [1.0], [3.0]

[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:24,203 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 532.2ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:24,206 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 533.5ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:24,208 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 536.4ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:24,210 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 536.2ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:24,211 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 537.7ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:24,213 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 538.1ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:24,216 http_proxy 127.0.0.1 http_proxy.py:315 - POST / 200 540.0ms
[2m[36m(HTTPProxyActor pid=50580)[0m INFO 2022-08-02 17:35:24,217 http_proxy 127.0.0.1 http_pr

The batching behavior is well-defined:
- When batching arrays, they are all concatenated into a new array with an added batch dimension.
- When batching dataframes, they are all concatenated row-wise.
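These two rules map onto familiar NumPy and pandas operations. The sketch below batches and then un-batches inputs locally. Using `np.stack` for arrays and `pd.concat(..., ignore_index=True)` for dataframes is an assumption about equivalent behavior, not Serve's actual code, and the slicing step is one plausible way to route results back to each request.

```python
import numpy as np
import pandas as pd

# Three single-request arrays, each of shape (1,).
arrays = [np.array([0.0]), np.array([3.0]), np.array([5.0])]
batched = np.stack(arrays)  # new leading batch dimension -> shape (3, 1)
outputs = batched + 2  # one vectorized call for the whole batch
per_request = list(outputs)  # split back along the batch dimension

# Two single-request dataframes with the same columns.
frames = [
    pd.DataFrame([{"base": 1, "multiplier": 2}]),
    pd.DataFrame([{"base": 3, "multiplier": 4}]),
]
batched_df = pd.concat(frames, ignore_index=True)  # row-wise concatenation, 2 rows
batched_df["prediction"] = batched_df["base"] * batched_df["multiplier"] + 2
# Each request's row count tells us how to slice the result back out.
sizes = [len(f) for f in frames]
starts = np.cumsum([0] + sizes[:-1])
per_request_df = [batched_df.iloc[s:s + n] for s, n in zip(starts, sizes)]

print(batched.shape, len(batched_df))  # (3, 1) 2
```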

You can also turn off this behavior by setting `batching_params=False`.