## 2. Implement a Classifier service

Let’s jump right in and get a simple ML service up and running on Ray Serve. 

Recall the MNIST classifier we built to perform batch inference on the `MNIST` dataset. Here it is again, named `OfflineMNISTClassifier`.

In [1]:
import numpy as np
import torch


class OfflineMNISTClassifier:
    def __init__(self, local_path: str):
        self.model = torch.jit.load(local_path)
        self.model.to("cuda")
        self.model.eval()

    def __call__(self, batch: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
        return self.predict(batch)
    
    def predict(self, batch: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
        images = torch.tensor(batch["image"]).float().to("cuda")

        with torch.no_grad():
            logits = self.model(images).cpu().numpy()

        batch["predicted_label"] = np.argmax(logits, axis=1)
        return batch
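The last step of `predict` turns the model's output into labels. Assuming a standard 10-class MNIST head, the model returns one row of 10 logits per image, and `np.argmax(logits, axis=1)` picks the index of the largest logit in each row. The logit values below are made up for illustration:

```python
import numpy as np

# Made-up logits for a batch of 3 images over 10 digit classes (shape: (3, 10)).
logits = np.zeros((3, 10))
logits[0, 7] = 5.0  # image 0 scores highest on digit 7
logits[1, 2] = 3.0  # image 1 scores highest on digit 2
logits[2, 0] = 1.0  # image 2 scores highest on digit 0

# axis=1 takes the argmax across classes within each row, giving one label per image.
predicted_label = np.argmax(logits, axis=1)
print(predicted_label)  # [7 2 0]
```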

In [2]:
# Download the model from S3 to the EFS-backed cluster storage
!aws s3 cp s3://anyscale-public-materials/ray-ai-libraries/mnist/model/model.pt /mnt/cluster_storage/model.pt

download: s3://anyscale-public-materials/ray-ai-libraries/mnist/model/model.pt to ../../../mnt/cluster_storage/model.pt


Here is how we can use the `OfflineMNISTClassifier` to perform batch inference on a dataset of random images.

In [3]:
import ray

# Create a dataset of random images
ds = ray.data.from_items([{"image": np.random.rand(1, 28, 28)} for _ in range(100)])

# Map the OfflineMNISTClassifier to the dataset
ds = ds.map_batches(
    OfflineMNISTClassifier,
    fn_constructor_kwargs={"local_path": "/mnt/cluster_storage/model.pt"},
    concurrency=1,
    num_gpus=1,
    batch_size=10
)

# Take a look at the first 10 predictions
ds.take_batch(10)["predicted_label"]

2024-12-13 08:07:46,919	INFO worker.py:1596 -- Connecting to existing Ray cluster at address: 10.0.7.17:6379...
2024-12-13 08:07:46,925	INFO worker.py:1772 -- Connected to Ray cluster. View the dashboard at https://session-wun39fg7yb3g9682a8fejskwz3.i.anyscaleuserdata.com
2024-12-13 08:07:47,081	INFO packaging.py:358 -- Pushing file package 'gcs://_ray_pkg_2c93dc3dcf185c9c264569cb6910da7968e97c04.zip' (63.75MiB) to Ray cluster...
2024-12-13 08:07:47,789	INFO packaging.py:371 -- Successfully pushed file package 'gcs://_ray_pkg_2c93dc3dcf185c9c264569cb6910da7968e97c04.zip'.
2024-12-13 08:07:49,557	INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-13_08-07-01_151483_2430/logs/ray-data
2024-12-13 08:07:49,558	INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> ActorPoolMapOperator[MapBatches(OfflineMNISTClassifier)] -> LimitOperator[limit=10]


array([6, 1, 6, 6, 1, 1, 1, 6, 6, 1])
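Conceptually, `map_batches` with `batch_size=10` regroups the 100 rows into column-oriented batches of roughly 10 rows before calling `OfflineMNISTClassifier.__call__`. In Ray Data each batch arrives as a `dict[str, np.ndarray]`, which is why `predict` indexes `batch["image"]`. The pure-Python sketch below uses a hypothetical `iter_column_batches` helper, with plain lists standing in for NumPy arrays, only to illustrate that row-to-column regrouping; it is not how Ray Data is implemented.

```python
from typing import Any, Iterator


def iter_column_batches(rows: list[dict[str, Any]], batch_size: int) -> Iterator[dict[str, list[Any]]]:
    """Hypothetical helper: group row dicts into column-oriented batches."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # One key per column, each holding that column's values for the rows in this chunk.
        yield {key: [row[key] for row in chunk] for key in chunk[0]}


# 100 rows, mirroring the 100 random images above (integers stand in for the images).
rows = [{"image": i} for i in range(100)]
batches = list(iter_column_batches(rows, batch_size=10))
print(len(batches), batches[0]["image"])  # 10 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```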

[36m(autoscaler +1m10s)[0m Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.


[36m(ServeReplica:mnist_classifier:OnlineMNISTClassifier pid=4242)[0m INFO 2024-12-13 08:09:45,181 mnist_classifier_OnlineMNISTClassifier igfqf50m adbb3ada-000d-4fd7-9c4b-beeac31c59e4 replica.py:408 - PREDICT OK 98.8ms
[36m(ServeReplica:mnist_classifier:OnlineMNISTClassifier pid=4242)[0m ERROR 2024-12-13 08:13:24,247 mnist_classifier_OnlineMNISTClassifier igfqf50m 86bc4797-76d5-41f4-becf-f1e4e8c60f5f / replica.py:394 - Request failed:
[36m(ServeReplica:mnist_classifier:OnlineMNISTClassifier pid=4242)[0m [36mray::ServeReplica:mnist_classifier:OnlineMNISTClassifier.handle_request_with_rejection()[39m (pid=4242, ip=10.0.7.17)
[36m(ServeReplica:mnist_classifier:OnlineMNISTClassifier pid=4242)[0m   File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/serve/_private/utils.py", line 168, in wrap_to_ray_error
[36m(ServeReplica:mnist_classifier:OnlineMNISTClassifier pid=4242)[0m     raise exception
[36m(ServeReplica:mnist_classifier:OnlineMNISTClassifier pid=4242)[0m   File 

Now, if we want to move to an online inference setting, we can turn this class into a Ray Serve deployment by applying the `@serve.deployment` decorator. The `__call__` method also changes: it now receives an HTTP request instead of a batch.


In [4]:
from typing import Any
from ray import serve
from starlette.requests import Request
import json



@serve.deployment() # this is the decorator to add
class OnlineMNISTClassifier:
    def __init__(self, local_path: str):
        self.model = torch.jit.load(local_path)
        self.model.to("cuda")
        self.model.eval()

    async def __call__(self, request: Request) -> dict[str, Any]: # __call__ now takes a Starlette Request object
        # The client posts a JSON-encoded string, so request.json() returns a str; json.loads parses it into a dict
        batch = json.loads(await request.json())
        return await self.predict(batch)
    
    async def predict(self, batch: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
        # same code as OfflineMNISTClassifier.predict except we added async to the method
        images = torch.tensor(batch["image"]).float().to("cuda")

        with torch.no_grad():
            logits = self.model(images).cpu().numpy()

        batch["predicted_label"] = np.argmax(logits, axis=1)
        return batch



We can now configure the deployment with `.options` (one replica that reserves one GPU) and instantiate it as a Ray Serve application using `.bind`.

In [5]:
mnist_deployment = OnlineMNISTClassifier.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 1},
)

mnist_app = mnist_deployment.bind(local_path="/mnt/cluster_storage/model.pt")
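Each replica runs as a Ray actor, and `ray_actor_options={"num_gpus": 1}` reserves one GPU per replica, so this deployment needs `num_replicas × num_gpus` GPUs in the cluster. The hypothetical `gpus_required` helper below just does that arithmetic on a plain dict shaped like the `.options(...)` arguments. Ray also accepts fractional `num_gpus` values (e.g. 0.5), which lets several replicas share one GPU; the variations shown are illustrative, not part of the original setup.

```python
def gpus_required(options: dict) -> float:
    """Hypothetical helper: GPUs a deployment reserves across all of its replicas."""
    per_replica = options.get("ray_actor_options", {}).get("num_gpus", 0)
    return options["num_replicas"] * per_replica


# The options used for mnist_deployment above.
mnist_options = {"num_replicas": 1, "ray_actor_options": {"num_gpus": 1}}
print(gpus_required(mnist_options))  # 1

# Illustrative variations: more replicas, or a fractional GPU per replica.
print(gpus_required({"num_replicas": 4, "ray_actor_options": {"num_gpus": 1}}))    # 4
print(gpus_required({"num_replicas": 4, "ray_actor_options": {"num_gpus": 0.5}}))  # 2.0
```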

<div class="alert alert-block alert-warning">

**Note:** `.bind` takes the arguments to pass to the deployment's constructor (here, `local_path`).

</div>
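One way to picture `.bind`: it records the constructor arguments without building the object, and Ray Serve constructs `OnlineMNISTClassifier` later, inside each replica, when the application is deployed. The analogy below uses `functools.partial` and a hypothetical `ToyClassifier`; it illustrates deferred construction only and is not how Ray Serve implements `.bind` (which returns an `Application` object, not a partial).

```python
from functools import partial


class ToyClassifier:
    """Hypothetical stand-in for OnlineMNISTClassifier; counts how often it is constructed."""

    constructed = 0

    def __init__(self, local_path: str):
        ToyClassifier.constructed += 1
        self.local_path = local_path


# Like .bind(...), partial only records the constructor arguments; __init__ has not run yet.
app_like = partial(ToyClassifier, local_path="/mnt/cluster_storage/model.pt")
print(ToyClassifier.constructed)  # 0

# In Ray Serve, construction happens later inside each replica; here calling the partial plays that role.
replica = app_like()
print(ToyClassifier.constructed, replica.local_path)  # 1 /mnt/cluster_storage/model.pt
```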


We can then deploy and run the application with `serve.run`, which returns a `DeploymentHandle` to the deployment:

In [6]:
mnist_deployment_handle = serve.run(mnist_app, name='mnist_classifier', blocking=False)

2024-12-13 08:08:11,879	INFO handle.py:126 -- Created DeploymentHandle 'i6mh0ukx' for Deployment(name='OnlineMNISTClassifier', app='mnist_classifier').
2024-12-13 08:08:11,880	INFO handle.py:126 -- Created DeploymentHandle '9cukvklw' for Deployment(name='OnlineMNISTClassifier', app='mnist_classifier').
2024-12-13 08:08:16,901	INFO handle.py:126 -- Created DeploymentHandle 'nbalzf4x' for Deployment(name='OnlineMNISTClassifier', app='mnist_classifier').
2024-12-13 08:08:16,902	INFO api.py:574 -- Deployed app 'mnist_classifier' successfully.


[36m(ProxyActor pid=4182)[0m INFO 2024-12-13 08:08:11,838 proxy 10.0.7.17 proxy.py:1225 - Proxy starting on node e741c55434a3ea7340f7710115b0bf60b65a7e5ca47446a7590b3497 (HTTP port: 8000).
[36m(ServeController pid=4122)[0m INFO 2024-12-13 08:08:11,946 controller 4122 deployment_state.py:1598 - Deploying new version of Deployment(name='OnlineMNISTClassifier', app='mnist_classifier') (initial target replicas: 1).
[36m(ServeController pid=4122)[0m INFO 2024-12-13 08:08:12,049 controller 4122 deployment_state.py:1844 - Adding 1 replica to Deployment(name='OnlineMNISTClassifier', app='mnist_classifier').
[36m(ServeReplica:mnist_classifier:OnlineMNISTClassifier pid=4242)[0m INFO 2024-12-13 08:08:21,242 mnist_classifier_OnlineMNISTClassifier igfqf50m ec22c448-6399-4720-8a80-07fe34ab2c9a / replica.py:408 - __CALL__ OK 505.0ms


We can test it as an HTTP endpoint. The Serve proxy listens on port 8000 (see the `ProxyActor` log above):

In [7]:
import json
import requests


images = np.random.rand(2, 1, 28, 28).tolist()
json_request = json.dumps({"image": images})
response = requests.post("http://localhost:8000/", json=json_request)
response.json()["predicted_label"]

[6, 6]
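Why does `__call__` run `json.loads` on the result of `request.json()`? The client above serializes the payload with `json.dumps` and then passes that string to `requests.post(json=...)`, which JSON-encodes it a second time. The stdlib-only sketch below reproduces both layers with a made-up all-zeros image; `http_body` stands in for the raw request body the server receives.

```python
import json

# One made-up 28x28 image with a single channel, batched: shape (1, 1, 28, 28).
image = [[0.0] * 28 for _ in range(28)]
sample_images = [[image]]

# Client side: the notebook first serializes the payload to a JSON string...
json_request = json.dumps({"image": sample_images})
# ...and requests.post(json=...) serializes that string again for the HTTP body.
http_body = json.dumps(json_request)

# Server side: request.json() undoes only the outer layer and returns the inner string,
# so the handler needs a second json.loads to recover the dict.
after_request_json = json.loads(http_body)
parsed = json.loads(after_request_json)
print(type(after_request_json).__name__, type(parsed).__name__)  # str dict
```

Sending `json={"image": images}` directly and dropping the inner `json.loads` in `__call__` would remove one encoding layer, but the client and server have to change together.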

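We can also call the deployment directly from Python through the `DeploymentHandle` returned by `serve.run`. This path skips HTTP entirely, so we can pass a NumPy batch straight to the `predict` method: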

In [8]:
batch = {"image": np.random.rand(10, 1, 28, 28)}
response = await mnist_deployment_handle.predict.remote(batch)
response["predicted_label"]

2024-12-13 08:09:45,066	INFO handle.py:126 -- Created DeploymentHandle '84zvztwn' for Deployment(name='OnlineMNISTClassifier', app='mnist_classifier').
2024-12-13 08:09:45,077	INFO pow_2_scheduler.py:260 -- Got updated replicas for Deployment(name='OnlineMNISTClassifier', app='mnist_classifier'): {'igfqf50m'}.


array([1, 6, 1, 6, 1, 1, 1, 1, 1, 6])
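The handle call passes `batch` as a NumPy array because `DeploymentHandle` calls go through Ray's own serialization, whereas the HTTP test has to convert the images with `.tolist()` first. JSON has no representation for a NumPy `ndarray`: `json.dumps` raises `TypeError` on one, while nested Python lists round-trip cleanly. The sketch below checks both, using random data with the same shape as the HTTP example.

```python
import json

import numpy as np

sample_batch = {"image": np.random.rand(2, 1, 28, 28)}

# NumPy arrays are not JSON-serializable, so an HTTP client cannot send them as-is.
try:
    json.dumps(sample_batch)
    serializable = True
except TypeError:
    serializable = False

# Converting to nested Python lists makes the payload JSON-friendly,
# at the cost of extra encoding work and a larger text body.
payload = json.dumps({"image": sample_batch["image"].tolist()})
restored = np.asarray(json.loads(payload)["image"])
print(serializable, restored.shape)  # False (2, 1, 28, 28)
```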