# Testing Ray Serve's model-serving features (model as a service) #
Getting started with Ray Serve:
https://docs.ray.io/en/latest/serve/getting_started.html#composing-machine-learning-models-with-deployment-graphs

In [1]:
# File name: summary_model.py
from transformers import pipeline
import numpy as np
from starlette.requests import Request
import ray
from ray import serve

In [2]:
class LocalSummarizer:
    def __init__(self):
        # Load model
        self.model = pipeline("summarization", model="t5-small")

    def summarize(self, text: str) -> str:
        # Run inference
        model_output = self.model(text, min_length=5, max_length=15)

        # Post-process output to return only the summary text
        summary = model_output[0]["summary_text"]

        return summary


local_summarizer = LocalSummarizer()

summary = local_summarizer.summarize(
    "It was the best of times, it was the worst of times, it was the age "
    "of wisdom, it was the age of foolishness, it was the epoch of belief"
)
print(summary)

it was the best of times, it was worst of times .


### An example of a Ray Serve deployment graph (see graph.py)
Next, run one of the following commands in a terminal to start the service.
Edit: I modified the example and created a graph with only the summarization step (the second command).
- serve run translation_graph:deployment_graph
- serve run summary_graph:deployment_graph
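
`serve run summary_graph:deployment_graph` imports the `deployment_graph` attribute from the module `summary_graph`. That file is not shown here, so the following is only a sketch of the request-handling logic such a deployment might contain, modeled on the linked getting-started guide. To keep it runnable without Ray or a model download, the `@serve.deployment` decorator is left out and the model is passed in; `FakeRequest` and `fake_model` are hypothetical stand-ins for Starlette's `Request` and the `transformers` pipeline.

```python
import asyncio
from typing import Any, Callable, Dict, List


class Summarizer:
    """Request-handling logic a summarization deployment might contain."""

    def __init__(self, model: Callable[..., List[Dict[str, str]]]):
        # Assumption: the real file would load
        # pipeline("summarization", model="t5-small") here instead.
        self.model = model

    def summarize(self, text: str) -> str:
        # Same inference and post-processing as LocalSummarizer above.
        model_output = self.model(text, min_length=5, max_length=15)
        return model_output[0]["summary_text"]

    async def __call__(self, http_request: Any) -> str:
        # Starlette's Request.json() is a coroutine, so it must be awaited.
        text: str = await http_request.json()
        return self.summarize(text)


# Assumption, following the guide: with Ray Serve installed, the class would
# carry @serve.deployment and the module would end with
#     deployment_graph = Summarizer.bind()


class FakeRequest:
    """Hypothetical stand-in for starlette.requests.Request."""

    def __init__(self, payload: str):
        self._payload = payload

    async def json(self) -> str:
        return self._payload


def fake_model(text: str, **kwargs: Any) -> List[Dict[str, str]]:
    """Hypothetical stand-in for the pipeline: returns the first 20 characters."""
    return [{"summary_text": text[:20]}]


if __name__ == "__main__":
    handler = Summarizer(fake_model)
    print(asyncio.run(handler(FakeRequest("It was the best of times"))))
```

The JSON body posted by the client cells below is what `http_request.json()` would return, which is why the client sends the text with `json=` rather than as form data.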

In [9]:
# File name: graph_client.py
import requests

english_text = (
    "It was the best of times, it was the worst of times, it was the age "
    "of wisdom, it was the age of foolishness, it was the epoch of belief"
)
response = requests.post("http://127.0.0.1:8000/", json=english_text)
response.text

'it was the best of times, it was worst of times .'

In [10]:
my_text = (
    "As its name suggests, the primary interface to PyTorch is the Python "
    "programming language. While Python is a suitable and preferred language "
    "for many scenarios requiring dynamism and ease of iteration, there are "
    "equally many situations where precisely these properties of Python are "
    "unfavorable. One environment in which the latter often applies is "
    "production – the land of low latencies and strict deployment "
    "requirements. For production scenarios, C++ is very often the language "
    "of choice, even if only to bind it into another language like Java, Rust "
    "or Go. The following paragraphs will outline the path PyTorch provides "
    "to go from an existing Python model to a serialized representation that "
    "can be loaded and executed purely from C++, with no dependency on Python."
)
response = requests.post("http://127.0.0.1:8000/", json=my_text)
response.text

'the primary interface to PyTorch is the Python programming language '
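
The two client cells above post to the service without a timeout or any error handling, so a stopped or slow service would make the cell hang or fail unclearly. Below is a minimal sketch of a reusable client helper. The name `summarize_remote` and the 30-second default timeout are assumptions for illustration. It uses only the standard library so that it runs without extra packages; with `requests`, the rough equivalent would be passing `timeout=` to `requests.post` and calling `response.raise_for_status()`.

```python
import json
import urllib.request


def summarize_remote(
    text: str,
    url: str = "http://127.0.0.1:8000/",
    timeout: float = 30.0,
) -> str:
    """POST `text` as a JSON body and return the response body as a string."""
    body = json.dumps(text).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    # urllib.error.URLError if the service cannot be reached.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the service running, `summarize_remote(my_text)` would stand in for the `requests.post(...)` / `response.text` pair used above.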