# Deploying LLMs in Production

Ray Serve is a scalable model-serving library for building online inference APIs. It is particularly well suited to system composition: you can build a complex inference service that combines multiple chains and business logic entirely in Python. This notebook shows how to deploy a simple OpenAI chain into production with Ray Serve.

In [1]:
from langchain.llms import OpenAI
from langchain import PromptTemplate, LLMChain

Install Ray Serve with `pip install "ray[serve]"` (the quotes keep shells such as zsh from interpreting the square brackets).

In [None]:
from ray import serve
from starlette.requests import Request

The general skeleton for deploying a service is the following:

In [3]:
from ray import serve

# Deployment resources
deployment_resources = {}

@serve.deployment(**deployment_resources)
class LLMServe:

    def __init__(self) -> None:
        # All the initialization code goes here
        pass

    async def __call__(self, request: Request) -> str:
        # You can parse the request here
        # and return a response
        return "Hello World"

# Bind the model to the deployment
deployment = LLMServe.bind()

# Deployment options
deployment_options = {}

# Run the deployment
serve.api.run(deployment, **deployment_options)

Usage stats collection is enabled by default for nightly wheels. To disable this, run the following command: `ray disable-usage-stats` before starting Ray. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.

2023-05-03 07:30:14,524	INFO worker.py:1607 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(ServeController pid=23810) INFO 2023-05-03 07:30:16,437 controller 23810 deployment_state.py:1168 - Deploying new version of deployment default_LLMServe.
(HTTPProxyActor pid=23811) INFO:     Started server process [23811]
(ServeController pid=23810) INFO 2023-05-03 07:30:16,506 controller 23810 deployment_state.py:1386 - Adding 1 replica to deployment default_LLMServe.


RayServeSyncHandle(deployment='default_LLMServe')
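The skeleton leaves `deployment_resources` and `deployment_options` empty. As a rough sketch of what they might hold: `num_replicas` and `ray_actor_options` are keyword arguments accepted by `@serve.deployment`, and `port` is the run option used later in this notebook. The concrete values and the `total_cpus` helper are assumptions made for illustration, not part of Ray. The block does not import Ray, so it runs anywhere.

```python
# Illustrative values only; tune them for your own cluster.
deployment_resources = {
    "num_replicas": 2,  # number of copies of the deployment to run
    "ray_actor_options": {"num_cpus": 1, "num_gpus": 0},  # resources per replica
}

# Options forwarded to the run call; `port` matches the example later on.
deployment_options = {"port": 8282}


def total_cpus(resources: dict) -> float:
    """CPUs reserved across all replicas (hypothetical helper, not a Ray API).

    Assumes one replica and one CPU per replica when a value is not given.
    """
    replicas = resources.get("num_replicas", 1)
    cpus_per_replica = resources.get("ray_actor_options", {}).get("num_cpus", 1)
    return replicas * cpus_per_replica


print(total_cpus(deployment_resources))
```

With these dictionaries in place, the skeleton's `@serve.deployment(**deployment_resources)` and `**deployment_options` lines pick the values up unchanged.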

In [4]:
# Shutdown the deployment
serve.api.shutdown()

(ServeController pid=23810) INFO 2023-05-03 07:30:17,482 controller 23810 deployment_state.py:1151 - Deleting deployment default_LLMServe.
(ServeController pid=23810) INFO 2023-05-03 07:30:17,533 controller 23810 deployment_state.py:1412 - Removing 1 replica from deployment 'default_LLMServe'.


Get an OpenAI API key from the [API keys page](https://platform.openai.com/account/api-keys). Running the following cell prompts you for the key without echoing it to the screen.

In [5]:
from getpass import getpass
OPENAI_API_KEY = getpass()
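For non-interactive runs, one possible variation is to check an environment variable first and prompt only when it is missing. This is a minimal sketch: the `load_api_key` helper is hypothetical, and the variable name `OPENAI_API_KEY` is an assumption chosen to match the Python variable above.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def load_api_key(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[], str] = getpass,
    name: str = "OPENAI_API_KEY",
) -> str:
    """Return the key from `env` if set, otherwise ask for it interactively."""
    key = env.get(name, "").strip()
    if key:
        return key
    # Fall back to the interactive prompt used in the cell above.
    return prompt().strip()
```

Calling `OPENAI_API_KEY = load_api_key()` would then behave like the `getpass()` cell whenever the variable is not set.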

In [6]:
@serve.deployment
class DeployLLM:

    def __init__(self):
        # Build the chain once per replica, when the deployment starts
        llm = OpenAI(openai_api_key=OPENAI_API_KEY)
        template = "Question: {question}\n\nAnswer: Let's think step by step."
        prompt = PromptTemplate(template=template, input_variables=["question"])
        self.chain = LLMChain(llm=llm, prompt=prompt)

    def _run_chain(self, text: str):
        # Calling the chain returns a dict; the completion is under "text"
        return self.chain(text)

    async def __call__(self, request: Request):
        # The question arrives as the `text` query parameter
        text = request.query_params["text"]
        resp = self._run_chain(text)
        return resp["text"]

In [7]:
deployment = DeployLLM.bind()

In [8]:
# Example port number
PORT_NUMBER = 8282
# Run the deployment
serve.api.run(deployment, port=PORT_NUMBER)

(ServeController pid=23833) INFO 2023-05-03 07:30:25,693 controller 23833 deployment_state.py:1168 - Deploying new version of deployment default_DeployLLM.
(HTTPProxyActor pid=23838) INFO:     Started server process [23838]
(ServeController pid=23833) INFO 2023-05-03 07:30:25,782 controller 23833 deployment_state.py:1386 - Adding 1 replica to deployment default_DeployLLM.


RayServeSyncHandle(deployment='default_DeployLLM')

In [9]:
import requests

text = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
# Pass the question via `params` so requests URL-encodes it
response = requests.post(f"http://localhost:{PORT_NUMBER}/", params={"text": text})
print(response.content.decode())

 Justin Bieber was born in 1994, so the NFL team that won the Super Bowl that year was the Dallas Cowboys.


(ServeReplica:default_DeployLLM pid=23839) INFO 2023-05-03 07:30:31,339 default_DeployLLM default_DeployLLM#lwdEnb mhenxcssHV / replica.py:527 - __CALL__ OK 3591.8ms

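The deployment reads the question from the `text` query parameter, and the question contains spaces and a `?`, which have to be percent-encoded inside a query string. The sketch below shows that encoding explicitly using only the standard library. The `build_query_url` helper and the `localhost` default are assumptions for illustration; the port is the example value used above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(port: int, text: str, host: str = "localhost") -> str:
    """Build the request URL with `text` safely encoded (hypothetical helper)."""
    query = urlencode({"text": text})  # spaces become "+", "?" becomes "%3F"
    return f"http://{host}:{port}/?{query}"


question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
url = build_query_url(8282, question)

# Decoding the query string recovers the original question unchanged.
decoded = parse_qs(urlsplit(url).query)["text"][0]
print(url)
print(decoded == question)
```

The resulting URL can be passed to any HTTP client, which is handy when the caller is not written with `requests`.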