(streamlit-serve-tutorial)=

# Building a Streamlit app with Ray Serve

In this example, we will show you how to wrap a machine learning model served
by Ray Serve in a [Streamlit application](https://streamlit.io/).

Specifically, we're going to download a GPT-2 model from the `transformers` library,
define a Ray Serve deployment with it, and then define and launch a Streamlit app.
Let's take a look.

In [1]:
# Install all dependencies for this example.
! pip install "ray[serve]" streamlit transformers requests



## Deploying a model with Ray Serve

To start off, we import Ray Serve, Streamlit, and the `transformers` and `requests` libraries:

In [6]:
import streamlit as st
from ray import serve
from transformers import pipeline
import requests



Next, we define a Ray Serve deployment with a GPT-2 model by applying the `@serve.deployment` decorator to a `model`
function that takes a `request` argument.
In this function, we create a GPT-2 text-generation model with a call to `pipeline`, read the prompt from the
`query` parameter of the request, and return the result of querying the model.
Before defining the deployment, we start Ray Serve with `serve.start()`; afterwards, we deploy the model
with `model.deploy()`.

In [3]:
if 'model' not in st.session_state:
    serve.start()

    @serve.deployment
    def model(request):
        language_model = pipeline("text-generation", model="gpt2")
        query = request.query_params["query"]
        return language_model(query, max_length=100)

    model.deploy()
    st.session_state['model'] = True

Note that we're using Streamlit's `session_state` to make sure the deployment only runs once per session.
Streamlit reruns the whole script on every user interaction, so without this guard it would start Serve and deploy
the model again each time, which is not what we want.
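
The run-once guard can be illustrated without Streamlit at all. The following is a minimal sketch that uses a plain dictionary as a stand-in for `st.session_state` and a hypothetical `run_script` function to represent one execution of the app script; neither name is part of Streamlit or Ray Serve.

```python
# Stand-in for st.session_state: in this sketch, a plain dict that
# survives across simulated reruns of the script.
session_state = {}

# Counts how often the expensive setup step actually runs.
deploy_calls = 0


def run_script(state):
    """Simulate one top-to-bottom execution of the app script (hypothetical helper)."""
    global deploy_calls
    if "model" not in state:
        # In the real app, this is where serve.start() and model.deploy() run.
        deploy_calls += 1
        state["model"] = True


# Simulate three reruns of the script, as triggered by user interactions.
for _ in range(3):
    run_script(session_state)

print(deploy_calls)
```

Only the first simulated run performs the setup step; the later runs find the `model` key and skip it.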

To test this deployment we use a simple `example` query to get a `response` from the model running
on `localhost:8000/model`.
The first time you use this endpoint, the model will be downloaded first, which can take a while to complete.
Subsequent calls will be faster.

In [4]:
example = "What's the meaning of life?"
# Pass the prompt via `params` so that requests URL-encodes it.
response = requests.get("http://localhost:8000/model", params={"query": example})
print(response.text)
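
Prompts often contain spaces, apostrophes, or question marks, which need to be percent-encoded when they travel in a URL query string. The sketch below shows, using only the standard library, what the encoded URL for the example prompt looks like and that decoding it recovers the prompt. The `build_url` helper is a hypothetical name introduced for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint used in this example.
BASE_URL = "http://localhost:8000/model"


def build_url(query):
    """Return the endpoint URL with `query` percent-encoded (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode({'query': query})}"


url = build_url("What's the meaning of life?")
print(url)

# Decoding the query string recovers the original prompt unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```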


## Defining and launching a Streamlit app

To define a Streamlit app, let's first create a convenient wrapper that takes a `query` argument and returns
the result of querying the GPT-2 model.

In [5]:
def gpt2(query):
    # Pass the prompt via `params` so that requests URL-encodes it.
    response = requests.get("http://localhost:8000/model", params={"query": query})
    return response.json()[0]["generated_text"]
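
The indexing in `gpt2` follows from the shape of the result: the deployment returns the output of the `text-generation` pipeline, which is a list with one dictionary per generated sequence, each holding a `generated_text` field. Here is a minimal sketch of that extraction step, using a made-up payload in place of a real model response. The `extract_text` helper and the sample text are assumptions for illustration.

```python
import json

# Made-up payload with the same shape as the JSON the deployment returns;
# the generated text here is illustrative, not real model output.
sample_body = json.dumps(
    [{"generated_text": "What's the meaning of life? A question worth asking."}]
)


def extract_text(body):
    """Return the first generated sequence from a JSON response body (hypothetical helper)."""
    results = json.loads(body)
    if not results:
        raise ValueError("response contained no generated sequences")
    return results[0]["generated_text"]


print(extract_text(sample_body))
```

Checking for an empty list before indexing gives a clearer error than an `IndexError` if the response does not contain any generated sequences.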

[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:50,245	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. component=serve deployment=model
[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:50,350	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. component=serve deployment=model
[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:50,455	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. component=serve deployment=model
[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:50,562	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. component=serve deployment=model
[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:50,666	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. component=serve deployment=model
[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:50,771	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. c

[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:55,363	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. component=serve deployment=model
[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:55,470	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. component=serve deployment=model
[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:55,576	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. component=serve deployment=model
[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:55,683	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. component=serve deployment=model
[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:55,789	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. component=serve deployment=model
[2m[36m(ServeController pid=70529)[0m 2022-04-26 11:45:55,894	INFO deployment_state.py:1211 -- Adding 1 replicas to deployment 'model'. c

Apart from this `gpt2` function, the only other thing that we need is a way for users to specify the model input,
and a way to display the result.
Since our model takes text as input and produces text as output, this turns out to be pretty simple:

In [None]:
st.title("Serving a GPT-2 model")

query = st.text_input(label="Input prompt", value="What's the meaning of life?")

if st.button('Run model'):
    output = gpt2(query)

    st.header("Model output")
    st.text(output)

To serve this model with Streamlit, we use just a few simple text components: `st.title`, `st.header`, and
`st.text` for output, and `st.text_input` for the model input.
We also use a button to trigger model inference for a new input prompt.
There's much more you can do with Streamlit; this is just a simple example.

```{margin}
The [Streamlit API documentation](https://docs.streamlit.io/library/api-reference)
covers all available Streamlit components in detail.
```

Finally, if you put everything we just did together in a single file called `streamlit_app.py`,
you can run your Streamlit app with Ray Serve as follows:

In [None]:
streamlit run streamlit_app.py

This should launch an interactive interface that looks like this:

```{image} https://raw.githubusercontent.com/ray-project/images/master/docs/serve/streamlit_serve_gpt.png
```

To summarize, if you know the basics of Streamlit, it's straightforward to build an app around a model deployed with Ray Serve.