
TorchServe quickstart chatbot example #3003

Merged: 9 commits merged into master on Mar 16, 2024

Conversation

@agunapal (Collaborator) commented Mar 6, 2024

Description

This PR enables a new user of TorchServe to quickly launch a chatbot on a Mac M1/M2 with three commands:

# 1: Set HF Token as Env variable
export HUGGINGFACE_TOKEN=<Token> # get this from your HuggingFace account

# 2: Build TorchServe Image for Serving llama2-7b model with 4-bit quantization
./examples/llm/llama2/chat_app/docker/build_image.sh meta-llama/Llama-2-7b-chat-hf

# 3: Launch the streamlit app for server & client
docker run --rm -it --platform linux/amd64 -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:8084:8084 -p 127.0.0.1:8085:8085 -v <model-store>:/home/model-server/model-store pytorch/torchserve:meta-llama---Llama-2-7b-chat-hf
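
Once the container is running, a quick sanity check (a sketch, assuming TorchServe's default inference port, which the command above maps to 8080):

curl http://localhost:8080/ping  # expected response: {"status": "Healthy"}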

Prerequisites:

  1. HuggingFace token
  2. Docker

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A
    Logs for Test A

  • Test B
    Logs for Test B

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@agunapal agunapal marked this pull request as ready for review March 7, 2024 01:11
@agunapal agunapal requested a review from msaroufim March 7, 2024 01:11
@@ -9,6 +9,33 @@ We are using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) in
You can run this example on your laptop to understand how to use TorchServe


## Quick Start Guide
Member:

We can be more ambitious and make this our new getting-started guide

agunapal (Collaborator, Author):

My goal is to do a three-part solution:

  1. A chatbot quickstart with Streamlit, because chatbots are popular.
  2. A TorchServe multi-model app to show TorchServe's full capability; use this to create a video series.
  3. A quick-start script for common use cases with curl commands; this can become the getting-started guide.

# 2: Build TorchServe Image for Serving llama2-7b model with 4-bit quantization
./examples/llm/llama2/chat_app/docker/build_image.sh meta-llama/Llama-2-7b-chat-hf

# 3: Launch the streamlit app for server & client
Member:

I know it's not exactly what you might have in mind, but I was thinking this would open a terminal-based CLI.

agunapal (Collaborator, Author):

I was thinking about how to cover various scenarios, so my goal is the same three-part solution described above: a Streamlit chatbot quickstart, a TorchServe multi-model app, and a quick-start script for common use cases.

RUN pip install -r /home/model-server/chat_bot/requirements.txt && huggingface-cli login --token $HUGGINGFACE_TOKEN
RUN pip uninstall torchtext torchdata torch torchvision torchaudio -y
RUN pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu --ignore-installed
RUN pip uninstall torchserve torch-model-archiver torch-workflow-archiver -y
Member:

seems like a miss?

agunapal (Collaborator, Author):

You are right, this is not needed for this example. Will clean it up.

agunapal (Collaborator, Author):

Done

ARG MODEL_NAME
ARG HUGGINGFACE_TOKEN

USER root
Member:

do you need root?

agunapal (Collaborator, Author):

Yes, the default user doesn't have permission to install packages.
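
For context, a minimal sketch of the pattern being discussed: escalate to root only for installation, then drop privileges again. The model-server user name is an assumption based on the official TorchServe Docker images.

USER root
RUN pip install -r /home/model-server/chat_bot/requirements.txt
# Drop back to the unprivileged user once installation is done
# (model-server is assumed here, per the official TorchServe images).
USER model-server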



def start_server():
    os.system("torchserve --start --ts-config /home/model-server/config.properties")
Member:

This would show success even if the server failed to start. Favor subprocess instead: check the return code, then query TorchServe directly for server health rather than using a sleep.

agunapal (Collaborator, Author):

Oh, good point, let me try. This command was returning immediately. I did try ping, but it was failing because the server was not up yet.

agunapal (Collaborator, Author):

I changed the logic, but it still doesn't work as expected. There is a slight gap between when the command returns and when the server is actually up, so I added a check with ping, but it still needs a sleep.
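
A minimal sketch of the approach being discussed, assuming TorchServe's default inference port (8080): start the server with subprocess so a non-zero exit code is surfaced, then poll the /ping health endpoint instead of relying on a fixed sleep. The timeout value is illustrative.

import subprocess
import time

import requests

def start_server(timeout_s: int = 60) -> None:
    # Unlike os.system, check=True raises if torchserve exits non-zero.
    subprocess.run(
        ["torchserve", "--start", "--ts-config", "/home/model-server/config.properties"],
        check=True,
    )
    # torchserve --start returns before the server is ready, so poll the
    # health endpoint until it reports Healthy rather than sleeping blindly.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            resp = requests.get("http://localhost:8080/ping", timeout=2)
            if resp.ok and resp.json().get("status") == "Healthy":
                return
        except requests.ConnectionError:
            pass  # server not accepting connections yet
        time.sleep(1)
    raise RuntimeError("TorchServe did not become healthy within the timeout")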


### What to expect
This launches two Streamlit apps:
1. TorchServe Server app to start/stop TorchServe, load a model, scale workers up/down, and configure dynamic batch_size (currently llama-cpp-python doesn't support batch_size > 1)
Member:

It's a bit painful to use llama-cpp here; I was hoping we could instead showcase an example with export, or with MPS in eager mode.

agunapal (Collaborator, Author):

I tried a few things:

  1. Use HF 7B models with quantization -> only supported on CUDA.
  2. Use HF 7B models without quantization on CPU -> extremely slow; no one would use this.
  3. Docker with MPS -> seems like this is still not supported; even PyTorch supports only CPU in Docker (MPS-Ready, ARM64 Docker Image pytorch#81224).

So, currently this seems like the best solution. Some people seem to have tried Mistral-7B with llama-cpp-python. It's kind of mind-blowing that most existing solutions target only the GPU rich.
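
For reference, a minimal sketch of serving a 4-bit quantized model with llama-cpp-python; the GGUF file name below is hypothetical and depends on what build_image.sh actually downloads and quantizes:

from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical path: the actual file name depends on the build step.
llm = Llama(model_path="/home/model-server/model-store/llama-2-7b-chat.Q4_K_M.gguf")
output = llm(
    "Q: What does TorchServe do? A:",
    max_tokens=64,
    stop=["Q:"],  # stop before the model invents the next question
)
print(output["choices"][0]["text"])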

@agunapal agunapal requested a review from msaroufim March 7, 2024 19:37
@msaroufim msaroufim added this pull request to the merge queue Mar 16, 2024
Merged via the queue into master with commit d60ddb0 Mar 16, 2024
15 checks passed