Some minor ergonomic changes for python backend #135
Conversation
- Add a validation rule to ensure `model` is set to fastertransformer or python-backend
- Add a warning if the model is unavailable, since the user has likely not set `model` correctly

Signed-off-by: Parth Thakkar <thakkarparth007@gmail.com>
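The validation rule described above can be sketched as follows. This is a hypothetical illustration, not FauxPilot's actual code; the allowed values are taken from the description, and the function and constant names are made up here:

```python
# Hypothetical sketch of the 'model' validation rule described above.
# The names ALLOWED_MODELS and validate_model are illustrative only.
ALLOWED_MODELS = {"fastertransformer", "python-backend"}

def validate_model(payload: dict) -> str:
    """Reject requests whose 'model' field is missing or unknown."""
    model = payload.get("model")
    if model not in ALLOWED_MODELS:
        raise ValueError(
            f"'model' must be one of {sorted(ALLOWED_MODELS)}, got {model!r}; "
            "this usually means the client did not set 'model' correctly"
        )
    return model
```

Failing fast like this turns the silent "empty choices" symptom into an explicit error message.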
Using this PR, I experimented with python-backend inference to work around an unexpected issue, but I didn't get a normal inference result. Others have also questioned whether it works. Here is my evaluation result. 😭

Step 1: Set up and launch FauxPilot

(deepspeed) invain@mymate:/work/qtlab/fauxpilot$ ./setup.sh
.env already exists, do you want to delete .env and recreate it? [y/n] y
Deleting .env
Checking for curl ...
/usr/bin/curl
Checking for zstd ...
/home/invain/anaconda3/envs/deepspeed/bin/zstd
Checking for docker ...
/usr/bin/docker
Enter number of GPUs [1]:
External port for the API [5000]:
Address for Triton [triton]:
Port of Triton host [8001]:
Where do you want to save your models [/work/qtlab/fauxpilot/models]?
Choose your backend:
[1] FasterTransformer backend (faster, but limited models)
[2] Python backend (slower, but more models, and allows loading with int8)
Enter your choice [1]: 2
Models available:
[1] codegen-350M-mono (1GB total VRAM required; Python-only)
[2] codegen-350M-multi (1GB total VRAM required; multi-language)
[3] codegen-2B-mono (4GB total VRAM required; Python-only)
[4] codegen-2B-multi (4GB total VRAM required; multi-language)
Enter your choice [4]: 2
Do you want to share your huggingface cache between host and docker container? y/n [n]: n
Do you want to use int8? y/n [y]:
Config written to /work/qtlab/fauxpilot/models/py-Salesforce-codegen-350M-multi/py-model/config.pbtxt
docker: 'compose' is not a docker command.
See 'docker --help'
[+] Building 4.3s (17/17) FINISHED
=> [fauxpilot-triton internal] load build definition from Dockerfile 0.6s
=> => transferring dockerfile: 32B 0.0s
=> [fauxpilot-copilot_proxy internal] load build definition from Dockerfile 0.8s
=> => transferring dockerfile: 32B 0.0s
=> [fauxpilot-triton internal] load .dockerignore 1.2s
=> => transferring context: 35B 0.0s
=> [fauxpilot-copilot_proxy internal] load .dockerignore 1.0s
=> => transferring context: 35B 0.0s
=> [fauxpilot-copilot_proxy internal] load metadata for docker.io/library/python:3.10-slim-buster 2.4s
=> [fauxpilot-triton internal] load metadata for docker.io/moyix/triton_with_ft:22.09 0.0s
=> [fauxpilot-triton 1/3] FROM docker.io/moyix/triton_with_ft:22.09 0.0s
=> CACHED [fauxpilot-triton 2/3] RUN python3 -m pip install --disable-pip-version-check -U torch --extra-index-url https://download.pytorch.org/whl/cu116 0.0s
=> CACHED [fauxpilot-triton 3/3] RUN python3 -m pip install --disable-pip-version-check -U transformers bitsandbytes accelerate 0.0s
=> [fauxpilot-copilot_proxy] exporting to image 1.7s
=> => exporting layers 0.0s
=> => writing image sha256:1e3a6721f024a29012f8f41e5bfdca2dc7c0dbdfedfe95edfe31c0fb1d2c5bcc 0.1s
=> => naming to docker.io/library/fauxpilot-triton 0.0s
=> => writing image sha256:6c1ee95a123bb52f3504bd38cf5699e861da93448bd15c345d4b6734f130c231 0.0s
=> => naming to docker.io/library/fauxpilot-copilot_proxy 0.0s
=> [auth] library/python:pull token for registry-1.docker.io 0.0s
=> [fauxpilot-copilot_proxy internal] load build context 0.2s
=> => transferring context: 1.15kB 0.0s
=> [fauxpilot-copilot_proxy 1/5] FROM docker.io/library/python:3.10-slim-buster@sha256:b0f095dee13b2b4552d545be4f0f1c257f26810c079720c0902dc5e7f3e6b514 0.0s
=> CACHED [fauxpilot-copilot_proxy 2/5] WORKDIR /python-docker 0.0s
=> CACHED [fauxpilot-copilot_proxy 3/5] COPY copilot_proxy/requirements.txt requirements.txt 0.0s
=> CACHED [fauxpilot-copilot_proxy 4/5] RUN pip3 install --no-cache-dir -r requirements.txt 0.0s
=> CACHED [fauxpilot-copilot_proxy 5/5] COPY copilot_proxy . 0.0s
Config complete, do you want to run FauxPilot? [y/n] y
unknown flag: --remove-orphans
[+] Running 2/0
⠿ Container fauxpilot-copilot_proxy-1 Running 0.0s
⠿ Container fauxpilot-triton-1 Running 0.0s
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
fauxpilot-copilot_proxy-1 | INFO: Shutting down
fauxpilot-copilot_proxy-1 | INFO: Waiting for application shutdown.
fauxpilot-copilot_proxy-1 | INFO: Application shutdown complete.
fauxpilot-copilot_proxy-1 | INFO: Finished server process [1]
fauxpilot-copilot_proxy-1 exited with code 0
fauxpilot-copilot_proxy-1 exited with code 0
fauxpilot-triton-1 | I0103 02:23:34.782117 89 server.cc:257] Waiting for in-flight requests to complete.
fauxpilot-triton-1 | I0103 02:23:34.782160 89 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
fauxpilot-triton-1 | I0103 02:23:34.782170 89 model_repository_manager.cc:1223] unloading: py-model:1
fauxpilot-triton-1 | I0103 02:23:34.782295 89 server.cc:288] All models are stopped, unloading models
fauxpilot-triton-1 | I0103 02:23:34.782305 89 server.cc:295] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
........... Omission ..........

Step 2: Run a client with the REST API

(deepspeed) invain@mymate:/work/qtlab/fauxpilot$ curl -s -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"model":"py-model","prompt":"int hello(){","max_tokens":50,"temperature":0.1,"stop":["\n\n"], "logprobs": 0}' http://localhost:5000/v1/engines/codegen/completions
{"id": "cmpl-qEEtJHgU4QXZXojADgcNCT6u2OkaL", "choices": []}

(deepspeed) invain@mymate:/work/qtlab/fauxpilot$ curl -s -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"prompt":"int hello(){","max_tokens":50,"temperature":0.1,"stop":["\n\n"]}' http://localhost:5000/v1/engines/codegen/completions
{"id": "cmpl-4Lbo1AMRmrszM0TVo2SvOTj3Rojln", "choices": []}
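For reference, the first curl call above can be reproduced from Python with the standard library. This is a generic sketch (no FauxPilot client code is assumed); the endpoint and payload are copied from the transcript, and the empty-`choices` response is the symptom being reported:

```python
import json
import urllib.request

# Payload copied from the curl command in the transcript above.
payload = {
    "model": "py-model",
    "prompt": "int hello(){",
    "max_tokens": 50,
    "temperature": 0.1,
    "stop": ["\n\n"],
    "logprobs": 0,
}

def completions_request(url: str, payload: dict) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = completions_request(
    "http://localhost:5000/v1/engines/codegen/completions", payload)
# urllib.request.urlopen(req) would send it. The transcript shows the
# server answering with an empty "choices" list, for example:
response = json.loads('{"id": "cmpl-example", "choices": []}')
assert response["choices"] == []  # the bug: no completions returned
```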
It seems I'm missing something. I'm looking for the cause in the FauxPilot server's log messages.
I found a cause of the problem with this PR and submitted PR #137. Please verify it.
I think what's happening is that your docker images are cached. See how even the last line shows [...]. I'm not certain, but could you try running [...]?

I do run that command manually. Ideally it should be included in [...].
@thakkarparth007, please refer to PR #137.
Neat. Let's go ahead.
Acked-by: Geunsik Lim geunsik.lim@samsung.com
Very simple changes:

- Add a validation rule to ensure `model` is set to fastertransformer or python-backend. This is to avoid confusion as in "Due to the Python backend, FauxPilot returns an inference error" (#134).
- Make `model` a required, non-default field.
- `models.py` has `logprobs` set as an optional integer, defaulting to None. The code that reads this variable from the payload is like `data.get('logprobs', None)`, and if it's None, the logprobs value is set to 1. It seems best not to compute logprobs unless the client explicitly asks for it.

Signed-off-by: Parth Thakkar <thakkarparth007@gmail.com>
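The logprobs change can be illustrated with a minimal sketch. The function names here are illustrative, not FauxPilot's exact code; only the `data.get('logprobs', None)` read is taken from the description:

```python
from typing import Optional

def logprobs_before(data: dict) -> int:
    """Old behaviour: a missing/None 'logprobs' was coerced to 1,
    so log-probabilities were always computed."""
    logprobs = data.get("logprobs", None)
    return 1 if logprobs is None else logprobs

def logprobs_after(data: dict) -> Optional[int]:
    """Changed behaviour: keep None, so log-probabilities are only
    computed when the client explicitly asks for them."""
    return data.get("logprobs", None)
```

With the change, a request that omits `logprobs` skips the extra computation entirely instead of silently getting `logprobs=1`.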