Some minor ergonomic changes for python backend #135

Merged
merged 1 commit into from Jan 11, 2023

thakkarparth007
Collaborator

@thakkarparth007 thakkarparth007 commented Jan 2, 2023

Very simple changes:

  • Add a validation rule to ensure model is set to fastertransformer or python-backend. This avoids confusion like that in #134 ("Due to the Python backend, FauxPilot returns an inference error").
  • Add a warning if the model is unavailable, which likely means the user has not set model correctly. This is another way to help with #134, though it feels like a band-aid. I can think of two alternatives: (a) have the backend automatically pick the installed model, although a user could have installed multiple models, so this might not be the right thing to do; or (b) make the user responsible for choosing the right model by making model a required field with no default.
  • Set logprobs to -1 if not supplied. The previous behavior was a bit inconsistent: models.py declares logprobs as an optional integer defaulting to None, the code that reads it from the payload uses data.get('logprobs', None), and if the value is None, logprobs is set to 1. It seems best not to compute logprobs unless the client explicitly asks for it.
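
The three changes above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: the class name `CompletionRequest`, the helper `warn_if_unavailable`, and the default value of `model` are hypothetical, and the allowed backend names are copied from the description above, so the identifiers used in the codebase may differ.

```python
import warnings
from dataclasses import dataclass
from typing import Optional, Set

# Assumption: backend names as written in the PR description above;
# the identifiers actually used in the codebase may differ.
ALLOWED_MODELS = ("fastertransformer", "python-backend")

# Sentinel from the description: -1 means "do not compute logprobs".
NO_LOGPROBS = -1


@dataclass
class CompletionRequest:
    """Hypothetical stand-in for the request model in models.py."""

    prompt: str
    model: str = "fastertransformer"  # assumed default, for illustration only
    logprobs: Optional[int] = None

    def __post_init__(self) -> None:
        # Change 1: reject anything that is not a known backend name.
        if self.model not in ALLOWED_MODELS:
            raise ValueError(
                f"model must be one of {ALLOWED_MODELS}, got {self.model!r}"
            )
        # Change 3: a missing logprobs means "do not compute", not 1.
        if self.logprobs is None:
            self.logprobs = NO_LOGPROBS


def warn_if_unavailable(model: str, installed: Set[str]) -> bool:
    """Change 2: warn, rather than fail, when the model is not installed."""
    if model not in installed:
        warnings.warn(f"Model {model!r} is unavailable; is model set correctly?")
        return False
    return True
```

With this sketch, `CompletionRequest(prompt="int hello(){")` ends up with `logprobs == -1`, while an explicit `logprobs=0` is left unchanged.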

Signed-off-by: Parth Thakkar thakkarparth007@gmail.com

- Add validation rule to ensure model is set to fastertransformer or python-backend
- Add warning if model is unavailable, likely the user has not set model correctly

Signed-off-by: Parth Thakkar <thakkarparth007@gmail.com>
@leemgs
Contributor

leemgs commented Jan 3, 2023

I tested inference with the Python backend using this PR, hoping to avoid the unexpected issue, but I did not get a normal inference result. Others have also questioned whether it works.

Here is my evaluation result. 😭

Step 1: Set up and launch FauxPilot

(deepspeed) invain@mymate:/work/qtlab/fauxpilot$ ./setup.sh
.env already exists, do you want to delete .env and recreate it? [y/n] y
Deleting .env
Checking for curl ...
/usr/bin/curl
Checking for zstd ...
/home/invain/anaconda3/envs/deepspeed/bin/zstd
Checking for docker ...
/usr/bin/docker
Enter number of GPUs [1]:
External port for the API [5000]:
Address for Triton [triton]:
Port of Triton host [8001]:
Where do you want to save your models [/work/qtlab/fauxpilot/models]?
Choose your backend:
[1] FasterTransformer backend (faster, but limited models)
[2] Python backend (slower, but more models, and allows loading with int8)
Enter your choice [1]: 2
Models available:
[1] codegen-350M-mono (1GB total VRAM required; Python-only)
[2] codegen-350M-multi (1GB total VRAM required; multi-language)
[3] codegen-2B-mono (4GB total VRAM required; Python-only)
[4] codegen-2B-multi (4GB total VRAM required; multi-language)
Enter your choice [4]: 2
Do you want to share your huggingface cache between host and docker container? y/n [n]: n
Do you want to use int8? y/n [y]:
Config written to /work/qtlab/fauxpilot/models/py-Salesforce-codegen-350M-multi/py-model/config.pbtxt
docker: 'compose' is not a docker command.
See 'docker --help'
[+] Building 4.3s (17/17) FINISHED
 => [fauxpilot-triton internal] load build definition from Dockerfile                                                                                                                                                                                                                                                                                                                             0.6s
 => => transferring dockerfile: 32B                                                                                                                                                                                                                                                                                                                                                               0.0s
 => [fauxpilot-copilot_proxy internal] load build definition from Dockerfile                                                                                                                                                                                                                                                                                                                      0.8s
 => => transferring dockerfile: 32B                                                                                                                                                                                                                                                                                                                                                               0.0s
 => [fauxpilot-triton internal] load .dockerignore                                                                                                                                                                                                                                                                                                                                                1.2s
 => => transferring context: 35B                                                                                                                                                                                                                                                                                                                                                                  0.0s
 => [fauxpilot-copilot_proxy internal] load .dockerignore                                                                                                                                                                                                                                                                                                                                         1.0s
 => => transferring context: 35B                                                                                                                                                                                                                                                                                                                                                                  0.0s
 => [fauxpilot-copilot_proxy internal] load metadata for docker.io/library/python:3.10-slim-buster                                                                                                                                                                                                                                                                                                2.4s
 => [fauxpilot-triton internal] load metadata for docker.io/moyix/triton_with_ft:22.09                                                                                                                                                                                                                                                                                                            0.0s
 => [fauxpilot-triton 1/3] FROM docker.io/moyix/triton_with_ft:22.09                                                                                                                                                                                                                                                                                                                              0.0s
 => CACHED [fauxpilot-triton 2/3] RUN python3 -m pip install --disable-pip-version-check -U torch --extra-index-url https://download.pytorch.org/whl/cu116                                                                                                                                                                                                                                        0.0s
 => CACHED [fauxpilot-triton 3/3] RUN python3 -m pip install --disable-pip-version-check -U transformers bitsandbytes accelerate                                                                                                                                                                                                                                                                  0.0s
 => [fauxpilot-copilot_proxy] exporting to image                                                                                                                                                                                                                                                                                                                                                  1.7s
 => => exporting layers                                                                                                                                                                                                                                                                                                                                                                           0.0s
 => => writing image sha256:1e3a6721f024a29012f8f41e5bfdca2dc7c0dbdfedfe95edfe31c0fb1d2c5bcc                                                                                                                                                                                                                                                                                                      0.1s
 => => naming to docker.io/library/fauxpilot-triton                                                                                                                                                                                                                                                                                                                                               0.0s
 => => writing image sha256:6c1ee95a123bb52f3504bd38cf5699e861da93448bd15c345d4b6734f130c231                                                                                                                                                                                                                                                                                                      0.0s
 => => naming to docker.io/library/fauxpilot-copilot_proxy                                                                                                                                                                                                                                                                                                                                        0.0s
 => [auth] library/python:pull token for registry-1.docker.io                                                                                                                                                                                                                                                                                                                                     0.0s
 => [fauxpilot-copilot_proxy internal] load build context                                                                                                                                                                                                                                                                                                                                         0.2s
 => => transferring context: 1.15kB                                                                                                                                                                                                                                                                                                                                                               0.0s
 => [fauxpilot-copilot_proxy 1/5] FROM docker.io/library/python:3.10-slim-buster@sha256:b0f095dee13b2b4552d545be4f0f1c257f26810c079720c0902dc5e7f3e6b514                                                                                                                                                                                                                                          0.0s
 => CACHED [fauxpilot-copilot_proxy 2/5] WORKDIR /python-docker                                                                                                                                                                                                                                                                                                                                   0.0s
 => CACHED [fauxpilot-copilot_proxy 3/5] COPY copilot_proxy/requirements.txt requirements.txt                                                                                                                                                                                                                                                                                                     0.0s
 => CACHED [fauxpilot-copilot_proxy 4/5] RUN pip3 install --no-cache-dir -r requirements.txt                                                                                                                                                                                                                                                                                                      0.0s
 => CACHED [fauxpilot-copilot_proxy 5/5] COPY copilot_proxy .                                                                                                                                                                                                                                                                                                                                     0.0s
Config complete, do you want to run FauxPilot? [y/n] y
unknown flag: --remove-orphans
[+] Running 2/0
 ⠿ Container fauxpilot-copilot_proxy-1  Running                                                                                                                                                                                                                                                                                                                                                   0.0s
 ⠿ Container fauxpilot-triton-1         Running                                                                                                                                                                                                                                                                                                                                                   0.0s
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
fauxpilot-copilot_proxy-1  | INFO:     Shutting down
fauxpilot-copilot_proxy-1  | INFO:     Waiting for application shutdown.
fauxpilot-copilot_proxy-1  | INFO:     Application shutdown complete.
fauxpilot-copilot_proxy-1  | INFO:     Finished server process [1]
fauxpilot-copilot_proxy-1 exited with code 0
fauxpilot-copilot_proxy-1 exited with code 0
fauxpilot-triton-1         | I0103 02:23:34.782117 89 server.cc:257] Waiting for in-flight requests to complete.
fauxpilot-triton-1         | I0103 02:23:34.782160 89 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
fauxpilot-triton-1         | I0103 02:23:34.782170 89 model_repository_manager.cc:1223] unloading: py-model:1
fauxpilot-triton-1         | I0103 02:23:34.782295 89 server.cc:288] All models are stopped, unloading models
fauxpilot-triton-1         | I0103 02:23:34.782305 89 server.cc:295] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
 ........... Omission ..........

Step 2: Run a client against the REST API

(deepspeed) invain@mymate:/work/qtlab/fauxpilot$ curl -s -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"model":"py-model","prompt":"int hello(){","max_tokens":50,"temperature":0.1,"stop":["\n\n"], "logprobs": 0}' http://localhost:5000/v1/engines/codegen/completions
{"id": "cmpl-qEEtJHgU4QXZXojADgcNCT6u2OkaL", "choices": []}(deepspeed) invain@mymate:/work/qtlab/fauxpilot$
(deepspeed) invain@mymate:/work/qtlab/fauxpilot$
(deepspeed) invain@mymate:/work/qtlab/fauxpilot$
(deepspeed) invain@mymate:/work/qtlab/fauxpilot$ curl -s -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"prompt":"int hello(){","max_tokens":50,"temperature":0.1,"stop":["\n\n"]}' http://localhost:5000/v1/engines/codegen/completions
{"id": "cmpl-4Lbo1AMRmrszM0TVo2SvOTj3Rojln", "choices": []}(deepspeed) invain@mymate:/work/qtlab/fauxpilot$
(deepspeed) invain@mymate:/work/qtlab/fauxpilot$
(deepspeed) invain@mymate:/work/qtlab/fauxpilot$
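
For repeated testing, the curl calls above could be reproduced from Python. The sketch below copies the URL and payload fields from the first curl command, leaving out logprobs; the function names `build_request`, `has_completions`, and `send` are hypothetical, and `send` assumes a FauxPilot server is listening on localhost:5000.

```python
import json
import urllib.request

# Endpoint copied from the curl command above.
URL = "http://localhost:5000/v1/engines/codegen/completions"


def build_request(prompt: str, model: str = "py-model",
                  max_tokens: int = 50) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.1,
        "stop": ["\n\n"],
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )


def has_completions(body: str) -> bool:
    """Return True if the response JSON carries at least one choice."""
    return bool(json.loads(body).get("choices"))


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `has_completions(send(build_request("int hello(){")))` would return False for the empty `"choices": []` responses shown above.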

@leemgs
Contributor

leemgs commented Jan 3, 2023

It seems I'm missing something. I am looking for the cause in the FauxPilot server's log messages.

@leemgs
Contributor

leemgs commented Jan 3, 2023

I found the cause of the problem with this PR and submitted PR #137. Please verify it.

@thakkarparth007
Collaborator Author

thakkarparth007 commented Jan 3, 2023

I think what's happening is that your docker images are cached.

See how even the last line shows CACHED:
=> CACHED [fauxpilot-copilot_proxy 5/5] COPY copilot_proxy .
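
One way to check a saved build log for this symptom is to scan it for steps marked CACHED. This is only a sketch: the function names `cached_steps` and `cached_copy_steps` are hypothetical, and it assumes the log uses the `=> CACHED [...]` line format shown in the output earlier in this thread.

```python
from typing import List


def cached_steps(build_log: str) -> List[str]:
    """Return the build-log lines that docker marked as CACHED."""
    return [
        line.strip()
        for line in build_log.splitlines()
        if line.strip().startswith("=> CACHED")
    ]


def cached_copy_steps(build_log: str) -> List[str]:
    """Return only the cached COPY steps, which may hide source changes."""
    return [line for line in cached_steps(build_log) if " COPY " in line]
```

If `cached_copy_steps` reports the `COPY copilot_proxy .` step, the image may still contain the old proxy code.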

I'm not certain, but could you try running docker stop $(docker ps -aq); docker rm $(docker ps -aq); docker-compose build before running ./launch.sh?

I run that command manually. Ideally it should be included in ./launch.sh itself.
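
If that sequence were scripted, it might look like the sketch below. It assumes the rebuild step is `docker-compose build`; the function names are hypothetical, and nothing here is part of launch.sh. Note that it stops and removes every container on the host, not only the FauxPilot ones.

```python
import subprocess
from typing import List


def list_containers() -> List[str]:
    """Equivalent of `docker ps -aq`: IDs of all containers, running or not."""
    result = subprocess.run(
        ["docker", "ps", "-aq"], check=True, capture_output=True, text=True
    )
    return result.stdout.split()


def rebuild_commands(container_ids: List[str]) -> List[List[str]]:
    """Build the argv lists for: stop all, remove all, rebuild the images."""
    commands: List[List[str]] = []
    if container_ids:
        # docker stop / docker rm fail when called with no container IDs.
        commands.append(["docker", "stop", *container_ids])
        commands.append(["docker", "rm", *container_ids])
    # Assumption: the images are rebuilt with `docker-compose build`.
    commands.append(["docker-compose", "build"])
    return commands


def rebuild() -> None:
    """Run the sequence; WARNING: affects all containers on the host."""
    for command in rebuild_commands(list_containers()):
        subprocess.run(command, check=True)
```

Keeping `rebuild_commands` separate from `rebuild` makes it possible to print or review the commands before anything is stopped or removed.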

@leemgs
Contributor

leemgs commented Jan 3, 2023

> I think what's happening is that your docker images are cached.

@thakkarparth007, please refer to PR #137.

Contributor

@leemgs leemgs left a comment

Neat. Let's go ahead.

Acked-by: Geunsik Lim geunsik.lim@samsung.com

@leemgs
Contributor

leemgs commented Jan 3, 2023

@moyix Could you review and merge this PR and #137 as urgent PRs to prevent unanticipated Python-backend issues?

@thakkarparth007 thakkarparth007 merged commit 64c478d into main Jan 11, 2023