
Add python backend support #86

Merged: 8 commits, Nov 23, 2022

Conversation

@thakkarparth007 (Collaborator) commented Oct 17, 2022

Changes:

  • Modify Dockerfile to include bitsandbytes, transformers, and the latest version of PyTorch
  • Minor modifications in utils/codegen.py so that the same client works with FT and the Python backend
  • Minor modifications in launch.sh (no need to name models by GPU)
  • Add an installation script for adding a new Python model (with a super simple config_template)
  • Modify setup.sh so that it works with both FT and Python backend models

I've tested the workflow from top to bottom and it seems to work for me.

Limitations:

  • Many parameters are unused while generating the output (e.g. beam_width). See config_template.pbtxt for unused parameters.
  • Doesn't take care of batching multiple requests together.
  • Slow performance. FasterTransformer seems to be 6-7x faster than the Python backend, but this is still useful if you want to play with different models or try out a bigger model (e.g., I can fit the 2B model in my tiny 4GB GPU, which I can't with FT). Some numbers: for ~20 tokens, FT takes ~120ms while the Python backend takes 800-900ms. Most of the Python backend's time is spent in model.generate (see the sketch after this list).
  • Didn't test in a multi-GPU setup (but it should work because of device_map="auto").
  • The code is limited to installing codegen-{350M,2B}-{multi,mono} models. It can trivially allow others, but I'm not sure of their memory requirements (it should be on the order of model_params/1B GB, plus a constant).
  • Didn't modify README yet.
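For reference, here is a minimal sketch of the HuggingFace path described above (not the actual backend code in this PR; the model name and generation parameters are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-350M-mono"  # codegen-2B-{multi,mono} works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" lets accelerate place/shard the weights across whatever
# devices are available, which is why no per-GPU copy of the model is needed.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Most of the Python backend's latency is spent inside generate().
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True, temperature=0.2)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))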

Caveats:

  • Model directory style is slightly different because we don't need a separate copy of the model based on the number of available GPUs. Accelerate takes care of that for us.
  • Haven't tested with models other than the codegen models. The code should work for all decoder-only models IMO, and will probably need to be tweaked a bit to work with encoder-decoder models.

Signed-off-by: Parth Thakkar thakkarparth007@gmail.com

@moyix (Collaborator) commented Oct 18, 2022

Oh this is wonderful! This was one of my big wishlist items :D I will try to take this for a test drive later this week and hopefully get it merged shortly after #73 goes in!

Dockerfile (outdated):

# Install dependencies: torch
RUN pip3 install -U torch --extra-index-url https://download.pytorch.org/whl/cu116
RUN pip3 install -U transformers bitsandbytes accelerate
Collaborator:
moyix/triton_with_ft:22.09 comes from this repo https://github.com/moyix/fastertransformer_backend/blob/main/docker/Dockerfile

It seems more logical to me to add the dependencies there?

@fdegier (Collaborator) left a comment:

I've made a PR for you that resolves the merge conflicts etc.: thakkarparth007#1

@@ -4,7 +4,7 @@
 class OpenAIinput(BaseModel):
-    model: str
+    model: str = "fastertransformer|py-model"
Collaborator:

Does the | act as an or-operator? If not, then I think it should be fastertransformer or no default at all. What do you think?

@thakkarparth007 (Collaborator, Author):

Oh, it was meant to signal to users that both fastertransformer and py-model are valid values. I wasn't sure how else to convey that, so I ended up using this unusual format. I guess a better way to convey it would be to just note it in the README/docs and let users who want the Python backend set it on their own.

Or maybe there's a way to annotate this information so that it shows up in the Swagger UI. I'll try to see if there's such an option, or fall back to the above option.
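For example, one possible way to surface the valid values in the generated Swagger UI would be to constrain the field like this (a sketch only; I haven't verified how FastAPI renders it, and the rest of the model is omitted):

from typing import Literal

from pydantic import BaseModel, Field


class OpenAIinput(BaseModel):
    # Literal restricts the accepted values and shows up as an enum in the
    # OpenAPI schema; the description appears next to the field in Swagger UI.
    model: Literal["fastertransformer", "py-model"] = Field(
        default="fastertransformer",
        description="Which backend to use: 'fastertransformer' or 'py-model'.",
    )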

    image: moyix/triton_with_ft:22.09
    build:
      context: .
      dockerfile: Dockerfile
Collaborator:

As mentioned in my other comment, I think the dependencies should be added in moyix/triton_with_ft

@thakkarparth007 (Collaborator, Author):

Sounds good to me. I can make a PR to that repo instead.

    command: bash -c "CUDA_VISIBLE_DEVICES=${GPUS} mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=/model"
    shm_size: '2gb'
    volumes:
      - ${MODEL_DIR}:/model
      - ${HF_CACHE_DIR}:/root/.cache/huggingface
Collaborator:

If no HF_CACHE_DIR is set, this breaks the deployment. Maybe we should default to true?

@thakkarparth007 (Collaborator, Author):

The current version sets HF_CACHE_DIR to /tmp/hf_cache. The reason I didn't default it to true is that Docker messes up the file permissions by setting many of them to root (unless rootless Docker is used). So the cache is shared only if the user knows what they're doing.

I could default it to true and warn the user about the permission issues. Not sure which is the better option.

@fdegier, @moyix thoughts?

@fdegier (Collaborator) commented Oct 25, 2022:

@thakkarparth007 if a user does not use the cache, the volume is set to empty, which cannot be mounted and causes docker compose up to fail; hence why I suggested defaulting to true, but I understand the permission issues you mentioned.

I think adding something like HF_DATASETS_CACHE="fauxpilot/.hf-cache" and removing the option to cache would mean we always cache and don't mess up permissions? The cache path needs to be verified; consider it just an example.

@thakkarparth007 (Collaborator, Author):

Whoops, I just saw your comment @fdegier.

Currently, if you look at setup.sh (https://github.com/moyix/fauxpilot/pull/86/files#diff-4209d788ad32c40cbda3c66b3de47eefb929308ca703bb77a6382625986add17R148), you'll see that HF_CACHE_DIR is set to /tmp/hf_cache if the user doesn't want to share the HF cache.

But yes, perhaps it's better to store the cache in the fauxpilot directory itself. Updated it!

thakkarparth007 and others added 2 commits October 21, 2022 13:23
@thakkarparth007 (Collaborator, Author) commented Oct 21, 2022

@fdegier thanks for your PR that merges the python_backend code on top of #73.

I made a few small changes on top of that (except your last commit, since my changes were already in progress). Specifically, I fixed some issues in the setup.sh file and added a test_setup.py script that performs an E2E test of the Python backend functionality.

@moyix this test script builds towards your CI feature from the wishlist. I've run the test locally (you just need to invoke the script using pytest). The script currently doesn't need a GPU, so it can be integrated into a free-tier CI tool.

With a few modifications, it could be used to check the fastertransformer backend as well. I don't know whether the fastertransformer backend can work without a GPU, though.
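To give an idea of the shape of such a check, here is a rough sketch (the real test_setup.py does more, e.g. running setup.sh and bringing the stack up first; the endpoint path, port, and payload fields below are assumptions based on the OpenAI-style proxy):

import requests


def test_py_model_completion():
    # Ask the proxy for a short completion from the python-backend model.
    resp = requests.post(
        "http://localhost:5000/v1/engines/py-model/completions",
        json={"prompt": "def hello():", "max_tokens": 16, "temperature": 0.1},
        timeout=120,
    )
    assert resp.status_code == 200
    body = resp.json()
    # The proxy mimics the OpenAI completions response format.
    assert len(body["choices"]) > 0
    assert isinstance(body["choices"][0]["text"], str)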

@moyix mentioned this pull request Oct 21, 2022
@thakkarparth007 (Collaborator, Author) commented Nov 9, 2022

Update: fixed the test and merged with main.

To run the test, you can just run pytest tests from the root directory.

I'd been using dlpack for copying Triton tensors to torch tensors, which I did because it was advertised to perform zero-copy transfers. It turns out that only worked on my laptop and didn't work on other machines; I don't know why. For now, I'm just copying the tensors as triton<->numpy<->torch. That works on the VM on which the earlier code was segfaulting.
Signed-off-by: Parth <thakkarparth007@gmail.com>
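For illustration, the two transfer strategies look roughly like this (a sketch of the idea, not a verbatim excerpt of this PR's backend code):

import torch
import triton_python_backend_utils as pb_utils
from torch.utils.dlpack import from_dlpack


def triton_input_to_torch(request, name):
    in_tensor = pb_utils.get_input_tensor_by_name(request, name)

    # dlpack path: advertised as zero-copy, but it segfaulted on some machines.
    # return from_dlpack(in_tensor.to_dlpack())

    # numpy path used instead: an extra copy (triton -> numpy -> torch), but portable.
    return torch.from_numpy(in_tensor.as_numpy())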
@moyix merged commit 92dc571 into fauxpilot:main on Nov 23, 2022