Example llama3 on inf2 #3133

lxning · 2024-05-04T02:40:49Z

Description

Please read our CONTRIBUTING.md prior to creating your first pull request.

Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

Test continuous batching

python examples/large_models/utils/test_llm_streaming_response.py -m llama-3-70b -o 50 -t 2 -n 4 --prompt-text "Today the weather is really nice and I am planning on " --prompt-randomize
Tasks are completed
payload={'prompt': 'q a k h f n u u p w d l k w g q s Today the weather is really nice and I am planning on ', 'max_new_tokens': 67}
, output=q a k h f n u u p w d l k w g q s Today the weather is really nice and I am planning on  1) ___________ to the beach with my friends. We are going to 2) ___________ a picnic and 3) ___________ some games. I am going to 4) ___________ my camera so I can take some pictures. I am also going to 5) ___________ my sunglasses and

payload={'prompt': 'i y b n d d n k l o r j j o x c u Today the weather is really nice and I am planning on ', 'max_new_tokens': 67}
, output=i y b n d d n k l o r j j o x c u Today the weather is really nice and I am planning on  1) (go) to the beach with my friends. We are going to 2) (take) a picnic and 3) (have) a great time. I am going to 4) (wear) my new bikini and 5) (lie) on the beach all day. I am going to 6

payload={'prompt': 'a x y w x Today the weather is really nice and I am planning on ', 'max_new_tokens': 55}
, output=a x y w x Today the weather is really nice and I am planning on  1. going to the beach. 2. going to the park. 3. going to the cinema. 4. going to the zoo. 5. going to the museum. 6. going to the theatre. 7. going to the swimming pool

payload={'prompt': 't z c n j o t i o h z n s r f Today the weather is really nice and I am planning on ', 'max_new_tokens': 65}
, output=t z c n j o t i o h z n s r f Today the weather is really nice and I am planning on  1) going to the beach. I am going to take my 2) camera with me. I am going to take some 3) pictures of the 4) sea and the 5) sand. I am going to take my 6) sunglasses with me because the sun is really 7) bright.

payload={'prompt': 'p d Today the weather is really nice and I am planning on ', 'max_new_tokens': 52}
, output=p d Today the weather is really nice and I am planning on   going to the beach. I am going to the beach with my family. I am going to the beach with my family because I want to have fun with them. I am going to the beach with my family because I want to have fun with them.

payload={'prompt': 'v l j d c h Today the weather is really nice and I am planning on ', 'max_new_tokens': 56}
, output=v l j d c h Today the weather is really nice and I am planning on  2 things. First, I am going to go to the park and play some basketball. Then, I am going to go to the mall and buy some new clothes. I am going to buy a new pair of shoes, a new shirt, and a new pair of pants.

payload={'prompt': 'v m w i s x x x w l g c Today the weather is really nice and I am planning on ', 'max_new_tokens': 62}
, output=v m w i s x x x w l g c Today the weather is really nice and I am planning on  1) going to the beach 2) going to the park 3) going to the mall 4) going to the movies 5) going to the zoo 6) going to the museum 7) going to the library 8) going to the park 9) going to the mall

payload={'prompt': 'e c k e l b j p j s Today the weather is really nice and I am planning on ', 'max_new_tokens': 60}
, output=e c k e l b j p j s Today the weather is really nice and I am planning on  1. going to the beach 2. going to the park 3. going to the mall 4. going to the movies 5. going to the zoo 6. going to the library 7. going to the museum 8. going to the park 9. going to
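In the log above, each randomized prompt carries a prefix of n random lowercase letters, and its `max_new_tokens` is 50 + n (for example, 17 letters with 67 tokens, or 2 letters with 52 tokens). The following is a minimal sketch of how such payloads could be built. It is inferred from the logged output only and is not the test script's actual code; the helper name `randomized_payload` and the `max_prefix_len` bound are hypothetical.

```python
import random
import string


def randomized_payload(prompt_text, max_new_tokens, rng, max_prefix_len=20):
    """Prepend 1..max_prefix_len random lowercase letters to the prompt and
    grow max_new_tokens by the same count (pattern inferred from the logs)."""
    n = rng.randint(1, max_prefix_len)
    prefix = " ".join(rng.choice(string.ascii_lowercase) for _ in range(n))
    return {
        "prompt": f"{prefix} {prompt_text}",
        "max_new_tokens": max_new_tokens + n,
    }


rng = random.Random(0)  # seeded so the sketch is reproducible
base_prompt = "Today the weather is really nice and I am planning on "
payloads = [randomized_payload(base_prompt, 50, rng) for _ in range(8)]
for p in payloads:
    print(p)
```

Varying both the prompt length and the generation length means the requests in one batch finish at different times, which is the situation continuous batching is meant to handle.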

Test microbatch + streamer

python examples/large_models/utils/test_llm_streaming_response.py -m llama-3-70b -o 50 -t 2 -n 4 --prompt-text "Today the weather is really nice and I am planning on "
Tasks are completed
payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take
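All eight responses above are identical for the identical prompt, which is what one would expect if decoding is deterministic (an assumption; the sampling settings are not shown in this PR). A small sketch for checking that property on a captured log follows. The helper names `parse_log` and `outputs_consistent` are hypothetical, and the parser assumes the `payload=...` / `, output=...` line layout shown above.

```python
import ast


def parse_log(text):
    """Parse 'payload={...}' / ', output=...' line pairs into (payload, output) tuples."""
    records, payload = [], None
    for line in text.splitlines():
        if line.startswith("payload="):
            # The payload is printed as a Python dict literal.
            payload = ast.literal_eval(line[len("payload="):])
        elif line.startswith(", output=") and payload is not None:
            records.append((payload, line[len(", output="):]))
            payload = None
    return records


def outputs_consistent(records):
    """True if every distinct (prompt, max_new_tokens) pair maps to a single output."""
    seen = {}
    for payload, output in records:
        key = (payload["prompt"], payload["max_new_tokens"])
        if seen.setdefault(key, output) != output:
            return False
    return True


sample = """payload={'prompt': 'Today ', 'max_new_tokens': 50}
, output=Today  going to the beach.

payload={'prompt': 'Today ', 'max_new_tokens': 50}
, output=Today  going to the beach.
"""
records = parse_log(sample)
print(len(records), outputs_consistent(records))
```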

Checklist:

Did you have fun?
Have you added tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation?

mreso

LGTM. I left some comments; please address the move of the Neuron handlers into the example folder before merging.

mreso · 2024-05-08T00:28:32Z

examples/large_models/inferentia2/llama/Readme.md

@@ -1,6 +1,6 @@
 # Large model inference on Inferentia2

-This folder briefs on serving the [Llama 2](https://huggingface.co/meta-llama) model on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) for text completion with TorchServe's features:
+This folder briefs on serving the [Llama 2 and Llama 3](https://huggingface.co/meta-llama) model a on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) for text completion with TorchServe's features:


"...model on an AWS Inferentia2..."?

mreso · 2024-05-08T00:40:56Z

...rge_models/inferentia2/llama/continuous_batching/base_neuronx_continuous_batching_handler.py

Seems like something went wrong with your previous PR, where you said you moved the handlers into the example dir. Please make sure the file in ts/torch_handler/distributed/ gets cleaned up. My assumption is that this file got moved here. Please clarify if there were changes.

lxning · 2024-05-08

The previous PR is about microbatching + streamer. This one is about continuous batching. They use two different base handlers.

mreso · 2024-05-08

I see, not sure if we're referring to the same PR. I meant #3035, which touched ts/torch_handler/distributed/base_neuronx_continuous_batching_handler.py as well as files under examples/large_models/inferentia2/llama2/continuous_batching, so I assumed that we were talking about moving the cb_handler as well. Anyway, please make sure to remove ts/torch_handler/distributed/base_neuronx_continuous_batching_handler.py with this PR.

mreso · 2024-05-08T00:45:32Z

examples/large_models/inferentia2/llama/continuous_batching/llama3-model-config.yaml

+    tp_degree: 24
+    max_length: 256
+    max_new_tokens: 50
+    batch_size: 8


Is paged attention supported on Inferentia? Does a batch size of 8 give enough flexibility in that case? It would be good to discuss this in the documentation.

lxning · 2024-05-08

Will run a benchmark and update the batch size.
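To make the batch-size question concrete, here is a toy model of how a fixed number of batch slots behaves under static batching versus continuous batching. It is a sketch under simplifying assumptions (one token per request per decode step, no prefill cost, no KV-cache memory limits) and is not how the Neuron handler is implemented; both function names are hypothetical.

```python
def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest request finishes."""
    return sum(
        max(lengths[i:i + batch_size]) for i in range(0, len(lengths), batch_size)
    )


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a freed slot is refilled from the queue immediately."""
    queue = list(lengths)
    slots = []  # remaining tokens for each active request
    steps = 0
    while queue or slots:
        # Refill free slots before each decode step.
        while queue and len(slots) < batch_size:
            slots.append(queue.pop(0))
        steps += 1
        # Each active request emits one token; finished requests free their slot.
        slots = [remaining - 1 for remaining in slots if remaining > 1]
    return steps


lengths = [4, 1, 1, 1]  # tokens to generate per request (illustrative values)
print(static_batching_steps(lengths, 2), continuous_batching_steps(lengths, 2))
```

In this model the gain from continuous batching depends on how much request lengths vary and on how many requests are queued beyond the slot count, which is one reason a benchmark is a sensible way to choose `batch_size`.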

add llama3 support

5c38ccc

lxning added the example label May 4, 2024

lxning added this to the v0.11.0 milestone May 4, 2024

lxning requested review from mreso and agunapal May 4, 2024 02:40

lxning self-assigned this May 4, 2024

lxning and others added 3 commits May 3, 2024 19:49

delete model config yaml

6c88a31

update model config

695c8fa

Merge branch 'master' into feat/llama3

6289c90

lxning added this to In Review in v0.11.0 lifecycle May 4, 2024

Merge branch 'master' into feat/llama3

e73e273

mreso approved these changes May 8, 2024

fix typo

8dc076d

lxning enabled auto-merge May 8, 2024 03:58

mreso disabled auto-merge May 8, 2024 04:25

lxning added this pull request to the merge queue May 8, 2024

Merged via the queue into master with commit 0b4539f May 8, 2024
10 of 12 checks passed
