
Conversation

@desertfire
Contributor

Summary: This improves average tokens/sec from 33.43 to 72.63 on A100 for AOTI.

```
python3 torchchat.py export llama3 \
  --quantize '{"precision": {"dtype":"bfloat16"}, "executor":{"accelerator":"cuda"}}' \
  --output-dso-path /tmp/model16.so && \
python3 torchchat.py generate llama3 \
  --dso-path /tmp/model16.so \
  --prompt "Once upon a time," \
  --max-new-tokens 256 \
  --device cuda \
  --num-samples 3
```
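
For context, the tokens/sec figure quoted above is the generation throughput averaged over the sampled runs (the command requests `--num-samples 3`). The sketch below is illustrative only and is not torchchat's actual benchmarking code; `generate_tokens` is a hypothetical stand-in for the model's generation call.

```python
import time

def measure_tokens_per_sec(generate_tokens, prompt, max_new_tokens, num_samples=3):
    """Illustrative sketch: average generation throughput over several samples.

    `generate_tokens` is a hypothetical callable that returns the generated
    tokens for a prompt; it stands in for the real generate path.
    """
    rates = []
    for _ in range(num_samples):
        start = time.perf_counter()
        tokens = generate_tokens(prompt, max_new_tokens)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed)
    return sum(rates) / len(rates)
```
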
@desertfire requested a review from Jack-Khuu on August 6, 2024 at 02:55
@pytorch-bot

pytorch-bot bot commented Aug 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1013

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5a9a4b5 with merge base 46e3ab7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Aug 6, 2024
@desertfire requested a review from byjlw on August 6, 2024 at 12:42
@desertfire merged commit 67f678b into pytorch:main on Aug 6, 2024
@desertfire deleted the aoti_2 branch on August 6, 2024 at 13:20