This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Conversation

@desertfire (Contributor) commented Aug 5, 2024

Summary: The --dynamic-shapes option defaults to False. When the actual inputs have static shapes, exporting with static shapes lets more Inductor optimizations take effect down the line. This change by itself improves average tokens/sec from 29.60 to 33.43 on A100. Follow-up PRs will provide further perf gains.

Repro command:

python3 torchchat.py export llama3 --quantize '{"precision": {"dtype":"bfloat16"}, "executor":{"accelerator":"cuda"}}' --output-dso-path /tmp/model16.so && python3 torchchat.py generate llama3 --dso-path /tmp/model16.so --prompt "Once upon a time," --max-new-tokens 256 --device cuda --num-samples 3
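
For context on what the flag controls, here is a minimal, hypothetical sketch (a toy model, not torchchat's code) of how a --dynamic-shapes style option typically gates the dynamic_shapes argument of torch.export:

# Hypothetical sketch: gating torch.export's dynamic_shapes argument
# behind a flag (toy model, not torchchat's actual export path).
import torch
from torch.export import Dim, export

class TinyModel(torch.nn.Module):
    def forward(self, tokens, input_pos):
        return tokens + input_pos

# Example inputs with sequence length 2 so that dim can be marked dynamic.
example_inputs = (
    torch.tensor([[1, 2]], dtype=torch.long),
    torch.tensor([0, 1], dtype=torch.long),
)

use_dynamic_shapes = False  # mirrors --dynamic-shapes defaulting to False
if use_dynamic_shapes:
    seq = Dim("seq", min=2, max=2048)
    # Dim 1 of `tokens` and dim 0 of `input_pos` vary together.
    dynamic_shapes = ({1: seq}, {0: seq})
else:
    # Static shapes: Inductor can specialize kernels on the exact sizes.
    dynamic_shapes = None

ep = export(TinyModel(), example_inputs, dynamic_shapes=dynamic_shapes)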

@desertfire requested a review from Jack-Khuu August 5, 2024 15:08
@pytorch-bot bot commented Aug 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1011

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 334654e with merge base 912917f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Aug 5, 2024
@Jack-Khuu (Contributor)

Great to see the perf gains. Is anything lost by switching over to static? Does model support change?

@Jack-Khuu requested a review from malfet August 5, 2024 19:24
@desertfire (Contributor, Author)

> Great to see the perf gains. Is anything lost by switching over to static? Does model support change?

ET export is doing the same, so it should be fine:

# Example inputs used by ET export: a single prompt token and its
# position, both with static shapes.
input = (
    torch.tensor([[1]], dtype=torch.long, device=device),
    torch.tensor([0], dtype=torch.long, device=device),
)
state_dict = model.state_dict()
# Infer the checkpoint dtype from the first entry in the state dict.
state_dict_dtype = state_dict[next(iter(state_dict))].dtype
target_precision = get_precision()
# No dynamic shapes: export specializes on the example input shapes.
dynamic_shapes = None
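
For reference, a rough, hypothetical sketch of how static example inputs like these would feed an AOTInductor compile of that era (toy model; torch._export.aot_compile was a private API and has since been superseded):

# Hypothetical sketch of the era's AOTInductor export path, not
# torchchat's actual code (torch._export.aot_compile is private).
import torch

class TinyModel(torch.nn.Module):
    def forward(self, tokens, input_pos):
        return tokens * 2 + input_pos

device = "cuda" if torch.cuda.is_available() else "cpu"
example_inputs = (
    torch.tensor([[1]], dtype=torch.long, device=device),
    torch.tensor([0], dtype=torch.long, device=device),
)

# dynamic_shapes=None treats the example shapes as exact, so Inductor
# can specialize its generated kernels on them.
so_path = torch._export.aot_compile(
    TinyModel(),
    example_inputs,
    dynamic_shapes=None,
    options={"aot_inductor.output_path": "/tmp/tiny_model.so"},
)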

@desertfire changed the title from "[AOTI] Change export to use static shapes" to "[AOTI] Add a --dynamic-shapes option to export" Aug 5, 2024
@Jack-Khuu (Contributor) commented Aug 6, 2024

Amazing, thanks for fixing this.

One last ask: can you put the repro commands for the numbers in your description?

@desertfire (Contributor, Author)

> Amazing, thanks for fixing this.
>
> One last ask: can you put the repro commands for the numbers in your description?

Done. We should add a benchmarking script so that everyone can run the same experiment.
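
(A hypothetical sketch of such a benchmarking helper; generate_fn is a placeholder callable returning the number of generated tokens, not a torchchat API.)

# Hypothetical benchmarking helper for average tokens/sec;
# generate_fn is a placeholder, not a torchchat API.
import time

def average_tokens_per_sec(generate_fn, prompt, max_new_tokens=256, num_samples=3):
    rates = []
    for _ in range(num_samples):
        start = time.perf_counter()
        num_tokens = generate_fn(prompt, max_new_tokens)
        rates.append(num_tokens / (time.perf_counter() - start))
    # Average tokens/sec over the sampled runs.
    return sum(rates) / len(rates)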

@desertfire merged commit 46e3ab7 into pytorch:main Aug 6, 2024
@desertfire deleted the aoti_1 branch August 6, 2024 02:40