This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Conversation

@swolchok
Contributor

With the current default behavior, performance for models such as stories110M without custom SDPA is poor because the QKV tensors are long (8192 in the last dim). Limiting the max sequence length remedies this.
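To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It assumes that, without custom SDPA, each decode step attends over the full preallocated sequence length, so the work grows linearly with that length. The function name `attention_decode_cost`, the head count, the head size, and the limit of 128 are all illustrative assumptions, not values taken from this PR.

```python
def attention_decode_cost(max_seq_length: int, n_heads: int, head_dim: int) -> int:
    """Estimate multiply-adds for one decode step of attention.

    Assumption: attention spans the full preallocated sequence length,
    not just the tokens generated so far.
    """
    # Q @ K^T: one query row against max_seq_length keys, per head.
    scores = n_heads * max_seq_length * head_dim
    # softmax(scores) @ V: weighted sum over max_seq_length values, per head.
    weighted_sum = n_heads * max_seq_length * head_dim
    return scores + weighted_sum


# Hypothetical model shape; 8192 is the length mentioned in the description,
# 128 is an arbitrary smaller limit chosen for illustration.
default_cost = attention_decode_cost(8192, n_heads=12, head_dim=64)
limited_cost = attention_decode_cost(128, n_heads=12, head_dim=64)

# The ratio depends only on the two sequence lengths.
print(default_cost // limited_cost)  # 64
```

Under these assumptions the per-step attention cost shrinks in direct proportion to the sequence length limit, which is consistent with the remedy described above. Real speedups will depend on the backend and on how much of the runtime attention accounts for.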

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Sep 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1192

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d702862 with merge base 04ea309:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Base automatically changed from gh/swolchok/12/head to main September 24, 2024 19:19
@swolchok swolchok merged commit c40c6bb into main Sep 24, 2024
100 checks passed
@swolchok swolchok deleted the gh/swolchok/13/head branch September 24, 2024 19:19

Labels

CLA Signed This label is managed by the Meta Open Source bot.


4 participants