
Sliding window for transformer #61

Open
MarcusLoppe opened this issue Feb 11, 2024 · 1 comment

MarcusLoppe (Contributor) commented Feb 11, 2024

Hi,
I was wondering if it would be possible to implement a sliding-window decoder for the transformer?
When increasing the max sequence length, training time goes up dramatically, since full self-attention scales quadratically with sequence length, while a fixed window of size w keeps each token attending to at most w neighbours and scales roughly linearly. I think a sliding-window decoder would greatly help with both training and inference speed.

I've tried using LocalAttention, but I'm not sure how to properly integrate it, since its forward takes q, k and v directly.
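
Here is roughly what I tried, a minimal sketch assuming the `LocalAttention` module from lucidrains/local-attention (the window size, head count, and dimensions are arbitrary placeholders, not values from this repo):

```python
import torch
from torch import nn
from local_attention import LocalAttention

class SlidingWindowSelfAttention(nn.Module):
    # hypothetical wrapper: project hidden states to q, k, v heads,
    # then let LocalAttention restrict each query to its local window
    def __init__(self, dim = 512, heads = 8, window_size = 256):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.dim_head = dim // heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias = False)
        self.to_out = nn.Linear(dim, dim, bias = False)
        self.attn = LocalAttention(
            dim = self.dim_head,       # per-head dimension
            window_size = window_size, # queries attend within this window
            causal = True,             # decoder, so autoregressive
            look_backward = 1          # each window also sees the previous one
        )

    def forward(self, x, mask = None):
        b, n, _ = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim = -1)
        # split heads: (b, n, h * d) -> (b, h, n, d), the shape LocalAttention expects
        q, k, v = (t.view(b, n, self.heads, self.dim_head).transpose(1, 2) for t in (q, k, v))
        out = self.attn(q, k, v, mask = mask)        # (b, h, n, d)
        out = out.transpose(1, 2).reshape(b, n, -1)  # merge heads back
        return self.to_out(out)

# sequence length should be a multiple of window_size (or pass autopad = True)
x = torch.randn(2, 2048, 512)
out = SlidingWindowSelfAttention()(x)  # (2, 2048, 512)
```

Is wiring it up like this the right idea, or is there more to it for the decoder?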

I know @lucidrains has already spent all their allotted time and more on this project, so if I could be provided with some tips, I could try to implement it myself.

lucidrains (Owner) commented Feb 28, 2024

@MarcusLoppe local attention is a good exercise to implement, moderate difficulty for a research engineer. getting the kv cache working earns bonus points.
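
for anyone attempting it, a minimal sketch of the sliding-window kv cache idea at inference time (names and shapes are illustrative, not an existing API in this repo): keep only the last `window_size` keys / values, so the cache and the per-step attention cost stay constant as decoding proceeds

```python
import torch

class SlidingWindowKVCache:
    # illustrative helper: retain only the last `window_size` key / value
    # pairs per layer, so memory stays bounded during autoregressive decoding
    def __init__(self, window_size = 256):
        self.window_size = window_size
        self.k = None  # (b, h, n, d) once populated
        self.v = None

    def update(self, k_new, v_new):
        # append the new step's keys / values, then truncate to the window
        self.k = k_new if self.k is None else torch.cat((self.k, k_new), dim = -2)
        self.v = v_new if self.v is None else torch.cat((self.v, v_new), dim = -2)
        self.k = self.k[..., -self.window_size:, :]
        self.v = self.v[..., -self.window_size:, :]
        return self.k, self.v

# decoding loop: each new query attends to at most `window_size` cached keys
cache = SlidingWindowKVCache(window_size = 4)
for step in range(8):
    k_new, v_new = torch.randn(1, 8, 1, 64), torch.randn(1, 8, 1, 64)
    k, v = cache.update(k_new, v_new)
    assert k.shape[-2] == min(step + 1, 4)
```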
