Hi,
I was wondering if it would be possible to implement a sliding window decoder for the transformer?
When increasing the max sequence length, the training time goes up dramatically, and I think that using a sliding window decoder would greatly help with both training and inference speed.
I've tried using LocalAttention, but I'm not sure how to properly wire it in, since it takes q, k and v as inputs.
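Here is a rough sketch of what I imagine the wrapper would look like: project x to q, k and v, split into heads, and hand those to LocalAttention. The LocalAttention constructor arguments and expected tensor shapes here are taken from the local-attention README as I understand it and may not match the installed version, so please correct me if the kwargs or shapes are off.

```python
import torch
from torch import nn
from local_attention import LocalAttention

class SlidingWindowSelfAttention(nn.Module):
    """Causal self-attention restricted to a local window (rough sketch).

    Assumes LocalAttention accepts q, k, v shaped (batch, heads, seq_len, dim_head)
    and a boolean key padding mask shaped (batch, seq_len).
    """
    def __init__(self, dim, heads = 8, dim_head = 64, window_size = 256, dropout = 0.):
        super().__init__()
        inner_dim = heads * dim_head
        self.heads = heads
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias = False)
        self.to_out = nn.Linear(inner_dim, dim)
        self.attn = LocalAttention(
            dim = dim_head,            # per-head dimension
            window_size = window_size, # size of the local attention window
            causal = True,             # decoder-style (autoregressive) masking
            look_backward = 1,         # each window also attends to the previous window
            look_forward = 0,
            dropout = dropout
        )

    def forward(self, x, mask = None):
        b, n, _ = x.shape
        h = self.heads
        # project to queries, keys, values and split into heads
        q, k, v = self.to_qkv(x).chunk(3, dim = -1)
        q, k, v = map(lambda t: t.reshape(b, n, h, -1).transpose(1, 2), (q, k, v))
        out = self.attn(q, k, v, mask = mask)    # (b, h, n, dim_head)
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)

# quick shape check
layer = SlidingWindowSelfAttention(dim = 512, window_size = 256)
x = torch.randn(1, 1024, 512)
out = layer(x)  # expected: (1, 1024, 512)
```

I'm also assuming the sequence length needs to be a multiple of window_size (or padded to one), but I'm not certain about that either.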
I know @lucidrains has already spent all their allotted time and more on this project, so if I could get some tips I could try to implement it myself.