Optimizing LSH attention runtime #43
The LSH attention is considerably slower than the vanilla full attention; around 8 times slower in my experiments (with sequences shorter than 512 time steps).

I know LSH attention is supposed to be slower than vanilla attention, since there is all the bucketing overhead, and for smaller sequences we can use full attention. But I wonder whether we can still get some improvements that would affect running times on longer sequences.

@lucidrains if you believe there's not much to optimize, then please just close this issue.

Comments

@erickrf yes, I think this was noted in this chart in the paper. The payoffs only come at sequence lengths of ~2048 or above, depending on the hash rate. I introduced an extra keyword

@erickrf there is also perhaps room to improve on LSH itself: https://science.sciencemag.org/content/358/6364/793/tab-figures-data. But that is more a topic for longer-term research. I would be happy to make the framework more flexible for experimentation, if that is something you would be interested in.

I was thinking more of the implementation than of the research problem of LSH itself. Well, I'll just leave it for now, then.

@erickrf if you have any ideas for improving performance, PRs are welcome :)
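A minimal sketch of the length-based dispatch discussed in this thread, assuming a single attention head and plain numpy. The dispatcher and the `full_attn_thres` keyword name here are illustrative assumptions, not the library's actual API:

```python
import numpy as np

def full_attention(q, k, v):
    # Standard O(n^2) scaled dot-product attention for a single head.
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def attend(q, k, v, lsh_attention, full_attn_thres=2048):
    # Hypothetical dispatcher (`full_attn_thres` is an assumed name):
    # below the threshold, LSH's bucketing and sorting overhead
    # outweighs its asymptotic savings, so fall back to full attention.
    if q.shape[0] <= full_attn_thres:
        return full_attention(q, k, v)
    return lsh_attention(q, k, v)
```

The threshold is the practical fix for the slowdown reported above: short sequences take the O(n^2) path, and LSH only kicks in where its savings can pay for the overhead.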