Optimizing LSH attention runtime #43

Closed
erickrf opened this issue Feb 21, 2020 · 4 comments

Comments

@erickrf

erickrf commented Feb 21, 2020

The LSH attention is considerably slower than the vanilla full attention; around 8 times slower in my experiments (with sequences shorter than 512 time steps).

I know LSH is supposed to be slower than vanilla attention since there's all the bucketing overhead, and for smaller sequences we can use full attention. But I wonder if we can still get some improvements that will affect running times on longer sequences.

@lucidrains if you believe there's not much to optimize, then please just close this issue.

@lucidrains
Owner

lucidrains commented Feb 21, 2020

@erickrf yes, I think this was noted in this chart in the paper
[Image: screenshot of the chart from the paper]

The payoff only comes at sequence lengths of about 2048 or above, depending on the hash rate. I introduced an extra keyword, full_attn_thres, which you can set to your desired threshold to switch automatically between full and LSH attention, so that no time or computation is wasted on shorter sequences. The other thing to note is that you could set n_hashes lower. Per my conversations with some of the authors, 4 was enough in most of their runs.
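To make the trade-off above concrete, here is a rough back-of-the-envelope sketch, not the library's implementation. It assumes chunked LSH attention where each query attends to its own chunk and the previous one in every hash round, and it ignores hashing and sorting overhead entirely, so real runtimes will favor full attention for longer than these counts suggest. The function names, the `bucket_size=64` value, and the "at or below the threshold" switching rule are assumptions for illustration only.

```python
def full_attention_scores(seq_len: int) -> int:
    """Query-key scores for full attention: every query sees every key."""
    return seq_len * seq_len


def lsh_attention_scores(seq_len: int, n_hashes: int, bucket_size: int) -> int:
    """Rough query-key score count for chunked LSH attention.

    Assumption: in each hash round a query attends to its own chunk and
    the previous chunk (2 * bucket_size keys). Bucketing and sorting
    costs are NOT counted here.
    """
    return n_hashes * seq_len * 2 * bucket_size


def pick_attention(seq_len: int, full_attn_thres: int) -> str:
    """Hypothetical dispatcher: full attention at or below the threshold."""
    return "full" if seq_len <= full_attn_thres else "lsh"


if __name__ == "__main__":
    # bucket_size=64 is an illustrative value, not taken from the thread.
    for seq_len in (512, 2048, 8192):
        full = full_attention_scores(seq_len)
        for n_hashes in (4, 8):
            lsh = lsh_attention_scores(seq_len, n_hashes, bucket_size=64)
            mode = pick_attention(seq_len, full_attn_thres=2048)
            print(f"len={seq_len} n_hashes={n_hashes} "
                  f"full={full} lsh={lsh} -> {mode}")
```

Under these assumptions, LSH with 8 hashes already computes more scores than full attention at 512 tokens, before any bucketing cost is added. Halving n_hashes halves the LSH count, which is why both knobs matter for short and medium sequences.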

@lucidrains
Owner

@erickrf there is also perhaps room to improve on LSH itself: https://science.sciencemag.org/content/358/6364/793/tab-figures-data But that is more a topic for longer-term research. I would be happy to make the framework more flexible for experimentation if that is something you would be interested in.

@erickrf
Author

erickrf commented Feb 21, 2020

I was thinking more of implementation than the research problem of LSH itself. Well, I'll just leave it for now then.

@erickrf erickrf closed this as completed Feb 21, 2020
@lucidrains
Owner

@erickrf if you have any ideas for improving performance, PRs are welcome :)
