Optimizing LSH attention runtime #43
The LSH attention is considerably slower than the vanilla full attention; around 8 times slower in my experiments (with sequences shorter than 512 time steps).

I know LSH attention is supposed to be slower than vanilla attention, since there is all the bucketing overhead, and for smaller sequences we can use full attention. But I wonder whether we can still get some improvements that would affect running times on longer sequences.

@lucidrains if you believe there's not much to optimize, then please just close this issue.

Comments

@erickrf yes, I think this was noted in this chart in the paper. The payoffs only come at sequence lengths of ~2048 or above, depending on the hash rate. I introduced an extra keyword

@erickrf there is also perhaps room to improve on LSH itself: https://science.sciencemag.org/content/358/6364/793/tab-figures-data. But that is more a topic for longer-term research. I would be happy to make the framework more flexible for experimentation, if that is something you would be interested in.

I was thinking more of the implementation than of the research problem of LSH itself. Well, I'll just leave it for now, then.

@erickrf if you have any ideas for improving performance, PRs are welcome :)
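A minimal sketch of the length-based dispatch discussed in this thread, assuming a single attention head and plain numpy. The dispatcher and the `full_attn_thres` keyword name here are illustrative assumptions, not the library's actual API:

```python
import numpy as np

def full_attention(q, k, v):
    # Standard O(n^2) scaled dot-product attention for a single head.
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def attend(q, k, v, lsh_attention, full_attn_thres=2048):
    # Hypothetical dispatcher (`full_attn_thres` is an assumed name):
    # below the threshold, LSH's bucketing and sorting overhead
    # outweighs its asymptotic savings, so fall back to full attention.
    if q.shape[0] <= full_attn_thres:
        return full_attention(q, k, v)
    return lsh_attention(q, k, v)
```

The threshold is the practical fix for the slowdown reported above: short sequences take the O(n^2) path, and LSH only kicks in where its savings can pay for the overhead.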