Making this work with relative position bias from XTransformers #5

Open
pfeatherstone opened this issue Dec 2, 2022 · 5 comments

pfeatherstone commented Dec 2, 2022

Is there a way to make this work with RelativePositionBias? Currently it produces an attention bias of size $B \times H \times N^2$ (i.e. shape [B, H, N, N]), where B is the batch size, H is the number of heads and N is the sequence length. Can this bias be chunked and computed per chunk instead?
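For reference, here is a minimal sketch of what computing the bias per (query chunk, key chunk) tile could look like. It assumes a T5-style learned bucket table (roughly what RelativePositionBias does), with a deliberately simplified bucketing scheme; it is not the actual x-transformers code.

```python
import torch

# Hypothetical sketch: compute a T5-style relative position bias for a single
# (query chunk, key chunk) tile, so the full B x H x N x N bias is never built.
# `rel_emb` is assumed to be a torch.nn.Embedding(num_buckets, heads) bias table;
# the bucketing below is a naive clamp-and-scale, not the real T5/x-transformers scheme.
def chunked_rel_pos_bias(rel_emb, q_start, q_len, k_start, k_len,
                         num_buckets=32, max_distance=128):
    q_pos = torch.arange(q_start, q_start + q_len)
    k_pos = torch.arange(k_start, k_start + k_len)
    rel = k_pos[None, :] - q_pos[:, None]                      # [q_len, k_len] offsets
    rel = rel.clamp(-max_distance, max_distance) + max_distance
    bucket = (rel * (num_buckets - 1)) // (2 * max_distance)   # [q_len, k_len] bucket indices
    bias = rel_emb(bucket)                                      # [q_len, k_len, heads]
    return bias.permute(2, 0, 1).unsqueeze(0)                   # [1, heads, q_len, k_len]

# Usage sketch: inside a chunked attention loop, add this tile to the chunk's
# attention logits instead of slicing a precomputed full-size [B, H, N, N] bias.
# rel_emb = torch.nn.Embedding(32, heads)
# tile = chunked_rel_pos_bias(rel_emb, q_start=0, q_len=64, k_start=0, k_len=64)
```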

lucidrains (Owner) commented Dec 5, 2022

@pfeatherstone if you are working with 1d sequences, the best approach would be dynamic positional bias (https://github.com/lucidrains/x-transformers#dynamic-positional-bias), which is O(n)

the other alternative is the ALiBi positional embedding, which only needs to be materialized within each block, but may come with some limitations (unidirectional, forced local attending, etc.)
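a rough sketch of the per-block idea for ALiBi (using the common powers-of-two slope heuristic; not code from this repo or x-transformers): the bias depends only on relative distance, so each tile can be built on the fly

```python
import torch

# Sketch: an ALiBi-style causal bias for one (query chunk, key chunk) tile.
# Slopes follow the usual 2^(-8 * h / heads) heuristic, which may differ in
# detail from actual implementations.
def alibi_block_bias(heads, q_start, q_len, k_start, k_len):
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / heads) for h in range(heads)])
    q_pos = torch.arange(q_start, q_start + q_len)
    k_pos = torch.arange(k_start, k_start + k_len)
    dist = (q_pos[:, None] - k_pos[None, :]).clamp(min=0)   # causal distance; 0 for future keys
    return -slopes[:, None, None] * dist                    # [heads, q_len, k_len]
```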

lucidrains (Owner) commented:

@pfeatherstone which module are you using from this repository?

you should be using the CUDA implementation from here

pfeatherstone (Author) commented:

@lucidrains Actually, I've just realized that you can pass attn_bias to both the normal and the memory-efficient attention, and it can have dimensions up to [B, H, L, S], where L is the target length and S is the context length. So you can use that for any additional masking (by filling with -float('inf')) or positional encoding. Correct?
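If so, a sketch of that usage (the `attention` call below is a placeholder for whichever implementation accepts attn_bias, not a real function from this repo):

```python
import torch

# Sketch: fold a positional bias and a key padding mask into one additive
# attn_bias of shape [B, H, L, S]; masked keys get -inf so they vanish after
# the softmax. `attention` is a placeholder for either the normal or the
# memory-efficient implementation that accepts an attn_bias argument.
B, H, L, S, D = 2, 8, 128, 128, 64
q = torch.randn(B, H, L, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)

pos_bias = torch.randn(1, H, L, S)                    # e.g. a relative position bias
key_mask = torch.ones(B, 1, 1, S, dtype=torch.bool)   # True = attend, False = masked out
attn_bias = pos_bias.expand(B, H, L, S).masked_fill(~key_mask, -float('inf'))

# out = attention(q, k, v, attn_bias=attn_bias)       # placeholder call
```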

pfeatherstone (Author) commented Dec 6, 2022

I need to use something that can be exported to ONNX. I don't think https://github.com/hazyResearch/flash-attention will work through torch.onnx.export().

Memory-efficient attention is great because it yields exactly the same result as normal attention, so I can train with the memory-efficient option turned on, then export to ONNX using normal attention.

Correct me if I'm wrong, but I don't think this will work with flash attention?
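If that is right, the workflow could look roughly like this (a sketch; the module and argument names are assumptions, not this repo's actual API):

```python
import torch

# Sketch: train with a memory-efficient attention path, then flip to plain
# attention before torch.onnx.export, relying on the two paths producing
# numerically identical outputs. Both attention callables are assumed to
# share the signature (q, k, v, attn_bias=None).
class SwitchableAttention(torch.nn.Module):
    def __init__(self, attend_efficient, attend_plain):
        super().__init__()
        self.attend_efficient = attend_efficient   # chunked / memory-efficient attention
        self.attend_plain = attend_plain           # standard softmax(QK^T / sqrt(d)) V attention
        self.use_efficient = True

    def forward(self, q, k, v, attn_bias=None):
        attend = self.attend_efficient if self.use_efficient else self.attend_plain
        return attend(q, k, v, attn_bias=attn_bias)

# During training: model.use_efficient = True
# Before export:   model.use_efficient = False
#                  torch.onnx.export(model, (q, k, v), "model.onnx")
```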

pfeatherstone (Author) commented:

I've also kind of given up on the memory-efficient implementation; it is cripplingly slow to train.
