
longT5 BetterTransformer implementation #1506

Open
omri-sap opened this issue Nov 1, 2023 · 5 comments

Comments

omri-sap commented Nov 1, 2023

Feature request

longT5 BetterTransformer implementation

Motivation

Encoder-decoder models trained on large contexts enable long-context machine translation tasks.

Your contribution

I looked at the implementation for regular T5 and it doesn't look too complex. I tried to implement it myself but didn't succeed. If I can contribute, please let me know.
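For reference, this is roughly the call I would like to work; a sketch assuming the usual Optimum entry point (`google/long-t5-tglobal-base` is just an example checkpoint):

```python
# Rough sketch of the desired usage; today Optimum raises NotImplementedError
# because there is no BetterTransformer layer mapping for LongT5.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from optimum.bettertransformer import BetterTransformer

model_id = "google/long-t5-tglobal-base"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# The request: make this work for LongT5 the same way it already works for T5.
model = BetterTransformer.transform(model, keep_original_model=False)

inputs = tokenizer("summarize: " + "long document text " * 200, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```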

Thank you,
Omri


pszemraj commented Nov 8, 2023

Seconding this! It would be great.

@matvey-kolbasov-hs

Totally on board with this! Would love to see this feature added!

@omri-sap
Author

@fxmarty can we try to tackle this together?

Thanks in advance

Collaborator

fxmarty commented Dec 13, 2023

Hi, for reference, we are upstreaming SDPA in Transformers; it may be a better fit for longT5: huggingface/transformers#28005

Leaving this open as we may leverage nested tensors for longt5 (which are not in Transformers).
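Once an architecture has SDPA support in Transformers, it is selected through the `attn_implementation` argument of `from_pretrained`; a rough sketch (`t5-small` is only an example checkpoint here, and longT5 coverage depends on the PR above and follow-up work):

```python
# Sketch: selecting the PyTorch scaled_dot_product_attention ("sdpa") backend in Transformers.
# from_pretrained raises an error if the chosen architecture does not implement SDPA yet.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "t5-small"  # example checkpoint; the same flag would apply to LongT5 once supported
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # routes attention through torch.nn.functional.scaled_dot_product_attention
).to("cuda")

inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```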

@ENate

ENate commented Dec 19, 2023

Hi @all. Is this still open, or are you planning to work on it?
