
Added dropout support to memory efficient variant #6

Merged
merged 1 commit into lucidrains:main on Dec 30, 2022

Conversation

usryokousha
Contributor

Hey Phil,

I have been using this repository for a project and wanted to add dropout for completeness. I checked for consistency with the perceiver-ar implementation. I hope this is helpful.

-Matt
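
For readers landing here later, a minimal sketch (not the actual diff in this PR) of where conventional attention dropout sits, namely on the normalized attention weights after the softmax; in the chunked, memory-efficient variant the same mask would need to be applied per key/value chunk so the result matches the non-chunked computation:

```python
# minimal sketch of standard (unstructured) attention dropout
# not the PR's actual code; shapes and argument names are illustrative
import torch
import torch.nn.functional as F

def attention_with_dropout(q, k, v, dropout_p = 0., training = True):
    # q, k, v: (batch, heads, seq, dim_head)
    scale = q.shape[-1] ** -0.5
    sim = torch.einsum('b h i d, b h j d -> b h i j', q, k) * scale
    attn = sim.softmax(dim = -1)
    # dropout is applied to the attention matrix, after the softmax
    attn = F.dropout(attn, p = dropout_p, training = training)
    return torch.einsum('b h i j, b h j d -> b h i d', attn, v)
```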

@lucidrains
Owner

@usryokousha oh sure, thanks Matt! just for your information, the field is slowly starting to realize that traditional dropout is pretty useless

however, structured dropout, like https://github.com/lucidrains/x-transformers#forgetful-causal-mask or https://arxiv.org/abs/2206.00826, can still be used, but it would not need to exist within the attention operation
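
A hedged sketch of the structured alternative mentioned above (forgetful causal masking, as described in the x-transformers README): the dropout is expressed as a random key mask generated outside the attention operation, so it composes with any attention kernel instead of living inside it. Function and argument names here are illustrative.

```python
import torch

def forgetful_causal_mask(batch, seq_len, mask_prob = 0.1, device = None):
    # during training, randomly drop a fraction of past tokens from the keys
    # True = keep, False = masked out (same convention as a key padding mask)
    return torch.rand(batch, seq_len, device = device) >= mask_prob

# usage: pass this as the key mask to any attention implementation,
# alongside (not instead of) the usual causal mask
mask = forgetful_causal_mask(2, 1024, mask_prob = 0.15)
```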

@lucidrains
Owner

let's merge it for completeness' sake though! hope rabe or flash attention is working well for your project! just one more note: you should use the CUDA implementation here for optimal performance
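
For illustration only, and assuming PyTorch >= 2.0 rather than the specific CUDA package referenced above: the built-in scaled_dot_product_attention dispatches to a fused flash-attention CUDA kernel when the inputs allow it, and already accepts a dropout_p argument, so attention dropout does not have to be hand-rolled.

```python
# illustrative usage of PyTorch's fused attention (not the package referenced above)
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 1024, 64, device = 'cuda', dtype = torch.float16)
k = torch.randn(2, 8, 1024, 64, device = 'cuda', dtype = torch.float16)
v = torch.randn(2, 8, 1024, 64, device = 'cuda', dtype = torch.float16)

# dropout_p applies standard attention dropout inside the fused kernel
out = F.scaled_dot_product_attention(q, k, v, dropout_p = 0.1, is_causal = True)
```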

@lucidrains lucidrains merged commit c37fbd2 into lucidrains:main Dec 30, 2022
@usryokousha
Contributor Author

Phil, thanks for pointing out the two papers on dropout! I wonder how the Bayesformer paper's proposed dropout holds up in non-causal attention. In my own experiments I have always turned dropout off because I found it hurt training. The CUDA-optimized flash attention package looks very appealing, and it is going to help in my future projects for sure!
