Hi @lucidrains
I was wondering whether it would make sense to you if I created a pull request where the user can choose between GELU or Mish as the activation function.
The explanation of Mish can be found here:
The GitHub is here:
And the discussion can be found here:
Here is a little benchmark:
![image](https://user-images.githubusercontent.com/4557464/76566339-d9d52b00-64ac-11ea-90f9-0437527d1f9b.png)
If I'm not mistaken, there is only one place in the `reformer_pytorch` library where you define `GELU_`, in the `FeedForward` layer; I could add a parameter to the constructor as a flag.
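To illustrate, here is a minimal sketch of what I have in mind (the `activation` keyword and the simplified `FeedForward` body are just assumptions on my part, not the actual reformer_pytorch code):

```python
import torch
import torch.nn.functional as F
from torch import nn

class Mish(nn.Module):
    # Mish activation: x * tanh(softplus(x))
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

class FeedForward(nn.Module):
    # NOTE: hypothetical simplified FeedForward, only to show
    # where the proposed flag would plug in
    def __init__(self, dim, mult = 4, activation = 'gelu'):
        super().__init__()
        act = nn.GELU() if activation == 'gelu' else Mish()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * mult),
            act,
            nn.Linear(dim * mult, dim)
        )

    def forward(self, x):
        return self.net(x)

# usage
ff = FeedForward(dim = 512, activation = 'mish')
out = ff(torch.randn(1, 10, 512))  # -> torch.Size([1, 10, 512])
```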
Let me know what you think about it.
Thank you,
Cal
@lucidrains I just missed that part! Thank you.
Btw, I'm training an encoder-decoder Reformer architecture with RangerLars, DeepSpeed, and FP16 optimization via Apex, and it's working very well.