Thanks for your great work. Have you run any experiments applying ReZero to Transformers of different depths, e.g. a 1-layer Transformer and its performance, a 2-layer Transformer and its performance, and so on? Does it also speed up convergence in networks that are not very deep? Thank you.
In our paper, we experimented with 12-layer Transformers and observed a speedup in convergence. We have not tried shallower models, but we hypothesize that ReZero should bring speed improvements there as well.
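For reference, the core ReZero idea is small enough to sketch directly: each residual connection `x + F(x)` is replaced by `x + alpha * F(x)`, where `alpha` is a learnable scalar initialized to zero, so every block starts as the identity map regardless of depth. Below is a minimal NumPy sketch (hypothetical helper names, not the authors' code; the sublayer is a stand-in for attention or a feed-forward network):

```python
import numpy as np

class ReZeroBlock:
    """Sketch of one ReZero residual block: out = x + alpha * F(x)."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for a Transformer sublayer (attention or FFN):
        # here just a random linear map followed by a ReLU.
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.alpha = 0.0  # learnable residual weight, initialized to zero

    def forward(self, x):
        f = np.maximum(x @ self.w, 0.0)  # sublayer output F(x)
        return x + self.alpha * f        # ReZero: identity map at init

x = np.ones((2, 4))
block = ReZeroBlock(dim=4)
print(np.allclose(block.forward(x), x))  # True at init, since alpha == 0
```

Because every block is initially the identity, signals propagate unchanged through an arbitrarily deep stack at the start of training, which is the mechanism behind the faster convergence; the same mechanism applies whether the stack has 1 layer or 12.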