I was wondering how the class token is supposed to be handled in the reversible design, since replicating the token across the two residual paths is perhaps not optimal.
Any thoughts or pointers to code are appreciated.
@karttikeya for the Reformer, I'd follow what this paper has done (https://arxiv.org/abs/2103.17239): only have the CLS token cross-attend to the full sequence for about 2-3 rounds at the end, as a means of attention pooling.
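To make the suggestion concrete, here is a rough, forward-only NumPy sketch of that kind of attention pooling. It is not code from this repository or from the paper: the function names (`class_attention`, `pool_with_cls`), the shapes, and the random weights standing in for learned parameters are all assumptions for illustration, and the feed-forward sub-layer is left out for brevity. `tokens` stands for whatever the reversible trunk outputs once its two residual streams have been merged (how to merge them is left open here). The point is that the CLS token never passes through the reversible blocks, so it does not need to be replicated across the two paths.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm over the feature axis (no learned scale/shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def class_attention(cls, tokens, wq, wk, wv, wo, heads):
    """One pooling round: only the CLS token queries; the sequence is read-only."""
    b, n, d = tokens.shape
    dh = d // heads
    # Keys and values come from the CLS token concatenated with the sequence.
    context = layer_norm(np.concatenate([cls, tokens], axis=1))  # (b, n + 1, d)
    q = (layer_norm(cls) @ wq).reshape(b, 1, heads, dh).transpose(0, 2, 1, 3)
    k = (context @ wk).reshape(b, n + 1, heads, dh).transpose(0, 2, 1, 3)
    v = (context @ wv).reshape(b, n + 1, heads, dh).transpose(0, 2, 1, 3)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh))  # (b, heads, 1, n + 1)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(b, 1, d)
    return cls + out @ wo  # residual update applied to the CLS token only

def pool_with_cls(tokens, cls_init, layers, heads=4):
    """Run a few class-attention rounds on top of the trunk output."""
    b, _, d = tokens.shape
    cls = np.broadcast_to(cls_init, (b, 1, d))  # CLS is introduced only here
    for wq, wk, wv, wo in layers:
        cls = class_attention(cls, tokens, wq, wk, wv, wo, heads)
    return cls[:, 0]  # (b, d) pooled representation

# Hypothetical usage with random weights in place of trained parameters.
rng = np.random.default_rng(0)
dim, rounds = 32, 2
tokens = rng.normal(size=(2, 16, dim))   # stand-in for the merged trunk output
cls_init = rng.normal(size=(1, 1, dim))  # stand-in for a learned CLS embedding
layers = [
    tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(4))
    for _ in range(rounds)
]
pooled = pool_with_cls(tokens, cls_init, layers)
print(pooled.shape)  # (2, 32)
```

Because the sequence tokens are never updated in these rounds, the extra cost is linear in sequence length per round, and the reversible blocks themselves stay unchanged.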