Hi, you're right; the implementation there diverges slightly from what we described in the paper. In my experience the change makes virtually no difference, because the rFF at the end of the preceding ISAB/SAB block serves the same role. Recovering the block from the paper should be a simple two-line change: add a linear layer (`nn.Linear(dim, dim)`) and feed `X` through it before the MAB.
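A minimal sketch of that two-line change, for anyone landing here later. The `MAB` below is a simplified stand-in built on `nn.MultiheadAttention`, not the repository's actual block, and the exact module names and shapes are assumptions:

```python
import torch
import torch.nn as nn

class MAB(nn.Module):
    """Simplified stand-in for the repository's Multihead Attention Block."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fc = nn.Linear(dim, dim)

    def forward(self, Q, K):
        H = Q + self.attn(Q, K, K)[0]      # multihead attention + residual
        return H + torch.relu(self.fc(H))  # block-internal rFF + residual

class PMA(nn.Module):
    """PMA with the paper's row-wise feed-forward on X restored."""
    def __init__(self, dim, num_heads, num_seeds):
        super().__init__()
        self.S = nn.Parameter(torch.empty(1, num_seeds, dim))
        nn.init.xavier_uniform_(self.S)
        self.rFF = nn.Linear(dim, dim)     # line 1: the added rFF
        self.mab = MAB(dim, num_heads)

    def forward(self, X):
        # line 2: feed X through rFF before the MAB,
        # so PMA(S, X) = MAB(S, rFF(X)) as in the paper
        return self.mab(self.S.repeat(X.size(0), 1, 1), self.rFF(X))

pma = PMA(dim=8, num_heads=2, num_seeds=1)
out = pma(torch.randn(4, 10, 8))  # pools a set of 10 elements into 1 seed
```

Output shape is `(batch, num_seeds, dim)` either way; only the transform applied to `X` before attention changes.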
Dear Juho,
First of all, thank you for the implementation! It has been very helpful to my understanding of the architecture.
I ran into an apparent discrepancy between the code and the paper, and I was wondering if you could help clear it up. In particular, the PMA implementation seems to be missing the row-wise feed-forward (rFF) layer mentioned in the paper:
The PMA code:
To me this reads `PMA(S, X) = MAB(S, X)`, rather than the `MAB(S, rFF(X))` of the paper.
Thanks!
Tim