Hi, you're right; the implementation there diverges slightly from what we described in the paper. In my experience the change makes virtually no difference, because the rFF at the end of the preceding ISAB/SAB block serves the same role. Recovering the block from the paper should be a simple two-line change: add a linear layer (`nn.Linear(dim, dim)`) and feed `X` through it before the MAB.
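A minimal sketch of that two-line change, for anyone landing here later. The `MAB` below is a simplified stand-in built on `nn.MultiheadAttention`, not the repository's actual block, and the exact module names and shapes are assumptions:

```python
import torch
import torch.nn as nn

class MAB(nn.Module):
    """Simplified stand-in for the repository's Multihead Attention Block."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fc = nn.Linear(dim, dim)

    def forward(self, Q, K):
        H = Q + self.attn(Q, K, K)[0]      # multihead attention + residual
        return H + torch.relu(self.fc(H))  # block-internal rFF + residual

class PMA(nn.Module):
    """PMA with the paper's row-wise feed-forward on X restored."""
    def __init__(self, dim, num_heads, num_seeds):
        super().__init__()
        self.S = nn.Parameter(torch.empty(1, num_seeds, dim))
        nn.init.xavier_uniform_(self.S)
        self.rFF = nn.Linear(dim, dim)     # line 1: the added rFF
        self.mab = MAB(dim, num_heads)

    def forward(self, X):
        # line 2: feed X through rFF before the MAB,
        # so PMA(S, X) = MAB(S, rFF(X)) as in the paper
        return self.mab(self.S.repeat(X.size(0), 1, 1), self.rFF(X))

pma = PMA(dim=8, num_heads=2, num_seeds=1)
out = pma(torch.randn(4, 10, 8))  # pools a set of 10 elements into 1 seed
```

Output shape is `(batch, num_seeds, dim)` either way; only the transform applied to `X` before attention changes.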
Dear Juho,
First of all, thank you for the implementation! It has been very helpful to my understanding of the architecture.
I ran into an apparent discrepancy between the code and the paper, and I was wondering if you could help clear it up. In particular, the PMA implementation seems to be missing the row-wise feed-forward (rFF) layer mentioned in the paper:
The PMA code:
To me this reads `PMA(S, X) = MAB(S, X)`, rather than the `MAB(S, rFF(X))` of the paper.
Thanks!
Tim