
Is MultiHeadPooling the same as in the paper? #12

Open
AbnerCode opened this issue Sep 17, 2019 · 1 comment

Comments

AbnerCode commented Sep 17, 2019

hi @nlpyang

I have some questions.

  1. In your paper, equation (15) is never actually used, so why do you propose it?

  2. I don't understand why equations (13) and (14) are computed the way they are. Can you give some explanation? In addition, a_z and b_z do not appear in your code:

        # per-head scalar scores and per-head values
        scores = self.linear_keys(key)
        value = self.linear_values(value)

        # reshape to (batch, heads, seq_len): one scalar score per position
        scores = shape(scores, 1).squeeze(-1)
        value = shape(value)
        # key_len = key.size(2)
        # query_len = query.size(2)
        #
        # scores = torch.matmul(query, key.transpose(2, 3))

        if mask is not None:
            mask = mask.unsqueeze(1).expand_as(scores)
            scores = scores.masked_fill(mask, -1e18)

You also don't compute the scores the way the paper describes. Why?
Best Wishes!

nlpyang (Owner) commented Sep 17, 2019

  1. Equation 16 has a typo: it should use \hat{a}, not a.

  2. You can think of a and b as the scores and values, which do appear in the code as

    scores = self.linear_keys(key)

    value = self.linear_values(value)

  3. I can't see how this is different from the paper.
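To make the correspondence concrete, here is a minimal NumPy sketch of multi-head pooling as discussed above: each head projects every position to a scalar score (the a's) and a value vector (the b's), softmaxes the scores over positions, and takes the score-weighted sum of the values. The function and weight names (`multi_head_pooling`, `w_scores`, `w_values`) are illustrative, not the repository's actual parameters.

```python
import numpy as np

def multi_head_pooling(x, w_scores, w_values, heads):
    """Pool a (seq_len, d_model) sequence into a single d_model vector.

    Illustrative sketch, not the repo's implementation:
      - w_scores: (d_model, heads)   -> one scalar score per position per head
      - w_values: (d_model, d_model) -> one value vector per position per head
    """
    seq_len, d_model = x.shape
    dim_per_head = d_model // heads

    # Per-head scalar scores, shape (seq_len, heads)
    scores = x @ w_scores
    # Per-head values, shape (seq_len, heads, dim_per_head)
    values = (x @ w_values).reshape(seq_len, heads, dim_per_head)

    # Softmax over the sequence dimension, independently per head
    scores = scores - scores.max(axis=0, keepdims=True)
    attn = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)

    # Score-weighted sum over positions, then concatenate the heads
    pooled = (attn[:, :, None] * values).sum(axis=0)  # (heads, dim_per_head)
    return pooled.reshape(d_model)
```

Masking a position would correspond to setting its score to a large negative value (as `masked_fill(mask, -1e18)` does in the snippet above) before the softmax.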
