
Is MultiHeadPooling the same as in the paper? #12

Open
AbnerCode opened this issue Sep 17, 2019 · 1 comment

Comments

AbnerCode commented Sep 17, 2019

hi @nlpyang

I have some questions.

  1. In your paper, equation (15) is never actually used, so why do you propose it?

  2. I don't understand why equations (13) and (14) are computed the way they are. Can you give some explanation? In addition, a_z and b_z do not appear in your code:

        # per-head scalar scores and per-head values
        scores = self.linear_keys(key)
        value = self.linear_values(value)

        # reshape to (batch, heads, seq_len): one scalar score per position
        scores = shape(scores, 1).squeeze(-1)
        value = shape(value)
        # key_len = key.size(2)
        # query_len = query.size(2)
        #
        # scores = torch.matmul(query, key.transpose(2, 3))

        if mask is not None:
            mask = mask.unsqueeze(1).expand_as(scores)
            scores = scores.masked_fill(mask, -1e18)

You also don't compute the scores the way the paper describes. Why?
Best Wishes!

nlpyang (Owner) commented Sep 17, 2019

  1. Equation 16 has a typo: it should use \hat{a}, not a.

  2. You can think of a and b as the scores and values, which do appear in the code as

    scores = self.linear_keys(key)

    value = self.linear_values(value)

  3. I can't see how this is different from the paper.
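To make the correspondence concrete, here is a minimal NumPy sketch of multi-head pooling as discussed above: each head projects every position to a scalar score (the a's) and a value vector (the b's), softmaxes the scores over positions, and takes the score-weighted sum of the values. The function and weight names (`multi_head_pooling`, `w_scores`, `w_values`) are illustrative, not the repository's actual parameters.

```python
import numpy as np

def multi_head_pooling(x, w_scores, w_values, heads):
    """Pool a (seq_len, d_model) sequence into a single d_model vector.

    Illustrative sketch, not the repo's implementation:
      - w_scores: (d_model, heads)   -> one scalar score per position per head
      - w_values: (d_model, d_model) -> one value vector per position per head
    """
    seq_len, d_model = x.shape
    dim_per_head = d_model // heads

    # Per-head scalar scores, shape (seq_len, heads)
    scores = x @ w_scores
    # Per-head values, shape (seq_len, heads, dim_per_head)
    values = (x @ w_values).reshape(seq_len, heads, dim_per_head)

    # Softmax over the sequence dimension, independently per head
    scores = scores - scores.max(axis=0, keepdims=True)
    attn = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)

    # Score-weighted sum over positions, then concatenate the heads
    pooled = (attn[:, :, None] * values).sum(axis=0)  # (heads, dim_per_head)
    return pooled.reshape(d_model)
```

Masking a position would correspond to setting its score to a large negative value (as `masked_fill(mask, -1e18)` does in the snippet above) before the softmax.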
