Triangular matrices ? #42

Closed
jeremycochoy opened this issue Dec 7, 2020 · 10 comments

Comments

@jeremycochoy

jeremycochoy commented Dec 7, 2020

Does the current implementation provide triangular matrices (to constrain attention to the "left" of each position in the sequence, for both input and encoded values), as described in the last section of the original paper?

@lucidrains
Owner

@jeremycochoy Hi Jeremy, do you mean in the autoregressive (unidirectional) case? I only see triangular matrices being mentioned in that context

@lucidrains
Owner

@jeremycochoy can you point me at this passage in the paper?

@jeremycochoy
Author

Yes, it's page 17, Appendix B.1. I don't know how complex this would be to implement, if it's not already there.

@lucidrains
Owner

@jeremycochoy ohh I see, yeah, that is for the unidirectional case, and it is already taken care of, through a cumulative sum actually (no masking needed)

@lucidrains
Owner

@jeremycochoy you don't need to worry about that detail, you just need to set causal = True and you are good to go
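
For readers wondering how a cumulative sum can stand in for a triangular mask, here is a minimal NumPy sketch. It is not the repository's code: the function names, shapes, and the random non-negative feature maps `q` and `k` are all assumptions made for illustration. It compares prefix-sum attention against an explicit lower-triangular mask on the full n x n matrix.

```python
import numpy as np

def causal_linear_attention(q, k, v):
    """Causal linear attention via prefix sums (illustrative sketch).

    q, k: (n, r) non-negative feature maps; v: (n, d) values.
    """
    # Running sum over positions of the outer products k_j v_j^T -> (n, r, d)
    kv = np.cumsum(k[:, :, None] * v[:, None, :], axis=0)
    # Running sum of keys, used for normalisation -> (n, r)
    k_sum = np.cumsum(k, axis=0)
    num = np.einsum("nr,nrd->nd", q, kv)
    den = np.einsum("nr,nr->n", q, k_sum)
    return num / den[:, None]

def masked_reference(q, k, v):
    """Same quantity computed with an explicit lower-triangular mask."""
    a = np.tril(q @ k.T)  # zero out every entry with j > i
    return (a @ v) / a.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, r, d = 6, 4, 3  # hypothetical sizes
q = rng.random((n, r)) + 0.1
k = rng.random((n, r)) + 0.1
v = rng.standard_normal((n, d))

out_fast = causal_linear_attention(q, k, v)
out_ref = masked_reference(q, k, v)
assert np.allclose(out_fast, out_ref)
```

The prefix-sum version never builds the n x n attention matrix, which is why no mask is needed. This sketch stores every running sum at once for clarity, so it is not memory-efficient.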

@lucidrains
Owner

[Screenshot: Screen Shot 2020-12-07 at 11 52 53 AM]

just to make sure we are looking at the same thing lol

@jeremycochoy
Author

There are no words to say how happy I am to learn this, that's awesome (yes, we are looking at the same thing). I can't wait to test it. :)

@lucidrains
Owner

lucidrains commented Dec 7, 2020

@jeremycochoy good timing, since @Sleepychord just caught and fixed a big bug in that part of the code loll

@Muennighoff

Muennighoff commented Mar 3, 2021

> @jeremycochoy good timing, since @Sleepychord just caught and fixed a big bug in that part of the code loll

Am I understanding correctly that, because of the pretty neat cumsum, we could even run the EncDec version without a decoder mask and still not leak the ground truth to the model?

@Muennighoff

And so in practice we can construct attention masks the same way for inputs and outputs, and the model treats them the same way? @lucidrains
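
One way to probe the leakage question numerically is to overwrite the "future" part of a sequence and check whether earlier outputs move. The sketch below does that for a stand-alone prefix-sum attention written in NumPy. It is an illustration under assumed names and sizes, not the repository's implementation, and it only covers causal self-attention; cross-attention over encoder outputs and padding masks are separate concerns that it does not address.

```python
import numpy as np

def causal_linear_attention(q, k, v):
    """Prefix-sum causal attention (illustrative sketch, not library code)."""
    kv = np.cumsum(k[:, :, None] * v[:, None, :], axis=0)  # (n, r, d)
    k_sum = np.cumsum(k, axis=0)                           # (n, r)
    num = np.einsum("nr,nrd->nd", q, kv)
    den = np.einsum("nr,nr->n", q, k_sum)
    return num / den[:, None]

rng = np.random.default_rng(1)
n, r, d, t = 8, 4, 3, 5  # hypothetical sizes; t is the first "future" position
q = rng.random((n, r)) + 0.1
k = rng.random((n, r)) + 0.1
v = rng.standard_normal((n, d))
base = causal_linear_attention(q, k, v)

# Replace everything from position t onward with fresh random data.
q2, k2, v2 = q.copy(), k.copy(), v.copy()
q2[t:] = rng.random((n - t, r)) + 0.1
k2[t:] = rng.random((n - t, r)) + 0.1
v2[t:] = rng.standard_normal((n - t, d))
changed = causal_linear_attention(q2, k2, v2)

# Outputs before t depend only on positions before t, so they should not move.
prefix_unchanged = np.allclose(base[:t], changed[:t])
suffix_changed = not np.allclose(base[t:], changed[t:])
assert prefix_unchanged and suffix_changed
```

If `prefix_unchanged` holds, the running sums at position i contain no contribution from positions after i, which is the property a decoder mask would otherwise enforce.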

3 participants