Some Questions about Attention Mask #2

Closed
tang-ed opened this issue Jul 20, 2023 · 3 comments

tang-ed commented Jul 20, 2023

Hello, I have reviewed some of the code and noticed that it does not use an attention mask. This is RetNet, so don't you need to mask out the pad IDs? Or do the pad IDs have no impact on the preceding sequence?

@Jamie-Stirling (Owner)

Please could you clarify what is meant by pad ID here?


tang-ed commented Jul 20, 2023

I mean the ID used to pad the sentence.

@Jamie-Stirling (Owner)

There's no need for an attention mask in this case, as the architecture enforces causality internally (please see the internal usage of the D matrix in retention.py).
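For context, here is a minimal PyTorch sketch of how such a decay matrix can be built. In the standard RetNet formulation, retention scores are computed as (QKᵀ ⊙ D)V with D[n, m] = γ^(n−m) for n ≥ m and 0 otherwise, so future positions are blocked by construction. The function name `causal_decay_matrix` and this standalone construction are illustrative assumptions, not the actual code in retention.py.

```python
import torch

def causal_decay_matrix(seq_len: int, gamma: float) -> torch.Tensor:
    """Sketch of the decay matrix D used in parallel retention.

    D[n, m] = gamma ** (n - m) for n >= m (current and past positions),
    and 0 for n < m (future positions), so causality is built into the
    retention scores without a separate attention mask.
    """
    n = torch.arange(seq_len).unsqueeze(1)   # query positions, shape (L, 1)
    m = torch.arange(seq_len).unsqueeze(0)   # key positions, shape (1, L)
    decay = gamma ** (n - m).float()         # gamma^(n - m) for every pair
    return decay * (n >= m)                  # zero out future positions

# With seq_len=4 and gamma=0.9, D is lower-triangular:
# [[1.000, 0.000, 0.000, 0.000],
#  [0.900, 1.000, 0.000, 0.000],
#  [0.810, 0.900, 1.000, 0.000],
#  [0.729, 0.810, 0.900, 1.000]]
print(causal_decay_matrix(4, 0.9))
```

Because every entry above the diagonal is zero, a query at position n can only retain information from positions m ≤ n, which is what makes an explicit causal attention mask unnecessary.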
