Some Questions about Attention Mask #2

Closed
tang-ed opened this issue Jul 20, 2023 · 3 comments

tang-ed commented Jul 20, 2023

Hello, I have reviewed some of the code and noticed that it does not use an attention mask. This is RetNet, so don't you need to mask out the pad IDs? Or do the pad IDs have no impact on the preceding sequence?

@Jamie-Stirling (Owner)

Please could you clarify what is meant by pad ID here?


tang-ed commented Jul 20, 2023

I mean the ID used to pad the sentence.

@Jamie-Stirling (Owner)

There's no need for an attention mask in this case, as the architecture enforces causality internally (please see the internal usage of the D matrix in retention.py).
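For context, here is a minimal PyTorch sketch of how such a decay matrix can be built. In the standard RetNet formulation, retention scores are computed as (QKᵀ ⊙ D)V with D[n, m] = γ^(n−m) for n ≥ m and 0 otherwise, so future positions are blocked by construction. The function name `causal_decay_matrix` and this standalone construction are illustrative assumptions, not the actual code in retention.py.

```python
import torch

def causal_decay_matrix(seq_len: int, gamma: float) -> torch.Tensor:
    """Sketch of the decay matrix D used in parallel retention.

    D[n, m] = gamma ** (n - m) for n >= m (current and past positions),
    and 0 for n < m (future positions), so causality is built into the
    retention scores without a separate attention mask.
    """
    n = torch.arange(seq_len).unsqueeze(1)   # query positions, shape (L, 1)
    m = torch.arange(seq_len).unsqueeze(0)   # key positions, shape (1, L)
    decay = gamma ** (n - m).float()         # gamma^(n - m) for every pair
    return decay * (n >= m)                  # zero out future positions

# With seq_len=4 and gamma=0.9, D is lower-triangular:
# [[1.000, 0.000, 0.000, 0.000],
#  [0.900, 1.000, 0.000, 0.000],
#  [0.810, 0.900, 1.000, 0.000],
#  [0.729, 0.810, 0.900, 1.000]]
print(causal_decay_matrix(4, 0.9))
```

Because every entry above the diagonal is zero, a query at position n can only retain information from positions m ≤ n, which is what makes an explicit causal attention mask unnecessary.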
