Fix a small bug in the attention bias calculation when flash attention is not available #398

tbuthfer · 2023-11-28T10:17:48Z

Fix a small bug in the attention bias calculation when flash attention is not available.

vhmth · 2024-07-02T21:04:49Z

model.py

            att = F.softmax(att, dim=-1)
-            att = self.attn_dropout(att)
+            att = self.attn_dropout(att) if self.training else att


I believe this shouldn't be necessary, since nn.Dropout does not apply dropout when self.training is False.

Dropout source: https://pytorch.org/docs/stable/_modules/torch/nn/modules/dropout.html#Dropout

Its forward method calls F.dropout:

https://pytorch.org/docs/stable/generated/torch.nn.functional.dropout.html

Note that F.dropout takes a training argument, which nn.Dropout fills from its own self.training state.
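A minimal sketch of that behavior, in case it helps the discussion. TinyAttention is a hypothetical stand-in for the manual attention path, not the class in model.py, and the dropout probability of 0.5 is an arbitrary choice. It only illustrates that calling eval() on the parent module also switches the nn.Dropout submodule out of training mode, so the unguarded call already acts as a no-op at inference time.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyAttention(nn.Module):
    """Hypothetical stand-in for the manual (non-flash) attention path."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        self.attn_dropout = nn.Dropout(p)

    def forward(self, att: torch.Tensor) -> torch.Tensor:
        att = F.softmax(att, dim=-1)
        # No explicit `if self.training` guard: nn.Dropout checks its own flag.
        return self.attn_dropout(att)


torch.manual_seed(0)
scores = torch.randn(2, 4, 4)
reference = F.softmax(scores, dim=-1)
module = TinyAttention(p=0.5)

# eval() propagates to submodules, so the dropout layer becomes an identity.
module.eval()
eval_flag = module.attn_dropout.training
eval_out = module(scores)

# train() re-enables dropout: entries are zeroed or scaled by 1 / (1 - p).
module.train()
train_flag = module.attn_dropout.training
train_out = module(scores)

print("dropout.training in eval mode:", eval_flag)
print("eval output equals plain softmax:", torch.equal(eval_out, reference))
```

If this holds for the real module as well, the original line att = self.attn_dropout(att) and the guarded version should produce the same result in both modes.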

tbuthfer added 3 commits October 9, 2023 09:53

Fix bug in manual causal self attention

09280d4

tidy

61e0444

delete debug files

0feec7e

vhmth suggested changes Jul 2, 2024
