This repository was archived by the owner on Jun 4, 2025. It is now read-only.

Conversation

@anmarques (Member)

This PR moves the scaling by dimensions per head to the attention scores (rather than the query). This is equivalent in FP32, but in quantized models it reduces the scale of discrepancies introduced by quantization before the softmax operation.
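The two orderings described above can be sketched as follows. This is a minimal NumPy illustration (not the repository's actual code; the array shapes and names are hypothetical): dividing the query by sqrt(d_head) before the matmul and dividing the attention scores after the matmul produce the same result in FP32, but the second form keeps the pre-quantization values smaller going into the softmax.

```python
import numpy as np

# Hypothetical shapes for illustration: 8 tokens, 64 dims per head
d_head = 64
rng = np.random.default_rng(0)
q = rng.standard_normal((8, d_head)).astype(np.float32)
k = rng.standard_normal((8, d_head)).astype(np.float32)

# Before this PR: scale the query, then compute the scores
scores_query_scaled = (q / np.sqrt(d_head)) @ k.T

# After this PR: compute the scores, then scale them
scores_post_scaled = (q @ k.T) / np.sqrt(d_head)

# Equivalent in FP32, up to floating-point rounding
assert np.allclose(scores_query_scaled, scores_post_scaled, atol=1e-5)
```

In a quantized model the matmul inputs and outputs are rounded to a coarse grid, so applying the scaling after the matmul means the rounding error is introduced on the larger, unscaled scores and then shrunk by the division, rather than being baked into the query before the matmul.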

@anmarques anmarques merged commit b84a90a into master Apr 28, 2022
@anmarques anmarques deleted the fix-distilbert-scaling branch April 28, 2022 22:26
bfineran pushed a commit that referenced this pull request May 3, 2022
bfineran added a commit that referenced this pull request May 3, 2022


3 participants