This repository was archived by the owner on Jun 4, 2025. It is now read-only.

Conversation

@anmarques (Member)

This PR moves the scaling by dimensions per head to the attention scores (rather than the query). This is equivalent in FP32, but in quantized models it reduces the scale of discrepancies introduced by quantization before the softmax operation.
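The two orderings described above can be sketched as follows. This is a minimal NumPy illustration (not the repository's actual code; the array shapes and names are hypothetical): dividing the query by sqrt(d_head) before the matmul and dividing the attention scores after the matmul produce the same result in FP32, but the second form keeps the pre-quantization values smaller going into the softmax.

```python
import numpy as np

# Hypothetical shapes for illustration: 8 tokens, 64 dims per head
d_head = 64
rng = np.random.default_rng(0)
q = rng.standard_normal((8, d_head)).astype(np.float32)
k = rng.standard_normal((8, d_head)).astype(np.float32)

# Before this PR: scale the query, then compute the scores
scores_query_scaled = (q / np.sqrt(d_head)) @ k.T

# After this PR: compute the scores, then scale them
scores_post_scaled = (q @ k.T) / np.sqrt(d_head)

# Equivalent in FP32, up to floating-point rounding
assert np.allclose(scores_query_scaled, scores_post_scaled, atol=1e-5)
```

In a quantized model the matmul inputs and outputs are rounded to a coarse grid, so applying the scaling after the matmul means the rounding error is introduced on the larger, unscaled scores and then shrunk by the division, rather than being baked into the query before the matmul.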

@anmarques anmarques merged commit b84a90a into master Apr 28, 2022
@anmarques anmarques deleted the fix-distilbert-scaling branch April 28, 2022 22:26
bfineran pushed a commit that referenced this pull request May 3, 2022
bfineran added a commit that referenced this pull request May 3, 2022


3 participants