
Fix ALiBi slopes for non-power-of-2 num_heads #3071

Merged
awni merged 2 commits into ml-explore:main from vovw:main on Jan 29, 2026


vovw (Contributor) commented Jan 27, 2026

Proposed Changes

What the diff addresses

The old create_alibi_slope function used a simple formula: x = (2**8) ** (1 / num_heads), and returned mx.power(x, -mx.arange(1, num_heads + 1)). This produces a single geometric sequence of slopes, which matches the reference ALiBi slopes only when num_heads is a power of 2 (e.g., 1, 2, 4, 8, 16). For other head counts (e.g., 3, 5, 6, 7, 12), it produced slopes that do not match the reference algorithm.
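To make the old behaviour concrete, here is a pure-Python restatement of the formula quoted above, using plain floats instead of mx.power / mx.arange. It is a sketch for illustration, not the MLX source; the function name old_alibi_slopes is hypothetical.

```python
def old_alibi_slopes(num_heads: int) -> list[float]:
    """Restatement of the old formula (hypothetical helper, not the MLX source).

    x = (2**8) ** (1 / num_heads); slopes = x**-1, x**-2, ..., x**-num_heads.
    """
    x = (2 ** 8) ** (1 / num_heads)
    return [x ** -k for k in range(1, num_heads + 1)]


# 8 heads: x = 2, so the slopes are (up to float rounding) 1/2, 1/4, ..., 1/256.
slopes_8 = old_alibi_slopes(8)
# 6 heads: one geometric sequence with ratio 2**(-4/3) that still ends at 2**-8.
slopes_6 = old_alibi_slopes(6)
```

For a power-of-2 head count the ratio 2**(-8/num_heads) gives exactly the reference slopes. For 6 heads the result is a single sequence with a fractional exponent step, whereas the reference algorithm combines slopes from two power-of-2 sequences.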

The new code uses the algorithm from the reference ALiBi implementation (https://github.com/ofirpress/attention_with_linear_biases), which handles non-power-of-2 head counts by:

  1. Computing the slopes for the nearest lower power of 2.
  2. Computing the slopes for twice that power of 2 (recursively), taking every other one, and appending as many of those as are needed to cover the remaining heads.
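The two steps above can be sketched in plain Python as follows. This is an illustrative restatement of the reference algorithm, not the code merged into MLX; the names alibi_slopes and power_of_2_slopes are hypothetical. It writes the power-of-2 start value as 2**(-8/n), which is algebraically the same as the reference's 2**(-2**-(log2(n) - 3)).

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Sketch of the reference ALiBi slope algorithm (hypothetical helper)."""

    def power_of_2_slopes(n: int) -> list[float]:
        # Geometric sequence whose start and ratio are both 2**(-8/n).
        start = 2 ** (-8 / n)
        return [start ** (i + 1) for i in range(n)]

    if num_heads < 1:
        raise ValueError("num_heads must be positive")
    # For n >= 1, n & (n - 1) == 0 exactly when n is a power of 2.
    if num_heads & (num_heads - 1) == 0:
        return power_of_2_slopes(num_heads)

    # Step 1: slopes for the nearest lower power of 2.
    closest = 1 << (num_heads.bit_length() - 1)
    slopes = power_of_2_slopes(closest)
    # Step 2: every other slope of the 2 * closest set, keeping only
    # the first (num_heads - closest) of them for the remaining heads.
    extra = alibi_slopes(2 * closest)[0::2][: num_heads - closest]
    return slopes + extra


slopes_6 = alibi_slopes(6)  # 2**-2, 2**-4, 2**-6, 2**-8, then 2**-1, 2**-3
```

For 6 heads, step 1 yields the 4-head slopes (2**-2 ... 2**-8) and step 2 appends 2**-1 and 2**-3, taken from the 8-head set. An MLX version could wrap the result with mx.array(...); the sketch stays in pure Python so it runs without MLX installed.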

Additionally, the diff:

  • Removes the broken class-level caching (_alibi_mask_key / _alibi_mask). Because this state was stored on the class itself, it was shared by every instance, which was problematic for correctness when inputs varied.
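The general pitfall with class-level caching can be illustrated with a small, self-contained example. Only the attribute names _alibi_mask_key and _alibi_mask come from the PR; the classes, the placeholder build_mask helper, and the choice of cache key (sequence length only) are hypothetical and are not the removed MLX code.

```python
def build_mask(num_heads: int, seq_len: int) -> list:
    # Hypothetical placeholder mask of shape (num_heads, seq_len, seq_len) holding
    # -|i - j| per head; a real ALiBi bias would also scale each head by its slope.
    return [
        [[-abs(i - j) for j in range(seq_len)] for i in range(seq_len)]
        for _ in range(num_heads)
    ]


class ClassCachedAlibi:
    # Class attributes: a single copy shared by every instance (the pitfall).
    _alibi_mask_key = None
    _alibi_mask = None

    def __init__(self, num_heads: int):
        self.num_heads = num_heads

    def mask(self, seq_len: int) -> list:
        # Hypothetical key that tracks seq_len but not num_heads.
        if ClassCachedAlibi._alibi_mask_key != seq_len:
            ClassCachedAlibi._alibi_mask_key = seq_len
            ClassCachedAlibi._alibi_mask = build_mask(self.num_heads, seq_len)
        return ClassCachedAlibi._alibi_mask


class StatelessAlibi:
    def __init__(self, num_heads: int):
        self.num_heads = num_heads

    def mask(self, seq_len: int) -> list:
        # No shared state: the mask always reflects this instance's own inputs.
        return build_mask(self.num_heads, seq_len)


four, six = ClassCachedAlibi(4), ClassCachedAlibi(6)
four.mask(3)
print(len(six.mask(3)))  # 4: `six` receives the mask cached by `four`
print(len(StatelessAlibi(6).mask(3)))  # 6
```

Because the cache lives on the class, a second instance with different settings can silently receive a mask built for the first; recomputing per call (or caching per instance with a complete key) avoids this.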

Checklist

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

vovw (Contributor, Author) commented Jan 27, 2026

solves #343

vovw (Contributor, Author) commented Jan 28, 2026

cc: @zcbenz

vovw requested a review from awni on January 29, 2026 06:33

awni (Member) left a comment


LGTM, thanks!

awni merged commit 590b4f1 into ml-explore:main on Jan 29, 2026
16 checks passed

vovw (Contributor, Author) commented Jan 29, 2026

yay
