ANE llama runner fixes for iOS26 #16057
base: main
Conversation
Summary: This fixes issues with the ANE-friendly llama runner on iOS 26. See the updated readme.md for more information. A key change is decomposing SDPA into matmuls and a softmax, because iOS 26 has a bug in its implementation of SDPA on the ANE. Differential Revision: D88083155
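For reference, a minimal sketch of what decomposing SDPA into explicit matmuls and a softmax can look like in PyTorch; the function name `decomposed_sdpa` and its signature are illustrative assumptions, not the PR's actual implementation.

```python
import torch
import torch.nn.functional as F

def decomposed_sdpa(q, k, v, mask=None):
    # Hypothetical sketch: scaled dot-product attention written as explicit
    # matmuls plus a softmax, instead of the fused SDPA operator.
    scale = q.shape[-1] ** -0.5                      # 1 / sqrt(head_dim)
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    if mask is not None:
        scores = scores + mask                       # additive attention mask
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)

if __name__ == "__main__":
    # Sanity check against the fused op (no mask, non-causal).
    q, k, v = (torch.randn(1, 8, 16, 64) for _ in range(3))
    ref = F.scaled_dot_product_attention(q, k, v)
    assert torch.allclose(decomposed_sdpa(q, k, v), ref, atol=1e-5)
```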
@metascroy has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88083155.
This PR needs a
```python
    in_target_split_size=1,
    in_max_splits=1,
)
def maybe_split_model(model):
```
Is maybe_split_model mostly just split linear, or something else?
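For context, a minimal sketch of what a "split linear" rewrite can look like, assuming it splits one large nn.Linear along its output dimension into several smaller linears whose outputs are concatenated; the class name `SplitLinear`, the split strategy, and the parameters are illustrative assumptions, not the PR's actual pass.

```python
import torch
import torch.nn as nn

class SplitLinear(nn.Module):
    # Hypothetical illustration: replace a single large nn.Linear with
    # num_splits smaller linears along the output dimension and concatenate
    # their results, which can be friendlier to ANE size limits.
    def __init__(self, linear: nn.Linear, num_splits: int):
        super().__init__()
        assert linear.out_features % num_splits == 0
        chunk = linear.out_features // num_splits
        self.parts = nn.ModuleList()
        for i in range(num_splits):
            part = nn.Linear(linear.in_features, chunk, bias=linear.bias is not None)
            part.weight.data.copy_(linear.weight.data[i * chunk:(i + 1) * chunk])
            if linear.bias is not None:
                part.bias.data.copy_(linear.bias.data[i * chunk:(i + 1) * chunk])
            self.parts.append(part)

    def forward(self, x):
        return torch.cat([p(x) for p in self.parts], dim=-1)

if __name__ == "__main__":
    # The split module should reproduce the original linear's output.
    big = nn.Linear(64, 256)
    split = SplitLinear(big, num_splits=4)
    x = torch.randn(2, 64)
    assert torch.allclose(split(x), big(x), atol=1e-6)
```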
This PR:
With these changes, we estimate performance of Llama1B on iPhone 15 Pro / iOS 26 at:
Differential Revision: D88083155