Add NEON-accelerated int8mm for bfloat16 #125290

Closed
wants to merge 2 commits into from

Commits on May 1, 2024

  1. Add NEON-accelerated int8mm for bfloat16

    Apparently `vshlq_u32` is faster than `vcvt_f32_f16`
    
    I.e. the same stories110M model runs at 60 tokens/sec with f16, but at 66 tokens/sec with bf16
    malfet committed May 1, 2024
    cf9a938
  2. And float32

    malfet committed May 1, 2024
    954400e
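
The first commit message credits the speedup to `vshlq_u32` being cheaper than `vcvt_f32_f16`. A likely reason is that a bfloat16 value is just the upper 16 bits of an IEEE float32, so widening it needs only a zero-extend and a left shift by 16, with no real format conversion. The following is a minimal scalar sketch of that bit trick in portable C++. It is not the PR's NEON kernel, and the helper names `bf16_bits_to_float` and `float_to_bf16_bits_truncate` are hypothetical, introduced here for illustration only.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the PR): widen a raw bfloat16 bit pattern
// to float32. bfloat16 keeps the sign, the 8 exponent bits and the top
// 7 mantissa bits of a float32, so placing those 16 bits in the upper half
// of a 32-bit word yields the equivalent float32 bit pattern.
inline float bf16_bits_to_float(std::uint16_t bits) {
    std::uint32_t widened = static_cast<std::uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &widened, sizeof(out));  // bit cast without aliasing UB
    return out;
}

// Hypothetical helper (not from the PR): the inverse direction, using plain
// truncation of the low 16 mantissa bits. Production code usually rounds
// to nearest even instead; truncation keeps this sketch short.
inline std::uint16_t float_to_bf16_bits_truncate(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}
```

For example, `bf16_bits_to_float(0x3F80)` yields `1.0f`, because `0x3F80 << 16` is `0x3F800000`, the float32 encoding of 1.0. A vectorized version would apply the same widen-and-shift to several lanes at once, which is the role `vshlq_u32` plays in the commit message.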