A direct Neural Engine encoder backend, ~2x faster than CoreML and faster than Metal #3903
Replies: 3 comments
-
Have you run some additional benchmarks on longer audio? In the README I only see tests with …
-
@ggerganov Fair point.
So ~1.9x, stable across all 8 encoder windows with no drift. Added this to RESULTS.md in the repo. Happy to run a specific file or a larger model if you have one in mind.
-
Opened the PR: #3905. It applies on current …
-
I built a whisper.cpp encoder backend that runs the encoder directly on the Apple Neural
Engine via ANEForge (no CoreML). On M-series chips it is the fastest of the three Apple Silicon
paths: about 2x faster than the CoreML encoder at every model size, faster than Metal, and it
uses ~5x less energy.
It produces the same transcripts as the reference encoder (cosine similarity 0.999), so there is
no quality tradeoff. The speed comes from an ANE-native layout: channels-first 1x1-conv projections
and query-tiled attention, so the full attention score matrix is never materialized.
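A minimal pure-Python sketch of the two layout ideas above, for illustration only: it is not ANEForge's code, and every function name here is hypothetical. The first part checks that a 1x1 convolution over a channels-first [C, T] activation computes the same result as an ordinary linear projection over tokens-first [T, C] data, so projections can keep channels as the leading axis. The second part processes attention queries one tile at a time, so at most a tile x T block of scores exists at once instead of the full T x T matrix (for Whisper's 1500-position encoder context, that full matrix is 1500 x 1500 per head).

```python
import math


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def conv1x1_channels_first(weight, x):
    """1x1 convolution on a channels-first activation.

    weight: [C_out][C_in]  (the 1x1 kernel with its spatial dims dropped)
    x:      [C_in][T]      (one row per channel)
    returns [C_out][T]
    """
    c_in, t_len = len(x), len(x[0])
    return [[sum(w_row[i] * x[i][t] for i in range(c_in)) for t in range(t_len)]
            for w_row in weight]


def linear_tokens_first(weight, x_t):
    """Ordinary linear layer on a tokens-first activation x_t: [T][C_in] -> [T][C_out]."""
    return [[sum(w_row[i] * tok[i] for i in range(len(tok))) for w_row in weight]
            for tok in x_t]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    e = [math.exp(s - m) for s in row]
    total = sum(e)
    return [v / total for v in e]


def attention_query_tiled(q, k, v, tile):
    """Scaled dot-product attention with q, k, v as [T][d] lists.

    Queries are processed `tile` rows at a time; each tile only builds a
    tile x T block of scores. Returns (output, largest score block size seen).
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out, peak = [], 0
    for start in range(0, len(q), tile):
        q_tile = q[start:start + tile]
        # Scores for this tile only: len(q_tile) x len(k).
        scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
                  for qi in q_tile]
        peak = max(peak, len(scores) * len(k))
        for row in scores:
            p = softmax(row)
            out.append([sum(p[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return out, peak


def attention_naive(q, k, v):
    """Reference: one tile covering every query, i.e. the full T x T score matrix."""
    return attention_query_tiled(q, k, v, tile=len(q))
```

Smaller tiles lower peak score memory at the cost of more, smaller matrix multiplies; the post does not say how ANEForge chooses its tile size.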
It drops in the same way as the CoreML and OpenVINO backends you already ship: it uses the same
seam that fills embd_enc, takes about 30 lines plus one CMake line, and is gated by an env var, so
it is inert when unset. The ANE work lives in the external ANEForge package, so the part in
whisper.cpp is a thin shim. It replaces the encoder only; the decoder is unchanged.
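A language-neutral sketch of the env-var-gated seam described above, written in Python for brevity even though the actual shim is C/C++ inside whisper.cpp. The variable name WHISPER_ANE_ENCODER and the encode() helper are hypothetical; the post does not name them. The point is the dispatch shape: both paths produce the same embd_enc that the unchanged decoder consumes, and when the variable is unset the default encoder runs exactly as before.

```python
import os
from typing import Callable, List, Mapping, Optional

# Hypothetical name: the real variable used by the patch is not given in the post.
ENV_VAR = "WHISPER_ANE_ENCODER"


def encode(mel: List[float],
           default_encoder: Callable[[List[float]], List],
           ane_encoder: Callable[[List[float]], List],
           env: Optional[Mapping[str, str]] = None) -> List:
    """Return embd_enc, the encoder output the decoder consumes.

    Unset or empty ENV_VAR: the default encoder runs, so the optional backend
    is inert. Non-empty ENV_VAR: the external backend fills embd_enc instead.
    """
    env = os.environ if env is None else env
    if env.get(ENV_VAR):
        return ane_encoder(mel)
    return default_encoder(mel)
```

Keeping the gate at this single seam is what lets the rest of the pipeline, including the decoder, stay untouched.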
Code, patch, and benchmark: https://github.com/sbryngelson/whisper-aneforge
Would you take it as an optional backend? I am happy to open a PR mirroring the CoreML one.