This release adds SIMD-accelerated audio decoding and fixes a motion-compensation rounding bug on arm64.
- Add SSE2, AVX2, and NEON implementations of the audio synthesis filter
- Fix incorrect rounding in the arm64 NEON copyMacroblock
- Consolidate and clean up the SSE2, AVX2, and NEON copyMacroblock implementations
- Rewrite the pure-Go (noasm) fallback using SWAR (SIMD within a register), processing 8 bytes per 64-bit operation
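To illustrate the SWAR idea from the last item, here is a rounded byte average, the kind of operation half-pel motion compensation performs. This is a minimal sketch, not the package's code. It assumes the prediction averages two pixels with round-half-up, (a+b+1)>>1, as in MPEG-1 half-pel interpolation. The names avgRound8, avgTrunc8, and avgRoundSWAR are hypothetical. The truncating variant is included for contrast: it differs by one whenever a+b is odd, which is how a rounding mistake in copyMacroblock would show up as off-by-one pixels.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// lo7 keeps the low 7 bits of every byte lane; it discards the bit that a
// 64-bit right shift moves from one byte into its lower neighbor.
const lo7 = 0x7f7f7f7f7f7f7f7f

// avgRound8 returns (a+b+1)>>1 for each of the 8 byte lanes packed in a and b.
// It uses (a|b) - ((a^b)>>1), which never borrows across lanes.
func avgRound8(a, b uint64) uint64 {
	return (a | b) - (((a ^ b) >> 1) & lo7)
}

// avgTrunc8 returns (a+b)>>1 per lane (rounding down), for comparison.
func avgTrunc8(a, b uint64) uint64 {
	return (a & b) + (((a ^ b) >> 1) & lo7)
}

// avgRoundSWAR writes the rounded average of a and b into dst, 8 bytes per
// 64-bit operation, with a scalar loop for any remaining tail bytes.
// Byte order does not matter because every lane is computed independently.
func avgRoundSWAR(dst, a, b []byte) {
	i := 0
	for ; i+8 <= len(dst); i += 8 {
		x := binary.LittleEndian.Uint64(a[i:])
		y := binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], avgRound8(x, y))
	}
	for ; i < len(dst); i++ {
		dst[i] = byte((uint16(a[i]) + uint16(b[i]) + 1) >> 1)
	}
}

func main() {
	a := []byte{0, 1, 255, 10, 3, 200, 7, 100, 5}
	b := []byte{1, 1, 254, 11, 4, 201, 8, 50, 6}
	dst := make([]byte, len(a))
	avgRoundSWAR(dst, a, b)
	fmt.Println(dst) // [1 1 255 11 4 201 8 75 6]
}
```

The identity works because a+b = 2(a&b) + (a^b). Halving with round-up therefore gives (a&b) + ceil((a^b)/2), which equals (a|b) - floor((a^b)/2). Each lane's result stays in 0..255, so no carry or borrow crosses a byte boundary.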
Audio decoding is ~3x faster on both amd64 and arm64.
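The first item vectorizes the audio synthesis filter. The following scalar sketch shows why the windowing step of the MPEG-1 synthesis filterbank (ISO/IEC 11172-3) suits SIMD. Each of the 32 output samples is a sum of 16 products of the window coefficients D and the intermediate vector U. Walking the 32 outputs with unit stride maps directly onto 4-wide (SSE2, NEON) or 8-wide (AVX2) float32 vectors. The names synthWindow and synthWindowNaive and the flat u/d layout are assumptions for illustration. The package's filter may organize its buffers differently.

```go
package main

import "fmt"

// synthWindow computes the 32 output samples of one windowing step:
//
//	out[j] = sum over i in 0..15 of u[j+32*i] * d[j+32*i]
//
// The inner loop walks j with unit stride, so one row of 32 products maps
// onto eight 4-wide (SSE2, NEON) or four 8-wide (AVX2) float32 multiply and
// accumulate steps. This version is plain scalar Go (hypothetical layout).
func synthWindow(out *[32]float32, u, d *[512]float32) {
	*out = [32]float32{}
	for i := 0; i < 16; i++ {
		row := 32 * i
		for j := 0; j < 32; j++ {
			out[j] += u[row+j] * d[row+j]
		}
	}
}

// synthWindowNaive is the same sum written sample by sample, as a reference.
func synthWindowNaive(out *[32]float32, u, d *[512]float32) {
	for j := 0; j < 32; j++ {
		var s float32
		for i := 0; i < 16; i++ {
			s += u[j+32*i] * d[j+32*i]
		}
		out[j] = s
	}
}

func main() {
	var u, d [512]float32
	for k := range u {
		// Small integers keep every product and sum exact in float32.
		u[k] = float32(k % 7)
		d[k] = float32(k%5 - 2)
	}
	var fast, ref [32]float32
	synthWindow(&fast, &u, &d)
	synthWindowNaive(&ref, &u, &d)
	fmt.Println(fast == ref) // true
}
```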
```
goos: linux
goarch: amd64
pkg: github.com/gen2brain/mpeg
cpu: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
benchmark                 old ns/op     new ns/op     delta
BenchmarkDecodeAudio-8    45920         14250         -68.97%
```
```
goos: linux
goarch: arm64
pkg: github.com/gen2brain/mpeg
benchmark                 old ns/op     new ns/op     delta
BenchmarkDecodeAudio-4    796000        262000        -67.08%
```
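The "~3x" figure and the delta column express the same ratio in two ways: speedup = old/new, and delta = (new - old)/old. A small sketch of the arithmetic follows, using the amd64 and arm64 rows above. The helper names deltaPct and speedup are hypothetical, and the comparison tool may round its own output differently.

```go
package main

import "fmt"

// deltaPct is the percent change in ns/op, as in the "delta" column.
func deltaPct(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

// speedup is how many times faster the new code runs: old/new.
// It relates to the delta by speedup = 1 / (1 + deltaPct/100).
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	fmt.Printf("amd64: %.2f%% -> %.2fx\n", deltaPct(45920, 14250), speedup(45920, 14250)) // amd64: -68.97% -> 3.22x
	fmt.Printf("arm64: %.2fx\n", speedup(796000, 262000))                                  // arm64: 3.04x
}
```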