Some small (micro?) optimizations suggested by an LLM that hopefully do not
harm the readability or clarity of the code.
Individual small optimizations summarized here, with references to original
commits on GitHub:
1. Use fixed size arrays where possible to minimize heap allocations (c13460f)
- ~1.5% improvement over baseline in `sec/op` and `B/op`
2. Convert bytes to lengths directly, skipping `binary.Read()` (98daa47)
- Why? `binary.Read()` has to inspect the destination's type at run time
(a type switch, falling back to reflection), copies bytes instead of using
them in place, and takes an io.Reader interface, which requires method
dispatch
- Additional ~0.25% improvement over (1) in `sec/op` and `B/op`
3. Manually optimized payload masking (a5569e2)
- Additional ~14% improvement over (2) 🔥
4. Replace modulo division with a bitwise AND (70fcb03)
- Why? TIL that `i % N == i & (N - 1)` when `N` is a power of 2 and `i` is
non-negative, and bitwise AND is usually a single CPU cycle vs multiple
cycles for modulo division
- Another ~1.5% improvement over (3)
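
For illustration only, here is a minimal sketch of the ideas behind (1), (2),
and (4). It is not the code from the commits above: the helper names
`decodeLen16` and `maskBytes` are hypothetical, and the 4-byte masking key
is an assumption about the frame format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeLen16 (hypothetical helper) converts a 2-byte big-endian length
// directly, without binary.Read(). The fixed-size array argument needs no
// heap allocation, and the bytes are decoded in place.
func decodeLen16(b [2]byte) uint64 {
	return uint64(binary.BigEndian.Uint16(b[:]))
}

// maskBytes (hypothetical helper) XORs payload in place with a 4-byte key.
// This is the simple byte-at-a-time form, not the manually optimized
// version from (3). Because 4 is a power of 2 and i is never negative,
// i&3 selects the same key byte as i%4.
func maskBytes(key [4]byte, payload []byte) {
	for i := range payload {
		payload[i] ^= key[i&3]
	}
}

func main() {
	fmt.Println(decodeLen16([2]byte{0x01, 0x00}))

	key := [4]byte{1, 2, 3, 4}
	payload := []byte("hello")
	maskBytes(key, payload) // mask
	maskBytes(key, payload) // XOR is its own inverse, so this unmasks
	fmt.Println(string(payload))
}
```

Applying `maskBytes` twice with the same key restores the original bytes,
which makes for an easy sanity check when changing the masking loop.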
With all of these combined, it appears that we've improved ReadFrame's
throughput by ~20%!
Note: All intermediate results are combined into the edit history of this
comment[1], which shows only the latest results.
[1]: #50 (comment)