⚡ Much faster ResponseReader performance (backports #642) #651
Merged
nevans merged 5 commits into v0.4-stable on Apr 22, 2026
Conversation
TruffleRuby handles `io.gets(CRLF, limit)` differently when the limit cuts in the middle of the terminator. It's helpful behavior, but it's different enough to break the tests.
A very large response with many small repeated literals can trigger
super-linear (quadratic) run time. This happens because the regular
expression that checks for literal continuation re-scans from the
beginning of the buffer every time.
This could be mitigated by searching from an offset, based on what has
already been processed, or only searching the most recent line (before
merging it with the buffer), but that is still `O(n)` on line length.
The regexp is anchored to the end of the string, so searching in reverse
from the end of the string should be `O(1)`. This is accomplished by
converting `=~` to `rindex`.
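A minimal sketch of the idea, with illustrative names and regexp (not the actual `Net::IMAP::ResponseReader` code):

```ruby
# Illustrative sketch only -- the constant name and regexp are
# assumptions, not the actual net-imap source.
LITERAL_TAIL = /\{(\d+)\}\r\n\z/

# Before: even though the regexp is anchored with \z, matching scans
# forward from the start of the buffer, so each check is O(n).
def literal_size_via_match(buff)
  md = LITERAL_TAIL.match(buff)
  md && Integer(md[1])
end

# After: String#rindex searches backward from the end of the string, so
# when a literal terminator is present near the end it is found quickly;
# only the short tail is then matched against the anchored regexp.
def literal_size_via_rindex(buff)
  open = buff.rindex("{") or return nil
  md = LITERAL_TAIL.match(buff, open)
  md && Integer(md[1])
end
```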
Note that this _does_ slow down the "no literals" scenario.
```
$ benchmark-driver benchmarks/response_reader.yml --filter KiB
Warming up --------------------------------------
1KiB with no literals 143.564k i/s - 153.197k times in 1.067099s (6.97μs/i)
10KiB with no literals 27.394k i/s - 28.864k times in 1.053670s (36.50μs/i)
100KiB with no literals 2.926k i/s - 3.157k times in 1.079109s (341.81μs/i)
1KiB of 25B literals 2.786k i/s - 2.970k times in 1.066159s (358.98μs/i)
10KiB of 25B literals 263.498 i/s - 286.000 times in 1.085396s (3.80ms/i)
100KiB of 25B literals 19.470 i/s - 20.000 times in 1.027203s (51.36ms/i)
1KiB of 0B literals 530.014 i/s - 530.000 times in 0.999974s (1.89ms/i)
10KiB of 0B literals 45.239 i/s - 50.000 times in 1.105233s (22.10ms/i)
100KiB of 0B literals 3.075 i/s - 4.000 times in 1.300721s (325.18ms/i)
Calculating -------------------------------------
local YJIT
1KiB with no literals 137.049k 159.971k i/s - 430.691k times in 3.142607s 2.692304s
10KiB with no literals 27.272k 28.101k i/s - 82.181k times in 3.013413s 2.924470s
100KiB with no literals 2.941k 2.937k i/s - 8.776k times in 2.984095s 2.988129s
1KiB of 25B literals 2.803k 4.136k i/s - 8.357k times in 2.981249s 2.020772s
10KiB of 25B literals 262.978 385.394 i/s - 790.000 times in 3.004055s 2.049850s
100KiB of 25B literals 18.355 22.549 i/s - 58.000 times in 3.159962s 2.572152s
1KiB of 0B literals 505.733 759.572 i/s - 1.590k times in 3.143953s 2.093285s
10KiB of 0B literals 45.414 67.569 i/s - 135.000 times in 2.972648s 1.997962s
100KiB of 0B literals 2.722 3.510 i/s - 9.000 times in 3.306786s 2.564007s
Comparison:
1KiB with no literals
YJIT: 159971.1 i/s
local: 137049.0 i/s - 1.17x slower
10KiB with no literals
YJIT: 28101.2 i/s
local: 27271.7 i/s - 1.03x slower
100KiB with no literals
local: 2940.9 i/s
YJIT: 2937.0 i/s - 1.00x slower
1KiB of 25B literals
YJIT: 4135.5 i/s
local: 2803.2 i/s - 1.48x slower
10KiB of 25B literals
YJIT: 385.4 i/s
local: 263.0 i/s - 1.47x slower
100KiB of 25B literals
YJIT: 22.5 i/s
local: 18.4 i/s - 1.23x slower
1KiB of 0B literals
YJIT: 759.6 i/s
local: 505.7 i/s - 1.50x slower
10KiB of 0B literals
YJIT: 67.6 i/s
local: 45.4 i/s - 1.49x slower
100KiB of 0B literals
YJIT: 3.5 i/s
local: 2.7 i/s - 1.29x slower
```
For responses larger than 10KiB, the benchmarks do take another dip.
Despite that, I believe the algorithm _is_ still linear, and that the
performance hit on large responses is probably due to the large strings
running into memory-locality (paging/caching) bottlenecks.
I was surprised at how much slower `buff.rindex` is (vs `=~`) when it
doesn't match. This speeds up that case significantly (it's now faster
than it was prior to the `rindex` change), with only a small impact in
the case when it does match.
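One way to short-circuit the non-matching case is to check the last few bytes first. This is a hedged sketch with illustrative names, not necessarily the exact guard used in the patch:

```ruby
# Sketch: bail out cheaply when the buffer cannot possibly end with a
# literal marker, so rindex never scans backward through a large
# non-matching string.  Names are illustrative assumptions.
def literal_size(buff)
  # String#end_with? only inspects the final bytes, so this guard is
  # cheap for the common "no literal" case.
  return nil unless buff.end_with?("}\r\n")
  open = buff.rindex("{") or return nil
  md = /\{(\d+)\}\r\n\z/.match(buff, open)
  md && Integer(md[1])
end
```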
```
$ benchmark-driver benchmarks/response_reader.yml --filter KiB
Warming up --------------------------------------
1KiB with no literals 202.754k i/s - 210.144k times in 1.036449s (4.93μs/i)
10KiB with no literals 55.683k i/s - 57.541k times in 1.033362s (17.96μs/i)
100KiB with no literals 6.654k i/s - 7.176k times in 1.078491s (150.29μs/i)
1KiB of 25B literals 2.780k i/s - 2.959k times in 1.064363s (359.70μs/i)
10KiB of 25B literals 260.357 i/s - 286.000 times in 1.098491s (3.84ms/i)
100KiB of 25B literals 19.485 i/s - 20.000 times in 1.026445s (51.32ms/i)
1KiB of 0B literals 506.675 i/s - 550.000 times in 1.085508s (1.97ms/i)
10KiB of 0B literals 44.384 i/s - 45.000 times in 1.013872s (22.53ms/i)
100KiB of 0B literals 3.063 i/s - 4.000 times in 1.305939s (326.48ms/i)
Calculating -------------------------------------
local YJIT
1KiB with no literals 194.355k 247.756k i/s - 608.261k times in 3.129645s 2.455086s
10KiB with no literals 55.733k 58.585k i/s - 167.049k times in 2.997311s 2.851414s
100KiB with no literals 6.553k 6.453k i/s - 19.961k times in 3.045870s 3.093461s
1KiB of 25B literals 2.732k 4.061k i/s - 8.340k times in 3.052737s 2.053682s
10KiB of 25B literals 256.552 379.524 i/s - 781.000 times in 3.044220s 2.057840s
100KiB of 25B literals 17.804 23.286 i/s - 58.000 times in 3.257733s 2.490779s
1KiB of 0B literals 467.714 703.446 i/s - 1.520k times in 3.249846s 2.160791s
10KiB of 0B literals 45.376 65.876 i/s - 133.000 times in 2.931045s 2.018955s
100KiB of 0B literals 3.072 3.840 i/s - 9.000 times in 2.929458s 2.343586s
Comparison:
1KiB with no literals
YJIT: 247755.5 i/s
local: 194354.7 i/s - 1.27x slower
10KiB with no literals
YJIT: 58584.6 i/s
local: 55733.0 i/s - 1.05x slower
100KiB with no literals
local: 6553.5 i/s
YJIT: 6452.6 i/s - 1.02x slower
1KiB of 25B literals
YJIT: 4061.0 i/s
local: 2732.0 i/s - 1.49x slower
10KiB of 25B literals
YJIT: 379.5 i/s
local: 256.6 i/s - 1.48x slower
100KiB of 25B literals
YJIT: 23.3 i/s
local: 17.8 i/s - 1.31x slower
1KiB of 0B literals
YJIT: 703.4 i/s
local: 467.7 i/s - 1.50x slower
10KiB of 0B literals
YJIT: 65.9 i/s
local: 45.4 i/s - 1.45x slower
100KiB of 0B literals
YJIT: 3.8 i/s
local: 3.1 i/s - 1.25x slower
```
Unfortunately, neither `#rindex` nor even `#end_with?` appears to be
truly `O(1)` for very large strings (100KiB+). I'm guessing that this is
due to memory locality and caching issues. But by parsing the literal
from the latest `line` (rather than the full buffer), we mostly avoid
that problem.
Also, by explicitly parsing `literal_size` immediately after reading the
line, we don't need to parse it again in `#done?`.
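The shape of that change can be sketched with a tiny reader class; the names (`TinyReader`, `read_line`, `@literal_size`) are illustrative assumptions, not the actual implementation:

```ruby
require "stringio"

# Illustrative sketch, not the real Net::IMAP::ResponseReader: the
# literal size is parsed once from the short, just-read line instead of
# re-scanning the whole accumulated buffer, and it is cached so #done?
# doesn't have to parse anything.
class TinyReader
  CRLF = "\r\n"

  def initialize(io)
    @io = io
    @buff = +""
  end

  def read_line
    line = @io.gets(CRLF) or raise EOFError, "unexpected EOF"
    # `line` is short, so matching its tail stays cheap no matter how
    # large @buff has grown.
    md = /\{(\d+)\}\r\n\z/.match(line)
    @literal_size = md && Integer(md[1])
    @buff << line
  end

  # A literal continuation is pending only if the last line announced
  # one; no re-parsing of the buffer is needed here.
  def done?
    @literal_size.nil?
  end
end
```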
```
$ benchmark-driver benchmarks/response_reader.yml --filter KiB
Warming up --------------------------------------
1KiB with no literals 202.846k i/s - 214.181k times in 1.055878s (4.93μs/i)
10KiB with no literals 55.699k i/s - 57.354k times in 1.029717s (17.95μs/i)
100KiB with no literals 6.622k i/s - 6.688k times in 1.009943s (151.01μs/i)
1KiB of 25B literals 3.428k i/s - 3.751k times in 1.094065s (291.67μs/i)
10KiB of 25B literals 342.733 i/s - 350.000 times in 1.021202s (2.92ms/i)
100KiB of 25B literals 34.343 i/s - 36.000 times in 1.048234s (29.12ms/i)
1KiB of 0B literals 683.800 i/s - 690.000 times in 1.009066s (1.46ms/i)
10KiB of 0B literals 69.186 i/s - 70.000 times in 1.011759s (14.45ms/i)
100KiB of 0B literals 6.914 i/s - 7.000 times in 1.012449s (144.64ms/i)
Calculating -------------------------------------
local YJIT
1KiB with no literals 193.622k 250.330k i/s - 608.539k times in 3.142929s 2.430944s
10KiB with no literals 55.944k 58.881k i/s - 167.096k times in 2.986849s 2.837843s
100KiB with no literals 6.550k 6.480k i/s - 19.866k times in 3.033041s 3.065821s
1KiB of 25B literals 3.445k 5.520k i/s - 10.285k times in 2.985693s 1.863057s
10KiB of 25B literals 338.578 548.670 i/s - 1.028k times in 3.036224s 1.873620s
100KiB of 25B literals 33.829 55.860 i/s - 103.000 times in 3.044728s 1.843900s
1KiB of 0B literals 626.970 1.103k i/s - 2.051k times in 3.271287s 1.860275s
10KiB of 0B literals 66.065 108.347 i/s - 207.000 times in 3.133301s 1.910523s
100KiB of 0B literals 6.720 8.159 i/s - 20.000 times in 2.976273s 2.451265s
Comparison:
1KiB with no literals
YJIT: 250330.3 i/s
local: 193621.6 i/s - 1.29x slower
10KiB with no literals
YJIT: 58881.3 i/s
local: 55943.9 i/s - 1.05x slower
100KiB with no literals
local: 6549.9 i/s
YJIT: 6479.8 i/s - 1.01x slower
1KiB of 25B literals
YJIT: 5520.5 i/s
local: 3444.8 i/s - 1.60x slower
10KiB of 25B literals
YJIT: 548.7 i/s
local: 338.6 i/s - 1.62x slower
100KiB of 25B literals
YJIT: 55.9 i/s
local: 33.8 i/s - 1.65x slower
1KiB of 0B literals
YJIT: 1102.5 i/s
local: 627.0 i/s - 1.76x slower
10KiB of 0B literals
YJIT: 108.3 i/s
local: 66.1 i/s - 1.64x slower
100KiB of 0B literals
YJIT: 8.2 i/s
local: 6.7 i/s - 1.21x slower
```
This backports the first several commits from #642 to v0.4-stable. While
the other commits also improve performance, these are all that is needed
to improve the algorithmic complexity from O(n²) to O(n). The remaining
commits also introduce a minor breaking change.