Skip to content

perf: optimize Substring to work directly with strings instead of converting to runes#822

Merged
samber merged 2 commits intosamber:masterfrom
d-enk:perf-substring
Feb 27, 2026
Merged

perf: optimize Substring to work directly with strings instead of converting to runes#822
samber merged 2 commits intosamber:masterfrom
d-enk:perf-substring

Conversation

@d-enk
Copy link
Contributor

@d-enk d-enk commented Feb 27, 2026

  • Rewrite Substring to iterate over string bytes directly, avoiding full []rune conversion
  • Improve performance for long strings by only processing necessary portions
  • Add comprehensive test cases for Unicode handling, invalid UTF-8, and edge cases
  • Add BenchmarkSubstring to measure performance improvements
  • Improve documentation with detailed parameter descriptions
  • Handle invalid UTF-8 sequences by converting to []rune when needed

Bencstat:

old.txtnew.txt               │
                   │    sec/opsec/op     vs baseSubstring/{10_10}-4    558.85n ±  9%   39.75n ± 10%  -92.89% (p=0.000 n=8)
Substring/{50_50}-4    783.10n ±  6%   85.15n ±  5%  -89.13% (p=0.000 n=8)
Substring/{50_45}-4    773.30n ±  3%   126.5n ±  7%  -83.65% (p=0.000 n=8)
Substring/{-50_50}-4   794.00n ±  2%   177.6n ±  7%  -77.63% (p=0.000 n=8)
Substring/{-10_10}-4   542.85n ± 20%   41.82n ±  6%  -92.30% (p=0.000 n=8)
geomean               680.4n         79.52n        -88.31%old.txtnew.txt                │
                   │    B/opB/op    vs baseSubstring/{10_10}-4    432.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_50}-4    480.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_45}-4    464.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-50_50}-4   480.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-10_10}-4   432.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)

                   │  old.txtnew.txt                 │
                   │ allocs/opallocs/op   vs baseSubstring/{10_10}-4    2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_50}-4    2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_45}-4    2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-50_50}-4   2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-10_10}-4   2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)

…verting to runes

- Rewrite Substring to iterate over string bytes directly, avoiding full []rune conversion
- Improve performance for long strings by only processing necessary portions
- Add comprehensive test cases for Unicode handling, invalid UTF-8, and edge cases
- Add BenchmarkSubstring to measure performance improvements
- Improve documentation with detailed parameter descriptions
- Handle invalid UTF-8 sequences by converting to []rune when needed

Bencstat:

                   │    old.txt    │               new.txt               │
                   │    sec/op     │    sec/op     vs base               │
Substring/{10_10}-4    558.85n ±  9%   39.75n ± 10%  -92.89% (p=0.000 n=8)
Substring/{50_50}-4    783.10n ±  6%   85.15n ±  5%  -89.13% (p=0.000 n=8)
Substring/{50_45}-4    773.30n ±  3%   126.5n ±  7%  -83.65% (p=0.000 n=8)
Substring/{-50_50}-4   794.00n ±  2%   177.6n ±  7%  -77.63% (p=0.000 n=8)
Substring/{-10_10}-4   542.85n ± 20%   41.82n ±  6%  -92.30% (p=0.000 n=8)
geomean               680.4n         79.52n        -88.31%

                   │  old.txt   │               new.txt                │
                   │    B/op    │   B/op    vs base                    │
Substring/{10_10}-4    432.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_50}-4    480.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_45}-4    464.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-50_50}-4   480.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-10_10}-4   432.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)

                   │  old.txt   │                new.txt                 │
                   │ allocs/op  │ allocs/op   vs base                    │
Substring/{10_10}-4    2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_50}-4    2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_45}-4    2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-50_50}-4   2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-10_10}-4   2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.11%. Comparing base (ac8295b) to head (fbd209a).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #822      +/-   ##
==========================================
+ Coverage   92.03%   92.11%   +0.07%     
==========================================
  Files          32       32              
  Lines        4232     4259      +27     
==========================================
+ Hits         3895     3923      +28     
+ Misses        253      252       -1     
  Partials       84       84              
Flag Coverage Δ
unittests 92.11% <100.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@d-enk
Copy link
Contributor Author

d-enk commented Feb 27, 2026

Сompletely repeats the current behavior

But looks like here it's as if instead of

strings.ReplaceAll(str, "\x00", "")

was supposed to be

strings.ToValidUTF8(str, "")

Returns a substring starting at the given offset with the specified length. Supports negative offsets; out-of-bounds are clamped. Operates on Unicode runes (characters) and is optimized for zero allocations.
@samber samber merged commit 68f827d into samber:master Feb 27, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants