Refactor `toNormalisedLower`: shorter and slightly faster. #14299

colega · 2024-06-14T07:30:16Z

This is a follow-up to #14170.

TL;DR: this version of the code is much shorter, easier to understand (IMO), and it's also faster!

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
                                                             │     main      │                 new                 │
                                                             │    sec/op     │   sec/op     vs base                │
ToNormalizedLower/length=10/uppercase=none/ascii=true-12       19.840n ±  0%   5.546n ± 0%  -72.05% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=none/ascii=false-12       138.2n ±  0%   120.7n ± 1%  -12.63% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=first/ascii=true-12       20.66n ±  1%   18.52n ± 5%  -10.36% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=first/ascii=false-12      142.6n ±  1%   145.0n ± 1%   +1.72% (p=0.004 n=10)
ToNormalizedLower/length=10/uppercase=last/ascii=true-12        21.60n ±  0%   19.54n ± 1%   -9.52% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=last/ascii=false-12       150.3n ±  4%   132.6n ± 1%  -11.81% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=all/ascii=true-12         26.14n ±  2%   20.84n ± 1%  -20.31% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=all/ascii=false-12        143.8n ±  1%   130.8n ± 1%   -9.01% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=true-12       67.97n ±  0%   48.21n ± 1%  -29.08% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=false-12      1.743µ ±  2%   1.719µ ± 1%   -1.38% (p=0.017 n=10)
ToNormalizedLower/length=100/uppercase=first/ascii=true-12      69.95n ±  2%   67.11n ± 1%   -4.07% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=first/ascii=false-12     1.721µ ±  0%   1.797µ ± 0%   +4.42% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=last/ascii=true-12       71.70n ± 14%   68.12n ± 1%   -4.99% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=last/ascii=false-12      1.779µ ±  2%   1.728µ ± 1%   -2.87% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=all/ascii=true-12       147.70n ±  1%   99.92n ± 0%  -32.35% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=all/ascii=false-12       1.632µ ±  2%   1.652µ ± 0%   +1.19% (p=0.022 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=true-12      550.6n ±  4%   407.6n ± 0%  -25.96% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=false-12     14.79µ ±  1%   14.50µ ± 1%   -1.95% (p=0.002 n=10)
ToNormalizedLower/length=1000/uppercase=first/ascii=true-12     554.6n ±  1%   554.8n ± 1%        ~ (p=0.927 n=10)
ToNormalizedLower/length=1000/uppercase=first/ascii=false-12    14.67µ ±  0%   15.14µ ± 0%   +3.21% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=last/ascii=true-12      542.8n ±  2%   551.5n ± 0%   +1.60% (p=0.027 n=10)
ToNormalizedLower/length=1000/uppercase=last/ascii=false-12     14.74µ ±  0%   14.55µ ± 0%   -1.30% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=all/ascii=true-12      1205.5n ±  1%   814.5n ± 1%  -32.43% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=all/ascii=false-12      13.76µ ±  1%   13.66µ ± 0%   -0.72% (p=0.035 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=true-12      2.173µ ±  0%   1.606µ ± 0%  -26.09% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=false-12     58.72µ ±  1%   58.03µ ± 2%   -1.17% (p=0.035 n=10)
ToNormalizedLower/length=4000/uppercase=first/ascii=true-12     2.196µ ±  1%   2.177µ ± 0%   -0.87% (p=0.001 n=10)
ToNormalizedLower/length=4000/uppercase=first/ascii=false-12    58.27µ ±  0%   59.97µ ± 0%   +2.93% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=last/ascii=true-12      2.147µ ±  1%   2.095µ ± 5%        ~ (p=0.102 n=10)
ToNormalizedLower/length=4000/uppercase=last/ascii=false-12     58.73µ ±  1%   58.20µ ± 1%   -0.90% (p=0.001 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=true-12       4.800µ ±  0%   3.265µ ± 1%  -31.97% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=false-12      54.47µ ±  0%   54.53µ ± 1%        ~ (p=0.853 n=10)
geomean                                                         951.6n         832.9n       -12.47%

                                                             │     main     │                    new                    │
                                                             │     B/op     │     B/op      vs base                     │
ToNormalizedLower/length=10/uppercase=none/ascii=true-12         16.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=none/ascii=false-12       25.000 ± 0%     9.000 ± 0%   -64.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=first/ascii=true-12        16.00 ± 0%     16.00 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=first/ascii=false-12       20.00 ± 0%     17.00 ± 0%   -15.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=last/ascii=true-12         16.00 ± 0%     16.00 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=last/ascii=false-12        27.00 ± 0%     11.00 ± 0%   -59.26% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=all/ascii=true-12          16.00 ± 0%     16.00 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=all/ascii=false-12         32.00 ± 0%     22.00 ± 0%   -31.25% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=true-12        112.0 ± 0%       0.0 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=false-12     1.159Ki ± 0%   1.050Ki ± 0%    -9.44% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=first/ascii=true-12       112.0 ± 0%     112.0 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=first/ascii=false-12    1.126Ki ± 0%   1.104Ki ± 0%    -1.91% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=last/ascii=true-12        112.0 ± 0%     112.0 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=last/ascii=false-12     1.170Ki ± 0%   1.061Ki ± 0%    -9.35% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=all/ascii=true-12         112.0 ± 0%     112.0 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=all/ascii=false-12      1.203Ki ± 0%   1.137Ki ± 0%    -5.52% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=true-12     1.000Ki ± 0%   0.000Ki ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=false-12    5.862Ki ± 0%   4.862Ki ± 0%   -17.06% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=first/ascii=true-12    1.000Ki ± 0%   1.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=first/ascii=false-12   5.524Ki ± 0%   5.375Ki ± 0%    -2.70% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=last/ascii=true-12     1.000Ki ± 0%   1.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=last/ascii=false-12    5.975Ki ± 0%   4.975Ki ± 0%   -16.74% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=all/ascii=true-12      1.000Ki ± 0%   1.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=all/ascii=false-12     6.312Ki ± 0%   5.712Ki ± 0%    -9.51% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=true-12     4.000Ki ± 0%   0.000Ki ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=false-12    21.41Ki ± 0%   17.41Ki ± 0%   -18.68% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=first/ascii=true-12    4.000Ki ± 0%   4.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=first/ascii=false-12   19.99Ki ± 0%   19.49Ki ± 0%    -2.50% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=last/ascii=true-12     4.000Ki ± 0%   4.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=last/ascii=false-12    21.89Ki ± 0%   17.89Ki ± 0%   -18.28% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=true-12      4.000Ki ± 0%   4.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=all/ascii=false-12     23.31Ki ± 0%   20.91Ki ± 0%   -10.30% (p=0.000 n=10)
geomean                                                          647.2                      ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                                                             │    main    │                   new                   │
                                                             │ allocs/op  │ allocs/op   vs base                     │
ToNormalizedLower/length=10/uppercase=none/ascii=true-12       1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=none/ascii=false-12      1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=first/ascii=true-12      1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=first/ascii=false-12     1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=last/ascii=true-12       1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=last/ascii=false-12      1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=all/ascii=true-12        1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=all/ascii=false-12       2.000 ± 0%   1.000 ± 0%   -50.00% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=true-12      1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=false-12     5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=first/ascii=true-12     1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=first/ascii=false-12    5.000 ± 0%   5.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=last/ascii=true-12      1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=last/ascii=false-12     5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=all/ascii=true-12       1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=all/ascii=false-12      6.000 ± 0%   5.000 ± 0%   -16.67% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=true-12     1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=false-12    5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=first/ascii=true-12    1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=first/ascii=false-12   5.000 ± 0%   5.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=last/ascii=true-12     1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=last/ascii=false-12    5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=all/ascii=true-12      1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=all/ascii=false-12     6.000 ± 0%   5.000 ± 0%   -16.67% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=true-12     1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=false-12    5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=first/ascii=true-12    1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=first/ascii=false-12   5.000 ± 0%   5.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=last/ascii=true-12     1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=last/ascii=false-12    5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=true-12      1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=all/ascii=false-12     6.000 ± 0%   5.000 ± 0%   -16.67% (p=0.000 n=10)
geomean                                                        1.901                    ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

Old description, in case you want to understand why there are so many commits here

In the series of "optimising methods nobody cares about", I spent some time optimising this one. I started by removing the redundant `isASCII` check, but I felt I couldn't just send a PR for that, so I tried to make the function shorter and hopefully faster.

I tried several approaches, some with trade-offs, and finally ended up with a version that optimizes performance for an already-lowercase string while not penalizing the all-uppercase case by more than 20%.

Benchmark results

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
                                                             │     main      │                 new                 │
                                                             │    sec/op     │   sec/op     vs base                │
ToNormalizedLower/length=10/uppercase=none/ascii=true-12       19.840n ±  0%   5.356n ± 2%  -73.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=none/ascii=false-12       138.2n ±  0%   119.0n ± 1%  -13.90% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=first/ascii=true-12       20.66n ±  1%   20.48n ± 1%   -0.82% (p=0.019 n=10)
ToNormalizedLower/length=10/uppercase=first/ascii=false-12      142.6n ±  1%   140.2n ± 1%   -1.61% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=last/ascii=true-12        21.60n ±  0%   22.35n ± 0%   +3.50% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=last/ascii=false-12       150.3n ±  4%   129.3n ± 1%  -13.94% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=all/ascii=true-12         26.14n ±  2%   29.07n ± 1%  +11.21% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=all/ascii=false-12        143.8n ±  1%   129.8n ± 1%   -9.74% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=true-12       67.97n ±  0%   39.77n ± 3%  -41.48% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=false-12      1.743µ ±  2%   1.671µ ± 6%   -4.10% (p=0.014 n=10)
ToNormalizedLower/length=100/uppercase=first/ascii=true-12      69.95n ±  2%   68.27n ± 2%   -2.41% (p=0.009 n=10)
ToNormalizedLower/length=100/uppercase=first/ascii=false-12     1.721µ ±  0%   1.756µ ± 0%   +2.06% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=last/ascii=true-12       71.70n ± 14%   70.92n ± 0%        ~ (p=0.122 n=10)
ToNormalizedLower/length=100/uppercase=last/ascii=false-12      1.779µ ±  2%   1.700µ ± 1%   -4.47% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=all/ascii=true-12        147.7n ±  1%   163.5n ± 2%  +10.73% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=all/ascii=false-12       1.632µ ±  2%   1.627µ ± 0%   -0.31% (p=0.002 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=true-12      550.6n ±  4%   401.5n ± 2%  -27.07% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=false-12     14.79µ ±  1%   14.26µ ± 0%   -3.58% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=first/ascii=true-12     554.6n ±  1%   548.4n ± 0%   -1.13% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=first/ascii=false-12    14.67µ ±  0%   14.91µ ± 1%   +1.68% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=last/ascii=true-12      542.8n ±  2%   556.1n ± 4%        ~ (p=0.796 n=10)
ToNormalizedLower/length=1000/uppercase=last/ascii=false-12     14.74µ ±  0%   14.44µ ± 0%   -2.01% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=all/ascii=true-12       1.206µ ±  1%   1.445µ ± 1%  +19.87% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=all/ascii=false-12      13.76µ ±  1%   13.60µ ± 0%   -1.15% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=true-12      2.173µ ±  0%   1.594µ ± 1%  -26.67% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=false-12     58.72µ ±  1%   56.95µ ± 1%   -3.01% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=first/ascii=true-12     2.196µ ±  1%   2.159µ ± 1%   -1.71% (p=0.002 n=10)
ToNormalizedLower/length=4000/uppercase=first/ascii=false-12    58.27µ ±  0%   59.81µ ± 0%   +2.64% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=last/ascii=true-12      2.147µ ±  1%   2.198µ ± 3%        ~ (p=0.617 n=10)
ToNormalizedLower/length=4000/uppercase=last/ascii=false-12     58.73µ ±  1%   57.30µ ± 4%   -2.43% (p=0.022 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=true-12       4.800µ ±  0%   5.691µ ± 1%  +18.57% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=false-12      54.47µ ±  0%   54.02µ ± 0%   -0.82% (p=0.000 n=10)
geomean                                                         951.6n         880.6n        -7.46%

                                                             │     main     │                    new                    │
                                                             │     B/op     │     B/op      vs base                     │
ToNormalizedLower/length=10/uppercase=none/ascii=true-12         16.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=none/ascii=false-12       25.000 ± 0%     9.000 ± 0%   -64.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=first/ascii=true-12        16.00 ± 0%     16.00 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=first/ascii=false-12       20.00 ± 0%     17.00 ± 0%   -15.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=last/ascii=true-12         16.00 ± 0%     16.00 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=last/ascii=false-12        27.00 ± 0%     11.00 ± 0%   -59.26% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=all/ascii=true-12          16.00 ± 0%     16.00 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=all/ascii=false-12         32.00 ± 0%     22.00 ± 0%   -31.25% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=true-12        112.0 ± 0%       0.0 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=false-12     1.159Ki ± 0%   1.050Ki ± 0%    -9.44% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=first/ascii=true-12       112.0 ± 0%     112.0 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=first/ascii=false-12    1.126Ki ± 0%   1.104Ki ± 0%    -1.91% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=last/ascii=true-12        112.0 ± 0%     112.0 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=last/ascii=false-12     1.170Ki ± 0%   1.061Ki ± 0%    -9.35% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=all/ascii=true-12         112.0 ± 0%     112.0 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=all/ascii=false-12      1.203Ki ± 0%   1.137Ki ± 0%    -5.52% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=true-12     1.000Ki ± 0%   0.000Ki ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=false-12    5.862Ki ± 0%   4.862Ki ± 0%   -17.06% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=first/ascii=true-12    1.000Ki ± 0%   1.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=first/ascii=false-12   5.524Ki ± 0%   5.375Ki ± 0%    -2.70% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=last/ascii=true-12     1.000Ki ± 0%   1.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=last/ascii=false-12    5.975Ki ± 0%   4.975Ki ± 0%   -16.74% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=all/ascii=true-12      1.000Ki ± 0%   1.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=all/ascii=false-12     6.312Ki ± 0%   5.712Ki ± 0%    -9.51% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=true-12     4.000Ki ± 0%   0.000Ki ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=false-12    21.41Ki ± 0%   17.41Ki ± 0%   -18.68% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=first/ascii=true-12    4.000Ki ± 0%   4.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=first/ascii=false-12   19.99Ki ± 0%   19.49Ki ± 0%    -2.50% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=last/ascii=true-12     4.000Ki ± 0%   4.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=last/ascii=false-12    21.89Ki ± 0%   17.89Ki ± 0%   -18.28% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=true-12      4.000Ki ± 0%   4.000Ki ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=all/ascii=false-12     23.31Ki ± 0%   20.91Ki ± 0%   -10.30% (p=0.000 n=10)
geomean                                                          647.2                      ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                                                             │    main    │                   new                   │
                                                             │ allocs/op  │ allocs/op   vs base                     │
ToNormalizedLower/length=10/uppercase=none/ascii=true-12       1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=none/ascii=false-12      1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=first/ascii=true-12      1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=first/ascii=false-12     1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=last/ascii=true-12       1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=last/ascii=false-12      1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=all/ascii=true-12        1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=all/ascii=false-12       2.000 ± 0%   1.000 ± 0%   -50.00% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=true-12      1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=false-12     5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=first/ascii=true-12     1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=first/ascii=false-12    5.000 ± 0%   5.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=last/ascii=true-12      1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=last/ascii=false-12     5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=all/ascii=true-12       1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=all/ascii=false-12      6.000 ± 0%   5.000 ± 0%   -16.67% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=true-12     1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=false-12    5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=first/ascii=true-12    1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=first/ascii=false-12   5.000 ± 0%   5.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=last/ascii=true-12     1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=last/ascii=false-12    5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=all/ascii=true-12      1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=all/ascii=false-12     6.000 ± 0%   5.000 ± 0%   -16.67% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=true-12     1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=false-12    5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=first/ascii=true-12    1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=first/ascii=false-12   5.000 ± 0%   5.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=last/ascii=true-12     1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=last/ascii=false-12    5.000 ± 0%   4.000 ± 0%   -20.00% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=true-12      1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=all/ascii=false-12     6.000 ± 0%   5.000 ± 0%   -16.67% (p=0.000 n=10)
geomean                                                        1.901                    ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

All in all, I think the new version is not just shorter but also easier to read because of that: it fits on one screen. Some other things I changed:

Renamed written to pos: it's more descriptive, IMO.
Moved written to the scope of the loop: outside of the loop we can just use b.Len(), which might be slightly slower to call, but the benchmarks show it doesn't make the overall results slower (it's inlined).
Removed the check before writing the rest of the string: the compiled code will check that anyway, no need to do it twice.
Made the strings.Builder grow lazily, only when needed.
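To make the list above concrete, here is a minimal sketch of a function shaped by those changes. It is not the code in this PR: the name `lowerASCIISketch` is hypothetical, and the fallback to `strings.ToLower` for non-ASCII input is an assumption made only to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// lowerASCIISketch is a hypothetical stand-in used for illustration only.
// It lowercases ASCII itself and, as an assumption of this sketch, defers
// to strings.ToLower as soon as it sees a non-ASCII byte.
func lowerASCIISketch(s string) string {
	var b strings.Builder
	// pos lives in the loop scope: it is the index of the first byte of s
	// that has not been copied into b yet.
	for i, pos := 0, 0; i < len(s); i++ {
		c := s[i]
		if c >= utf8.RuneSelf {
			return strings.ToLower(s)
		}
		if c < 'A' || c > 'Z' {
			continue
		}
		if b.Cap() == 0 {
			// Grow lazily: inputs without uppercase bytes never allocate.
			b.Grow(len(s))
		}
		b.WriteString(s[pos:i]) // may be empty, no extra check needed
		b.WriteByte(c + 'a' - 'A')
		pos = i + 1
	}
	if b.Len() == 0 {
		// Nothing was written, so s had no uppercase ASCII bytes.
		return s
	}
	// Outside the loop, b.Len() equals the number of bytes of s already
	// consumed, so it can replace pos here.
	b.WriteString(s[b.Len():])
	return b.String()
}

func main() {
	fmt.Println(lowerASCIISketch("Hello, WORLD")) // prints "hello, world"
}
```

Returning `s` unchanged when nothing was written is what would give the zero-allocation rows for `uppercase=none/ascii=true` in a function of this shape.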

I tried doing:

		if 'A' <= c && c <= 'Z' {
			b.WriteString(s[b.Len():i])
			b.WriteByte(c + 'a' - 'A')
		}

But that's way slower when all chars are uppercase.

Interesting things I took from this: replacing strings.Builder with a bytes.Buffer backed by a stack array is much slower. I thought the reason was that, unfortunately, bytes.Buffer.WriteString is not inlined, as opposed to strings.Builder.WriteString().

Okay, so I tried the good old slice-append approach, backed by a stack array and then allocating a string from that: it's also much slower.
I also tried just allocating a slice on the heap and then building an unsafe string from that: also much slower.
I wonder if one of the reasons is that strings.Builder does the bytealg.MakeNoZero call to grow, which is something that mere mortals like me can't do... or maybe I can?
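For reference, the slice-append attempt mentioned above could look roughly like the sketch below. The function name and the 64-byte array size are assumptions, not taken from the PR, and whether the array really stays on the stack is up to the compiler's escape analysis.

```go
package main

import "fmt"

// lowerAppendSketch is a hypothetical illustration of the slice-append idea.
// It lowercases ASCII letters and passes every other byte through unchanged.
func lowerAppendSketch(s string) string {
	// The 64-byte size is an arbitrary assumption; for longer inputs append
	// reallocates and the data moves to the heap.
	var stack [64]byte
	buf := stack[:0]
	for i := 0; i < len(s); i++ {
		c := s[i]
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		buf = append(buf, c)
	}
	// string(buf) copies the bytes into a new string.
	return string(buf)
}

func main() {
	fmt.Println(lowerAppendSketch("Hello, WORLD")) // prints "hello, world"
}
```

Unlike the strings.Builder variants, this shape always copies every byte, even when the input is already lowercase.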

I tried doing this ugly hack (I wasn't going to propose it as a production solution, but since we're already in this rabbit hole...):

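// noZeroSlice returns an empty byte slice with the given capacity, reusing
// the buffer that strings.Builder allocates in Grow.
func noZeroSlice(cap int) []byte {
	sb := strings.Builder{}
	sb.Grow(cap)
	// Zero-length string; the hack relies on its data pointer still
	// referencing the builder's buffer.
	s := sb.String()

	var b []byte
	// Overwrite b's data pointer and length with the string header of s...
	*(*string)(unsafe.Pointer(&b)) = s
	// ...and set the capacity by hand, since a string header has none.
	(*reflect.SliceHeader)(unsafe.Pointer(&b)).Cap = cap
	return b
}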

And then I used that as a slice, with the usual appends, and then yolo-ed the string: this way we can skip all the copyChecks that strings.Builder does. It turned out to be ~20% faster in the case where all chars are ASCII uppercase.
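As a rough sketch of the "append, then yolo the string" part: the names below are hypothetical, and the buffer comes from a plain `make` (which zeroes memory) rather than from `noZeroSlice`, so this shows only the shape of the code and would not reproduce the benchmark numbers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// yoloStringSketch reinterprets buf as a string without copying, by reading
// the first two words of the slice header (pointer, length) as a string
// header. The caller must never modify buf afterwards.
func yoloStringSketch(buf []byte) string {
	return *(*string)(unsafe.Pointer(&buf))
}

// lowerYoloSketch is a hypothetical illustration: usual appends into a
// preallocated slice, then a zero-copy conversion to string.
func lowerYoloSketch(s string) string {
	// Assumption: plain make instead of the no-zero hack from the text.
	buf := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		c := s[i]
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		buf = append(buf, c)
	}
	return yoloStringSketch(buf)
}

func main() {
	fmt.Println(lowerYoloSketch("ALL UPPERCASE")) // prints "all uppercase"
}
```

The conversion is only safe here because `buf` never escapes `lowerYoloSketch` and is not written to after the cast.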

Ugly hack benchmarks

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
                                                             │     main      │                 new                 │                yolo                 │
                                                             │    sec/op     │    sec/op     vs base               │   sec/op     vs base                │
ToNormalizedLower/length=10/uppercase=none/ascii=true-12        19.84n ±  0%    18.96n ± 2%  -4.41% (p=0.000 n=10)   19.52n ± 1%   -1.61% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=none/ascii=false-12       138.2n ±  0%    131.0n ± 1%  -5.14% (p=0.000 n=10)   137.3n ± 9%        ~ (p=0.490 n=10)
ToNormalizedLower/length=10/uppercase=first/ascii=true-12       20.66n ±  1%    19.86n ± 0%  -3.87% (p=0.000 n=10)   20.30n ± 5%        ~ (p=0.159 n=10)
ToNormalizedLower/length=10/uppercase=first/ascii=false-12      142.6n ±  1%    149.5n ± 1%  +4.88% (p=0.001 n=10)   154.0n ± 2%   +8.07% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=last/ascii=true-12        21.60n ±  0%    22.23n ± 1%  +2.92% (p=0.000 n=10)   22.27n ± 2%   +3.13% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=last/ascii=false-12       150.3n ±  4%    145.3n ± 1%  -3.29% (p=0.000 n=10)   147.2n ± 2%   -2.06% (p=0.003 n=10)
ToNormalizedLower/length=10/uppercase=all/ascii=true-12         26.14n ±  2%    26.64n ± 0%  +1.89% (p=0.022 n=10)   24.45n ± 1%   -6.50% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=all/ascii=false-12        143.8n ±  1%    136.8n ± 4%  -4.83% (p=0.001 n=10)   138.6n ± 1%   -3.55% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=true-12       67.97n ±  0%    68.26n ± 2%  +0.43% (p=0.002 n=10)   69.22n ± 3%        ~ (p=0.101 n=10)
ToNormalizedLower/length=100/uppercase=none/ascii=false-12      1.743µ ±  2%    1.718µ ± 1%  -1.43% (p=0.041 n=10)   1.721µ ± 4%        ~ (p=0.148 n=10)
ToNormalizedLower/length=100/uppercase=first/ascii=true-12      69.95n ±  2%    68.53n ± 1%  -2.03% (p=0.000 n=10)   69.03n ± 0%   -1.33% (p=0.014 n=10)
ToNormalizedLower/length=100/uppercase=first/ascii=false-12     1.721µ ±  0%    1.781µ ± 0%  +3.49% (p=0.000 n=10)   1.806µ ± 1%   +4.94% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=last/ascii=true-12       71.70n ± 14%    70.61n ± 1%       ~ (p=0.089 n=10)   72.08n ± 0%        ~ (p=0.670 n=10)
ToNormalizedLower/length=100/uppercase=last/ascii=false-12      1.779µ ±  2%    1.752µ ± 2%  -1.55% (p=0.014 n=10)   1.757µ ± 1%   -1.26% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=all/ascii=true-12        147.7n ±  1%    140.6n ± 2%  -4.84% (p=0.000 n=10)   113.6n ± 2%  -23.09% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=all/ascii=false-12       1.632µ ±  2%    1.649µ ± 2%  +1.04% (p=0.030 n=10)   1.634µ ± 0%        ~ (p=0.423 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=true-12      550.6n ±  4%    540.6n ± 1%  -1.81% (p=0.002 n=10)   556.0n ± 3%        ~ (p=0.218 n=10)
ToNormalizedLower/length=1000/uppercase=none/ascii=false-12     14.79µ ±  1%    14.56µ ± 1%  -1.57% (p=0.001 n=10)   14.95µ ± 4%   +1.04% (p=0.029 n=10)
ToNormalizedLower/length=1000/uppercase=first/ascii=true-12     554.6n ±  1%    540.7n ± 1%  -2.51% (p=0.000 n=10)   562.0n ± 9%   +1.34% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=first/ascii=false-12    14.67µ ±  0%    15.05µ ± 1%  +2.60% (p=0.000 n=10)   15.36µ ± 2%   +4.69% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=last/ascii=true-12      542.8n ±  2%    525.8n ± 3%  -3.13% (p=0.003 n=10)   555.7n ± 1%   +2.38% (p=0.009 n=10)
ToNormalizedLower/length=1000/uppercase=last/ascii=false-12     14.74µ ±  0%    14.48µ ± 2%  -1.73% (p=0.009 n=10)   14.70µ ± 1%        ~ (p=0.529 n=10)
ToNormalizedLower/length=1000/uppercase=all/ascii=true-12      1205.5n ±  1%   1200.0n ± 0%       ~ (p=0.867 n=10)   944.7n ± 1%  -21.63% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=all/ascii=false-12      13.76µ ±  1%    13.75µ ± 1%       ~ (p=0.684 n=10)   13.84µ ± 0%        ~ (p=0.325 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=true-12      2.173µ ±  0%    2.158µ ± 1%       ~ (p=0.085 n=10)   2.207µ ± 9%   +1.56% (p=0.001 n=10)
ToNormalizedLower/length=4000/uppercase=none/ascii=false-12     58.72µ ±  1%    57.99µ ± 1%  -1.24% (p=0.002 n=10)   59.18µ ± 6%        ~ (p=0.579 n=10)
ToNormalizedLower/length=4000/uppercase=first/ascii=true-12     2.196µ ±  1%    2.143µ ± 1%  -2.44% (p=0.000 n=10)   2.195µ ± 5%        ~ (p=1.000 n=10)
ToNormalizedLower/length=4000/uppercase=first/ascii=false-12    58.27µ ±  0%    60.13µ ± 0%  +3.20% (p=0.000 n=10)   60.91µ ± 1%   +4.54% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=last/ascii=true-12      2.147µ ±  1%    2.154µ ± 6%       ~ (p=0.898 n=10)   2.237µ ± 0%   +4.19% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=last/ascii=false-12     58.73µ ±  1%    58.71µ ± 5%       ~ (p=0.837 n=10)   58.52µ ± 1%        ~ (p=0.424 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=true-12       4.800µ ±  0%    4.968µ ± 3%  +3.51% (p=0.000 n=10)   3.717µ ± 1%  -22.55% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=false-12      54.47µ ±  0%    56.57µ ± 3%  +3.86% (p=0.000 n=10)   54.58µ ± 1%        ~ (p=0.165 n=10)
geomean                                                         951.6n          945.1n       -0.68%                  934.7n        -1.77%

                                                             │     main     │                  new                   │                  yolo                  │
                                                             │     B/op     │     B/op      vs base                  │     B/op      vs base                  │
ToNormalizedLower/length=10/uppercase=none/ascii=true-12         16.00 ± 0%     16.00 ± 0%        ~ (p=1.000 n=10) ¹     16.00 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=none/ascii=false-12        25.00 ± 0%     25.00 ± 0%        ~ (p=1.000 n=10) ¹     25.00 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=first/ascii=true-12        16.00 ± 0%     16.00 ± 0%        ~ (p=1.000 n=10) ¹     16.00 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=first/ascii=false-12       20.00 ± 0%     27.00 ± 0%  +35.00% (p=0.000 n=10)       27.00 ± 0%  +35.00% (p=0.000 n=10)
ToNormalizedLower/length=10/uppercase=last/ascii=true-12         16.00 ± 0%     16.00 ± 0%        ~ (p=1.000 n=10) ¹     16.00 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=last/ascii=false-12        27.00 ± 0%     27.00 ± 0%        ~ (p=1.000 n=10) ¹     27.00 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=all/ascii=true-12          16.00 ± 0%     16.00 ± 0%        ~ (p=1.000 n=10) ¹     16.00 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=10/uppercase=all/ascii=false-12         32.00 ± 0%     32.00 ± 0%        ~ (p=1.000 n=10) ¹     32.00 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=none/ascii=true-12        112.0 ± 0%     112.0 ± 0%        ~ (p=1.000 n=10) ¹     112.0 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=none/ascii=false-12     1.159Ki ± 0%   1.159Ki ± 0%        ~ (p=1.000 n=10) ¹   1.159Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=first/ascii=true-12       112.0 ± 0%     112.0 ± 0%        ~ (p=1.000 n=10) ¹     112.0 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=first/ascii=false-12    1.126Ki ± 0%   1.170Ki ± 0%   +3.90% (p=0.000 n=10)     1.170Ki ± 0%   +3.90% (p=0.000 n=10)
ToNormalizedLower/length=100/uppercase=last/ascii=true-12        112.0 ± 0%     112.0 ± 0%        ~ (p=1.000 n=10) ¹     112.0 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=last/ascii=false-12     1.170Ki ± 0%   1.170Ki ± 0%        ~ (p=1.000 n=10) ¹   1.170Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=all/ascii=true-12         112.0 ± 0%     112.0 ± 0%        ~ (p=1.000 n=10) ¹     112.0 ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=100/uppercase=all/ascii=false-12      1.203Ki ± 0%   1.203Ki ± 0%        ~ (p=1.000 n=10) ¹   1.203Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=none/ascii=true-12     1.000Ki ± 0%   1.000Ki ± 0%        ~ (p=1.000 n=10) ¹   1.000Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=none/ascii=false-12    5.862Ki ± 0%   5.862Ki ± 0%        ~ (p=1.000 n=10) ¹   5.862Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=first/ascii=true-12    1.000Ki ± 0%   1.000Ki ± 0%        ~ (p=1.000 n=10) ¹   1.000Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=first/ascii=false-12   5.524Ki ± 0%   5.975Ki ± 0%   +8.15% (p=0.000 n=10)     5.975Ki ± 0%   +8.15% (p=0.000 n=10)
ToNormalizedLower/length=1000/uppercase=last/ascii=true-12     1.000Ki ± 0%   1.000Ki ± 0%        ~ (p=1.000 n=10) ¹   1.000Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=last/ascii=false-12    5.975Ki ± 0%   5.975Ki ± 0%        ~ (p=1.000 n=10) ¹   5.975Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=all/ascii=true-12      1.000Ki ± 0%   1.000Ki ± 0%        ~ (p=1.000 n=10) ¹   1.000Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=1000/uppercase=all/ascii=false-12     6.312Ki ± 0%   6.312Ki ± 0%        ~ (p=1.000 n=10) ¹   6.312Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=none/ascii=true-12     4.000Ki ± 0%   4.000Ki ± 0%        ~ (p=1.000 n=10) ¹   4.000Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=none/ascii=false-12    21.41Ki ± 0%   21.41Ki ± 0%        ~ (p=1.000 n=10) ¹   21.41Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=first/ascii=true-12    4.000Ki ± 0%   4.000Ki ± 0%        ~ (p=1.000 n=10) ¹   4.000Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=first/ascii=false-12   19.99Ki ± 0%   21.89Ki ± 0%   +9.51% (p=0.000 n=10)     21.89Ki ± 0%   +9.51% (p=0.000 n=10)
ToNormalizedLower/length=4000/uppercase=last/ascii=true-12     4.000Ki ± 0%   4.000Ki ± 0%        ~ (p=1.000 n=10) ¹   4.000Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=last/ascii=false-12    21.89Ki ± 0%   21.89Ki ± 0%        ~ (p=1.000 n=10)     21.89Ki ± 0%        ~ (p=1.000 n=10)
ToNormalizedLower/length=4000/uppercase=all/ascii=true-12      4.000Ki ± 0%   4.000Ki ± 0%        ~ (p=1.000 n=10) ¹   4.000Ki ± 0%        ~ (p=1.000 n=10) ¹
ToNormalizedLower/length=4000/uppercase=all/ascii=false-12     23.31Ki ± 0%   23.31Ki ± 0%        ~ (p=1.000 n=10) ¹   23.31Ki ± 0%        ~ (p=1.000 n=10) ¹
geomean                                                          647.2          657.6        +1.60%                      657.6        +1.60%
¹ all samples are equal

colega · 2024-06-15T14:21:14Z

This is ready for review.

Ranveer777 · 2024-06-16T07:03:32Z

model/labels/labels_common.go

@@ -215,3 +216,7 @@ func contains(s []Label, n string) bool {
 	}
 	return false
 }
+
+func yoloString(b []byte) string {
+	return *((*string)(unsafe.Pointer(&b)))


Would using `string(buf)` be more efficient and safer, or are we opting for `*((*string)(unsafe.Pointer(&b)))` solely to avoid an allocation?

colega · 2024-06-16

We're using the trick to avoid an allocation, and also to avoid copying the contents, which is what `string(buf)` would do.
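To make the difference concrete, here is a minimal sketch that is not part of the PR. `yoloString` is reproduced from the diff above; `sharesMemory` and `main` are hypothetical helpers added only for illustration, and `unsafe.StringData` / `unsafe.SliceData` assume Go 1.20 or newer.

```go
package main

import (
	"fmt"
	"unsafe"
)

// yoloString reinterprets the slice header as a string header,
// as in the diff above. No bytes are copied.
func yoloString(b []byte) string {
	return *((*string)(unsafe.Pointer(&b)))
}

// sharesMemory is a hypothetical helper: it reports whether s and b
// start at the same backing byte.
func sharesMemory(s string, b []byte) bool {
	return unsafe.StringData(s) == unsafe.SliceData(b)
}

func main() {
	buf := []byte("prometheus")
	fmt.Println(sharesMemory(string(buf), buf))     // false: string(buf) copies the bytes
	fmt.Println(sharesMemory(yoloString(buf), buf)) // true: same backing array
}
```

Because the returned string aliases the slice, this is only safe when the slice is never modified after the conversion.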

Ranveer777 · 2024-06-16T07:04:40Z

model/labels/regexp_test.go

+		"FOO":                      "foo",
+		"Foo":                      "foo",
+		"foO":                      "foo",
+		"fOo":                      "foo",


Would it be possible to incorporate a test case into both TestToNormalisedLower and BenchmarkToNormalizedLower? The test case should include all uppercase ASCII characters, with the final character being non-ASCII.
Something like "AAAAſ".

colega · 2024-06-16

I benchmarked this:

b.Run("ascii uppercase then utf8", func(b *testing.B) {
	inputs := make([]string, 10)
	for i := range inputs {
		inputs[i] = benchCase(9, "all", true, i) + "Ж"
	}
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		toNormalisedLower(inputs[n%len(inputs)])
	}
})

And the results were:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
                                                 │ main_asciithenunicode │        new_asciithenunicode        │
                                                 │        sec/op         │   sec/op     vs base               │
ToNormalizedLower/ascii_uppercase_then_utf8-12              85.16n ± 0%   84.91n ± 1%       ~ (p=0.221 n=6)

                                                 │ main_asciithenunicode │        new_asciithenunicode        │
                                                 │         B/op          │    B/op      vs base               │
ToNormalizedLower/ascii_uppercase_then_utf8-12               32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=6) ¹
¹ all samples are equal

                                                 │ main_asciithenunicode │        new_asciithenunicode        │
                                                 │       allocs/op       │  allocs/op   vs base               │
ToNormalizedLower/ascii_uppercase_then_utf8-12               2.000 ± 0%    2.000 ± 0%       ~ (p=1.000 n=6) ¹
¹ all samples are equal

Given that fitting this benchmark into the current benchmark structure would require a refactor, and that it doesn't measure any difference, I'd prefer to leave the benchmarks unmodified in this PR for the sake of clarity.

jesusvazquez

LGTM

Two learnings from this PR:

- `utf8.RuneSelf` lets you know whether a character is single-byte or multibyte.
- Our `yoloString` method to avoid copying the string. Wondering how different the performance would be if we used `strings.Builder` instead.

I agree the method is more readable now; I used the split view to compare 👍
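For readers unfamiliar with `utf8.RuneSelf`: it is the constant `0x80` from the standard `unicode/utf8` package, and any byte below it is a complete ASCII character on its own. The check can be sketched as follows; the helper name `isASCII` is hypothetical and does not come from the PR.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII (hypothetical helper) reports whether every byte of s is
// below utf8.RuneSelf, i.e. whether s contains only single-byte characters.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCII("FOO"))   // true
	fmt.Println(isASCII("AAAAſ")) // false: 'ſ' (U+017F) needs two bytes in UTF-8
}
```

A byte-wise loop like this lets a function stay on a cheap ASCII path and fall back to full Unicode handling only when a multibyte character shows up.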

colega · 2024-06-18T09:50:42Z

> Wondering how different the performance would be if we used strings.Builder instead.

`strings.Builder` was in one of the intermediate commits. We can't modify a `strings.Builder`'s contents (boo!), so with it we need to write the string as we go. With this approach we can instead copy the entire string (relying on CPU and compiler optimizations to do that efficiently) and then modify only the uppercase chars.

colega marked this pull request as draft June 14, 2024 07:38

colega marked this pull request as ready for review June 14, 2024 08:09

colega changed the title ~~Refactor toNormalisedLower: shorter and slightly faster.~~ Refactor toNormalisedLower: shorter and ~slightly faster~. Jun 14, 2024

colega changed the title ~~Refactor toNormalisedLower: shorter and ~slightly faster~.~~ Refactor toNormalisedLower: shorter and slightly faster. Jun 14, 2024

colega force-pushed the refactor-to-normalised-lower branch from 1654677 to f0439f0 Compare June 14, 2024 14:08

colega marked this pull request as draft June 14, 2024 16:38

colega marked this pull request as ready for review June 14, 2024 16:49

Ranveer777 reviewed Jun 16, 2024

Ranveer777 approved these changes Jun 16, 2024

colega added 6 commits June 17, 2024 09:06

Refactor toNormalisedLower: shorter and slightly faster

5041b18

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Normalise the original string

c9d50af

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Lazy-allocate string

8bf629a

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

No need for a pointer

5755289

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

KISS

e335c02

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Remove duplicate yoloString

4340269

There's now in common code without buildtags. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

colega force-pushed the refactor-to-normalised-lower branch from 818419d to 4340269 Compare June 17, 2024 07:07

Merge branch 'main' into refactor-to-normalised-lower

c3caaf2

jesusvazquez approved these changes Jun 18, 2024

jesusvazquez enabled auto-merge (squash) June 18, 2024 09:47

jesusvazquez merged commit 4f78cc8 into prometheus:main Jun 18, 2024
25 checks passed

pracucci mentioned this pull request Jun 20, 2024

Sync upstream prometheus grafana/mimir-prometheus#653

Merged
