Improve index-header reading performance further by avoiding allocating unnecessary strings.

This is inspired by prometheus/prometheus#11535.

Unfortunately, we can't adopt that change as-is, as byte slices
returned by our new Decbuf.UvarintBytes() implementation are not valid
after subsequent reads - we can't take advantage of the magic of mmap.
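
To make the constraint concrete, here is a minimal sketch of a reader whose returned slices alias a reused buffer. The `fieldReader` type and its one-byte length prefix are hypothetical stand-ins for illustration only; they are not the real Decbuf implementation or the index-header encoding.

```go
package main

import "fmt"

// fieldReader is a hypothetical stand-in for a streaming decoder: it hands out
// slices of a single internal buffer that every read overwrites.
type fieldReader struct {
	src []byte // remaining input; each field is one length byte followed by that many bytes
	buf []byte // scratch buffer reused across reads
}

// next returns the next field. The returned slice aliases r.buf, so it is
// only valid until the following call to next.
func (r *fieldReader) next() []byte {
	n := int(r.src[0])
	r.buf = append(r.buf[:0], r.src[1:1+n]...)
	r.src = r.src[1+n:]
	return r.buf
}

// readPair reads two fields, keeping the first one both as an aliased slice
// and as a string copied before the second read.
func readPair(input []byte) (aliased []byte, copied string, second []byte) {
	r := &fieldReader{src: input}
	aliased = r.next()
	copied = string(aliased) // copy while the bytes are still valid
	second = r.next()        // overwrites the bytes behind aliased
	return aliased, copied, second
}

func main() {
	aliased, copied, second := readPair([]byte("\x03foo\x03bar"))
	fmt.Printf("aliased=%q copied=%q second=%q\n", aliased, copied, second)
	// prints: aliased="bar" copied="foo" second="bar"
}
```

With an mmap-backed reader the first slice would still read "foo", which is why the upstream change cannot be reused directly.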

This means that we must decide whether or not to allocate a string
for a key or value before reading any further in the file. However,
we want to store the last value for each key, but won't know if the
value is the last one until we've read the next one.
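
One way to defer the allocation decision for keys is to compare the raw bytes against the current key first, and only convert to a string when the key changes. The sketch below assumes the standard gc compiler, which can usually evaluate a comparison such as `current != string(key)` without allocating; `distinctKeys` is a hypothetical helper, not code from this change.

```go
package main

import (
	"fmt"
	"testing"
)

// sameKey reports whether the raw key bytes match current. The conversion is
// only used for the comparison, so no string needs to be retained.
func sameKey(current string, key []byte) bool {
	return current == string(key)
}

// distinctKeys walks keys in order and allocates a string only when the key
// differs from the previous one.
func distinctKeys(keys [][]byte) []string {
	var distinct []string
	current := ""
	for i, key := range keys {
		if i == 0 || !sameKey(current, key) {
			current = string(key) // one allocation per distinct key
			distinct = append(distinct, current)
		}
	}
	return distinct
}

func main() {
	keys := [][]byte{[]byte("job"), []byte("job"), []byte("pod"), []byte("pod")}
	fmt.Println(distinctKeys(keys))

	// Measure the allocations made by the comparison alone.
	key := []byte("__name__")
	allocs := testing.AllocsPerRun(1000, func() { _ = sameKey("__name__", key) })
	fmt.Println("allocations per comparison:", allocs)
}
```

Values are harder: whether a value must be kept is not known until the next entry has been read, at which point the bytes are no longer valid.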

The trick is to read the table in two passes. On the first pass, we
read one in every postingOffsetsInMemSampling entries, and keep track
of the position of the last value for each key.

On the second pass, we go back and read the last values for each key.

(I've started with two passes to avoid seeking backwards and
discarding the entire file buffer every time we start reading a new
key - it may be interesting to see if discarding the buffer is as
expensive as I expect.)
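
As a rough illustration of the two passes described above, the sketch below works on an in-memory slice instead of the index-header file. The `entry` type and `sampleTwoPass` function are hypothetical, and a slice index stands in for a position in the table.

```go
package main

import "fmt"

// entry is a hypothetical in-memory stand-in for one row of the postings
// offset table (label name and label value).
type entry struct {
	name, value string
}

// sampleTwoPass keeps every sampling-th value for each name, plus the last
// value for each name. Entries must be grouped by name.
func sampleTwoPass(entries []entry, sampling int) map[string][]string {
	kept := map[string][]string{}
	lastPos := map[string]int{} // name -> position of a last value not yet recorded
	current, seen := "", 0

	// First pass: record sampled values and remember where the last value is.
	for i, e := range entries {
		if i == 0 || e.name != current {
			current, seen = e.name, 0
		}
		if seen%sampling == 0 {
			kept[current] = append(kept[current], e.value)
			delete(lastPos, current) // already recorded if this turns out to be the last value
		} else {
			lastPos[current] = i // overwritten until we reach the real last value
		}
		seen++
	}

	// Second pass: go back and read the last values that were skipped.
	for name, i := range lastPos {
		kept[name] = append(kept[name], entries[i].value)
	}
	return kept
}

func main() {
	entries := []entry{
		{"a", "v0"}, {"a", "v1"}, {"a", "v2"}, {"a", "v3"}, {"a", "v4"},
		{"b", "v0"}, {"b", "v1"}, {"b", "v2"}, {"b", "v3"},
	}
	fmt.Println(sampleTwoPass(entries, 3))
	// prints: map[a:[v0 v3 v4] b:[v0 v3]]
}
```

In the real reader the second pass would re-read the file at the recorded positions; here it is a plain slice lookup.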

This involves a trade-off: we scan the index-header file twice, but
make far fewer memory allocations. On my machine (an M1 MacBook
Pro with a fast SSD), the trade-off pays off.

Compared to the previous commit:

name                                         old time/op    new time/op    delta
NewStreamBinaryReader/1Names1Values-10          122µs ± 5%     129µs ±13%     ~     (p=0.151 n=5+5)
NewStreamBinaryReader/1Names10Values-10         131µs ± 9%     124µs ± 2%     ~     (p=0.056 n=5+5)
NewStreamBinaryReader/1Names100Values-10        135µs ± 3%     133µs ± 3%     ~     (p=0.548 n=5+5)
NewStreamBinaryReader/1Names500Values-10        177µs ± 2%     162µs ± 1%   -8.29%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names1000Values-10       229µs ± 2%     198µs ± 2%  -13.51%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names5000Values-10       689µs ± 1%     535µs ± 2%  -22.37%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names1Values-10         169µs ± 3%     171µs ± 2%     ~     (p=0.310 n=5+5)
NewStreamBinaryReader/20Names10Values-10        194µs ± 2%     188µs ± 4%     ~     (p=0.056 n=5+5)
NewStreamBinaryReader/20Names100Values-10       438µs ± 7%     355µs ± 6%  -19.07%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names500Values-10      1.31ms ± 0%    0.94ms ± 3%  -28.29%  (p=0.016 n=4+5)
NewStreamBinaryReader/20Names1000Values-10     2.28ms ± 2%    1.62ms ± 3%  -29.16%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names5000Values-10     9.95ms ± 2%    6.71ms ± 1%  -32.57%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names1Values-10         280µs ± 1%     277µs ± 1%     ~     (p=0.095 n=5+5)
NewStreamBinaryReader/50Names10Values-10        347µs ± 2%     322µs ± 2%   -7.08%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names100Values-10       910µs ± 2%     701µs ± 1%  -22.94%  (p=0.016 n=5+4)
NewStreamBinaryReader/50Names500Values-10      2.97ms ± 2%    2.14ms ± 3%  -28.06%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names1000Values-10     5.29ms ± 2%    3.79ms ± 2%  -28.32%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names5000Values-10     24.9ms ± 1%    16.6ms ± 1%  -33.42%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names1Values-10        543µs ± 1%     548µs ± 3%     ~     (p=0.548 n=5+5)
NewStreamBinaryReader/100Names10Values-10       678µs ± 3%     632µs ± 4%   -6.77%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names100Values-10     1.73ms ± 5%    1.37ms ± 5%  -21.00%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names500Values-10     5.63ms ± 2%    4.08ms ± 2%  -27.62%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names1000Values-10    10.2ms ± 2%     7.3ms ± 1%  -29.01%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names5000Values-10    49.8ms ± 1%    33.4ms ± 0%  -33.04%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names1Values-10       1.16ms ± 2%    1.16ms ± 2%     ~     (p=0.548 n=5+5)
NewStreamBinaryReader/200Names10Values-10      1.39ms ± 1%    1.29ms ± 2%   -6.95%  (p=0.016 n=4+5)
NewStreamBinaryReader/200Names100Values-10     3.35ms ± 3%    2.68ms ± 4%  -20.01%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names500Values-10     11.5ms ± 1%     8.0ms ± 0%  -30.78%  (p=0.016 n=5+4)
NewStreamBinaryReader/200Names1000Values-10    21.1ms ± 3%    14.5ms ± 1%  -31.39%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names5000Values-10     100ms ± 2%      67ms ± 0%  -32.81%  (p=0.008 n=5+5)

name                                         old alloc/op   new alloc/op   delta
NewStreamBinaryReader/1Names1Values-10         3.18MB ± 0%    3.18MB ± 0%   +0.00%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names10Values-10        3.18MB ± 0%    3.18MB ± 0%   -0.00%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names100Values-10       3.18MB ± 0%    3.18MB ± 0%   -0.05%  (p=0.016 n=5+4)
NewStreamBinaryReader/1Names500Values-10       3.19MB ± 0%    3.18MB ± 0%   -0.24%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names1000Values-10      3.20MB ± 0%    3.18MB ± 0%   -0.48%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names5000Values-10      3.28MB ± 0%    3.20MB ± 0%   -2.36%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names1Values-10        3.18MB ± 0%    3.18MB ± 0%   +0.06%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names10Values-10       3.19MB ± 0%    3.18MB ± 0%   -0.02%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names100Values-10      3.22MB ± 0%    3.19MB ± 0%   -0.90%  (p=0.029 n=4+4)
NewStreamBinaryReader/20Names500Values-10      3.38MB ± 0%    3.23MB ± 0%   -4.54%  (p=0.016 n=4+5)
NewStreamBinaryReader/20Names1000Values-10     3.58MB ± 0%    3.27MB ± 0%   -8.61%  (p=0.029 n=4+4)
NewStreamBinaryReader/20Names5000Values-10     5.43MB ± 0%    3.56MB ± 0%  -34.41%  (p=0.016 n=4+5)
NewStreamBinaryReader/50Names1Values-10        3.19MB ± 0%    3.19MB ± 0%   +0.13%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names10Values-10       3.20MB ± 0%    3.19MB ± 0%   -0.09%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names100Values-10      3.29MB ± 0%    3.22MB ± 0%   -2.24%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names500Values-10      3.68MB ± 0%    3.30MB ± 0%  -10.44%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names1000Values-10     4.18MB ± 0%    3.41MB ± 0%  -18.44%  (p=0.016 n=4+5)
NewStreamBinaryReader/50Names5000Values-10     9.29MB ± 0%    4.13MB ± 0%  -55.48%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names1Values-10       3.20MB ± 0%    3.21MB ± 0%   +0.25%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names10Values-10      3.22MB ± 0%    3.21MB ± 0%   -0.17%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names100Values-10     3.40MB ± 0%    3.26MB ± 0%   -4.32%  (p=0.016 n=5+4)
NewStreamBinaryReader/100Names500Values-10     4.19MB ± 0%    3.42MB ± 0%  -18.36%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names1000Values-10    5.19MB ± 0%    3.65MB ± 0%  -29.73%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names5000Values-10    15.7MB ± 0%     5.1MB ± 0%  -67.61%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names1Values-10       3.22MB ± 0%    3.23MB ± 0%   +0.51%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names10Values-10      3.26MB ± 0%    3.25MB ± 0%   -0.33%  (p=0.016 n=4+5)
NewStreamBinaryReader/200Names100Values-10     3.63MB ± 0%    3.33MB ± 0%   -8.11%  (p=0.029 n=4+4)
NewStreamBinaryReader/200Names500Values-10     5.52MB ± 0%    3.66MB ± 0%  -33.68%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names1000Values-10    7.92MB ± 0%    4.12MB ± 0%  -48.02%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names5000Values-10    29.3MB ± 0%     7.0MB ± 0%  -76.10%  (p=0.016 n=5+4)

name                                         old allocs/op  new allocs/op  delta
NewStreamBinaryReader/1Names1Values-10           76.0 ± 0%      78.0 ± 0%   +2.63%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names10Values-10          95.0 ± 0%      80.0 ± 0%  -15.79%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names100Values-10          278 ± 0%        86 ± 0%  -69.06%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names500Values-10        1.08k ± 0%     0.10k ± 0%  -90.74%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names1000Values-10       2.08k ± 0%     0.12k ± 0%  -94.38%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names5000Values-10       10.1k ± 0%      0.2k ± 0%  -97.58%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names1Values-10           154 ± 0%       160 ± 0%   +3.90%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names10Values-10          534 ± 0%       200 ± 0%  -62.55%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names100Values-10       4.19k ± 0%     0.32k ± 0%  -92.37%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names500Values-10       20.2k ± 0%      0.6k ± 0%  -97.03%  (p=0.029 n=4+4)
NewStreamBinaryReader/20Names1000Values-10      40.3k ± 0%      0.9k ± 0%  -97.66%  (p=0.000 n=5+4)
NewStreamBinaryReader/20Names5000Values-10       200k ± 0%        3k ± 0%  -98.26%  (p=0.029 n=4+4)
NewStreamBinaryReader/50Names1Values-10           278 ± 0%       285 ± 0%   +2.52%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names10Values-10        1.23k ± 0%     0.39k ± 0%  -68.65%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names100Values-10       10.4k ± 0%      0.7k ± 0%  -93.40%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names500Values-10       50.5k ± 0%      1.4k ± 0%  -97.26%  (p=0.000 n=5+4)
NewStreamBinaryReader/50Names1000Values-10       101k ± 0%        2k ± 0%  -97.78%  (p=0.029 n=4+4)
NewStreamBinaryReader/50Names5000Values-10       501k ± 0%        9k ± 0%  -98.28%  (p=0.016 n=4+5)
NewStreamBinaryReader/100Names1Values-10          481 ± 0%       489 ± 0%   +1.66%  (p=0.029 n=4+4)
NewStreamBinaryReader/100Names10Values-10       2.38k ± 0%     0.69k ± 0%  -71.05%  (p=0.000 n=4+5)
NewStreamBinaryReader/100Names100Values-10      20.7k ± 0%      1.3k ± 0%  -93.76%  (p=0.000 n=5+4)
NewStreamBinaryReader/100Names500Values-10       101k ± 0%        3k ± 0%  -97.33%  (p=0.016 n=5+4)
NewStreamBinaryReader/100Names1000Values-10      201k ± 0%        4k ± 0%  -97.82%  (p=0.029 n=4+4)
NewStreamBinaryReader/100Names5000Values-10     1.00M ± 0%     0.02M ± 0%  -98.29%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names1Values-10          882 ± 0%       891 ± 0%   +1.02%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names10Values-10       4.68k ± 0%     1.29k ± 0%     ~     (p=0.079 n=4+5)
NewStreamBinaryReader/200Names100Values-10      41.3k ± 0%      2.5k ± 0%     ~     (p=0.079 n=4+5)
NewStreamBinaryReader/200Names500Values-10       202k ± 0%        5k ± 0%     ~     (p=0.079 n=4+5)
NewStreamBinaryReader/200Names1000Values-10      402k ± 0%        9k ± 0%  -97.84%  (p=0.000 n=4+5)
NewStreamBinaryReader/200Names5000Values-10     2.00M ± 0%     0.03M ± 0%  -98.30%  (p=0.016 n=4+5)

...and compared to bada69c:

name                                         old time/op    new time/op    delta
NewStreamBinaryReader/1Names1Values-10          122µs ± 8%     129µs ±13%     ~     (p=0.151 n=5+5)
NewStreamBinaryReader/1Names10Values-10         124µs ± 4%     124µs ± 2%     ~     (p=0.421 n=5+5)
NewStreamBinaryReader/1Names100Values-10        138µs ± 2%     133µs ± 3%   -3.45%  (p=0.016 n=5+5)
NewStreamBinaryReader/1Names500Values-10        187µs ± 4%     162µs ± 1%  -13.16%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names1000Values-10       262µs ± 2%     198µs ± 2%  -24.35%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names5000Values-10       837µs ± 3%     535µs ± 2%  -36.05%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names1Values-10         168µs ± 2%     171µs ± 2%     ~     (p=0.056 n=5+5)
NewStreamBinaryReader/20Names10Values-10        199µs ± 2%     188µs ± 4%   -5.37%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names100Values-10       505µs ± 1%     355µs ± 6%  -29.75%  (p=0.016 n=4+5)
NewStreamBinaryReader/20Names500Values-10      1.63ms ± 1%    0.94ms ± 3%  -42.27%  (p=0.016 n=4+5)
NewStreamBinaryReader/20Names1000Values-10     2.90ms ± 3%    1.62ms ± 3%  -44.24%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names5000Values-10     12.9ms ± 2%     6.7ms ± 1%  -47.87%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names1Values-10         276µs ± 0%     277µs ± 1%     ~     (p=0.286 n=4+5)
NewStreamBinaryReader/50Names10Values-10        368µs ± 1%     322µs ± 2%  -12.49%  (p=0.016 n=4+5)
NewStreamBinaryReader/50Names100Values-10      1.10ms ± 4%    0.70ms ± 1%  -36.16%  (p=0.016 n=5+4)
NewStreamBinaryReader/50Names500Values-10      3.73ms ± 3%    2.14ms ± 3%  -42.68%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names1000Values-10     6.74ms ± 2%    3.79ms ± 2%  -43.81%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names5000Values-10     32.0ms ± 0%    16.6ms ± 1%  -48.27%  (p=0.016 n=4+5)
NewStreamBinaryReader/100Names1Values-10        547µs ± 1%     548µs ± 3%     ~     (p=0.413 n=4+5)
NewStreamBinaryReader/100Names10Values-10       728µs ± 4%     632µs ± 4%  -13.19%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names100Values-10     2.08ms ± 5%    1.37ms ± 5%  -34.32%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names500Values-10     7.13ms ± 1%    4.08ms ± 2%  -42.86%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names1000Values-10    13.3ms ± 2%     7.3ms ± 1%  -45.46%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names5000Values-10    64.9ms ± 2%    33.4ms ± 0%  -48.57%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names1Values-10       1.17ms ± 0%    1.16ms ± 2%     ~     (p=0.190 n=4+5)
NewStreamBinaryReader/200Names10Values-10      1.49ms ± 0%    1.29ms ± 2%  -13.17%  (p=0.016 n=4+5)
NewStreamBinaryReader/200Names100Values-10     4.05ms ± 3%    2.68ms ± 4%  -33.87%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names500Values-10     14.4ms ± 2%     8.0ms ± 0%  -44.83%  (p=0.016 n=5+4)
NewStreamBinaryReader/200Names1000Values-10    27.3ms ± 2%    14.5ms ± 1%  -47.02%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names5000Values-10     131ms ± 2%      67ms ± 0%  -48.50%  (p=0.008 n=5+5)

name                                         old alloc/op   new alloc/op   delta
NewStreamBinaryReader/1Names1Values-10         3.18MB ± 0%    3.18MB ± 0%   +0.00%  (p=0.032 n=5+5)
NewStreamBinaryReader/1Names10Values-10        3.18MB ± 0%    3.18MB ± 0%   -0.01%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names100Values-10       3.18MB ± 0%    3.18MB ± 0%   -0.15%  (p=0.016 n=5+4)
NewStreamBinaryReader/1Names500Values-10       3.20MB ± 0%    3.18MB ± 0%   -0.74%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names1000Values-10      3.23MB ± 0%    3.18MB ± 0%   -1.47%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names5000Values-10      3.44MB ± 0%    3.20MB ± 0%   -6.91%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names1Values-10        3.18MB ± 0%    3.18MB ± 0%   +0.04%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names10Values-10       3.19MB ± 0%    3.18MB ± 0%   -0.22%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names100Values-10      3.29MB ± 0%    3.19MB ± 0%   -2.83%  (p=0.016 n=5+4)
NewStreamBinaryReader/20Names500Values-10      3.70MB ± 0%    3.23MB ± 0%  -12.80%  (p=0.016 n=4+5)
NewStreamBinaryReader/20Names1000Values-10     4.22MB ± 0%    3.27MB ± 0%  -22.47%  (p=0.029 n=4+4)
NewStreamBinaryReader/20Names5000Values-10     8.63MB ± 0%    3.56MB ± 0%  -58.73%  (p=0.016 n=4+5)
NewStreamBinaryReader/50Names1Values-10        3.19MB ± 0%    3.19MB ± 0%   +0.08%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names10Values-10       3.21MB ± 0%    3.19MB ± 0%   -0.58%  (p=0.016 n=4+5)
NewStreamBinaryReader/50Names100Values-10      3.45MB ± 0%    3.22MB ± 0%   -6.77%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names500Values-10      4.48MB ± 0%    3.30MB ± 0%  -26.43%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names1000Values-10     5.78MB ± 0%    3.41MB ± 0%  -41.01%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names5000Values-10     17.3MB ± 0%     4.1MB ± 0%  -76.09%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names1Values-10       3.20MB ± 0%    3.21MB ± 0%   +0.15%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names10Values-10      3.25MB ± 0%    3.21MB ± 0%   -1.15%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names100Values-10     3.72MB ± 0%    3.26MB ± 0%  -12.55%  (p=0.029 n=4+4)
NewStreamBinaryReader/100Names500Values-10     5.79MB ± 0%    3.42MB ± 0%  -40.94%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names1000Values-10    8.39MB ± 0%    3.65MB ± 0%  -56.54%  (p=0.016 n=4+5)
NewStreamBinaryReader/100Names5000Values-10    31.7MB ± 0%     5.1MB ± 0%  -83.95%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names1Values-10       3.22MB ± 0%    3.23MB ± 0%   +0.31%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names10Values-10      3.32MB ± 0%    3.25MB ± 0%   -2.26%  (p=0.016 n=4+5)
NewStreamBinaryReader/200Names100Values-10     4.27MB ± 0%    3.33MB ± 0%  -21.89%  (p=0.029 n=4+4)
NewStreamBinaryReader/200Names500Values-10     8.72MB ± 0%    3.66MB ± 0%  -58.03%  (p=0.016 n=4+5)
NewStreamBinaryReader/200Names1000Values-10    14.3MB ± 0%     4.1MB ± 0%  -71.26%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names5000Values-10    61.3MB ± 0%     7.0MB ± 0%  -88.58%  (p=0.016 n=5+4)

name                                         old allocs/op  new allocs/op  delta
NewStreamBinaryReader/1Names1Values-10           78.0 ± 0%      78.0 ± 0%     ~     (all equal)
NewStreamBinaryReader/1Names10Values-10           106 ± 0%        80 ± 0%  -24.53%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names100Values-10          379 ± 0%        86 ± 0%  -77.31%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names500Values-10        1.58k ± 0%     0.10k ± 0%  -93.67%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names1000Values-10       3.08k ± 0%     0.12k ± 0%  -96.20%  (p=0.008 n=5+5)
NewStreamBinaryReader/1Names5000Values-10       15.1k ± 0%      0.2k ± 0%  -98.38%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names1Values-10           175 ± 0%       160 ± 0%   -8.57%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names10Values-10          735 ± 0%       200 ± 0%  -72.79%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names100Values-10       6.20k ± 0%     0.32k ± 0%     ~     (p=0.079 n=4+5)
NewStreamBinaryReader/20Names500Values-10       30.2k ± 0%      0.6k ± 0%  -98.02%  (p=0.000 n=5+4)
NewStreamBinaryReader/20Names1000Values-10      60.3k ± 0%      0.9k ± 0%  -98.44%  (p=0.029 n=4+4)
NewStreamBinaryReader/20Names5000Values-10       300k ± 0%        3k ± 0%  -98.84%  (p=0.029 n=4+4)
NewStreamBinaryReader/50Names1Values-10           329 ± 0%       285 ± 0%  -13.37%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names10Values-10        1.73k ± 0%     0.39k ± 0%  -77.73%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names100Values-10       15.4k ± 0%      0.7k ± 0%     ~     (p=0.079 n=4+5)
NewStreamBinaryReader/50Names500Values-10       75.5k ± 0%      1.4k ± 0%  -98.17%  (p=0.029 n=4+4)
NewStreamBinaryReader/50Names1000Values-10       151k ± 0%        2k ± 0%  -98.52%  (p=0.029 n=4+4)
NewStreamBinaryReader/50Names5000Values-10       751k ± 0%        9k ± 0%  -98.86%  (p=0.016 n=4+5)
NewStreamBinaryReader/100Names1Values-10          582 ± 0%       489 ± 0%  -15.98%  (p=0.029 n=4+4)
NewStreamBinaryReader/100Names10Values-10       3.38k ± 0%     0.69k ± 0%  -79.62%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names100Values-10      30.7k ± 0%      1.3k ± 0%  -95.80%  (p=0.029 n=4+4)
NewStreamBinaryReader/100Names500Values-10       151k ± 0%        3k ± 0%  -98.22%  (p=0.029 n=4+4)
NewStreamBinaryReader/100Names1000Values-10      301k ± 0%        4k ± 0%  -98.54%  (p=0.029 n=4+4)
NewStreamBinaryReader/100Names5000Values-10     1.50M ± 0%     0.02M ± 0%  -98.86%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names1Values-10        1.08k ± 0%     0.89k ± 0%  -17.73%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names10Values-10       6.68k ± 0%     1.29k ± 0%     ~     (p=0.079 n=4+5)
NewStreamBinaryReader/200Names100Values-10      61.3k ± 0%      2.5k ± 0%     ~     (p=0.079 n=4+5)
NewStreamBinaryReader/200Names500Values-10       302k ± 0%        5k ± 0%  -98.25%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names1000Values-10      602k ± 0%        9k ± 0%  -98.56%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names5000Values-10     3.00M ± 0%     0.03M ± 0%  -98.86%  (p=0.008 n=5+5)

Read postings offset table in one pass by seeking back to previous values when required.

On my machine with an SSD, this produces mixed results, but seems to
improve things for index-headers with a high number of values and
relatively few names:

name                                         old time/op    new time/op    delta
NewStreamBinaryReader/1Names1Values-10          129µs ±13%     136µs ± 4%     ~     (p=0.151 n=5+5)
NewStreamBinaryReader/1Names10Values-10         124µs ± 2%     129µs ± 7%   +3.75%  (p=0.032 n=5+5)
NewStreamBinaryReader/1Names100Values-10        133µs ± 3%     135µs ± 1%     ~     (p=0.421 n=5+5)
NewStreamBinaryReader/1Names500Values-10        162µs ± 1%     164µs ± 2%     ~     (p=0.421 n=5+5)
NewStreamBinaryReader/1Names1000Values-10       198µs ± 2%     198µs ± 2%     ~     (p=1.000 n=5+5)
NewStreamBinaryReader/1Names5000Values-10       535µs ± 2%     518µs ± 2%   -3.13%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names1Values-10         171µs ± 2%     171µs ± 1%     ~     (p=0.841 n=5+5)
NewStreamBinaryReader/20Names10Values-10        188µs ± 4%     208µs ± 2%  +10.17%  (p=0.008 n=5+5)
NewStreamBinaryReader/20Names100Values-10       355µs ± 6%     383µs ± 2%   +8.03%  (p=0.016 n=5+5)
NewStreamBinaryReader/20Names500Values-10       941µs ± 3%     932µs ± 3%     ~     (p=0.421 n=5+5)
NewStreamBinaryReader/20Names1000Values-10     1.62ms ± 3%    1.57ms ± 3%     ~     (p=0.095 n=5+5)
NewStreamBinaryReader/20Names5000Values-10     6.71ms ± 1%    6.33ms ± 1%   -5.57%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names1Values-10         277µs ± 1%     291µs ± 5%   +5.10%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names10Values-10        322µs ± 2%     394µs ± 5%  +22.15%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names100Values-10       701µs ± 1%     730µs ± 2%   +4.11%  (p=0.016 n=4+5)
NewStreamBinaryReader/50Names500Values-10      2.14ms ± 3%    2.08ms ± 3%     ~     (p=0.095 n=5+5)
NewStreamBinaryReader/50Names1000Values-10     3.79ms ± 2%    3.63ms ± 2%   -4.09%  (p=0.008 n=5+5)
NewStreamBinaryReader/50Names5000Values-10     16.6ms ± 1%    15.5ms ± 0%   -6.59%  (p=0.016 n=5+4)
NewStreamBinaryReader/100Names1Values-10        548µs ± 3%     542µs ± 3%     ~     (p=0.095 n=5+5)
NewStreamBinaryReader/100Names10Values-10       632µs ± 4%     741µs ± 3%  +17.27%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names100Values-10     1.37ms ± 5%    1.40ms ± 5%     ~     (p=0.222 n=5+5)
NewStreamBinaryReader/100Names500Values-10     4.08ms ± 2%    3.97ms ± 2%   -2.62%  (p=0.016 n=5+5)
NewStreamBinaryReader/100Names1000Values-10    7.27ms ± 1%    6.96ms ± 1%   -4.25%  (p=0.008 n=5+5)
NewStreamBinaryReader/100Names5000Values-10    33.4ms ± 0%    31.1ms ± 0%   -6.73%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names1Values-10       1.16ms ± 2%    1.16ms ± 3%     ~     (p=1.000 n=5+5)
NewStreamBinaryReader/200Names10Values-10      1.29ms ± 2%    1.57ms ± 4%  +21.49%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names100Values-10     2.68ms ± 4%    2.73ms ± 3%     ~     (p=0.095 n=5+5)
NewStreamBinaryReader/200Names500Values-10     7.97ms ± 0%    7.64ms ± 1%   -4.09%  (p=0.016 n=4+5)
NewStreamBinaryReader/200Names1000Values-10    14.5ms ± 1%    13.7ms ± 1%   -5.30%  (p=0.008 n=5+5)
NewStreamBinaryReader/200Names5000Values-10    67.3ms ± 0%    62.5ms ± 1%   -7.15%  (p=0.008 n=5+5)
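
The one-pass variant can be sketched in the same simplified setting. Again, `entry` and `sampleOnePass` are hypothetical names, and the slice index `pending` stands in for the table offset that the real code passes to ResetAt when it seeks back.

```go
package main

import "fmt"

// entry is a hypothetical in-memory stand-in for one row of the postings
// offset table (label name and label value).
type entry struct {
	name, value string
}

// sampleOnePass keeps every sampling-th value for each name, plus the last
// value for each name, in a single forward pass. When the name changes and the
// previous entry was skipped, it goes back to that entry to read its value.
func sampleOnePass(entries []entry, sampling int) map[string][]string {
	kept := map[string][]string{}
	current, seen := "", 0
	pending := -1 // position of a skipped entry that may be the last for its name

	for i, e := range entries {
		if i == 0 || e.name != current {
			if pending != -1 {
				// "Seek back": the skipped entry was the last value for the previous name.
				kept[current] = append(kept[current], entries[pending].value)
			}
			current, seen, pending = e.name, 0, -1
		}
		if seen%sampling == 0 {
			kept[current] = append(kept[current], e.value)
			pending = -1 // already recorded, nothing to come back for
		} else {
			pending = i // only needed if this turns out to be the last value
		}
		seen++
	}

	// The final name has no following entry to trigger the seek back.
	if pending != -1 {
		kept[current] = append(kept[current], entries[pending].value)
	}
	return kept
}

func main() {
	entries := []entry{
		{"a", "v0"}, {"a", "v1"}, {"a", "v2"}, {"a", "v3"}, {"a", "v4"},
		{"b", "v0"}, {"b", "v1"}, {"b", "v2"}, {"b", "v3"},
	}
	fmt.Println(sampleOnePass(entries, 3))
	// prints: map[a:[v0 v3 v4] b:[v0 v3]]
}
```

The sketch seeks back at most once per name, which may help explain why the cost shows up most in benchmarks with many names and few values per name.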
charleskorn committed Dec 19, 2022
1 parent 4ce991c commit 1d98754
Showing 1 changed file with 84 additions and 33 deletions.
117 changes: 84 additions & 33 deletions pkg/storegateway/indexheader/index/postings.go
@@ -87,58 +87,109 @@ func newV1PostingOffsetTable(factory *streamencoding.DecbufFactory, tableOffset
return &t, nil
}

func newV2PostingOffsetTable(factory *streamencoding.DecbufFactory, tableOffset int, indexLastPostingEnd uint64, postingOffsetsInMemSampling int) (*PostingOffsetTableV2, error) {
func newV2PostingOffsetTable(factory *streamencoding.DecbufFactory, tableOffset int, indexLastPostingEnd uint64, postingOffsetsInMemSampling int) (table *PostingOffsetTableV2, err error) {
t := PostingOffsetTableV2{
factory: factory,
tableOffset: tableOffset,
postings: map[string]*postingValueOffsets{},
postingOffsetsInMemSampling: postingOffsetsInMemSampling,
}

lastTableOff := 0
valueCount := 0
var lastKey string
var lastValue string
d := factory.NewDecbufAtChecked(tableOffset, castagnoliTable)
defer runutil.CloseWithErrCapture(&err, &d, "read offset table")

// For the postings offset table we keep every label name but only every nth
// label value (plus the first and last one), to save memory.
if err := readOffsetTable(factory, tableOffset, func(key string, value string, off uint64, tableOff int) error {
if _, ok := t.postings[key]; !ok {
// Not seen before label name.
if len(t.postings) > 0 {
// Always include last value for each label name, unless it was just added in previous iteration based
// on valueCount.
if (valueCount-1)%postingOffsetsInMemSampling != 0 {
t.postings[lastKey].offsets = append(t.postings[lastKey].offsets, postingOffset{value: lastValue, tableOff: lastTableOff})
}
t.postings[lastKey].lastValOffset = int64(off - crc32.Size)
startLen := d.Len()
remainingCount := d.Be32()
currentKey := ""
valuesForCurrentKey := 0
lastEntryOffsetInTable := -1

for d.Err() == nil && remainingCount > 0 {
lastKey := currentKey
offsetInTable := startLen - d.Len()
keyCount := d.Uvarint()

// The Postings offset table takes only 2 keys per entry (name and value of label).
if keyCount != 2 {
return nil, errors.Errorf("unexpected key length for posting table %d", keyCount)
}

// Important: this value is only valid as long as we don't perform any further reads from d.
// If we need to retain its value, we must copy it before performing another read.
key := d.UvarintBytes()

if len(t.postings) == 0 || currentKey != string(key) {
newKey := string(key)

if lastEntryOffsetInTable != -1 {
// We haven't recorded the last offset for the last value of the previous key.
// Go back and read the last value for the previous key.
newValueOffset := d.Len()
d.ResetAt(lastEntryOffsetInTable + 4) // 4 bytes for entry count
d.Uvarint() // Skip the key count
d.SkipUvarintBytes() // Skip the key
value := d.UvarintStr()
t.postings[currentKey].offsets = append(t.postings[currentKey].offsets, postingOffset{value: value, tableOff: lastEntryOffsetInTable})

// Skip ahead to where we were before we called ResetAt() above.
d.Skip(d.Len() - newValueOffset)
}
t.postings[key] = &postingValueOffsets{}
valueCount = 0

currentKey = newKey
t.postings[currentKey] = &postingValueOffsets{}
lastEntryOffsetInTable = -1
valuesForCurrentKey = 0
}

lastKey = key
lastValue = value
lastTableOff = tableOff
valueCount++
if valuesForCurrentKey%postingOffsetsInMemSampling == 0 {
value := d.UvarintStr()
off := d.Uvarint64()
t.postings[currentKey].offsets = append(t.postings[currentKey].offsets, postingOffset{value: value, tableOff: offsetInTable})

if (valueCount-1)%postingOffsetsInMemSampling == 0 {
t.postings[key].offsets = append(t.postings[key].offsets, postingOffset{value: value, tableOff: tableOff})
if lastKey != currentKey {
t.postings[lastKey].lastValOffset = int64(off - crc32.Size)
}

// If the current value is the last one for this key, we don't need to record it again.
lastEntryOffsetInTable = -1
} else {
// We only need to store this value if it's the last one for this key.
// Record our current position in the table and come back to it if it turns out this is the last value.
lastEntryOffsetInTable = offsetInTable

// Skip over the value and offset.
d.SkipUvarintBytes()
d.Uvarint64()
}

return nil
}); err != nil {
return nil, errors.Wrap(err, "read postings table")
valuesForCurrentKey++
remainingCount--
}

if lastEntryOffsetInTable != -1 {
// We haven't recorded the last offset for the last value of the last key
// Go back and read the last value for the last key.
d.ResetAt(lastEntryOffsetInTable + 4) // 4 bytes for initial count
d.Uvarint() // Skip the key count
d.SkipUvarintBytes() // Skip the key
value := d.UvarintStr()
t.postings[currentKey].offsets = append(t.postings[currentKey].offsets, postingOffset{value: value, tableOff: lastEntryOffsetInTable})
}

if d.Err() != nil {
return nil, errors.Wrap(d.Err(), "read postings table")
}

if len(t.postings) > 0 {
if (valueCount-1)%postingOffsetsInMemSampling != 0 {
// Always include last value for each label name if not included already based on valueCount.
t.postings[lastKey].offsets = append(t.postings[lastKey].offsets, postingOffset{value: lastValue, tableOff: lastTableOff})
}
// In any case lastValOffset is unknown as don't have next posting anymore. Guess from TOC table.
// In worst case we will overfetch a few bytes.
t.postings[lastKey].lastValOffset = int64(indexLastPostingEnd) - crc32.Size // Each posting offset table ends with a CRC32 checksum.
t.postings[currentKey].lastValOffset = int64(indexLastPostingEnd) - crc32.Size // Each posting offset table ends with a CRC32 checksum.
}

if d.Err() != nil {
return nil, errors.Wrap(d.Err(), "read last values for entries in postings table")
}

// Trim any extra space in the slices.
for k, v := range t.postings {
if len(v.offsets) == cap(v.offsets) {