This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

revamp sean's bstream write optimizations #2024

Merged: Dieterbe merged 5 commits into master from more-tsz-opt-refactor on Feb 1, 2022

Dieterbe commented Feb 1, 2022

This PR is a reworked version of #2010.
It first tweaks the benchmark functions, and then introduces Sean's code in two separate steps, so that each optimization can be tested independently.

I ran this analysis on it:

export GO111MODULE=off
base=c580c26c8989e909c2b382c23d0098cf7bf83f1e
opt1=69b22e9d070a475506beb8b4976af7760230970a
opt2=29ba4276960e908722b62d39d970f9a8a2513144

echo "BASE"
git checkout $base
for i in {1..10}; do
go test github.com/grafana/metrictank/mdata/chunk/tsz -run '^$' -bench PushSeriesLong | egrep -v 'tsz_test.go:41: SeriesLong size|BENCH:' | tee -a master.bench
done

echo "small optimization for writeByte happy path"
git checkout $opt1
for i in {1..10}; do
go test github.com/grafana/metrictank/mdata/chunk/tsz -run '^$' -bench PushSeriesLong | egrep -v 'tsz_test.go:41: SeriesLong size|BENCH:' | tee -a opt1.bench
done

echo "Byte align writes PLUS Clean up unused code"
git checkout $opt2
for i in {1..10}; do
go test github.com/grafana/metrictank/mdata/chunk/tsz -run '^$' -bench PushSeriesLong | egrep -v 'tsz_test.go:41: SeriesLong size|BENCH:' | tee -a opt2.bench
done

echo "BASE->OPT1"
benchstat master.bench opt1.bench

echo "OPT1->OPT2"
benchstat opt1.bench opt2.bench
BASE->OPT1
name                                              old time/op  new time/op  delta
PushSeriesLongMonotonicIncreaseWithResets-16       272ns ±37%   246ns ±20%    ~     (p=0.857 n=4+3)
PushSeriesLongMonotonicIncrease-16                 259ns ±37%   246ns ± 9%    ~     (p=0.629 n=4+3)
PushSeriesLongSawtooth-16                          205ns ±48%   197ns ±34%    ~     (p=0.857 n=4+3)
PushSeriesLongSawtoothWithFlats-16                 167ns ±11%   196ns ±26%    ~     (p=0.629 n=4+3)
PushSeriesLongSteps-16                             282ns ±32%   227ns ±20%    ~     (p=0.229 n=4+3)
PushSeriesLongRealWorldCPU-16                      302ns ±23%   289ns ±26%    ~     (p=0.857 n=4+3)
PushSeriesLongMonotonicIncreaseWithResets/1-16     109ns ± 1%   108ns ± 1%  -0.77%  (p=0.000 n=12+11)
PushSeriesLongMonotonicIncreaseWithResets/10-16    468ns ± 1%   462ns ± 1%  -1.46%  (p=0.000 n=12+11)
PushSeriesLongMonotonicIncreaseWithResets/30-16   1.05µs ± 1%  1.03µs ± 1%  -1.64%  (p=0.000 n=12+11)
PushSeriesLongMonotonicIncreaseWithResets/60-16   1.89µs ± 1%  1.87µs ± 1%  -1.09%  (p=0.000 n=12+11)
PushSeriesLongMonotonicIncreaseWithResets/120-16  3.65µs ± 1%  3.61µs ± 3%  -1.32%  (p=0.002 n=11+11)
PushSeriesLongMonotonicIncreaseWithResets/180-16  5.48µs ± 2%  5.42µs ± 2%  -1.07%  (p=0.008 n=12+11)
PushSeriesLongMonotonicIncreaseWithResets/240-16  7.19µs ± 1%  7.09µs ± 1%  -1.36%  (p=0.000 n=11+11)
PushSeriesLongMonotonicIncrease/1-16               109ns ± 1%   108ns ± 1%  -0.54%  (p=0.024 n=10+10)
PushSeriesLongMonotonicIncrease/10-16              371ns ± 1%   365ns ± 1%  -1.59%  (p=0.000 n=11+11)
PushSeriesLongMonotonicIncrease/30-16              724ns ± 1%   709ns ± 1%  -2.04%  (p=0.000 n=11+10)
PushSeriesLongMonotonicIncrease/60-16             1.25µs ± 1%  1.25µs ± 1%    ~     (p=0.070 n=11+12)
PushSeriesLongMonotonicIncrease/120-16            2.35µs ± 1%  2.35µs ± 1%    ~     (p=0.594 n=11+11)
PushSeriesLongMonotonicIncrease/180-16            3.42µs ± 2%  3.43µs ± 1%    ~     (p=0.477 n=12+11)
PushSeriesLongMonotonicIncrease/240-16            4.49µs ± 2%  4.50µs ± 2%    ~     (p=0.504 n=12+12)
PushSeriesLongSawtooth/1-16                        111ns ± 3%   109ns ± 1%  -1.71%  (p=0.000 n=12+11)
PushSeriesLongSawtooth/10-16                       458ns ± 1%   453ns ± 1%  -1.19%  (p=0.000 n=11+11)
PushSeriesLongSawtooth/30-16                      1.06µs ± 1%  1.04µs ± 1%  -1.40%  (p=0.000 n=10+10)
PushSeriesLongSawtooth/60-16                      1.94µs ± 5%  1.91µs ± 1%  -1.99%  (p=0.010 n=12+11)
PushSeriesLongSawtooth/120-16                     3.72µs ± 5%  3.64µs ± 1%  -2.27%  (p=0.000 n=12+10)
PushSeriesLongSawtooth/180-16                     5.62µs ± 1%  5.55µs ± 1%  -1.27%  (p=0.000 n=10+11)
PushSeriesLongSawtooth/240-16                     7.57µs ± 2%  7.42µs ± 1%  -2.07%  (p=0.000 n=11+10)
PushSeriesLongSawtoothWithFlats/1-16               110ns ± 5%   108ns ± 1%  -2.30%  (p=0.000 n=12+10)
PushSeriesLongSawtoothWithFlats/10-16              457ns ± 1%   455ns ± 3%    ~     (p=0.081 n=10+11)
PushSeriesLongSawtoothWithFlats/30-16             1.06µs ± 2%  1.04µs ± 1%  -1.97%  (p=0.000 n=11+11)
PushSeriesLongSawtoothWithFlats/60-16             1.94µs ± 2%  1.91µs ± 1%  -1.75%  (p=0.000 n=11+9)
PushSeriesLongSawtoothWithFlats/120-16            3.80µs ± 2%  3.74µs ± 1%  -1.67%  (p=0.000 n=11+11)
PushSeriesLongSawtoothWithFlats/180-16            5.76µs ± 1%  5.71µs ± 2%  -0.75%  (p=0.005 n=10+12)
PushSeriesLongSawtoothWithFlats/240-16            7.76µs ± 1%  7.64µs ± 1%  -1.47%  (p=0.000 n=10+12)
PushSeriesLongSteps/1-16                           110ns ± 2%   109ns ± 2%    ~     (p=0.085 n=12+12)
PushSeriesLongSteps/10-16                          429ns ± 1%   413ns ± 2%  -3.82%  (p=0.000 n=12+11)
PushSeriesLongSteps/30-16                         1.22µs ± 1%  1.16µs ± 0%  -4.94%  (p=0.000 n=12+10)
PushSeriesLongSteps/60-16                         2.00µs ± 3%  1.93µs ± 1%  -3.37%  (p=0.000 n=12+12)
PushSeriesLongSteps/120-16                        3.89µs ± 1%  3.76µs ± 2%  -3.22%  (p=0.000 n=12+12)
PushSeriesLongSteps/180-16                        5.85µs ± 1%  5.62µs ± 1%  -3.98%  (p=0.000 n=12+10)
PushSeriesLongSteps/240-16                        7.70µs ± 1%  7.49µs ± 2%  -2.80%  (p=0.000 n=11+11)
PushSeriesLongRealWorldCPU/1-16                    109ns ± 1%   109ns ± 1%    ~     (p=0.187 n=11+11)
PushSeriesLongRealWorldCPU/10-16                   460ns ± 2%   447ns ± 4%  -2.96%  (p=0.001 n=11+12)
PushSeriesLongRealWorldCPU/30-16                  1.16µs ± 1%  1.10µs ± 1%  -4.94%  (p=0.000 n=11+11)
PushSeriesLongRealWorldCPU/60-16                  2.24µs ± 1%  2.15µs ± 2%  -4.02%  (p=0.000 n=11+11)
PushSeriesLongRealWorldCPU/120-16                 4.32µs ± 1%  4.16µs ± 1%  -3.78%  (p=0.000 n=11+12)
PushSeriesLongRealWorldCPU/180-16                 6.34µs ± 1%  6.08µs ± 1%  -4.10%  (p=0.000 n=11+11)
PushSeriesLongRealWorldCPU/240-16                 8.61µs ± 2%  8.25µs ± 1%  -4.13%  (p=0.000 n=11+11)
OPT1->OPT2
name                                              old time/op  new time/op  delta
PushSeriesLongMonotonicIncreaseWithResets-16       246ns ±20%   208ns ±27%     ~     (p=0.400 n=3+3)
PushSeriesLongMonotonicIncrease-16                 246ns ± 9%   133ns ±28%     ~     (p=0.100 n=3+3)
PushSeriesLongSawtooth-16                          197ns ±34%   172ns ±45%     ~     (p=0.400 n=3+3)
PushSeriesLongSawtoothWithFlats-16                 196ns ±26%   149ns ±12%     ~     (p=0.200 n=3+3)
PushSeriesLongSteps-16                             227ns ±20%   190ns ±31%     ~     (p=0.400 n=3+3)
PushSeriesLongRealWorldCPU-16                      289ns ±26%   245ns ±27%     ~     (p=0.700 n=3+3)
PushSeriesLongMonotonicIncreaseWithResets/1-16     108ns ± 1%    88ns ± 1%  -18.37%  (p=0.000 n=11+10)
PushSeriesLongMonotonicIncreaseWithResets/10-16    462ns ± 1%   352ns ± 3%  -23.74%  (p=0.000 n=11+12)
PushSeriesLongMonotonicIncreaseWithResets/30-16   1.03µs ± 1%  0.76µs ± 1%  -25.95%  (p=0.000 n=11+11)
PushSeriesLongMonotonicIncreaseWithResets/60-16   1.87µs ± 1%  1.38µs ± 2%  -26.42%  (p=0.000 n=11+12)
PushSeriesLongMonotonicIncreaseWithResets/120-16  3.61µs ± 3%  2.63µs ± 1%  -27.18%  (p=0.000 n=11+12)
PushSeriesLongMonotonicIncreaseWithResets/180-16  5.42µs ± 2%  3.94µs ± 1%  -27.19%  (p=0.000 n=11+12)
PushSeriesLongMonotonicIncreaseWithResets/240-16  7.09µs ± 1%  5.19µs ± 1%  -26.84%  (p=0.000 n=11+11)
PushSeriesLongMonotonicIncrease/1-16               108ns ± 1%    86ns ± 1%  -20.48%  (p=0.000 n=10+12)
PushSeriesLongMonotonicIncrease/10-16              365ns ± 1%   286ns ± 1%  -21.73%  (p=0.000 n=11+12)
PushSeriesLongMonotonicIncrease/30-16              709ns ± 1%   562ns ± 1%  -20.79%  (p=0.000 n=10+11)
PushSeriesLongMonotonicIncrease/60-16             1.25µs ± 1%  0.97µs ± 1%  -22.27%  (p=0.000 n=12+12)
PushSeriesLongMonotonicIncrease/120-16            2.35µs ± 1%  1.73µs ± 1%  -26.42%  (p=0.000 n=11+12)
PushSeriesLongMonotonicIncrease/180-16            3.43µs ± 1%  2.57µs ± 1%  -25.04%  (p=0.000 n=11+12)
PushSeriesLongMonotonicIncrease/240-16            4.50µs ± 2%  3.33µs ± 1%  -26.04%  (p=0.000 n=12+12)
PushSeriesLongSawtooth/1-16                        109ns ± 1%    87ns ± 1%  -20.47%  (p=0.000 n=11+11)
PushSeriesLongSawtooth/10-16                       453ns ± 1%   365ns ± 1%  -19.38%  (p=0.000 n=11+12)
PushSeriesLongSawtooth/30-16                      1.04µs ± 1%  0.84µs ± 1%  -19.49%  (p=0.000 n=10+12)
PushSeriesLongSawtooth/60-16                      1.91µs ± 1%  1.53µs ± 1%  -19.61%  (p=0.000 n=11+12)
PushSeriesLongSawtooth/120-16                     3.64µs ± 1%  2.92µs ± 1%  -19.87%  (p=0.000 n=10+11)
PushSeriesLongSawtooth/180-16                     5.55µs ± 1%  4.52µs ± 1%  -18.52%  (p=0.000 n=11+11)
PushSeriesLongSawtooth/240-16                     7.42µs ± 1%  6.04µs ± 1%  -18.56%  (p=0.000 n=10+12)
PushSeriesLongSawtoothWithFlats/1-16               108ns ± 1%    87ns ± 1%  -19.33%  (p=0.000 n=10+12)
PushSeriesLongSawtoothWithFlats/10-16              455ns ± 3%   367ns ± 1%  -19.28%  (p=0.000 n=11+11)
PushSeriesLongSawtoothWithFlats/30-16             1.04µs ± 1%  0.84µs ± 1%  -18.96%  (p=0.000 n=11+12)
PushSeriesLongSawtoothWithFlats/60-16             1.91µs ± 1%  1.54µs ± 1%  -19.45%  (p=0.000 n=9+12)
PushSeriesLongSawtoothWithFlats/120-16            3.74µs ± 1%  2.98µs ± 1%  -20.40%  (p=0.000 n=11+12)
PushSeriesLongSawtoothWithFlats/180-16            5.71µs ± 2%  4.60µs ± 1%  -19.57%  (p=0.000 n=12+12)
PushSeriesLongSawtoothWithFlats/240-16            7.64µs ± 1%  6.17µs ± 1%  -19.32%  (p=0.000 n=12+12)
PushSeriesLongSteps/1-16                           109ns ± 2%    86ns ± 1%  -20.68%  (p=0.000 n=12+11)
PushSeriesLongSteps/10-16                          413ns ± 2%   331ns ± 1%  -19.74%  (p=0.000 n=11+12)
PushSeriesLongSteps/30-16                         1.16µs ± 0%  0.86µs ± 1%  -25.79%  (p=0.000 n=10+12)
PushSeriesLongSteps/60-16                         1.93µs ± 1%  1.46µs ± 0%  -24.45%  (p=0.000 n=12+10)
PushSeriesLongSteps/120-16                        3.76µs ± 2%  2.78µs ± 1%  -26.17%  (p=0.000 n=12+12)
PushSeriesLongSteps/180-16                        5.62µs ± 1%  4.20µs ± 1%  -25.27%  (p=0.000 n=10+12)
PushSeriesLongSteps/240-16                        7.49µs ± 2%  5.47µs ± 1%  -26.87%  (p=0.000 n=11+11)
PushSeriesLongRealWorldCPU/1-16                    109ns ± 1%    89ns ± 1%  -17.95%  (p=0.000 n=11+12)
PushSeriesLongRealWorldCPU/10-16                   447ns ± 4%   336ns ± 2%  -24.87%  (p=0.000 n=12+12)
PushSeriesLongRealWorldCPU/30-16                  1.10µs ± 1%  0.85µs ± 1%  -23.30%  (p=0.000 n=11+12)
PushSeriesLongRealWorldCPU/60-16                  2.15µs ± 2%  1.62µs ± 1%  -24.52%  (p=0.000 n=11+12)
PushSeriesLongRealWorldCPU/120-16                 4.16µs ± 1%  3.09µs ± 1%  -25.72%  (p=0.000 n=12+12)
PushSeriesLongRealWorldCPU/180-16                 6.08µs ± 1%  4.49µs ± 1%  -26.20%  (p=0.000 n=11+12)
PushSeriesLongRealWorldCPU/240-16                 8.25µs ± 1%  6.17µs ± 1%  -25.17%  (p=0.000 n=11+12)
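
For readers unfamiliar with the labels used in the script above ("writeByte happy path", "byte align writes"), here is a minimal sketch of what a byte-aligned fast path in a bit stream writer can look like. It is an illustration under assumptions, not the code from this PR: the `bitWriter` type, its fields, and its methods are hypothetical names made up for this example.

```go
package main

import "fmt"

// bitWriter is a hypothetical, simplified bit stream writer.
// It is NOT metrictank's bstream type; names and layout are assumptions.
type bitWriter struct {
	buf  []byte
	free uint8 // unused low-order bits left in the last byte of buf (0 = byte aligned)
}

// writeBit appends a single bit, most significant bit first.
func (w *bitWriter) writeBit(bit bool) {
	if w.free == 0 {
		w.buf = append(w.buf, 0)
		w.free = 8
	}
	if bit {
		w.buf[len(w.buf)-1] |= 1 << (w.free - 1)
	}
	w.free--
}

// writeByte appends 8 bits.
func (w *bitWriter) writeByte(b byte) {
	if w.free == 0 {
		// Happy path: the stream is byte aligned, so the byte can be
		// appended as-is without any shifting or masking.
		w.buf = append(w.buf, b)
		return
	}
	// Slow path: split b across the partially filled last byte and a new byte.
	last := len(w.buf) - 1
	w.buf[last] |= b >> (8 - w.free)   // high bits of b fill the free low bits
	w.buf = append(w.buf, b<<w.free)   // remaining low bits start the next byte
}

func main() {
	aligned := &bitWriter{}
	aligned.writeByte(0xAB) // takes the happy path
	fmt.Printf("% x\n", aligned.buf) // prints "ab"

	unaligned := &bitWriter{}
	unaligned.writeBit(true)
	unaligned.writeBit(false)
	unaligned.writeBit(true)
	unaligned.writeByte(0xFF) // takes the slow path
	fmt.Printf("% x\n", unaligned.buf) // prints "bf e0"
}
```

The more often writes land on a byte boundary, the more often the cheap branch is taken, which is one plausible reading of why aligning writes would help on top of the fast path itself.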

Dieterbe and others added 5 commits February 1, 2022 12:43
for 2 reasons:
- we want precise control of number of points in a chunk, and testing
  multiple sizes in one run.
- the b.N mechanism is good to get long benchmark runtimes which
  improves the reliability of the results
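
The two reasons above can be sketched as a benchmark that fixes the number of points per iteration and leaves the iteration count to `b.N`. This is a rough illustration only: `pushPoints` is a hypothetical stand-in for building a series and pushing points into it, not a function from the tsz package, and the sizes are a subset of those that appear in the results.

```go
package main

import (
	"fmt"
	"testing"
)

// pushPoints is a hypothetical stand-in for "create a series and push n points".
// It simply appends each value as 8 big-endian bytes and returns the byte count.
func pushPoints(n int) int {
	buf := make([]byte, 0, 8*n)
	for i := 0; i < n; i++ {
		v := uint64(i) * 10
		for shift := 56; shift >= 0; shift -= 8 {
			buf = append(buf, byte(v>>uint(shift)))
		}
	}
	return len(buf)
}

// sink keeps the compiler from optimizing the benchmarked call away.
var sink int

// benchmarkPush pushes exactly `points` points per iteration; b.N only
// controls how many times that fixed-size workload is repeated.
func benchmarkPush(b *testing.B, points int) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = pushPoints(points)
	}
}

func main() {
	// Several chunk sizes in one run (subset of the sizes in the results).
	for _, points := range []int{1, 60, 240} {
		points := points
		res := testing.Benchmark(func(b *testing.B) { benchmarkPush(b, points) })
		fmt.Printf("points=%d\t%s\n", points, res)
	}
}
```

In a `_test.go` file the same loop would usually call `b.Run` with the size as the sub-benchmark name, which is what produces names of the form `Name/1`, `Name/10` in `go test -bench` output.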

Dieterbe commented Feb 1, 2022

Nice work, btw, @shanson7! Shall we merge this?

Dieterbe changed the title from "revamp tsz refactor" to "revamp sean's bstream write optimizations" on Feb 1, 2022

shanson7 left a comment

LGTM

Dieterbe merged commit d47932a into master on Feb 1, 2022
Dieterbe deleted the more-tsz-opt-refactor branch on February 1, 2022 13:41