Use []byte when reading postings offset table #3436

Merged

Conversation

colega
Contributor

@colega colega commented Nov 11, 2022

What this PR does

TL;DR: don't try to be too clever; the compiler already is.

In a previous PR (#3397) I changed the label reading to use an unsafe string backed by the original buffer slice, and to copy those strings manually whenever they needed to be stored.
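For readers who haven't seen the trick, here is a minimal sketch of the unsafe-string approach described above. It is not the code from #3397: `unsafeString` and `viewAndCopy` are hypothetical names made up for this illustration, and the pointer cast is just one common way to write the conversion.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// unsafeString is a hypothetical helper (not taken from the PR). It
// reinterprets the slice header as a string header, so the returned string
// shares memory with b and is only valid while b is left unmodified.
func unsafeString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// viewAndCopy returns a zero-copy view over buf together with an
// independent copy that is safe to store beyond the lifetime of buf.
func viewAndCopy(buf []byte) (view, kept string) {
	view = unsafeString(buf)
	kept = strings.Clone(view) // the manual copy needed before storing
	return view, kept
}

func main() {
	buf := []byte("job")
	view, kept := viewAndCopy(buf)

	// Reusing the buffer silently changes the view, but not the copy.
	buf[0] = 'l'
	fmt.Println(view, kept)
}
```

The sketch shows why the manual copies were needed: anything stored must be detached from the read buffer, otherwise it changes when the buffer is reused.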

It turns out that the compiler already performs this optimization: a map lookup written as `m[string(b)]`, where `b` is a `[]byte`, does not allocate a string for the key.
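A small way to check this locally, as a sketch rather than the benchmark used in this PR: the function names, the `sink` variable and the sample key below are all made up for illustration, and the result assumes the standard gc toolchain with optimizations enabled.

```go
package main

import (
	"fmt"
	"testing"
)

// sink forces the converted string to escape to the heap.
var sink string

// hits keeps the lookup result alive so the lookup is not dead code.
var hits int

// lookupAllocs reports the average allocations per map lookup when the
// []byte key is converted inside the index expression itself.
func lookupAllocs(m map[string]int, key []byte) float64 {
	return testing.AllocsPerRun(1000, func() {
		if v, ok := m[string(key)]; ok {
			hits += v
		}
	})
}

// storeAllocs reports the average allocations per conversion when the
// resulting string is kept, which requires a real copy of the bytes.
func storeAllocs(key []byte) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = string(key)
	})
}

func main() {
	// Key longer than 32 bytes, so a small stack buffer cannot hide the cost.
	const name = "a_fairly_long_label_name_exceeding_thirty_two_bytes"
	m := map[string]int{name: 1}
	key := []byte(name)

	fmt.Println("lookup allocs/op:", lookupAllocs(m, key))
	fmt.Println("store allocs/op: ", storeAllocs(key))
}
```

With the gc compiler the lookup should report zero allocations, while the stored conversion should report at least one. That difference is why plain `[]byte` is enough for the lookup path.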

I was actually surprised, because the benchmarks look even better, with even fewer allocations than when using the unsafe string.

name                                old time/op    new time/op    delta
BinaryReader_LargerBlock/benchmark    1.13ms ± 2%    1.05ms ± 2%   -7.02%  (p=0.008 n=5+5)

name                                old alloc/op   new alloc/op   delta
BinaryReader_LargerBlock/benchmark     157kB ± 0%     127kB ± 0%  -19.22%  (p=0.000 n=5+4)

name                                old allocs/op  new allocs/op  delta
BinaryReader_LargerBlock/benchmark     1.32k ± 0%     0.69k ± 0%  -47.54%  (p=0.008 n=5+5)

Which issue(s) this PR fixes or relates to

Followup on #3397

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega colega requested a review from a team as a code owner November 11, 2022 10:02
@colega
Contributor Author

colega commented Nov 11, 2022

It's interesting to note the overall improvement in this benchmark across the three PRs (#3393, #3397, #3436):

name                                old time/op    new time/op    delta
BinaryReader_LargerBlock/benchmark    2.72ms ± 4%    1.05ms ± 2%  -61.34%  (p=0.008 n=5+5)

name                                old alloc/op   new alloc/op   delta
BinaryReader_LargerBlock/benchmark    1.70MB ± 0%    0.13MB ± 0%  -92.54%  (p=0.029 n=4+4)

name                                old allocs/op  new allocs/op  delta
BinaryReader_LargerBlock/benchmark     40.2k ± 0%      0.7k ± 0%  -98.27%  (p=0.008 n=5+5)

@colega colega merged commit 5b115e6 into main Nov 11, 2022
@colega colega deleted the dont-use-unsafe-strings-when-reading-postings-offset-table branch November 11, 2022 12:17
masonmei pushed a commit to udmire/mimir that referenced this pull request Dec 16, 2022
* Use []byte when reading postings offset table

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
