tsdb: Investigate Postings Compression #5876

Open
gouthamve opened this issue Dec 27, 2017 · 8 comments

@gouthamve
Member

We store large postings lists that are intersected for every single query; we can do better in terms of both compression and intersection.

BP-128 is a good candidate, and a Golang implementation of it exists. The issue is that a native Golang implementation (one that does not rely on assembly) doesn't exist, which means it doesn't work on ARM and other architectures.
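
To make the compression idea concrete, here is a minimal pure-Go sketch of the two steps BP-128-style schemes build on: delta-encode a sorted list, then pack every delta with a fixed bit width. It is an illustration only, not the linked library's code. Real BP-128 works on blocks of 128 integers and is typically vectorized with SIMD, whereas this sketch is scalar and uses one bit width for the whole list. The names packDeltas and unpackDeltas are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// packDeltas (hypothetical helper) delta-encodes a sorted list of series
// references and packs all deltas with one shared bit width into 64-bit words.
// It returns the first reference, the bit width, and the packed words.
func packDeltas(refs []uint64) (first uint64, width int, packed []uint64) {
	if len(refs) == 0 {
		return 0, 0, nil
	}
	first = refs[0]
	deltas := make([]uint64, len(refs)-1)
	var maxDelta uint64
	for i := 1; i < len(refs); i++ {
		d := refs[i] - refs[i-1]
		deltas[i-1] = d
		if d > maxDelta {
			maxDelta = d
		}
	}
	// The largest delta decides how many bits every delta gets.
	width = bits.Len64(maxDelta)
	if width == 0 {
		return first, 0, nil
	}
	packed = make([]uint64, (len(deltas)*width+63)/64)
	for i, d := range deltas {
		bitPos := i * width
		word, off := bitPos/64, uint(bitPos%64)
		packed[word] |= d << off
		if off+uint(width) > 64 {
			// The delta straddles two words; store the high bits in the next one.
			packed[word+1] |= d >> (64 - off)
		}
	}
	return first, width, packed
}

// unpackDeltas reverses packDeltas; n is the number of references to restore.
func unpackDeltas(first uint64, width, n int, packed []uint64) []uint64 {
	if n == 0 {
		return nil
	}
	refs := make([]uint64, n)
	refs[0] = first
	mask := uint64(1)<<uint(width) - 1
	for i := 1; i < n; i++ {
		var d uint64
		if width > 0 {
			bitPos := (i - 1) * width
			word, off := bitPos/64, uint(bitPos%64)
			d = packed[word] >> off
			if off+uint(width) > 64 {
				d |= packed[word+1] << (64 - off)
			}
			d &= mask
		}
		refs[i] = refs[i-1] + d
	}
	return refs
}

func main() {
	refs := []uint64{1, 3, 12, 14}
	first, width, packed := packDeltas(refs)
	fmt.Println("first:", first, "bits per delta:", width, "words:", len(packed))
	fmt.Println("round trip:", unpackDeltas(first, width, len(refs), packed))
}
```

Because the deltas of a dense postings list are small, each one needs far fewer than 64 bits, which is where the space saving would come from. Working in fixed-size blocks, as BP-128 does, keeps one large gap from inflating the width of the whole list.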

@krasi-georgiev
Contributor

Postings are lists of series references: each list holds the series that contain the label pair associated with that list.
map[labels.Label][]uint64, for example foo,bar -> 1,3,12,14. This means the label foo with value bar is present in the series with IDs 1, 3, 12 and 14. The map is used as a reference table to look up the series we need.
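
As a rough illustration of that reference table, the sketch below builds such a map and intersects two sorted postings lists, which is the operation a query with two label matchers needs. Label is a simplified stand-in for labels.Label, the job=api entry is made-up example data, and intersect is a hypothetical helper, not the TSDB's actual implementation.

```go
package main

import "fmt"

// Label is a simplified stand-in for labels.Label (a name/value pair).
type Label struct {
	Name, Value string
}

// intersect (hypothetical helper) returns the references that appear in both
// lists. Both inputs must be sorted in ascending order.
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			// Same reference in both lists: the series carries both labels.
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	postings := map[Label][]uint64{
		{Name: "foo", Value: "bar"}: {1, 3, 12, 14},
		{Name: "job", Value: "api"}: {3, 5, 14, 20}, // made-up second label
	}
	// Series matching foo="bar" AND job="api".
	matches := intersect(
		postings[Label{Name: "foo", Value: "bar"}],
		postings[Label{Name: "job", Value: "api"}],
	)
	fmt.Println(matches)
}
```

Keeping each list sorted is what makes this single linear merge possible, and it is also what makes delta-based compression of the lists attractive.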

@geekodour
Member

The Golang implementation link no longer works. I think this link points to the same implementation but the implementation linked by @gouthamve has been replaced.

I am planning to add this to my GSoC proposal. I tested a couple of things and I have a couple of doubts.
Since Go has no built-in support for SIMD, this will involve writing assembly, right? One of the tasks will be to improve the current postings intersection implementation. I am still trying to understand the compression part in relation to postings and the linked paper.

I found one example of using assembly with Go in the crypto/blake2b package. As far as I understand, this will require writing different assembly for different architectures, either entirely by hand or, like this, using asm2plan9s. The Golang reference implementation that @gouthamve linked used PeachPy, but I've not looked into it properly yet.

As far as the implementation goes, I'd like to rewrite parts of the mentioned library to fit our needs and also to understand it better. Please give me some ideas on how the implementation should be done. Also, benchmarking would be a follow-up task for this, I assume.

cc @krasi-georgiev

@brian-brazil
Contributor

brian-brazil commented Apr 8, 2019 via email

@krasi-georgiev
Contributor

@geekodour yes, compression will not need assembly. I think @gouthamve mentioned SIMD because it would speed up the sorting and intersection that are performed on every postings selection.
You shouldn't need to mention any technical details in your proposal. Something like Research -> implementation -> unit tests should be sufficient.

@codesome
Member

codesome commented Apr 8, 2019

If you do need ASM for any part, you can check whether the Go assembler provides any architecture-independent virtual assembly (arch-specific ASM is a no-go according to me). But given that ASM would not be simple to maintain, you can give it a lower priority.

@bwplotka changed the title from "Investigate Postings Compression" to "tsdb: Investigate Postings Compression" on Aug 13, 2019
@bwplotka transferred this issue from prometheus-junkyard/tsdb on Aug 13, 2019
@GiedriusS
Contributor

I have been looking into this topic quite a bit, so I'd be interested in this. BP128 seems like a good candidate, as it is relatively close to what we already have and wouldn't entail a huge rewrite. How would someone approach implementing this change in Prometheus? Of course, a prerequisite is to write a good BP128 implementation within the constraints mentioned previously, but what then? Do I just raise a pull request against Prometheus and we go from there?

@beorn7
Member

beorn7 commented Feb 27, 2024

Hello from the bug scrub.

#13242 is in now. It seems to be used by Thanos. Would that enable optimization of Prometheus itself, too? @GiedriusS would you like to work on this as suggested above? And yes, you could just create a PR against Prometheus.

@GiedriusS
Contributor

We also need #13567 to be able to pass a custom decoder. After this, my plan is to finish implementing it in Thanos, test it out in prod, and then I will come back to Prometheus itself to port those changes if the results are good.
