tsdb: Investigate Postings Compression #5876
Comments
Postings are lists of series references: each list holds the series that contain the label pair associated with that list. |
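To make the definition concrete, here is a minimal Go sketch of postings as sorted reference lists keyed by label pair, together with the merge-style intersection that a query with two matchers needs. The names `Label`, `postingsIndex` and `Intersect` are hypothetical and chosen for illustration; this is not the actual tsdb code.

```go
package main

import "fmt"

// Label is a name/value pair such as job="api" (hypothetical type for this sketch).
type Label struct{ Name, Value string }

// postingsIndex maps each label pair to a sorted list of series references
// (hypothetical type, assumed to hold ascending, duplicate-free lists).
type postingsIndex map[Label][]uint64

// Intersect returns the references present in both sorted lists.
// It walks both lists once, advancing whichever side is smaller.
func Intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default: // same reference in both lists
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	idx := postingsIndex{
		{Name: "job", Value: "api"}:  {1, 4, 7, 9},
		{Name: "env", Value: "prod"}: {2, 4, 9, 12},
	}
	// Series matching both job="api" and env="prod".
	fmt.Println(Intersect(idx[Label{"job", "api"}], idx[Label{"env", "prod"}]))
}
```

Because every query performs this kind of walk over potentially large lists, both the size of the stored lists and the speed of the intersection matter.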
The Golang implementation link no longer works. I think this link points to the same implementation, but the implementation linked by @gouthamve has been replaced. I am planning to add this to my GSoC proposal; I tested a couple of things and I have a couple of doubts. I found one example of using assembly with Go in the crypto/blake2b package. As far as I understand, this will require writing different assembly for different architectures, either entirely by hand or, like this, using asm2plan9s. The Go reference implementation that @gouthamve linked used PeachPy, but I've not looked into it properly yet. As far as the implementation goes, I'd like to rewrite parts of the mentioned library to fit our needs and also to understand it better. Please provide me some ideas on how the implementation should be done. Also, benchmarking would be a follow-up task for this, I assume. |
I don't think we should be using assembly; compression shouldn't require it,
and we'd need to support all platforms Go supports in the future.
…On Mon 8 Apr 2019, 04:28 Hrishikesh Barman, ***@***.***> wrote:
The Golang implementation link no longer works. I think this link
<https://github.com/robskie/bp128> points to the same implementation but
the implementation linked by @gouthamve <https://github.com/gouthamve> has
been replaced <dgraph-io/dgraph#2719>.
I am planning to add this to my gsoc proposal, I tested a couple of things
and I have a couple of doubts.
Since Go has no built-in support for SIMD, this will involve writing
assembly, right? One of the tasks will be to improve the current postings intersection
implementation
<https://github.com/prometheus/tsdb/blob/master/index/postings.go#L299>.
I am still trying to understand the compression part in relation to
postings and the linked paper.
I found one example of using assembly with Go in the crypto/blake2b
<https://github.com/golang/crypto/blob/master/blake2b/blake2bAVX2_amd64.go>
package. As far as I understand, this will require writing different
assembly for different architectures, either entirely by hand or, like this
<https://github.com/minio/sha256-simd>, using asm2plan9s
<https://github.com/minio/asm2plan9s>. The Go reference implementation
that @gouthamve <https://github.com/gouthamve> linked used PeachPy
<https://pypi.org/project/PeachPy/>, but I've not looked into it properly
yet.
As far as the implementation goes, I'd like to rewrite parts of the
mentioned library to fit our needs and also to understand it better. Please
provide me some ideas on how the implementation should be done. Also,
benchmarking would be a follow-up task for this, I assume.
cc @krasi-georgiev <https://github.com/krasi-georgiev>
|
@geekodour yes, compression will not need assembly. I think @gouthamve mentioned SIMD because it would speed up the sorting and intersection, which are performed on every postings selection. |
If you do need ASM for any part, you can check whether the Go assembler provides an architecture-independent virtual assembly (arch-specific ASM is a no-go according to me). But given that ASM would not be simple to maintain, you can give it a lower priority. |
I have been looking a bunch into this topic so I'd be interested in this. BP128 seems like it would be a good candidate as it is relatively close to what we already have and wouldn't entail a huge rewrite. How would someone approach implementing this change in Prometheus? Of course, a pre-requisite is to write a good BP128 implementation with the constraints mentioned previously but what then? Do I just raise a pull request to Prometheus and we go from there? |
Hello from the bug scrub. #13242 is in now. It seems to be used by Thanos. Would that enable optimization of Prometheus itself, too? @GiedriusS would you like to work on this as suggested above? And yes, you could just create a PR against Prometheus. |
We also need #13567 to be able to pass a custom decoder. After this, my plan is to finish implementing it in Thanos, test it out in prod, and then I will come back to Prometheus itself to port those changes if the results are good. |
We store large postings lists that are intersected for every single query; we can do better in terms of both compression and intersection.
BP-128 is a good candidate, and it has a Golang implementation. The issue is that a native (pure-Go) implementation doesn't exist, which means it doesn't work on ARM and other architectures.
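As a rough illustration of the idea behind BP-128, the sketch below delta-encodes a sorted postings list and bit-packs the deltas in pure Go, with no assembly. It is a simplification under stated assumptions: it uses one bit width for the whole list, whereas BP-128 as usually described works on blocks of 128 integers and relies on SIMD for speed. The function names `pack` and `unpack` are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// pack delta-encodes a sorted list and stores every delta using the same
// number of bits. It returns the first value, the bit width, and the packed words.
func pack(refs []uint64) (first uint64, width int, words []uint64) {
	if len(refs) == 0 {
		return 0, 0, nil
	}
	first = refs[0]
	deltas := make([]uint64, len(refs)-1)
	var maxDelta uint64
	for i := 1; i < len(refs); i++ {
		deltas[i-1] = refs[i] - refs[i-1]
		if deltas[i-1] > maxDelta {
			maxDelta = deltas[i-1]
		}
	}
	width = bits.Len64(maxDelta) // bits needed for the largest delta
	if width == 0 {
		return first, 0, nil // nothing to store: all deltas are zero
	}
	words = make([]uint64, (len(deltas)*width+63)/64)
	for i, d := range deltas {
		pos := i * width
		w, off := pos/64, uint(pos%64)
		words[w] |= d << off
		if off+uint(width) > 64 { // delta straddles two words
			words[w+1] |= d >> (64 - off)
		}
	}
	return first, width, words
}

// unpack reverses pack; n is the number of references originally packed.
func unpack(first uint64, width, n int, words []uint64) []uint64 {
	if n == 0 {
		return nil
	}
	out := make([]uint64, n)
	out[0] = first
	mask := uint64(1)<<uint(width) - 1
	for i := 1; i < n; i++ {
		var d uint64
		if width > 0 {
			pos := (i - 1) * width
			w, off := pos/64, uint(pos%64)
			d = words[w] >> off
			if off+uint(width) > 64 {
				d |= words[w+1] << (64 - off)
			}
		}
		out[i] = out[i-1] + (d & mask)
	}
	return out
}

func main() {
	refs := []uint64{1000, 1003, 1004, 1010, 1025}
	first, width, words := pack(refs)
	fmt.Println("bit width:", width, "packed words:", len(words))
	fmt.Println(unpack(first, width, len(refs), words))
}
```

A sketch like this runs on every platform Go supports, which addresses the portability concern, but it says nothing about speed; whether a pure-Go version is fast enough would need benchmarking against the current format.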