hyperloglog sketches #253

Closed
spinkney opened this issue Sep 1, 2022 · 2 comments

Comments

spinkney commented Sep 1, 2022

Is it possible to cache sketches of sets and later run count-distinct computations on them? For example, I want users to be able to compose any number of groups, each of which has been individually passed through a HyperLogLog fit. The storage is the same HyperLogLog structure, but each set is stored as its own "sketch". I can then get the count distinct of each sketch, or combine any set of sketches to get the estimated count distinct of their union.

This is relevant for audience estimation in digital and television advertising. See this Google paper: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/54a28925b11e05b1d8d1cc5c03f171666dc77e8e.pdf
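To make the request concrete, here is a minimal sketch of the idea in plain Python. It is not the OnlineStats implementation: the `HyperLogLog` class, `sketch_of`, and `union_count` below are hypothetical names written for illustration, and the hash (SHA-1 truncated to 64 bits) and register count are assumptions. The point it shows is that two sketches with the same precision can be unioned by taking the element-wise maximum of their registers, so cached per-group sketches can be combined in any way without revisiting the raw data.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch with 2**p registers (not OnlineStats)."""

    def __init__(self, p=12):
        # The alpha constant used in count() assumes at least 128 registers.
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Assumption: hash the string form of the item to a 64-bit integer.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                    # top p bits select a register
        rest = h & ((1 << width) - 1)       # remaining bits feed the rank
        rank = width - rest.bit_length() + 1  # 1-based position of first 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        """Union in place: element-wise maximum of the registers."""
        if self.p != other.p:
            raise ValueError("sketches must use the same precision p")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return self

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return m * math.log(m / zeros)
        return raw


def sketch_of(items, p=12):
    """Build a sketch for one group; the result is what you would cache."""
    s = HyperLogLog(p)
    for item in items:
        s.add(item)
    return s


def union_count(sketches):
    """Estimate the distinct count of the union without mutating the inputs."""
    merged = HyperLogLog(sketches[0].p)
    for s in sketches:
        merged.merge(s)
    return merged.count()


# Cache one sketch per group, then combine any subset on demand.
cache = {
    "a": sketch_of(range(0, 6000)),
    "b": sketch_of(range(4000, 10000)),
    "c": sketch_of(range(9000, 15000)),
}
print(union_count([cache["a"], cache["b"]]))  # true union size is 10000
print(union_count(list(cache.values())))      # true union size is 15000
```

Because the merge is an element-wise maximum, merging the sketches of two groups gives exactly the same registers as sketching the combined data directly, so the union estimate is no less accurate than a sketch built from scratch.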

Author

spinkney commented Sep 1, 2022

I believe the merge function will work. I didn't see it in the documentation (let me know if I missed it); I found it by reading the source code.

Owner

joshday commented Sep 1, 2022

merge! is one of the core pieces of OnlineStats. It's covered on the first page of the docs.

joshday closed this as completed Sep 1, 2022