
hashmod uses fnv #981

Closed
miekg opened this Issue Aug 12, 2015 · 13 comments


miekg commented Aug 12, 2015

Hi,

We see a huge discrepancy in the load hitting our scraping Prometheus servers. We use hashmod to spread the load, and hashmod uses FNV, which is a non-cryptographic hash function. I think it makes sense to just use a cryptographic hash for hashmod, like MD5 or SHA-1, which should result in a more even load distribution.
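For context, a minimal sketch of what an FNV-based hashmod amounts to: hash the label value and take the result modulo the shard count. This is illustrative only, not the actual Prometheus relabel code; the 64-bit FNV-1a variant, job names, and modulus below are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnvShard sketches the hashmod idea described above: hash the label
// value with FNV (FNV-1a 64-bit assumed here) and take it modulo the
// number of shards. Not a copy of the relabel implementation.
func fnvShard(labelValue string, modulus uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(labelValue))
	return h.Sum64() % modulus
}

func main() {
	// Made-up job names sharing a common prefix, as in the setup described here.
	for _, job := range []string{"team-a/web", "team-a/api", "team-a/worker"} {
		fmt.Printf("%-14s -> shard %d\n", job, fnvShard(job, 2))
	}
}
```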


brian-brazil commented Aug 12, 2015

What sort of skew are you seeing, and what sort of cardinality does the label you're hashing have?


miekg commented Aug 12, 2015

Eyeballing some graphs: the skew is 100%. The load of prometheus1 is twice the load of prometheus0. The string we hash is (ultimately) defined by the user and frequently starts with the same prefix. We could do some transformations here, but "just" using a crypto hash function would solve this.


brian-brazil commented Aug 12, 2015

That sounds like a misconfiguration rather than a skew issue. Can you share your configuration?

It's recommended to hash on something like __address__ or instance.


miekg commented Aug 12, 2015

No, we can't split this on __address__ or the like; we want to keep stuff that is related in a job on the same Prometheus.


brian-brazil commented Aug 12, 2015

The sharding functions are intended for use when a single job has gotten so big that it won't fit on a single Prometheus server. Up to that point we recommend splitting off Prometheus servers based on functionality/team/level of stack.

Given this you're best off running separate Prometheus servers and managing the split yourself, which also avoids all the complexity of a master/slave setup.


miekg commented Aug 12, 2015

But why be so hesitant about changing the hash function? Is that for performance reasons?

I don't think the way we structured our setup is particularly strange, and it would work perfectly (out of the box) if hashmod were just using a crypto hash.


brian-brazil commented Aug 12, 2015

I've no problem changing the hash function if there's significant skew. There are no performance concerns here, as it's only calculated when a target is created or updated. How many jobs do you have?

It doesn't sound like this is a good way to do your setup, though. It's a very advanced feature meant for a specific use case. It's not meant to split things out on something as coarse as job; rack is about the limit. If you want to use this you need to shard based on address, and then manage the master/slave setup that requires. That's a lot of work, so we recommend manually choosing which jobs go where, until you get to a stage where you have jobs that won't fit on one Prometheus and you're forced to use sharding.


miekg commented Aug 12, 2015

It already does not fit on one Prometheus instance :) That's why we are using a federated setup and use hashmod to shard at the job level.

Thanks for considering this request!


brian-brazil commented Aug 12, 2015

It already does not fit on one Prometheus instance :)

The problem comes when a single job doesn't fit. All your jobs not fitting is much easier to deal with.

Can you give examples of the job names that all got put on the same shard, and how many shards you're using?


juliusv commented Aug 12, 2015

Totally agree, this shouldn't be using FNV. That hash is notorious for only changing the output a little if the input changes only a little. If you have two inputs that differ only slightly, you can often even get the same output again, which is what might be biting you here (it would still be interesting to get some colliding examples to confirm). Let's switch this to a better hash. I'm on vacation for the rest of the week, but 👍 in general.
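To make that concrete, here is a throwaway comparison one could run to eyeball per-shard counts under each hash. The input names are invented, the FNV variant is assumed for illustration, and the MD5-derived bucket is just one way to do it, not a quote of the eventual fix; it makes no claim about exact numbers.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

func main() {
	const shards = 2
	fnvCount := make([]int, shards)
	md5Count := make([]int, shards)

	// Similarly prefixed names, standing in for user-defined job labels.
	for i := 0; i < 1000; i++ {
		name := fmt.Sprintf("team-a/job-%04d", i)

		// FNV-based bucket (FNV-1a 64-bit assumed for illustration).
		h := fnv.New64a()
		h.Write([]byte(name))
		fnvCount[h.Sum64()%shards]++

		// MD5-derived bucket: reinterpret part of the digest as a uint64.
		sum := md5.Sum([]byte(name))
		md5Count[binary.BigEndian.Uint64(sum[8:])%shards]++
	}

	fmt.Println("FNV per-shard counts:", fnvCount)
	fmt.Println("MD5 per-shard counts:", md5Count)
}
```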


fabxc commented Aug 12, 2015

👍 there are no downsides to switching the hash function, especially if it allows for another use case.



brian-brazil commented Aug 13, 2015

#982 is merged now. It's not clear this allows for another use case, as job sizes are generally going to be too variable to shard based on.


lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

