feat(metrics): Another statsd metric to measure bucket duplication [INGEST-421] #1128
Add a set metric that measures the number of unique bucket keys observed at a point/interval in time. Combined with other metrics, this can help us measure how much bucket duplication happens because of horizontal scaling.
This requires us to implement some way of hashing bucket keys such that statsd can consume them. `BucketKey` already implements `Hash` for use in hashmaps. We cannot, however, use the std hasher:

- The docs for `DefaultHasher` explicitly state that the hashes may change across Rust releases (this would probably not be much of a blocker for our purpose, but it's annoying to keep in mind).
- `SipHasher` is deprecated (but could probably still be used).
- We already have `crc32fast` in our dependency tree, so let's depend on it explicitly and just use that.
There's still one caveat: `Hash` impls may call different `Hasher` methods depending on CPU architecture (as stated in https://docs.rs/deterministic-hash/1.0.1/deterministic_hash/), but I think we can live with that for now.
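To illustrate the approach, here is a minimal sketch of feeding `BucketKey`'s existing `Hash` impl into a stable CRC32-backed `Hasher`. The `BucketKey` fields below are hypothetical stand-ins (the real type lives in the codebase), and to keep the sketch dependency-free a small bitwise CRC32 stands in for `crc32fast::Hasher`, which already implements `std::hash::Hasher` and would be used directly in the actual change:

```rust
use std::hash::{Hash, Hasher};

// Bitwise CRC-32 (IEEE polynomial, reflected) update step.
fn crc32_update(mut crc: u32, data: &[u8]) -> u32 {
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is 0xFFFF_FFFF when the low bit is set, 0 otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    crc
}

/// Stable hasher: the same input bytes always produce the same u32,
/// independent of the Rust release.
struct Crc32Hasher {
    state: u32,
}

impl Crc32Hasher {
    fn new() -> Self {
        Crc32Hasher { state: !0 }
    }
    fn finalize(&self) -> u32 {
        !self.state
    }
}

impl Hasher for Crc32Hasher {
    fn write(&mut self, bytes: &[u8]) {
        self.state = crc32_update(self.state, bytes);
    }
    fn finish(&self) -> u64 {
        self.finalize() as u64
    }
}

// Hypothetical stand-in for the real BucketKey in the codebase.
#[derive(Hash)]
struct BucketKey {
    project_id: u64,
    metric_name: String,
}

// Feed the key's existing Hash impl into the stable hasher to get a
// value suitable for submitting to a statsd set metric.
fn bucket_key_hash(key: &BucketKey) -> u32 {
    let mut hasher = Crc32Hasher::new();
    key.hash(&mut hasher);
    hasher.finalize()
}

fn main() {
    let key = BucketKey {
        project_id: 42,
        metric_name: "endpoint.response_time".to_owned(),
    };
    println!("bucket key hash: {:#010x}", bucket_key_hash(&key));
}
```

Note the architecture caveat still applies here: `#[derive(Hash)]` may emit different `write_*` calls for pointer-sized fields on different targets, which is why the CRC only guarantees stability for identical byte streams.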