Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent hashing is different than Python #335

Open
github-vincent-miszczak opened this issue Nov 30, 2018 · 10 comments
Open

Consistent hashing is different than Python #335

github-vincent-miszczak opened this issue Nov 30, 2018 · 10 comments

Comments

@github-vincent-miszczak
Copy link
Contributor

I found out that Carbon in Python has different behavior than this project carbon_ch.

This is a severe issue because a tool like carbonate does not sees the same world than carbon-relay-ng. I almost messed a 5TB cluster wanting to purge metrics after a rebalance.

Using server list :

"10.20.34.114:12003", "10.20.39.104:12003", "10.20.40.161:12003", "10.20.35.158:12003", "10.20.37.70:12003", "10.20.40.126:12003", "10.20.33.78:12003", "10.20.39.19:12003", "10.20.42.66:12003", "10.20.34.131:12003", "10.20.38.55:12003", "10.20.41.75:12003", "10.20.32.8:12003", "10.20.37.165:12003"

gives ring positions including:

Position: (uint16) 59417,
Hostname: (string) (len=12) "10.20.40.126",
Instance: (string) "",
DestinationIndex: (int) 5
}
(route.hashRingEntry) {
Position: (uint16) 59482,
Hostname: (string) (len=12) "10.20.40.126",
Instance: (string) "",
DestinationIndex: (int) 5
}

Python gives:

(59417, ('10.20.40.126', None)),
(59418, ('10.20.34.131', None)),
(59482, ('10.20.40.126', None)),

Notice (59418, ('10.20.34.131', None)) exists in Python and not in this implementation.

Having a metric that hashes to 59418 gives destination 10.20.40.126 with carbon-relay-ng and 10.20.34.131 with Python/Carbonate.
If you filter out metrics with carbon-sieve to delete after a rebalance, you may delete metrics that you should not.

@Dieterbe
Copy link
Contributor

Dieterbe commented Dec 3, 2018

the issue where that python code is introduced is graphite-project/carbon#196 (comment)
it describes 2 different hosts hashing to the same position. but here you're showing the same ip hashing to two different positions (which seems like normal behavior) and a position not existing. how is this the same problem?

@github-vincent-miszczak
Copy link
Contributor Author

Here I'm showing there is a drift between the current implementation and Python one making the two implementations incompatible.
I just picked a range that illustrates the drift. With the provided host list, the range 59417-59482 should have 3 values according to reference implementation.
As you stated, there are only 2 with this one.
I'm not solving graphite-project/carbon#196. I'm solving compatibility issues with the reference implementation.

@Dieterbe
Copy link
Contributor

Dieterbe commented Dec 3, 2018

but in your PR which fixes this issue, you claim to introduce the equivalent for graphite-project/carbon@024f9e6#diff-1486787206e06af358b8d935577e76f5 which is the solution to graphite-project/carbon#196

so how it is possible that we're not aiming to solve graphite-project/carbon#196 here ?

@github-vincent-miszczak
Copy link
Contributor Author

Because we aim to solve #335 does not mean we aim to solve graphite-project/carbon#196. It may do, but that's a non goal. I never claimed the issues are the same.

I don't know why graphite-project/carbon#196 appeared in this conversation. I just pointed the commit on carbon side that results in a drift, not the whole discussion. While the Python patch came from the discussion, I'm not sure this is relevant to this issue. Python discussion is about muting the hash-ring and replication side effects. carbon-relay-ng uses immutable hash-ring and has not replication support.

The issue we are discussing here is incompatible algorithm between carbon-relay-ng and carbon.

@hamelg
Copy link

hamelg commented Apr 9, 2019

I have multiple carbon-caches receiving metrics from a relay-ng route in consitentHashing mode. Thanks to the parameter CARBONLINK_HOSTS, the Graphite webapp may query the caches for data that has not yet been persisted.
Without the vmiszczak-teads's fix, the webapp looks up data in the wrong cache because the hashing algorithm is not compatible.

@Dieterbe
Copy link
Contributor

so @hamelg you can independently confirm that the behavior from this patch is the right one? i will bump this on my priority list then.

@hamelg
Copy link

hamelg commented Apr 11, 2019

my bad !
No, i don't confirm.
I double checked and I made a mistake.
Here, the current consistent hashing in relay-ng works fine and gives the same result than the graphite stack. Forget my last comment.
I have tried to reproduce the issue with several test cases without success.

@Dieterbe
Copy link
Contributor

so @hamelg to be clear you're saying then the code in master works fine and we should not merge this PR?

@hamelg
Copy link

hamelg commented Apr 12, 2019

Finally, I have succeeded in reproducing the issue with the specific setup provided by vmiszczak-teads.

carbon-relay-ng.conf

[[route]]
key = 'default'
type = 'consistentHashing'
destinations = [
'10.20.34.114:12003 spool=false pickle=false',
'10.20.39.104:12003 spool=false pickle=false',
'10.20.40.161:12003 spool=false pickle=false',
'10.20.35.158:12003 spool=false pickle=false',
'10.20.37.70:12003 spool=false pickle=false',
'10.20.40.126:12003 spool=false pickle=false',
'10.20.33.78:12003 spool=false pickle=false',
'10.20.39.19:12003 spool=false pickle=false',
'10.20.42.66:12003 spool=false pickle=false',
'10.20.34.131:12003 spool=false pickle=false',
'10.20.38.55:12003 spool=false pickle=false',
'10.20.41.75:12003 spool=false pickle=false',
'10.20.32.8:12003 spool=false pickle=false',
'10.20.37.165:12003 spool=false pickle=false'
]

carbonate.conf

[main]
DESTINATIONS = 10.20.34.114:12003,10.20.39.104:12003,10.20.40.161:12003,10.20.35.158:12003,10.20.37.70:12003,10.20.40.126:12003,10.20.33.78:12003,10.20.39.19:12003,10.20.42.66:12003,10.20.34.131:12003,10.20.38.55:12003,10.20.41.75:12003,10.20.32.8:12003,10.20.37.165:12003
REPLICATION_FACTOR = 1
HASHING_TYPE = carbon_ch

Without his patch, the hashing result is different on these nodes :
10.20.32.8
10.20.34.131
10.20.35.158
10.20.39.104
10.20.40.126
10.20.42.66

After applying the patch, carbonate and relay-ng give the same result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants