Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent data corruption upon GUID duplication between master and slave netdata instances #5511

Conversation

@paulkatsoulakis
Copy link
Contributor

paulkatsoulakis commented Feb 26, 2019

Summary

patch to mitigate data corruption during replication of metrics, when duplication of machine GUID occurs between master/slave instances of netdata.
Reference issue #5488

Component Name

netdata/netdata/streaming

Additional Information

None

… processing when the GUID is mistakenly set same on both

We found this issue while creating a container, set it up with netdata and then clone it to get another one up and running with the same setup.
We need this check to prevent data conflicts as the GUID is considered to be unique across the universe.

TODO: I dont have enough visibility over the rates that this string comparison will be executed, we might need to reconsider another faster approach to compare the GUIDs
@paulkatsoulakis paulkatsoulakis requested review from ktsaou and mfundul as code owners Feb 26, 2019
@CLAassistant

This comment has been minimized.

Copy link

CLAassistant commented Feb 26, 2019

CLA assistant check
All committers have signed the CLA.

@cakrit

This comment has been minimized.

Copy link
Contributor

cakrit commented Feb 26, 2019

In @ktsaou's PR I also see a close(fd) on that branch of the code. Isn't that needed?

@paulkatsoulakis

This comment has been minimized.

Copy link
Contributor Author

paulkatsoulakis commented Feb 26, 2019

In @ktsaou's PR I also see a close(fd) on that branch of the code. Isn't that needed?
Yikes, you 're right. The fd is passed through and if not used need to be released. I 'll update the PR

@paulkatsoulakis

This comment has been minimized.

Copy link
Contributor Author

paulkatsoulakis commented Feb 26, 2019

Not quite sure what happened with LGTM after the second commit.
I can't access its logs for the details

@cakrit

This comment has been minimized.

Copy link
Contributor

cakrit commented Feb 26, 2019

Retrying analysis, it seems to have stopped without a message.

@@ -870,6 +870,12 @@ static int rrdpush_receive(int fd

if(!strcmp(machine_guid, "localhost"))
host = localhost;

This comment has been minimized.

Copy link
@ktsaou

ktsaou Feb 26, 2019

Member

This is a case I don't recall.
If host == localhost, then should we accept the metrics?

I think I had implemented this case to support the possibility of remote data collection for parts of a system. But I don't think this is used anywhere...

This comment has been minimized.

Copy link
@cakrit

cakrit Feb 27, 2019

Contributor

I don't see any way for machine_guid to get the value localhost, other than a user explicitly putting localhost in /var/lib/netdata/registry/netdata.public.unique.id. Agree that it looks like a useless piece of code.

This comment has been minimized.

Copy link
@paulkatsoulakis

paulkatsoulakis Feb 27, 2019

Author Contributor

I was wondering myself what is that check over there. I assumed is some kind of "hack" to replay some kind of a use case when machine_guid string is manually set to "localhost".

If that is not the case, then it is very likely we can just remove that check

This comment has been minimized.

Copy link
@cakrit

cakrit Feb 27, 2019

Contributor

Yes, remove it please, I approve and you merge.

@cakrit
cakrit approved these changes Feb 27, 2019
@ktsaou
ktsaou approved these changes Feb 28, 2019
@cakrit cakrit mentioned this pull request Feb 28, 2019
@cakrit cakrit merged commit c05954c into netdata:master Mar 1, 2019
12 checks passed
12 checks passed
Header rules - netdata No header rules processed
Details
Pages changed - netdata 2 new files uploaded
Details
Redirect rules - netdata No redirect rules processed
Details
Codacy/PR Quality Review Up to standards. A positive pull request.
Details
LGTM analysis: C/C++ No new or fixed alerts
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python No code changes detected
Details
Mixed content - netdata No mixed content detected
Details
Travis CI - Pull Request Build Passed
Details
WIP Ready for review
Details
license/cla Contributor License Agreement is signed.
Details
netlify/netdata/deploy-preview Deploy preview ready!
Details
Kiku-Reise added a commit to Kiku-Reise/netdata that referenced this pull request Mar 4, 2019
…e netdata instances (netdata#5511)

* Do a GUID comparison between slave and receiver to avoid conflicts in processing when the GUID is mistakenly set same on both

We found this issue while creating a container, set it up with netdata and then clone it to get another one up and running with the same setup.
We need this check to prevent data conflicts as the GUID is considered to be unique across the universe.

TODO: I dont have enough visibility over the rates that this string comparison will be executed, we might need to reconsider another faster approach to compare the GUIDs

* Dont forget to cleanup resources properly upon early exit

* Remove mysterious string compare, fix indentation in other part of the code
@paulkatsoulakis paulkatsoulakis deleted the paulkatsoulakis:patch-masterslave-guid-duplication-issue branch Mar 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.