Proposal: Mesos-DNS should stay running even when disconnected from Master/ZK #284

Closed
sepiroth887 opened this issue Sep 24, 2015 · 5 comments

Comments

@sepiroth887

Currently 2 issues exist:

1.) If the initial ZK connection doesn't succeed, mesos-dns will not start
2.) If master detection doesn't find a master within 30 seconds, mesos-dns will crash

Issue 2 is probably the more critical one: it can be fatal during a network partition, and mesos-dns will not recover automatically after the network is restored.

Suggestion:

If the ZK connection doesn't succeed, keep retrying indefinitely (possibly with a capped backoff to reduce chatter). Continue servicing requests in the meantime (e.g. to allow Resolvers to work).

If Master detection fails, don't panic :) just log it and wait for the next reload cycle to try again.

@tsenart
Contributor

tsenart commented Sep 24, 2015

SGTM

@sepiroth887
Author

Ugh. This actually bit me hard today when trying to migrate a ZK cluster on a running Mesos. I will try and see if I can fix this quickly ^^ Any suggestions?

What I'm thinking right now is to do a reload regardless of the master/ZK connection and ensure that the data structures holding the records are kept around.

Also, making the first reload asynchronous, so that it doesn't depend on the connection to the masters, may be an option I will look into.
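To make the "keep the records around" part concrete, here is a small sketch under my own assumptions. `recordStore`, `reload`, `lookup`, and `errNoMaster` are hypothetical names, and `fetch` stands in for whatever pulls state from a master.

```go
package main

import (
	"errors"
	"sync"
)

// errNoMaster is a hypothetical error for a reload that found no master.
var errNoMaster = errors.New("no master found")

// recordStore holds the records currently being served.
type recordStore struct {
	mu  sync.RWMutex
	rrs map[string][]string
}

// reload swaps in a fresh record set only when fetch succeeds. On failure
// the previous records stay in place and the error goes back to the caller.
func (s *recordStore) reload(fetch func() (map[string][]string, error)) error {
	fresh, err := fetch()
	if err != nil {
		return err
	}
	s.mu.Lock()
	s.rrs = fresh
	s.mu.Unlock()
	return nil
}

// lookup returns the records for name from the last successful reload.
func (s *recordStore) lookup(name string) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.rrs[name]
}
```

With this shape, lookups keep answering from the last good state during a partition; the trade-off is that stale records are served until a reload succeeds again.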

@sepiroth887
Author

Ok quick update.

It seems to be relatively simple, as most of the logic is async already (yay Go!)

I think I can get a clean PR ready for review and feedback pretty quickly.

Basically what can be done is:

Don't log fatal; instead just log very verbosely.
Don't return from parseState when no master is found. Instead, continue on, but with an error sent back to ensure it's logged and so that further logic can be added in the future.

E.g. for static records: after the error is checked, those could still be added to the rrs maps, but outside the InsertState method to avoid recreating the maps.
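A rough sketch of that flow, not the actual mesos-dns code. The `generator` type, the lowercase `insertState` and `parseState` stand-ins, `addStatic`, `errNoMaster`, and the `fetch` parameter are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"log"
)

// errNoMaster is a hypothetical error returned when no master is reachable.
var errNoMaster = errors.New("no master found")

// generator is a simplified stand-in for the type that owns the rrs maps.
type generator struct {
	rrs map[string][]string
}

// insertState rebuilds the maps from freshly fetched task state.
func (g *generator) insertState(tasks map[string]string) {
	g.rrs = make(map[string][]string, len(tasks))
	for name, ip := range tasks {
		g.rrs[name] = append(g.rrs[name], ip)
	}
}

// parseState tries each master in turn. If none answers, it logs verbosely
// and returns errNoMaster rather than exiting, leaving the maps untouched.
func (g *generator) parseState(masters []string, fetch func(string) (map[string]string, error)) error {
	for _, m := range masters {
		tasks, err := fetch(m)
		if err != nil {
			log.Printf("master %s unavailable: %v", m, err)
			continue
		}
		g.insertState(tasks)
		return nil
	}
	return errNoMaster
}

// addStatic adds records that do not come from a master. It runs outside
// insertState, so it never recreates the maps.
func (g *generator) addStatic(static map[string]string) {
	if g.rrs == nil {
		g.rrs = make(map[string][]string)
	}
	for name, ip := range static {
		g.rrs[name] = append(g.rrs[name], ip)
	}
}
```

The caller would check the error from `parseState`, log it, and then decide whether to call `addStatic`. Since `insertState` rebuilds the maps, static entries would need to be added again after each successful cycle.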

@tsenart
Contributor

tsenart commented Sep 28, 2015

Please keep the static records proposal apart from this one.

@sepiroth887
Author

I am, no worries. Just making sure that the logic is sound and compatible with any non-master-driven records, if there are or will be any.


2 participants