don't allow proxying requests to self #497

Merged: 1 commit, Apr 8, 2016

Contributor

ryane commented Jul 16, 2015

Even with the change in #463, it still appears possible for an instance that is not the leader to get its own path back as the leader path. When this happens, in RedirectFilter, jobScheduler.isLeader == false but jobScheduler.getLeader returns the path of the current instance. As a result, Chronos redirects every request to itself and the REST API ultimately becomes unresponsive. It never appears to recover from this state. If the other Chronos instances think that the stuck node is the leader, they also become unresponsive, which leaves the whole Chronos cluster unusable.

Unfortunately, the problem is somewhat difficult to reproduce. One way I can reproduce it fairly consistently is to set up 3 servers, each running ZooKeeper, Chronos, and Mesos, and then reboot the servers one at a time. Often (but not always), one of the Chronos instances ends up proxying all requests to itself and becomes unresponsive.

I am not very familiar with the Chronos code (or Scala, for that matter), so there may be a better way to handle this. This commit also does not address the root cause: I have not been able to figure out why or how JobScheduler ends up thinking it is not the leader while still returning the current instance's path from getLeader. It does, however, seem to prevent Chronos from proxying requests to itself and ending up in the unresponsive state.
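
For readers who want the guard in concrete terms, the following is a minimal sketch of the decision described above, not the code from this commit. `Decision`, `HandleLocally`, `ProxyTo`, and `RedirectDecision` are hypothetical names, and the sketch assumes the leader path and the instance's own address are plain strings that can be compared directly. Whether a non-leader that sees itself as the leader should serve the request or reject it is a design choice; this sketch serves it locally.

```scala
// Sketch only: Decision, HandleLocally, ProxyTo and RedirectDecision are
// hypothetical names, not identifiers from the Chronos code base.
sealed trait Decision
case object HandleLocally extends Decision
final case class ProxyTo(leader: String) extends Decision

object RedirectDecision {
  // isLeader: what jobScheduler.isLeader reports for this instance
  // leader:   the leader path returned by jobScheduler.getLeader
  //           (assumed here to be a directly comparable string)
  // self:     this instance's own path (assumed to be known)
  def decide(isLeader: Boolean, leader: String, self: String): Decision =
    if (isLeader || leader == self) HandleLocally // never proxy to ourselves
    else ProxyTo(leader)
}
```

With this shape, the inconsistent state from the description (`isLeader == false` while the leader path equals the instance's own path) yields `HandleLocally` instead of a proxy loop.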

kolloch commented Aug 5, 2015

Hi @ryane, thanks for your pull request.

If we get a request and we do not have consistent leadership information, we should probably wait for consistent leadership information or reject the request.

That's what we have done for Marathon. See here:

https://github.com/mesosphere/marathon/blob/master/src/main/scala/mesosphere/marathon/api/LeaderProxyFilter.scala#L121-L152
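
As an illustration of the "wait or reject" idea, here is a rough sketch under stated assumptions; it is not a copy of the Marathon code linked above. `ConsistentLeadership`, `isConsistent`, `await`, and the `snapshot` callback are hypothetical names, and the retry count and sleep interval are arbitrary parameters. Leadership information is treated as consistent when this instance is the leader, or when a leader other than this instance is known.

```scala
import scala.annotation.tailrec

// Sketch only: all names here are hypothetical, chosen for illustration.
object ConsistentLeadership {
  // Consistent: we are the leader, or some other instance is known to be.
  def isConsistent(isLeader: Boolean, leader: Option[String], self: String): Boolean =
    isLeader || leader.exists(_ != self)

  // Re-read the leadership state until it is consistent or retries run out.
  // snapshot returns the current (isLeader, leader) pair.
  // Returns true if a consistent state was observed, false otherwise.
  @tailrec
  def await(self: String, retries: Int, sleepMs: Long)(
      snapshot: () => (Boolean, Option[String])): Boolean = {
    val (isLeader, leader) = snapshot()
    if (isConsistent(isLeader, leader, self)) true
    else if (retries <= 0) false
    else {
      Thread.sleep(sleepMs)
      await(self, retries - 1, sleepMs)(snapshot)
    }
  }
}
```

A filter built on this could proceed as usual when `await` returns `true` and answer with an error status such as 503 Service Unavailable when it returns `false`, so that a request is never proxied while the instance's view of leadership is contradictory.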

@pdericson

👍

@brndnmtthws
Member

Thanks!

@brndnmtthws brndnmtthws merged commit 8c77669 into mesos:master Apr 8, 2016
gkleiman pushed a commit that referenced this pull request Apr 26, 2017