Framework holding onto offers and blocking cluster #3

Open
ianburrell opened this issue Aug 31, 2016 · 0 comments

The Mesos console shows the following resources when running mesos-distcc with "-j8" on a 16-CPU cluster (with one other task running):

Resources  CPUs  Mem
Total      16    121.4 GB
Used       9     6.0 GB
Offered    7     115.4 GB
Idle       0     0 B

mesos-distcc is using 8 CPUs as expected, but it is holding onto the 7 offered CPUs and blocking use of the cluster by other users (including other mesos-distcc runs).

One problem is that the code path that calls declineOffer once tasks have already been started then does a "return", so any remaining offers in the list are never declined.
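To make the early-return problem concrete, here is a minimal sketch of the pattern as I understand it. It is not the actual mesos-distcc code: `FakeDriver`, `resource_offers_buggy`, `resource_offers_fixed` and the `tasks_started` flag are hypothetical names, and offers are reduced to plain id strings. The fake driver only records which offers were declined.

```python
class FakeDriver:
    """Hypothetical stand-in for the scheduler driver; records declined offer ids."""

    def __init__(self):
        self.declined = []

    def declineOffer(self, offer_id):
        self.declined.append(offer_id)


def resource_offers_buggy(driver, offers, tasks_started):
    """Declines the first offer, then returns and leaves the rest held."""
    for offer_id in offers:
        if tasks_started:
            driver.declineOffer(offer_id)
            return  # bug: remaining offers in the list are never declined
        # ... otherwise launch tasks on this offer ...


def resource_offers_fixed(driver, offers, tasks_started):
    """Declines every offer that is not going to be used."""
    for offer_id in offers:
        if tasks_started:
            driver.declineOffer(offer_id)
            continue  # keep looping so every unused offer is handed back
        # ... otherwise launch tasks on this offer ...
```

With three offers in one callback, the buggy version hands back only the first one, while the fixed version hands back all three.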

Even with that bug fixed, the framework still doesn't seem to decline the offers. My suspicion is that starting the subprocess in statusUpdate blocks all communication with Mesos, so the declineOffer call may never be sent. The framework docs mention that Scheduler callbacks should not block.

My guess is that mesos-distcc needs to either run the command in the background and catch the signal when it exits, or run the scheduler and the runner in parallel and use multiprocessing.Condition to signal when it is ready to run the command.
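A rough sketch of the second option, under some assumptions: it uses a thread and `threading.Event` rather than `multiprocessing.Condition`, purely to keep the example short, and `CommandRunner`, `signal_ready` and `wait` are hypothetical names, not anything in mesos-distcc. The point is only that the scheduler callback sets a flag and returns at once, while the command runs on a separate thread.

```python
import subprocess
import sys
import threading


class CommandRunner:
    """Runs a command on its own thread once a scheduler callback signals readiness."""

    def __init__(self, argv):
        self.argv = argv
        self.ready = threading.Event()
        self.returncode = None
        # Daemon thread so a run that is never signalled cannot keep the process alive.
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        self.ready.wait()  # block here, not inside the scheduler callback
        self.returncode = subprocess.call(self.argv)

    def signal_ready(self):
        """Called from a callback such as statusUpdate; returns immediately."""
        self.ready.set()

    def wait(self):
        """Called from the main thread to collect the command's exit status."""
        self.thread.join()
        return self.returncode


if __name__ == "__main__":
    runner = CommandRunner([sys.executable, "-c", "print('build would run here')"])
    runner.signal_ready()  # in the framework this would happen inside statusUpdate
    sys.exit(runner.wait())
```

Since statusUpdate would only call `signal_ready()`, the driver thread should stay free to send declineOffer and handle further callbacks while the build runs.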
