shut down an idle pilot #27
@dsschult could you rebase this on
7eab86d to 5e65e85
Rebased. And now I see tests running.
@matyasselmeci @rynge what do you think about this? This would change the behavior of the containers quite a bit for our other users so we'd have to communicate it to them
The main behavior change is
At this point users just start up the container when they want work from the access points and whether or not they get it is our problem. With this change, sites will have to monitor whether or not their container is running to account for transient lulls in usage.
This seems similar to https://opensciencegrid.atlassian.net/browse/SOFTWARE-4608.
That's in a similar vein but I think it means we only want
…requiring the variable to be set
Wouldn't that delete the log files of that pilot, making it harder to troubleshoot? |
I added a change to make it configurable. Then you can decide which policy you like better at runtime. It now takes Could potentially name it better if you can think of something. |
I added a change to make it configurable. Then you can decide which policy you like better at runtime.
All disagreements can be satisfied by more configuration :). This works for me!
Wouldn't that delete the log files of that pilot, making it harder to troubleshoot?
We do tell folks to mount something to /pilot so the logs should still be around. I think it's probably better than the alternative where you need to be an expert in condor and this container to realize that you're running the container but it's not advertising back to the OSPool.
Currently STARTD_NOCLAIM_SHUTDOWN is fixed at 30 minutes, and supervisord restarts condor repeatedly, never exiting the container. This PR allows the idle time to be configured, and shuts down the container if condor_master exits normally.
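For illustration only, here is a minimal sketch of how a configurable idle time could be turned into a STARTD_NOCLAIM_SHUTDOWN setting. It is not the code in this PR: the environment variable name IDLE_SHUTDOWN_SECONDS and the helper render_noclaim_config are hypothetical, and the 1800-second default simply mirrors the 30 minutes mentioned above.

```shell
# Hypothetical helper; IDLE_SHUTDOWN_SECONDS is an assumed variable name,
# not necessarily the one this PR introduces.
render_noclaim_config() {
    # Fall back to 1800 seconds (the previously fixed 30 minutes) when unset or empty.
    idle_seconds="${IDLE_SHUTDOWN_SECONDS:-1800}"
    # Accept only non-negative integers; anything else is reported and rejected.
    case "$idle_seconds" in
        ''|*[!0-9]*)
            echo "invalid idle time: $idle_seconds" >&2
            return 1
            ;;
    esac
    # Emit a single HTCondor configuration line on stdout.
    printf 'STARTD_NOCLAIM_SHUTDOWN = %s\n' "$idle_seconds"
}
```

A container entrypoint could redirect this output into whichever HTCondor configuration file it already manages before starting condor_master; where that file lives is left open here.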