Track a job's memory usage and abort if the memory usage gets too high #271
The pros and cons as I see them:
Pros:
Cons:
Therefore I don't think it makes sense to implement this. Instead I think it would make sense to:
We can get access to memory usage of containers via cgroups, as well as put limits on them: https://docs.docker.com/v1.8/articles/runmetrics/ It's all exposed via the docker API: https://docs.docker.com/v1.8/reference/api/docker_remote_api_v1.21/

I would put "the specific jobs fail consistently" as a pro, not a con. Deterministic behavior is much better than relying on the "self healing" of retrying later when hopefully there's more memory available, especially as we've seen the issue trigger repeatedly when things like the ros_comm devel jobs run in parallel. And hitting OOM conditions causes side effects across the whole machine, such as taking the Jenkins slave offline.

If we set reasonable maximum allowed memory values and then give good errors when we go over, you can know that a failed build meant something. This is yet another area where unrepeatability will lead to people ignoring the results and assuming that they will get better next time.

We can easily increase the memory available by either decreasing the number of executors per machine or paying for bigger machines. The challenge is to identify the memory footprint of the jobs.
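To illustrate the cgroups side of this: cgroup v1 exposes per-container memory counters as plain text files. A minimal parsing sketch, assuming the cgroup v1 `memory.stat` layout ("key value" per line); the sample path in the comment is the conventional docker location and should be verified on the actual host:

```python
def parse_memory_stat(text):
    """Parse a cgroup v1 memory.stat dump ("key value" per line)
    into a dict of integer byte counts."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats

# Sample in the cgroup v1 memory.stat format; on a real host you
# would read e.g. /sys/fs/cgroup/memory/docker/<container-id>/memory.stat
sample = "cache 11492564992\nrss 1930993664\nmapped_file 306728960\n"
print(parse_memory_stat(sample)["rss"])  # → 1930993664
```

The same counters are what the docker stats API reports, so either interface could feed a monitoring script.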
Closing due to inactivity. Please consider contributing a PR if you are interested in this feature.
There's now the ability to limit memory for docker containers: https://docs.docker.com/config/containers/resource_constraints/ As well as tracking usage: https://docs.docker.com/config/containers/runmetrics/

Reviving this as we're running into issues with running out of RAM here:
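A minimal sketch of what enforcing a per-job cap could look like when launching the container. The `--memory` and `--memory-swap` flags are the documented docker resource constraints; the helper function, image name, and limit value are illustrative, not part of any existing script:

```python
def run_with_memory_limit(image, command, mem_limit="4g"):
    """Build a `docker run` invocation that hard-caps container memory.
    Setting --memory-swap equal to --memory prevents the container from
    spilling into swap, so the kernel OOM-kills the job deterministically
    at the cap instead of degrading the whole host."""
    argv = ["docker", "run", "--rm",
            "--memory", mem_limit,
            "--memory-swap", mem_limit,
            image] + list(command)
    return argv  # e.g. hand to subprocess.run(argv, check=True)

print(run_with_memory_limit("ubuntu:22.04", ["make", "-j4"]))
```

With equal memory and memory-swap values, exceeding the cap fails the one offending job instead of taking the Jenkins slave offline.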
We discovered that one of the reasons for Jenkins nodes going offline has been the OOM killer: #265
We run on high-performance machines and there's typically overhead available, but we should be able to set a maximum limit to make sure that we don't interfere with other jobs running on the same executor.
We already have a background script which uses psutil to make sure that all subprocesses are cleaned up. I think that script could be extended or a parallel script created to also monitor the total memory usage of the job and abort it if the job uses too much memory. This would allow us to set a maximum memory usage policy and enforce it.
Methods which would work from a quick search:
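A sketch of the core abort decision such a watchdog would run. In the real script, psutil's `Process.children(recursive=True)` and `Process.memory_info().rss` would supply the data; plain dicts stand in here so the policy logic is self-contained, and the PIDs and the 8 GiB limit are purely illustrative:

```python
def total_rss(rss_by_pid, children_by_pid, root_pid):
    """Sum resident memory of a job: the root process plus all of
    its descendants, walking the process tree iteratively."""
    total, stack = 0, [root_pid]
    while stack:
        pid = stack.pop()
        total += rss_by_pid.get(pid, 0)
        stack.extend(children_by_pid.get(pid, ()))
    return total

def should_abort(rss_by_pid, children_by_pid, root_pid, limit_bytes):
    """Check the monitor loop would run periodically; when True, the
    watchdog would abort the job (e.g. SIGTERM, then SIGKILL)."""
    return total_rss(rss_by_pid, children_by_pid, root_pid) > limit_bytes

# Toy process tree: job 100 spawned 101 and 102; 102 spawned 103.
rss = {100: 200 * 2**20, 101: 500 * 2**20, 102: 300 * 2**20, 103: 9 * 2**30}
kids = {100: [101, 102], 102: [103]}
print(should_abort(rss, kids, 100, 8 * 2**30))  # → True (tree exceeds 8 GiB)
```

Summing over the whole tree matters because build jobs fork compilers and test runners; a per-process check would miss the aggregate footprint.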