OOM-kill on large PRs #220
opam.ci.ocaml.org has temporarily moved to toxis.caelum.ci.dev. An HTTP redirect is in place.
Inspecting with memtrace shows that, probably among other things, the revdeps of all the JS packages are so numerous that computing that list and sending the build orders to ocluster creates an enormous amount of in-memory data, which can't be garbage collected because the builds are so numerous (and non-trivial). The team has identified this line https://github.com/ocurrent/opam-repo-ci/blob/master/service/pipeline.ml#L99 as a source of many of the builds being created. Possible solutions that have been proposed are:
In the meantime, the service has been temporarily relocated to a host with sufficient memory.
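For readers unfamiliar with memtrace, the following is a minimal sketch of how an OCaml program is commonly instrumented with it. It is not taken from opam-repo-ci: the context string and the placeholder body are assumptions for illustration, and the `memtrace` opam package is assumed to be installed.

```ocaml
(* Minimal sketch, assuming the `memtrace` opam package is available.
   The context string and the placeholder body are hypothetical. *)
let () =
  (* Does nothing unless the MEMTRACE environment variable is set to the
     path of the trace file to write. *)
  Memtrace.trace_if_requested ~context:"opam-repo-ci" ();
  (* Placeholder for starting the actual service. *)
  print_endline "service started"
```

Running the binary with `MEMTRACE=trace.ctf` set would then write a trace file, which can be opened with `memtrace-viewer trace.ctf` to see which call sites retain the most memory.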
The service has suffered repeated segfaults since being moved to toxis. Increasing the stack and files ulimits from 8K to 64K has helped. After some days the process is still running at 46GB.
Additionally, we noticed that the
We have permanently moved the A full writeup of the incident will appear on http://infra.ocaml.org in the coming week.
Looks like this has been resolved for now.
ocaml/opam-repository#23742 takes all of the 32GB of RAM on the server opam-repo-ci is running on, and the service is just crashing and restarting in a loop.