Catch s2i build containers that are killed due to OOM #15032
/cc
If the build container were to be started by kubernetes, would the
if it were started by k8s and it were being killed by k8s due to resource constraints, i would expect the prestop hook to be honored (though i'm not certain). However, in this case neither of those things is true. The container isn't managed by k8s, and it's being killed by the host operating system (I believe), which is obviously not going to invoke any hooks even if the container were part of a k8s pod.
The build status returns "GenericBuildFailed", whereas the web console seems to be smart enough to report OOMKilled. Should the build status also be updated to reflect that it was killed due to OOM?
Where do you see the OOMKilled status being reported? And are you sure it was the assemble container that was OOM killed (not the build pod container) in that case?
(the web console has no awareness of the existence of the manually launched assemble container, nor will the pod resource have any information about it, so I'm a bit surprised by what you're saying the web console is detecting/reporting).
The build pod got the status OOMKilled. I guess this is not the same as the assemble container. The web console was just reporting the build pod status. |
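The distinction matters: the pod-level OOMKilled status the web console shows comes from the Kubernetes Pod API, not from the manually launched assemble container. A minimal sketch of how a client could read that same signal from a pod's `.status` JSON (as reported by `kubectl get pod <name> -o json`); `was_oom_killed` is a hypothetical helper for illustration, not part of OpenShift:

```python
def was_oom_killed(pod_status: dict) -> bool:
    """Return True if any container in the pod terminated with reason OOMKilled.

    `pod_status` follows the Kubernetes v1 Pod `.status` shape; both the
    current state and lastState are checked, since a restarted container
    records its OOM kill under lastState.terminated.
    """
    for cs in pod_status.get("containerStatuses", []):
        for state in (cs.get("state", {}), cs.get("lastState", {})):
            terminated = state.get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                return True
    return False


# Example: a build pod whose container was killed by the kernel OOM killer
# (exit code 137 = 128 + SIGKILL).
status = {
    "containerStatuses": [
        {
            "name": "sti-build",
            "state": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}},
        }
    ]
}
print(was_oom_killed(status))  # → True
```

This only sees containers Kubernetes manages, which is exactly why it cannot report anything about the assemble container.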
ok yeah that makes sense. We could potentially do a better job of reporting the build failure reason when the build pod is OOM killed, but that's separate from the main problem this issue describes (and imho the more likely one, since the assemble container uses more resources than the build pod container): the assemble container itself getting OOM killed.
If an assemble container is started for the build process, then what would be the reason the build pod gets OOMKilled? |
well, it's still a pod like any other; under memory pressure the system could decide to OOM kill it to free resources. why that particular pod would be chosen, i'm not sure — it doesn't seem like the most likely candidate.
i'm not sure we have a way to tell that the container we launched (e.g. the assemble container) got OOM killed. This is probably worth some quick experimentation/research; if it's not clearly possible for us to know why the container died, close this as won't/can't fix.
Automatic merge from submit-queue (batch tested with PRs 16777, 16811, 16823, 16808, 16833). bump(github.com/openshift/source-to-image): a0e78cce863f296bfb9bf77ac5acd152dc059e32 Fixes #15032 @openshift/devex fyi / ptal
As the s2i assemble container is launched via direct access to the Docker socket, it's currently not possible to catch or surface any error message telling the user why their build failed.
Based on what I am seeing, a SIGKILL is sent to the container, so it's not possible to catch anything like a SIGTERM from within the container to at least log an error message. Users are often left confused, wondering why their build suddenly died.
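This is a hard limit imposed by the kernel, not by s2i: a process can install a handler for SIGTERM and log before exiting, but SIGKILL (what the OOM killer delivers) can never be caught. A small sketch demonstrating the difference:

```python
import signal

# SIGTERM is catchable: a build script could trap it and log a last message.
signal.signal(signal.SIGTERM, lambda signum, frame: print("terminating, flushing logs"))

# SIGKILL cannot be handled at all; the OS rejects the attempt outright,
# so no in-container cleanup or logging is possible when the OOM killer fires.
try:
    signal.signal(signal.SIGKILL, lambda signum, frame: None)
except OSError as exc:
    print(f"cannot trap SIGKILL: {exc}")
```

Hence any OOM reporting has to happen outside the container, after the fact.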
Instead, they currently seem to get only "Assemble failed".
This probably also doesn't provide the best experience, as the web console doesn't currently offer a way to configure build resources (except through YAML).