docker driver OOM killed detection flaky on cgroups v2 #13119
I've been able to reproduce this locally on a VM running Ubuntu 22.04 (Jammy), which is configured with cgroups v2, but I've been unable to reproduce it with any cgroups v1 configuration. With the following patch (`git diff` output):

```diff
diff --git a/drivers/docker/handle.go b/drivers/docker/handle.go
index 0783bd901..c0f787a44 100644
--- a/drivers/docker/handle.go
+++ b/drivers/docker/handle.go
@@ -11,6 +11,7 @@ import (
 	"time"

 	"github.com/armon/circbuf"
+	"github.com/davecgh/go-spew/spew"
 	docker "github.com/fsouza/go-dockerclient"
 	"github.com/hashicorp/consul-template/signals"
 	hclog "github.com/hashicorp/go-hclog"
@@ -239,6 +240,11 @@ func (h *taskHandle) run() {
 	container, ierr := h.waitClient.InspectContainerWithOptions(docker.InspectContainerOptions{
 		ID: h.containerID,
 	})
+
+	scs := spew.NewDefaultConfig()
+	scs.DisableMethods = true
+	h.logger.Warn("container.State", "state", scs.Sdump(container.State))
+
 	oom := false
 	if ierr != nil {
 		h.logger.Error("failed to inspect container", "error", ierr)
```

I can see the
I traced this through go-dockerclient to moby/moby and ended up finding this very similar test failure on the moby project: moby/moby#41929. From there I dug down to this PR in containerd, containerd/containerd#6323, where the field they're reading from was changed. That change has been released in containerd, so we can't really fix this even by updating our moby dependency until upstream has done the same. I'm going to add a flag to the test to disable it in the case of cgroups v2, with a pointer to this issue, and then mark this issue as waiting for upstream. I'll also drop a note over in moby/moby#41929 pointing them to the containerd PR.
#13928 will remove the test flake. I've updated the labels for this issue to make it clear this is a problem with the driver and not the test.
Wow, nice detective work @tgross, thanks!
We recently faced an issue in prod (on cgroups v2, Ubuntu 22.04) where the docker container restarted just a few seconds after a restart. We think a memory build-up by the app in such a short span of time is impossible (the normal memory usage is ~1GB, while the limit is 24GB). Just wanted to confirm: is the issue we faced related to the issue being discussed above?
Hi @mr-karan, I believe a docker container can exit with 137 whether the issue is
Usually at least one of
That said, I'm currently in the process of upgrading our docker libraries, which are a couple of years old and may contain bug fixes / improvements in this area.
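For context on the 137 exit code mentioned above: it follows the shell's 128+N convention for processes killed by signal N (128 + 9, i.e. SIGKILL), which is why it shows up both for kernel OOM kills and for plain `docker kill` / `docker stop` timeouts, and why the exit code alone can't distinguish the two. A small illustration (the helper name is hypothetical):

```go
package main

import "fmt"

// decodeExitCode splits a shell-style exit status using the 128+N
// convention: codes in (128, 192) mean the process died from signal
// code-128. 137 therefore decodes to signal 9 (SIGKILL).
func decodeExitCode(code int) (signal int, killedBySignal bool) {
	if code > 128 && code < 192 {
		return code - 128, true
	}
	return 0, false
}

func main() {
	sig, ok := decodeExitCode(137)
	fmt.Println(sig, ok) // 9 true
}
```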
This test has started failing a lot; maybe it's 22.04 related.