Bug 1875946: Pod process container does not correctly reap zombie process with shareProcessNamespace: true #27

psykulsk · 2020-08-16T18:54:11Z

This pull request adds SIGCHLD handling and proper reaping of zombie processes to the pod binary.

This avoids exhausting process table slots in long-running scenarios and keeps the behavior of pods consistent with the vanilla Kubernetes distribution when using shareProcessNamespace: true.

Why

The default CRI-O configuration deployed by OpenShift/OKD (version 4) specifies pause_command = "/usr/bin/pod". This means that when a user creates a pod with shareProcessNamespace: true, the pod binary is launched as the init process in the pod's PID namespace. If one of the containers in that pod has an exec readiness probe whose command sometimes exceeds the probe's timeout, all processes launched by that command are left as zombie (defunct) processes and are never cleaned up. That is because orphaned processes are reparented to PID 1, and /usr/bin/pod, running as PID 1, does not currently handle the SIGCHLD signal and does not reap zombie processes. Over a longer time span this may exhaust the process table slots available inside the container namespace, cause errors like "Cannot fork", and cause other problems in the pod's main container at runtime.

The pause container used in the vanilla Kubernetes distribution handles SIGCHLD and reaps zombies, as can be seen in its source code.
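For readers who want to see the general pattern, here is a minimal sketch of SIGCHLD handling with non-blocking reaping in Go. It is not the code from this PR or from the Kubernetes pause container. The names reapChildren and runInit are hypothetical, the child process and the simulated SIGTERM exist only so the demo terminates, and it assumes a Linux system with a `true` binary on the PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
	"time"
)

// reapChildren (hypothetical name) collects every child that has already
// exited, without blocking, and returns how many were reaped. SIGCHLD
// deliveries can be coalesced, so one signal may stand for several exits;
// that is why this loops instead of waiting once.
func reapChildren() int {
	reaped := 0
	for {
		var status syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue // interrupted by a signal; retry
		}
		if err != nil || pid <= 0 {
			// ECHILD: no children at all. pid == 0: children exist,
			// but none of them has exited yet.
			return reaped
		}
		reaped++
	}
}

// runInit (hypothetical name) reaps on every SIGCHLD and returns the total
// number of reaped children once any other signal arrives.
func runInit(sigs <-chan os.Signal) int {
	total := 0
	for sig := range sigs {
		if sig != syscall.SIGCHLD {
			return total
		}
		total += reapChildren()
	}
	return total
}

func main() {
	sigs := make(chan os.Signal, 32)
	signal.Notify(sigs, syscall.SIGCHLD, syscall.SIGINT, syscall.SIGTERM)

	// Demo only: start a short-lived child and never call Wait on it, so it
	// would remain a zombie unless the SIGCHLD path reaps it.
	if err := exec.Command("true").Start(); err != nil {
		fmt.Println("could not start demo child:", err)
	}

	// Demo only: simulate the SIGTERM a runtime would send at shutdown.
	go func() {
		time.Sleep(500 * time.Millisecond)
		sigs <- syscall.SIGTERM
	}()

	fmt.Println("children reaped:", runInit(sigs))
}
```

A real init process would keep looping until the runtime stops it. The WNOHANG flag matters here: without it, the wait call would block while live children exist, and the process could not react to a termination signal in the meantime.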

Steps to reproduce on an OKD/OpenShift 4 cluster

Deploy an OpenShift/OKD 4 cluster.
Create a pod with shareProcessNamespace: true and an exec readiness probe that will time out. Example of a pod:

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  shareProcessNamespace: true
  containers:
  - name: ubuntu
    image: ubuntu
    command:
      - /bin/tail
      - -f
      - /dev/null
    readinessProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - sleep 5
      periodSeconds: 3
      timeoutSeconds: 1

Run kubectl exec -it test bash, then run the top command and observe how a new zombie process appears every 3 seconds.
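If top is not available in the image, the zombie count can also be read from /proc. The following is a rough sketch under that assumption, not part of this PR; processState and countZombies are hypothetical helper names. It relies on the documented layout of /proc/<pid>/stat, where the state letter follows the parenthesized command name and Z marks a zombie.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// processState (hypothetical helper) extracts the state letter from the
// contents of /proc/<pid>/stat. The command name is wrapped in parentheses
// and may itself contain spaces or parentheses, so parsing starts after the
// last ')'.
func processState(stat string) (byte, bool) {
	end := strings.LastIndexByte(stat, ')')
	if end < 0 {
		return 0, false
	}
	fields := strings.Fields(stat[end+1:])
	if len(fields) == 0 || len(fields[0]) != 1 {
		return 0, false
	}
	return fields[0][0], true
}

// countZombies (hypothetical helper) counts the processes visible in /proc
// whose state is 'Z'.
func countZombies() (int, error) {
	paths, err := filepath.Glob("/proc/[0-9]*/stat")
	if err != nil {
		return 0, err
	}
	zombies := 0
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process exited between Glob and ReadFile
		}
		if state, ok := processState(string(data)); ok && state == 'Z' {
			zombies++
		}
	}
	return zombies, nil
}

func main() {
	n, err := countZombies()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("zombie processes:", n)
}
```

Running it repeatedly inside the pod should show the count growing over time while the bug is present, and staying flat once PID 1 reaps its children.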

Steps to reproduce with Docker (or other container runtime)

Run a container with the image containing the pod binary. For example, use the image from the latest release of OKD (pod image specified here).

docker run -d --name pod quay.io/openshift/okd-content@sha256:1207114d4db1bdb9431fdcc890b6813e889fdcfba94900396f0ab9ca4f0c5dbd

Exec into the container and run top.

docker exec -it pod top

From a different terminal, create another container in the same pid namespace.

docker run -it --rm --pid=container:pod ubuntu bash

From the new container, run a command that will spawn a few children and then exit without waiting.

for i in {1..10}; do $(sleep 2) & done; exit

Observe how new zombie processes appear in the top output.

When you repeat the steps above with an image that contains the pod binary with the added SIGCHLD handling, no zombie processes are left behind.

openshift-ci-robot · 2020-08-16T18:54:19Z

Hi @psykulsk. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

psykulsk · 2020-08-17T20:13:43Z

/assign @smarterclayton

psykulsk · 2020-08-18T16:48:35Z

/cc @soltysh

openshift-ci-robot · 2020-08-18T16:48:36Z

@psykulsk: GitHub didn't allow me to request PR reviews from the following users: soltysh.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

psykulsk · 2020-08-21T22:31:13Z

/assign @soltysh

openshift-ci-robot · 2020-08-21T22:31:14Z

@psykulsk: GitHub didn't allow me to assign the following users: soltysh.

Note that only openshift members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

psykulsk · 2020-08-24T10:58:46Z

@smarterclayton @derekwaynecarr @soltysh could you have a look at this or assign someone else? Thanks!

smarterclayton · 2020-09-04T16:57:07Z

@mrunalp can you assign a reviewer for this and verify consistency with kube pause.c?

smarterclayton · 2020-09-04T16:57:19Z

/ok-to-test

openshift-ci-robot · 2020-09-04T17:00:12Z

@psykulsk: This pull request references Bugzilla bug 1875946, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.6.0) matches configured target release for branch (4.6.0)
bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

mrunalp · 2020-09-04T17:42:39Z

Yeah, looking!

pod/pod.go

mrunalp · 2020-09-10T14:28:40Z

Could you either squash or clean up the commits?

psykulsk · 2020-09-10T22:12:32Z

/retest

psykulsk · 2020-09-11T13:10:17Z

/retest

psykulsk · 2020-09-14T09:57:27Z

/retest

psykulsk · 2020-09-14T10:05:00Z

/retest

psykulsk · 2020-09-14T11:47:42Z

/retest

mrunalp · 2020-09-14T18:59:07Z

This looks fine besides 2 final nits. Thanks 👍

mrunalp · 2020-09-14T22:21:32Z

/lgtm

openshift-ci-robot · 2020-09-14T22:21:33Z

@mrunalp: changing LGTM is restricted to collaborators

mrunalp · 2020-09-15T00:36:24Z

/test e2e-aws-upgrade

soltysh

Tagging based on @mrunalp review
/lgtm
/approve

openshift-ci-robot · 2020-09-15T14:16:09Z

@soltysh: changing LGTM is restricted to collaborators

mrunalp · 2020-09-15T15:53:38Z

@smarterclayton ptal for approve/lgtm

smarterclayton · 2020-09-15T18:18:10Z

/lgtm
/approve

openshift-ci-robot · 2020-09-15T18:18:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrunalp, psykulsk, smarterclayton, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pod/OWNERS~~ [smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2020-09-15T18:21:46Z

@psykulsk: All pull requests linked via external trackers have merged:

openshift/images#27

Bugzilla bug 1875946 has been moved to the MODIFIED state.

haircommander · 2020-11-23T21:35:47Z

/cherry-pick release-4.5

openshift-cherrypick-robot · 2020-11-23T21:35:53Z

@haircommander: new pull request created: #57

openshift-ci-robot requested review from derekwaynecarr and smarterclayton August 16, 2020 18:54

openshift-ci-robot added the needs-ok-to-test label Aug 16, 2020

openshift-ci-robot assigned smarterclayton Aug 17, 2020

openshift-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Sep 4, 2020

smarterclayton changed the title ~~Handling of SIGCHLD and reaping zombie processes in pod.go~~ Bug 1875946: Pod process container does not correctly reap zombie process with shareProcessNamespace: true Sep 4, 2020

openshift-ci-robot added the bugzilla/severity-high label Sep 4, 2020

openshift-ci-robot added the bugzilla/valid-bug label Sep 4, 2020

mrunalp reviewed Sep 10, 2020
