Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-3690: profilerecorder: gracefully skip containers that did not record anything #20

Merged
merged 2 commits into from Dec 15, 2022

Conversation

jhrozek
Copy link

@jhrozek jhrozek commented Dec 14, 2022

Backports kubernetes-sigs#1378 which fixes https://issues.redhat.com/browse/OCPBUGS-3690

It's unusual to send a backport before an upstream patch is merged, but I'm doing that because:

  • the holiday season starts soon and reviews might take longer
  • the upstream CI tests that run the upstream e2e tests on Flatcar are not working at the moment which is blocking merges upstram
  • I want to get the PR through the OCP CI to see if there's any failures
  • I want to give QE a head start testing the PR

We can
/hold
the PR until the original upstream PR is reviewed and merged.

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Dec 14, 2022
@openshift-ci-robot
Copy link

@jhrozek: This pull request references Jira Issue OCPBUGS-3690, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @xiaojiey

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Backports kubernetes-sigs#1378 which fixes https://issues.redhat.com/browse/OCPBUGS-3690

It's unusual to send a backport before an upstream patch is merged, but I'm doing that because:

  • the holiday season starts soon and reviews might take longer
  • the upstream CI tests that run the upstream e2e tests on Flatcar are not working at the moment which is blocking merges upstram
  • I want to get the PR through the OCP CI to see if there's any failures
  • I want to give QE a head start testing the PR

We can
/hold
the PR until the original upstream PR is reviewed and merged.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from xiaojiey December 14, 2022 13:39
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 14, 2022
@openshift-ci openshift-ci bot requested a review from Wiharris December 14, 2022 13:41
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 14, 2022
@jhrozek
Copy link
Author

jhrozek commented Dec 14, 2022

@xiaojiey to test, you can deploy with make deploy-openshift-dev as there are no packaging changes. Then the steps to reproduce in your bug should no longer be reproducable.

@jhrozek
Copy link
Author

jhrozek commented Dec 14, 2022

flaky tests are flaky, sometimes
/test e2e-flaky

@jhrozek
Copy link
Author

jhrozek commented Dec 14, 2022

/test e2e-flaky

@jhrozek
Copy link
Author

jhrozek commented Dec 14, 2022

OK, turns out that the flaky tests are failing in the upstream PR as well.
@xiaojiey when you haeve a chance, testing the PR might be welcome to test the general approach, but it looks like some CI debugging is needed.

This is mostly concerning SELinux because even containers that don't
seemingly do anything at least record some syscalls, e.g. exec.

But with SELinux, if a container only performs actions that are allowed
by the container policy in the first place, no AVCs would be generated.
In that case, the call to get AVCs would return with a rather generic
error, which would then be retried by the profilerecording reconciler.
As a consequence, if a pod had multiple containers and the one with no
AVCs was collected before others, the profilerecorder would never reach
the other containers.

Instead, let's return an error matching the situation and handle the
absence of AVCs gracefully, by emptying whatever there might be in the
grpc cache and moving on to the next container.
In the flaky tests, I've sometimes seen failures like:
```
  Warning  Failed          1s               kubelet            Error: setup seccomp: from field: unable to load local profile "/var/lib/kubelet/seccomp/operator/spo-binding-enabled/profile-allow-unsafe.json": open /var/lib/kubelet/seccomp/operator/spo-binding-enabled/profile-allow-unsafe.json: no such file or directory
```

I think they are caused by us creating the profiles and not checking if
they actually exist before using them.
@openshift-ci
Copy link

openshift-ci bot commented Dec 15, 2022

@jhrozek: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@xiaojiey
Copy link

With payload 4.13.0-0.nightly-2022-12-12-210406 and code in the PR, tried several times, I didn't reproduce the bug.
Steps:

  1. install SPO
  2. Enable log Enrisher
  3. Create a new namespace mytest. To record by using the enricher, create a ProfileRecording which is using recorder
  4. Create the severice account with privileged permission:
  5. Add label for ns and create deployment to be recorded:
    $ oc label ns mytest security.openshift.io/scc.podSecurityLabelSync=false pod-security.kubernetes.io/enforce=privileged --overwrite=true
    $ oc apply -f -<<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
    name: hello-openshift
    namespace: mytest
    spec:
    replicas: 3
    selector:
    matchLabels:
    app: hello-openshift
    template:
    metadata:
    labels:
    app: hello-openshift
    spec:
    serviceAccountName: spo-record-sa
    initContainers:
    • name: wait
      image: quay.io/openshifttest/centos:centos7
      command: ["/bin/sh", "-c", "env"]
      containers:
    • name: hello-openshift
      image: quay.io/openshifttest/hello-openshift:multiarch
      ports:
      • containerPort: 8080
        readinessProbe:
        tcpSocket:
        port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
    • name: hello-openshift2
      image: quay.io/openshifttest/hello-openshift:multiarch-fedora
      ports:
      • containerPort: 8081
        readinessProbe:
        tcpSocket:
        port: 8081
        initialDelaySeconds: 5
        periodSeconds: 5
        EOF
        deployment.apps/hello-openshift created
        $ oc get selinuxprofiles.security-profiles-operator.x-k8s.io
        NAME USAGE STATE
        test-recording-hello-openshift-2hhdj test-recording-hello-openshift-2hhdj_mytest.process Partial
        test-recording-hello-openshift-kv29z test-recording-hello-openshift-kv29z_mytest.process Partial
        test-recording-hello-openshift-vg9r2 test-recording-hello-openshift-vg9r2_mytest.process Partial
        test-recording-hello-openshift2-2hhdj test-recording-hello-openshift2-2hhdj_mytest.process Partial
        test-recording-hello-openshift2-kv29z test-recording-hello-openshift2-kv29z_mytest.process Partial
        test-recording-hello-openshift2-vg9r2 test-recording-hello-openshift2-vg9r2_mytest.process Partial

@xiaojiey
Copy link

/qe-approved

Copy link

@Vincent056 Vincent056 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 15, 2022
@openshift-ci
Copy link

openshift-ci bot commented Dec 15, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhrozek, Vincent056

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jhrozek jhrozek added qe-approved Signifies that QE has signed off on this PR px-approved Signifies that Product Support has signed off on this PR docs-approved Signifies that Docs has signed off on this PR labels Dec 15, 2022
@jhrozek
Copy link
Author

jhrozek commented Dec 15, 2022

Adding no-FF labels as QE tested the fix and there is no impact on docs or PX.

@jhrozek
Copy link
Author

jhrozek commented Dec 15, 2022

/hold cancel
In order to speed up downstream, I'm going to cancel the hold and merge.
In case upstream requests changes to the code, I'm going to revert this PR and send another one with the upstreamed code.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 15, 2022
@openshift-merge-robot openshift-merge-robot merged commit 042bb06 into openshift:main Dec 15, 2022
@openshift-ci-robot
Copy link

@jhrozek: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-3690 has been moved to the MODIFIED state.

In response to this:

Backports kubernetes-sigs#1378 which fixes https://issues.redhat.com/browse/OCPBUGS-3690

It's unusual to send a backport before an upstream patch is merged, but I'm doing that because:

  • the holiday season starts soon and reviews might take longer
  • the upstream CI tests that run the upstream e2e tests on Flatcar are not working at the moment which is blocking merges upstram
  • I want to get the PR through the OCP CI to see if there's any failures
  • I want to give QE a head start testing the PR

We can
/hold
the PR until the original upstream PR is reviewed and merged.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. docs-approved Signifies that Docs has signed off on this PR jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants