OCPBUGS-3690: profilerecorder: gracefully skip containers that did not record anything #20

jhrozek · 2022-12-14T13:39:33Z

Backports kubernetes-sigs#1378 which fixes https://issues.redhat.com/browse/OCPBUGS-3690

It's unusual to send a backport before an upstream patch is merged, but I'm doing that because:

the holiday season starts soon and reviews might take longer
the upstream CI tests that run the upstream e2e tests on Flatcar are not working at the moment which is blocking merges upstram
I want to get the PR through the OCP CI to see if there's any failures
I want to give QE a head start testing the PR

We can
/hold
the PR until the original upstream PR is reviewed and merged.

openshift-ci-robot · 2022-12-14T13:39:38Z

@jhrozek: This pull request references Jira Issue OCPBUGS-3690, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.13.0) matches configured target version for branch (4.13.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @xiaojiey

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Backports kubernetes-sigs#1378 which fixes https://issues.redhat.com/browse/OCPBUGS-3690

It's unusual to send a backport before an upstream patch is merged, but I'm doing that because:

the holiday season starts soon and reviews might take longer

the upstream CI tests that run the upstream e2e tests on Flatcar are not working at the moment which is blocking merges upstram

I want to get the PR through the OCP CI to see if there's any failures

I want to give QE a head start testing the PR

We can
/hold
the PR until the original upstream PR is reviewed and merged.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jhrozek · 2022-12-14T13:43:42Z

@xiaojiey to test, you can deploy with make deploy-openshift-dev as there are no packaging changes. Then the steps to reproduce in your bug should no longer be reproducable.

jhrozek · 2022-12-14T16:02:36Z

flaky tests are flaky, sometimes
/test e2e-flaky

jhrozek · 2022-12-14T17:28:01Z

/test e2e-flaky

jhrozek · 2022-12-14T21:11:41Z

OK, turns out that the flaky tests are failing in the upstream PR as well.
@xiaojiey when you haeve a chance, testing the PR might be welcome to test the general approach, but it looks like some CI debugging is needed.

This is mostly concerning SELinux because even containers that don't seemingly do anything at least record some syscalls, e.g. exec. But with SELinux, if a container only performs actions that are allowed by the container policy in the first place, no AVCs would be generated. In that case, the call to get AVCs would return with a rather generic error, which would then be retried by the profilerecording reconciler. As a consequence, if a pod had multiple containers and the one with no AVCs was collected before others, the profilerecorder would never reach the other containers. Instead, let's return an error matching the situation and handle the absence of AVCs gracefully, by emptying whatever there might be in the grpc cache and moving on to the next container.

In the flaky tests, I've sometimes seen failures like: ``` Warning Failed 1s kubelet Error: setup seccomp: from field: unable to load local profile "/var/lib/kubelet/seccomp/operator/spo-binding-enabled/profile-allow-unsafe.json": open /var/lib/kubelet/seccomp/operator/spo-binding-enabled/profile-allow-unsafe.json: no such file or directory ``` I think they are caused by us creating the profiles and not checking if they actually exist before using them.

openshift-ci · 2022-12-15T14:16:38Z

@jhrozek: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

xiaojiey · 2022-12-15T14:35:33Z

With payload 4.13.0-0.nightly-2022-12-12-210406 and code in the PR, tried several times, I didn't reproduce the bug.
Steps:

install SPO
Enable log Enrisher
Create a new namespace mytest. To record by using the enricher, create a ProfileRecording which is using recorder
Create the severice account with privileged permission:
Add label for ns and create deployment to be recorded:
$ oc label ns mytest security.openshift.io/scc.podSecurityLabelSync=false pod-security.kubernetes.io/enforce=privileged --overwrite=true
$ oc apply -f -<<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: hello-openshift
namespace: mytest
spec:
replicas: 3
selector:
matchLabels:
app: hello-openshift
template:
metadata:
labels:
app: hello-openshift
spec:
serviceAccountName: spo-record-sa
initContainers:
- name: wait
  image: quay.io/openshifttest/centos:centos7
  command: ["/bin/sh", "-c", "env"]
  containers:
- name: hello-openshift
  image: quay.io/openshifttest/hello-openshift:multiarch
  ports:
  - containerPort: 8080
    readinessProbe:
    tcpSocket:
    port: 8080
    initialDelaySeconds: 5
    periodSeconds: 5
- name: hello-openshift2
  image: quay.io/openshifttest/hello-openshift:multiarch-fedora
  ports:
  - containerPort: 8081
    readinessProbe:
    tcpSocket:
    port: 8081
    initialDelaySeconds: 5
    periodSeconds: 5
    EOF
    deployment.apps/hello-openshift created
    $ oc get selinuxprofiles.security-profiles-operator.x-k8s.io
    NAME USAGE STATE
    test-recording-hello-openshift-2hhdj test-recording-hello-openshift-2hhdj_mytest.process Partial
    test-recording-hello-openshift-kv29z test-recording-hello-openshift-kv29z_mytest.process Partial
    test-recording-hello-openshift-vg9r2 test-recording-hello-openshift-vg9r2_mytest.process Partial
    test-recording-hello-openshift2-2hhdj test-recording-hello-openshift2-2hhdj_mytest.process Partial
    test-recording-hello-openshift2-kv29z test-recording-hello-openshift2-kv29z_mytest.process Partial
    test-recording-hello-openshift2-vg9r2 test-recording-hello-openshift2-vg9r2_mytest.process Partial

xiaojiey · 2022-12-15T14:35:49Z

/qe-approved

Vincent056

/lgtm

openshift-ci · 2022-12-15T19:23:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhrozek, Vincent056

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Vincent056,jhrozek]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jhrozek · 2022-12-15T20:04:22Z

Adding no-FF labels as QE tested the fix and there is no impact on docs or PX.

jhrozek · 2022-12-15T20:05:27Z

/hold cancel
In order to speed up downstream, I'm going to cancel the hold and merge.
In case upstream requests changes to the code, I'm going to revert this PR and send another one with the upstreamed code.

openshift-ci-robot · 2022-12-15T20:08:50Z

@jhrozek: All pull requests linked via external trackers have merged:

openshift/security-profiles-operator#20

Jira Issue OCPBUGS-3690 has been moved to the MODIFIED state.

In response to this:

Backports kubernetes-sigs#1378 which fixes https://issues.redhat.com/browse/OCPBUGS-3690

It's unusual to send a backport before an upstream patch is merged, but I'm doing that because:

the holiday season starts soon and reviews might take longer

the upstream CI tests that run the upstream e2e tests on Flatcar are not working at the moment which is blocking merges upstram

I want to get the PR through the OCP CI to see if there's any failures

I want to give QE a head start testing the PR

We can
/hold
the PR until the original upstream PR is reviewed and merged.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci bot requested a review from xiaojiey December 14, 2022 13:39

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 14, 2022

openshift-ci bot requested a review from Wiharris December 14, 2022 13:41

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 14, 2022

jhrozek force-pushed the ocp-record_empty branch from 3e19d23 to 7d9fe55 Compare December 15, 2022 10:09

jhrozek added 2 commits December 15, 2022 14:06

jhrozek force-pushed the ocp-record_empty branch from 7d9fe55 to 4adf95a Compare December 15, 2022 13:10

Vincent056 approved these changes Dec 15, 2022

View reviewed changes

openshift-ci bot assigned Vincent056 Dec 15, 2022

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 15, 2022

jhrozek added qe-approved Signifies that QE has signed off on this PR px-approved Signifies that Product Support has signed off on this PR docs-approved Signifies that Docs has signed off on this PR labels Dec 15, 2022

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 15, 2022

openshift-merge-robot merged commit 042bb06 into openshift:main Dec 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OCPBUGS-3690: profilerecorder: gracefully skip containers that did not record anything #20

OCPBUGS-3690: profilerecorder: gracefully skip containers that did not record anything #20

jhrozek commented Dec 14, 2022

openshift-ci-robot commented Dec 14, 2022

jhrozek commented Dec 14, 2022

jhrozek commented Dec 14, 2022

jhrozek commented Dec 14, 2022

jhrozek commented Dec 14, 2022

openshift-ci bot commented Dec 15, 2022

xiaojiey commented Dec 15, 2022

xiaojiey commented Dec 15, 2022

Vincent056 left a comment

openshift-ci bot commented Dec 15, 2022

jhrozek commented Dec 15, 2022

jhrozek commented Dec 15, 2022

openshift-ci-robot commented Dec 15, 2022

OCPBUGS-3690: profilerecorder: gracefully skip containers that did not record anything #20

OCPBUGS-3690: profilerecorder: gracefully skip containers that did not record anything #20

Conversation

jhrozek commented Dec 14, 2022

openshift-ci-robot commented Dec 14, 2022

jhrozek commented Dec 14, 2022

jhrozek commented Dec 14, 2022

jhrozek commented Dec 14, 2022

jhrozek commented Dec 14, 2022

openshift-ci bot commented Dec 15, 2022

xiaojiey commented Dec 15, 2022

xiaojiey commented Dec 15, 2022

Vincent056 left a comment

Choose a reason for hiding this comment

openshift-ci bot commented Dec 15, 2022

jhrozek commented Dec 15, 2022

jhrozek commented Dec 15, 2022

openshift-ci-robot commented Dec 15, 2022