Bug 1887488: e2e: node: fix ping tester pod #25688

Merged
merged 1 commit into from Dec 2, 2020

Conversation

ffromani
Contributor

@ffromani ffromani commented Nov 16, 2020

Pin the busybox image to a known-good tag.
This was not done previously because of an oversight, for no other reason.

Generalize the code which creates the testing pod to make the
code more flexible and a bit easier to read.

Last, use privileged pods for the ping tests. This is needed because of the busybox image used
by the pinger pods.
Another approach would be to change the base image (e.g. to a centos image), but given
the nature of the test and the fact that the busybox image is tiny, much smaller than the closest
good alternative (centos), the privileged-pod approach seems good enough.

Bug-Url: https://bugzilla.redhat.com/1887488
Signed-off-by: Francesco Romani <fromani@redhat.com>
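
To illustrate what "pinning to a known-good tag" means for an image reference such as `busybox:1.31.0`, here is a minimal Go sketch. The helper name `isPinned` is hypothetical and not part of this PR; it is a simplified check under the assumption that a reference counts as pinned when it carries a digest or an explicit tag other than `latest`. It is not a full image-reference parser.

```go
package main

import (
	"fmt"
	"strings"
)

// isPinned is a hypothetical helper (not from the PR). It reports whether an
// image reference carries a digest or an explicit tag other than "latest".
func isPinned(ref string) bool {
	// A digest reference (name@sha256:...) always identifies one image.
	if strings.Contains(ref, "@") {
		return true
	}
	// Inspect only the last path segment, so that a registry port such as
	// "localhost:5000/busybox" is not mistaken for a tag.
	name := ref[strings.LastIndex(ref, "/")+1:]
	i := strings.LastIndex(name, ":")
	if i < 0 {
		return false // no tag at all: resolves to "latest" implicitly
	}
	tag := name[i+1:]
	return tag != "" && tag != "latest"
}

func main() {
	for _, ref := range []string{"busybox", "busybox:1.31.0"} {
		fmt.Printf("%s pinned=%t\n", ref, isPinned(ref))
	}
}
```

An untagged `busybox` reference follows whatever the registry currently serves as `latest`, which is the oversight this description refers to.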

@ffromani
Contributor Author

/retest

2 similar comments

@ffromani
Contributor Author

This failure:

fail [k8s.io/kubernetes@v1.19.2/test/e2e/apps/daemon_set.go:102]: Unexpected error:
    <*errors.errorString | 0xc0027641b0>: {
        s: "error while waiting for pods to become inactive daemon-set: there are 3 active pods. E.g. \"daemon-set-qkj95\" on node \"ip-10-0-130-238.us-west-2.compute.internal\"",
    }
    error while waiting for pods to become inactive daemon-set: there are 3 active pods. E.g. "daemon-set-qkj95" on node "ip-10-0-130-238.us-west-2.compute.internal"
occurred

cannot be caused by this PR, which only changes how an image is used in an unrelated test. I will report it as a flake.

@ffromani ffromani changed the title e2e: node: generalize pod and pin image Bug 1887488: e2e: node: generalize pod and pin image Nov 18, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. label Nov 18, 2020
@openshift-ci-robot

@fromanirh: This pull request references Bugzilla bug 1887488, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1887488: e2e: node: generalize pod and pin image

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Nov 18, 2020
@ffromani
Contributor Author

/retest

@ffromani
Contributor Author

/assign @adambkaplan

@ffromani ffromani changed the title Bug 1887488: e2e: node: generalize pod and pin image Bug 1887488: e2e: node: fix ping tester pod Nov 18, 2020
@openshift-ci-robot

@fromanirh: This pull request references Bugzilla bug 1887488, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1887488: e2e: node: fix ping tester pod

@ffromani
Contributor Author

/retest

@rphillips
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 19, 2020
@rphillips rphillips removed their assignment Nov 19, 2020
@ffromani
Contributor Author

/cherry-pick release-4.6

@openshift-cherrypick-robot

@fromanirh: once the present PR merges, I will cherry-pick it on top of release-4.6 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.6

@@ -24,6 +24,7 @@ import (
const (
	networkAttachmentAnnotation string = "k8s.v1.cni.cncf.io/networks"
	sriovInterfaceName          string = "sriov1"
	busyboxImage                string = "busybox:1.31.0"
Contributor

Note that this is pulling from DockerHub without a pull secret, and thus can fail to deploy due to a rate limit.

Can registry.redhat.io/ubi8/ubi-minimal be used instead?

Contributor Author

Any maintained image that contains ping is fine for this task. I'll look for alternatives.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Nov 24, 2020
Comment on lines 560 to 565
if cp.Privileged {
	isTrue := true
	cnt.SecurityContext = &corev1.SecurityContext{
		Privileged: &isTrue,
	}
}
Contributor

Are you sure you need to run a pod that's fully privileged? This bypasses every baked in security check.

If the issue is that on busybox you now need root, you can use runAsUser: 0 in the security context. Not awesome, but better than running a privileged container.

Contributor Author

Good point. Let me narrow down the privileges.

Contributor Author

@ffromani ffromani Nov 24, 2020

So RunAsUser: 0 is not enough; it seems I need at least CAP_NET_RAW (which is still too much?).
If I remove Privileged: true and replace it with RunAsUser: 0, I get:

Nov 24 17:52:36.585: INFO: Error running /usr/local/bin/oc --namespace=e2e-test-topology-manager-xhjcv --kubeconfig=/home/fromani/clusters/cnflab/kubeconfig rsh -n default -c test-0 test-29pj7 ping -c 3 10.56.217.171:
StdOut>
bind: Permission denied
PING 10.56.217.171 (10.56.217.171): 56 data bytes
command terminated with exit code 2
StdErr>
bind: Permission denied
PING 10.56.217.171 (10.56.217.171): 56 data bytes
command terminated with exit code 2
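
For reference, the middle ground discussed here (root user plus only the NET_RAW capability, rather than a fully privileged container) could look roughly like the sketch below. This is an illustration, not code from the PR: the types are local stand-ins that mirror the field names of `corev1.SecurityContext` and `corev1.Capabilities` so the example compiles without the Kubernetes modules, and `pingSecurityContext` is a hypothetical helper name. Whether NET_RAW alone is sufficient, or permitted by the cluster's security policy, is not confirmed in this thread, and the commit that was eventually merged avoids extra privileges altogether by using a different image.

```go
package main

import "fmt"

// Stand-in types: they mirror the field names of corev1.SecurityContext and
// corev1.Capabilities (k8s.io/api/core/v1) so that this sketch is
// self-contained. Real test code would use the corev1 types instead.
type Capability string

type Capabilities struct {
	Add  []Capability
	Drop []Capability
}

type SecurityContext struct {
	Privileged   *bool
	RunAsUser    *int64
	Capabilities *Capabilities
}

// pingSecurityContext (hypothetical helper) returns a context that runs as
// root and adds only NET_RAW, leaving Privileged unset.
func pingSecurityContext() *SecurityContext {
	root := int64(0)
	return &SecurityContext{
		RunAsUser: &root,
		Capabilities: &Capabilities{
			// Kubernetes capability names omit the "CAP_" prefix.
			Add: []Capability{"NET_RAW"},
		},
	}
}

func main() {
	sc := pingSecurityContext()
	fmt.Printf("runAsUser=%d add=%v privilegedSet=%t\n",
		*sc.RunAsUser, sc.Capabilities.Add, sc.Privileged != nil)
}
```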

@ffromani
Contributor Author

/retest

1 similar comment

Contributor

@adambkaplan adambkaplan left a comment

Please squash commits, otherwise looks good.

Pin the busybox image to a known-good tag.
This was not done previously because of an oversight, for no other reason.
Generalize the code which creates the testing pod to make the
code more flexible and a bit easier to read.

Any publicly available, reasonably recent (>= 1.21) busybox image is fine.
We use the same image that other extended tests (DNS) are already using.
This solves the issue for all but one test: the connectivity test.

One of the e2e topology manager tests wants to run a basic network
check between two NUMA-aligned pods, to get a basic signal that
the network is working. This is done to test that resource alignment
does not prevent the network from working.

This test is intentionally minimal: proper network checks are delegated
to other test suites (e.g. the SRIOV test suite), but we need some basic
coverage in this test suite, so we accept this intentional, minimal
overlap.

To do the minimal testing, the test wants to ping between two
resource-aligned pods. To do so we need a container image which contains
the ping tool and which does not require any special privilege.

So, this patch requires the pull URL of the network-check image
to be supplied using a (documented) environment variable. This way:
1. users can use existing images not part of openshift
   (quay.io/openshift-kni/cnf-tests) which they may want to use to
   validate their cluster anyway
2. users can easily opt out from this test (do not supply the env
   variable to skip the test)
3. in the future we can provide a very simple base image from
   openshift itself to fill this role.

Bug-Url: https://bugzilla.redhat.com/1887488
Signed-off-by: Francesco Romani <fromani@redhat.com>
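
The opt-in behaviour described in this commit message (supply the image pull URL through an environment variable, or leave it unset to skip the connectivity test) can be sketched as follows. This is a minimal illustration, not the code from the PR: the variable name `E2E_NETCHECK_IMAGE` and the helper `netCheckImage` are hypothetical, since the commit message does not state the actual name.

```go
package main

import (
	"fmt"
	"os"
)

// netCheckImageEnv is a hypothetical variable name used for illustration;
// the commit message does not state the real one.
const netCheckImageEnv = "E2E_NETCHECK_IMAGE"

// netCheckImage returns the image pull URL and true when the variable is set
// to a non-empty value. The lookup function is injected so the logic can be
// exercised without touching the process environment.
func netCheckImage(lookup func(string) (string, bool)) (string, bool) {
	image, ok := lookup(netCheckImageEnv)
	if !ok || image == "" {
		return "", false
	}
	return image, true
}

func main() {
	image, ok := netCheckImage(os.LookupEnv)
	if !ok {
		// Opting out: no image supplied, so the test is skipped.
		fmt.Printf("skipping connectivity test: %s is not set\n", netCheckImageEnv)
		return
	}
	fmt.Printf("running connectivity test with image %s\n", image)
}
```

Treating an empty value the same as an unset variable keeps the opt-out path simple: users who do not care about this test need to do nothing.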
@ffromani
Contributor Author

/retest

1 similar comment

@rphillips
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 30, 2020
Contributor

@adambkaplan adambkaplan left a comment

/approve

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan, fromanirh, rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 30, 2020
@ffromani
Contributor Author

/test e2e-gcp

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

13 similar comments

@ffromani
Contributor Author

ffromani commented Dec 2, 2020

/retest

@openshift-merge-robot openshift-merge-robot merged commit 92d10e1 into openshift:master Dec 2, 2020
@openshift-ci-robot

@fromanirh: All pull requests linked via external trackers have merged:

Bugzilla bug 1887488 has been moved to the MODIFIED state.

In response to this:

Bug 1887488: e2e: node: fix ping tester pod

@openshift-cherrypick-robot

@fromanirh: new pull request created: #25729

In response to this:

/cherry-pick release-4.6

@ffromani ffromani deleted the fix-bz1887488 branch December 9, 2020 07:51
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.