ovirt e2e: fix nodeport issue on CI bug 1794714 #7614

Gal-Zaidman · 2020-03-12T15:59:49Z

This patch adds the workaround suggested in [1]
to make NodePort work and avoid the conformance
test failures.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1794714

Signed-off-by: Gal Zaidman <gzaidman@redhat.com>

openshift-ci-robot · 2020-03-12T16:00:59Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Gal-Zaidman
To complete the pull request process, please assign wking
You can assign the PR to them by writing /assign @wking in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

ci-operator/templates/openshift/installer/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Gal-Zaidman · 2020-03-15T15:11:04Z

/test pj-rehearse

Gal-Zaidman · 2020-03-15T16:36:52Z

/assign @wking

wking · 2020-03-18T19:23:41Z

ci-operator/templates/openshift/installer/cluster-launch-installer-ovirt-e2e.yaml

        ${TEST_COMMAND}

    # Runs an install
    - name: setup
-      # A midstep till we have the installer work merged, then we
-      # can use the CI artifact


Looks like this is your comment from #4340. Can you explain (ideally in the commit message) why you're removing it here?

Gal-Zaidman · 2020-03-18

I will edit the commit message.

wking · 2020-03-18T19:25:03Z

ci-operator/templates/openshift/installer/cluster-launch-installer-ovirt-e2e.yaml

          done
-          oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed","storage":{"emptyDir":{}}}}'
+          oc --insecure-skip-tls-verify patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed","storage":{"emptyDir":{}}}}'


Why are you adding these? Don't we have a kubeconfig with the CA for the Kube API? If that CA doesn't match what the cluster serves us, I'd rather error out here instead of ignoring the mismatch.

Gal-Zaidman · 2020-03-18

+1
I will remove it; it worked fine before.
I added it because, for some reason, "get nodes" fails for me locally without it.
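
A minimal sketch of the fail-fast behaviour requested above, assuming KUBECONFIG already points at the kubeconfig written by the installer (which carries the CA for the Kube API). The helper name `check_api_tls` is hypothetical and is not part of the template.

```shell
#!/bin/bash
# Hypothetical helper: verify that the API server is reachable with full TLS
# verification, i.e. without --insecure-skip-tls-verify.
# Assumes KUBECONFIG is exported and embeds the cluster CA.
check_api_tls() {
  if ! oc get nodes --output name >/dev/null; then
    echo "error: cannot reach the Kube API with TLS verification enabled" >&2
    return 1
  fi
}
```

Calling `check_api_tls || exit 1` before the patch commands would surface a CA mismatch as an error instead of masking it.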

wking · 2020-03-18T19:27:24Z

ci-operator/templates/openshift/installer/cluster-launch-installer-ovirt-e2e.yaml

@@ -441,7 +438,11 @@ objects:
        wait "$!"
        install_exit_status=$?
        sleep 10m
-        oc get co/image-registry
+        # This is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1794714
+        for n in $(oc --insecure-skip-tls-verify get nodes|awk '{print $1}'|grep worker);do


I don't like operating on the table output of get nodes. Can you use -o jsonpath=... to have it spit out the field you want?

Gal-Zaidman · 2020-03-18

Sure, I didn't know about that option. I will try it.
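
One way the jsonpath suggestion could look, as a sketch rather than the change that was eventually made. It swaps the `grep worker` filter on node names for a label selector, assuming worker nodes carry the `node-role.kubernetes.io/worker` label; `list_worker_nodes` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print worker node names, one per line, by asking the
# API for the name field directly instead of parsing the table output.
# Assumption: workers are labelled node-role.kubernetes.io/worker.
list_worker_nodes() {
  oc get nodes \
    --selector node-role.kubernetes.io/worker \
    --output jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}
```

Because the output contains only names, the loop no longer depends on column order or on the table header being filtered out.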

wking · 2020-03-18T19:38:04Z

ci-operator/templates/openshift/installer/cluster-launch-installer-ovirt-e2e.yaml

-        oc get co/image-registry
+        # This is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1794714
+        for n in $(oc --insecure-skip-tls-verify get nodes|awk '{print $1}'|grep worker);do
+          oc -n default --insecure-skip-tls-verify debug node/$n --image=centos/tools -- ethtool --offload vxlan_sys_4789 tx off


I feel like this is the sort of change that should be handled by a MachineConfig, although I don't know what you'd put in the config to apply this specific ethtool tweak. Dropping into a debug session on each node and bumping things yourself seems very brittle.

Gal-Zaidman · 2020-03-18

When we talked about it, we said that we don't want to apply it by default on a cluster, because it is a tweak that disables checksum offloading and that seems like a decision we don't want to make by default. In addition, it is a high-priority bug that affects most platforms, happens only when using OpenShiftSDN, and has network people working on it.
We wanted to apply the fix in CI because the bug made the network tests flaky during conformance.
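
For reference, the CI-only workaround under discussion amounts to running the ethtool command from the patch on each worker through a debug pod. A rough sketch, assuming node names are passed as arguments and reusing the `centos/tools` image and `vxlan_sys_4789` interface from the diff; `disable_vxlan_tx_offload` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: turn off TX checksum offload on the VXLAN interface of
# every node named on the command line, stopping at the first failure.
# The image and interface name are taken from the patch under review.
disable_vxlan_tx_offload() {
  local node
  for node in "$@"; do
    oc -n default debug "node/${node}" --image=centos/tools -- \
      ethtool --offload vxlan_sys_4789 tx off || return 1
  done
}
```

The setting is applied to the running node only, so it would have to be repeated after a reboot, which is part of why the reviewer calls this approach brittle.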

Gal-Zaidman · 2020-04-15T08:37:38Z

No longer relevant; we decided to implement an MCO fix instead:
openshift/machine-config-operator#1628
openshift/machine-config-operator#1606

openshift-ci-robot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) Mar 12, 2020

openshift-ci-robot requested review from jcpowermac and patrickdillon March 12, 2020 16:01

Gal-Zaidman force-pushed the fix-node-port branch from 7edcdca to fe6c0d2 March 12, 2020 16:22

openshift-ci-robot added the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) and removed the size/S label Mar 12, 2020

Gal-Zaidman force-pushed the fix-node-port branch 2 times, most recently from 1bb6881 to 5d1df3f March 15, 2020 12:29

ovirt e2e: fix nodeport issue on CI bug 1794714

260124d

Gal-Zaidman force-pushed the fix-node-port branch from 5d1df3f to 260124d March 15, 2020 14:16

openshift-ci-robot assigned wking Mar 15, 2020

wking reviewed Mar 18, 2020

Gal-Zaidman closed this Apr 15, 2020
