Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ovirt e2e: fix nodeport issue on CI bug 1794714 #7614

Closed
wants to merge 1 commit into from

Conversation

Gal-Zaidman
Copy link
Contributor

This patch adds the workaround suggested on [1]
to make nodeport work and avoid the conformance
tests failures.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1794714

Signed-off-by: Gal Zaidman gzaidman@redhat.com

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 12, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Gal-Zaidman
To complete the pull request process, please assign wking
You can assign the PR to them by writing /assign @wking in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 12, 2020
@Gal-Zaidman Gal-Zaidman force-pushed the fix-node-port branch 2 times, most recently from 1bb6881 to 5d1df3f Compare March 15, 2020 12:29
This patch adds the workaround suggested on [1]
to make nodeport work and avoid the conformance
tests failures.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1794714

Signed-off-by: Gal Zaidman <gzaidman@redhat.com>
@Gal-Zaidman
Copy link
Contributor Author

/test pj-rehearse

@Gal-Zaidman
Copy link
Contributor Author

/assign @wking

${TEST_COMMAND}

# Runs an install
- name: setup
# A midstep till we have the installer work merged, then we
# can use the CI artifact
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is your comment from #4340. Can you explain (ideally in the commit message), why you're removing it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will edit the commit message

done
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed","storage":{"emptyDir":{}}}}'
oc --insecure-skip-tls-verify patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed","storage":{"emptyDir":{}}}}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you adding these? Don't we have a kubeconfig with the CA for the Kube API? If that CA doesn't match what the cluster serves us, I'd rather error out here instead of ignoring the mismatch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
I will remove it, it worked fine before.
I have added it because for some reason the "get nodes" fail for me locally without it.

@@ -441,7 +438,11 @@ objects:
wait "$!"
install_exit_status=$?
sleep 10m
oc get co/image-registry
# This is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1794714
for n in $(oc --insecure-skip-tls-verify get nodes|awk '{print $1}'|grep worker);do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like operating on the table output of get nodes. Can you use -o jsonpath=... to have it spit out the field you want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure didn't know the option, I will try it

oc get co/image-registry
# This is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1794714
for n in $(oc --insecure-skip-tls-verify get nodes|awk '{print $1}'|grep worker);do
oc -n default --insecure-skip-tls-verify debug node/$n --image=centos/tools -- ethtool --offload vxlan_sys_4789 tx off
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is the sort of change that should be handled by a MachineConfig, although I don't know what you'd put in the config to apply this specific ethtool tweak. Dropping into a debug session on each node and bumping things yourself seems very brittle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we talked about it we said that we don't want to apply it by default on a cluster because it is a tweak that disables checksum and it seems to us like a decision that we don't want by default, plus it is a high priority bug which affects most platforms, happens only when using OpenshiftSDN , and has network people working on it.
We wanted to apply the fix in the CI because it made network test flaky during conformance.

@Gal-Zaidman
Copy link
Contributor Author

Not relevant any more, decided to implement a MCO fix
openshift/machine-config-operator#1628
openshift/machine-config-operator#1606

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
3 participants