Bug 1940207: create the ovs-config-executed file to signal ovs is running on the host #2506

Merged
merged 1 commit into from Apr 12, 2021

fedepaol
Member

- What I did
The 4.7+ ovs pods do not need the file, but when rolling back from 4.7
to 4.6 the CNO is updated before the MCO. The 4.6 version of the pod does
not find the file and starts ovs inside the pod, so two ovs instances end
up running at the same time.
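
The handshake described above can be sketched roughly as follows. This is a minimal illustration, not the actual MCO or CNO script: the function names and the marker path passed as an argument are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of the marker-file handshake; names and paths are
# illustrative assumptions, not the real MCO/CNO scripts.

# Host side (what the PR title describes): record that OVS runs on the host.
signal_ovs_on_host() {
    marker="$1"
    mkdir -p "$(dirname "$marker")"
    touch "$marker"
}

# Pod side (the 4.6 behaviour described above): start OVS in the pod only
# when the host did not leave the marker.
ovs_mode() {
    marker="$1"
    if [ -f "$marker" ]; then
        echo "host"   # OVS already runs on the host; do not start a second one
    else
        echo "pod"    # no marker: the pod would start its own OVS instance
    fi
}
```

With these definitions, `ovs_mode "$marker"` prints `pod` until `signal_ovs_on_host "$marker"` has been called with the same path, which is the situation the rollback runs into when the file is missing.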

- How to verify it

Need to test the downgrade with this PR in

- Description for the changelog

The 4.7+ ovs pods do not need the file, but when rolling back from 4.7
to 4.6 the CNO is updated before the MCO. The 4.6 version of the pod does
not find the file and starts ovs inside the pod, so two ovs instances end
up running at the same time.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
@fedepaol
Member Author

/hold
Need to verify it really fixes the issue

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2021
@fedepaol
Member Author

/cc @trozet

@fedepaol
Member Author

fedepaol commented Apr 2, 2021

/retest

@fedepaol
Member Author

fedepaol commented Apr 6, 2021

I managed to test this. I spawned a 4.7 cluster, overrode the MCO following the instructions in https://github.com/openshift/machine-config-operator/blob/master/docs/HACKING.md#running-in-a-cluster, and forced the downgrade to 4.6:

oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.6.0-0.ci-2021-03-31-003732 --allow-explicit-upgrade --force

ovs says it's running in systemd:

oc -n openshift-sdn logs ovs-cdd9b
openvswitch is running in systemd
==> /host/var/log/openvswitch/ovs-vswitchd.log <==
2021-04-06T14:01:45.335Z|00403|connmgr|INFO|br0<->unix#1003: 2 flow_mods in the last 0 s (2 deletes)
2021-04-06T14:01:45.377Z|00404|connmgr|INFO|br0<->unix#1006: 4 flow_mods in the last 0 s (4 deletes)
2021-04-06T14:01:45.416Z|00405|bridge|INFO|bridge br0: deleted interface vethb03eb98a on port 10
2021-04-06T14:01:55.710Z|00406|bridge|INFO|bridge br0: added interface veth1cc74392 on port 57
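
A check like the one above could be scripted along these lines. This is only a sketch: `ovs_runs_on_host` is a hypothetical helper name, and it simply looks for the message shown in the log output above.

```shell
# Returns 0 when the log text on stdin contains the message the 4.6 ovs pod
# printed in the test above. The function name is a hypothetical helper.
ovs_runs_on_host() {
    grep -qF 'openvswitch is running in systemd'
}
```

It can be fed from the same command used above, for example `oc -n openshift-sdn logs ovs-cdd9b | ovs_runs_on_host && echo "ovs is on the host"`.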

The downgrade is progressing:

oc get clusteroperators
NAME                                       VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.ci-2021-04-02-023150   True        False         False      23m
baremetal                                  4.7.0-0.ci-2021-04-02-034219   True        False         False      171m
cloud-credential                           4.6.0-0.ci-2021-04-02-023150   True        False         False      177m
cluster-autoscaler                         4.6.0-0.ci-2021-04-02-023150   True        False         False      170m
config-operator                            4.6.0-0.ci-2021-04-02-023150   True        False         False      171m
console                                    4.6.0-0.ci-2021-04-02-023150   True        False         False      22m
csi-snapshot-controller                    4.6.0-0.ci-2021-04-02-023150   True        False         False      69m
dns                                        4.6.0-0.ci-2021-04-02-023150   True        False         False      161m
etcd                                       4.6.0-0.ci-2021-04-02-023150   True        False         False      170m
image-registry                             4.6.0-0.ci-2021-04-02-023150   True        False         False      163m
ingress                                    4.6.0-0.ci-2021-04-02-023150   True        False         False      162m
insights                                   4.6.0-0.ci-2021-04-02-023150   True        False         False      164m
kube-apiserver                             4.6.0-0.ci-2021-04-02-023150   True        False         False      169m
kube-controller-manager                    4.6.0-0.ci-2021-04-02-023150   True        False         False      169m
kube-scheduler                             4.6.0-0.ci-2021-04-02-023150   True        False         False      169m
kube-storage-version-migrator              4.6.0-0.ci-2021-04-02-023150   True        False         False      73m
machine-api                                4.6.0-0.ci-2021-04-02-023150   True        False         False      164m
machine-approver                           4.6.0-0.ci-2021-04-02-023150   True        False         False      170m
machine-config                             4.7.0-0.ci-2021-04-02-034219   False       True          False      6m51s
marketplace                                4.6.0-0.ci-2021-04-02-023150   True        False         False      22m
monitoring                                 4.6.0-0.ci-2021-04-02-023150   True        False         False      21m
network                                    4.6.0-0.ci-2021-04-02-023150   True        False         False      171m
node-tuning                                4.6.0-0.ci-2021-04-02-023150   True        False         False      22m
openshift-apiserver                        4.6.0-0.ci-2021-04-02-023150   True        False         False      23m
openshift-controller-manager               4.6.0-0.ci-2021-04-02-023150   True        False         False      98m
openshift-samples                          4.6.0-0.ci-2021-04-02-023150   True        False         False      22m
operator-lifecycle-manager                 4.6.0-0.ci-2021-04-02-023150   True        False         False      170m
operator-lifecycle-manager-catalog         4.6.0-0.ci-2021-04-02-023150   True        False         False      170m
operator-lifecycle-manager-packageserver   4.6.0-0.ci-2021-04-02-023150   True        False         False      22m
service-ca                                 4.6.0-0.ci-2021-04-02-023150   True        False         False      171m
storage                                    4.6.0-0.ci-2021-04-02-023150   True        False         False      69m
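
To spot the operators that have not moved to the target version yet, the table can be filtered with a small helper. This is a sketch under assumptions: `operators_not_at` is a hypothetical name, and it expects the plain `oc get clusteroperators` table (header on the first line) on stdin.

```shell
# Reads `oc get clusteroperators` output on stdin and prints the name and
# version of every operator whose VERSION column does not start with the
# given prefix. The function name is a hypothetical helper.
operators_not_at() {
    awk -v prefix="$1" 'NR > 1 && index($2, prefix) != 1 { print $1, $2 }'
}
```

Fed the table above, `oc get clusteroperators | operators_not_at 4.6.` would list only baremetal and machine-config, the two operators still reporting a 4.7 version.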

The downgrade is stuck on the MCO with the following error:

  extension:
    lastSyncError: 'pool master has not progressed to latest configuration: controller version mismatch for rendered-master-cb2db7df54e993c796b76a2242b3e08a expected d5dc2b519aed5b3ed6a6ab9e7f70f33740f9f8af has b5723620cfe40e2e4e8cbdcb105d6ae534be1753: pool is degraded because rendering fails with "": "Failed to render configuration for pool master: parsing Ignition config failed: unknown version. Supported spec versions: 2.2, 3.0, 3.1", retrying'
    master: 'pool is degraded because rendering fails with "": "Failed to render configuration for pool master: parsing Ignition config failed: unknown version. Supported spec versions: 2.2, 3.0, 3.1"'
    worker: 'pool is degraded because rendering fails with "": "Failed to render configuration for pool worker: parsing Ignition config failed: unknown version. Supported spec versions: 2.2, 3.0, 3.1"'

But I think this fixed the network part.

@fedepaol fedepaol changed the title from "Create the ovs-config-executed file to signal ovs is running on the host" to "Bug 1940207: create the ovs-config-executed file to signal ovs is running on the host" Apr 6, 2021
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. label Apr 6, 2021
@openshift-ci-robot
Contributor

@fedepaol: This pull request references Bugzilla bug 1940207, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1940207: create the ovs-config-executed file to signal ovs is running on the host

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Apr 6, 2021
@fedepaol
Member Author

fedepaol commented Apr 6, 2021

/retest

@fedepaol
Member Author

fedepaol commented Apr 6, 2021

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Apr 6, 2021
@openshift-ci-robot
Contributor

@fedepaol: This pull request references Bugzilla bug 1940207, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @zhaozhanqi

In response to this:

/bugzilla refresh

@fedepaol
Member Author

fedepaol commented Apr 6, 2021

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 6, 2021
@kikisdeliveryservice
Contributor

/assign @trozet

@fedepaol
Member Author

fedepaol commented Apr 7, 2021

/retest

@fedepaol
Member Author

fedepaol commented Apr 7, 2021

test e2e-aws-serial

@fedepaol
Member Author

fedepaol commented Apr 7, 2021

/test e2e-aws-serial

@trozet
Contributor

trozet commented Apr 7, 2021

@kikisdeliveryservice @zhaozhanqi any feedback on the MCO error @fedepaol saw here?
#2506 (comment)

Contributor

@trozet trozet left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 7, 2021
@yuqi-zhang
Contributor

yuqi-zhang commented Apr 7, 2021

The error is expected. The MCO doesn't support downgrades, so the 4.6 MCO cannot parse the Ignition 3.2 configs used by 4.7. Unfortunately, this means all downgrades from 4.7 to 4.6 will fail.
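
The failure mode can be pictured with a small sketch. This is illustrative only and is not the MCO's parsing code: the function name is hypothetical, and the accepted list is copied from the error message quoted earlier in this conversation.

```shell
# Illustrative only: mimics a parser that accepts a fixed list of Ignition
# spec versions. The list is copied from the 4.6 MCO error message; the
# function name is hypothetical and this is not the MCO's parsing code.
spec_supported() {
    case "$1" in
        2.2|3.0|3.1)
            return 0 ;;
        *)
            echo "unknown version. Supported spec versions: 2.2, 3.0, 3.1" >&2
            return 1 ;;
    esac
}
```

Under this sketch `spec_supported 3.1` succeeds while `spec_supported 3.2` fails, which mirrors why configs rendered for 4.7 are rejected after the downgrade.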

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

24 similar comments

@openshift-merge-robot openshift-merge-robot merged commit c7538a2 into openshift:master Apr 12, 2021
@openshift-ci-robot
Contributor

@fedepaol: All pull requests linked via external trackers have merged:

Bugzilla bug 1940207 has been moved to the MODIFIED state.

In response to this:

Bug 1940207: create the ovs-config-executed file to signal ovs is running on the host

@fedepaol
Member Author

/cherry-pick release-4.7

@openshift-cherrypick-robot

@fedepaol: new pull request created: #2532

In response to this:

/cherry-pick release-4.7

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
bugzilla/severity-unspecified: Referenced Bugzilla bug's severity is unspecified for the PR.
bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
lgtm: Indicates that a PR is ready to be merged.