Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1898178: [release-4.6] Handle egress IP assignment for node IPs #349

Merged

Conversation

alexanderConstantinescu
Copy link
Contributor

- What this PR does and why is it needed

This back-ports the essential bits for the egress IP assignment procedure to guard against node IP assignment (as outlined in ovn-org/ovn-kubernetes#1668). This PR requires a couple of additional commits that came after ovn-org/ovn-kubernetes#1668, as that PR had some flaws that needed to be fixed afterwards.

/assign @danwinship

- Special notes for reviewers

- How to verify it

- Description for the changelog

When an egress IP is specified equaling a node's IP address, we
can end up in the situation where the assignment procedure assigns the
egress IP to another cluster node, and thus missing the node actually hosting
that IP address. This patch thus fixes this by invoking an error during the
assignment procedure and triggering an event notifying him/her of this
unsupported request.

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
Previously we only logged an error when the specified
egress IP was invalid. Having worked on the previous commit
I realized that it is more coherent to return an error to the user
notifying him/her of this incorrect definition. The risk is that
if a user specified multiple egress IPs and one is incorrect and we trigger
an event, that this event goes unoticied but causes confusion later on.

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
The test "should skip populating egress node data for nodes that have incorrect IP address"
explicitly defines bad IP addresses for nodes as to make sure we don't use such nodes for
egress assignment, however: addEgressNode did not take into account that the node in question
might not exist in the map due to the previous condition. The test flakes as the checks it
performs do not change state and thus the Eventually calls might pass faster than the code
path is executed, i.e: it all depends on the CPU

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
PR: ovn-org/ovn-kubernetes#1668 re-worked
the way egress IP assignment is done, taking into account that users
do not accidentally assign a node IP as egress IP and screws up cluster
networking. However, it introduced a bug, namely: we now have a problem
with reconciliation of nodes for egress, ultimately skipping any UPDATE
event for nodes. Thus, if ovnkube-node does not label a node in question with
k8s.ovn.org/node-primary-ifaddr before ovnkube-master received the
ADD for that node, then we never proceed to update our internal
data with that node when we receive the UPDATE. This commit fixes that

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Nov 16, 2020
@openshift-ci-robot
Copy link
Contributor

@alexanderConstantinescu: This pull request references Bugzilla bug 1898178, which is invalid:

  • expected dependent Bugzilla bug 1898174 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1898178: [release-4.6] Handle egress IP assignment for node IPs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@danwinship
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 17, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexanderConstantinescu, danwinship

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 17, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci-robot
Copy link
Contributor

@openshift-bot: This pull request references Bugzilla bug 1898178, which is invalid:

  • expected dependent Bugzilla bug 1898174 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

4 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@alexanderConstantinescu
Copy link
Contributor Author

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Nov 18, 2020
@openshift-ci-robot
Copy link
Contributor

@alexanderConstantinescu: This pull request references Bugzilla bug 1898178, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.z) matches configured target release for branch (4.6.z)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1898174 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA))
  • dependent Bugzilla bug 1898174 targets the "4.7.0" release, which is one of the valid target releases: 4.7.0
  • bug has dependents

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Nov 18, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@sdodson
Copy link
Member

sdodson commented Dec 7, 2020

Going to go ahead and clear cherry-pick-approved on the new failures. Lets make sure we understand why these are failing and pick back up on Wednesday.

@sdodson sdodson removed the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Dec 7, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

7 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@sdodson
Copy link
Member

sdodson commented Dec 7, 2020

/hold
Just to prevent endless tests, once someone has reviewed the failing jobs feel free to clear this.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 7, 2020
@alexanderConstantinescu
Copy link
Contributor Author

/retest

@alexanderConstantinescu
Copy link
Contributor Author

@sdodson could you remove you /hold? All CI jobs on 4.6 have started passing, so I see no point in holding this anymore.

@sdodson
Copy link
Member

sdodson commented Dec 18, 2020

/retest
/hold cancel

Sure, just in general I'm fine with you clearing the hold as long as you've reviewed the persistent failures and have reason to believe they've been resolved, so don't block on me. Holds can be cleared by anyone.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 18, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@markmc
Copy link
Contributor

markmc commented Dec 21, 2020

(patch manager) looks like quite a large and potentially risky change, but it's a clean backport from 4.7, and it prevents what sounds like a nasty impact from a misconfiguration

@markmc markmc added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Dec 21, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit a0fead4 into openshift:release-4.6 Dec 22, 2020
@openshift-ci-robot
Copy link
Contributor

@alexanderConstantinescu: All pull requests linked via external trackers have merged:

Bugzilla bug 1898178 has been moved to the MODIFIED state.

In response to this:

Bug 1898178: [release-4.6] Handle egress IP assignment for node IPs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet