Bug 1817594: [release-4.4] Nodeip retry on failure #1616

openshift-cherrypick-robot

This is an automated cherry-pick of #1601

/assign celebdor

celebdor and others added 2 commits April 4, 2020 09:55
It is possible for NetworkManager-wait-online to let our node IP configuration
service start before the control plane IP address and/or route is
configured. In such cases, the systemd service should retry on failure.
Unfortunately, the systemd version in the current RHCOS release is not new
enough for that, so we implement the retry mechanism in the script itself.

Signed-off-by: Antoni Segura Puimedon <antoni@redhat.com>
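A minimal sketch of an in-script retry loop of the kind this commit describes, assuming the script has a lookup function that returns the node IP, or `None` when no suitable address exists yet. `retry_until_found`, its `attempts`/`delay` defaults, and the injectable `sleep` parameter are hypothetical; the real script's retry count and interval are not stated in this PR. The printed message mirrors the `Failed to find suitable node ip. Retrying...` line from the nodeip-finder journal.

```python
import sys
import time
from typing import Callable, Optional


def retry_until_found(
    find: Callable[[], Optional[str]],
    attempts: int = 10,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `find` until it returns an IP, or give up after `attempts` tries.

    Hypothetical helper: the name and the default attempts/delay values are
    illustrative, not taken from the real nodeip-finder script.
    """
    for attempt in range(1, attempts + 1):
        ip = find()
        if ip is not None:
            return ip
        if attempt < attempts:
            # Same message the nodeip-finder log prints before retrying.
            print("Failed to find suitable node ip. Retrying...")
            sleep(delay)
    # Every attempt failed; the caller decides how to report it.
    return None


if __name__ == "__main__":
    # Hypothetical stand-in for the real address lookup: it fails twice,
    # then succeeds with a documentation-range address.
    answers = iter([None, None, "192.0.2.10"])
    ip = retry_until_found(lambda: next(answers), delay=0.1)
    if ip is None:
        sys.exit(1)  # a non-zero exit lets systemd mark the unit as failed
    print(ip)
```

Injecting `sleep` keeps the loop testable without real delays. Exiting non-zero once all attempts are exhausted is one way to preserve systemd's view of the unit as failed; whether the actual script does this is an assumption.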

In my deployments I am seeing routes that include fields the
non_virtual_ip script can't handle. This causes the script to fail, and
anything relying on it to function incorrectly.

This change adds a **kwargs parameter to the class constructor so that
it accepts arbitrary parameters, which are then ignored. It also
filters out \ characters from the routes, because I'm seeing those
as well and we don't want to try to parse them. The \ appears in
multi-line routes, which the existing class structure can't handle
correctly because they have multiple 'via' values and the class can
only handle one per route. However, in my case this only happens on
the default route, which we ignore anyway, and this script is being
replaced by a Go implementation in the near future, so I don't think
it's worth rewriting it to handle multi-line routes.
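A minimal sketch of the two changes described in this commit, assuming route lines shaped like `ip route` output (`<destination> key value key value ...`). The `Route` class, its `destination`/`dev`/`via` fields, and `parse_route_line` are hypothetical stand-ins; the actual non_virtual_ip class and parser are not shown in this PR.

```python
from typing import Dict, List, Optional


class Route:
    """Hypothetical route record; only the fields the script needs are kept."""

    def __init__(
        self,
        destination: str,
        dev: Optional[str] = None,
        via: Optional[str] = None,
        **kwargs: str,
    ) -> None:
        self.destination = destination
        self.dev = dev
        self.via = via
        # Any other field (proto, metric, scope, ...) lands in kwargs and is
        # deliberately ignored instead of raising TypeError.


def parse_route_line(line: str) -> Route:
    """Parse '<destination> key value key value ...' into a Route."""
    # Replace every '\' with a space so it is dropped rather than parsed.
    tokens: List[str] = line.replace("\\", " ").split()
    if not tokens:
        raise ValueError("empty route line")
    destination, rest = tokens[0], tokens[1:]
    fields: Dict[str, str] = {}
    # Pair tokens as key/value; keep the first value seen for each key.
    for key, value in zip(rest[0::2], rest[1::2]):
        fields.setdefault(key, value)
    # Never let a parsed field collide with the positional destination.
    fields.pop("destination", None)
    return Route(destination, **fields)
```

With `**kwargs` in place, a field such as `proto` or `metric` no longer raises `TypeError`. A multi-nexthop default route still parses without crashing but yields meaningless `dev`/`via` values; as the commit notes, that is tolerable only because the default route is ignored.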
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: This pull request references Bugzilla bug 1819484, which is invalid:

  • expected the bug to target the "4.4.0" release, but it targets "4.5.0" instead
  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is ON_QA instead
  • expected Bugzilla bug 1819484 to depend on a bug targeting the "4.5.0" release and in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

[release-4.4] Bug 1819484: Nodeip retry on failure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Apr 4, 2020
@celebdor
Contributor

celebdor commented Apr 4, 2020

/retitle Bug 1817594: Nodeip retry on failure

@openshift-ci-robot openshift-ci-robot changed the title [release-4.4] Bug 1819484: Nodeip retry on failure Bug 1817594: Nodeip retry on failure Apr 4, 2020
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: This pull request references Bugzilla bug 1817594, which is invalid:

  • expected dependent Bugzilla bug 1819484 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1817594: Nodeip retry on failure

@runcom
Member

runcom commented Apr 5, 2020

/lgtm
/approve
/retest
/refresh

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 5, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@runcom runcom removed the lgtm Indicates that a PR is ready to be merged. label Apr 5, 2020
@runcom
Member

runcom commented Apr 6, 2020

/skip

@runcom runcom changed the title Bug 1817594: Nodeip retry on failure Bug 1817594: [release-4.4] Nodeip retry on failure Apr 6, 2020
@sinnykumari
Contributor

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Apr 7, 2020
@openshift-ci-robot
Contributor

@sinnykumari: This pull request references Bugzilla bug 1817594, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.4.0) matches configured target release for branch (4.4.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1819484 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA))
  • dependent Bugzilla bug 1819484 targets the "4.5.0" release, matching the expected (4.5.0) release
  • bug has dependents

In response to this:

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Apr 7, 2020
@sinnykumari
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 7, 2020
@celebdor
Contributor

celebdor commented Apr 7, 2020

/bugzilla refresh

@openshift-ci-robot
Contributor

@celebdor: This pull request references Bugzilla bug 1817594, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.4.0) matches configured target release for branch (4.4.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1819484 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA))
  • dependent Bugzilla bug 1819484 targets the "4.5.0" release, matching the expected (4.5.0) release
  • bug has dependents

In response to this:

/bugzilla refresh

@yboaron
Contributor

yboaron commented Apr 7, 2020

I managed to verify the fix on dev-scripts with OpenShift 4.4:

[kni@worker-0 dev-scripts]$ oc version
Client Version: 4.4.0-0.ci-2020-04-06-174309
Server Version: 4.4.0-0.ci-2020-04-06-174309
Kubernetes Version: v1.17.1
[kni@worker-0 dev-scripts]$

Checking the logs of the nodeip-configuration service on the master-2 node, we can see that the retry was triggered:

[core@master-2 ~]$ sudo journalctl -u nodeip-configuration.service
-- Logs begin at Tue 2020-04-07 19:43:06 UTC, end at Tue 2020-04-07 20:25:00 UTC. --
Apr 07 19:43:48 master-2.ostest.test.metalkube.org systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
Apr 07 19:43:49 master-2 nodeip-finder[1718]: Filtering out Address(127.0.0.1/8, dev=lo) due to it having host scope
Apr 07 19:43:49 master-2 nodeip-finder[1718]: Filtering out Address(::1/128, dev=lo) due to it having host scope
Apr 07 19:43:49 master-2 nodeip-finder[1718]: **Failed to find suitable node ip. Retrying...**
[core@master-2 ~]$
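The journal suggests why the retry fired: the only addresses nodeip-finder saw at that moment were the two loopback addresses, both were filtered out for having host scope, and no candidate was left. A minimal sketch of that filtering step, assuming a plain scope check; `Address`, `candidate_addresses`, and the `scope` strings are hypothetical and not nodeip-finder's actual types.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Address:
    """Hypothetical address record mirroring the fields shown in the log."""

    cidr: str
    dev: str
    scope: str  # e.g. "host", "link" or "global", as `ip addr` reports it


def candidate_addresses(addrs: Iterable[Address]) -> List[Address]:
    """Drop host-scope addresses (such as loopback) from the candidates."""
    kept: List[Address] = []
    for addr in addrs:
        if addr.scope == "host":
            print(f"Filtering out Address({addr.cidr}, dev={addr.dev}) "
                  "due to it having host scope")
            continue
        kept.append(addr)
    return kept


if __name__ == "__main__":
    # Only loopback addresses are present, as in the master-2 journal.
    early = [Address("127.0.0.1/8", "lo", "host"), Address("::1/128", "lo", "host")]
    if not candidate_addresses(early):
        print("Failed to find suitable node ip. Retrying...")
```

An empty candidate list is the condition that, in a retrying script, leads to another attempt rather than an immediate failure.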

@yboaron
Contributor

yboaron commented Apr 7, 2020

/lgtm

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: openshift-cherrypick-robot, runcom, sinnykumari, yboaron

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kikisdeliveryservice
Contributor

/bugzilla refresh

@openshift-ci-robot
Contributor

@kikisdeliveryservice: This pull request references Bugzilla bug 1817594, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.4.0) matches configured target release for branch (4.4.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1819484 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA))
  • dependent Bugzilla bug 1819484 targets the "4.5.0" release, matching the expected (4.5.0) release
  • bug has dependents

In response to this:

/bugzilla refresh

@kikisdeliveryservice
Contributor

@ashcrow this is ready for cherrypick

@ashcrow ashcrow added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Apr 7, 2020
@openshift-bot
Contributor

/retest

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 8, 2020

@openshift-cherrypick-robot: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
|---|---|---|---|
| ci/prow/e2e-aws-scaleup-rhel7 | 145b337 | link | /test e2e-aws-scaleup-rhel7 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@openshift-bot
Contributor

/retest

@openshift-merge-robot openshift-merge-robot merged commit 0c37ee4 into openshift:release-4.4 Apr 8, 2020
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: All pull requests linked via external trackers have merged: openshift/machine-config-operator#1616. Bugzilla bug 1817594 has been moved to the MODIFIED state.

In response to this:

Bug 1817594: [release-4.4] Nodeip retry on failure

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • lgtm: Indicates that a PR is ready to be merged.