Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 2005464: [4.8z] Fixes skipping pods accidentally in retry #758

Merged
merged 2 commits into from Sep 23, 2021

Conversation

trozet
Copy link
Contributor

@trozet trozet commented Sep 17, 2021

The code was using the initially stored pod object to determine if the
pod was scheduled or not, rather than the current state of the pod.

girishmg and others added 2 commits September 17, 2021 13:55
we have seen instances where the pod delete event gets queued up and the
pod's DeleteFunc() callback is delayed. in the meantime, we are trying
to do pod setup of the failed pod again and again. since the pod is gone
we fail in setting pod annotations in addLogicalPort(). we add the pod
back to the retryLoop and retry again after 1 minute.

we continue this until the pod's DeleteFunc() gets called and it removes
the deleted pod from the retryPod map.

(cherry picked from commit f5ec566)
The code was using the initially stored pod object to determine if the
pod was scheduled or not, rather than the current state of the pod.

Signed-off-by: Tim Rozet <trozet@redhat.com>
(cherry picked from commit 151eb77)
@openshift-ci openshift-ci bot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Sep 17, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 17, 2021

@trozet: This pull request references Bugzilla bug 2005464, which is invalid:

  • expected dependent Bugzilla bug 2005462 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 2005464: [4.8z] Fixes skipping pods accidentally in retry

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 17, 2021
@trozet
Copy link
Contributor Author

trozet commented Sep 17, 2021

/bugzilla refresh

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 17, 2021

@trozet: This pull request references Bugzilla bug 2005464, which is invalid:

  • expected dependent Bugzilla bug 2005462 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dcbw
Copy link
Member

dcbw commented Sep 18, 2021

/lgtm
/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 18, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dcbw, trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 18, 2021
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

10 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 19, 2021

@openshift-bot: This pull request references Bugzilla bug 2005464, which is invalid:

  • expected dependent Bugzilla bug 2005462 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

12 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@dhellmann
Copy link

/hold

@trozet Is the e2e-metal-ipi-ovn-dualstack job important for this patch or do we want to override that job and merge it?

I'm adding a hold for now to stop the bot from retrying indefinitely.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2021
@dhellmann
Copy link

[patch-manager] ⌛ This pull request was not picked by the patch manager for the current z-stream window and have to wait for the next window.

skipped for today

  • Score: 1.70
  • Reason: priority request from OVN team, but CI is not passing

NOTE: This message was automatically generated, if you have questions please ask on #forum-release

@trozet
Copy link
Contributor Author

trozet commented Sep 23, 2021

@dhellmann we dont need to override dualstack, its not a required job right? This PR is not specific to dualstack. The failures in the dualstack job:

[sig-cli] oc explain should contain proper spec+status for CRDs [Suite:openshift/conformance/parallel]
[Conformance][sig-api-machinery][Feature:APIServer] local kubeconfig "localhost.kubeconfig" should be present on all masters and work [Suite:openshift/conformance/parallel/minimal]

looks like they have nothing to do with ovn-kube or this PR. I'm not sure why the job was removed as required. @stbenjam are you aware of this?

@trozet
Copy link
Contributor Author

trozet commented Sep 23, 2021

@jluhrsen ^

@dhellmann
Copy link

I'm not sure why the bot kept retrying, or whether the PR would merge if I approved it without skipping the failing job. I'm happy to apply whatever labels we need, if someone confirms it's safe to take manual action.

@trozet
Copy link
Contributor Author

trozet commented Sep 23, 2021

/retest

@trozet
Copy link
Contributor Author

trozet commented Sep 23, 2021

@dhellmann its just retrying cause it doesnt have cherry-pick approved. Its a bug with the bot. Even if all jobs were passed, it would keep just printing /retest-required forever.

@dhellmann dhellmann added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Sep 23, 2021
@dhellmann
Copy link

OK, let's see if this goes through, then.

@openshift-merge-robot openshift-merge-robot merged commit f1ee037 into openshift:release-4.8 Sep 23, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 23, 2021

@trozet: All pull requests linked via external trackers have merged:

Bugzilla bug 2005464 has been moved to the MODIFIED state.

In response to this:

Bug 2005464: [4.8z] Fixes skipping pods accidentally in retry

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@anuragthehatter
Copy link

/bugzilla cc-qa

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 23, 2021

@anuragthehatter: Bugzilla bug 2005464 is in an unrecognized state (MODIFIED) and will not be moved to the MODIFIED state.

In response to this:

/bugzilla cc-qa

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@anuragthehatter
Copy link

/lgtm
/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Sep 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants