Do not use NumRequeues in job controller to count backoff #64787

Closed
soltysh opened this issue Jun 5, 2018 · 17 comments

@soltysh (Contributor) commented Jun 5, 2018:

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Currently our job controller uses NumRequeues on its work queue to determine how many times an object has been requeued, and treats that number as the count of retries processed against that object. This is fine in the simple case, but not in the job controller, where requeues also happen because of a failed status update or a few other cases that should not bump the backoff counter.

We need to figure out a better way to handle the backoff.
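
As a minimal, hypothetical Go sketch of the pattern described above (not the controller's actual code), using the client-go workqueue API: the key, rate limiter settings, and backoff formula are illustrative only.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Illustrative rate limiter; the real controller's settings differ.
	queue := workqueue.NewRateLimitingQueue(
		workqueue.NewItemExponentialFailureRateLimiter(10*time.Second, 6*time.Minute))

	key := "default/my-job" // hypothetical Job key

	// A requeue after a real pod failure legitimately bumps the count...
	queue.AddRateLimited(key)

	// ...but a requeue caused by a transient status-update failure bumps it too,
	// even though none of the Job's pods actually failed.
	queue.AddRateLimited(key)

	// Deriving the backoff from NumRequeues therefore over-counts failures.
	failures := queue.NumRequeues(key)
	backoff := time.Duration(failures) * 10 * time.Second // illustrative formula only
	fmt.Printf("requeues=%d, derived backoff=%v\n", failures, backoff)
}
```

Since the queue cannot tell a failure-driven requeue from a bookkeeping requeue, any fix presumably needs to track actual pod failures separately from NumRequeues.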

@soltysh soltysh added area/batch, kind/technical-debt, sig/apps, area/workload-api/job labels Jun 5, 2018
@soltysh soltysh self-assigned this Jun 5, 2018
@k8s-ci-robot k8s-ci-robot added the kind/bug label Jun 5, 2018
@kow3ns kow3ns added this to Backlog in Workloads Jun 5, 2018
@crimsonfaith91 (Contributor):

/cc

@k8s-ci-robot k8s-ci-robot added kind/cleanup and removed kind/technical-debt labels Aug 23, 2018
@fejta-bot:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Nov 22, 2018
@fejta-bot:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Dec 22, 2018
@fejta-bot:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor):

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Workloads automation moved this from Backlog to Done Jan 21, 2019
@soltysh (Contributor, Author) commented Apr 25, 2019:

This is still a thing.
/reopen
/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot reopened this Apr 25, 2019
@k8s-ci-robot (Contributor):

@soltysh: Reopened this issue.

In response to this:

This is still a thing.
/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Workloads automation moved this from Done to Backlog Apr 25, 2019
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten label Apr 25, 2019
@fejta-bot:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jul 24, 2019
@fejta-bot:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Aug 23, 2019
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label Aug 23, 2019
@goodluckbot (Contributor):

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten label Aug 27, 2019
@fejta-bot:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Nov 25, 2019
@fejta-bot:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Dec 25, 2019
@fejta-bot:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor):

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Workloads automation moved this from Backlog to Done Jan 24, 2020
@janetkuo janetkuo reopened this Oct 10, 2020
Workloads automation moved this from Done to Backlog Oct 10, 2020
@k8s-ci-robot k8s-ci-robot added the needs-triage label Oct 10, 2020
@janetkuo (Member):

/remove-lifecycle rotten
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen and removed lifecycle/rotten labels Oct 10, 2020
@janetkuo (Member):

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted and removed needs-triage labels Oct 10, 2020
@soltysh (Contributor, Author) commented Nov 26, 2020:

With the new controller being written (see #93370), this is not relevant anymore.

@soltysh soltysh closed this as completed Nov 26, 2020
Workloads automation moved this from Backlog to Done Nov 26, 2020
Labels
area/batch, area/workload-api/job, kind/bug, kind/cleanup, lifecycle/frozen, sig/apps, triage/accepted
Projects
Workloads (Done)
Development

No branches or pull requests

6 participants