must-gather: use exponential backoff for polling #523

Closed

yuvalturg
Contributor

When must-gather runs on a bare metal cluster, the command fails to
find the pod after one minute and exits with an error message of the
form `Get: $pod-url: unexpected EOF`.

This patch replaces the fixed polling interval (once every minute) with
an exponential backoff that starts at 1 second and uses a factor of 2.
The timeout value is unchanged.
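
To make the schedule concrete, here is a small stdlib-only sketch of the waits that a 1-second start and a factor of 2 produce under a wall-clock budget. The helper name `backoffSchedule` and the 600-second budget are illustrative assumptions, not taken from this patch, and the sketch does not reproduce how the patch maps `o.Timeout` onto its backoff settings.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule is a hypothetical helper (not part of the patch). It lists
// the waits between polls for a backoff that starts at initial and multiplies
// by factor after every wait, stopping before the cumulative wait would
// exceed timeout.
func backoffSchedule(initial time.Duration, factor float64, timeout time.Duration) []time.Duration {
	var waits []time.Duration
	if initial <= 0 || factor < 1 {
		// Guard against schedules that would never reach the timeout.
		return waits
	}
	elapsed := time.Duration(0)
	for wait := initial; elapsed+wait <= timeout; wait = time.Duration(float64(wait) * factor) {
		waits = append(waits, wait)
		elapsed += wait
	}
	return waits
}

func main() {
	// 600 seconds is an assumed budget chosen only for illustration.
	waits := backoffSchedule(time.Second, 2, 600*time.Second)
	fmt.Println(len(waits), waits)
}
```

Under these assumptions the schedule has nine waits (1s, 2s, 4s, up to 256s) totalling 511 seconds, so the first re-check comes after one second rather than one minute, while later checks become progressively rarer.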

Bug 1859972

Signed-off-by: Yuval Turgeman <yturgema@redhat.com>

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yuvalturg
To complete the pull request process, please assign soltysh
You can assign the PR to them by writing /assign @soltysh in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuvalturg
Contributor Author

/retest


 func (o *MustGatherOptions) waitForPodRunning(pod *corev1.Pod) error {
 	phase := pod.Status.Phase
-	err := wait.PollImmediate(time.Minute, time.Duration(o.Timeout)*time.Second, func() (bool, error) {
+	err := wait.ExponentialBackoff(o.getBackoff(), func() (bool, error) {
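
For readers without the Kubernetes `wait` package at hand, the general shape of such a loop can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the code in this PR: `pollWithBackoff`, `noSleep`, and `errTimedOut` are hypothetical names, the body of `getBackoff` is not shown in this hunk and is not reconstructed here, and the real `wait.ExponentialBackoff` is bounded by a step count in its `wait.Backoff` argument rather than by a wall-clock budget as below.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimedOut is a hypothetical sentinel error for this sketch.
var errTimedOut = errors.New("timed out waiting for the condition")

// noSleep is a stand-in for time.Sleep that returns immediately, useful for dry runs.
func noSleep(time.Duration) {}

// pollWithBackoff (hypothetical) checks cond immediately, then again after
// waits of initial, initial*factor, and so on. It stops when cond reports
// done, cond returns an error, or the next wait would exceed timeout.
// Only time spent sleeping is counted against timeout.
func pollWithBackoff(initial time.Duration, factor float64, timeout time.Duration,
	sleep func(time.Duration), cond func() (bool, error)) error {
	if initial <= 0 || factor < 1 {
		return errors.New("initial must be positive and factor at least 1")
	}
	elapsed := time.Duration(0)
	wait := initial
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if elapsed+wait > timeout {
			return errTimedOut
		}
		sleep(wait)
		elapsed += wait
		wait = time.Duration(float64(wait) * factor)
	}
}

func main() {
	// Stand-in for "is the pod running yet?": succeeds on the fourth check.
	checks := 0
	err := pollWithBackoff(time.Millisecond, 2, time.Second, time.Sleep, func() (bool, error) {
		checks++
		return checks >= 4, nil
	})
	fmt.Println(checks, err)
}
```

Passing the sleep function as a parameter is a design choice of the sketch that keeps the loop easy to exercise without real delays.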
Member

Does shortening the initial timeout, as in #527, do the trick for you? Why are you proposing exponential backoff?

Contributor Author

@yuvalturg yuvalturg Aug 25, 2020

I thought you needed a longer interval, so I figured this was a decent compromise. If #527 works for you, then yes, it will solve our issue as well and we can close this.

@soltysh soltysh self-assigned this Aug 25, 2020
@yuvalturg
Contributor Author

Closing this in favor of #527

@yuvalturg yuvalturg closed this Aug 26, 2020

3 participants