[release-0.58] Use exponential backoff for failing migrations #8784

Conversation

kubevirt-bot
Contributor

This is an automated cherry-pick of #8530

/assign acardace

Use exponential backoff for failing migrations

With this patch, when migrating a VM fails, an exponentially increasing
backoff is applied before retrying.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L labels Nov 15, 2022
Contributor

@enp0s3 enp0s3 left a comment

/approve

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 16, 2022
@kubevirt-bot
Contributor Author

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enp0s3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 16, 2022
@kubevirt-bot
Contributor Author

kubevirt-bot commented Nov 16, 2022

@kubevirt-bot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubevirt-goveralls | cc00896 | link | false | /test pull-kubevirt-goveralls-0.58 |
| pull-kubevirt-e2e-k8s-1.22-sig-performance | cc00896 | link | false | /test pull-kubevirt-e2e-k8s-1.22-sig-performance-0.58 |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Member

@rmohr rmohr left a comment

I am not too concerned about this change in general, but it looks like it would add a slightly weird cross-object relationship which is hard to understand from the outside and very uncommon from a general usage perspective. The eviction problem, however, is real and definitely deserves a solution.

I discussed a few options with @xpivarc and I think that, in order to backport this, it would make sense to limit it for now to the migrations which are created solely by the eviction webhook (e.g. by adding a label and filtering by it), leaving the general migration behaviour unchanged.

On main we could then decide if we want to keep it this way, or try moving the whole backoff logic to the webhook.

What do you think?
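
To make the label-and-filter idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: the label key `example.io/created-by-eviction`, the simplified `migration` type, and the helper names are hypothetical and are not taken from KubeVirt. The point is only that backoff handling would look at migrations carrying the label and leave all others alone.

```go
package main

import "fmt"

// evictionLabel is a hypothetical label key used only in this sketch.
const evictionLabel = "example.io/created-by-eviction"

// migration is a simplified stand-in for a migration object.
type migration struct {
	Name   string
	Labels map[string]string
}

// createdByEviction reports whether the migration carries the label
// that (in this sketch) the eviction webhook would set on creation.
func createdByEviction(m migration) bool {
	_, ok := m.Labels[evictionLabel] // reading a nil map is safe in Go
	return ok
}

// filterEvictionMigrations keeps only the labeled migrations, so that
// backoff would apply to them and general migrations stay unchanged.
func filterEvictionMigrations(ms []migration) []migration {
	var out []migration
	for _, m := range ms {
		if createdByEviction(m) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	all := []migration{
		{Name: "user-requested"},
		{Name: "evicted", Labels: map[string]string{evictionLabel: ""}},
	}
	for _, m := range filterEvictionMigrations(all) {
		fmt.Println(m.Name)
	}
}
```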

@enp0s3
Contributor

enp0s3 commented Nov 16, 2022

/hold
looking

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 16, 2022
@acardace
Member

> I am not too concerned about this change in general, but it looks like it would add a slightly weird cross-object relationship which is hard to understand from the outside and very uncommon from a general usage perspective. The eviction problem, however, is real and definitely deserves a solution.

I see your point, it does make sense.

> I discussed a few options with @xpivarc and I think that, in order to backport this, it would make sense to limit it for now to the migrations which are created solely by the eviction webhook (e.g. by adding a label and filtering by it), leaving the general migration behaviour unchanged.

Yes, I think this could work.

> On main we could then decide if we want to keep it this way, or try moving the whole backoff logic to the webhook.
>
> What do you think?

So I'd first introduce this change here in release-0.58 and then implement it in main? It would be a first.

@rmohr
Member

rmohr commented Nov 16, 2022

> > I am not too concerned about this change in general, but it looks like it would add a slightly weird cross-object relationship which is hard to understand from the outside and very uncommon from a general usage perspective. The eviction problem, however, is real and definitely deserves a solution.
>
> I see your point, it does make sense.
>
> > I discussed a few options with @xpivarc and I think that, in order to backport this, it would make sense to limit it for now to the migrations which are created solely by the eviction webhook (e.g. by adding a label and filtering by it), leaving the general migration behaviour unchanged.
>
> Yes, I think this could work.
>
> > On main we could then decide if we want to keep it this way, or try moving the whole backoff logic to the webhook.
> > What do you think?
>
> So I'd first introduce this change here in release-0.58 and then implement it in main? It would be a first.

Ah no, let's follow the usual flow: main and backport.

@acardace
Member

@xpivarc @rmohr
I created the new PR #8808, can you guys take a look?

@acardace
Member

/retest-required

@acardace
Member

@enp0s3 @rmohr can we now unhold this?

@enp0s3
Contributor

enp0s3 commented Nov 20, 2022

/unhold

@acardace I don't know what the desired strategy is in this case: a manual backport that contains the 2 PRs, or 2 automated backports. IMO we don't take a high risk by splitting it into 2 separate automated backports.

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 20, 2022
@enp0s3
Contributor

enp0s3 commented Nov 20, 2022

/retest-required

@rmohr
Member

rmohr commented Nov 21, 2022

/unhold

> @acardace I don't know what the desired strategy is in this case: a manual backport that contains the 2 PRs, or 2 automated backports. IMO we don't take a high risk by splitting it into 2 separate automated backports.

+1 two should be fine.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L
4 participants