Automated cherry pick of #84181: Lower AWS DescribeVolume frequency #85675: Fix AWS eventual consistency of AttachDisk #89894

Conversation

johanneswuerbach
Contributor

@johanneswuerbach johanneswuerbach commented Apr 6, 2020

Cherry pick of #84181 #85675 on release-1.16.

#84181: Lower AWS DescribeVolume frequency
#85675: Fix AWS eventual consistency of AttachDisk

For details on the cherry pick process, see the cherry pick requests page.

We are hitting an issue similar to the one @jsafrane described: AWS EBS attach operations randomly fail on the first attach after a detach with "timed out waiting for the condition", and then succeed when retried.

As we are currently running on 1.16, this is similar to #89891, but it also includes #84181 so that it applies cleanly.

Does this PR introduce a user-facing change?:

Reduced frequency of DescribeVolumes calls of AWS API when attaching/detaching a volume.
Fixed "requested device X but found Y" attach error on AWS.

Call DescribeVolumes less frequently so that controller-manager is not throttled
by AWS. DescribeVolumes is basically the only Kubernetes call that suffers from
API throttling by AWS.

AWS eventual consistency can go back in time. It can return that a volume
is detached and then that it is attached.

When this happens during attachment of the same volume to the same node,
but with a different device name, retry DescribeVolumes a few times before
reporting an error. 10 retries should be enough to get a consistent result.
In case DescribeVolumes returns stale attachment and the volume was
previously attached to a different node.
@k8s-ci-robot k8s-ci-robot added the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Apr 6, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Apr 6, 2020
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 6, 2020
@k8s-ci-robot
Contributor

Hi @johanneswuerbach. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 6, 2020
@k8s-ci-robot k8s-ci-robot added area/cloudprovider sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 6, 2020
@tpepper
Member

tpepper commented Apr 9, 2020

/kind bug
/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. kind/bug Categorizes issue or PR as related to a bug. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Apr 9, 2020
@tpepper
Member

tpepper commented Apr 9, 2020

@johanneswuerbach because this is a combined pick, it looks like you'll need to manually populate the release note stanza for this PR using the info from the stanzas of the two master branch PRs.

@johanneswuerbach
Contributor Author

Thank you @tpepper, added a release note block.

/retest

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 10, 2020
@johanneswuerbach
Contributor Author

/assign @jsafrane
/retest

@jsafrane
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 16, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johanneswuerbach, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 16, 2020
@johanneswuerbach
Contributor Author

/cc @kubernetes/patch-release-team

I would appreciate it if you could take a look :-)

@tpepper tpepper added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Apr 28, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Apr 28, 2020
@k8s-ci-robot k8s-ci-robot merged commit 7f72444 into kubernetes:release-1.16 Apr 28, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cloudprovider cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
4 participants