Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retry PV create in e2e-test on API quota failure #105910

Merged
merged 1 commit into from
Oct 27, 2021

Conversation

deads2k
Copy link
Contributor

@deads2k deads2k commented Oct 26, 2021

While running e2e tests on gcp in many parallel jobs, we see numerous failures like

fail [k8s.io/kubernetes@v1.22.1/test/e2e/storage/framework/volume_resource.go:259]: PVC, PV creation failed
Unexpected error:
<*errors.errorString | 0xc000b5b100>: {
s: "PV Create API error: persistentvolumes "gcepd-" is forbidden: error querying GCE PD volume e2e-afbc2157-cec6-42cf-b5e1-1ed4d05f97e1: googleapi: Error 403: Quota exceeded for quota group 'ReadGroup' and limit 'Read requests per 100 seconds' of service 'compute.googleapis.com' for consumer 'project_number:711936183532'., rateLimitExceeded",
}
PV Create API error: persistentvolumes "gcepd-" is forbidden: error querying GCE PD volume e2e-afbc2157-cec6-42cf-b5e1-1ed4d05f97e1: googleapi: Error 403: Quota exceeded for quota group 'ReadGroup' and limit 'Read requests per 100 seconds' of service 'compute.googleapis.com' for consumer 'project_number:711936183532'., rateLimitExceeded
occurred

A normal client would simply retry on such a failure after a backoff up to a specified timeout. This PR provides that functionality to the e2e tests and leaves open the possibility for adding congruent checks for other platforms.

/kind bug
/priority important-soon
@kubernetes/sig-storage-bugs
@jsafrane @gnufied

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 26, 2021
@k8s-ci-robot
Copy link
Contributor

@deads2k: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 26, 2021
@k8s-ci-robot k8s-ci-robot added area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/testing Categorizes an issue or PR as relevant to SIG Testing. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 26, 2021
@deads2k
Copy link
Contributor Author

deads2k commented Oct 26, 2021

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade/1452720981427621888 shows a sample failure on [sig-storage] In-tree Volumes [Driver: gcepd] [Testpattern: Pre-provisioned PV (block volmode)] volumeMode should not mount / map unused volumes in a pod

I don't know the exact flow that relays this error back (admission perhaps?), but "PV Create API error" appears in createPV as wrapping the error.

@deads2k
Copy link
Contributor Author

deads2k commented Oct 26, 2021

@jsafrane
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 27, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

1 similar comment
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants