
Failing Tests: [sig-storage] 5000-node correctness tests #66376

Closed
shyamjvs opened this Issue Jul 19, 2018 · 16 comments

Comments

@shyamjvs
Member

shyamjvs commented Jul 19, 2018

Failing Job:

sig-release-master-blocking#gce-scale-correctness

These tests started flaking recently (they were consistently green for a long time):

  • Mounted volume expand[Slow] Should verify mounted devices can be resized
  • Volume expand [Slow] Verify if editing PVC allows resize
  • Subpath [Volume type: gcePDPVC] should support restarting containers using file as subpath [Slow]
  • Subpath [Volume type: gcePDPVC] should fail for new directories when readOnly specified in the volumeSource
  • Subpath [Volume type: gcePDPVC] should fail if subpath with backstepping is outside the volume [Slow]
  • Volumes PD should be mountable with ext4

Affected runs:

/kind bug
/sig storage
/priority important-soon
/milestone v1.12
/cc @tpepper @AishSundar @mohammedzee1000

@msau42 - Could you PTAL or delegate? Thanks.

@msau42

Member

msau42 commented Jul 19, 2018

Volume expansion tests have been fixed by #66324

@msau42

Member

msau42 commented Jul 19, 2018

/assign @davidz627
can you investigate the remaining failures?

@k8s-merge-robot

Contributor

k8s-merge-robot commented Jul 19, 2018

[MILESTONENOTIFIER] Milestone Issue Needs Approval

@davidz627 @shyamjvs @kubernetes/sig-storage-misc

Action required: This issue must have the status/approved-for-milestone label applied by a SIG maintainer. If the label is not applied within 4 days, the issue will be moved out of the v1.12 milestone.

Issue Labels
  • sig/storage: Issue will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the issue owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/bug: Fixes a bug discovered during the current release.
Help

k8s-merge-robot added a commit that referenced this issue Jul 20, 2018

Merge pull request #66405 from davidz627/fix/subpathTestTimeout
Automatic merge from submit-queue (batch tested with PRs 66341, 66405, 66403, 66264, 66447). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Bump subpath test pod timeout to 5 minutes

Fixes: #66376 

It seems that `Attach` on the GCP side can sometimes take multiple minutes (when the disk was recently detached from another node), so the old timeout of 1 minute is not long enough. Five minutes is the standard elsewhere and appears to cover the long tail of GCE attach times.

The ext4 test was also hard to debug because all the old PD tests name every pod "pd-injector" and every volume mount "pd-volume". This change makes that more readable by appending a random 4-character suffix to those names.
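For illustration, here is a minimal, self-contained Go sketch of the two changes described above; `podStartTimeout` and `uniqueName` are hypothetical names used only for this sketch, not the actual identifiers touched by the PR:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// podStartTimeout is the hypothetical wait for a test pod to come up.
// One minute was too short when the PD had just been detached from another
// node, so it is bumped to five minutes (the value described in the PR text).
const podStartTimeout = 5 * time.Minute

// uniqueName appends a random 4-character suffix so objects like
// "pd-injector" and "pd-volume" can be told apart in the test logs.
func uniqueName(base string) string {
	const letters = "abcdefghijklmnopqrstuvwxyz0123456789"
	b := make([]byte, 4)
	for i := range b {
		b[i] = letters[rand.Intn(len(letters))]
	}
	return fmt.Sprintf("%s-%s", base, b)
}

func main() {
	fmt.Println("pod start timeout:", podStartTimeout)
	fmt.Println(uniqueName("pd-injector"), uniqueName("pd-volume"))
}
```

With names like `pd-injector-x7qk`, a failure in a 5000-node run can be traced back to an individual test pod instead of a sea of identically named objects.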

/kind flake
/sig storage

/assign @msau42 

```release-note
NONE
```
@shyamjvs

Member

shyamjvs commented Aug 16, 2018

Reopening this issue, as a bunch of storage-related e2es started flaking in our large-scale jobs recently:

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Aug 16, 2018

@msau42

Member

msau42 commented Aug 16, 2018

I think the latest failures are caused by gce API rate limiting: #67348

/assign @bowei @jingax10

@msau42

Member

msau42 commented Aug 16, 2018

It seems to be failing only on the scalability test suites. I wonder if it's due to a different CIDR configuration.

@tpepper

Contributor

tpepper commented Aug 20, 2018

/sig scalability

@guineveresaenger

Contributor

guineveresaenger commented Aug 20, 2018

/kind failing-test

@shyamjvs

Member

shyamjvs commented Aug 27, 2018

/status approved-for-milestone

Friendly ping. This is affecting the release-blocking scalability job.

@dims

Member

dims commented Aug 27, 2018

Hey @shyamjvs, what needs to be done here, and by whom?

@shyamjvs

Member

shyamjvs commented Aug 27, 2018

@michelle192837 Could you comment on what needs to be done here and by whom?

@msau42

Member

msau42 commented Aug 27, 2018

@bowei @jingax10, can you take a look at the API quota issue? This is happening consistently in the scale test environment.

@spiffxp spiffxp changed the title from Flaky Tests: [sig-storage] 5000-node correctness tests to Failing Tests: [sig-storage] 5000-node correctness tests Aug 27, 2018

@guineveresaenger

Contributor

guineveresaenger commented Sep 4, 2018

@shyamjvs @msau42 as Code Freeze is today, this needs to be priority/critical-urgent. Please triage!

cc @saad-ali @childsb @wojtek-t @countspongebob

@msau42

Member

msau42 commented Sep 4, 2018

It looks like the latest run on 9/1 has passed.

@davidz627


Collaborator

davidz627 commented Sep 4, 2018

Marking closed as the fixes have been merged and the test runs are now passing
/close

@k8s-ci-robot

Contributor

k8s-ci-robot commented Sep 4, 2018

@davidz627: Closing this issue.

In response to this:

Marking closed as the fixes have been merged and the test runs are now passing
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
