unit test flake: controller/persistentvolume #25967

Closed
pwittrock opened this Issue May 20, 2016 · 7 comments

@pwittrock
Member

pwittrock commented May 20, 2016

Not many details in the logs:
FAIL k8s.io/kubernetes/pkg/controller/persistentvolume 2.721s

https://console.cloud.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/25916/kubernetes-pull-test-unit-integration/26923/

@ncdc

Member

ncdc commented May 20, 2016

--- FAIL: TestControllerSync (0.15s)
    framework_test.go:711: Test "5-2 - complete bind": Claim check failed [A-expected, B-got result]: {"claim5-2":{"metadata":{"name":"claim5-2","namespace":"default","selfLink":"/api/v1/pvc/claim5-2","uid":"uid5-2","creationTimestamp":null

        A: ,"annotations":{"pv.kubernetes.io/bind-completed":"yes","pv.kubernetes.io/bound-by-controller":"yes"}},"spec":{"accessModes":["ReadWriteOnce","ReadOnlyMany"],"resources":{"requests":{"storage":"1Gi"}},"volumeName":"volume5-2"},"status":{"phase":"Bound"}}}

        B: },"spec":{"accessModes":["ReadWriteOnce","ReadOnlyMany"],"resources":{"requests":{"storage":"1Gi"}}},"status":{"phase":"Pending"}}}

    framework_test.go:715: Test "5-2 - complete bind": Volume check failed [A-expected, B-got]: {"volume5-2":{"metadata":{"name":"volume5-2","creationTimestamp":null

        A: ,"annotations":{"pv.kubernetes.io/bound-by-controller":"yes"}},"spec":{"capacity":{"storage":"10Gi"},"gcePersistentDisk":{"pdName":""},"accessModes":["ReadWriteOnce","ReadOnlyMany"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"claim5-2","uid":"uid5-2","apiVersion":"v1"},"persistentVolumeReclaimPolicy":"Retain"},"status":{"phase":"Bound"}}}

        B: },"spec":{"capacity":{"storage":"10Gi"},"gcePersistentDisk":{"pdName":""},"accessModes":["ReadWriteOnce","ReadOnlyMany"],"persistentVolumeReclaimPolicy":"Retain"},"status":{"phase":"Available"}}}
@ncdc

Member

ncdc commented May 20, 2016

@kubernetes/sig-storage

@rootfs

Member

rootfs commented May 20, 2016

I ran the tests a few times against recent master but couldn't reproduce this. Could it be an edge case?
@jsafrane

@jsafrane jsafrane self-assigned this May 21, 2016

@jsafrane

Member

jsafrane commented May 21, 2016

I'll look at it.

@jsafrane jsafrane changed the title from integration flake: controller/persistentvolume to unit test flake: controller/persistentvolume May 21, 2016

@jsafrane

Member

jsafrane commented May 23, 2016

#25881 should help a lot, but there will still be a short window in which the test could go wrong. A patch will follow once #25881 is merged.

@quinton-hoole

Member

quinton-hoole commented May 24, 2016

Today I saw a very similar error to the one reported in #26067, but seeing as that was closed as a dupe of this, adding details here:

#26105 (comment)

https://console.cloud.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/26105/kubernetes-pull-test-unit-integration/27563/

E0523 22:12:02.268570   29064 controller_base.go:259] PersistentVolumeController could not update claim "default/fake-pvc": persistentvolumeclaims "fake-pvc" not found
panic: test timed out after 10m0s

goroutine 19857 [running]:
panic(0x1b5e200, 0xc82a5a1b10)
    /usr/local/go/src/runtime/panic.go:481 +0x3e6
testing.startAlarm.func1()
    /usr/local/go/src/testing/testing.go:725 +0x14b
created by time.goFunc
    /usr/local/go/src/time/sleep.go:129 +0x3a

goroutine 1 [chan receive, 7 minutes]:
testing.RunTests(0x2940538, 0x3567fa0, 0x32, 0x32, 0xc8202c5501)
    /usr/local/go/src/testing/testing.go:583 +0x8d2
testing.(*M).Run(0xc823baff08, 0xc820172d80)
    /usr/local/go/src/testing/testing.go:515 +0x81
main.main()
    k8s.io/kubernetes/test/integration/_test/_testmain.go:152 +0x117

https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/26105/kubernetes-pull-test-unit-integration/27563/

@jsafrane

Member

jsafrane commented May 25, 2016

@quinton-hoole, this looks like something new. Let's keep this issue open for this unit test flake:

--- FAIL: TestControllerSync (0.15s)
    framework_test.go:711: Test "5-2 - complete bind": Claim check failed [A-expected, B-got result]: {"claim5-2":{"metadata":{"name":"claim5-2","namespace":"default","selfLink":"/api/v1/pvc/claim5-2","uid":"uid5-2","creationTimestamp":null

        A: ,"annotations":{"pv.kubernetes.io/bind-completed":"yes","pv.kubernetes.io/bound-by-controller":"yes"}},"spec":{"accessModes":["ReadWriteOnce","ReadOnlyMany"],"resources":{"requests":{"storage":"1Gi"}},"volumeName":"volume5-2"},"status":{"phase":"Bound"}}}

        B: },"spec":{"accessModes":["ReadWriteOnce","ReadOnlyMany"],"resources":{"requests":{"storage":"1Gi"}}},"status":{"phase":"Pending"}}}

    framework_test.go:715: Test "5-2 - complete bind": Volume check failed [A-expected, B-got]: {"volume5-2":{"metadata":{"name":"volume5-2","creationTimestamp":null

        A: ,"annotations":{"pv.kubernetes.io/bound-by-controller":"yes"}},"spec":{"capacity":{"storage":"10Gi"},"gcePersistentDisk":{"pdName":""},"accessModes":["ReadWriteOnce","ReadOnlyMany"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"claim5-2","uid":"uid5-2","apiVersion":"v1"},"persistentVolumeReclaimPolicy":"Retain"},"status":{"phase":"Bound"}}}

        B: },"spec":{"capacity":{"storage":"10Gi"},"gcePersistentDisk":{"pdName":""},"accessModes":["ReadWriteOnce","ReadOnlyMany"],"persistentVolumeReclaimPolicy":"Retain"},"status":{"phase":"Available"}}}

I filed #26256 for the integration test flake you reported.

k8s-merge-robot added a commit that referenced this issue May 25, 2016

Merge pull request #26123 from brendandburns/flaker
Automatic merge from submit-queue

Add some extra checking in the tests to prevent flakes.

Attempts to fix #25967

The hypothesis is that waitTest() somehow catches an idle period that occurs before all changes have been applied. This change blocks until the expected number of changes has arrived.
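
To illustrate the idea of blocking until the expected number of changes has arrived, here is a minimal Go sketch. It is not the code from #26123: changeCounter, recordChange and waitForChanges are hypothetical names, and the polling loop is only one assumed way to implement the wait.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// changeCounter is a hypothetical stand-in for the test reactor. It counts
// the object changes that the controller under test has produced so far.
type changeCounter struct {
	mu      sync.Mutex
	changes int
}

// recordChange is called once for every observed create/update/delete.
func (c *changeCounter) recordChange() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.changes++
}

// waitForChanges polls until at least want changes were recorded, or returns
// an error once the timeout has expired. Waiting for a count (instead of for
// the controller to look idle) avoids checking results too early.
func (c *changeCounter) waitForChanges(want int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		c.mu.Lock()
		got := c.changes
		c.mu.Unlock()
		if got >= want {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out: got %d changes, want %d", got, want)
		}
		time.Sleep(time.Millisecond)
	}
}

func main() {
	c := &changeCounter{}

	// Simulated controller: applies two changes with a small delay each.
	go func() {
		for i := 0; i < 2; i++ {
			time.Sleep(5 * time.Millisecond)
			c.recordChange()
		}
	}()

	if err := c.waitForChanges(2, time.Second); err != nil {
		fmt.Println("wait failed:", err)
		return
	}
	fmt.Println("all expected changes arrived; safe to compare results")
}
```

A test that only waits for an idle moment can run its checks between the two changes; waiting for the count makes the checks start after both have landed, at the cost of having to know the expected count in advance.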

jsafrane added a commit to jsafrane/kubernetes that referenced this issue May 30, 2016

Fill controller caches on startup
The controller needs to fill its caches before it starts binding, recycling,
deleting, or provisioning volumes and claims. Previously this was done by
blocking the initial 'xxx added' events from going through
syncClaim/syncVolume. However, once the caches were full, the controller
waited for the next sync period to do the actual binding, recycling, etc.

In this patch, the controller fills its caches directly from etcd and then
processes the initial 'xxx added' events to reconcile the world and
bind, recycle, delete, or provision as needed, resulting in faster binding
after startup.

Fixes kubernetes#25967 (properly)

k8s-merge-robot added a commit that referenced this issue Jun 3, 2016

Merge pull request #26518 from jsafrane/initial-sync
Automatic merge from submit-queue

Fill controller caches on startup

The controller needs to fill its caches before it starts binding, recycling, deleting, or provisioning volumes and claims. Previously this was done by blocking the initial 'xxx added' events from going through syncClaim/syncVolume. However, once the caches were full, the controller waited for the next sync period to do the actual binding, recycling, etc.

In this patch, the controller fills its caches directly from etcd and then processes the initial 'xxx added' events to reconcile the world and bind, recycle, delete, or provision as needed, resulting in faster binding after startup.

Fixes #25967 (properly)
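
The startup order described in this commit message (fill the caches first, then run the initial 'added' events through sync) can be sketched roughly as below. This is a simplified illustration, not the controller's real code: the volume, claim and controller types, fillCaches, syncClaim and start are all hypothetical, and the slices passed to start stand in for the initial listing read from etcd.

```go
package main

import "fmt"

// volume and claim are heavily simplified, hypothetical stand-ins for
// PersistentVolume and PersistentVolumeClaim objects.
type volume struct {
	name      string
	claimName string // empty while the volume is available
}

type claim struct {
	name       string
	volumeName string // empty while the claim is pending
}

// controller holds the local caches that binding decisions are based on.
type controller struct {
	volumes []*volume
	claims  map[string]*claim
}

func newController() *controller {
	return &controller{claims: map[string]*claim{}}
}

// fillCaches stores the initial listing in the caches without syncing
// anything, so that later syncs see the complete picture.
func (c *controller) fillCaches(vols []*volume, claims []*claim) {
	c.volumes = append(c.volumes, vols...)
	for _, cl := range claims {
		c.claims[cl.name] = cl
	}
}

// syncClaim binds a pending claim to the first available cached volume and
// reports whether the claim is bound afterwards.
func (c *controller) syncClaim(cl *claim) bool {
	if cl.volumeName != "" {
		return true
	}
	for _, v := range c.volumes {
		if v.claimName == "" {
			v.claimName = cl.name
			cl.volumeName = v.name
			return true
		}
	}
	return false
}

// start fills the caches and only then replays the initial 'added' events,
// so binding does not have to wait for the next periodic sync.
func (c *controller) start(vols []*volume, claims []*claim) int {
	c.fillCaches(vols, claims)
	bound := 0
	for _, cl := range claims {
		if c.syncClaim(cl) {
			bound++
		}
	}
	return bound
}

func main() {
	vols := []*volume{{name: "volume5-2"}}
	claims := []*claim{{name: "claim5-2"}}

	ctrl := newController()
	bound := ctrl.start(vols, claims)
	fmt.Printf("bound %d of %d claims; claim5-2 -> %q\n",
		bound, len(claims), claims[0].volumeName)
}
```

If the claim were synced before any volume had reached the cache, syncClaim would return false and the claim would stay pending until a later sync, which is the delay the patch is meant to avoid.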

mtaufen added a commit to mtaufen/kubernetes that referenced this issue Jun 6, 2016

Fill controller caches on startup