Flexvolumes ci-kubernetes-e2e-gci-gce-alpha-features failing #51123

Closed
wongma7 opened this issue Aug 22, 2017 · 11 comments · Fixed by #51166
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@wongma7
Contributor

wongma7 commented Aug 22, 2017

Tests are currently failing on GCI (https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-alpha-features/14884) because VOLUME_PLUGIN_DIR=/home/kubernetes/flexvolume, set in cluster/common.sh, takes precedence over KUBELET_TEST_VOLUME_PLUGIN_DIR="/etc/srv/kubernetes/kubelet-plugins/volume/exec". As far as I can tell, /home/kubernetes/flexvolume is not yet readable by the containerized kubelet, so the test needs to keep using "/etc/srv/kubernetes/kubelet-plugins/volume/exec", and the bash needs to be shuffled around to solve this.
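
To make the precedence problem concrete, here is a minimal sketch of the ordering the test needs. The real logic lives in the cluster bash scripts, so this Go version only models the idea; `resolvePluginDir` and its `lookup` parameter are hypothetical names introduced for illustration, and only the two variable names and paths come from the report above.

```go
package main

import "fmt"

// resolvePluginDir is a hypothetical helper (not part of Kubernetes) that
// picks the Flexvolume plugin directory from environment-style settings.
// It prefers the test-specific override and falls back to the general one.
func resolvePluginDir(lookup func(string) (string, bool)) string {
	// The test override should win when it is set and non-empty.
	if dir, ok := lookup("KUBELET_TEST_VOLUME_PLUGIN_DIR"); ok && dir != "" {
		return dir
	}
	// Otherwise use the general setting.
	if dir, ok := lookup("VOLUME_PLUGIN_DIR"); ok && dir != "" {
		return dir
	}
	// Neither variable is set.
	return ""
}

func main() {
	// Values taken from the issue report.
	env := map[string]string{
		"VOLUME_PLUGIN_DIR":              "/home/kubernetes/flexvolume",
		"KUBELET_TEST_VOLUME_PLUGIN_DIR": "/etc/srv/kubernetes/kubelet-plugins/volume/exec",
	}
	lookup := func(key string) (string, bool) {
		value, ok := env[key]
		return value, ok
	}
	fmt.Println(resolvePluginDir(lookup))
}
```

Passing `os.LookupEnv` as `lookup` would read the real process environment instead of the map used here.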

cc @verult

@wongma7
Contributor Author

wongma7 commented Aug 22, 2017

/sig storage

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 22, 2017
@wongma7
Contributor Author

wongma7 commented Aug 22, 2017

This is more an acknowledgment of the issue than anything; I can't write a PR until later.

@ericchiang
Contributor

cc @kubernetes/sig-storage-bugs
/kind e2e-test-failure

@ericchiang ericchiang added kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. labels Aug 22, 2017
@ericchiang ericchiang added this to the v1.8 milestone Aug 22, 2017
@jdumars
Member

jdumars commented Aug 23, 2017

@kubernetes/sig-storage-bugs this is currently blocking alpha.3 release

@wongma7
Contributor Author

wongma7 commented Aug 23, 2017

@jdumars we are just waiting for #51166 approval

k8s-github-robot pushed a commit that referenced this issue Aug 24, 2017
Automatic merge from submit-queue (batch tested with PRs 51113, 46597, 50397, 51052, 51166)

Changing Flexvolume plugin directory to a location reachable by containerized k8s components.

**What this PR does / why we need it**: Testing Flexvolume requires plugins to be installed at a location which is accessible by containerized k8s components (such as controller-manager).

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #51123

```release-note
NONE
```

/assign @wongma7 @msau42
/release-note-none
/sig storage
@wongma7
Contributor Author

wongma7 commented Aug 24, 2017

@jdumars I think they’re still failing. I’m not sure what the release process is, but these tests started running only yesterday, so a failure indicates something wrong with the tests themselves; they shouldn’t block the release. (They succeed on a GCE cluster with Debian nodes but fail on GCI.)

@jdumars
Member

jdumars commented Aug 24, 2017

@wongma7 well, let's see if we can get the kinks worked out before the beta release on 9/27 or we will need to disable it from release-blocking

@wongma7
Contributor Author

wongma7 commented Aug 24, 2017

/assign
/reopen

@wongma7
Contributor Author

wongma7 commented Aug 24, 2017

non-attachable:

I0824 17:49:28.426001 8562 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume "flex-volume-0" (UniqueName: "flexvolume-k8s/dummy-e2e-tests-flexvolume-wf48v/934e3eba-88f4-11e7-a3f9-42010a800002-flex-volume-0") pod "flex-client" (UID: "934e3eba-88f4-11e7-a3f9-42010a800002")

edit:
VerifyControllerAttachedVolume must succeed since the plugin is non-attachable and no error is reported. But the volume is absent from node.volumesInUse.

attachable:

controller-manager.log:

I0824 17:44:06.955342 4 operation_generator.go:275] AttachVolume.Attach succeeded for volume "flex-volume-0" (UniqueName: "flexvolume-k8s/dummy-attachable-e2e-tests-flexvolume-j8962/flex-volume-0") from node "bootstrap-e2e-minion-group-9913"
W0824 17:44:06.955385 4 plugin-defaults.go:32] flexVolume driver k8s/dummy-attachable-e2e-tests-flexvolume-j8962: using default GetVolumeName for volume 0x1c5df60

kubelet.log:

I0824 17:43:59.619109 3728 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume "flex-volume-0" (UniqueName: "flexvolume-k8s/dummy-attachable-e2e-tests-flexvolume-j8962/flex-volume-0") pod "flex-client" (UID: "cf5d743d-88f3-11e7-a3f9-42010a800002")
E0824 17:43:59.619238 3728 nestedpendingoperations.go:262] Operation for ""flexvolume-k8s/dummy-attachable-e2e-tests-flexvolume-j8962/flex-volume-0"" failed. No retries permitted until 2017-08-24 17:44:00.119202945 +0000 UTC (durationBeforeRetry 500ms). Error: Volume has not been added to the list of VolumesInUse in the node's volume status for volume "flex-volume-0" (UniqueName: "flexvolume-k8s/dummy-attachable-e2e-tests-flexvolume-j8962/flex-volume-0") pod "flex-client" (UID: "cf5d743d-88f3-11e7-a3f9-42010a800002")

Seems there is something wrong with VerifyControllerAttachedVolume. Still at a loss as to why it's showing up on GCI but not on Debian.

edit:
get node -o yaml shows:

volumesInUse:
    - flexvolume-k8s/dummy-attachable-e2e-tests-flexvolume-z8xc4/flex-volume-0

yet it errors because the volume hasn't been added to volumesInUse???

Aug 24 20:31:42 mawong-e2e-minion-group-npjh kubelet[9431]: E0824 20:31:42.410308 9431 nestedpendingoperations.go:262] Operation for ""flexvolume-k8s/dummy-attachable-e2e-tests-flexvolume-z8xc4/flex-volume-0"" failed. No retries permitted until 2017-08-24 20:31:44.410269002 +0000 UTC (durationBeforeRetry 2s). Error: Volume has not been added to the list of VolumesInUse in the node's volume status for volume "flex-volume-0" (UniqueName: "flexvolume-k8s/dummy-attachable-e2e-tests-flexvolume-z8xc4/flex-volume-0") pod "flex-client" (UID: "3c60da17-890b-11e7-9392-42010a800002")
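
For illustration, a plain membership check against the node's reported list would pass with the values shown above, which is why the error looks contradictory. This is a simplified sketch only, not the kubelet's actual code path; `volumeReportedInUse` is a hypothetical helper, and the volume name is copied from the node output above.

```go
package main

import "fmt"

// volumeReportedInUse is a hypothetical helper (not kubelet code) that
// reports whether uniqueName appears in a node's volumesInUse list.
func volumeReportedInUse(volumesInUse []string, uniqueName string) bool {
	for _, name := range volumesInUse {
		if name == uniqueName {
			return true
		}
	}
	return false
}

func main() {
	// List as shown by `get node -o yaml` in the comment above.
	volumesInUse := []string{
		"flexvolume-k8s/dummy-attachable-e2e-tests-flexvolume-z8xc4/flex-volume-0",
	}
	// UniqueName from the kubelet error message above.
	uniqueName := "flexvolume-k8s/dummy-attachable-e2e-tests-flexvolume-z8xc4/flex-volume-0"

	fmt.Println(volumeReportedInUse(volumesInUse, uniqueName))
}
```

Since the names match exactly, a naive comparison like this cannot explain the failure, which suggests the cause lies elsewhere.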

@wongma7
Contributor Author

wongma7 commented Aug 24, 2017

Never mind the GCI vs Debian issue. Our test fix would have worked, but somewhere between 6bb928a3dff60c and 2b08d1e5a18ce3 something else changed and is now causing both the GCI and Debian tests to fail.

@wongma7
Contributor Author

wongma7 commented Aug 24, 2017

#50843 might be the cause, so we might be looking at legitimate failures caught by the test at this point (yay?) and not faulty tests. investigating...

mtanino pushed a commit to mtanino/kubernetes that referenced this issue Aug 28, 2017
Automatic merge from submit-queue

Set flexvolumeplugin.host so that it's not nil

@TerraTech @MikaelCluseau  @chakri-nelluri @verult

I assume this line was removed inadvertently; without plugin.host set, the flexvolume silently fails at Mount/Attach* time. kubernetes#50843

kubernetes#51123

Please review, thanks!

```release-note
NONE
```
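
The class of bug described in that commit message (a plugin whose host field is never assigned) can be sketched as follows. This is not the actual flexvolume code: the `host` interface, `fakeHost`, `plugin`, and `errNoHost` are all hypothetical names chosen for illustration. The sketch shows one defensive option, returning an explicit error when the host is missing instead of failing later without a clear message.

```go
package main

import (
	"errors"
	"fmt"
)

// host is a hypothetical stand-in for the host object a volume plugin
// receives when it is initialized.
type host interface {
	PluginDir() string
}

// fakeHost is a trivial host implementation used only for this example.
type fakeHost struct{ dir string }

func (h fakeHost) PluginDir() string { return h.dir }

// plugin is a hypothetical plugin type, not the real flexvolume struct.
type plugin struct {
	driverName string
	host       host
}

var errNoHost = errors.New("plugin host is not set; was Init called?")

// Init stores the host. Dropping this assignment would leave p.host nil,
// which is the kind of mistake the commit message describes.
func (p *plugin) Init(h host) error {
	p.host = h
	return nil
}

// Mount refuses to run without a host and says so explicitly.
func (p *plugin) Mount(volume string) (string, error) {
	if p.host == nil {
		return "", errNoHost
	}
	return fmt.Sprintf("driver %s mounting %s using plugin dir %s",
		p.driverName, volume, p.host.PluginDir()), nil
}

func main() {
	// Without Init, the missing host is reported as an error.
	uninitialized := &plugin{driverName: "k8s/dummy"}
	if _, err := uninitialized.Mount("flex-volume-0"); err != nil {
		fmt.Println("error:", err)
	}

	// With Init, the call proceeds.
	initialized := &plugin{driverName: "k8s/dummy"}
	_ = initialized.Init(fakeHost{dir: "/etc/srv/kubernetes/kubelet-plugins/volume/exec"})
	msg, _ := initialized.Mount("flex-volume-0")
	fmt.Println(msg)
}
```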
@wongma7 wongma7 closed this as completed Aug 28, 2017
4 participants