Cloud NFS PVC needs to specify volumeName with k8s 1.11 #2475

Closed
jlewi opened this issue Feb 14, 2019 · 6 comments

@jlewi
Contributor

jlewi commented Feb 14, 2019

Here's a PV and PVC spec for Cloud NFS.
https://github.com/jlewi/community/blob/6448276eb77ea06f971dbec470565b0b3088c349/devstats/k8s_manifests/nfs_pvc.yaml

This was based on our ksonnet prototype
https://github.com/kubeflow/kubeflow/blob/master/kubeflow/gcp/google-cloud-filestore-pv.libsonnet

On K8s 1.10 and kubeflow-testing this pattern appears to work.

But on K8s 1.11 the PVC was giving me this error:

storageclass.storage.k8s.io "nfs-storage" not found

I fixed this by modifying the PVC to use the field storageClassName (instead of the annotation) and setting the field volumeName.

Here's the fixed manifest.
https://github.com/jlewi/community/blob/ed9b1673febd88e31d4baf76c6053a303d4eeed0/devstats/k8s_manifests/nfs_pvc.yaml

I think we will need to update our ksonnet prototype to make the same changes.
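
A minimal sketch of the fixed pattern described above, written as a Python script that emits JSON manifests (kubectl accepts JSON as well as YAML). It is not a copy of the linked manifest: the object names, NFS server address, export path, and capacity are hypothetical placeholders, and only the storage class name nfs-storage comes from the error message above. The PV carries storageClassName too, since a later comment in this thread reports that it is needed on the volume as well.

```python
import json

# Hypothetical placeholders; only the class name appears in this issue.
STORAGE_CLASS = "nfs-storage"
PV_NAME = "cloud-nfs-pv"
PVC_NAME = "cloud-nfs-pvc"
NFS_SERVER = "10.0.0.2"
NFS_PATH = "/share"
CAPACITY = "1Ti"


def build_pv():
    """PersistentVolume pointing at an existing NFS export."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": PV_NAME},
        "spec": {
            "capacity": {"storage": CAPACITY},
            "accessModes": ["ReadWriteMany"],
            # Set as a spec field, not as an annotation.
            "storageClassName": STORAGE_CLASS,
            "nfs": {"server": NFS_SERVER, "path": NFS_PATH},
        },
    }


def build_pvc():
    """PersistentVolumeClaim that names the PV it should bind to."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": PVC_NAME},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            # Same class as the PV, again as a spec field.
            "storageClassName": STORAGE_CLASS,
            # Bind to the pre-created PV by name.
            "volumeName": PV_NAME,
            "resources": {"requests": {"storage": CAPACITY}},
        },
    }


if __name__ == "__main__":
    # A v1 List lets both objects live in one file.
    manifest = {"apiVersion": "v1", "kind": "List",
                "items": [build_pv(), build_pvc()]}
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and passing it to kubectl apply -f should create both objects. The intent of the sketch is that the claim names its volume and both sides agree on the class, so the claim can bind to the existing volume.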

@jlewi jlewi added this to New in 0.5.0 via automation Feb 14, 2019
jlewi added a commit to jlewi/testing that referenced this issue Mar 7, 2019
k8s-ci-robot pushed a commit to kubeflow/testing that referenced this issue Mar 8, 2019
* Improve auto_deploy to support changing zone and testing changes.

* Explicitly delete the storage deployment; the delete script won't delete it
  by design because we don't want to destroy data.

* Instead of depending on a dummy kubeconfig file call generate/apply for
  platform and then for k8s.

* For repos take them in the form ${ORG}/${NAME}@Branch. This matches
  what the other test script does. It also allows us to check out the
  repos from people's forks which makes testing changes easier.

* Move logic in checkout-snapshot.sh into repo_clone_snapshot.py
  This is cleaner than having the python script shell out to a shell script.

* Move the logic in deployment-workflow.sh into create_kf_instance.py

  * Add an option in create_kf_instance.py to read and parse the snapshot.json
    file rather than doing it in bash.

* Change the arg name to be datadir instead of nfs_mount because nfs is
  an assumption of how it is run on K8s.

* Check out the source into NFS to make it easier to debug.

* Add a bash script to set the images using YQ
* Add a wait operation for deletes and use it to wait for deletion of storage.

* Rename init.sh to auto_deploy.sh to make it more descriptive.

* Also modify auto_deploy.sh so we can run it locally.

* Use the existing worker ci image as a base image to deduplicate the
  Dockerfile.

* Attach labels to the deployment not the cluster
  * We want to use deployments, not clusters, to decide what to recycle
  * Deployments are global but clusters are zonal and we want to
    be able to move to different zones to deal with issues like stockouts.

* The GCP API does return labels on deployments.

* We can figure out which deployment to recycle just by looking at the
  insertTime; we don't need to depend on deployment labels.

* Add retries to deal with kubeflow/kubeflow#1933

* Fix lint.

* With K8s 1.11 we need to set volumeName; otherwise we get a "storage class not found" error.
  Related to kubeflow/kubeflow#2475

* Fix lint.

* Change cron job to run every 12 hours.
  * This should be the current schedule but it looks like it was never checked
    in
  * We want to leave clusters up long enough to facilitate debugging.
@jlewi
Contributor Author

jlewi commented Mar 13, 2019

/assign @zabbasi

@zabbasi
Contributor

zabbasi commented Mar 15, 2019

#2710

@zabbasi
Contributor

zabbasi commented Mar 15, 2019

/close

@k8s-ci-robot
Contributor

@zabbasi: Closing this issue.

In response to this:

/close


0.5.0 automation moved this from New to Done Mar 15, 2019
@jlewi jlewi reopened this May 15, 2019
0.5.0 automation moved this from Done to New May 15, 2019
@jlewi
Contributor Author

jlewi commented May 15, 2019

#2710 didn't fix the issue. We need to set storageClassName on the volume as well.
We will need to cherry pick the fix onto the v0.5-branch.
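
To illustrate why the class has to be set on both objects, here is a simplified Python check. It is an approximation of the matching rule, not the actual Kubernetes controller logic: it assumes that a claim with volumeName set only binds when the volume's storageClassName equals the claim's, with a missing field treated as the empty class. The function name binding_problems and the sample objects are hypothetical.

```python
def binding_problems(pv, pvc):
    """Return reasons the claim would not match the volume (empty list if none)."""
    problems = []
    pv_spec = pv.get("spec", {})
    pvc_spec = pvc.get("spec", {})

    # Assumption: an absent storageClassName counts as the empty class "".
    pv_class = pv_spec.get("storageClassName", "")
    pvc_class = pvc_spec.get("storageClassName", "")
    if pv_class != pvc_class:
        problems.append(
            "storageClassName mismatch: PV has %r, PVC requests %r"
            % (pv_class, pvc_class)
        )

    # The claim must name the volume it expects to bind to.
    wanted = pvc_spec.get("volumeName")
    actual = pv.get("metadata", {}).get("name")
    if wanted != actual:
        problems.append("volumeName %r does not name PV %r" % (wanted, actual))
    return problems


# Hypothetical objects: the claim is already fixed, the first volume is not.
pvc = {"metadata": {"name": "cloud-nfs-pvc"},
       "spec": {"storageClassName": "nfs-storage",
                "volumeName": "cloud-nfs-pv"}}
pv_without_class = {"metadata": {"name": "cloud-nfs-pv"}, "spec": {}}
pv_with_class = {"metadata": {"name": "cloud-nfs-pv"},
                 "spec": {"storageClassName": "nfs-storage"}}

print(binding_problems(pv_without_class, pvc))
print(binding_problems(pv_with_class, pvc))
```

Under these assumptions the volume without a class is reported as a mismatch, while the volume that repeats the claim's class passes.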

jlewi added a commit to jlewi/kubeflow that referenced this issue May 15, 2019
jlewi added a commit to jlewi/kubeflow that referenced this issue May 15, 2019
jlewi added a commit to jlewi/kubeflow that referenced this issue May 16, 2019
… filestore.

Related to: kubeflow#2475

Update kf_is_ready_test to print out the namespace we are monitoring
for the deployments; makes it easier to debug issues like kubeflow#3273 that was
causing test failures.
k8s-ci-robot pushed a commit that referenced this issue May 17, 2019
… filestore. (#3268)

Related to: #2475

Update kf_is_ready_test to print out the namespace we are monitoring
for the deployments; makes it easier to debug issues like #3273 that was
causing test failures.
avdaredevil pushed a commit to avdaredevil/kubeflow that referenced this issue May 23, 2019
@jlewi jlewi added this to To Do in Needs Triage Jul 26, 2019
@jlewi
Contributor Author

jlewi commented Aug 11, 2019

Fixed by #3268

@jlewi jlewi closed this as completed Aug 11, 2019
Needs Triage automation moved this from To Do to Closed Aug 11, 2019
@jlewi jlewi removed this from Closed in Needs Triage Aug 19, 2019
saffaalvi pushed a commit to StatCan/kubeflow that referenced this issue Feb 11, 2021
saffaalvi pushed a commit to StatCan/kubeflow that referenced this issue Feb 12, 2021