Deployment on AWS fails waiting for persistent volume #17

Closed · aktech opened this issue Jun 24, 2020 · 10 comments

Comments

@aktech
Member

aktech commented Jun 24, 2020

Steps to reproduce:

  • Render the aws project
  • Run terraform init, then terraform apply in the infrastructure directory (see the sketch below)
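
A minimal sketch of the reproduction, assuming the rendered Terraform lives in an infrastructure/ directory:

# after rendering the aws project, run terraform from the generated directory
cd infrastructure
terraform init
terraform apply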

First I got this:

Error: namespaces "dev" not found

  on .terraform/modules/kubernetes-conda-store-mount/modules/kubernetes/nfs-mount/main.tf line 38, in resource "kubernetes_persistent_volume_claim" "main":
  38: resource "kubernetes_persistent_volume_claim" "main" {

Then I ran terraform apply again, and every time I got the following:

Error: timeout while waiting for state to become 'Bound' (last state: 'Pending', timeout: 5m0s)

  on .terraform/modules/kubernetes-conda-store-server/modules/kubernetes/services/conda-store/main.tf line 1, in resource "kubernetes_persistent_volume_claim" "main":
   1: resource "kubernetes_persistent_volume_claim" "main" {
@aktech
Member Author

aktech commented Jun 24, 2020

On terraform destroy:

Error: error deleting subnet (subnet-0513a3d069c573aa4): timeout while waiting for state to become 'destroyed' (last state: 'pending', timeout: 20m0s)
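
A subnet usually cannot be deleted while network interfaces (for example from load balancers or worker nodes) are still attached to it. A quick way to check, assuming the AWS CLI is configured for the same account and region:

# list network interfaces still attached to the stuck subnet
aws ec2 describe-network-interfaces \
  --filters Name=subnet-id,Values=subnet-0513a3d069c573aa4 \
  --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Desc:Description,Status:Status}'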

@costrouc
Member

I ran into this same issue, and here is the solution/workaround I found. I would like qhub to handle this case automatically, but at the same time I "like" that this is difficult, since resizing a PVC will delete the old one and thus delete all user data. Eventually there needs to be a better way to do this... GCP does not support disk resizing, but other storage providers do, e.g. Rook/Ceph.

Here is the workaround:

# remove the conda-store and NFS deployments along with their claims and volumes (data on them is lost)
kubectl delete -n dev deployments conda-store-conda-store nfs-server-nfs
kubectl delete -n dev pvc conda-store-dev-share nfs-mount-dev-share
kubectl delete -n dev pv conda-store-dev-share nfs-mount-dev-share

Then reapply the terraform deployment and it should create the needed volumes... What I sort of like about this is that we are forcing users to delete the shared filesystems themselves, which makes losing data harder (kind of a "this bug is a feature :)" situation).
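
For example (a sketch, run from the same Terraform directory as the original deployment):

# re-create the deleted volumes and redeploy conda-store and the NFS server
terraform apply
# the claims should now reach the Bound state
kubectl get -n dev pvc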

@aktech
Member Author

aktech commented Jul 1, 2020

Interesting, but the problem I faced with AWS was on a fresh deployment, which didn't have any cluster already present, so this problem shouldn't happen there, right?

@aktech
Member Author

aktech commented Jul 1, 2020

They seem to still be there after running the above commands. The following hack seems to work:

@costrouc transferred this issue from Quansight/qhub-ops on Aug 18, 2020
@tylerpotts
Contributor

Document the solution and how to delete the persistent volume. We will leave the solution manual so that user data is not automatically deleted.

@filippo82
Contributor

Hi all, is there any update on this issue?

@tylerpotts
Contributor

@filippo82 There is. With the newer terraform/kubernetes update, the persistent volume size can be increased without it being deleted/destroyed. I've verified this as of this past week.
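
For reference, a sketch of what the resize looks like at the Kubernetes level, assuming the storage class has allowVolumeExpansion enabled and reusing the PVC name from the workaround above (200Gi is only an example value):

# bump the requested size in place; the PVC is not deleted or recreated
kubectl patch -n dev pvc conda-store-dev-share \
  -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
kubectl get -n dev pvc conda-store-dev-share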

@filippo82
Contributor

Hi @tylerpotts, I believe the issue @aktech was experiencing in June (and which I was also experiencing yesterday) was somehow related to the order in which Terraform creates resources. This should now be fixed by #129: first build the Kubernetes cluster with terraform apply -auto-approve -target=module.kubernetes -target=module.kubernetes-initialization -target=module.kubernetes-ingress, and then everything else with a general terraform apply.
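
Formatted as a sketch, the two-stage apply looks like this:

# stage 1: build only the cluster, its initialization, and the ingress
terraform apply -auto-approve \
  -target=module.kubernetes \
  -target=module.kubernetes-initialization \
  -target=module.kubernetes-ingress

# stage 2: create everything else, including the persistent volume claims
terraform apply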

So this issue is now fixed for me and, I believe, it is fully taken care of by qhub deploy, so it can probably be closed.

The issue I am still having is with terraform destroy of the QHub deployment. It has been a nightmare so far :/

I've opened this issue to discuss that: #144

Best,
-Filippo

@tylerpotts
Contributor

@filippo82 Thanks for the clarification. I'll close out this issue

@prasunanand
Contributor

prasunanand commented Nov 6, 2020

Solution: the issue is with the conda environment syntax, so the conda environment is not ready and hence the PV fails. If the environment takes too long to build, this error may still appear.
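
A sketch of how to confirm this, reusing the dev namespace and resource names from the workaround above:

# check why the claim is still Pending and whether conda-store is still building environments
kubectl describe -n dev pvc conda-store-dev-share
kubectl logs -n dev deployment/conda-store-conda-store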
