dgraph errors on restart #1909

Closed
tvvignesh opened this issue Oct 23, 2020 · 5 comments
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@tvvignesh

tvvignesh commented Oct 23, 2020

What happened:

Hi. I am running kind inside a preemptible GCP VM, so the VM (and with it kind) restarts every 24 hours. I set up Dgraph on kind and it worked great. However, after a restart of the VM or of the Docker service, the pod throws errors.

I am using the standard storage class (which I believe uses Rancher's local path provisioner). I am not sure whether this is an issue with kind or with Dgraph, so I have posted all the details and logs here: https://discuss.dgraph.io/t/dgraph-fails-to-start-on-restarts-with-kind-kubernetes/11104

What you expected to happen:

Dgraph works consistently even after restarts. I guess according to #148 this should work.

How to reproduce it (as minimally and precisely as possible):

  1. Create a kubernetes cluster locally using kind
  2. Deploy Dgraph (1 zero, 1 alpha, 1 ratel) with the helm chart, using kind's built-in standard storage class, which uses Rancher's provisioner (https://github.com/rancher/local-path-provisioner), with ReadWriteOnce set:
helm upgrade --install --namespace <namespace> dgraph -f <path>/values.yaml  --set zero.replicaCount=1 --set alpha.replicaCount=1 --set alpha.persistence.size=200Mi --set zero.persistence.size=200Mi --set alpha.persistence.accessModes={ReadWriteOnce} --set zero.persistence.accessModes={ReadWriteOnce} --set zero.persistence.storageClass=standard --set alpha.persistence.storageClass=standard ./charts/dgraph
  3. Everything works great.
  4. Now, restart docker with sudo service docker restart
  5. Dgraph fails to start with errors (pods are running though)

It works again only after I destroy the entire cluster and create it again.

Just to validate if normal pod restart works, I tried running kubectl -n db rollout restart statefulset dgraph-dgraph-alpha and everything was great.

Anything else we need to know?:

More details and logs have been added here: https://discuss.dgraph.io/t/dgraph-fails-to-start-on-restarts-with-kind-kubernetes/11104

Environment:

  • kind version: (use kind version): kind v0.9.0 go1.15.2 linux/amd64
  • Kubernetes version: (use kubectl version): v1.19.3
  • Docker version: (use docker info): 19.03.13
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.1 LTS
@tvvignesh added the kind/bug label (Categorizes issue or PR as related to a bug.) Oct 23, 2020
@BenTheElder changed the title from "Statefulset errors on restart" to "~Statefulset~ dgraph errors on restart" Oct 28, 2020
@BenTheElder changed the title from "~Statefulset~ dgraph errors on restart" to "dgraph errors on restart" Oct 28, 2020
@BenTheElder
Member

Locally, without involving dgraph: if I create a PVC and a pod that uses it with the standard storage class, write some data to it, and then restart docker, I see the pod come back with the data still there.
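
A minimal sketch of that check, for anyone who wants to repeat it. This is not the exact procedure used above: the `restart-test` name, the busybox image, the 100Mi size and the `/data/marker` path are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch of a storage-persistence check on kind. All names are hypothetical.
set -eu

NAME="restart-test"   # hypothetical name shared by the PVC and the pod

# Print a PVC on kind's "standard" storage class and a pod that mounts it.
manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $NAME
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard
  resources:
    requests:
      storage: 100Mi
---
apiVersion: v1
kind: Pod
metadata:
  name: $NAME
spec:
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: $NAME
EOF
}

# Run before restarting docker: create the objects and write a marker file.
write_marker() {
  manifest | kubectl apply -f -
  kubectl wait --for=condition=Ready "pod/$NAME" --timeout=120s
  kubectl exec "$NAME" -- sh -c 'date > /data/marker'
}

# Run after restarting docker: the marker should still be readable.
read_marker() {
  kubectl wait --for=condition=Ready "pod/$NAME" --timeout=300s
  kubectl exec "$NAME" -- cat /data/marker
}
```

Call `write_marker`, run sudo service docker restart, and call `read_marker` once the API server responds again. If the timestamp written before the restart is printed, the volume survived.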

Since the pods are running, I don't think this is a kind bug or even a local-path-storage bug. This sounds like either an issue with your specific setup (the preemptible VMs?) or dgraph.

@BenTheElder added the kind/support label (Categorizes issue or PR as a support question.) and removed the kind/bug label (Categorizes issue or PR as related to a bug.) Oct 28, 2020
@tvvignesh
Author

tvvignesh commented Oct 28, 2020

@BenTheElder Hmm. Thanks for confirming. I am not sure where to start looking. In my case I am using the same boot disk and attached disk with the preemptible VM, but I don't think the preemptible VM is the cause, because this also occurs after just sudo service docker restart, without the VM being rebooted. Your test suggests that the bug might be in dgraph (though I am not sure why it would happen if the PVC with the same data exists even after a cluster restart, and Dgraph does work after a rollout restart - strange).

Is there some place you would typically go to get the logs for debugging such failures? (I have already shared the Dgraph pod logs with the Dgraph team, anything else that would help in solving this?)

Btw, the data within the PVC does exist and is bound to the pod. For instance, if I exec inside, I find this:

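[Screenshot not preserved: listing of the PVC folders mounted in the Dgraph pod, with their creation dates]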

If you notice the date, those are the folders from the PVC mounted to Dgraph pod which were created 1 day ago (before the pre-emptible VM restart). So, the data does exist in the PVC as you rightly mentioned but Dgraph fails.

Thanks. I will keep this open till the Dgraph team comes up with their side of the puzzle after which we can probably get an idea on where the issue is. Will continue debugging.

@BenTheElder
Member

A rolling restart normally does not take all of the pods down simultaneously; it restarts them more like one at a time.

The data in the PVC is ultimately backed by a docker volume at the node level, which should persist with the container through restarts as long as docker's storage is persisted on the host.

> Is there some place you would typically go to get the logs for debugging such failures? (I have already shared the Dgraph pod logs with the Dgraph team, anything else that would help in solving this?)

I don't think there's anything to debug on the kind end, and I don't work with dgraph.

@BenTheElder
Member

This comment https://discuss.dgraph.io/t/dgraph-fails-to-start-on-restarts-with-kind-kubernetes/11104/13 makes it sound like dgraph cannot handle being shut down abruptly.

There's not a lot kind can do there if that's the case; we have no control over your manipulation of the docker service. KIND will not itself shut things down abruptly (unless you delete the cluster), and we can't exactly catch KILL from docker or similar, so ...

KIND can guarantee that the cluster survives restarts just fine, but if applications break under less than graceful restarts, there's nothing we can do. We never force such a restart ourselves.
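
If abrupt shutdown is indeed the problem, one possible workaround is to stop the dgraph pods cleanly before touching docker. The sketch below is an assumption, not the fix from the linked thread: the `db` namespace and `dgraph-dgraph-alpha` come from the report above, while `dgraph-dgraph-zero`, the shutdown order and the dry-run switch are hypothetical.

```shell
#!/bin/sh
# Sketch: scale dgraph down, restart docker, scale dgraph back up.
set -eu

NAMESPACE="${NAMESPACE:-db}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (the default)

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Poll until the named pod no longer exists (skipped in dry-run mode).
wait_gone() {
  [ "$DRY_RUN" = "1" ] && return 0
  while kubectl -n "$NAMESPACE" get pod "$1" >/dev/null 2>&1; do
    sleep 2
  done
}

graceful_docker_restart() {
  # Stop alpha, then zero (order is an assumption), so each pod
  # receives a normal termination signal.
  run kubectl -n "$NAMESPACE" scale statefulset dgraph-dgraph-alpha --replicas=0
  wait_gone dgraph-dgraph-alpha-0
  run kubectl -n "$NAMESPACE" scale statefulset dgraph-dgraph-zero --replicas=0
  wait_gone dgraph-dgraph-zero-0

  run sudo service docker restart

  # Wait for the kind API server to answer again before scaling up.
  if [ "$DRY_RUN" != "1" ]; then
    until kubectl get nodes >/dev/null 2>&1; do sleep 2; done
  fi

  run kubectl -n "$NAMESPACE" scale statefulset dgraph-dgraph-zero --replicas=1
  run kubectl -n "$NAMESPACE" scale statefulset dgraph-dgraph-alpha --replicas=1
  run kubectl -n "$NAMESPACE" rollout status statefulset dgraph-dgraph-alpha
}
```

Calling `graceful_docker_restart` with the default settings only prints the commands; set DRY_RUN=0 to execute them. This does not help with a preemptible VM being reclaimed, where the shutdown is outside your control.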

@tvvignesh
Author

@BenTheElder Thanks for looking at this issue. With some help from Dgraph team as well, I have managed to get it working. Looks like the issue is neither with Kind nor with the Local path provisioner as you rightly said. I have updated the details here: https://discuss.dgraph.io/t/dgraph-fails-to-start-on-restarts-with-kind-kubernetes/11104/14?u=tvvignesh

Thanks again. Closing this.
