Using OpenEBS on Azure Kubernetes Cluster #1335

Closed
jay-wilson opened this Issue Mar 13, 2018 · 9 comments

jay-wilson commented Mar 13, 2018

BUG/QUERY

What happened:
I was trying out the sample postgres use case as given in https://docs.openebs.io/docs/CrunchyPostgres.html

I created a three-node Kubernetes cluster in Azure of Standard_A0 type. At this point, this is the sc output:

kubectl get sc
NAME                PROVISIONER                AGE
default (default)   kubernetes.io/azure-disk   59m
managed-premium     kubernetes.io/azure-disk   59m

I used the openebs operator and storageclass YAML files and then the postgres cluster YAML file to create the storage classes and StatefulSet, upon which I see that one of the pods is not completely up. I believe the error points to the volume; I would appreciate any pointers.

kubectl get pods
NAME                                                            READY   STATUS              RESTARTS   AGE
maya-apiserver-7b8f548dd8-67s6x                                 1/1     Running             0          36m
openebs-provisioner-7958c6d44f-g9qvr                            1/1     Running             0          36m
pgset-0                                                         0/1     ContainerCreating   0          32m
pvc-febcc15e-25d7-11e8-92c2-0a58ac1f1190-ctrl-7d7c98745-49qcm   2/2     Running             0          32m
pvc-febcc15e-25d7-11e8-92c2-0a58ac1f1190-rep-578b5bcc6b-5758m   1/1     Running             0          32m
pvc-febcc15e-25d7-11e8-92c2-0a58ac1f1190-rep-578b5bcc6b-zkhn8   1/1     Running             0          32m

Below are excerpts from kubectl describe pod pgset-0

Volumes:
  pgdata:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pgdata-pgset-0
    ReadOnly:   false
  default-token-cgpxf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-cgpxf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s

Events:
  Type     Reason                 Age                 From                               Message
  Warning  FailedScheduling       25m (x3 over 25m)   default-scheduler                  PersistentVolumeClaim is not bound: "pgdata-pgset-0" (repeated 3 times)
  Normal   Scheduled              25m                 default-scheduler                  Successfully assigned pgset-0 to aks-nodepool1-18777710-1
  Normal   SuccessfulMountVolume  25m                 kubelet, aks-nodepool1-18777710-1  MountVolume.SetUp succeeded for volume "default-token-cgpxf"
  Warning  FailedMount            43s (x11 over 23m)  kubelet, aks-nodepool1-18777710-1  Unable to mount volumes for pod "pgset-0_default(febd7e43-25d7-11e8-92c2-0a58ac1f1190)": timeout expired waiting for volumes to attach/mount for pod "default"/"pgset-0". list of unattached/unmounted volumes=[pgdata]
  Warning  FailedSync             43s (x11 over 23m)  kubelet, aks-nodepool1-18777710-1  Error syncing pod

What you expected to happen:
The pgset-0 pod to be running, and all PVCs and PVs to be available.

How to reproduce it (as minimally and precisely as possible):
Create the azure resource group and kubernetes cluster
az group create --name JayPgResourceGroup --location canadacentral
az provider register -n Microsoft.ContainerService
az provider register -n Microsoft.Compute
az provider register -n Microsoft.Network

az aks create --resource-group JayPgResourceGroup --name JayPgDbReplicaSet --node-count 3 --kubernetes-version 1.8.7 --node-vm-size Standard_A0 --generate-ssh-keys

Create the OpenEBS operator and storage classes
kubectl create -f https://raw.githubusercontent.com/openebs/openebs/master/k8s/openebs-operator.yaml
kubectl create -f https://raw.githubusercontent.com/openebs/openebs/master/k8s/openebs-storageclasses.yaml

Create the crunchy data postgres as explained in https://docs.openebs.io/docs/CrunchyPostgres.html

Anything else we need to know?:
I am running on Ubuntu 16.04.3 LTS. From what I can see, the iSCSI service is running fine. Attaching the config for your perusal:
sudo cat /etc/iscsi/initiatorname.iscsi (from each of the nodes)

InitiatorName=iqn.1993-08.org.debian:01:79c32a3367ac

InitiatorName=iqn.1993-08.org.debian:01:3a67dbe34663

InitiatorName=iqn.1993-08.org.debian:01:f7c3c1279dc3
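
As a side note, the IQN can be pulled out of each node's config with a small helper; this is only a sketch, assuming the standard /etc/iscsi/initiatorname.iscsi layout shown above:

```shell
#!/bin/sh
# Extract the iSCSI initiator IQN from initiatorname.iscsi content on stdin.
# Lines of interest look like: InitiatorName=iqn.1993-08.org.debian:01:79c32a3367ac
get_initiator_name() {
  sed -n 's/^InitiatorName=//p'
}

# Manual usage on a node (requires root to read the file):
#   sudo cat /etc/iscsi/initiatorname.iscsi | get_initiator_name
```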

Environment:

  • kubectl get nodes
    NAME                       STATUS   ROLES   AGE   VERSION
    aks-nodepool1-18777710-0   Ready    agent   2h    v1.8.7
    aks-nodepool1-18777710-1   Ready    agent   2h    v1.8.7
    aks-nodepool1-18777710-2   Ready    agent   2h    v1.8.7

  • kubectl get pods --all-namespaces
    NAMESPACE     NAME                                                             READY   STATUS              RESTARTS   AGE
    default       maya-apiserver-7b8f548dd8-q6kqb                                  1/1     Running             0          2h
    default       openebs-provisioner-7958c6d44f-2h9t6                             1/1     Running             0          2h
    default       pgset-0                                                          0/1     ContainerCreating   0          2h
    default       pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3-ctrl-6f4f6d6d7f-jzxqk   2/2     Running             0          2h
    default       pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3-rep-7989bbd946-cswxf    1/1     Running             0          2h
    default       pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3-rep-7989bbd946-np5xl    1/1     Running             0          2h
    kube-system   heapster-669488959c-bnpst                                        2/2     Running             0          2h
    kube-system   kube-dns-v20-5bf84586f4-hq7tz                                    3/3     Running             0          2h
    kube-system   kube-dns-v20-5bf84586f4-nxcg6                                    3/3     Running             0          2h
    kube-system   kube-proxy-cffj6                                                 1/1     Running             0          2h
    kube-system   kube-proxy-h6tn5                                                 1/1     Running             0          2h
    kube-system   kube-proxy-js5xt                                                 1/1     Running             0          2h
    kube-system   kube-svc-redirect-7w5jw                                          1/1     Running             0          2h
    kube-system   kube-svc-redirect-tzs5d                                          1/1     Running             0          2h
    kube-system   kube-svc-redirect-zdbhw                                          1/1     Running             0          2h
    kube-system   kubernetes-dashboard-69bb965b88-hkmrk                            1/1     Running             0          2h
    kube-system   tunnelfront-696789ffb-df2bh                                      1/1     Running             0          2h

  • kubectl get services
    NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
    kubernetes ClusterIP 10.0.0.1 443/TCP 2h
    maya-apiserver-service ClusterIP 10.0.112.14 5656/TCP 2h
    pgset ClusterIP None 5432/TCP 2h
    pgset-primary ClusterIP 10.0.40.188 5432/TCP 2h
    pgset-replica ClusterIP 10.0.166.64 5432/TCP 2h
    pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3-ctrl-svc ClusterIP 10.0.20.229 3260/TCP,9501/TCP 2h

  • kubectl get sc
    NAME                 PROVISIONER                    AGE
    default (default)    kubernetes.io/azure-disk       2h
    managed-premium      kubernetes.io/azure-disk       2h
    openebs-cassandra    openebs.io/provisioner-iscsi   2h
    openebs-es-data-sc   openebs.io/provisioner-iscsi   2h
    openebs-jupyter      openebs.io/provisioner-iscsi   2h
    openebs-kafka        openebs.io/provisioner-iscsi   2h
    openebs-mongodb      openebs.io/provisioner-iscsi   2h
    openebs-percona      openebs.io/provisioner-iscsi   2h
    openebs-redis        openebs.io/provisioner-iscsi   2h
    openebs-standalone   openebs.io/provisioner-iscsi   2h
    openebs-standard     openebs.io/provisioner-iscsi   2h
    openebs-zk           openebs.io/provisioner-iscsi   2h

  • kubectl get pv
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS       REASON   AGE
    pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3   400M       RWO            Delete           Bound    default/pgdata-pgset-0   openebs-standard            2h

  • kubectl get pvc
    NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
    pgdata-pgset-0   Bound    pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3   400M       RWO            openebs-standard   2h

  • OS (e.g. from /etc/os-release):
    cat /etc/os-release
    NAME="Ubuntu"
    VERSION="16.04.3 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.3 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial

  • Kernel (e.g. uname -a):
    Linux aks-nodepool1-18777710-2 4.13.0-1007-azure #9-Ubuntu SMP Thu Jan 25 10:47:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

ksatchit commented Mar 13, 2018

@jay-wilson, thanks for trying OpenEBS on AKS & sharing the logs! Here is some initial analysis on this cluster:

The AKS cluster runs Ubuntu 16.04 LTS with the kubelet running in a container (debian-jessie 8). The kubelet logs show the absence of the iSCSI initiator, which is why the volume is not attached to the node.

  • Kubelet logs from Node-1 (where the pgset pod is scheduled)
I0313 05:42:41.910525    7845 reconciler.go:257] operationExecutor.MountVolume started for volume "pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3" (UniqueName: "kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0") pod "pgset-0" (UID: "a9826973-2674-11e8-a384-0a58ac1f03e3") 
I0313 05:42:41.910605    7845 operation_generator.go:416] MountVolume.WaitForAttach entering for volume "pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3" (UniqueName: "kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0") pod "pgset-0" (UID: "a9826973-2674-11e8-a384-0a58ac1f03e3") DevicePath ""
E0313 05:42:41.910744    7845 iscsi_util.go:207] iscsi: could not read iface default error: 
E0313 05:42:41.910815    7845 nestedpendingoperations.go:264] Operation for "\"kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0\"" failed. No retries permitted until 2018-03-13 05:44:43.910784094 +0000 UTC (durationBeforeRetry 2m2s). Error: MountVolume.WaitForAttach failed for volume "pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3" (UniqueName: "kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0") pod "pgset-0" (UID: "a9826973-2674-11e8-a384-0a58ac1f03e3") : executable file not found in $PATH
E0313 05:43:12.080406    7845 kubelet.go:1628] Unable to mount volumes for pod "pgset-0_default(a9826973-2674-11e8-a384-0a58ac1f03e3)": timeout expired waiting for volumes to attach/mount for pod "default"/"pgset-0". list of unattached/unmounted volumes=[pgdata]; skipping pod
E0313 05:43:12.081262    7845 pod_workers.go:182] Error syncing pod a9826973-2674-11e8-a384-0a58ac1f03e3 ("pgset-0_default(a9826973-2674-11e8-a384-0a58ac1f03e3)"), skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"pgset-0". list of unattached/unmounted volumes=[pgdata]

Configuring the kubelet to run with the iSCSI utils should ideally fix this.
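
One quick way to confirm this on a node is to look for iscsiadm inside the kubelet container; a sketch, assuming the kubelet runs in a container whose `docker ps` line mentions hyperkube (the filter string is an assumption):

```shell
#!/bin/sh
# Pick the kubelet container ID out of `docker ps` output supplied on stdin.
kubelet_container_id() {
  grep hyperkube | awk '{print $1}' | head -n 1
}

# Manual usage on a node (needs docker access, so it is not run here):
#   id=$(sudo docker ps | kubelet_container_id)
#   sudo docker exec "$id" sh -c 'command -v iscsiadm || echo iscsiadm missing'
```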

@kmova kmova changed the title Trouble with pgset pod and volume in postgres sample Using OpenEBS on Azure Kubernetes Cluster Mar 17, 2018

@kmova kmova added the env/azure label Mar 17, 2018

kmova commented Mar 17, 2018

Need to find a way to automate (#1149) installing the iSCSI initiator in the kubelet container. The following steps were followed to install it manually:

  • SSH into the Kubernetes Nodes
  • Identify the docker container running the kubelet using sudo docker ps.
  • Enter the kubelet container shell
    sudo docker exec -it kubelet_container_id bash
    
  • Install open-iscsi.
    apt-get update
    apt install -y open-iscsi
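
The steps above could be scripted per node roughly as follows; a sketch, assuming a Debian-based kubelet container whose `docker ps` line mentions hyperkube (the filter string is an assumption, adjust it to match the actual image name):

```shell
#!/bin/sh
# Sketch: install open-iscsi inside the kubelet container on one AKS node.
# Guarded behind --run so the file can be sourced without side effects.
install_iscsi_in_kubelet() {
  # Find the kubelet (hyperkube) container; adjust the filter if the image differs.
  id=$(sudo docker ps | grep hyperkube | awk '{print $1}' | head -n 1)
  if [ -z "$id" ]; then
    echo "kubelet container not found" >&2
    return 1
  fi
  # Debian-based image assumed, hence apt-get.
  sudo docker exec "$id" sh -c 'apt-get update && apt-get install -y open-iscsi'
}

if [ "${1:-}" = "--run" ]; then
  install_iscsi_in_kubelet
fi
```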
    

@kmova kmova added the documentation label Mar 17, 2018

AmitKumarDas commented Mar 17, 2018

@kmova @yudaykiran Just thinking aloud here.
1/ Can we patch the kubelet pod with a kubelet image that has all of these? The patch operator would find the exact kubelet image & replace it with an iscsi-enabled kubelet image.

2/ Can we check if pod probes can do the apt-get install?

All this to avoid plumbing code around SSH & thinking from a Docker perspective.

yudaykiran commented Mar 17, 2018

@AmitKumarDas kubelet is not deployed as a pod; rather, the kubelet service runs inside a hyperkube container. I don't think this can be fixed from the Kubernetes end. It is more of a cloud-init that brings the VM up and has some configuration for running the kubelet service inside this container. This is similar to how CoreOS runs all its services inside containers.

AmitKumarDas commented Mar 17, 2018

@yudaykiran These are more unknowns. I don't understand these... Hopefully you will explain each of them in some form... BTW, is hyperkube the same as the virtual kubelet that the Azure folks are involved in?

yudaykiran commented Mar 17, 2018

@AmitKumarDas hyperkube and virtual kubelet are two different concepts.

ksatchit commented Mar 23, 2018

jpoon commented Apr 10, 2018

(just passing by)

To elaborate on @yudaykiran's comment: hyperkube and the virtual kubelet are two different things.

  • hyperkube is the all-encompassing container that includes all of the k8s system services (kube-apiserver, kube-proxy, etc.) ref: https://github.com/kubernetes/kubernetes/tree/master/cluster/images/hyperkube
  • virtual kubelet lets you use different runtimes (Azure Container Instances, AWS, Hyper.sh) to masquerade as a node. So instead of having your pod running on an actual server host, you can run it on one of the supported runtimes. One use case would be handling burst traffic: instead of waiting for a new VM to spin up, you can quickly scale out your pods and run them on ACI.

yudaykiran commented Jun 16, 2018

The Kubernetes clusters being provisioned in AKS now run the kubelet as a service by default. OpenEBS will deploy without any issues w.r.t. iSCSI connections. Hence, closing this issue.

@yudaykiran yudaykiran closed this Jun 16, 2018
