
fix(cleanup): Fixed creation of cleanUp Job #363

Merged
akhilerm merged 4 commits into openebs-archive:master from rahulchheda:tolerations
Jan 10, 2020

Conversation

Contributor

@rahulchheda rahulchheda commented Jan 9, 2020

- Added function to get Node Object using Cleaner Operator
- Added function to get Tolerations from the Node Taints
- Added one more parameter in NewCleanupJob to set Tolerations of Job to be created.

Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
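The taint-to-toleration mapping described in the bullets above can be sketched as follows. This is a simplified, self-contained sketch: `Taint` and `Toleration` here are local stand-ins for the real `corev1.Taint`/`corev1.Toleration` types from k8s.io/api/core/v1, and the function name and signature are assumptions for illustration, not the PR's exact code.

```go
package main

import "fmt"

// Taint is a simplified stand-in for corev1.Taint.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// Toleration is a simplified stand-in for corev1.Toleration.
type Toleration struct {
	Key      string
	Operator string // "Equal" or "Exists"
	Value    string
	Effect   string
}

// getTolerationsFromTaints builds one matching toleration per node taint,
// so a cleanup Job can still be scheduled on the tainted node.
func getTolerationsFromTaints(taints []Taint) []Toleration {
	tolerations := make([]Toleration, 0, len(taints))
	for _, t := range taints {
		op := "Equal"
		if t.Value == "" {
			// A taint without a value is tolerated via the Exists operator.
			op = "Exists"
		}
		tolerations = append(tolerations, Toleration{
			Key:      t.Key,
			Operator: op,
			Value:    t.Value,
			Effect:   t.Effect,
		})
	}
	return tolerations
}

func main() {
	// Mirrors the taint applied in the logs below: user=pass:NoSchedule.
	taints := []Taint{{Key: "user", Value: "pass", Effect: "NoSchedule"}}
	fmt.Println(getTolerationsFromTaints(taints))
}
```

These tolerations would then be passed as the extra parameter to NewCleanupJob so the generated Job's pod template tolerates every taint on the target node.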

Logs: these are sequential logs of the steps I performed:

rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get pods -n openebs
NAME                                           READY   STATUS    RESTARTS   AGE
maya-apiserver-5865d7d548-bmzck                1/1     Running   0          2d
openebs-admission-server-684d455954-zvgrv      1/1     Running   0          2d
openebs-localpv-provisioner-58f5df5458-ttmw8   1/1     Running   0          2d
openebs-ndm-8c4cs                              1/1     Running   0          21s
openebs-ndm-dmf9f                              1/1     Running   0          21s
openebs-ndm-gw9bx                              1/1     Running   0          21s
openebs-ndm-operator-8456cf565c-kh9mk          1/1     Running   0          2m2s
openebs-provisioner-594bd6cf78-4zqf5           1/1     Running   0          2d
openebs-snapshot-operator-777cfb86df-kpzs8     2/2     Running   0          2d
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl apply -f bdc.yaml -n openebs
blockdeviceclaim.openebs.io/example-blockdeviceclaim created
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get bdc -n openebs
NAME                       BLOCKDEVICENAME                                PHASE   AGE
example-blockdeviceclaim   blockdevice-860ab5d1205c0f2891d5bbfea2e223e5   Bound   10s
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get pods -n openebs
NAME                                           READY   STATUS    RESTARTS   AGE
maya-apiserver-5865d7d548-bmzck                1/1     Running   0          2d
openebs-admission-server-684d455954-zvgrv      1/1     Running   0          2d
openebs-localpv-provisioner-58f5df5458-ttmw8   1/1     Running   0          2d
openebs-ndm-8c4cs                              1/1     Running   0          51s
openebs-ndm-dmf9f                              1/1     Running   0          51s
openebs-ndm-gw9bx                              1/1     Running   0          51s
openebs-ndm-operator-8456cf565c-kh9mk          1/1     Running   0          2m32s
openebs-provisioner-594bd6cf78-4zqf5           1/1     Running   0          2d
openebs-snapshot-operator-777cfb86df-kpzs8     2/2     Running   0          2d
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl taint nodes --all user=pass:NoSchedule
node/gke-standard-cluster-4-default-pool-ad827b74-jvcc tainted
node/gke-standard-cluster-4-default-pool-ad827b74-rj80 tainted
node/gke-standard-cluster-4-default-pool-ad827b74-w2j4 tainted
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl delete bdc example-blockdeviceclaim -n openebs
blockdeviceclaim.openebs.io "example-blockdeviceclaim" deleted
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get pods -n openebs
NAME                                                         READY   STATUS      RESTARTS   AGE
cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5-gmn2q   0/1     Completed   0          3s
maya-apiserver-5865d7d548-bmzck                              1/1     Running     0          2d
openebs-admission-server-684d455954-zvgrv                    1/1     Running     0          2d
openebs-localpv-provisioner-58f5df5458-ttmw8                 1/1     Running     0          2d
openebs-ndm-8c4cs                                            1/1     Running     0          116s
openebs-ndm-dmf9f                                            1/1     Running     0          116s
openebs-ndm-gw9bx                                            1/1     Running     0          116s
openebs-ndm-operator-8456cf565c-kh9mk                        1/1     Running     0          3m37s
openebs-provisioner-594bd6cf78-4zqf5                         1/1     Running     0          2d
openebs-snapshot-operator-777cfb86df-kpzs8                   2/2     Running     0          2d
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get pods -n openebs
NAME                                                         READY   STATUS      RESTARTS   AGE
cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5-gmn2q   0/1     Completed   0          5s
maya-apiserver-5865d7d548-bmzck                              1/1     Running     0          2d
openebs-admission-server-684d455954-zvgrv                    1/1     Running     0          2d
openebs-localpv-provisioner-58f5df5458-ttmw8                 1/1     Running     0          2d
openebs-ndm-8c4cs                                            1/1     Running     0          118s
openebs-ndm-dmf9f                                            1/1     Running     0          118s
openebs-ndm-gw9bx                                            1/1     Running     0          118s
openebs-ndm-operator-8456cf565c-kh9mk                        1/1     Running     0          3m39s
openebs-provisioner-594bd6cf78-4zqf5                         1/1     Running     0          2d
openebs-snapshot-operator-777cfb86df-kpzs8                   2/2     Running     0          2d
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ 

    - Added function to get Node Object using Cleaner Operator
    - Added function to get Tolerations from the Node Taints
    - Added one more parameter in NewCleanupJob to set Tolerations of Job to be created.

Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
Comment thread pkg/cleaner/jobcontroller.go Outdated
err := c.client.Get(context.TODO(), objKey, job)

err = c.client.Delete(context.TODO(), job, client.PropagationPolicy(metav1.DeletePropagationForeground))
//err = c.client.Delete(context.TODO(), job, client.PropagationPolicy(metav1.DeletePropagationForeground))
Contributor

Why this change?

Contributor Author


Just for testing purposes, let me fix this.

Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
@codecov-io

codecov-io commented Jan 9, 2020

Codecov Report

Merging #363 into master will decrease coverage by 0.26%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #363      +/-   ##
========================================
- Coverage   51.27%    51%   -0.27%     
========================================
  Files          55     55              
  Lines        2637   2637              
========================================
- Hits         1352   1345       -7     
- Misses       1176   1182       +6     
- Partials      109    110       +1
Impacted Files Coverage Δ
cmd/ndm_daemonset/probe/udevprobe.go 60.3% <0%> (-5.35%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2cdebca...9cb9da5. Read the comment docs.

Comment thread pkg/cleaner/cleaner.go Outdated
"github.com/openebs/node-disk-manager/pkg/apis/openebs/v1alpha1"
"sigs.k8s.io/controller-runtime/pkg/client"

"github.com/openebs/node-disk-manager/cmd/ndm_daemonset/controller"

Usually this shouldn't be imported, i.e. inside the pkg structure, cmd/ shouldn't be imported.

Contributor Author


Observed this from pkg/cleaner/jobcontroller.go, let me try to get around it.

Comment thread pkg/cleaner/cleaner.go Outdated
job, err := NewCleanupJob(bd, volumeMode, c.Namespace)

// retrieve node object to pass tolerations to the Job
selectedNode, err := c.getNodeObjectByNodeName(bd.Labels[controller.KubernetesHostNameLabel])
Contributor


If this is the only place where you are using controller to get the constant, then it is better to use the constant directly than to import the package. Also, should this be HostName or nodeName?

Contributor


cc: @akhilerm

Contributor


This will give the hostname, @kmova and @rahulchheda. The method should be getNodeObjectByHostName. I think in localpv also we are using the same method. We can filter from nodes based on the hostname label.

Contributor

@akhilerm akhilerm left a comment


Either hostname should be used, or spec.nodeAttributes.nodeName can be used to get the exact node name.
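The hostname-label filtering suggested in the review can be sketched as below. This is a simplified, self-contained sketch: `Node` is a local stand-in for `corev1.Node`, and the function name `getNodeObjectByHostName` follows the reviewer's suggestion; the real implementation would list nodes through the controller-runtime client instead of taking a slice.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a minimal stand-in for corev1.Node with only the fields used here.
type Node struct {
	Name   string
	Labels map[string]string
}

// hostNameLabel is the well-known Kubernetes hostname label.
const hostNameLabel = "kubernetes.io/hostname"

// getNodeObjectByHostName filters a node list on the hostname label,
// returning the node whose label value matches the given hostname.
func getNodeObjectByHostName(nodes []Node, hostName string) (*Node, error) {
	for i := range nodes {
		if nodes[i].Labels[hostNameLabel] == hostName {
			return &nodes[i], nil
		}
	}
	return nil, errors.New("no node found with hostname " + hostName)
}

func main() {
	nodes := []Node{
		{Name: "node-a", Labels: map[string]string{hostNameLabel: "gke-pool-w2j4"}},
	}
	n, err := getNodeObjectByHostName(nodes, "gke-pool-w2j4")
	fmt.Println(n.Name, err)
}
```

Matching on the label avoids assuming the node's object name equals its hostname, which is exactly the HostName-vs-nodeName ambiguity raised in this thread.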

@rahulchheda rahulchheda changed the title [Fix] ndm-operator : Fixed creation of cleanUp pod (fix) ndm-operator : Fixed creation of cleanUp pod Jan 9, 2020
    - removed cmd dependencies created
    - To do the above, moved function to GetNodeName jobController.
    - NodeName now is fetched from spec.NodeAttributes.NodeName

Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
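The change described in this commit, reading the node name directly from the BlockDevice spec instead of the hostname label, can be sketched with simplified stand-in types (the real BlockDevice CR type lives in pkg/apis/openebs/v1alpha1; the helper name here is illustrative):

```go
package main

import "fmt"

// NodeAttributes is a stand-in for the BlockDevice spec.nodeAttributes field.
type NodeAttributes struct {
	NodeName string
}

// BlockDeviceSpec is a stand-in carrying only the field used here.
type BlockDeviceSpec struct {
	NodeAttributes NodeAttributes
}

// BlockDevice is a minimal stand-in for the BlockDevice custom resource.
type BlockDevice struct {
	Spec BlockDeviceSpec
}

// getNodeName returns the exact node name recorded on the BlockDevice,
// avoiding any dependency on the kubernetes.io/hostname label.
func getNodeName(bd *BlockDevice) string {
	return bd.Spec.NodeAttributes.NodeName
}

func main() {
	bd := &BlockDevice{Spec: BlockDeviceSpec{
		NodeAttributes: NodeAttributes{NodeName: "gke-standard-cluster-4-default-pool-ad827b74-w2j4"},
	}}
	fmt.Println(getNodeName(bd))
}
```

Using spec.NodeAttributes.NodeName also removes the pkg-to-cmd import flagged earlier, since the constant for the hostname label is no longer needed.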
Contributor Author

@rahulchheda rahulchheda left a comment


Updated testing logs:

rahul_chheda_mayadata_io@rahulchheda:~$ kubectl get bd -n openebs
NAME                                           NODENAME                                            SIZE          CLAIMSTATE   STATUS   AGE
blockdevice-693cb5e38a3bae3ea9bf1a9854cdbad5   gke-standard-cluster-4-default-pool-ad827b74-jvcc   10737418240   Unclaimed    Active   5s
blockdevice-860ab5d1205c0f2891d5bbfea2e223e5   gke-standard-cluster-4-default-pool-ad827b74-w2j4   10737418240   Unclaimed    Active   14s
blockdevice-9912682a2d1c2f85e77a03d311c44c5a   gke-standard-cluster-4-default-pool-ad827b74-rj80   10737418240   Unclaimed    Active   17s
rahul_chheda_mayadata_io@rahulchheda:~$ ls
chaosEngine.yaml         litmus2SA.yaml          metac       nginxDefault.yaml  saDefault.yaml        snap
litmus2ClusterRole.yaml  litmusClusterRole.yaml  new_e.yaml  openebs            saDefaultLitmus.yaml
rahul_chheda_mayadata_io@rahulchheda:~$ cd openebs/
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl apply -f bdc.yaml -n openebs
blockdeviceclaim.openebs.io/example-blockdeviceclaim created
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get bd -n openebs
NAME                                           NODENAME                                            SIZE          CLAIMSTATE   STATUS   AGE
blockdevice-693cb5e38a3bae3ea9bf1a9854cdbad5   gke-standard-cluster-4-default-pool-ad827b74-jvcc   10737418240   Unclaimed    Active   52s
blockdevice-860ab5d1205c0f2891d5bbfea2e223e5   gke-standard-cluster-4-default-pool-ad827b74-w2j4   10737418240   Claimed      Active   1m
blockdevice-9912682a2d1c2f85e77a03d311c44c5a   gke-standard-cluster-4-default-pool-ad827b74-rj80   10737418240   Unclaimed    Active   1m
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get pods -n opeenbs
No resources found in opeenbs namespace.
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get pods -n openebs
NAME                                           READY   STATUS    RESTARTS   AGE
maya-apiserver-5865d7d548-bmzck                1/1     Running   0          2d17h
openebs-admission-server-684d455954-zvgrv      1/1     Running   0          2d17h
openebs-localpv-provisioner-58f5df5458-ttmw8   1/1     Running   0          2d17h
openebs-ndm-6s4r2                              1/1     Running   0          87s
openebs-ndm-j7jhm                              1/1     Running   0          89s
openebs-ndm-operator-8456cf565c-tq98h          1/1     Running   0          2m4s
openebs-ndm-pkp7s                              1/1     Running   0          78s
openebs-provisioner-594bd6cf78-4zqf5           1/1     Running   0          2d17h
openebs-snapshot-operator-777cfb86df-kpzs8     2/2     Running   0          2d17h
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl taint nodes --all user=pass:NoSchedule
node/gke-standard-cluster-4-default-pool-ad827b74-jvcc tainted
node/gke-standard-cluster-4-default-pool-ad827b74-rj80 tainted
node/gke-standard-cluster-4-default-pool-ad827b74-w2j4 tainted
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl taint nodes --all user123=pass123:NoSchedule
node/gke-standard-cluster-4-default-pool-ad827b74-jvcc tainted
node/gke-standard-cluster-4-default-pool-ad827b74-rj80 tainted
node/gke-standard-cluster-4-default-pool-ad827b74-w2j4 tainted
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get bdc -n openebs
NAME                       BLOCKDEVICENAME                                PHASE   AGE
example-blockdeviceclaim   blockdevice-860ab5d1205c0f2891d5bbfea2e223e5   Bound   1m
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl delete bdc example-blockdeviceclaim -n openebs
blockdeviceclaim.openebs.io "example-blockdeviceclaim" deleted
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get pods -n openebs
NAME                                                         READY   STATUS      RESTARTS   AGE
cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5-j4srj   0/1     Completed   0          4s
maya-apiserver-5865d7d548-bmzck                              1/1     Running     0          2d17h
openebs-admission-server-684d455954-zvgrv                    1/1     Running     0          2d17h
openebs-localpv-provisioner-58f5df5458-ttmw8                 1/1     Running     0          2d17h
openebs-ndm-6s4r2                                            1/1     Running     0          2m12s
openebs-ndm-j7jhm                                            1/1     Running     0          2m14s
openebs-ndm-operator-8456cf565c-tq98h                        1/1     Running     0          2m49s
openebs-ndm-pkp7s                                            1/1     Running     0          2m3s
openebs-provisioner-594bd6cf78-4zqf5                         1/1     Running     0          2d17h
openebs-snapshot-operator-777cfb86df-kpzs8                   2/2     Running     0          2d17h
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl describe pod cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5-j4srj -n openebs
Name:           cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5-j4srj
Namespace:      openebs
Priority:       0
Node:           gke-standard-cluster-4-default-pool-ad827b74-w2j4/10.128.0.69
Start Time:     Fri, 10 Jan 2020 07:43:25 +0000
Labels:         controller-uid=e2852632-337c-11ea-adb6-42010a800fc1
                job-name=cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5
Annotations:    <none>
Status:         Succeeded
IP:             10.96.1.20
IPs:            <none>
Controlled By:  Job/cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5
Containers:
  cleaner:
    Container ID:  docker://d8d5b4b61ddb322cf4455d8b5ea86fd1ee77676f18086c031d509f8d515f6240
    Image:         quay.io/openebs/linux-utils:1.5.0
    Image ID:      docker-pullable://quay.io/openebs/linux-utils@sha256:3e2853a706795c8e6fe8de8c7db68fcb9d6aaec108c50fc17136f8e181cc5f6c
    Port:          <none>
    Host Port:     <none>
    Command:
      wipefs
      -af
      /dev/sdb
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 10 Jan 2020 07:43:27 +0000
      Finished:     Fri, 10 Jan 2020 07:43:27 +0000
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from openebs-maya-operator-token-cxzkf (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  openebs-maya-operator-token-cxzkf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  openebs-maya-operator-token-cxzkf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/hostname=gke-standard-cluster-4-default-pool-ad827b74-w2j4
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 user=pass:NoSchedule
                 user123=pass123:NoSchedule
Events:
  Type    Reason          Age   From                                                        Message
  ----    ------          ----  ----                                                        -------
  Normal  Scheduled       14s   default-scheduler                                           Successfully assigned openebs/cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5-j4srj to gke-standard-cluster-4-default-pool-ad827b74-w2j4
  Normal  Pulled          13s   kubelet, gke-standard-cluster-4-default-pool-ad827b74-w2j4  Container image "quay.io/openebs/linux-utils:1.5.0" already present on machine
  Normal  Created         13s   kubelet, gke-standard-cluster-4-default-pool-ad827b74-w2j4  Created container
  Normal  Started         12s   kubelet, gke-standard-cluster-4-default-pool-ad827b74-w2j4  Started container
  Normal  SandboxChanged  11s   kubelet, gke-standard-cluster-4-default-pool-ad827b74-w2j4  Pod sandbox changed, it will be killed and re-created.

Contributor

@akhilerm akhilerm left a comment


Add comments to the new functions. Otherwise LGTM

Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
@rahulchheda rahulchheda changed the title (fix) ndm-operator : Fixed creation of cleanUp pod fix(cleeanup) : Fixed creation of cleanUp pod Jan 10, 2020
@rahulchheda rahulchheda changed the title fix(cleeanup) : Fixed creation of cleanUp pod fix(cleanup) : Fixed creation of cleanUp pod Jan 10, 2020
@rahulchheda rahulchheda changed the title fix(cleanup) : Fixed creation of cleanUp pod fix(cleanup) : Fixed creation of cleanUp Job Jan 10, 2020
Contributor

@akhilerm akhilerm left a comment


changes look good

@akhilerm akhilerm merged commit a6eb990 into openebs-archive:master Jan 10, 2020
rahulchheda added a commit to rahulchheda/node-disk-manager that referenced this pull request Jan 10, 2020
Cleanup pods will now be launched with tolerations for the node on which the released
blockdevice is present. This prevents Jobs from being stuck in Pending state, and
blockDevices from being stuck in Released state, when one or more nodes in the cluster are tainted.
    - Added function to get Node Object using Cleaner Operator
    - Added function to get Tolerations from the Node Taints
    - Added one more parameter in NewCleanupJob to set Tolerations of Job to be created.

Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
akhilerm pushed a commit that referenced this pull request Jan 10, 2020
Cleanup pods will now be launched with tolerations for the node on which the released
blockdevice is present. This prevents Jobs from being stuck in Pending state, and
blockDevices from being stuck in Released state, when one or more nodes in the cluster are tainted.
    - Added function to get Node Object using Cleaner Operator
    - Added function to get Tolerations from the Node Taints
    - Added one more parameter in NewCleanupJob to set Tolerations of Job to be created.

Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
@kmova kmova added this to the 0.4.6 milestone Feb 5, 2020