fix(cleanup): Fixed creation of cleanUp Job (#363)
akhilerm merged 4 commits into openebs-archive:master
Conversation
- Added a function to get the Node object using the Cleaner Operator
- Added a function to get Tolerations from the Node Taints
- Added a parameter to NewCleanupJob to set the Tolerations of the Job to be created
Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
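The taint-to-toleration conversion described above can be sketched as follows. This is a minimal, stdlib-only illustration with stand-in `Taint` and `Toleration` structs (the real code uses the Kubernetes `corev1` types, and the function name here is hypothetical):

```go
package main

import "fmt"

// Simplified stand-ins for corev1.Taint and corev1.Toleration,
// keeping only the fields relevant to this sketch.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

type Toleration struct {
	Key      string
	Operator string
	Value    string
	Effect   string
}

// GetTolerations builds one toleration per node taint so that the
// cleanup Job's pod can still be scheduled onto the tainted node.
func GetTolerations(taints []Taint) []Toleration {
	tolerations := make([]Toleration, 0, len(taints))
	for _, t := range taints {
		tolerations = append(tolerations, Toleration{
			Key:      t.Key,
			Operator: "Equal",
			Value:    t.Value,
			Effect:   t.Effect,
		})
	}
	return tolerations
}

func main() {
	taints := []Taint{{Key: "user", Value: "pass", Effect: "NoSchedule"}}
	fmt.Println(GetTolerations(taints))
}
```

Using the `Equal` operator with the taint's own key/value/effect produces the narrowest toleration that still matches the taint; a broader `Exists` operator would also work but would tolerate unrelated taints.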
err := c.client.Get(context.TODO(), objKey, job)
...
-err = c.client.Delete(context.TODO(), job, client.PropagationPolicy(metav1.DeletePropagationForeground))
+//err = c.client.Delete(context.TODO(), job, client.PropagationPolicy(metav1.DeletePropagationForeground))
Just for testing purposes; let me fix this.
Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
Codecov Report
@@ Coverage Diff @@
## master #363 +/- ##
========================================
- Coverage 51.27% 51.00% -0.27%
========================================
Files 55 55
Lines 2637 2637
========================================
- Hits 1352 1345 -7
- Misses 1176 1182 +6
- Partials 109 110 +1
Continue to review full report at Codecov.
"github.com/openebs/node-disk-manager/pkg/apis/openebs/v1alpha1"
"sigs.k8s.io/controller-runtime/pkg/client"
...
+"github.com/openebs/node-disk-manager/cmd/ndm_daemonset/controller"
Usually this shouldn't be imported, i.e., code inside the pkg/ structure shouldn't import from cmd/.
Observed this from pkg/cleanerjobcontroller.go, let me try to get around it.
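One common way to avoid the pkg/ importing cmd/ problem is to define the constant locally instead of importing it. A minimal sketch (the local constant here is a hypothetical stand-in; `kubernetes.io/hostname` is the well-known label the kubelet sets on every node):

```go
package main

import "fmt"

// KubernetesHostNameLabel is defined locally so that code under pkg/
// does not need to import the constant from cmd/ndm_daemonset/controller.
// "kubernetes.io/hostname" is the well-known node label set by the kubelet.
const KubernetesHostNameLabel = "kubernetes.io/hostname"

func main() {
	labels := map[string]string{KubernetesHostNameLabel: "node-1"}
	fmt.Println(labels[KubernetesHostNameLabel])
}
```

Duplicating a well-known string constant is usually preferable to introducing a dependency edge from library code to a command package.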
job, err := NewCleanupJob(bd, volumeMode, c.Namespace)
...
+//retrieve node Object to pass tolerations to the Job
+selectedNode, err := c.getNodeObjectByNodeName(bd.Labels[controller.KubernetesHostNameLabel])
If this is the only place you are using controller to get the constant, then it is better to use the constant directly rather than importing it. Also, should this be HostName or nodeName?
This will give the hostname, @kmova and @rahulchheda. The method should be getNodeObjectByHostName. I think localpv also uses the same method; we can filter nodes based on the hostname label.
akhilerm left a comment
Either hostname should be used, or spec.nodeAttributes.nodeName can be used to get the exact node name.
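The hostname-based lookup the reviewers describe can be sketched as below. This is a simplified, stdlib-only illustration: `Node` is a stand-in for `corev1.Node`, and the function name follows the reviewers' suggested `getNodeObjectByHostName` naming rather than the code as merged:

```go
package main

import (
	"errors"
	"fmt"
)

// hostNameLabel is the well-known node label set by the kubelet.
const hostNameLabel = "kubernetes.io/hostname"

// Node is a minimal stand-in for corev1.Node (only the fields used here).
type Node struct {
	Name   string
	Labels map[string]string
}

// getNodeByHostName filters a node list on the hostname label, which is
// what the review suggests the helper should do (and be named after).
func getNodeByHostName(nodes []Node, hostName string) (*Node, error) {
	for i := range nodes {
		if nodes[i].Labels[hostNameLabel] == hostName {
			return &nodes[i], nil
		}
	}
	return nil, errors.New("no node found with hostname " + hostName)
}

func main() {
	nodes := []Node{
		{Name: "node-a", Labels: map[string]string{hostNameLabel: "host-a"}},
		{Name: "node-b", Labels: map[string]string{hostNameLabel: "host-b"}},
	}
	n, err := getNodeByHostName(nodes, "host-b")
	fmt.Println(n.Name, err)
}
```

The distinction matters because `metadata.name` of a Node and its hostname label can differ; filtering on the label matches how localpv resolves nodes, while `spec.nodeAttributes.nodeName` gives the exact node name directly.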
- Removed the cmd/ dependency that had been created
- To do this, moved the GetNodeName function to the jobController
- NodeName is now fetched from spec.NodeAttributes.NodeName
Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
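Fetching the node name from `spec.nodeAttributes.nodeName` can be sketched as below. The structs are minimal, hypothetical stand-ins for the BlockDevice API types, keeping only the fields this lookup touches:

```go
package main

import "fmt"

// Minimal stand-ins for the BlockDevice API types (only the fields used).
type NodeAttributes struct {
	NodeName string
}

type DeviceSpec struct {
	NodeAttributes NodeAttributes
}

type BlockDevice struct {
	Spec DeviceSpec
}

// GetNodeName returns the exact node name from spec.nodeAttributes.nodeName,
// avoiding the ambiguity of reading the hostname label raised in review.
func GetNodeName(bd *BlockDevice) string {
	return bd.Spec.NodeAttributes.NodeName
}

func main() {
	bd := &BlockDevice{Spec: DeviceSpec{NodeAttributes: NodeAttributes{NodeName: "worker-1"}}}
	fmt.Println(GetNodeName(bd))
}
```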
Updated testing logs:
rahul_chheda_mayadata_io@rahulchheda:~$ kubectl get bd -n openebs
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
blockdevice-693cb5e38a3bae3ea9bf1a9854cdbad5 gke-standard-cluster-4-default-pool-ad827b74-jvcc 10737418240 Unclaimed Active 5s
blockdevice-860ab5d1205c0f2891d5bbfea2e223e5 gke-standard-cluster-4-default-pool-ad827b74-w2j4 10737418240 Unclaimed Active 14s
blockdevice-9912682a2d1c2f85e77a03d311c44c5a gke-standard-cluster-4-default-pool-ad827b74-rj80 10737418240 Unclaimed Active 17s
rahul_chheda_mayadata_io@rahulchheda:~$ ls
chaosEngine.yaml litmus2SA.yaml metac nginxDefault.yaml saDefault.yaml snap
litmus2ClusterRole.yaml litmusClusterRole.yaml new_e.yaml openebs saDefaultLitmus.yaml
rahul_chheda_mayadata_io@rahulchheda:~$ cd openebs/
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl apply -f bdc.yaml -n openebs
blockdeviceclaim.openebs.io/example-blockdeviceclaim created
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get bd -n openebs
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
blockdevice-693cb5e38a3bae3ea9bf1a9854cdbad5 gke-standard-cluster-4-default-pool-ad827b74-jvcc 10737418240 Unclaimed Active 52s
blockdevice-860ab5d1205c0f2891d5bbfea2e223e5 gke-standard-cluster-4-default-pool-ad827b74-w2j4 10737418240 Claimed Active 1m
blockdevice-9912682a2d1c2f85e77a03d311c44c5a gke-standard-cluster-4-default-pool-ad827b74-rj80 10737418240 Unclaimed Active 1m
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get pods -n opeenbs
No resources found in opeenbs namespace.
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get pods -n openebs
NAME READY STATUS RESTARTS AGE
maya-apiserver-5865d7d548-bmzck 1/1 Running 0 2d17h
openebs-admission-server-684d455954-zvgrv 1/1 Running 0 2d17h
openebs-localpv-provisioner-58f5df5458-ttmw8 1/1 Running 0 2d17h
openebs-ndm-6s4r2 1/1 Running 0 87s
openebs-ndm-j7jhm 1/1 Running 0 89s
openebs-ndm-operator-8456cf565c-tq98h 1/1 Running 0 2m4s
openebs-ndm-pkp7s 1/1 Running 0 78s
openebs-provisioner-594bd6cf78-4zqf5 1/1 Running 0 2d17h
openebs-snapshot-operator-777cfb86df-kpzs8 2/2 Running 0 2d17h
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl taint nodes --all user=pass:NoSchedule
node/gke-standard-cluster-4-default-pool-ad827b74-jvcc tainted
node/gke-standard-cluster-4-default-pool-ad827b74-rj80 tainted
node/gke-standard-cluster-4-default-pool-ad827b74-w2j4 tainted
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl taint nodes --all user123=pass123:NoSchedule
node/gke-standard-cluster-4-default-pool-ad827b74-jvcc tainted
node/gke-standard-cluster-4-default-pool-ad827b74-rj80 tainted
node/gke-standard-cluster-4-default-pool-ad827b74-w2j4 tainted
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get bdc -n openebs
NAME BLOCKDEVICENAME PHASE AGE
example-blockdeviceclaim blockdevice-860ab5d1205c0f2891d5bbfea2e223e5 Bound 1m
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl delete bdc example-blockdeviceclaim -n openebs
blockdeviceclaim.openebs.io "example-blockdeviceclaim" deleted
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl get pods -n openebs
NAME READY STATUS RESTARTS AGE
cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5-j4srj 0/1 Completed 0 4s
maya-apiserver-5865d7d548-bmzck 1/1 Running 0 2d17h
openebs-admission-server-684d455954-zvgrv 1/1 Running 0 2d17h
openebs-localpv-provisioner-58f5df5458-ttmw8 1/1 Running 0 2d17h
openebs-ndm-6s4r2 1/1 Running 0 2m12s
openebs-ndm-j7jhm 1/1 Running 0 2m14s
openebs-ndm-operator-8456cf565c-tq98h 1/1 Running 0 2m49s
openebs-ndm-pkp7s 1/1 Running 0 2m3s
openebs-provisioner-594bd6cf78-4zqf5 1/1 Running 0 2d17h
openebs-snapshot-operator-777cfb86df-kpzs8 2/2 Running 0 2d17h
rahul_chheda_mayadata_io@rahulchheda:~/openebs$ kubectl describe pod cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5-j4srj -n openebs
Name: cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5-j4srj
Namespace: openebs
Priority: 0
Node: gke-standard-cluster-4-default-pool-ad827b74-w2j4/10.128.0.69
Start Time: Fri, 10 Jan 2020 07:43:25 +0000
Labels: controller-uid=e2852632-337c-11ea-adb6-42010a800fc1
job-name=cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5
Annotations: <none>
Status: Succeeded
IP: 10.96.1.20
IPs: <none>
Controlled By: Job/cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5
Containers:
cleaner:
Container ID: docker://d8d5b4b61ddb322cf4455d8b5ea86fd1ee77676f18086c031d509f8d515f6240
Image: quay.io/openebs/linux-utils:1.5.0
Image ID: docker-pullable://quay.io/openebs/linux-utils@sha256:3e2853a706795c8e6fe8de8c7db68fcb9d6aaec108c50fc17136f8e181cc5f6c
Port: <none>
Host Port: <none>
Command:
wipefs
-af
/dev/sdb
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 10 Jan 2020 07:43:27 +0000
Finished: Fri, 10 Jan 2020 07:43:27 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from openebs-maya-operator-token-cxzkf (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
openebs-maya-operator-token-cxzkf:
Type: Secret (a volume populated by a Secret)
SecretName: openebs-maya-operator-token-cxzkf
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/hostname=gke-standard-cluster-4-default-pool-ad827b74-w2j4
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
user=pass:NoSchedule
user123=pass123:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14s default-scheduler Successfully assigned openebs/cleanup-blockdevice-860ab5d1205c0f2891d5bbfea2e223e5-j4srj to gke-standard-cluster-4-default-pool-ad827b74-w2j4
Normal Pulled 13s kubelet, gke-standard-cluster-4-default-pool-ad827b74-w2j4 Container image "quay.io/openebs/linux-utils:1.5.0" already present on machine
Normal Created 13s kubelet, gke-standard-cluster-4-default-pool-ad827b74-w2j4 Created container
Normal Started 12s kubelet, gke-standard-cluster-4-default-pool-ad827b74-w2j4 Started container
Normal SandboxChanged 11s kubelet, gke-standard-cluster-4-default-pool-ad827b74-w2j4 Pod sandbox changed, it will be killed and re-created.
akhilerm left a comment
Add comments to the new functions. Otherwise LGTM.
Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
Force-pushed from 531cfa2 to 9cb9da5
Cleanup pods will now be launched with tolerations for the node on which the released
blockdevice is present. This prevents Jobs from being stuck in the Pending state, and
BlockDevices from being stuck in the Released state, when one or more nodes in the
cluster are tainted.
- Added a function to get the Node object using the Cleaner Operator
- Added a function to get Tolerations from the Node Taints
- Added a parameter to NewCleanupJob to set the Tolerations of the Job to be created
Signed-off-by: Rahul M Chheda <rahul.chheda@mayadata.io>
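The extra tolerations parameter on the cleanup-Job constructor can be sketched as below. This is a stdlib-only sketch with hypothetical stand-in types; the real `NewCleanupJob` builds a full `batchv1.Job` from the BlockDevice, volume mode, and namespace:

```go
package main

import "fmt"

// Minimal stand-ins for the Job/pod types (only the fields relevant here).
type Toleration struct {
	Key, Operator, Value, Effect string
}

type PodSpec struct {
	NodeName    string
	Tolerations []Toleration
}

type Job struct {
	Name    string
	PodSpec PodSpec
}

// newCleanupJob sketches the parameter added in this PR: the Job's pod
// carries tolerations derived from the target node's taints, so it can
// still be scheduled when that node is tainted.
func newCleanupJob(bdName, nodeName string, tolerations []Toleration) *Job {
	return &Job{
		Name: "cleanup-" + bdName,
		PodSpec: PodSpec{
			NodeName:    nodeName,
			Tolerations: tolerations,
		},
	}
}

func main() {
	tols := []Toleration{{Key: "user", Operator: "Equal", Value: "pass", Effect: "NoSchedule"}}
	job := newCleanupJob("blockdevice-860ab5", "gke-node-1", tols)
	fmt.Println(job.Name, len(job.PodSpec.Tolerations))
}
```

This matches what the `kubectl describe pod` output above shows: the cleanup pod lists both user-added taints (`user=pass:NoSchedule`, `user123=pass123:NoSchedule`) among its tolerations and still reaches Completed.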
Logs: These are sequential logs, with the steps I performed: