
[CI] Minor updates to playground cleanup #10802

Merged
merged 6 commits into meshery:master on May 8, 2024

Conversation

cpepper96
Member

Notes for Reviewers

A couple of minor tweaks to the playground cleanup script:

  1. Added a line that deletes all resources in a namespace prior to deleting the namespace itself (sketched below). I think this allows for a more graceful deletion of namespaces.
  2. Removed the patch command, as it does not work.

Given that this is essentially a prod cluster, I'm hesitant to add the other command I found to force delete a namespace. I think for now it might be best to just investigate "stuck" namespaces as they come up. Thoughts?
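
For context, item 1 boils down to something like the following sketch (the namespace list variable and the timeout values are illustrative assumptions, not the exact merged script):

    # Hedged sketch of the cleanup flow described above, not the literal script.
    # $NAMESPACES is assumed to hold the user-created namespaces to clean up.
    for ns in $NAMESPACES; do
      # Delete workloads and Services first so the namespace can drain gracefully.
      kubectl delete all --all -n "$ns" --timeout=60s
      # Then delete the (now mostly empty) namespace itself.
      kubectl delete namespace "$ns" --timeout=60s
    done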

Signed commits

  • Yes, I signed my commits.

Signed-off-by: Jerod Culpepper <cpepper96@gmail.com>

@leecalcote
Member

What does more harm:

  • Leftover namespaces that take up space and scheduling cycles, if not empty.
  • Ungraceful deletion of unwanted workloads.

@cpepper96
Member Author

What does more harm:

  • Leftover namespaces that take up space and scheduling cycles, if not empty.
  • Ungraceful deletion of unwanted workloads.

Honestly, I'm not sure what the worst-case scenario for force-deleting a namespace is. When I was researching this issue, people recommended against force-deleting namespaces because you aren't fixing the root cause, and that root cause can persist (e.g., if we had force-deleted the "Terminating" namespaces I was looking at earlier, we might never have discovered that broken API services were causing them to hang, and those API services would not have been deleted).
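
For reference, investigating a stuck namespace looks roughly like this (illustrative commands, not part of this PR; the namespace name is a placeholder):

    # Illustrative diagnostics for a namespace stuck in "Terminating".
    ns=stuck-example   # hypothetical namespace name
    # The namespace's conditions usually say why deletion is blocked.
    kubectl get namespace "$ns" -o jsonpath='{.status.conditions}'
    # Broken aggregated API services are a common culprit.
    kubectl get apiservice | grep False
    # List anything left in the namespace that may still hold finalizers.
    kubectl api-resources --verbs=list --namespaced -o name \
      | xargs -n 1 kubectl get --show-kind --ignore-not-found -n "$ns"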

@nikzayn

nikzayn commented Apr 30, 2024

What does more harm:

  • Leftover namespaces that take up space and scheduling cycles, if not empty.
  • Ungraceful deletion of unwanted workloads.

Honestly, I'm not sure what the worst-case scenario for force-deleting a namespace is. When I was researching this issue, people recommended against force-deleting namespaces because you aren't fixing the root cause, and that root cause can persist (e.g., if we had force-deleted the "Terminating" namespaces I was looking at earlier, we might never have discovered that broken API services were causing them to hang, and those API services would not have been deleted).

I think while we are forcefully deleting the unused namespaces, we should first check whether those namespaces are linked to any of the resources below (see the sketch after this list):

  • Pods: if linked to Pods, first drain the Pods (in the case of Deployments), then delete them.
  • Services: if linked to Services, we need to update the selectors and labels.
  • PVs and PVCs: if linked to either, we need to delete those as well.
  • ConfigMaps and Secrets: if linked to either, we need to delete these as well.
  • Then we can decommission or delete the namespaces gracefully.
  • Last, we can verify that all the namespaces were deleted properly: kubectl get ns and kubectl get all -A.
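
Roughly, those checks could look like this (a hedged sketch; the namespace name is illustrative):

    # Hedged sketch of the per-namespace checks listed above; names are illustrative.
    ns=playground-user-ns
    kubectl get deployments,pods,services -n "$ns"   # workloads and Services
    kubectl get pvc -n "$ns"                         # PersistentVolumeClaims (and any bound PVs)
    kubectl get configmaps,secrets -n "$ns"          # ConfigMaps and Secrets
    kubectl delete namespace "$ns"                   # then delete the namespace gracefully
    kubectl get ns; kubectl get all -A               # final verification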

@cpepper96
Member Author

What does more harm:

  • Leftover namespaces that take up space and scheduling cycles, if not empty.
  • Ungraceful deletion of unwanted workloads.

Honestly, I'm not sure what the worst-case scenario for force-deleting a namespace is. When I was researching this issue, people recommended against force-deleting namespaces because you aren't fixing the root cause, and that root cause can persist (e.g., if we had force-deleted the "Terminating" namespaces I was looking at earlier, we might never have discovered that broken API services were causing them to hang, and those API services would not have been deleted).

I think while we are forcefully deleting the unused namespaces, we should first check whether those namespaces are linked to any of the resources below:

  • Pods: if linked to Pods, first drain the Pods (in the case of Deployments), then delete them.
  • Services: if linked to Services, we need to update the selectors and labels.
  • PVs and PVCs: if linked to either, we need to delete those as well.
  • ConfigMaps and Secrets: if linked to either, we need to delete these as well.
  • Then we can decommission or delete the namespaces gracefully.
  • Last, we can verify that all the namespaces were deleted properly: kubectl get ns and kubectl get all -A.

By "linked" do you mean those resources are deployed to the namespace? The script currently deletes all resources in the target namespace (kubectl delete all --all -n $ns) and then deletes that namespace. Is there a better way of deleting those resources?

@nikzayn

nikzayn commented Apr 30, 2024

What does more harm:

  • Leftover namespaces that take up space and scheduling cycles, if not empty.
  • Ungraceful deletion of unwanted workloads.

Honestly, I'm not sure what the worst-case scenario for force-deleting a namespace is. When I was researching this issue, people recommended against force-deleting namespaces because you aren't fixing the root cause, and that root cause can persist (e.g., if we had force-deleted the "Terminating" namespaces I was looking at earlier, we might never have discovered that broken API services were causing them to hang, and those API services would not have been deleted).

I think while we are forcefully deleting the unused namespaces, we should first check whether those namespaces are linked to any of the resources below:

  • Pods: if linked to Pods, first drain the Pods (in the case of Deployments), then delete them.
  • Services: if linked to Services, we need to update the selectors and labels.
  • PVs and PVCs: if linked to either, we need to delete those as well.
  • ConfigMaps and Secrets: if linked to either, we need to delete these as well.
  • Then we can decommission or delete the namespaces gracefully.
  • Last, we can verify that all the namespaces were deleted properly: kubectl get ns and kubectl get all -A.

By "linked" do you mean those resources are deployed to the namespace? The script currently deletes all resources in the target namespace (kubectl delete all --all -n $ns) and then deletes that namespace. Is there a better way of deleting those resources?

Yes, you can delete all the resources, but we only need to decommission the specific namespace. Note that kubectl delete all --all will only delete the main resources such as Pods, Deployments, Services, and ReplicaSets; it will not delete other namespaced resources like ConfigMaps, Secrets, and PVCs.
Secondly, I am thinking of using kubectl delete namespace <namespace> because it's safe and controlled.
Lastly, if we are using kubectl delete all --all -n $ns, are we sure that other critical data is not affected, given that we are targeting everything with all here?
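
If those extra kinds need to be covered, one common (illustrative) pattern is to delete every namespaced, deletable resource type instead of relying on the all category:

    # Illustrative: delete every deletable, namespaced resource type in $ns,
    # not just the kinds covered by the "all" category. Not part of this PR.
    kubectl delete "$(kubectl api-resources --namespaced=true --verbs=delete -o name \
      | tr '\n' ',' | sed 's/,$//')" --all -n "$ns"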

@MUzairS15
Contributor

We are only deleting namespaces/resources deployed by users as they come and try out the playground. The namespaces that affect the working of the playground environment stay intact. Also, using kubectl delete all --all -n <ns> will not delete cluster-wide resources, so even if some cluster-wide resources (CRs, PVs, etc.) are somehow linked, they will not get deleted (e.g., an existing ClusterRoleBinding used in prod that is referenced by a Role in one of the user-created namespaces will not get deleted, even with --all).
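
For reference, the cluster-scoped kinds that a namespaced delete never touches can be listed like this (an illustrative check, not part of the script):

    # Cluster-scoped resource types (PVs, CRDs, ClusterRoleBindings, ...) are not
    # affected by "kubectl delete all --all -n <ns>"; list them to double-check.
    kubectl api-resources --namespaced=false -o name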

@sangramrath
Contributor

I second @MUzairS15. This is a playground, meaning it is understood that users are not supposed to run anything production-grade here. We could put a warning at some stage in the workflow stating this, and also that their work will be lost in the nightly cleanup. This is common for cloud sandboxes (public clouds).

@cpepper96
Member Author

I added the "Terminating" namespace check back, along with the patch command to force-delete a "Terminating" namespace. I also cut the kubectl delete $ns timeout down to 1 minute. Let me know what y'all think @MUzairS15 @sangramrath @nikzayn @leecalcote
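
Roughly, the new behavior follows this pattern (a hedged sketch, not the literal diff; the exact patch command in the merged script may differ):

    # Hedged sketch of the re-added "Terminating" handling; not the literal diff.
    # Cap the namespace delete at 1 minute instead of waiting indefinitely.
    kubectl delete namespace "$ns" --timeout=60s || true
    # If the namespace is still stuck in Terminating, strip its finalizers
    # (one common force-delete pattern; the script's actual patch may differ).
    if kubectl get namespace "$ns" -o jsonpath='{.status.phase}' 2>/dev/null | grep -q Terminating; then
      kubectl patch namespace "$ns" --type merge -p '{"metadata":{"finalizers":null}}'
    fi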

@MUzairS15 MUzairS15 merged commit daae859 into meshery:master May 8, 2024
9 checks passed