Cron job to garbage collect test resources #87

Closed
jlewi opened this issue Mar 28, 2018 · 3 comments

Comments

@jlewi
Contributor

jlewi commented Mar 28, 2018

Our tests create a bunch of resources:

  • Argo workflows
  • Namespaces
  • GCE VMs
  • GKE clusters

Most of these resources (Argo workflows are the exception) should be GC'd by the teardown steps in our tests, but some failures and bugs prevent that from happening.

It would be good to have a cron job to periodically garbage collect old resources like namespaces.
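
A minimal sketch of the age-based selection such a cron job would need. It is only an illustration: `select_expired` and its arguments are hypothetical names, and it assumes the caller has already listed the resources (for example namespaces) with their creation timestamps. It does not talk to Kubernetes or GCP.

```python
from datetime import datetime, timedelta, timezone


def select_expired(resources, max_age, now=None):
    """Return the names of resources older than max_age.

    resources: iterable of (name, creation_time) pairs, where
        creation_time is a timezone-aware datetime.
    max_age: timedelta; anything created longer ago than this is selected.
    now: optional timezone-aware datetime, injectable for testing.
    """
    now = now or datetime.now(timezone.utc)
    return [name for name, created in resources if now - created > max_age]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical namespaces left behind by test runs.
    namespaces = [
        ("e2e-old", now - timedelta(hours=30)),
        ("e2e-fresh", now - timedelta(minutes=5)),
    ]
    for name in select_expired(namespaces, timedelta(hours=24), now=now):
        print("would delete namespace", name)
```

Keeping the selection separate from the delete call makes it easy to do a dry run before anything is actually removed.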

@ankushagarwal

/mnt/test-data-volume also needs GC. It ran out of space yesterday.
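
A rough sketch of what GC for the volume could look like, assuming each test run writes into its own top-level directory under the mount point and that modification time is a good enough proxy for age. Both are assumptions, and `find_stale_dirs` and `delete_dirs` are hypothetical helpers, not existing code in this repo.

```python
import os
import shutil
import time


def find_stale_dirs(root, max_age_seconds, now=None):
    """Return top-level directories under root whose mtime is older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(root):
        # Skip plain files and symlinks; only real directories are candidates.
        if entry.is_dir(follow_symlinks=False):
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                stale.append(entry.path)
    return sorted(stale)


def delete_dirs(paths, dry_run=True):
    """Delete the given directories, or only print them when dry_run is True."""
    for path in paths:
        if dry_run:
            print("would delete", path)
        else:
            shutil.rmtree(path)


if __name__ == "__main__":
    # Dry run over the volume mentioned above; the 7-day cutoff is arbitrary.
    delete_dirs(find_stale_dirs("/mnt/test-data-volume", 7 * 24 * 3600))
```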

@lluunn
Contributor

lluunn commented May 11, 2018

We have >1000 workloads in the cluster.

[Screenshot from 2018-05-11 14-07-06]

@jlewi
Contributor Author

jlewi commented Jul 18, 2018

There is a proposal in the community to add a TTL field to K8s jobs:
kubernetes/kubernetes#64470
Design doc: https://goo.gl/YxtxTi.

Hopefully this will be generalized to Argo workflows.
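
Until something like that exists for workflows, the same idea can be approximated on the client side. A minimal sketch, assuming the TTL is counted from the time the job or workflow finished; `is_expired` is a hypothetical helper and does not call the Kubernetes or Argo APIs.

```python
from datetime import datetime, timedelta, timezone


def is_expired(finished_at, ttl_seconds, now=None):
    """Return True once ttl_seconds have elapsed since finished_at.

    finished_at: timezone-aware datetime, or None if still running.
    """
    if finished_at is None:
        # Still running, so the TTL clock has not started.
        return False
    now = now or datetime.now(timezone.utc)
    return now >= finished_at + timedelta(seconds=ttl_seconds)
```

A workflow that never finishes is never expired under this rule, so stuck workflows would still need a separate age-based check.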

jlewi added a commit to jlewi/testing that referenced this issue Dec 28, 2018
Related to

kubeflow#53 GC old Argo Workflows
kubeflow#87 cron job to GC old resources.
jlewi added a commit to jlewi/testing that referenced this issue Dec 28, 2018
kubeflow#53 GC old Argo Workflows
kubeflow#87 cron job to GC old resources.
kubeflow#268 Maximum number of services reached.
k8s-ci-robot pushed a commit that referenced this issue Jan 3, 2019
* Add a python function to GC old Argo workflows and cloud endpoints

#53 GC old Argo Workflows
#87 cron job to GC old resources.
#268 Maximum number of services reached.

* Fix lint.

* Revert files that shouldn't be checked in.

* Fix loop termination criterion.
jlewi added a commit to jlewi/testing that referenced this issue Feb 3, 2019
* Add an "all" subcommand to cleanup_ci.py to delete all resources.
* Add a batch job for one off runs.

Related to:
  kubeflow#87 Cron job to garbage collect test resources
  kubeflow#249 cron job to collect Kubeflow deployments launched by E2E tests
k8s-ci-robot pushed a commit that referenced this issue Feb 5, 2019
* Create a cron job to regularly garbage collect test resources.

* Add an "all" subcommand to cleanup_ci.py to delete all resources.
* Add a batch job for one off runs.

Related to:
  #87 Cron job to garbage collect test resources
  #249 cron job to collect Kubeflow deployments launched by E2E tests

* Add a cron job to run the cleanup every two hours.
* In cleanup_ci.py, don't load the imports of the manifests.
  We encountered an error where the manifest didn't exist. I think
  that may have been a collision because we had a separate script running to do
  the deletes.

* Fix some bugs.

* Deal with config being none.

* Maybe activate service account.