boskos: static resources removed from the configuration may never be deleted #17282

Closed
ixdy opened this issue Apr 17, 2020 · 2 comments
Labels
area/boskos Issues or PRs related to code in /boskos kind/bug Categorizes issue or PR as related to a bug.

Comments

@ixdy
Member

ixdy commented Apr 17, 2020

I mentioned this tangentially in #16047 (comment), but I want to pull it into a separate issue to highlight it more easily.

When a static resource is removed from the configuration while it is in use, Boskos doesn't delete it. This ensures that running jobs don't fail and that the janitor properly cleans up the resource.

Originally, this was a reasonable decision, since Boskos periodically synced its storage against the configuration, and most likely such resources would eventually be free and thus deleted from storage.

After #13990, Boskos only syncs its storage against the configuration when the configuration changes (or when Boskos restarts). As a result, it may take a long time for static resources to be deleted, if ever.
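To make the failure mode concrete, here is a minimal sketch of the behavior described above. It is not the actual Boskos code: the `Resource` type, the `syncStatic` function, and the `"free"`/`"busy"` state strings are simplified assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a simplified, hypothetical stand-in for a Boskos resource record.
type Resource struct {
	Name  string
	State string // "free", or any in-use state such as "busy"
}

// syncStatic removes resources that are no longer configured, but only if
// they are free. It returns the sorted names of resources that were dropped
// from the config yet kept in storage because they were still in use.
func syncStatic(storage map[string]Resource, configured map[string]bool) []string {
	var leftover []string
	for name, r := range storage {
		if configured[name] {
			continue
		}
		if r.State == "free" {
			delete(storage, name)
		} else {
			leftover = append(leftover, name)
		}
	}
	sort.Strings(leftover)
	return leftover
}

func main() {
	storage := map[string]Resource{
		"project-a": {Name: "project-a", State: "free"},
		"project-b": {Name: "project-b", State: "busy"},
		"project-c": {Name: "project-c", State: "free"},
	}
	// project-a and project-b have been removed from the config.
	configured := map[string]bool{"project-c": true}
	fmt.Println("kept despite removal:", syncStatic(storage, configured))

	// project-b is freed later, but nothing triggers another sync,
	// so it stays in storage until the config changes or Boskos restarts.
	r := storage["project-b"]
	r.State = "free"
	storage["project-b"] = r
	_, present := storage["project-b"]
	fmt.Println("project-b still in storage:", present)
}
```

With a periodic sync, a second call to `syncStatic` would eventually pick up the freed resource; with a sync only on config change, that second call may never happen.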

There was a similar issue for dynamic resource lifecycles (DRLCs), which I addressed in #16021, effectively by putting the DRLCs into lame-duck mode.

There isn't a clear way to indicate that static resources are in lame-duck mode, though.

Possible ways to address this bug, in increasing order of complexity:

  1. Just delete static resources, regardless of what state they're in.
  2. Periodically sync storage against the config. It's probably less expensive now, due to the improvements around locking.
  3. Somehow indicate that resources are in lame-duck mode to prevent them from being leased, and then delete them once free:
    a. Add a field into the UserData for static resources. (Currently UserData is not used for static resources.)
    b. Set an ExpirationDate on static resources. (Currently ExpirationDate is not used for static resources.)
    c. Add a new field on the ResourceStatus indicating resources are in lame-duck mode.
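As a rough illustration of option 3c, the sketch below marks in-use resources as lame ducks when they disappear from the config, refuses to lease them, and deletes them once they are free again. Everything here is hypothetical: the `LameDuck` field, the `Store` type, and the `Sync`, `Acquire`, `Release`, and `Clean` methods are simplified stand-ins rather than the real Boskos API, and `Clean` only simulates the janitor returning a dirty resource to the free pool.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Resource is a hypothetical, simplified resource record.
type Resource struct {
	Name     string
	State    string // "free", "busy", or "dirty"
	Owner    string
	LameDuck bool // hypothetical flag: removed from config, awaiting deletion
}

// Store is a hypothetical in-memory stand-in for Boskos storage.
type Store struct {
	resources map[string]*Resource
}

var errNoneFree = errors.New("no free resource available")

// Sync deletes free resources that left the config and flags the rest.
func (s *Store) Sync(configured map[string]bool) {
	for name, r := range s.resources {
		if configured[name] {
			continue
		}
		if r.State == "free" {
			delete(s.resources, name)
		} else {
			r.LameDuck = true
		}
	}
}

// Acquire leases the first free resource (by name) that is not a lame duck.
func (s *Store) Acquire(owner string) (*Resource, error) {
	names := make([]string, 0, len(s.resources))
	for name := range s.resources {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		r := s.resources[name]
		if r.State == "free" && !r.LameDuck {
			r.State, r.Owner = "busy", owner
			return r, nil
		}
	}
	return nil, errNoneFree
}

// Release hands a resource back; it stays dirty until the janitor cleans it.
func (s *Store) Release(name string) {
	if r, ok := s.resources[name]; ok {
		r.State, r.Owner = "dirty", ""
	}
}

// Clean simulates the janitor: lame ducks are deleted, others become free.
func (s *Store) Clean(name string) {
	r, ok := s.resources[name]
	if !ok {
		return
	}
	if r.LameDuck {
		delete(s.resources, name)
		return
	}
	r.State = "free"
}

func main() {
	s := &Store{resources: map[string]*Resource{
		"project-a": {Name: "project-a", State: "free"},
		"project-b": {Name: "project-b", State: "busy", Owner: "job-1"},
	}}
	s.Sync(map[string]bool{"project-a": true}) // project-b left the config
	fmt.Println("project-b lame duck:", s.resources["project-b"].LameDuck)
	s.Release("project-b")
	s.Clean("project-b")
	_, present := s.resources["project-b"]
	fmt.Println("project-b still stored:", present)
}
```

The point of the design is that deletion no longer depends on a later sync: the flag travels with the resource, so whichever code path next sees it free can remove it. Options 3a and 3b would carry the same signal in UserData or ExpirationDate instead of a dedicated field.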

Workaround until this bug is fixed: admins with access to the cluster where Boskos is running can just delete the resources manually using kubectl.

/area boskos

@ixdy ixdy added the kind/bug Categorizes issue or PR as related to a bug. label Apr 17, 2020
@k8s-ci-robot k8s-ci-robot added the area/boskos Issues or PRs related to code in /boskos label Apr 17, 2020
@ixdy
Member Author

ixdy commented May 29, 2020

Moving to kubernetes-sigs/boskos#20.
/close

@k8s-ci-robot
Contributor

@ixdy: Closing this issue.

