Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup webhooks using pod termination #424

Closed
realshuting opened this issue Oct 31, 2019 · 2 comments
Closed

Cleanup webhooks using pod termination #424

realshuting opened this issue Oct 31, 2019 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@realshuting
Copy link
Member

https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks

@realshuting realshuting added the enhancement New feature or request label Oct 31, 2019
@realshuting realshuting added this to the Kyverno Release 0.13.0 milestone Oct 31, 2019
@shivdudhani shivdudhani self-assigned this Oct 31, 2019
@shivdudhani shivdudhani added this to To do in Development Shiv Oct 31, 2019
@shivdudhani shivdudhani moved this from To do to In progress in Development Shiv Nov 6, 2019
@shivdudhani
Copy link
Contributor

PreStop
Called before the container is terminated.
Termination of Pods:
When a user requests deletion of a Pod, the system records the intended grace period before the Pod is allowed to be forcefully killed, and a TERM signal is sent to the main process in each container. Once the grace period has expired, the KILL signal is sent to those processes, and the Pod is then deleted from the API server.

By default, all deletes are graceful within 30 seconds. This 30 seconds time is all time that methods PreStop hook, TERM and KILL have to complete by. If the user explicitly sets graceful period to 0 and force, then the API server does not wait for confirmation before removing the resource from API.

https://pracucci.com/graceful-shutdown-of-kubernetes-pods.html provides some explanation for the graceful shutdown of pods.
It states 2 points:

  • make sure the base image is forwarding signals to the application. [Requires testing]
  • some applications handle SIGTERM differently, for example nginx on recieving SIGTERM kills the process. But uses nginx -s quit for graceful shutdown, so here preStop hook can send the quit signals got graceful shutdown.

In kyverno we use a channel to be notified when signals are passed, and this channel is used to start cleanUp, and another notification channel to wait till cleanUp is complete(i.e. webhookconfigurations are deleted).
With nirmata@439c64b we have a helper function to mock signals, so this can be used to test the cleanUp.

Comments:
I think using a preStop hook won't help with the cleanup, but we can test further.
It might be helpful to have a separate goroutine that's responsible for cleanup and make sure it's small and fast.
Currently cleanup routine is serial now,

  • firstly resource mutating webhook configuration
  • secondly, policy validating webhook configuration
  • thirdly policy mutating webhook configuration

So the graceful shutdown time for each step is 30 seconds making the worst-case time to 90 seconds. With the pod only having 30 seconds to complete, it won't finish. Cleanup up parallelly would benefit theoretically.

@JimBugwadia @realshuting
Let me know you thoughts on the above!

@shivdudhani
Copy link
Contributor

by default http.Server waits indefinitely for connections to return to idle and then shuts down. Then we clean up the webhookconfigurations.
https://github.com/nirmata/kyverno/blob/bdb677abf6b75b4c93a3c34490acd60ba677b909/pkg/webhooks/server.go#L199-L207

Proposal:

  • Adjust the default deadline for context to 5*seconds, any request being processed is cleaned up.
  • Call cleanup webhookconfigurations before http server shutdown and clean up each resource in a separate goroutine with a wait group to wait for completion. After the cleanup is done we close the notification channel that waits on in the main goroutine. This will us to log if the completion was complete or not(example log: successful shutdown of kyverno controller).

@shivdudhani shivdudhani modified the milestones: Kyverno Release 0.13.0, Kyverno Release 1.0 Nov 8, 2019
@shivdudhani shivdudhani moved this from In progress to Pending PR in Development Shiv Nov 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
No open projects
Development Shiv
  
Pending PR
Development

No branches or pull requests

2 participants