
Add Lifecycle hooks configuration to Tenant #1835

Merged: 3 commits, Apr 22, 2024

Conversation

@mctoohey (Contributor) commented Oct 28, 2023

Adds Lifecycle hooks configuration to Tenant. This is passed down to the MinIO container of each pod in the pool.

The tenant already allows liveness, readiness and startup probes to be configured. Similar to those, the use of lifecycle hooks is entirely optional. Adding configuration for lifecycle hooks (https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/) allows custom postStart/preStop hooks to be added to the MinIO container if desired.
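For illustration, a Tenant using this could look something like the sketch below. The field name and placement (`lifecycle` at the spec level, mirroring the existing probe fields) and the abbreviated pool definition are assumptions for the example, not the authoritative API:

```yaml
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: example-tenant
spec:
  pools:
    - name: pool-0
      servers: 4
      volumesPerServer: 4
      # volumeClaimTemplate omitted for brevity
  # Assumed field name, following the pattern of liveness/readiness/startup.
  lifecycle:
    postStart:
      exec:
        command: ["/bin/sh", "-c", "echo 'MinIO container started'"]
    preStop:
      exec:
        command: ["/bin/sh", "-c", "sleep 10"]
```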

@mctoohey (Contributor, Author) commented Nov 7, 2023

Hi @jiuker, I was wondering if you had any thoughts on this change?

@pjuarezd requested review from jiuker, harshavardhana, cniackz and cesnietor and removed the request for jiuker on November 7, 2023 04:35
@pjuarezd (Member) commented Nov 7, 2023

@mctoohey would you mind telling us how you are planning to use the lifecycle hooks? How would they be useful?

@pjuarezd (Member) commented Nov 7, 2023

Created a PR against your branch to add the CRD docs update, @mctoohey: mctoohey#1

@mctoohey (Contributor, Author) commented Nov 7, 2023

@pjuarezd thank you for your response. Sure, I can provide some more context.
I am particularly interested in the preStop lifecycle hook. This could be used to add an additional wait or check before shutting down the tenant pod, giving time for remaining requests to be processed and for event notifications to be sent, during which the pod will not accept any new connections.
For my particular use case I am very interested in ensuring all bucket notifications for all requests are sent in the event of a graceful pod termination. I would like to have the option to add a preStop hook that waits a short period before the shutdown signal is sent to the MinIO process. It could potentially check the minio_notify_current_send_in_progress metric to see if there are any notifications still being sent.
I acknowledge it is quite a niche use case, which is why my change only exposes this configuration in the tenant spec but leaves it up to the end user to determine if and how they may want to use it.
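As a rough sketch of that idea (not something this change itself adds), a preStop hook could poll the metric before the SIGTERM is delivered. The metrics URL, the assumption that MINIO_PROMETHEUS_AUTH_TYPE=public, and the availability of a shell, curl and awk in the image are all illustrative assumptions:

```yaml
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          # Wait (up to ~30s) until no notifications are being sent,
          # then let the normal SIGTERM shutdown proceed.
          for i in $(seq 1 30); do
            inflight=$(curl -s http://localhost:9000/minio/v2/metrics/cluster \
              | grep '^minio_notify_current_send_in_progress' \
              | awk '{s += $2} END {print s + 0}')
            [ "${inflight:-0}" -eq 0 ] && exit 0
            sleep 1
          done
```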

@jiuker (Contributor) commented Nov 7, 2023

> [...] For my particular use case I am very interested in ensuring all bucket notifications for all requests are sent in the event of a graceful pod termination. I would like to have the option to add a preStop hook that waits a short period before the shutdown signal is sent to the MinIO process. It could potentially check the minio_notify_current_send_in_progress metric to see if there are any notifications still being sent. [...]

That case should be a MinIO issue. @mctoohey

@mctoohey (Contributor, Author) commented Nov 7, 2023

The other use case: a preStop hook that waits for incoming requests to finish being processed would allow you to do a rolling restart (which I know MinIO doesn't recommend) of all pods in the replica set without experiencing any downtime.

@mctoohey (Contributor, Author) commented Nov 7, 2023

I might be able to set the --shutdown-timeout argument on MinIO to increase the graceful shutdown period. The preStop hook would still be nice so I could set up some custom alerting if MinIO fails to shut down gracefully (with all requests finished).

@hrenard (Contributor) commented Nov 7, 2023

Nice!
In the case of a non-breaking rollout, waiting for the cluster maintenance check to be OK would avoid some downtime (like when upgrading).
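For example (purely illustrative, assuming plain HTTP on port 9000 and curl available in the image), a preStop hook could wait for MinIO's cluster maintenance health check to report that it is safe to take the node down:

```yaml
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          # The maintenance check returns 200 when taking this node
          # offline will not break the cluster's read quorum.
          for i in $(seq 1 60); do
            code=$(curl -s -o /dev/null -w '%{http_code}' \
              "http://localhost:9000/minio/health/cluster?maintenance=true")
            [ "$code" = "200" ] && exit 0
            sleep 2
          done
```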

@pjuarezd (Member) commented Nov 7, 2023

> I might be able to set the --shutdown-timeout argument on MinIO to increase the graceful shutdown period. The preStop hook would still be nice so I could set up some custom alerting if MinIO fails to shut down gracefully (with all requests finished).

Having a preStop hook could add at most 2 additional seconds to the grace period, just something to consider.
Instead, it will be taking away some of the grace period time the MinIO process has to exit gracefully.

The default MinIO shutdown timeout is 5 seconds. Extending --shutdown-timeout to a bigger number seems like a good option to give MinIO some additional room to finish processing; since the Kubernetes default grace period is 30 seconds, MinIO will most likely exit on its own well before Kubernetes kills the process.

Extending the pod's .spec.terminationGracePeriodSeconds: 30 to something bigger, in combination with extending the MinIO --shutdown-timeout, would also work. But this measure only adds more time; it does not verify that all received requests have already been fulfilled.

Now, when a pod enters the Terminating state, it is removed from the endpoints and stops getting new traffic, which means that only requests received before the terminating event might still be pending to be shared with the MinIO peers.

I don't think additional measures to ensure MinIO can finish requests are really needed; MinIO is already prepared to exit gracefully on SIGTERM in a short time. But the preStop hook could be useful if we want to notify Prometheus (for example) that the pod is being terminated, and the same goes for postStart. I would play with the events to do custom notifications instead.
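To make the timing concrete: the preStop hook runs inside the pod's termination grace period, not in addition to it, so a long hook leaves less time for MinIO to exit after SIGTERM. A plain pod-level sketch (not the Tenant API; the numbers are illustrative):

```yaml
spec:
  # Total budget shared by the preStop hook and SIGTERM handling.
  terminationGracePeriodSeconds: 60
  containers:
    - name: minio
      lifecycle:
        preStop:
          exec:
            # ~10s spent here still counts against the 60s budget,
            # leaving MinIO roughly 50s after SIGTERM before SIGKILL.
            command: ["/bin/sh", "-c", "sleep 10"]
```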

@mctoohey (Contributor, Author) commented Nov 7, 2023

> Instead, it will be taking away some of the grace period time the MinIO process has to exit gracefully.

My thought is that the preStop hook adds a period of time during which the MinIO process will not receive any new connections while still processing existing ones, so that when it does receive the SIGTERM it can shut down very quickly, as there is nothing left to do.

> MinIO is already prepared to exit gracefully on SIGTERM in a short time.

In my experience, when restarting one MinIO pod in a multi-node setup, some client requests are always disrupted (not a huge deal since clients can retry) and some event notifications are dropped. My understanding from previous reading was that MinIO does not guarantee that bucket notifications are delivered on shutdown.

For example, if I make a PUT request and a SIGTERM is sent to MinIO, I get a 200 response back but the notification may not have been sent (or added to the queue, if a queue is being used).

@pjuarezd (Member) commented Nov 7, 2023

> In my experience, when restarting one MinIO pod in a multi-node setup, some client requests are always disrupted (not a huge deal since clients can retry) and some event notifications are dropped. My understanding from previous reading was that MinIO does not guarantee that bucket notifications are delivered on shutdown.
>
> For example, if I make a PUT request and a SIGTERM is sent to MinIO, I get a 200 response back but the notification may not have been sent (or added to the queue, if a queue is being used).

That's right, there is no guarantee that all event notifications will be delivered before the process shuts down, but you should not lose events because of that. The right way is to configure the notification target to persist the undelivered events, so that those are replayed once the process comes back online, e.g. for the Elasticsearch target set MINIO_NOTIFY_ELASTICSEARCH_QUEUE_DIR.
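For reference, something along these lines can be set through the Tenant's environment variables. A minimal sketch, assuming spec.env is forwarded to the MinIO containers and using an illustrative PRIMARY identifier, URL and paths:

```yaml
spec:
  env:
    - name: MINIO_NOTIFY_ELASTICSEARCH_ENABLE_PRIMARY
      value: "on"
    - name: MINIO_NOTIFY_ELASTICSEARCH_URL_PRIMARY
      value: "https://elasticsearch.example.com:9200"
    - name: MINIO_NOTIFY_ELASTICSEARCH_INDEX_PRIMARY
      value: "minio-events"
    - name: MINIO_NOTIFY_ELASTICSEARCH_FORMAT_PRIMARY
      value: "namespace"
    # Persist undelivered events so they are replayed after a restart.
    - name: MINIO_NOTIFY_ELASTICSEARCH_QUEUE_DIR_PRIMARY
      value: "/export0/.notify-es-queue"
    - name: MINIO_NOTIFY_ELASTICSEARCH_QUEUE_LIMIT_PRIMARY
      value: "10000"
```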

@mctoohey (Contributor, Author) commented Nov 8, 2023

> The right way is to configure the notification target to persist the undelivered events, so that those are replayed once the process comes back online.

I should clarify that, in my experience, even when using a queue_dir, events can sometimes be lost.

I just did a test with a Kafka event notification target with queue_dir set. I had a loop writing 1000 objects to MinIO (test_object1 to test_object1000) and restarted MinIO partway through. After the restart I checked MinIO and saw that 456 objects had been written, but only 455 PUT object events had been received by Kafka, and there were no events remaining in the queue directory.

I will continue to look into some of your suggestions, but this is getting a bit off topic for this change. This change is really about whether we should be able to configure lifecycle hooks for the tenant container. I still think there are some use cases for lifecycle hooks, such as sending start/stop notifications to external services. It would be nice if this configuration were exposed in the same way the liveness, readiness and startup configuration is, leaving it to the end user to determine whether they have a use case for it.

@harshavardhana (Member) commented

> > The right way is to configure the notification target to persist the undelivered events, so that those are replayed once the process comes back online.
>
> I should clarify that, in my experience, even when using a queue_dir, events can sometimes be lost.
>
> I just did a test with a Kafka event notification target with queue_dir set. I had a loop writing 1000 objects to MinIO (test_object1 to test_object1000) and restarted MinIO partway through. After the restart I checked MinIO and saw that 456 objects had been written, but only 455 PUT object events had been received by Kafka, and there were no events remaining in the queue directory.

This should be treated as a bug, and we should not build workarounds around it.

Are you saying that you did 1000 PUTs, all of which succeeded, and you restarted the server after that?

@mctoohey (Contributor, Author) commented Nov 8, 2023

@harshavardhana I've opened minio/minio#18404 with more detail on what I have observed.

@pjuarezd (Member) commented Nov 8, 2023

Could you rebase, @mctoohey?

pjuarezd previously approved these changes Nov 9, 2023

Matthew Toohey and others added 2 commits on November 15, 2023 11:28
Signed-off-by: pjuarezd <pjuarezd@users.noreply.github.com>

pjuarezd previously approved these changes Nov 20, 2023

@pjuarezd (Member) left a comment


I still believe this is good to have; empowering the end user with standard features is a good mantra.

@mctoohey (Contributor, Author) commented
Would I be able to get another review on this? I think allowing end users to configure standard Kubernetes features on the MinIO pods is a reasonable thing to do.

@fangzhouxie commented
Hi team,

Is anyone still reviewing this pull request? I find this change very useful and it would be great if it could be merged.

Thanks,

@cniackz requested a review from jiuker on April 20, 2024 02:58
@cniackz self-assigned this on Apr 20, 2024

@cniackz (Contributor) commented Apr 20, 2024

@mctoohey or @fangzhouxie, please help us resolve the conflicts.

cniackz previously approved these changes Apr 20, 2024
@ramondeklein dismissed stale reviews from cniackz and pjuarezd via 36d9490 on April 22, 2024 09:07

@ramondeklein (Contributor) left a comment


LGTM, but it would be nice to have an integration test for this.

@pjuarezd merged commit 4cc3123 into minio:master on Apr 22, 2024
24 of 25 checks passed