
Change the default livenessProbe endpoint to /health #83

Open
leszko opened this issue Nov 27, 2019 · 13 comments
Labels
good first issue

Comments

@leszko

leszko commented Nov 27, 2019

Currently the default livenessProbe endpoint is /health/node-state. It should actually be /health.

The change is trivial, but before applying it we need to double check that it does not break rolling upgrade and scaling down.
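
For anyone picking this up, a quick way to see what the chart currently configures is to read the probe paths straight off the StatefulSet. This is only a sketch: the StatefulSet name my-release-hazelcast and the container index 0 are assumptions, so adjust them to your deployment.

# Placeholder StatefulSet name; adjust to your release.
kubectl get statefulset my-release-hazelcast \
  -o jsonpath='{.spec.template.spec.containers[0].livenessProbe.httpGet.path}'
kubectl get statefulset my-release-hazelcast \
  -o jsonpath='{.spec.template.spec.containers[0].readinessProbe.httpGet.path}'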

@mesutcelik

Are you going to parse the response and decide if it is healthy?

Hazelcast::NodeState=ACTIVE
Hazelcast::ClusterState=ACTIVE
Hazelcast::ClusterSafe=TRUE
Hazelcast::MigrationQueueSize=0
Hazelcast::ClusterSize=2

Apart from /node-state and /health, we have /ready too. We just need to figure out which one really tells us "restart me" when it returns a non-200 response code. cc: @mmedenjak
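
For reference, a quick way to compare what the endpoints actually return is to hit them on a running member and look only at the status codes. A sketch, assuming the default member port 5701 and the shortened paths used in this thread (the chart may prefix them, e.g. with /hazelcast):

# Run inside a member pod; if curl is not available in the image,
# use `kubectl port-forward hazelcast-0 5701:5701` and run this locally instead.
for p in /health /health/node-state /ready; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://127.0.0.1:5701${p}")
  echo "${p} -> ${code}"
done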

@leszko
Author

leszko commented Nov 28, 2019

I think we don't need to parse it; an HTTP 200 from /health should mean "I'm alive", and that is what should be used for the livenessProbe.

The current setup is a little wrong: we use /health/node-state, which returns 503 when the Hazelcast node is in the shutdown state, so Kubernetes terminates the pod. But if Hazelcast is in the shutdown state, it is still alive, and we should wait until it shuts down properly by itself.

@mesutcelik added the good first issue label Mar 19, 2020
@adnxn

adnxn commented Mar 28, 2020

The change is trivial, but before applying it we need to double check that it does not break rolling upgrade and scaling down.

Any guidance on how to validate these two things, rolling upgrades and scaling down?

@adnxn

adnxn commented Mar 28, 2020

Well, I just came across these:
https://hazelcast.com/blog/rolling-upgrade-hazelcast-imdg-on-kubernetes/
https://hazelcast.com/blog/how-to-scale-hazelcast-imdg-on-kubernetes/

But yeah, any other advice would be welcome. Thanks.

@Holmistr

Hi @adnxn, glad to see you here :) I'm assigning you the issue and I'll make sure to get you some guidance from our experts. Looking forward to your contribution!

@leszko
Author

leszko commented Mar 30, 2020

@adnxn, thanks for taking this issue!

So, I recommend doing the following:

  • Change /health/node-state => /health and check that readiness still gates startup correctly (it should start one member, wait ~30s, start the second member, wait, and so on)
  • Check rolling upgrade (put a lot of data into the cluster (~2 GB), perform a rolling upgrade, and check that there is no data loss)
  • Check scaling (make a big cluster (at least 6 members), put a lot of data (~2 GB), scale down to 2 members, scale back up to 6, and check that there is no data loss)

Also, I would not change the readinessProbe; it should still stay as /health/node-state.
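
If you want to try that split on a running cluster before touching the chart default, something like the following should do it. A sketch only: the StatefulSet name and container index are assumptions, and the readinessProbe is left untouched.

# Hypothetical StatefulSet name; only the livenessProbe path changes,
# the readinessProbe stays as /health/node-state.
kubectl patch statefulset my-release-hazelcast --type=json -p='[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/livenessProbe/httpGet/path",
   "value": "/health"}
]'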

@leszko
Author

leszko commented Mar 30, 2020

As for the technical details of how to scale up/down and how to perform rolling updates, the blog posts you mentioned are good guidelines. I recommend using the Helm chart.

Then, for scaling, all you need to do is execute:

helm install --name my-release --set cluster.memberCount=6 stable/hazelcast
helm upgrade my-release --set cluster.memberCount=3 stable/hazelcast

And for the rolling update:

helm install --name my-release --set image.tag=3.12 hazelcast/hazelcast
helm upgrade my-release --set image.tag=3.12.1 hazelcast/hazelcast
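
While the upgrade or scale operation runs, it helps to watch the pods and then confirm the cluster re-formed. A rough outline; the pod name hazelcast-0 is a placeholder, adjust it to your member pods, and the /health path may be prefixed depending on the chart.

# Watch members restart and rejoin one by one (readiness should gate the rollout).
kubectl get pods -w

# Afterwards, forward the member port in one terminal...
kubectl port-forward hazelcast-0 5701:5701
# ...and in another terminal check /health: the response includes
# Hazelcast::ClusterSize, which should match the expected member count.
curl -s http://127.0.0.1:5701/health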

Write here if you encounter any issues. I'll try to help.

@adnxn

adnxn commented Apr 4, 2020

@leszko: thanks for the info

put a lot of data (~2GB)

what would be the best way to do this?

Also, it seems like changing the endpoint breaks the management console for v3.12.*, hmm.

@leszko
Author

leszko commented Apr 6, 2020

@adnxn

  1. Right, for the older Hazelcast version (3.12.x), you need to use the old Helm chart version. I forgot to mention that. Try the following commands; they should work:
helm install --name my-release --set cluster.memberCount=6 stable/hazelcast --version 2.10.0
helm upgrade my-release --set cluster.memberCount=3 stable/hazelcast --version 2.10.0
  2. Regarding inserting the data, you can either write a client application to insert it or use the built-in Client Console App. If you have a running Hazelcast cluster on Kubernetes, try executing the following:
$ kubectl exec -it hazelcast-0 -- /bin/bash
# java -cp lib/hazelcast-all*.jar com.hazelcast.client.console.ClientConsoleApp
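
Once the console app is running, its help command lists the available map operations. From memory (so please verify against the help output of your version), loading test data looks roughly like this, scaling the count (and the entry size, if your version supports it) until the cluster holds around 2 GB:

ns test
m.putmany 100000
m.size

Here ns switches to a map namespace, m.putmany inserts that many generated entries, and m.size reports the entry count afterwards.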

@mesutcelik

Hi @adnxn ,
Do you need any more help to finalize this issue?

@leszko
Author

leszko commented Sep 16, 2020

@adnxn are you still working on this issue?

@adnxn

adnxn commented Sep 18, 2020

Hey, I haven't had time to follow up on this. If someone else wants to take it over, feel free.

@adnxn removed their assignment Sep 18, 2020
@sgandon

sgandon commented Oct 12, 2021

Any news on this?
Is the liveness recommendation still /health, and the readiness recommendation still /health/node-state?
Can you confirm that /health/node-state will respond with 200 after the member has at least tried to join the other Hazelcast nodes at least once?
