
Change the default livenessProbe endpoint to /health #83

Open
leszko opened this issue Nov 27, 2019 · 13 comments
Labels
good first issue

Comments

@leszko

leszko commented Nov 27, 2019

Currently the default livenessProbe endpoint is /health/node-state. It should actually be /health.

The change is trivial, but before applying it we need to double check that it does not break rolling upgrade and scaling down.
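
For anyone picking this up, a quick way to see what the chart currently configures is to read the probe paths straight off the StatefulSet. This is only a sketch: the StatefulSet name my-release-hazelcast and the container index 0 are assumptions, so adjust them to your deployment.

# Placeholder StatefulSet name; adjust to your release.
kubectl get statefulset my-release-hazelcast \
  -o jsonpath='{.spec.template.spec.containers[0].livenessProbe.httpGet.path}'
kubectl get statefulset my-release-hazelcast \
  -o jsonpath='{.spec.template.spec.containers[0].readinessProbe.httpGet.path}'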

@mesutcelik

Are you going to parse the response and decide if it is healthy?

Hazelcast::NodeState=ACTIVE
Hazelcast::ClusterState=ACTIVE
Hazelcast::ClusterSafe=TRUE
Hazelcast::MigrationQueueSize=0
Hazelcast::ClusterSize=2

Apart from /node-state and /health, we have /ready too. We just need to figure out which one really tells us "restart me" when it returns a non-200 response code. cc: @mmedenjak
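
For reference, a quick way to compare what the endpoints actually return is to hit them on a running member and look only at the status codes. A sketch, assuming the default member port 5701 and the shortened paths used in this thread (the chart may prefix them, e.g. with /hazelcast):

# Run inside a member pod; if curl is not available in the image,
# use `kubectl port-forward hazelcast-0 5701:5701` and run this locally instead.
for p in /health /health/node-state /ready; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://127.0.0.1:5701${p}")
  echo "${p} -> ${code}"
done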

@leszko
Author

leszko commented Nov 28, 2019

I think we don't need to parse it; an HTTP 200 from /health should mean "I'm alive", and that is what should be used for the livenessProbe.

The current setup is a little wrong: we use /health/node-state, which returns 503 when the Hazelcast node is in the shutdown state, so Kubernetes terminates the pod. But if Hazelcast is in the shutdown state, it is still alive, and we should wait until it shuts down properly by itself.

@mesutcelik added the good first issue label Mar 19, 2020
@adnxn

adnxn commented Mar 28, 2020

The change is trivial, but before applying it we need to double check that it does not break rolling upgrade and scaling down.

Any guidance on how to validate these two things, rolling upgrades and scaling down?

@adnxn

adnxn commented Mar 28, 2020

Well, I just came across these:
https://hazelcast.com/blog/rolling-upgrade-hazelcast-imdg-on-kubernetes/
https://hazelcast.com/blog/how-to-scale-hazelcast-imdg-on-kubernetes/

But yeah, any other advice would be welcome. Thanks.

@Holmistr

Hi @adnxn, glad to see you here :) I'm assigning you the issue and I'll make sure to get you some guidance from our experts. Looking forward to your contribution!

@leszko
Author

leszko commented Mar 30, 2020

@adnxn, thanks for taking this issue!

So, I recommend doing the following:

  • Change /health/node-state => /health and check that readiness still gates startup correctly (it should start one member, wait ~30s, start the second member, wait, and so on)
  • Check rolling upgrade (put a lot of data into the cluster (~2 GB), perform a rolling upgrade, and check that there is no data loss)
  • Check scaling (make a big cluster (at least 6 members), put a lot of data (~2 GB), scale down to 2 members, scale back up to 6, and check that there is no data loss)

Also, I would not change the readinessProbe; it should still stay as /health/node-state.
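
If you want to try that split on a running cluster before touching the chart default, something like the following should do it. A sketch only: the StatefulSet name and container index are assumptions, and the readinessProbe is left untouched.

# Hypothetical StatefulSet name; only the livenessProbe path changes,
# the readinessProbe stays as /health/node-state.
kubectl patch statefulset my-release-hazelcast --type=json -p='[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/livenessProbe/httpGet/path",
   "value": "/health"}
]'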

@leszko
Author

leszko commented Mar 30, 2020

As for the technical details of how to scale up/down and how to perform rolling updates, the blog posts you mentioned are good guidelines. I recommend using the Helm chart.

Then, for scaling, all you need to do is execute:

helm install --name my-release --set cluster.memberCount=6 stable/hazelcast
helm upgrade my-release --set cluster.memberCount=3 stable/hazelcast

And for the rolling update:

helm install --name my-release --set image.tag=3.12 hazelcast/hazelcast
helm upgrade my-release --set image.tag=3.12.1 hazelcast/hazelcast
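
While the upgrade or scale operation runs, it helps to watch the pods and then confirm the cluster re-formed. A rough outline; the pod name hazelcast-0 is a placeholder, adjust it to your member pods, and the /health path may be prefixed depending on the chart.

# Watch members restart and rejoin one by one (readiness should gate the rollout).
kubectl get pods -w

# Afterwards, forward the member port in one terminal...
kubectl port-forward hazelcast-0 5701:5701
# ...and in another terminal check /health: the response includes
# Hazelcast::ClusterSize, which should match the expected member count.
curl -s http://127.0.0.1:5701/health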

Write here if you encounter any issues. I'll try to help.

@adnxn

adnxn commented Apr 4, 2020

@leszko: thanks for the info

put a lot of data (~2GB)

what would be the best way to do this?

Also, it seems like changing the endpoint breaks the management console for v3.12.*, hmm.

@leszko
Author

leszko commented Apr 6, 2020

@adnxn

  1. Right, for the older Hazelcast version (3.12.x), you need to use the old Helm chart version. I forgot to mention that. Try the following commands; they should work:
helm install --name my-release --set cluster.memberCount=6 stable/hazelcast --version 2.10.0
helm upgrade my-release --set cluster.memberCount=3 stable/hazelcast --version 2.10.0
  2. Regarding inserting the data, you can either write a client application to insert it or use the built-in Client Console App. If you have a running Hazelcast cluster on Kubernetes, try executing the following:
$ kubectl exec -it hazelcast-0 -- /bin/bash
# java -cp lib/hazelcast-all*.jar com.hazelcast.client.console.ClientConsoleApp
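
Once the console app is running, its help command lists the available map operations. From memory (so please verify against the help output of your version), loading test data looks roughly like this, scaling the count (and the entry size, if your version supports it) until the cluster holds around 2 GB:

ns test
m.putmany 100000
m.size

Here ns switches to a map namespace, m.putmany inserts that many generated entries, and m.size reports the entry count afterwards.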

@mesutcelik

Hi @adnxn ,
Do you need any more help to finalize this issue?

@leszko
Author

leszko commented Sep 16, 2020

@adnxn are you still working on this issue?

@adnxn

adnxn commented Sep 18, 2020

Hey, I haven't had time to follow up on this. If someone else wants to take it over, feel free.

@adnxn removed their assignment Sep 18, 2020
@sgandon

sgandon commented Oct 12, 2021

Any news on this?
Is the liveness recommendation still /health, and the readiness recommendation still /health/node-state?
Can you confirm that /health/node-state will respond with 200 after the member has at least tried to join the other Hazelcast nodes at least once?
