document /health endpoints, useful for liveProve, readyProbe in k8s deployments. #3688

Open
Mistobaan opened this issue Jan 20, 2018 · 7 comments

@Mistobaan

I am trying to deploy the Docker image gcr.io/stackdriver-trace-docker/zipkin-collector inside a Kubernetes cluster. I was wondering which endpoint would be best for the livenessProbe / readinessProbe of a Kubernetes pod.

@Mistobaan
Author

I just noticed in the startup logs that /health is enabled.

@bogdandrutu

What can I do to help?

@negz

negz commented Feb 14, 2018

Just chiming in - I run the zipkin collector in a Kubernetes cluster and I'm using the /health endpoint for liveness and readiness probes without issue. It would be nice if the endpoint were documented.

Mistobaan changed the title from "liveProve, readyProbe HTTP endpoints?" to "document /health endpoints, useful for liveProve, readyProbe in k8s deployments." on Feb 16, 2018
@Mistobaan
Author

Exactly, the endpoint is there but it is not documented. I changed the title of the issue.

@codefromthecrypt
Member

The health endpoint is mentioned in the zipkin-server README (linked below), but yeah, it is not well documented. We also use it for the HEALTHCHECK directive in Docker. I'll move this issue to the main repo, noting that there is an emerging https://github.com/openzipkin/zipkin-helm, which should become the primary source of info on k8s.

https://github.com/openzipkin/zipkin/tree/master/zipkin-server#endpoints

codefromthecrypt transferred this issue from openzipkin/zipkin-gcp on Jan 13, 2024
@codefromthecrypt
Member

So, in summary, we should probably converge on a practice before documenting one, but my gut feel is that adapting the one from our Dockerfile is not a bad start. We probably need some advice to clarify the lack of startup and liveness probes in the ecosystem, e.g. whether that's a feature or a bug. cc @mfordjody @optional303


To begin, /health is a composite status of the health of zipkin's dependencies. For example, if zipkin is configured for stackdriver or kafka and either connection doesn't work, /health will return a non-200 status code.

Here is the text of our HEALTHCHECK in Docker. Acknowledged, HEALTHCHECK is disabled in k8s, but the basic info is the same.

# We use start period of 30s to avoid marking the container unhealthy on slow or contended CI hosts.
#
# If in production, you have a 30s startup, please report to https://gitter.im/openzipkin/zipkin
# including the values of the /health and /info endpoints as this would be unexpected.
HEALTHCHECK --interval=5s --start-period=30s --timeout=5s CMD ["docker-healthcheck"]

https://github.com/openzipkin/zipkin/blob/master/docker/Dockerfile#L66-L70

From https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

A common pattern for liveness probes is to use the same low-cost HTTP endpoint as for readiness probes, but with a higher failureThreshold.
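
As a rough illustration of that pattern applied to zipkin, the fragment below points both probes at /health on port 9411 and gives the liveness probe a higher failureThreshold. This is only a sketch: the container name, the image and every threshold value are assumptions chosen for illustration, not settings this project has agreed on.

```yaml
# Hypothetical container spec fragment; names and values are illustrative only.
containers:
  - name: zipkin                # assumed container name
    image: openzipkin/zipkin    # assumed image; substitute the one you deploy
    ports:
      - containerPort: 9411
    readinessProbe:
      httpGet:
        path: /health
        port: 9411
      periodSeconds: 5
      failureThreshold: 3       # stop routing traffic after 3 consecutive failures
    livenessProbe:
      httpGet:
        path: /health           # same low-cost endpoint as readiness
        port: 9411
      periodSeconds: 5
      failureThreshold: 6       # restart only after a longer run of failures
```

One caveat with this sketch: /health reflects zipkin's dependencies, so a liveness probe on it would restart the container when a backend such as kafka is unreachable. Whether that is desirable is not settled in this thread.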

The incubating zipkin-helm chart in this org appears to define a readiness probe, but not a liveness probe, adapted from the Docker HEALTHCHECK. Notably, it is missing the start period.

          readinessProbe:
            httpGet:
              path: /health
              port: 9411
            initialDelaySeconds: 5
            periodSeconds: 5

https://github.com/openzipkin/zipkin-helm/blob/475ebd84f98992d423ef5d417756c388c9cdfb68/charts/zipkin/templates/deployment.yaml#L71-L76
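
If the 30s start period from the Docker HEALTHCHECK were to be carried over, one possible approach is a Kubernetes startupProbe. The fragment below is a sketch under assumptions, not what the chart does: the container name, image, and the choice of failureThreshold and timeoutSeconds are hypothetical, picked so that the startup window is roughly 30 seconds.

```yaml
# Hypothetical container spec fragment; not taken from the zipkin-helm chart.
containers:
  - name: zipkin                # assumed container name
    image: openzipkin/zipkin    # assumed image; substitute the one you deploy
    startupProbe:
      httpGet:
        path: /health
        port: 9411
      periodSeconds: 5          # mirrors --interval=5s
      timeoutSeconds: 5         # mirrors --timeout=5s
      failureThreshold: 6       # 6 checks at 5s intervals, roughly a 30s startup window
    readinessProbe:             # only runs once the startup probe has succeeded
      httpGet:
        path: /health
        port: 9411
      periodSeconds: 5
```

The semantics are not identical to Docker's start period. Docker ignores failures during the start period, whereas Kubernetes holds back the other probes until the startup probe succeeds and kills the container if the threshold is exhausted.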

The setup above is consistent with a few other helm charts including https://github.com/radius-project/radius/blob/21b25ddf265f0464e4641b8c79cff61a4f9badd0/deploy/monitoring/zipkin-mem.yaml#L19 and https://github.com/apache/dubbo-kubernetes/blob/c4f2898e4eacd978c780bec79989a465f3e5a9dd/deploy/kubernetes/zipkin.yaml#L91-L96

That said, Financial Times uses a TCP socket check for liveness and /health for readiness, with the startup delay for both defaulting to 200:

          livenessProbe:
            initialDelaySeconds: {{ .Values.ui.probeStartupDelay }}
            tcpSocket:
              port: {{ .Values.ui.queryPort }}
          readinessProbe:
            initialDelaySeconds: {{ .Values.ui.probeStartupDelay }}
            httpGet:
              path: /health
              port: {{ .Values.ui.queryPort }}

https://github.com/Financial-Times/zipkin-helm/blob/0ae00a2e2be986b58de30475601ea5dc686ea0fd/templates/zipkin-ui.yaml#L29-L37

@codefromthecrypt
Member

Spring Boot usage is an internal detail, so we wouldn't use its mappings or rely on how it does things, such as via an event bus, which is TMI. Still, the fact that Boot explicitly uses different HTTP paths for liveness and readiness is interesting and useful research: https://spring.io/blog/2020/03/25/liveness-and-readiness-probes-with-spring-boot

4 participants