document /health endpoints, useful for liveProve, readyProbe in k8s deployments. #3688

Mistobaan · 2018-01-20T02:13:35Z

I am trying to deploy the Docker image gcr.io/stackdriver-trace-docker/zipkin-collector inside a Kubernetes cluster. I was wondering which endpoint would be best to use for the livenessProbe / readinessProbe of a Kubernetes pod.

Mistobaan · 2018-01-20T02:19:33Z

I just noticed in the startup logs that /health is enabled.

bogdandrutu · 2018-02-01T18:32:32Z

What can I do to help?

negz · 2018-02-14T10:32:48Z

Just chiming in - I run the zipkin collector in a Kubernetes cluster and I'm using the /health endpoint for liveness and readiness probes without issue. It would be nice if the endpoint were documented.

Mistobaan · 2018-02-16T17:08:51Z

Exactly, the endpoint is there but is not documented. I changed the title of the issue.

codefromthecrypt · 2024-01-13T23:17:30Z

So the health endpoint is mentioned in the zipkin-server README (link below), but yes, it is not well documented. We also use it for the HEALTHCHECK directive in Docker. I'll move this issue to the main repo, noting that there is an emerging https://github.com/openzipkin/zipkin-helm project, which should become the primary source of information on Kubernetes setups.

https://github.com/openzipkin/zipkin/tree/master/zipkin-server#endpoints

codefromthecrypt · 2024-01-15T14:00:14Z

So, in summary, we should probably settle on a practice before documenting one, but my gut feeling is that adapting the one from our Dockerfile is not a bad start. We probably also need some advice clarifying why startup and liveness probes are missing across the ecosystem, e.g. whether that's a feature or a bug. cc @mfordjody @optional303

To begin with, /health is a composite status of the health of Zipkin's dependencies. For example, if Zipkin is configured for Stackdriver or Kafka and either connection doesn't work, /health will return a non-200 status code.

Here is the text of our HEALTHCHECK in Docker. Admittedly, Kubernetes does not use it, but it carries the same basic information:

# We use start period of 30s to avoid marking the container unhealthy on slow or contended CI hosts.
#
# If in production, you have a 30s startup, please report to https://gitter.im/openzipkin/zipkin
# including the values of the /health and /info endpoints as this would be unexpected.
HEALTHCHECK --interval=5s --start-period=30s --timeout=5s CMD ["docker-healthcheck"]

https://github.com/openzipkin/zipkin/blob/master/docker/Dockerfile#L66-L70
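As a rough sketch (not something either repo ships today), the HEALTHCHECK above could be translated into a Kubernetes startupProbe. Kubernetes has no direct equivalent of `--start-period`, but a startupProbe whose periodSeconds × failureThreshold equals 30s gives a similar grace window, and Kubernetes holds back liveness and readiness probes until the startup probe succeeds. The container name and image below are assumptions; port 9411 matches the charts quoted in this thread.

```yaml
# Sketch only: container name and image are assumptions.
containers:
  - name: zipkin
    image: openzipkin/zipkin
    ports:
      - containerPort: 9411
    startupProbe:
      httpGet:
        path: /health
        port: 9411
      periodSeconds: 5      # mirrors HEALTHCHECK --interval=5s
      timeoutSeconds: 5     # mirrors HEALTHCHECK --timeout=5s
      failureThreshold: 6   # 6 x 5s = 30s, mirrors --start-period=30s
```

Unlike Docker, where an unhealthy status is only reported, Kubernetes kills the container if the startup probe fails failureThreshold times in a row.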

From https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

A common pattern for liveness probes is to use the same low-cost HTTP endpoint as for readiness probes, but with a higher failureThreshold.
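Applied to Zipkin, that pattern might look like the following sketch, which points both probes at /health on port 9411 and gives liveness a higher failureThreshold. The threshold values are illustrative assumptions, not defaults recommended by the Zipkin project.

```yaml
# Sketch only: thresholds are illustrative, not project defaults.
readinessProbe:
  httpGet:
    path: /health
    port: 9411
  periodSeconds: 5
  failureThreshold: 3    # stop routing traffic after about 15s of failures
livenessProbe:
  httpGet:
    path: /health
    port: 9411
  periodSeconds: 5
  failureThreshold: 12   # restart the container only after about 60s of failures
```

Because /health also reflects dependencies such as Kafka or Stackdriver, a liveness probe on it can restart Zipkin during a dependency outage that a restart would not fix; the higher failureThreshold only delays that, which is one trade-off to weigh when deciding whether to add a liveness probe at all.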

The incubating zipkin-helm chart in this org seems to have derived a readiness probe, but not a liveness probe, from the Docker HEALTHCHECK, notably without the start period:

          readinessProbe:
            httpGet:
              path: /health
              port: 9411
            initialDelaySeconds: 5
            periodSeconds: 5

https://github.com/openzipkin/zipkin-helm/blob/475ebd84f98992d423ef5d417756c388c9cdfb68/charts/zipkin/templates/deployment.yaml#L71-L76

The setup above is consistent with a few other Helm charts, including https://github.com/radius-project/radius/blob/21b25ddf265f0464e4641b8c79cff61a4f9badd0/deploy/monitoring/zipkin-mem.yaml#L19 and https://github.com/apache/dubbo-kubernetes/blob/c4f2898e4eacd978c780bec79989a465f3e5a9dd/deploy/kubernetes/zipkin.yaml#L91-L96

That said, the Financial Times chart uses a TCP socket check for liveness and /health for readiness, with the startup delay defaulting to 200 seconds:

          livenessProbe:
            initialDelaySeconds: {{ .Values.ui.probeStartupDelay }}
            tcpSocket:
              port: {{ .Values.ui.queryPort }}
          readinessProbe:
            initialDelaySeconds: {{ .Values.ui.probeStartupDelay }}
            httpGet:
              path: /health
              port: {{ .Values.ui.queryPort }}

https://github.com/Financial-Times/zipkin-helm/blob/0ae00a2e2be986b58de30475601ea5dc686ea0fd/templates/zipkin-ui.yaml#L29-L37

codefromthecrypt · 2024-02-19T00:06:47Z

While Spring Boot usage is an internal detail (so we wouldn't use their mappings or rely on how they do things, such as via an event bus, which is TMI), the fact that Boot explicitly uses different HTTP paths for liveness and readiness is interesting and useful research: https://spring.io/blog/2020/03/25/liveness-and-readiness-probes-with-spring-boot

Mistobaan changed the title ~~liveProve, readyProbe HTTP endpoints?~~ document /health endpoints, useful for liveProve, readyProbe in k8s deployments. Feb 16, 2018

codefromthecrypt transferred this issue from openzipkin/zipkin-gcp Jan 13, 2024
