docs: add design proposal for L4 HealthCheck #404

yskopets · 2019-11-02T18:28:39Z

Summary

add design proposal for L4 HealthCheck

Related issues

#393


jakubdyszkiewicz

Nice research!

Just for reference, here is how Consul works. Each Agent health-checks its own service and sends a healthy/unhealthy event to the Master only when the status changes. A service is healthy only when its HC passes and its Agent is up and running. At the same time, Agents are aware of each other through a gossip protocol, so when an Agent goes down, nearby Agents detect it and notify the Master, and the service becomes unhealthy.
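The change-only reporting in the Consul model above can be illustrated with a minimal Go sketch: an agent runs its local check repeatedly but contacts the server only when the status flips. `Reporter`, `Observe` and the `send` callback are hypothetical names chosen for illustration. This is neither Consul's nor Kuma's API, and the gossip-based failure detection between agents is left out.

```go
package main

import "fmt"

// Status is the health of a local service as seen by its agent.
type Status bool

const (
	Unhealthy Status = false
	Healthy   Status = true
)

// Reporter (hypothetical) remembers the last status it sent and forwards a
// new status to the server only when it differs from the previous one.
type Reporter struct {
	last  Status       // last status sent to the server
	known bool         // false until the first report has been sent
	send  func(Status) // hypothetical transport to the server
}

// Observe records the result of one local health check.
// It returns true if an event was sent to the server.
func (r *Reporter) Observe(s Status) bool {
	if r.known && r.last == s {
		return false // no change: stay silent
	}
	r.last, r.known = s, true
	r.send(s)
	return true
}

func main() {
	var sent []Status
	r := &Reporter{send: func(s Status) { sent = append(sent, s) }}
	for _, s := range []Status{Healthy, Healthy, Healthy, Unhealthy, Unhealthy, Healthy} {
		r.Observe(s)
	}
	fmt.Println(len(sent), sent) // 3 [true false true]
}
```

Six checks produce only three events, so the server-side load follows the rate of status changes rather than the check interval.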

jakubdyszkiewicz · 2019-11-04T08:55:48Z

docs/proposals/HealthChecks-L4.md

+  passiveChecks:
+    unhealthy_threshold: 3
+    penalty_interval: 10s # for how long endpoint should be considered unhealthy
+```


When I see `sources` and `destinations`, I think of the connection between apps. For TrafficPermissions this makes sense because we want to secure the connection between apps; the same goes for TrafficLogging. For passive HC it also makes sense, because we are checking the health of the connection.

For active HC I'd say these semantics are confusing. We want to define active HC for the application, not for the connection between applications. Maybe we should come up with different semantics for this, like a `target` instead of `sources` + `destinations`.

jakubdyszkiewicz · 2019-11-04T09:01:22Z

docs/proposals/HealthChecks-L4.md

+
+Conclusions:
+* we can already use `Health xDS` for `Envoy -> local app` health checks
+* changes to the Envoy will be necessary to use `Health xDS` for `Envoy -> upstream` "health checks" (add support for mTLS)


What is the use case for using HDS for `Envoy -> upstream`? Wouldn't it be better to health-check only your local app and send its status to the CP, which then updates the list of endpoints for the dataplanes that use this service?

yskopets

Success of the `Envoy -> local app` check doesn't give the full picture:

* it doesn't account for mTLS between client and server
* it doesn't account for different geographical locations (e.g., connectivity to stand-by instances in another datacenter)
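The alternative raised in the question, where each dataplane checks only its local app and the control plane prunes the endpoint list, can be sketched in a few lines of Go. `Endpoint`, `HealthyEndpoints` and the report map are hypothetical names. As the reply points out, such a filter only knows what the local checks saw: it cannot reflect mTLS failures between client and server, nor reachability from another location.

```go
package main

import "fmt"

// Endpoint (hypothetical) is one instance of a service known to the control plane.
type Endpoint struct {
	Dataplane string
	Address   string
}

// HealthyEndpoints returns the endpoints whose own dataplane last reported
// its local app as healthy. Endpoints without any report are excluded.
func HealthyEndpoints(all []Endpoint, localHealthy map[string]bool) []Endpoint {
	var out []Endpoint
	for _, e := range all {
		if localHealthy[e.Dataplane] { // missing key reads as false
			out = append(out, e)
		}
	}
	return out
}

func main() {
	all := []Endpoint{
		{"backend-1", "10.0.0.1:8080"},
		{"backend-2", "10.0.0.2:8080"},
		{"backend-3", "10.0.0.3:8080"},
	}
	reports := map[string]bool{"backend-1": true, "backend-2": false}
	fmt.Println(HealthyEndpoints(all, reports)) // [{backend-1 10.0.0.1:8080}]
}
```

Excluding endpoints that never reported is one possible choice; a control plane could equally keep them until a first report arrives.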

jakubdyszkiewicz · 2019-11-04T09:16:03Z

docs/proposals/HealthChecks-L4.md

+
+## Requirements
+
+1. support `Envoy -> upstream` "health checks"


I don't think this scales very well, at least for active health checks.
Let's say we've got an app `backend` and 10 other apps, with 10 instances each, that communicate with it. If an HC is sent every second, we now generate 100 rps to `backend`.

yskopets

It's always the user's choice.

Active health checks make perfect sense when your infrastructure is small.
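The load estimate in the comment above is the number of client instances divided by the check interval. It applies to each `backend` instance, because every client Envoy runs its active checks against every upstream host on its own. A tiny Go helper makes the arithmetic explicit; `ActiveHCRate` is a hypothetical name used only for illustration.

```go
package main

import "fmt"

// ActiveHCRate (hypothetical helper) estimates the health-check requests per
// second that each instance of an upstream service receives when every client
// instance runs its own active health check against it.
func ActiveHCRate(clientApps, instancesPerApp int, intervalSeconds float64) float64 {
	clients := clientApps * instancesPerApp // every client Envoy checks independently
	return float64(clients) / intervalSeconds
}

func main() {
	fmt.Println(ActiveHCRate(10, 10, 1)) // 100: the scenario from the comment
	fmt.Println(ActiveHCRate(10, 10, 5)) // 20: same fleet, 5s interval
}
```

The load grows linearly with the number of clients and shrinks linearly with the interval, which is why the interval is the main knob left to the user.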

jakubdyszkiewicz · 2019-11-04T09:19:24Z

docs/proposals/HealthChecks-L4.md

+Conclusions:
+* we can already use `Health xDS` for `Envoy -> local app` health checks
+* changes to the Envoy will be necessary to use `Health xDS` for `Envoy -> upstream` "health checks" (add support for mTLS)
+* changes to the Envoy will be necessary to send event logs to the Control Plane (instead of logging to a file)


Does it make sense to use active HC events when HDS is used?

For passive HC (outlier detection), I think this is "very local" to the connection between A -> B. What would the control plane do with the information that B does not work from A's perspective?

yskopets

The goal of a Control Plane is to be smart and help users in every possible way.

For example, it could:

* make it visible that the problem is local to a single dataplane
* make it visible that the problem is specific to a certain geo location
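A minimal Go sketch of how a control plane might turn passive HC failure reports for one upstream into the two insights listed above. It assumes each report carries the reporting dataplane and its zone; `Report`, `Classify`, the zone notion and the result labels are all hypothetical, since the proposal does not define a report format.

```go
package main

import "fmt"

// Report (hypothetical): a dataplane saw its connections to an upstream fail.
type Report struct {
	Dataplane string
	Zone      string
}

// Classify decides how widespread a problem with one upstream is, given the
// failure reports and the total number of client dataplanes per zone.
func Classify(reports []Report, clientsPerZone map[string]int) string {
	failing := map[string]map[string]bool{} // zone -> set of reporting dataplanes
	total := 0                              // distinct failing dataplanes
	for _, r := range reports {
		if failing[r.Zone] == nil {
			failing[r.Zone] = map[string]bool{}
		}
		if !failing[r.Zone][r.Dataplane] {
			failing[r.Zone][r.Dataplane] = true
			total++
		}
	}
	allClients := 0
	for _, n := range clientsPerZone {
		allClients += n
	}
	switch {
	case total == 0:
		return "healthy"
	case total == allClients:
		return "global" // every client sees the failure
	case total == 1:
		return "local" // only one dataplane is affected
	case len(failing) == 1:
		for zone, dps := range failing { // exactly one entry
			if len(dps) == clientsPerZone[zone] {
				return "zone " + zone // all clients of one zone, and only them
			}
		}
	}
	return "partial"
}

func main() {
	clients := map[string]int{"us-east": 3, "eu-west": 2}
	fmt.Println(Classify([]Report{{"web-1", "us-east"}}, clients))                     // local
	fmt.Println(Classify([]Report{{"web-4", "eu-west"}, {"web-5", "eu-west"}}, clients)) // zone eu-west
}
```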

CLAassistant · 2019-11-22T20:38:26Z

Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

yskopets requested review from subnetmarco and jakubdyszkiewicz November 2, 2019 18:28

yskopets added docs and removed kuma-cp labels Nov 2, 2019

yskopets added 2 commits November 2, 2019 19:29

* docs: add a template for HealthCheck proposal (4e7460e)
* docs: document consideration notes for reporting health statuses back to the control plane (47672c5)

yskopets force-pushed the docs/health-checks-proposal branch 2 times, most recently from 3e94429 to 96ad7f1 November 2, 2019 18:32

docs: add design proposal for HealthCheck resource (ccdcef4)

yskopets force-pushed the docs/health-checks-proposal branch from 96ad7f1 to ccdcef4 November 2, 2019 18:33

yskopets added the design-proposal label Nov 2, 2019

jakubdyszkiewicz reviewed Nov 4, 2019


docs: add design proposal for HealthCheck resource (d854b10)

yskopets requested a review from a team December 4, 2019 13:11

yskopets merged commit 57e210b into master Dec 11, 2019

yskopets deleted the docs/health-checks-proposal branch December 19, 2019 19:41
