docs: add design proposal for L4 HealthCheck #404
Nice research!
Just for reference, here is how Consul works. Each Agent health-checks its own service and sends a healthy/unhealthy event to the Master only when the status changes. A service is healthy only when its HC has passed and its Agent is up and running. At the same time, Agents are aware of each other by communicating over the Gossip Protocol, so when an Agent goes down, nearby Agents detect this and send a message to the Master, and the service becomes unhealthy.
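To make the "send an event only if there is a change" part concrete, here is a minimal sketch. It is not Consul's code; `ChangeOnlyReporter`, `observe`, and the `notify` callback are hypothetical names used only for illustration.

```python
class ChangeOnlyReporter:
    """Tracks local check results and notifies only when the status flips."""

    def __init__(self, notify):
        # `notify` is a hypothetical callback standing in for "send event to Master".
        self._notify = notify
        self._last = {}  # service name -> last reported status

    def observe(self, service, healthy):
        """Record one check result; return True if an event was sent."""
        if self._last.get(service) == healthy:
            return False  # status unchanged, stay silent
        self._last[service] = healthy
        self._notify(service, healthy)
        return True


events = []
reporter = ChangeOnlyReporter(lambda svc, ok: events.append((svc, ok)))

# Five local checks, but only three status changes are reported.
for result in [True, True, False, False, True]:
    reporter.observe("backend", result)
```

With this shape, the number of events reaching the control plane depends on how often a status flips, not on how often the check runs.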
```yaml
passiveChecks:
  unhealthy_threshold: 3
  penalty_interval: 10s # for how long endpoint should be considered unhealthy
```
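One possible reading of these two settings, as a rough sketch: an endpoint is treated as unhealthy for `penalty_interval` once `unhealthy_threshold` failures in a row have been observed on real traffic. The "consecutive failures" interpretation and the `PassiveCheck` class are assumptions made for illustration; the proposal only names the two fields.

```python
class PassiveCheck:
    """Hypothetical tracker for one endpoint, driven by real request outcomes."""

    def __init__(self, unhealthy_threshold=3, penalty_interval=10.0):
        self.unhealthy_threshold = unhealthy_threshold
        self.penalty_interval = penalty_interval  # seconds
        self._failures = 0             # consecutive failures seen so far
        self._penalized_until = None   # timestamp when the penalty ends

    def record(self, success, now):
        """Feed the outcome of one request observed at time `now` (seconds)."""
        if success:
            self._failures = 0
            return
        self._failures += 1
        if self._failures >= self.unhealthy_threshold:
            # Assumption: the penalty starts at the failure that hits the threshold.
            self._penalized_until = now + self.penalty_interval
            self._failures = 0

    def is_healthy(self, now):
        return self._penalized_until is None or now >= self._penalized_until


check = PassiveCheck(unhealthy_threshold=3, penalty_interval=10.0)
for t in (0.0, 1.0, 2.0):
    check.record(False, t)  # three failures in a row

during_penalty = check.is_healthy(5.0)
after_penalty = check.is_healthy(12.0)
```

Here the third failure at `t = 2.0` starts a penalty that lasts until `t = 12.0`.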
When I see `sources` and `destinations`, I think of the connection between apps. For TrafficPermissions that makes sense, because we want to secure the connection between apps. Same with TrafficLogging. For passive HC it also makes sense, because we are checking the health of the connection.
For active HC I'd say these semantics are confusing. We want to define an active HC for the application, not for the connection between applications. Maybe we should come up with different semantics for this, like `target` instead of `sources` + `destinations`.
Conclusions:
* we can already use `Health xDS` for `Envoy -> local app` health checks
* changes to the Envoy will be necessary to use `Health xDS` for `Envoy -> upstream` "health checks" (add support for mTLS)
What is the use case for using HDS for `Envoy -> upstream`? Wouldn't it be better to HC only your local app and send the status to the CP, which then updates the list of endpoints for the dataplanes that use this service?
Success of the `Envoy -> local app` check doesn't give the full picture:
- it doesn't account for mTLS between client and server
- it doesn't account for different geographical locations (e.g., connectivity to stand-by instances in another datacenter)
## Requirements

1. support `Envoy -> upstream` "health checks"
I don't think this scales very well, at least for active health checks.
Let's say we've got an app `backend` and 10 other apps with 10 instances each that communicate with it. An HC is sent every second. That is 10 x 10 = 100 dataplanes probing once per second, so we now generate 100 rps of health-check traffic to `backend`.
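The arithmetic behind that number, as a small sketch. `health_check_rps` is a hypothetical helper, and it assumes every client dataplane actively probes the same `backend` endpoint once per interval.

```python
def health_check_rps(client_apps, instances_per_app, interval_seconds):
    """Probes per second arriving at one checked endpoint.

    Assumes each client dataplane sends one probe per interval to that endpoint.
    """
    probing_dataplanes = client_apps * instances_per_app
    return probing_dataplanes / interval_seconds


# 10 apps x 10 instances, one probe per second each.
rps = health_check_rps(client_apps=10, instances_per_app=10, interval_seconds=1.0)

# Relaxing the interval to 5 seconds cuts the load proportionally.
rps_relaxed = health_check_rps(client_apps=10, instances_per_app=10, interval_seconds=5.0)
```

The load grows linearly with the number of client dataplanes and inversely with the check interval, which is why the interval is the main knob for larger deployments.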
It's always a user's choice.
Active health checks make perfect sense when your infrastructure is not big.
Conclusions:
* we can already use `Health xDS` for `Envoy -> local app` health checks
* changes to the Envoy will be necessary to use `Health xDS` for `Envoy -> upstream` "health checks" (add support for mTLS)
* changes to the Envoy will be necessary to send event logs to the Control Plane (instead of logging to a file)
Does it make sense to use active HC events when HDS is used?
For passive HC (outlier detection), I think the result is "very local" to the connection A -> B. What would the control plane do with the information that B does not work from A's perspective?
The goal of a Control Plane is to be smart and help users in every possible way.
E.g.,
- make it visible that the problem is local to a single dataplane
- make it visible that the problem is specific to a certain geo location
Summary
L4 HealthCheck
Related issues
#393