support override host status restriction for stateful session#19665
support override host status restriction for stateful session#19665htuch merged 16 commits intoenvoyproxy:mainfrom
Conversation
Signed-off-by: wbpcode <wbphub@live.com>
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
|
@htuch Could you please give a review to the new proto API when you have free time? 😄 |
|
And @snowp Could you please add some comments for these new code when you have some bandwidth? 🌷 |
|
cc @howardjohn |
adisuissa
left a comment
There was a problem hiding this comment.
Thanks for working on this!
Looked mostly at the API, left a few comments.
| DEGRADED = 5; | ||
| } | ||
|
|
||
| message HealthStatusSet { |
There was a problem hiding this comment.
Do you expect this message to have more fields?
If not, what's the advantage over just specifying the repeated field directly (in your specific case renaming the field to override_host_status_set)?
There was a problem hiding this comment.
This is because we need to distinguish between the case where no override_host_status is configured at all and the case where 0 items are configured. The former will provide a default value.
| // Expected override host statuses. The result of load balancing is allowed to be overwritten | ||
| // only when the target host status is as expected when stateful session stickiness directly | ||
| // specifies a target host. | ||
| // | ||
| // If this is set and and has no any item then any host from stateful session stickiness will be | ||
| // ignored. | ||
| // If this is not set then healthy statuses (``UNKNOWN``, ``HEALTHY`` and ``DEGRADED`` for now) | ||
| // will be applied by default. |
There was a problem hiding this comment.
Is this field only relevant when the stateful_session extension is used?
If so, can you add this to the comment, and also add a reference to it?
There was a problem hiding this comment.
Should we be referring to the specific extension here at all? Or should we instead have this talk about the host override feature in general? I can imagine other extensions wanting to use host overrides as well
There was a problem hiding this comment.
For the moment, it does. I will add some more comments. Thanks.
There was a problem hiding this comment.
One thing that I do not fully grok is what is expected if a config has this field set, but doesn't have the sticky_session extension configured.
There was a problem hiding this comment.
Should we be referring to the specific extension here at all? Or should we instead have this talk about the host override feature in general? I can imagine other extensions wanting to use host overrides as well
@htuch @snowp This configuration is designed for general host override. But LoadBalancerContext::overrideHostToSelect() is internaly C++ API which is annomymous for public users and the stateful session is the only related implemenatation for now. Although we will have more extensions in the future, now we still need a specific example to help users to understand this config.
The result of load balancing is allowed to be overwritten by the L4/L7 filters specified host.
For example the `stateful session filter <........>` will specfic upstream host directly accoriding to the
downstream request.
But the overridden will only worked when the status of specficed host is as expected in the configuration.
how about some comments like this? cc @htuch cc @adisuissa
There was a problem hiding this comment.
I'm a bit lost reading this. I think there is context assumed of the reader that isn't likely to be there in most cases. Can you explain this section as if the reader had never heard of this override feature, pointing them to the relevant docs with RST links and/or some examples?
There was a problem hiding this comment.
The override host feature is currently only available through the L4/L7 extensions. Now stateful session is the only extension that is ready.
So my intention is to use the stateful session as an example to help the reader understand the functionality and the configuration.
I will think about how to make it clearer and easier to understand. 🤔
| return status & HealthyStatus; | ||
| uint32_t LoadBalancerContextBase::createOverrideHostStatus( | ||
| const envoy::config::cluster::v3::Cluster::CommonLbConfig& common_config) { | ||
| if (!common_config.has_override_host_status()) { |
There was a problem hiding this comment.
What happens if there is but it is empty?
There was a problem hiding this comment.
Then override_host_status will be zero and all the override host from LoadBalancerContext::overrideHostToSelect() will be ignored and stateful session stickiness will be disabled for current cluster.
|
|
||
| message HealthStatusSet { | ||
| // An order-independent set of health status. | ||
| repeated HealthStatus statuses = 1; |
There was a problem hiding this comment.
Could this be an empty set? If not, then add a PGV constraint.
There was a problem hiding this comment.
Yep, it can be an empty set.
Then override_host_status will be zero and all the override host from LoadBalancerContext::overrideHostToSelect() will be ignored and stateful session stickiness will be disabled for current cluster.
There was a problem hiding this comment.
That sounds like the case when the override_host_status field is set which is technically different from being set with a zero list. Is there a distinction between override_host_status not being set and it being set with an empty list of statuses?
There was a problem hiding this comment.
@snowp Yep. If override_host_status is set with an empty list of statuses then the override host will be ignored. And if override_host_status is not set completely, we will give some default statuses.
Signed-off-by: wbpcode <wbphub@live.com>
Signed-off-by: wbpcode <wbphub@live.com>
…ide_host_status_restriction
Signed-off-by: wbpcode <wbphub@live.com>
adisuissa
left a comment
There was a problem hiding this comment.
Minor comment about the api, but otherwise LGTM.
|
/wait |
Signed-off-by: wbpcode <wbphub@live.com>
…ide_host_status_restriction
|
@adisuissa @htuch docs/API is updated. Thanks for all your patient reivews 😄. And sorry for the delayed replies and updates. |
|
hi, @snowp , the API is ready and could you give a review to the implementation when you have free time? Thanks. 🌷 |
…ide_host_status_restriction
|
/retest |
|
Retrying Azure Pipelines: |
|
ping @snowp PTAL :) |
|
I can provide senior maintainer coverage once @adisuissa has reviewed. |
…ide_host_status_restriction
Signed-off-by: wbpcode <wbphub@live.com>
Signed-off-by: wbpcode <wbphub@live.com>
|
/retest |
|
Retrying Azure Pipelines: |
|
Hi, @adisuissa the review comments from @snowp are addressed. PTAL, thanks. 🌷 |
adisuissa
left a comment
There was a problem hiding this comment.
Thanks! Overall LGTM.
Left one minor comment.
/wait
| default: | ||
| break; |
There was a problem hiding this comment.
I think this may hide a case where the server sends a config with some other health status.
Consider adding [(validate.rules).enum.defined_only = true] to the statuses, and maybe a emit a log here.
Signed-off-by: wbpcode <wbphub@live.com>
|
@adisuissa gentle ping |
htuch
left a comment
There was a problem hiding this comment.
Looks good, but a few suggestions.
Signed-off-by: wbpcode <wbphub@live.com>
Signed-off-by: wbpcode <wbphub@live.com>
htuch
left a comment
There was a problem hiding this comment.
LGTM modulo one last nit on type alias.
| return status & DegradedStatus; | ||
| case Host::Health::Healthy: | ||
| return status & HealthyStatus; | ||
| std::bitset<32> LoadBalancerContextBase::createOverrideHostStatus( |
There was a problem hiding this comment.
Nit: can you add a using type alias for this, e.g. using HostStatusSet = std::bitset<32>? search-replace this across the diff.
| // | ||
| // If multiple bit fields are set, it is acceptable as long as the status of override host is in | ||
| // any of these statuses. | ||
| const std::bitset<32> override_host_status_{}; |
There was a problem hiding this comment.
FWIW it doesn't need to be 32-bit long, but it probably doesn't matter that much either.
There was a problem hiding this comment.
Yes. Only 3 bits are used for now. But in the x64 matchine, the storage size of std::bitset<N> (N > 0) is at least 8 bytes. So I think setting N to 32 or 8 makes little difference.
std::cout << sizeof(std::bitset<8>) << std::endl; // 8
std::cout << sizeof(std::bitset<16>) << std::endl; // 8
std::cout << sizeof(std::bitset<32>) << std::endl; // 8
std::cout << sizeof(std::bitset<64>) << std::endl; // 8
std::cout << sizeof(std::bitset<128>) << std::endl; // 16
Signed-off-by: wbpcode <wbphub@live.com>
|
/retest |
|
Retrying Azure Pipelines: |
…roxy#19665) Stateful session will try to parse upstream address from downstream request directly and override the result of load balancing algorithm by the LoadBalancerContext::overrideHostToSelect API. To avoid the load balancer selecting hosts that in unexpected statuses, specifying some expected statuses are necessary. In the previous design, we will provide expected statuses of override host by the LoadBalancerContext::overrideHostToSelect API. And in the PR envoyproxy#18207, after some discussion with @htuch, we found may be cluster-level config may be more reasonable design and implementation. Ref some more details: envoyproxy#18207 (comment) So this PR try to close previous discussion in the envoyproxy#18207: Refactoring LoadBalancerContext::overrideHostToSelect API to remove expected statuses for the return value. Add new common lb config override_host_status and related implementation. Risk Level: Mid. Testing: N/A. Docs Changes: N/A. Release Notes: N/A. Platform Specific Features: N/A. @wbpcode Signed-off-by: wbpcode <wbphub@live.com> Signed-off-by: kuochunghsu <kuochunghsu@pinterest.com>
Commit Message: support override host status restriction for stateful session
Additional Description:
Stateful session will try to parse upstream address from downstream request directly and override the result of load balancing algorithm by the
LoadBalancerContext::overrideHostToSelectAPI.To avoid the load balancer selecting hosts that in unexpected statuses, specifying some expected statuses are necessary.
In the previous design, we will provide expected statuses of override host by the
LoadBalancerContext::overrideHostToSelectAPI.And in the PR #18207, after some discussion with @htuch, we found may be cluster-level config may be more reasonable design and implementation.
Ref some more details: #18207 (comment)
So this PR try to close previous discussion in the #18207:
LoadBalancerContext::overrideHostToSelectAPI to remove expected statuses for the return value.override_host_statusand related implementation.Risk Level: Mid.
Testing: N/A.
Docs Changes: N/A.
Release Notes: N/A.
Platform Specific Features: N/A.