Skip to content

support override host status restriction for stateful session#19665

Merged
htuch merged 16 commits intoenvoyproxy:mainfrom
wbpcode:override_host_status_restriction
Mar 10, 2022
Merged

support override host status restriction for stateful session#19665
htuch merged 16 commits intoenvoyproxy:mainfrom
wbpcode:override_host_status_restriction

Conversation

@wbpcode
Copy link
Copy Markdown
Member

@wbpcode wbpcode commented Jan 24, 2022

Commit Message: support override host status restriction for stateful session
Additional Description:

Stateful session will try to parse upstream address from downstream request directly and override the result of load balancing algorithm by the LoadBalancerContext::overrideHostToSelect API.

To avoid the load balancer selecting hosts that in unexpected statuses, specifying some expected statuses are necessary.

In the previous design, we will provide expected statuses of override host by the LoadBalancerContext::overrideHostToSelect API.

And in the PR #18207, after some discussion with @htuch, we found may be cluster-level config may be more reasonable design and implementation.
Ref some more details: #18207 (comment)

So this PR try to close previous discussion in the #18207:

  • Refactoring LoadBalancerContext::overrideHostToSelect API to remove expected statuses for the return value.
  • Add new common lb config override_host_status and related implementation.

Risk Level: Mid.
Testing: N/A.
Docs Changes: N/A.
Release Notes: N/A.
Platform Specific Features: N/A.

Signed-off-by: wbpcode <wbphub@live.com>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #19665 was opened by wbpcode.

see: more, trace.

@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Jan 24, 2022

@htuch Could you please give a review to the new proto API when you have free time? 😄

@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Jan 24, 2022

And @snowp Could you please add some comments for these new code when you have some bandwidth? 🌷

@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Jan 24, 2022

cc @howardjohn

Signed-off-by: wbpcode <wbphub@live.com>
Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!
Looked mostly at the API, left a few comments.

DEGRADED = 5;
}

message HealthStatusSet {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect this message to have more fields?
If not, what's the advantage over just specifying the repeated field directly (in your specific case renaming the field to override_host_status_set)?

Copy link
Copy Markdown
Member Author

@wbpcode wbpcode Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because we need to distinguish between the case where no override_host_status is configured at all and the case where 0 items are configured. The former will provide a default value.

Comment on lines +599 to +606
// Expected override host statuses. The result of load balancing is allowed to be overwritten
// only when the target host status is as expected when stateful session stickiness directly
// specifies a target host.
//
// If this is set and and has no any item then any host from stateful session stickiness will be
// ignored.
// If this is not set then healthy statuses (``UNKNOWN``, ``HEALTHY`` and ``DEGRADED`` for now)
// will be applied by default.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this field only relevant when the stateful_session extension is used?
If so, can you add this to the comment, and also add a reference to it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be referring to the specific extension here at all? Or should we instead have this talk about the host override feature in general? I can imagine other extensions wanting to use host overrides as well

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the moment, it does. I will add some more comments. Thanks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that I do not fully grok is what is expected if a config has this field set, but doesn't have the sticky_session extension configured.

Copy link
Copy Markdown
Member Author

@wbpcode wbpcode Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be referring to the specific extension here at all? Or should we instead have this talk about the host override feature in general? I can imagine other extensions wanting to use host overrides as well

@htuch @snowp This configuration is designed for general host override. But LoadBalancerContext::overrideHostToSelect() is internaly C++ API which is annomymous for public users and the stateful session is the only related implemenatation for now. Although we will have more extensions in the future, now we still need a specific example to help users to understand this config.

The result of load balancing is allowed to be overwritten by the L4/L7 filters specified host.
For example the `stateful session filter <........>` will specfic upstream host directly accoriding to the 
downstream request.
But the overridden will only worked when the status of specficed host is as expected in the configuration.

how about some comments like this? cc @htuch cc @adisuissa

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit lost reading this. I think there is context assumed of the reader that isn't likely to be there in most cases. Can you explain this section as if the reader had never heard of this override feature, pointing them to the relevant docs with RST links and/or some examples?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The override host feature is currently only available through the L4/L7 extensions. Now stateful session is the only extension that is ready.

So my intention is to use the stateful session as an example to help the reader understand the functionality and the configuration.

I will think about how to make it clearer and easier to understand. 🤔

return status & HealthyStatus;
uint32_t LoadBalancerContextBase::createOverrideHostStatus(
const envoy::config::cluster::v3::Cluster::CommonLbConfig& common_config) {
if (!common_config.has_override_host_status()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there is but it is empty?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then override_host_status will be zero and all the override host from LoadBalancerContext::overrideHostToSelect() will be ignored and stateful session stickiness will be disabled for current cluster.


message HealthStatusSet {
// An order-independent set of health status.
repeated HealthStatus statuses = 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be an empty set? If not, then add a PGV constraint.

Copy link
Copy Markdown
Member Author

@wbpcode wbpcode Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it can be an empty set.

Then override_host_status will be zero and all the override host from LoadBalancerContext::overrideHostToSelect() will be ignored and stateful session stickiness will be disabled for current cluster.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like the case when the override_host_status field is set which is technically different from being set with a zero list. Is there a distinction between override_host_status not being set and it being set with an empty list of statuses?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowp Yep. If override_host_status is set with an empty list of statuses then the override host will be ignored. And if override_host_status is not set completely, we will give some default statuses.

Signed-off-by: wbpcode <wbphub@live.com>
Signed-off-by: wbpcode <wbphub@live.com>
Signed-off-by: wbpcode <wbphub@live.com>
Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment about the api, but otherwise LGTM.

@adisuissa
Copy link
Copy Markdown
Contributor

/wait

Signed-off-by: wbpcode <wbphub@live.com>
@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Feb 8, 2022

@adisuissa @htuch docs/API is updated. Thanks for all your patient reivews 😄. And sorry for the delayed replies and updates.

Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api

@repokitteh-read-only repokitteh-read-only bot removed the api label Feb 8, 2022
@wbpcode wbpcode requested a review from snowp February 8, 2022 07:19
@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Feb 8, 2022

hi, @snowp , the API is ready and could you give a review to the implementation when you have free time? Thanks. 🌷

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api

@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Feb 10, 2022

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #19665 (comment) was created by @wbpcode.

see: more, trace.

@KBaichoo
Copy link
Copy Markdown
Contributor

ping @snowp PTAL :)

@htuch
Copy link
Copy Markdown
Member

htuch commented Feb 20, 2022

I can provide senior maintainer coverage once @adisuissa has reviewed.

Signed-off-by: wbpcode <wbphub@live.com>
Signed-off-by: wbpcode <wbphub@live.com>
@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Feb 21, 2022

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #19665 (comment) was created by @wbpcode.

see: more, trace.

@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Feb 21, 2022

Hi, @adisuissa the review comments from @snowp are addressed. PTAL, thanks. 🌷

Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Overall LGTM.
Left one minor comment.
/wait

Comment on lines +547 to +548
default:
break;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may hide a case where the server sends a config with some other health status.
Consider adding [(validate.rules).enum.defined_only = true] to the statuses, and maybe a emit a log here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get it.

Signed-off-by: wbpcode <wbphub@live.com>
@rojkov
Copy link
Copy Markdown
Member

rojkov commented Feb 28, 2022

@adisuissa gentle ping

adisuissa
adisuissa previously approved these changes Mar 1, 2022
Copy link
Copy Markdown
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api
LGTM code changes.
@htuch for final review.
Also, not sure if this needs release notes.

@repokitteh-read-only repokitteh-read-only bot removed the api label Mar 1, 2022
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but a few suggestions.

Signed-off-by: wbpcode <wbphub@live.com>
adisuissa
adisuissa previously approved these changes Mar 5, 2022
Signed-off-by: wbpcode <wbphub@live.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo one last nit on type alias.

return status & DegradedStatus;
case Host::Health::Healthy:
return status & HealthyStatus;
std::bitset<32> LoadBalancerContextBase::createOverrideHostStatus(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can you add a using type alias for this, e.g. using HostStatusSet = std::bitset<32>? search-replace this across the diff.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get it.

//
// If multiple bit fields are set, it is acceptable as long as the status of override host is in
// any of these statuses.
const std::bitset<32> override_host_status_{};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW it doesn't need to be 32-bit long, but it probably doesn't matter that much either.

Copy link
Copy Markdown
Member Author

@wbpcode wbpcode Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Only 3 bits are used for now. But in the x64 matchine, the storage size of std::bitset<N> (N > 0) is at least 8 bytes. So I think setting N to 32 or 8 makes little difference.

    std::cout << sizeof(std::bitset<8>) << std::endl;  // 8
    std::cout << sizeof(std::bitset<16>) << std::endl;  // 8
    std::cout << sizeof(std::bitset<32>) << std::endl;  // 8
    std::cout << sizeof(std::bitset<64>) << std::endl;  // 8
    std::cout << sizeof(std::bitset<128>) << std::endl; // 16

Signed-off-by: wbpcode <wbphub@live.com>
@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Mar 9, 2022

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #19665 (comment) was created by @wbpcode.

see: more, trace.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@htuch htuch merged commit 1575185 into envoyproxy:main Mar 10, 2022
JuniorHsu pushed a commit to JuniorHsu/envoy that referenced this pull request Mar 17, 2022
…roxy#19665)

Stateful session will try to parse upstream address from downstream request directly and override the result of load balancing algorithm by the LoadBalancerContext::overrideHostToSelect API.

To avoid the load balancer selecting hosts that in unexpected statuses, specifying some expected statuses are necessary.

In the previous design, we will provide expected statuses of override host by the LoadBalancerContext::overrideHostToSelect API.

And in the PR envoyproxy#18207, after some discussion with @htuch, we found may be cluster-level config may be more reasonable design and implementation.
Ref some more details: envoyproxy#18207 (comment)

So this PR try to close previous discussion in the envoyproxy#18207:

Refactoring LoadBalancerContext::overrideHostToSelect API to remove expected statuses for the return value.
Add new common lb config override_host_status and related implementation.
Risk Level: Mid.
Testing: N/A.
Docs Changes: N/A.
Release Notes: N/A.
Platform Specific Features: N/A.

@wbpcode

Signed-off-by: wbpcode <wbphub@live.com>
Signed-off-by: kuochunghsu <kuochunghsu@pinterest.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants