Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router: deprecate optional http filters and add route/vh level is_optional support in the typed_per_filter_config #27263

Merged
merged 20 commits into from
Jun 1, 2023

Conversation

wbpcode
Copy link
Member

@wbpcode wbpcode commented May 9, 2023

Commit Message: router: remove optional http filters because it's hcm dependent
Additional Description:

The ConfigImpl (route table) created by the rds may shared by multiple listeners/hcm. It means that the ConfigImpl should be hcm independent.

However, in the current codebase, the optional http filters is extracted from the HCM and used for the ConfigImpl creation.
Because the xds cache, the first hcm's http filters list will be used.

This typically a bug or potential error becasue different HCMs may have complete different http filters list and optional configuration.

Risk Level: mid.
Testing: n/a.
Docs Changes: n/a.
Release Notes: n/a.
Platform Specific Features: n/a.
[Optional Runtime guard:] n/a.

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>
@wbpcode
Copy link
Member Author

wbpcode commented May 9, 2023

cc @markdroth because you concern #15025

@wbpcode
Copy link
Member Author

wbpcode commented May 9, 2023

There is a quesion about the PR self:

  • should a runtime flag to be added? although i think this is a bug. But it was introduced at two years ago. May be some users depend on current behavior? cc @envoyproxy/runtime-guard-changes cc @mattklein123

@soulxu
Copy link
Member

soulxu commented May 9, 2023

There is a quesion about the PR self:

  • should a runtime flag to be added? although i think this is a bug. But it was introduced at two years ago. May be some users depend on current behavior? cc @envoyproxy/runtime-guard-changes cc @mattklein123

As my understanding the usecase of is_optional is primary for the case of upgrade envoy in the cluster, then new filter which is passed down by the control-plane won't break the functionality of the node with old version envoy. I guess for this case, this new filter will be marked as optional at all HCMs' config, it doesn't make sense in some HCMs are optional and some HCMs aren't optional if the user want the old envoy running correctly. I guess this is why this bug won't be found until now. If we directly remove this feature, I guess this will break some users depending on this.

I quickly search the code. maybe we can add the optional config into the hash of the provider

const uint64_t manager_identifier = MessageUtil::hash(rds);

Then we can get different provider instances for different optional config

Not sure whether this can be an acceptable temporary fix. if yes, then you can have a better fix later and won't take the risk break some users.

@wbpcode
Copy link
Member Author

wbpcode commented May 9, 2023

cc @soulxu See #26097 and related discussion. I think it's similar to your new idea and we had did lots of related discussion about it.

It's ok for me. depend on the decision of community.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some kind of test, thank you.

/wait

@wbpcode
Copy link
Member Author

wbpcode commented May 9, 2023

@mattklein123 I think I need some more input from the community first.

Will the suggestion from @soulxu be a possible better way to solve this problem?
This is related to rds, so I want consider it more carefully and take various options into account.

@markdroth
Copy link
Contributor

I think this is the right behavior, but I suggest combining this PR with #27210. That way, any user that runs into a problem will be able to solve it with the is_optional mechanism.

I don't entirely understand @soulxu's suggestion, but I think the behavior we want here is:

  • When we receive the RDS resource, we iterate through the filter configs, checking the type_url for each one to see if that filter impl is supported.
  • If the filter impl is not supported, the default behavior is to NACK the RDS resource.
  • If the filter impl is not supported but the entry is marked as is_optional, then we ignore the entry instead of NACKing.

Note that this behavior has already been agreed upon and has been implemented by gRPC. I don't think we want Envoy to do something different.

@soulxu
Copy link
Member

soulxu commented May 9, 2023

  • When we receive the RDS resource, we iterate through the filter configs, checking the type_url for each one to see if that filter impl is supported.
  • If the filter impl is not supported, the default behavior is to NACK the RDS resource.
  • If the filter impl is not supported but the entry is marked as is_optional, then we ignore the entry instead of NACKing.

Do we want the HCM's is_optional apply to the pre-route/vh filter config when is_optional isn't specified in the pre-route/vh filter config?

@wbpcode
Copy link
Member Author

wbpcode commented May 10, 2023

I don't entirely understand @soulxu's suggestion, but I think the behavior we want here is:

It basically like #26097. We we shared same rds resource and proto config in multiple HCMs. But we may have different config providers based on the optional filters list. Rds resource will be accepted if all config providers validated it.

But this way are more complicate and not sure if has other potential problem. So I didn't consider that first.


Of course we can fix current error directly like this PR did. But this error was exist for two years. I worry about the back compatiability. If some users depend on this 'error', orz.

Not sure if a runtime guard enough.

@soulxu
Copy link
Member

soulxu commented May 10, 2023

Of course we can fix current error directly like this PR did. But this error was exist for two years. I worry about the back compatiability. If some users depend on this 'error', orz.

Not sure if a runtime guard enough.

yea, the trouble is we documented this behavior in API doc

// This is also same with typed per filter config.
bool is_optional = 6;

That is declared behavior, so in theory, removing it isn't right. except we consider this behavior is fully not working, then it is fine to delete it directly. But it work some case, not sure whether have people depend on it.

@markdroth
Copy link
Contributor

Do we want the HCM's is_optional apply to the pre-route/vh filter config when is_optional isn't specified in the pre-route/vh filter config?

No. We can't do that without breaking cacheability.

It basically like #26097. We we shared same rds resource and proto config in multiple HCMs. But we may have different config providers based on the optional filters list. Rds resource will be accepted if all config providers validated it.

I don't think that will work, because the set of HCM configs can change over time. Consider the case where we initially have 3 HCM configs that point to the same RDS resource, we ACK that RDS resource, and then later we get a new HCM config that does not work with the RDS resource. At that point, the RDS resource would suddenly be considered invalid, but it's too late to NACK it.

As I said in the discussion in #26097, the current xDS ACK/NACK mechanism is inherently limited to validating the resource in isolation, without considering how it interacts with other resources. Any other behavior breaks cacheability.

yea, the trouble is we documented this behavior in API doc

I didn't realize we had that comment there, and I agree that it should be updated. But the wording there is actually so vague that I'm not sure anyone reading it would actually assume it means the current behavior.

@wbpcode
Copy link
Member Author

wbpcode commented May 11, 2023

I don't think that will work, because the set of HCM configs can change over time. Consider the case where we initially have 3 HCM configs that point to the same RDS resource, we ACK that RDS resource, and then later we get a new HCM config that does not work with the RDS resource. At that point, the RDS resource would suddenly be considered invalid, but it's too late to NACK it.

@markdroth yeah, I think you are right. I didn't thought this point in the past. Thanks.

@wbpcode
Copy link
Member Author

wbpcode commented May 11, 2023

Then, I think there no other choice. I will add a runtime guard and update doc. And also the tests.

This reverts commit 1725ce7.

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>
Signed-off-by: wbpcode <wangbaiping@corp.netease.com>
@repokitteh-read-only
Copy link

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #27263 was synchronize by wbpcode.

see: more, trace.

@wbpcode
Copy link
Member Author

wbpcode commented May 14, 2023

cc @alyssawilk for runtime guard.

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>
@wbpcode
Copy link
Member Author

wbpcode commented May 14, 2023

cc @mattklein123 ready for a formal review. Could you take a look when you have free time, thanks.

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>
@alyssawilk
Copy link
Contributor

I can do second pass but first can @adisuissa and @markdroth make sure to tag resolved conversation as resolved? It's a bit hard to read at least parts of diffs with all the comments, and I don't want to resolve anything still in progress

@markdroth
Copy link
Contributor

I don't have permission to resolve or unresolve comment threads. But I commented on the one thread that @alyssawilk resolved that I think is still open.

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>
@wbpcode
Copy link
Member Author

wbpcode commented May 30, 2023

Hi, @markdroth @adisuissa, could you give a stamp about this PR if it ready for a second pass? Then we can push this to the next phase. 😄

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo one last question about API breaking changes

source/common/router/config_impl.cc Show resolved Hide resolved
alyssawilk
alyssawilk previously approved these changes May 30, 2023
@wbpcode
Copy link
Member Author

wbpcode commented May 31, 2023

/retest

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>
@wbpcode wbpcode dismissed stale reviews from snowp and mattklein123 May 31, 2023 02:23

The review outdated.

@wbpcode
Copy link
Member Author

wbpcode commented May 31, 2023

Tests were fixed and a new test to cover the new exception thowing was added.

Could you giva a new stamp? cc @alyssawilk Thanks.

@wbpcode wbpcode merged commit 9ecf761 into envoyproxy:main Jun 1, 2023
85 checks passed
@wbpcode wbpcode deleted the dev-remote-dependency-of-rds-to-hcm branch June 7, 2023 01:48
@wbpcode
Copy link
Member Author

wbpcode commented Jun 7, 2023

close #15025

reskin89 pushed a commit to reskin89/envoy that referenced this pull request Jul 11, 2023
…tional` support in the typed_per_filter_config (envoyproxy#27263)

* router: remove optional http filters because it's hcm dependent

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* Revert "router: remove optional http filters because it's hcm dependent"

This reverts commit 1725ce7.

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* add runtime guard/tests/change log

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* fix more test

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* add route/virual host level optional flag support and test and change log

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* fix format

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* fix docs

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* fix test

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* Update changelogs/current.yaml

Co-authored-by: Adi (Suissa) Peleg <adip@google.com>
Signed-off-by: code <wangbaiping@corp.netease.com>

* address all comments

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* fix conflict

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* address comment

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* address comments

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

* fix test and add new test to cover the new exception throwing

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>

---------

Signed-off-by: wbpcode <wangbaiping@corp.netease.com>
Signed-off-by: code <wangbaiping@corp.netease.com>
Co-authored-by: Adi (Suissa) Peleg <adip@google.com>
Signed-off-by: Ryan Eskin <ryan.eskin89@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants