-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config: Implement both versions of onConfigUpdate() everywhere #6879
Conversation
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
/assign htuch |
Signed-off-by: Fred Douglas <fredlas@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on why having both onConfigUpdate
methods is necessary long term? I thought it was OK to keep them during the migration, but we could have only a single one as end state.
Here's how I reckon it could work. You write generically, in the SOTW generic code, a stack tracking library. This translates all SOTW updates into a delta update for the purposes of onConfigUpdate
. This comes down to filling in the list of resources to remove based on the knowledge of the resources present in the previous update and the current one. Would that work?
I think the issue is the I think the name field got added to the proto later on, after the comments about the double onConfigUpdate being a hack had gone in. So, I think the issue of names just hadn't occurred to anyone yet. |
@fredlas good point; we used to (sort of) have a method that did this mapping, making use of the fact that we could downcast to a known proto and it had that business logic for extracting the name out of the known resource types. I think you cleaned this up (or somebody did). Would there be value in a utility method that each individual xDS can call from its SOTW |
That sounded like a good approach, so I started looking into it, but hit another issue: the question of what is being diffed against. The answer is "state retrieved from some part of the rest of Envoy," like cm_.clusters() for CDS or listener_manager_.listeners() for LDS. I guess you could also have that utility function take a list of existing names as another base argument. But, this is all seeming a little messy. This PR as it is now will accomplish the same goals, just with ~10 lines of code reused once. Going the duplicate onConfigUpdate route will also not make it any harder to introduce this generic approach later. So, let's proceed this way so that aggregated delta can get done. |
@fredlas I'll continue the review, but would be appreciated if you could track this improvement as an issue or a TODO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, nice to see this all coming together.
/wait
source/common/router/rds_impl.cc
Outdated
} | ||
Protobuf::RepeatedPtrField<ProtobufWkt::Any> unwrapped_resource; | ||
*unwrapped_resource.Add() = resources[0].resource(); | ||
onConfigUpdate(unwrapped_resource, resources[0].version()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the way it works today is that we only get an RdsRouteConfigSubscription::onConfigUpdate
update if the resource is present in the RDS response. If it's absent, no change is made to the configuration. Initially it's empty, and we warm listeners on first response. We may have an empty config though if the update failed.
In the delta world, we can add resources as they arrive. If we get a removal request, what do we do? Should we be resetting the configuration back to empty? The semantics are a bit different here to SOTW, since with SOTW, we are cool with missing config implying latching, here we have an explicit removal operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess would be to have a removal request reset the configuration, since RDS is one of the types that does have a "name" to it. If the server wanted to update, it should just send an updated version. If it's sending an update explicitly naming the resource as being gone, it makes sense to get rid of that resource. Although, it sounds like when the RDS client gets started, it already has a name in mind, and only that name is valid. So removing the config should be more like shutting down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to address the code changes for this in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I ended up logging an error and ignoring the remove request. It seems like it basically doesn't make sense for the server to even try a removal here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this becomes more interesting with on-demand, where you have the possibility the management server decides to remove a resource and have the client lazily fetch (and maybe recreate it on that path). Since we don't support that, I think it's fine to error out for now, but I'd like to make this a TODO and link to an on-demand tracking issue (do we have one?) so that folks know this is subject to change in terms of behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, yes. Left a TODO. Looks like there is #2500.
Signed-off-by: Fred Douglas <fredlas@google.com>
test/common/router/rds_impl_test.cc
Outdated
|
||
std::string config_json2 = R"EOF( | ||
{ | ||
"api_type": "REST", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does a REST RDS configuration figure into delta xDS testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't accept DELTA_GRPC. Maybe I should just delete this test.
test/common/router/rds_impl_test.cc
Outdated
|
||
EXPECT_EQ(0UL, | ||
route_config_provider_manager_->dumpRouteConfigs()->dynamic_route_configs().size()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit unclear to me which bits of this are testing generic route config provider logic and which are the delta xDS specifics. Is it possible to add more comments or reduce the route config provider testing that is already covered elsewhere?
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/wait
}; | ||
|
||
// Basic test of delta's passthrough call to the state-of-the-world variant, to | ||
// increase coverage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit more than just coverage, since we have some new code and want to ensure it's doing the right thing. I think especially because this is otherwise unused by anyone today, it's valuable to have this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code is the same validateUpdateSize()
call that the SotW onConfigUpdate has, followed by calling the SotW onConfigUpdate on the 0th element of the array. It's not literally onConfigUpdate(x, y) { onConfigUpdate(x) }
, but it's close.
@@ -115,6 +115,92 @@ TEST_F(SdsApiTest, DynamicTlsCertificateUpdateSuccess) { | |||
handle->remove(); | |||
} | |||
|
|||
class PartialMockSds : public SdsApi { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat!
/retest |
🔨 rebuilding |
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the patience on this one.
* master: test: Add coverage for IsolatedStoreImpl::find* (envoyproxy#7043) server: add ProcessContext (envoyproxy#7018) config: Implement both versions of onConfigUpdate() everywhere (envoyproxy#6879) gzip: add test for various compression strategy and level (envoyproxy#7055) Fix typo in comment for rds.RouteConfiguration.validate_clusters (envoyproxy#7056) mysql_filter: add handling for partial messages (envoyproxy#6885) migrate from v2alpha to v2 (envoyproxy#7044) tests: fix tsan test flake (envoyproxy#7052) upstream: fix HostUtility::healthFlagsToString (envoyproxy#7051) tech debt: eliminate absl::make_unique (envoyproxy#7034) router: add a route name field in route.Route list (envoyproxy#6776) ext_authz: configurable HTTP status code for network errors. (envoyproxy#6669) stats: remove const-cast for symbol-table in edcs_filter_test.cc (envoyproxy#7045) build: bump libevent to 3b1864b. (envoyproxy#7012) stats: improve test-coverage for a few stats-related functions. (envoyproxy#7038) docs: fix csrf filter source origin note (envoyproxy#7041) Fix common typo: grcp -> grpc (envoyproxy#7040) snapshot (envoyproxy#7036) Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Adding support for the new delta-style onConfigUpdate everywhere. Going by the comments, we had planned to have just this onConfigUpdate, rather than both it and the state-of-the-world style one. However, when I investigated unifying them it didn't make sense; I think both are needed. Part of #4991.
Risk Level: low
Testing: added tests for the delta style onConfigUpdate in CDS and EDS