Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: Implement both versions of onConfigUpdate() everywhere #6879

Merged
merged 16 commits into from
May 23, 2019

Conversation

fredlas
Copy link
Contributor

@fredlas fredlas commented May 9, 2019

Adding support for the new delta-style onConfigUpdate everywhere. Going by the comments, we had planned to have just this onConfigUpdate, rather than both it and the state-of-the-world style one. However, when I investigated unifying them it didn't make sense; I think both are needed. Part of #4991.

Risk Level: low
Testing: added tests for the delta style onConfigUpdate in CDS and EDS

Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
@fredlas
Copy link
Contributor Author

fredlas commented May 9, 2019

/assign htuch

Signed-off-by: Fred Douglas <fredlas@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on why having both onConfigUpdate methods is necessary long term? I thought it was OK to keep them during the migration, but we could have only a single one as end state.

Here's how I reckon it could work. You write generically, in the SOTW generic code, a stack tracking library. This translates all SOTW updates into a delta update for the purposes of onConfigUpdate. This comes down to filling in the list of resources to remove based on the knowledge of the resources present in the previous update and the current one. Would that work?

@fredlas
Copy link
Contributor Author

fredlas commented May 13, 2019

I think the issue is the name field in the DeltaDiscoveryResponse::Resource proto. DiscoveryResponse just has the plain Any blobs (the thing that would be the resource field of a Resource). For the APIs where name makes sense, they themselves can know to reach in and grab it - because at that point you can convert from an Any to e.g. Cluster - but that isn't something that you can do generically in DeltaSubscriptionImpl or wherever. (Since, I believe what you're describing would also need the concept of "name" guaranteed to be a part of all xDS resources. Some APIs also have deduplication logic that relies on the name).

I think the name field got added to the proto later on, after the comments about the double onConfigUpdate being a hack had gone in. So, I think the issue of names just hadn't occurred to anyone yet.

@htuch
Copy link
Member

htuch commented May 13, 2019

@fredlas good point; we used to (sort of) have a method that did this mapping, making use of the fact that we could downcast to a known proto and it had that business logic for extracting the name out of the known resource types. I think you cleaned this up (or somebody did).

Would there be value in a utility method that each individual xDS can call from its SOTW onConfigUpdate() to do the delta onConfigUpdate() call with a naming lambda?

@fredlas
Copy link
Contributor Author

fredlas commented May 14, 2019

That sounded like a good approach, so I started looking into it, but hit another issue: the question of what is being diffed against. The answer is "state retrieved from some part of the rest of Envoy," like cm_.clusters() for CDS or listener_manager_.listeners() for LDS. I guess you could also have that utility function take a list of existing names as another base argument.

But, this is all seeming a little messy. This PR as it is now will accomplish the same goals, just with ~10 lines of code reused once. Going the duplicate onConfigUpdate route will also not make it any harder to introduce this generic approach later. So, let's proceed this way so that aggregated delta can get done.

@htuch
Copy link
Member

htuch commented May 14, 2019

@fredlas I'll continue the review, but would be appreciated if you could track this improvement as an issue or a TODO.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, nice to see this all coming together.
/wait

}
Protobuf::RepeatedPtrField<ProtobufWkt::Any> unwrapped_resource;
*unwrapped_resource.Add() = resources[0].resource();
onConfigUpdate(unwrapped_resource, resources[0].version());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the way it works today is that we only get an RdsRouteConfigSubscription::onConfigUpdate update if the resource is present in the RDS response. If it's absent, no change is made to the configuration. Initially it's empty, and we warm listeners on first response. We may have an empty config though if the update failed.

In the delta world, we can add resources as they arrive. If we get a removal request, what do we do? Should we be resetting the configuration back to empty? The semantics are a bit different here to SOTW, since with SOTW, we are cool with missing config implying latching, here we have an explicit removal operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess would be to have a removal request reset the configuration, since RDS is one of the types that does have a "name" to it. If the server wanted to update, it should just send an updated version. If it's sending an update explicitly naming the resource as being gone, it makes sense to get rid of that resource. Although, it sounds like when the RDS client gets started, it already has a name in mind, and only that name is valid. So removing the config should be more like shutting down.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to address the code changes for this in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I ended up logging an error and ignoring the remove request. It seems like it basically doesn't make sense for the server to even try a removal here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this becomes more interesting with on-demand, where you have the possibility the management server decides to remove a resource and have the client lazily fetch (and maybe recreate it on that path). Since we don't support that, I think it's fine to error out for now, but I'd like to make this a TODO and link to an on-demand tracking issue (do we have one?) so that folks know this is subject to change in terms of behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, yes. Left a TODO. Looks like there is #2500.

source/common/router/rds_impl.cc Show resolved Hide resolved
source/common/upstream/cds_api_impl.cc Show resolved Hide resolved
test/common/upstream/eds_test.cc Show resolved Hide resolved
source/server/lds_api.cc Show resolved Hide resolved
Signed-off-by: Fred Douglas <fredlas@google.com>

std::string config_json2 = R"EOF(
{
"api_type": "REST",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does a REST RDS configuration figure into delta xDS testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't accept DELTA_GRPC. Maybe I should just delete this test.

test/common/secret/sds_api_test.cc Show resolved Hide resolved

EXPECT_EQ(0UL,
route_config_provider_manager_->dumpRouteConfigs()->dynamic_route_configs().size());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit unclear to me which bits of this are testing generic route config provider logic and which are the delta xDS specifics. Is it possible to add more comments or reduce the route config provider testing that is already covered elsewhere?

Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

test/common/secret/sds_api_test.cc Outdated Show resolved Hide resolved
};

// Basic test of delta's passthrough call to the state-of-the-world variant, to
// increase coverage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit more than just coverage, since we have some new code and want to ensure it's doing the right thing. I think especially because this is otherwise unused by anyone today, it's valuable to have this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code is the same validateUpdateSize() call that the SotW onConfigUpdate has, followed by calling the SotW onConfigUpdate on the 0th element of the array. It's not literally onConfigUpdate(x, y) { onConfigUpdate(x) }, but it's close.

@@ -115,6 +115,92 @@ TEST_F(SdsApiTest, DynamicTlsCertificateUpdateSuccess) {
handle->remove();
}

class PartialMockSds : public SdsApi {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

test/common/secret/sds_api_test.cc Show resolved Hide resolved
@fredlas
Copy link
Contributor Author

fredlas commented May 22, 2019

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: release (failed build)

🐱

Caused by: a #6879 (comment) was created by @fredlas.

see: more, trace.

Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the patience on this one.

@htuch htuch merged commit 73c46c9 into envoyproxy:master May 23, 2019
mpuncel added a commit to mpuncel/envoy that referenced this pull request May 23, 2019
* master:
  test: Add coverage for IsolatedStoreImpl::find* (envoyproxy#7043)
  server: add ProcessContext (envoyproxy#7018)
  config: Implement both versions of onConfigUpdate() everywhere (envoyproxy#6879)
  gzip: add test for various compression strategy and level (envoyproxy#7055)
  Fix typo in comment for rds.RouteConfiguration.validate_clusters (envoyproxy#7056)
  mysql_filter: add handling for partial messages (envoyproxy#6885)
  migrate from v2alpha to v2 (envoyproxy#7044)
  tests: fix tsan test flake (envoyproxy#7052)
  upstream: fix HostUtility::healthFlagsToString (envoyproxy#7051)
  tech debt: eliminate absl::make_unique (envoyproxy#7034)
  router: add a route name field in route.Route list (envoyproxy#6776)
  ext_authz: configurable HTTP status code for network errors. (envoyproxy#6669)
  stats: remove const-cast for symbol-table in edcs_filter_test.cc (envoyproxy#7045)
  build: bump libevent to 3b1864b. (envoyproxy#7012)
  stats: improve test-coverage for a few stats-related functions. (envoyproxy#7038)
  docs: fix csrf filter source origin note (envoyproxy#7041)
  Fix common typo: grcp -> grpc (envoyproxy#7040)
  snapshot (envoyproxy#7036)

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants