-
Notifications
You must be signed in to change notification settings - Fork 672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/{grpc,contour}: EDS always returns a value #1110
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Dave Cheney <dave@cheney.net>
Signed-off-by: Dave Cheney <dave@cheney.net>
There is no valid response for an unknown secret so we must return nothing in that case. Signed-off-by: Dave Cheney <dave@cheney.net>
Unlike SecretCache, if there is no route registered we return a blank RouteConfiguration entry for the name provided to Query. This is safe to do because RouteConfigurations have a simple blank value, just their name. Signed-off-by: Dave Cheney <dave@cheney.net>
Similar to SecretsCache there is no valid blank valid for an unknown listener. This is because each listener has a required address field. Making up a value there could cause Envoy to open that port, or worse, conflict with an already open port. Because there is no safe valid value, when querying for an unknown listener, return nothing. Signed-off-by: Dave Cheney <dave@cheney.net>
As with Listener and Secret, there is no valid blank value for a Cluster that can be supplied knowing just its name. In this case, if something is not registered with the cluster, Query will return nothing rather than a blank value like it does for RouteCache.Query. Signed-off-by: Dave Cheney <dave@cheney.net>
Like RouteCache, but unlike Secret, Cluster, or Listener Caches, EndpointTranslator.Query will return on answer per name supplied. Where an endpoint set is not registered, we return a blank ClusterLoadAssignment value using the query name supplied. Signed-off-by: Dave Cheney <dave@cheney.net>
stevesloka
approved these changes
May 21, 2019
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Fantastic work @davecheney! 🎉
Thanks for the prompt review. |
This was referenced May 21, 2019
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #1091
Updates #420
Updates #439
Problems with Contour not returning a value to Envoy when asked specifically have occurred several times in the past. When they have occurred, because I didn't understand the underlying issue, they were fixed in a piecemeal fashion. This PR addresses the root cause of the issue and hopefully we'll never have to talk about this again.
What caused the issue?
Envoy can ask Contour two different questions; the first is "please tell me the contents of the xDS table", the second is "please tell me the contents of these specific named xDS entries". The former is performed when a xDS Discovery Request supplies no
resource_name
hints. The latter usesresource_name
hints to say "I want to know about these specific resources".Naively one might think that the second case is a subset of the first, send me what you know about these specific resources is the union of the contents of the xDS cache and the names supplied. The first case is even simpler, it's the union of the cache with itself. However it's not so simple.
The first question, "tell me everything you have it the cache" is simple; everything in the cache is turned into grpc and sent over the wire to Envoy. The non intersecting set of things which are not in the cache are not sent to Envoy because that makes NO SENSE. However in the second case, for some resource types, Envoy will ask for a resource by name and will not be happy if the resource is not provided. This was the cause of #439, and #1091.
In the latter case a Service had been scaled to zero. The service was present in the api server so a valid ingress document pointed to it. Thus Contour created a Cluster record and sent that to Envoy, which in turn asked EDS for the members of the cluster we just told it existed. However, because the service was scaled to zero, the endpoint object was blank, there were no valid endpoints, so nothing was entered into the EDS cache on Contour's side. Here's where things get really upsetting.
xDS does not provide any way for a server to reject a request. Well, you can close the connection, but that's not going to help, Envoy will just reconnect and ask again, and all the while, it will be blocked on startup. So, when Envoy asks for an EDS record we must provide it, even if we don't have any information about it. This is actually a more general problem than EDS, but EDS is more susceptible to this problem because of the background described in #1091 (comment).
How does this PR fix the issue?
This PR does two things. First, it recognises that the two questions Envoy are different--the DiscoveryRequest with resource_name hint is not a subset of tell me everything in the cache--this this PR removes the
Values(...string)
method ongrpc.Resource
and replaces it withContents()
andQuery([]string)
. The former clearly returns the contents of the cache, the latter takes a set of resources to query.Secondly, on a per resource basis, the
Query
implementation will return blank results for records it does not hold in its cache when safe to do so. It turns out that for CDS, LDS, and SDS, there is no safe way to construct a blank record, so we cannot provide one to Envoy if it asks for one. For EDS and RDS, there is a way to construct a blank record given the query name provided, which we do, and this satisfies Envoy, unblocking the startup process.In addition to tests at the
internal/contour
andinternal/e2e
level I have also tested it in my cluster repeatedly scaling a service to zero, restarting Envoy, and validating that prior to this change Envoy would block on startup when a Service with zero active pods was referenced via an Ingress record. Post this change, Envoy starts immediately.