Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proxy: Do not fallback when the control plane returns NoEndpoints #2880

Closed
olix0r opened this issue Jun 2, 2019 · 5 comments · Fixed by linkerd/linkerd2-proxy#268
Closed
Assignees
Labels

Comments

@olix0r
Copy link
Member

olix0r commented Jun 2, 2019

We must assume that the control plane is authoritative (in order to enforce identity, policy, etc). When we initially scoped the fallback logic, it seemed to make sense that the the proxy should use fallback routing when the control plane reports no endpoints. In practice, it seems better for the proxy to rely on its load balancer as long as the control plane does not return a IllegalArgument error.

@olix0r olix0r added area/proxy priority/P0 Release Blocker labels Jun 2, 2019
@olix0r olix0r added this to To do in 2.4 - Release via automation Jun 2, 2019
@hawkw hawkw self-assigned this Jun 2, 2019
@hawkw
Copy link
Member

hawkw commented Jun 2, 2019

The Destination API's NoEndpoints response message specifies an exists boolean field which is intended to indicate whether or not a client should fall back:

    // `no_endpoints{exists: false}` indicates that the service does not exist
    // and the client MAY try an alternate service discovery method (e.g. DNS).
    //
    // `no_endpoints(exists: true)` indicates that the service does exist and
    // the client MUST NOT fall back to an alternate service discovery method.
    NoEndpoints no_endpoints = 3;

Based on my understanding of this API, we should fall back on receipt of a no_endpoints{exists: false} message or an IllegalArgument error?

hawkw added a commit to linkerd/linkerd2-proxy that referenced this issue Jun 2, 2019
When the Destination service returns a `NoEndpoints` response, a field
`exists` is set if the destination _does_ exist in service discovery
but has no endpoints. The API spec states that:

> `no_endpoints{exists: false}` indicates that the service does not exist
> and the client MAY try an alternate service discovery method (e.g. DNS).
>
> `no_endpoints(exists: true)` indicates that the service does exist and
> the client MUST NOT fall back to an alternate service discovery
> method.

When the DNS fallback behaviour was removed from the proxy in #259, this
field was overlooked, and the proxy was changed to fall back to original
destination routing _any_ time the control plane indicates that no
endpoints exist for a destination. This is incorrect, as the proxy
should always treat the destination service as authoritative.
Additionally, this means that requests to endpoints which are known to
not exist will construct an unnecessary client service, and eventually
fail with a 502 error when the upstream client connection  to the
non-existant endpoint ultimately fails.

This branch changes the `control::destination::resolution::Daemon`
future to only send `Update::NoEndpoints` (which indicates that the load
balancer should fall back) to the load balancer when the Destination
service response has `exists: false`, or on `InvalidArgument` errors,
and _not_ when a `no_endpoints{exists: true}` response is recieved.
These requests will now fail fast with a 503 error rather than
attempting an ultimately futile fallback request.

Previously, one of the discovery tests expected the incorrect behaviour.
I've changed this test to now expect that destinations known to not
exist do _not_ fall back, and renamed it from
`outbound_falls_back_to_orig_dst_when_destination_has_no_endpoints` to
`outbound_fails_fast_when_destination_has_no_endpoints`. I've also
confirmed that the updated test fails against master and passes after
this change.

Fixes linkerd/linkerd2#2880

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@olix0r
Copy link
Member Author

olix0r commented Jun 2, 2019

I understand that's in the API, but I'm not convinced that policy makes sense going forward... If you resolve a service and then it is later deleted, it seems appropriate for the proxy to eagerly fail these requests. Is there a use case when we would want to forward requests for non-existent services?

@hawkw
Copy link
Member

hawkw commented Jun 2, 2019

I believe that the intention behind {exists: false} was to indicate that the requested destination isn't a Kubernetes service that the control plane is aware of, so it might still exist (such as statefulsets). My understanding is that this is also what the InvalidArgument response in #2664 is supposed to indicate.

It seems possible that the exists field was was overlooked when we decided to use InvalidArgument for this purpose, and might now have two different ways of saying the same thing. Possibly @adleong has some insight?

@olix0r
Copy link
Member Author

olix0r commented Jun 2, 2019

@hawkw not exactly. IllegalArgument indicates that the name will never be a valid service. exists=false indicates that the service doesn't currently exist, but may exist later.

@hawkw
Copy link
Member

hawkw commented Jun 2, 2019

@olix0r ah, yeah, that's an important distinction, thanks for pointing that out.

@hawkw hawkw moved this from To do to In progress in 2.4 - Release Jun 5, 2019
2.4 - Release automation moved this from In progress to Done Jun 12, 2019
hawkw added a commit to linkerd/linkerd2-proxy that referenced this issue Jun 12, 2019
When the DNS fallback behavior was removed from the proxy in #259, the
proxy was changed to fall back to original destination routing _any_
time the control plane indicates that no endpoints exist for a
destination. This is incorrect, as the proxy should always treat the
destination service as authoritative. Additionally, this means that
requests to endpoints which are known to not exist will construct an
unnecessary client service, and eventually fail with a 502 error when
the upstream client connection to the non-existent endpoint ultimately
fails. Instead, the proxy should only fall back when the destination is
outside the search suffixes or the Destination service returns an
`InvalidArgument` response.

This is a much larger change than #263. In particular, the fallback
behavior has been moved from when the the actual client service is 
called to when the client service is _constructed_, as we do not expect
to recieve an `InvalidArgument` response on a Destination query that
has previously recieved updates. 

This was implemented by changing the `Resolve` trait to return a
`Future` whose `Item` type is a `Resolution`, rather than returning a
`Resolution`. When the Destination query returns `InvalidArgument`, the
future will fail. The `proxy::http::fallback` middleware has been
changed so that rather than making both the primary and fallback service
and falling back on a per-request basis, it tries to make the primary
service, and falls back if the primary `MakeService` future fails. 

Although this is a large change, I think the resulting code is simpler
and more elegant than the previous approach.

Previously, one of the discovery tests expected the incorrect behavior.
I've changed this test to now expect that destinations known to not
exist do _not_ fall back, and renamed it from
`outbound_falls_back_to_orig_dst_when_destination_has_no_endpoints` to
`outbound_fails_fast_when_destination_has_no_endpoints`. I've also
confirmed that the updated test fails against master and passes after
this change.

Fixes linkerd/linkerd2#2880
Closes #263 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 17, 2021
panthervis added a commit to panthervis/linkerd2-proxy that referenced this issue Oct 8, 2021
When the DNS fallback behavior was removed from the proxy in #259, the
proxy was changed to fall back to original destination routing _any_
time the control plane indicates that no endpoints exist for a
destination. This is incorrect, as the proxy should always treat the
destination service as authoritative. Additionally, this means that
requests to endpoints which are known to not exist will construct an
unnecessary client service, and eventually fail with a 502 error when
the upstream client connection to the non-existent endpoint ultimately
fails. Instead, the proxy should only fall back when the destination is
outside the search suffixes or the Destination service returns an
`InvalidArgument` response.

This is a much larger change than #263. In particular, the fallback
behavior has been moved from when the the actual client service is 
called to when the client service is _constructed_, as we do not expect
to recieve an `InvalidArgument` response on a Destination query that
has previously recieved updates. 

This was implemented by changing the `Resolve` trait to return a
`Future` whose `Item` type is a `Resolution`, rather than returning a
`Resolution`. When the Destination query returns `InvalidArgument`, the
future will fail. The `proxy::http::fallback` middleware has been
changed so that rather than making both the primary and fallback service
and falling back on a per-request basis, it tries to make the primary
service, and falls back if the primary `MakeService` future fails. 

Although this is a large change, I think the resulting code is simpler
and more elegant than the previous approach.

Previously, one of the discovery tests expected the incorrect behavior.
I've changed this test to now expect that destinations known to not
exist do _not_ fall back, and renamed it from
`outbound_falls_back_to_orig_dst_when_destination_has_no_endpoints` to
`outbound_fails_fast_when_destination_has_no_endpoints`. I've also
confirmed that the updated test fails against master and passes after
this change.

Fixes linkerd/linkerd2#2880
Closes #263 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
No open projects
2.4 - Release
  
Done
2 participants