Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Route requests to original dsts when no endpoints exist in the lb #240

Closed
wants to merge 45 commits into from

Conversation

hawkw
Copy link
Contributor

@hawkw hawkw commented Apr 30, 2019

Currently, when no endpoints exist in the load balancer for a
destination, we fail the request. This is because we expect endpoints to
be discovered by both destination service queries and DNS lookups, so
if there are no endpoints for a destination, it is assumed to not exist.

In linkerd/linkerd2#2661, we intend to remove the DNS lookup from the
proxy and instead fall back to routing requests for which no endpoints
exist in the destination service to their SO_ORIGINAL_DST IP address.
This means that the current approach of failing requests when the load
balancer has no endpoints will no longer work.

This branch introduces a new balance::fallback::layer, which marries a
load balancer with a router. When the load balancer has no endpoints, we
fall back to the router, rather than failing the request at the
balancer. We can use this router to recognize endpoints based on the
request's original destination address.

A new trait, HasEndpointStatus, is added in order to allow the
discovery lookup to communicate the "no endpoints" state to the
balancer. In addition, we add a new Update::NoEndpoints variant to
proxy::resolve::Update, so that when the control plane sends a no
endpoints update, we switch from the balancer to the fallback router
immediately, rather than waiting for all the endpoints to be
individually removed.

A future commit will remove the DNS lookups from the discovery module.

Signed-off-by: Eliza Weisman eliza@buoyant.io

@hawkw hawkw self-assigned this Apr 30, 2019
@hawkw
Copy link
Contributor Author

hawkw commented Apr 30, 2019

I've marked this PR as a draft as I'd still like to add integration tests for this change if possible. However, I thought @olix0r and @kleimkuhler would like to have a look at it sooner rather than later.

src/app/main.rs Outdated
@@ -663,7 +692,7 @@ where
// implementations can use the route-specific configuration.
let dst_route_stack = svc::builder()
.layer(buffer::layer(max_in_flight))
.layer(pending::layer())
.layer(pending::layer::<_, dst::Route, _>())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type annotation is necessary as a result of changing the pending::layer function to bound the target type.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw added 3 commits May 1, 2019 14:46
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw added 11 commits May 2, 2019 15:21
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
self,
max_in_flight: usize,
recognize: Rec,
) -> fallback::Layer<Rec, Self, A>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just thinking out loud -- I wonder if this could just take another layer type, so the router/buffering/etc logic need not be fixed in this file (and could potentially support arbitrary other layers on the fallback route)

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

impl ::std::error::Error for NoEndpoints {}

pub mod fallback {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this module doubles the size of this file... can it be split into its own file? maybe balance needs to become a directory...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I wasn't sure --- will split it out!

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
if let Some(ref mut router) = self.router {
return svc::Service::call(router, req);
} else {
self.create();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seems problematic, as we'll loop into call without having invoked poll_ready. In fact, this seems unreachable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the intention was to guard against a situation where the no endpoints is received between poll_ready and call. I'm not sure if that's actually a concern. but i think you're right that this is incorrect --- the state transition should only ever occur in poll_ready.

in the current situation I don't think this is a problem since routers are always ready? but if that changes this would be a problem

hawkw added 2 commits May 8, 2019 10:26
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw closed this May 9, 2019
hawkw added a commit that referenced this pull request May 15, 2019
Currently, when no endpoints exist in the load balancer for a
destination, we fail the request. This is because we expect endpoints to
be discovered by both destination service queries _and_ DNS lookups, so
if there are no endpoints for a destination, it is assumed to not exist.

In linkerd/linkerd2#2661, we intend to remove the DNS lookup from the
proxy and instead fall back to routing requests for which no endpoints
exist in the destination service to their SO_ORIGINAL_DST IP address.
This means that the current approach of failing requests when the load
balancer has no endpoints will no longer work.

This branch introduces a generic `fallback` layer, which composes a
primary and secondary service builder into a new layer. The primary 
service can fail requests with an error type that propages the original
request, allowing the fallback middleware to call the fallback service
with the same request. Other errors returned by the primary service are
still propagated upstream.

In contrast to the approach used in #240, this fallback middleware is 
generic and not tied directly to a load balancer or a router, and can
be used for other purposes in the future. It relies on the router cache
eviction added in #247 to drain the router when it is not being used, 
rather than proactively destroying the router when endpoints are
available for the lb, and re-creating it when they exist again.

A new trait, `HasEndpointStatus`, is added in order to allow the
discovery lookup to communicate the "no endpoints" state to the
balancer. In addition, we add a new `Update::NoEndpoints` variant to
`proxy::resolve::Update`, so that when the control plane sends a no
endpoints update, we switch from the balancer to the no endpoints state
_immediately_, rather than waiting for all the endpoints to be
individually removed. When the balancer has no endpoints, it fails all
requests with a fallback error, so that the fallback middleware

A subsequent PR (#248) will remove the DNS lookups from the discovery 
module.

Closes #240.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@olix0r olix0r deleted the liza/balance-rt-good branch August 17, 2019 01:35
sprt pushed a commit to sprt/linkerd2-proxy that referenced this pull request Aug 30, 2019
commit b27dfb2
Author: Takanori Ishibashi <takanori.1112@gmail.com>
Date:   Wed May 15 05:58:42 2019 +0900

    updaes->updates (linkerd#250)

    Signed-off-by: Takanori Ishibashi <takanori.1112@gmail.com>

commit 16441c2
Author: Eliza Weisman <eliza@buoyant.io>
Date:   Tue May 14 14:40:03 2019 -0700

     Pass router::Config directly to router::Layer (linkerd#253)

    Currently, router `layer`s are constructed with a single argument, a
    type implementing `Recognize`. Then, the entire router stack is built
    with a `router::Config`. However, in linkerd#248, it became necessary to
    provide the config up front when constructing the `router::layer`, as
    the layer is used in a fallback layer. Rather than providing a separate
    type for a preconfigured layer, @olix0r suggested we simply change all
    router layers to accept the `Config` when they're constructed (see
    linkerd#248 (comment)).

    This branch changes `router::Layer` to accept the config up front. The
    `router::Stack` types `make` function now requires no arguments, and the
    implementation of `Service` for `Stack` can be called with any `T` (as
    the target is now ignored).

    Signed-off-by: Eliza Weisman <eliza@buoyant.io>

commit b70c68d
Author: Eliza Weisman <eliza@buoyant.io>
Date:   Wed May 15 13:14:04 2019 -0700

    Load balancers fall back to ORIG_DST when no endpoints exist (linkerd#248)

    Currently, when no endpoints exist in the load balancer for a
    destination, we fail the request. This is because we expect endpoints to
    be discovered by both destination service queries _and_ DNS lookups, so
    if there are no endpoints for a destination, it is assumed to not exist.

    In linkerd/linkerd2#2661, we intend to remove the DNS lookup from the
    proxy and instead fall back to routing requests for which no endpoints
    exist in the destination service to their SO_ORIGINAL_DST IP address.
    This means that the current approach of failing requests when the load
    balancer has no endpoints will no longer work.

    This branch introduces a generic `fallback` layer, which composes a
    primary and secondary service builder into a new layer. The primary
    service can fail requests with an error type that propages the original
    request, allowing the fallback middleware to call the fallback service
    with the same request. Other errors returned by the primary service are
    still propagated upstream.

    In contrast to the approach used in linkerd#240, this fallback middleware is
    generic and not tied directly to a load balancer or a router, and can
    be used for other purposes in the future. It relies on the router cache
    eviction added in linkerd#247 to drain the router when it is not being used,
    rather than proactively destroying the router when endpoints are
    available for the lb, and re-creating it when they exist again.

    A new trait, `HasEndpointStatus`, is added in order to allow the
    discovery lookup to communicate the "no endpoints" state to the
    balancer. In addition, we add a new `Update::NoEndpoints` variant to
    `proxy::resolve::Update`, so that when the control plane sends a no
    endpoints update, we switch from the balancer to the no endpoints state
    _immediately_, rather than waiting for all the endpoints to be
    individually removed. When the balancer has no endpoints, it fails all
    requests with a fallback error, so that the fallback middleware

    A subsequent PR (linkerd#248) will remove the DNS lookups from the discovery
    module.

    Closes linkerd#240.

    Signed-off-by: Eliza Weisman <eliza@buoyant.io>

commit 6525b06
Author: Zahari Dichev <zaharidichev@gmail.com>
Date:   Wed May 15 23:35:09 2019 +0300

    Allow disabling tap by setting an env var (linkerd#252)

    This PR fixes linkerd/linkerd2#2811. Now if
    `LINKERD2_PROXY_TAP_DISABLED` is set, the tap is not served at all. The
    approach taken is that  the `ProxyParts` is changed so the
    `control_listener` is now an `Option` that will be None if tap is
    disabled as this control_listener seems to be exclusively used to serve
    the tap. Feel free to suggest a better approach.

    Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

commit 91f32db
Author: Zahari Dichev <zaharidichev@gmail.com>
Date:   Thu May 16 00:45:23 2019 +0300

    Assert that outbound TLS works before identity is certified (linkerd#251)

    This commit introduces TLS capabilities to the support server as well as
    tests to ensure that outbound TLS works even when there is no verified
    certificate for the proxy yet.

    Fixes linkerd/linkerd2#2599

    Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

commit 45aadc6
Author: Sean McArthur <sean@buoyant.io>
Date:   Wed May 15 14:25:39 2019 -0700

    Update h2 to v0.1.19

    Includes a couple HPACK fixes

    Signed-off-by: Sean McArthur <sean@buoyant.io>

commit 3e0e00c
Author: Oliver Gould <ver@buoyant.io>
Date:   Thu May 16 08:11:06 2019 -0700

    Update mio to 0.6.17 (linkerd#257)

    To pick up tokio-rs/mio#939
hawkw added a commit that referenced this pull request Feb 11, 2022
This branch updates `tokio-util` to v0.7.0.

The dependabot-generated PR for this update doesn't work, because this
is a breaking API change. The only update that was necessary here was
changing the use of `tokio_util::sync::PollSender` in
`linkerd-proxy-discover` to use the updated method names.

Closes #240
olix0r pushed a commit that referenced this pull request Feb 11, 2022
This branch updates `tokio-util` to v0.7.0.

The dependabot-generated PR for this update doesn't work, because this
is a breaking API change. The only update that was necessary here was
changing the use of `tokio_util::sync::PollSender` in
`linkerd-proxy-discover` to use the updated method names.

Closes #240

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
olix0r pushed a commit that referenced this pull request Mar 30, 2022
This branch updates `tokio-util` to v0.7.0.

The dependabot-generated PR for this update doesn't work, because this
is a breaking API change. The only update that was necessary here was
changing the use of `tokio_util::sync::PollSender` in
`linkerd-proxy-discover` to use the updated method names.

Closes #240

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
(cherry picked from commit 7f17318)
Signed-off-by: Oliver Gould <ver@buoyant.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants