Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implementation proposal for zone aware routing #4458

Merged

Conversation

@ElvinEfendi
Copy link
Member

commented Aug 16, 2019

What this PR does / why we need it:
Implementation proposal for zone aware routing.

Continuation of #4455.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

@codecov-io

This comment has been minimized.

Copy link

commented Aug 16, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@f0383c6). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #4458   +/-   ##
=========================================
  Coverage          ?   58.89%           
=========================================
  Files             ?       87           
  Lines             ?     6669           
  Branches          ?        0           
=========================================
  Hits              ?     3928           
  Misses            ?     2316           
  Partials          ?      425

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f0383c6...a8a68dd. Read the comment docs.

@aledbf aledbf added this to In Progress in 0.26.0 Aug 16, 2019
@aledbf

This comment has been minimized.

Copy link
Member

commented Aug 16, 2019

/lgtm

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf, ElvinEfendi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

**How do we make sure we do our best to choose zone-local endpoint?**
This will be done on Lua side. For every backend we will initialize two balancer instances: (1) with all endpoints
(2) with all endpoints corresponding to current zone for the backend. Then given the request once we choose what backend
needs to serve the request, we will first try to use zonal balancer for that backend. If zonal balancer does not exist (i.e there's no zonal endpoint)

This comment has been minimized.

Copy link
@ElvinEfendi

ElvinEfendi Aug 16, 2019

Author Member

Below are my raw thoughts, will edit and simplify once I digest it myself

This logic inherently assumes that endpoints are distributed across zones in proportion to ingress-nginx pods distribution and requests are evenly distributed across ingress-nginx pods. While the second part of the assumption is mostly the case in practice it might be very difficult to ensure the first part.

What we can do instead we can have ingress-nginx to dynamically distribute traffic based on endpoints distribution across zones. Given N requests, distributed evenly among M ingress-nginx pods then each pod gets N / M requests. Let's say there are 3 zones. Since we can not guarantee ingress-nginx pods are scheduled evenly across zones, let's say zone i has z_i*M/3 ingress-nginx pods. That means it will get (N / M) * (z_i * M / 3) requests. While under perfect distribution it would be (N / M) * (M / 3). Reducing the formulas we get N*z_i/3 and N/3. That means zone i gets N * (z_i - 1) / 3 more requests. Now if all the endpoints would be evenly distributed across zones we would do strict zone-local routing if N * (z_i - 1) / 3 < 0. Otherwise we would route to other zones with the probability of (N * (z_i - 1) / 3) / (N / 3). That way we guarantee that requests are load balanced evenly across endpoints across zones. But if like ingress-nginx pods, we can not ensure endpoints to be evenly distributed across zones then we have to apply a similar calculation. Now assume there are E endpoints. Evenly distribution would mean each zone has E / 3 endpoints. But let's assume zone i has e_i * E / 3 endpoints. Now similarly we can say zone i will have e_i * E / 3 - E / 3 more endpoints. Which is E * (e_i - 1) / 3. Now if

N * (z_i - 1) / 3 <= 0 and E * (e_i - 1) / 3 >= 0 that means there are enough endpoints to handle zone-local traffic

N * (z_i - 1) / 3 <= 0 and E * (e_i - 1) / 3 < 0 then if z_i - e_i < 0 with probability of z_i / e_i proxy to other zones otherwise keep requests local otherwise

N * (z_i - 1) / 3 > 0 and E * (e_i - 1) / 3 <= 0 that means now enough zone local endpoints, we need to rout to other zones

N * (z_i - 1) / 3 > 0 and E * (e_i - 1) / 3 > 0 then if z_i - e_i < 0 then keep requests local otherwise with probability of z_i / e_i proxy to other zones

This comment has been minimized.

Copy link
@aledbf

aledbf Aug 16, 2019

Member

Since we can not guarantee ingress-nginx pods are scheduled evenly across zones

Yes, you can do that using nodeAffinity to distribute the ingress controller pods between zones https://kubernetes.io/docs/concepts/configuration/assign-pod-node/

Now if all the endpoints would be evenly distributed across zones we would do strict zone-local routing

Same thing for the apps. Without nodeAffinity there is no way to ensure the distribution.

This comment has been minimized.

Copy link
@ElvinEfendi

ElvinEfendi Aug 16, 2019

Author Member

But does node affinity ensure distribution though? Does it not just do best effort?

This comment has been minimized.

Copy link
@aledbf

aledbf Aug 16, 2019

Member

You can force it with requiredDuringSchedulingIgnoredDuringExecution https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity

This comment has been minimized.

Copy link
@ElvinEfendi

ElvinEfendi Aug 16, 2019

Author Member

I think we can come up with a heuristic as I was trying above in my comment that let's us to dynamically detect the proportion of pods distribution across zones and based on that decide whether request should be routed locally or to an endpoint in the other zone. It boils down to following logic

  • if ingress-nginx pod that received request is in the zone that have less ingress-nginx pods in total

    • if there's more service endpoints in this zone then always do zone-local proxying because you know there are more than enough endpoints in this zone
    • otherwise if there are less endpoints in this zone then we have to calculate how much less endpoints there are in this zone in proportion to how much less traffic we get in this zone. If we still get more traffic in proportion then we route out excess traffic to other zone otherwise do zone-local proxying
  • if ingress-nginx pod that received request is in the zone that have more ingress-nginx pods in total

    • if there's more service endpoints in this zone then we have to calculate how much more endpoints there are in this zone in proportion to how much more traffic we get in this zone. If we get even more traffic then we route out excess traffic to other zone otherwise do zone-local proxying
    • if there's less service endpoints in this zone then we again calculate the proportion and based on that with some probability we route out to other zones otherwise keep it zone-local

This comment has been minimized.

Copy link
@ElvinEfendi

ElvinEfendi Aug 16, 2019

Author Member

You can force it with requiredDuringSchedulingIgnoredDuringExecution https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity

I wonder how much overhead this adds on scheduler when you have i.e over 500 replicas

This comment has been minimized.

Copy link
@ElvinEfendi

ElvinEfendi Aug 16, 2019

Author Member

Envoy also has zone aware routing feature. However what I'm proposing in this comment is significantly different than the way Envoy implements it. Envoy spills over traffic to other zones only when there are capacity issue of some sort, in other words reactively.
But the above heuristic ensures that we split traffic evenly across endpoints even if ingress-nginx and app pods are not evenly distribution across zones.

I'd like to also highlight that this is not addressing zonal failures.

This comment has been minimized.

Copy link
@ElvinEfendi
@k8s-ci-robot k8s-ci-robot merged commit d765faa into kubernetes:master Aug 16, 2019
13 checks passed
13 checks passed
cla/linuxfoundation ElvinEfendi authorized
Details
pull-ingress-nginx-boilerplate Job succeeded.
Details
pull-ingress-nginx-codegen Job succeeded.
Details
pull-ingress-nginx-e2e-1-12 Job succeeded.
Details
pull-ingress-nginx-e2e-1-13 Job succeeded.
Details
pull-ingress-nginx-e2e-1-14 Job succeeded.
Details
pull-ingress-nginx-e2e-1-15 Job succeeded.
Details
pull-ingress-nginx-gofmt Job succeeded.
Details
pull-ingress-nginx-golint Job succeeded.
Details
pull-ingress-nginx-lualint Job succeeded.
Details
pull-ingress-nginx-test Job succeeded.
Details
pull-ingress-nginx-test-lua Job succeeded.
Details
tide In merge pool.
Details
@ElvinEfendi ElvinEfendi deleted the ElvinEfendi:zone-aware-routing-kep-1 branch Aug 16, 2019
koushki added a commit to koushki/ingress-nginx that referenced this pull request Aug 16, 2019
…g-kep-1

implementation proposal for zone aware routing
@aledbf aledbf moved this from In Progress to done in 0.26.0 Aug 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
4 participants
You can’t perform that action at this time.