New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Live Nginx configuration update without reloading #2174

Merged
merged 1 commit into from Mar 18, 2018

Conversation

Projects
6 participants
@ElvinEfendi
Member

ElvinEfendi commented Mar 6, 2018

Requires #2172 to be merged first and published the new Nginx image.

What this PR does / why we need it:
The PR implements live configuration update for Nginx upstreams/k8s endpoints. That means every time you deploy your app Nginx won't be reloaded. This is very important when you are front-ending many applications behind a single deployment of ingress-nginx and those app gets deployed frequently and have significant traffic.

I've mainly focused on laying out proper foundation for further development of live Nginx (re)configuration feature in ingress-nginx. The PR implements only Round Robin algorithm but can easily be extended to support more. I have another PR at #2167 to configure load balancing algorithm per ingress, that configuration will be used by Lua code as well to decide which LB algorithm to use for a given app(namespace/service).

The feature can be enabled by using --enable-dynamic-configuration command line flag. If you wanna see more Lua logs you can also set error-log-level to info to see what it does on endpoints changes.

In order to avoid JSON decoding in the request path, for every Nginx worker I spin up a periodic function that reads raw configuration from a shared Lua dictionary and decodes it into local cache per worker. Then balancer uses its local cache to get the list of endpoints and other backend configuration for given namespace/service.

If you want to test you can use index.docker.io/elvinefendi/nginx-ingress-controller:0.0.1

Next steps: Setup a testing framework for Lua code, add support for EWMA and more LB algorithms, add support for live update of secrets by using certificate_by_lua

Which issue this PR fixes: n/a

Special notes for your reviewer:
I'm aware that there's another similar PR being worked on at #2152. I was told about that only after I started working on this feature at Shopify#16 (comment). There are significant differences in the implementation between the two PRs and I thought there could be value in creating this PR as well.

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 6, 2018

/assign @aledbf

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 6, 2018

@ElvinEfendi @valeriano-manassero can we get consensus about which version we are going to push?
We cannot have two version of the same feature.

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 7, 2018

@aledbf I’d love to hear what @valeriano-manassero thinks about going forward with this PR and collaboratively extending it later. I am going to work on this feature in coming weeks consistently and if you want I can put together a small RFC as well to discuss the future of this work in a more organized way.

I have already commented on other PR and raised my concerns. In my humble opinion the controller change in this PR is more concise and cleaner and the Lua implementation is more performant(no ‘json’ decoding in request path), correct(I use lua-resty-lock to avoid possible race condition between different Nginx workers accessing load balancing state in shared dictionary) and easier to extend(configuration.lua has its clear responsibility of accepting raw config piece per component and storing it in general shared dictionary which will be consumed by the respective middleware as needed. For example as of now ‘balancer.lua’ needs ‘backends’ only and it is responsible what needs to be done with that data alone. In future we can POST ‘servers’ config as well that will be consumed in a similar way by i.e ‘certificate.lua’ that serves certs dynamically etc).

We are also already running this code in our test cluster without any issue and also starting to migrate some less critical services to it(I will update with more data later).

Given the above compelling reasons I suggest we move forward with this PR.

Please let me know what do you think.

@valeriano-manassero

This comment has been minimized.

valeriano-manassero commented Mar 7, 2018

@aledbf @ElvinEfendi Sorry for the delay, I had a couple of very busy days.
I agree with you this PR is cleaner so it's ok for me to go on.
I will rebase further implementations I need on this PR when you'll merge in master.
Maybe a it's good if you want to discuss with me what I would like to implement for the future.
Thanks.

@k8s-ci-robot k8s-ci-robot added size/XL and removed size/L labels Mar 7, 2018

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 7, 2018

The CI should be fixed once #2172 gets merged and new Nginx image gets published.

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 7, 2018

Thanks @valeriano-manassero! I would be happy to discuss that with you.

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 9, 2018

@aledbf could you push the new version of Nginx updated at #2172? Once I have that I'll fix the CI here.

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 9, 2018

@ElvinEfendi today I will publish the new nginx image

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 9, 2018

@ElvinEfendi please squash the commits (no more than three or four)

@ElvinEfendi ElvinEfendi force-pushed the Shopify:dynamic-upstreams branch from 8b550d0 to efb3f13 Mar 9, 2018

@codecov-io

This comment has been minimized.

codecov-io commented Mar 9, 2018

Codecov Report

❗️ No coverage uploaded for pull request base (master@41cefeb). Click here to learn what that means.
The diff coverage is 56.09%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #2174   +/-   ##
=========================================
  Coverage          ?   36.85%           
=========================================
  Files             ?       70           
  Lines             ?     4963           
  Branches          ?        0           
=========================================
  Hits              ?     1829           
  Misses            ?     2855           
  Partials          ?      279
Impacted Files Coverage Δ
internal/ingress/controller/config/config.go 98% <ø> (ø)
internal/ingress/controller/controller.go 0% <0%> (ø)
cmd/nginx/flags.go 83.94% <100%> (ø)
internal/ingress/controller/template/template.go 63.94% <100%> (ø)
internal/ingress/controller/nginx.go 5.19% <15.38%> (ø)
internal/file/bindata.go 51.8% <84.61%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41cefeb...af6ae14. Read the comment docs.

@ElvinEfendi ElvinEfendi force-pushed the Shopify:dynamic-upstreams branch from efb3f13 to b0e5e5a Mar 9, 2018

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 9, 2018

@aledbf looks like you've published the new Nginx image already, thanks!

CI is 🍏 now. Please let me know if you think this PR requires some more changes.

@ElvinEfendi ElvinEfendi force-pushed the Shopify:dynamic-upstreams branch from b0e5e5a to 1d4f09f Mar 12, 2018

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 12, 2018

@aledbf is there anything blocking this PR from merging?

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 13, 2018

@ElvinEfendi only testing. If I don't find any issues I will merge this in two days.
Apologies but I've been really busy with other stuff.

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 13, 2018

@aledbf thanks for the update!

if isSticky(host, location, backend.SessionAffinity.CookieSessionAffinity.Locations) {
upstreamName = fmt.Sprintf("sticky-%v", upstreamName)
}
if !dynamicConfigurationEnabled {

This comment has been minimized.

@ElvinEfendi

ElvinEfendi Mar 14, 2018

Member

This is currently buggy, it will ignore https proto when dynamic configuration is enabled. I have fixed it at https://github.com/Shopify/ingress/pull/29/files. @aledbf let me know if you want me include the fix(and corresponding regression test) in this PR or in a separate PR after this gets merged.

related to https://github.com/kubernetes/ingress-nginx/pull/2174/files#diff-f16f9102ce1211ffe529bbb935a8d46eR40

This comment has been minimized.

@aledbf

aledbf Mar 15, 2018

Member

Please include the fix in this PR

newBackends = append(newBackends, pcfg.Backends[i].DeepCopy())
}
n.runningConfig.Backends = []*ingress.Backend{}

This comment has been minimized.

@aledbf

aledbf Mar 15, 2018

Member

Please do not change the state of n.runningConfig. Please make a copy and use that to make the comparison.

This comment has been minimized.

@ElvinEfendi

ElvinEfendi Mar 15, 2018

Member

@aledbf notice that I'm changing only n.runningConfig.Backends and I have the copy of its original value at https://github.com/kubernetes/ingress-nginx/pull/2174/files#diff-cde3fffe2425ad7efaa8add1d05ae2c0R756 where I restore at https://github.com/kubernetes/ingress-nginx/pull/2174/files#diff-cde3fffe2425ad7efaa8add1d05ae2c0R772.

The idea is to make sure that the change between new and running config is only Backends change. Please let me know if you think there's a better way of doing this.

This comment has been minimized.

@aledbf

aledbf Mar 15, 2018

Member

You should not change n.runningConfig. I prefer a full copy that we can be disposed of later.

This comment has been minimized.

@ElvinEfendi

ElvinEfendi Mar 15, 2018

Member

Is there already a utility function to deep copy runningConfig?

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 15, 2018

@aledbf I included bug fix(c778654) and corresponding test cases(3879d72).

I've also refactored(2a9f13d) IsDynamicallyConfigurable implementation to not change n.runningConfig and improved the test for it(534a3c6).

I'll squash my commits once you finish reviewing.

@aledbf aledbf moved this from done to In Progress in nginx 0.12.0 Mar 16, 2018

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 16, 2018

@ElvinEfendi please rebase and squash for the last review and merge

@ElvinEfendi ElvinEfendi force-pushed the Shopify:dynamic-upstreams branch 2 times, most recently from 7d1a560 to d855a5e Mar 16, 2018

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 16, 2018

@ElvinEfendi please rebase and squash for the last review and merge

@aledbf done!

end
function _M.call()
if ngx.var.request_method ~= "POST" or ngx.var.request_uri ~= "/configuration/backends" then

This comment has been minimized.

@oilbeater

oilbeater Mar 17, 2018

Contributor

It's better to have a GET api to retrieve all configurations. As all configurations now store in memory not files, when something gets wrong, it will be difficult to debug if configuration is correct

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 18, 2018

@ElvinEfendi I just finished testing the PR and I just have two comments:

  1. If the dynamic configuration is enabled you need to skip the update of upstreams in the file nginx.conf (template)
    Add the flag --v=2 to see what happens
127.0.0.1 - [127.0.0.1] - - [18/Mar/2018:02:18:53 +0000] "POST /configuration/backends HTTP/1.1" 201 5 "-" "Go-http-client/1.1" 2890 0.000 [-] - - - -
I0318 02:18:53.682267       5 controller.go:175] dynamic reconfiguration succeeded, skipping reload
I0318 02:18:53.682301       5 util.go:64] system fs.file-max=180573
I0318 02:18:53.682307       5 nginx.go:560] maximum number of open file descriptors : 89262
I0318 02:18:53.714604       5 nginx.go:658] NGINX configuration diff
I0318 02:18:53.714670       5 nginx.go:659] --- /etc/nginx/nginx.conf	2018-03-18 02:17:17.168970949 +0000
+++ /tmp/new-nginx-cfg589937464	2018-03-18 02:18:53.712665931 +0000
@@ -221,6 +221,7 @@
 
         server 172.17.0.5:8080 max_fails=0 fail_timeout=0;
         server 172.17.0.3:8080 max_fails=0 fail_timeout=0;
+        server 172.17.0.6:8080 max_fails=0 fail_timeout=0;
 
     }
  1. When you send the update to lua please include a log line like glog.V(2).Infof with the body you are sending. Without this is impossible to know what's the configuration in lua
    Even better is you can add an additional route to get the content from lua.
@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 18, 2018

Thanks @aledbf!

If the dynamic configuration is enabled you need to skip the update of upstreams in the file nginx.conf (template)

I implemented it that way intentionally thinking that it would be more resilient to failures. But now that I think more about it is probably better to completely skip OnUpdate function call in syncIngress when dynamic reconfiguration is enabled. When we switch back to default behaviour the up-to-date configuration will be generated anyway.

When you send the update to lua please include a log line like glog.V(2).Infof with the body you are sending. Without this is impossible to know what's the configuration in lua
Even better is you can add an additional route to get the content from lua.

👍 I'll do both.

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 18, 2018

I implemented it that way intentionally thinking that it would be more resilient to failures. But now that I think more about it is probably better to completely skip OnUpdate function call in syncIngress when dynamic reconfiguration is enabled

Please keep in mind you only need to comment the upstream generation. We still need the reload of nginx when an Ingress is added/removed or a new annotation is configured

To be clear, https://github.com/kubernetes/ingress-nginx/blob/master/rootfs/etc/nginx/template/nginx.tmpl#L311-L338 should be inside an if section asking for dynamic configuration being disabled (so this is not generated)

@ElvinEfendi

This comment has been minimized.

Member

ElvinEfendi commented Mar 18, 2018

@aledbf addressed all your comments in the last 5 commits.

I have also pushed 66137ba to allow requests only from localhost to the Lua configuration endpoint.

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 18, 2018

If someone else can help to test this please use quay.io/aledbf/nginx-ingress-controller:0.341 with the flag --enable-dynamic-configuration=true in the deployment

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 18, 2018

@ElvinEfendi this is working just fine. Let's wait until tomorrow and I will merge this.

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 18, 2018

@ElvinEfendi after this PR we need to clean up the JSON is being sent to LUA because there's a lot of information we are not going to use (and I don't want to use more memory than necessary in nginx)

@ElvinEfendi ElvinEfendi force-pushed the Shopify:dynamic-upstreams branch from ab02f3e to af6ae14 Mar 18, 2018

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 18, 2018

/approve

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Mar 18, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf, ElvinEfendi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aledbf

This comment has been minimized.

Member

aledbf commented Mar 18, 2018

@ElvinEfendi thanks!

@aledbf aledbf merged commit c90a4e8 into kubernetes:master Mar 18, 2018

2 of 3 checks passed

tide Not mergeable. Needs lgtm label.
Details
cla/linuxfoundation ElvinEfendi authorized
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ElvinEfendi ElvinEfendi deleted the Shopify:dynamic-upstreams branch Mar 18, 2018

@aledbf aledbf moved this from In Progress to done in nginx 0.12.0 Mar 18, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment