Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for proxy injector reinvocation #6347

Closed
wants to merge 4 commits into from

Conversation

alpeb
Copy link
Member

@alpeb alpeb commented Jun 22, 2021

Fixes #3750 and partially #6267

As of k8s 1.15 mutating webhooks can be reinvoked whenever another mutating webhook running after the current one mutates the pod being persisted.
Enabling reinvocation for the injector will allow configuring the proxy with annotations generated by such other mutating webhooks.

The implementation consists on adding "remove" statements into the json patch returned by the injector, implemented mostly in the new file pkg/inject/pod_patch.go which now holds the podPatch, moved from pkg/inject/inject.go.

This also updates the "reinvocation" integration test introduced in #6309 to properly verify the reinvocation is happening.

And this also means the "existing proxy sidecar" check is no longer relevant, and has been limited to "existing 3rd party sidecar".

Possible followup: refactor the CLI inject and uninject commands to properly leverage the cleanup done by this patch.

Fixes #3750 and partially #6267

As of k8s 1.15 mutating webhooks can be reinvoked whenever another mutating webhook running after the current one mutates the pod being persisted.
Enabling reinvocation for the injector will allow configuring the proxy with annotations generated by such other mutating webhooks.

The implementation consists on adding "remove" statements into the json patch returned by the injector, implemented mostly in the new file `pkg/inject/pod_patch.go` which now holds the `podPatch`, moved from `pkg/inject/inject.go`.

This also updates the "reinvocation" integration test introduced in #6309 to properly verify the reinvocation is happening.

And this also means the "existing proxy sidecar" check is no longer relevant, and has been limited to "existing 3rd party sidecar".

Possible followup: refactor the CLI inject and uninject commands to properly leverage the cleanup done by this patch.
@alpeb alpeb requested a review from a team as a code owner June 22, 2021 20:46
Copy link
Member

@mateiidavid mateiidavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is super cool! I think everything looks great as it is, although I haven't managed to test it yet since I couldn't find a readily available MWC to deploy alongside the injector and see how it all plays out in practice.

I've left a nitpicky comment about the removed log statement and a suggestion to contextualise the patch.addRemovals(..) call.

controller/webhook/server.go Show resolved Hide resolved
pkg/inject/inject.go Outdated Show resolved Hide resolved
Copy link
Contributor

@dadjeibaah dadjeibaah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look great. I haven't tested them out as well with another MWC but they look good. I left some super tiny nits.

pkg/healthcheck/sidecar.go Outdated Show resolved Hide resolved
pkg/inject/pod_patch.go Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jun 23, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@9b7a548). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #6347   +/-   ##
=======================================
  Coverage        ?   44.30%           
=======================================
  Files           ?      184           
  Lines           ?    17616           
  Branches        ?      265           
=======================================
  Hits            ?     7805           
  Misses          ?     9077           
  Partials        ?      734           
Flag Coverage Δ
golang 42.38% <0.00%> (?)
javascript 66.69% <0.00%> (?)
unittests 44.30% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9b7a548...ca1e22a. Read the comment docs.

@alpeb
Copy link
Member Author

alpeb commented Jun 23, 2021

@mateiidavid @dadjeibaah Thanks for your prompt reviews 🙂

If you wanna give it a test with another mutating webhook, you can install kubemod by applying test/integration/reinvocation/testdata/kubemod.yaml. Then you can configure it by instantiating a ModRule instance. The one in test/integration/reinvocation/testdata/modrule.yaml for example sets it so the kubemod mutating webhook adds the config.linkerd.io/proxy-log-level: debug annotation to the pod template.

charts/patch/templates/patch.json Show resolved Hide resolved
@@ -29,6 +62,12 @@
},
{{- end }}
{{- range $label, $value := .Values.labels }}
{{- if $alreadyInjected }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the set of labels here during reinvocation is different from the set of a labels from a previous invocation? If so, we could have some previous labels which are not cleaned up here. Would it make sense to remove all labels with the linkerd.io prefix?

On the other hand, if the label set changing is not a concern, do we need to do the remove at all? Doing the add should replace the previous value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it'd make sense to remove all labels prefixed with linkerd.io 👍

@alpeb
Copy link
Member Author

alpeb commented Jun 23, 2021

I realized the injector is now erasing what the jaeger mutating webhook is doing. Stay tuned...

@alpeb
Copy link
Member Author

alpeb commented Jun 25, 2021

So the jaeger injector adds the env vars LINKERD2_PROXY_TRACE_ATTRIBUTES_PATH, LINKERD2_PROXY_TRACE_COLLECTOR_SVC_ADDR and LINKERD2_PROXY_TRACE_COLLECTOR_SVC_NAME into the proxy container. That mutation triggers again the proxy injector which first deletes the existing proxy container and adds it back, but doesn't know about those vars.

Here are some possible fixes:

Move some of the proxy template logic out of the helm template

Prior to generating the proxy container patch, the proxy injector would fetch all the env vars in the proxy container and merge them against the list of known "core" env vars, giving precedence to the latter, and use the resulting list in the patch. That means moving the rendering of the env vars out of the _proxy.tpl helm template and into the proxy injector code. The template would instead just iterate over a list of vars it receives as a param.

Use special names for env vars injected by extensions

  • Use a LINKERD2_PROXY_EXT prefix in env vars injected by extensions. So the jaeger ones would end up like LINKERD2_PROXY_EXT_TRACE_ATTRIBUTES_PATH, LINKERD2_PROXY_EXT_TRACE_COLLECTOR_SVC_ADDR and LINKERD2_PROXY_EXT_TRACE_COLLECTOR_SVC_NAME. So prior to removing the existing proxy container, the proxy-injector would gather any vars with that prefix, to re-apply them in the final patch. This requires changes in the proxy to use these new names.

@adleong
Copy link
Member

adleong commented Jun 25, 2021

Something to note about the proposals here is that they assume that extension webhooks will only add environment variables and will not modify any of the core env variables. This is true today, but we should consider if we're okay imposing this limitation going forward.

One of the very nice properties of the extension system today is that the proxy-injector has no coupling whatsoever to the extensions and the extensions may perform any arbitrary mutation to the proxy container. This could include modifying core env variables or other aspects of the proxy sidecar such as resource requests, security policy, etc. etc. This clean layering is possible because we can guarantee that the extension injectors will be invoked (or re-invoked) after the proxy-injector.

With this change, we would no longer be able to make that guarantee and would need to the proxy-injector to take into account that the extensions may have already performed mutations and be aware of what types of mutations are possible. Overall, I'm not sure that the benefit of allowing webhooks to add linkerd configuration through annotations is worth the increased complexity, increased coupling, and decreased flexibility that we would incur.

Instead, I would recommend that users either ensure the pod spec has all the desired annotations BEFORE it starts admission control, or that any webhooks which wish to perform mutation on the proxy container do it directly (i.e. by setting environment variables just like an extension would) instead of by setting annotations to control the proxy-injector.

@alpeb
Copy link
Member Author

alpeb commented Jun 28, 2021

Overall, I'm not sure that the benefit of allowing webhooks to add linkerd configuration through annotations is worth the increased complexity, increased coupling, and decreased flexibility that we would incur.

There's also #6267 to cater for, to assist policy discovery. Not a hard requirement, but would make that feature more complete and avoid surprises when using additional sidecars.

I have in mind a third alternative, that I think would actually make the injector current behavior smoother, without adding any kind of coupling. Currently the injector's patch consists of multiple add ops for annotations, labels and volumes. However, the proxy init and non-init containers are parsed in separate templates and fed wholesale into the patch. We could instead treat each part of them as an add op, which shouldn't interfere with mutations from other webhooks. Let me me give this a try and I'll report back.

@alpeb
Copy link
Member Author

alpeb commented Jun 30, 2021

Ok it is possible for the injector to generate a nice non-destructive json patch, instead of removing and then re-adding the proxy. However I painfully remembered that the purpose of the wholesale _proxy.tpl template is to be used as-is in the control plane deployments, particularly when installing through Helm, where templates is all we have at hand. So we can't get rid of that, and anything fancier like a non-destructive json patch in the injector requires separate logic, which would duplicate a lot of what we have in that template.

Yet another idea: stick to the rule that the proxy is to be configured only through annotations, forbidding extensions to directly mutate the proxy container. Instead, any such mutation (adding env vars, security policy, resources, etc) would then be declared by having the extension inject a new annotation containing the actual json patch for the mutation. Then when the injector runs, it would apply it on top of the regular patch.

What do you folks think? Any other suggestions?

@adleong
Copy link
Member

adleong commented Jun 30, 2021

I'm not sure what you mean by:

having the extension inject a new annotation containing the actual json patch for the mutation

Does this mean the extension webhook would add an annotation containing a json patch and then the proxy-injector would apply that patch? Why not just have the extension webhook perform the mutation directly? (like our existing extension webhooks do)

@alpeb
Copy link
Member Author

alpeb commented Jun 30, 2021

Does this mean the extension webhook would add an annotation containing a json patch and then the proxy-injector would apply that patch?

Correct. Assuming we want to go ahead with reinvocation, the problem we current have is that after the extension performs a mutation, the injector needs to remove the proxy and then add it again (as to re-use the Helm template logic), thus losing the changes made by the extension.

@adleong
Copy link
Member

adleong commented Jun 30, 2021

What problem are we trying to solve with reinvocation? If it's just to allow extensions to configure the proxy, I think it makes more sense for those extensions to configure the proxy directly and keep reinvocation disabled on the proxy-injector. This gives us the invariant that extension webhooks will always be invoked (or re-invoked) AFTER the proxy-injector.

@alpeb
Copy link
Member Author

alpeb commented Jul 1, 2021

What problem are we trying to solve with reinvocation? If it's just to allow extensions to configure the proxy, I think it makes more sense for those extensions to configure the proxy directly and keep reinvocation disabled on the proxy-injector. This gives us the invariant that extension webhooks will always be invoked (or re-invoked) AFTER the proxy-injector.

The problem reinvocation solves is to be able to configure the proxy given mutations done to the pod by webhooks running after the proxy injector, in particular other sidecar containers added after the proxy.

For example Cloudstate, which adds a sidecar and a config.linkerd.io/skip-outbound-ports annotation to have the proxy skip that sidecar. Granted, the workaround there is to require that annotation to be added beforehand.

Another more critical case is for policy enforcement (#6267). The proxy needs to know the list of ports exposed by all the containers as it will not allow connections to other ports. That list is not available through the downward API; only the injector can inspect the pod to retrieve it, and any container created after the injector runs will be locked out if the injector doesn't get another chance.

@adleong
Copy link
Member

adleong commented Jul 1, 2021

I think it makes things much simpler if we have the requirement that any mutating webhooks must assume that they will be invoked (or re-invoked) AFTER the proxy-injector. In the case of Cloudstate, this means that to add to the skip port list, their webhook should mutate the --outbound-ports-to-ignore flag of the proxy-init container instead of setting the config.linkerd.io/skip-outbound-ports annotation. Webhooks which add additional containers that listen on new ports would need to mutate the LINKERD2_PROXY_INBOUND_PORTS env variable to add their ports.

@alpeb
Copy link
Member Author

alpeb commented Jul 9, 2021

Closing this one for now, until need for reinvocation is clarified. #6267 is getting addressed by #6445 which also adds an annotation to handle the additional sidecar case.

@alpeb alpeb closed this Jul 9, 2021
@kleimkuhler kleimkuhler deleted the alpeb/injector-reinvoke-2 branch July 9, 2021 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update the proxy injector MWC to include reinvocationPolicy
5 participants