
Verified workaround for kube-proxy hostnameOverride issue #9663

Closed
wants to merge 2 commits into from

Conversation

dlipovetsky
Contributor

@dlipovetsky dlipovetsky commented Jul 28, 2018

Updates the patch that is a work-around to kubernetes/kubeadm#857.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 28, 2018
@k8sio-netlify-preview-bot
Collaborator

Deploy preview for kubernetes-io-master-staging ready!

Built with commit 50625ae

https://deploy-preview-9663--kubernetes-io-master-staging.netlify.com

@dlipovetsky
Contributor Author

@detiber @timothysc

@neolit123
Member

/sig cluster-lifecycle
/kind bug

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Jul 28, 2018
@mdlinville
Contributor

/approve

Needs technical LGTM

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mistyhacks

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 1, 2018
@neolit123
Member

/assign @detiber
/assign @timothysc

},
{
"op": "add",
"path": "/spec/template/spec/containers/0/command/-",
"value": "--hostname-override=${NODE_NAME}"
"path": "/spec/template/spec/initContainers",
Member

Why the switch to an initContainer here? Is the --hostname-override command line option no longer taking precedence over the component config?

Contributor Author

That's exactly right. I found that --hostname-override did not take precedence over the component config. I tested with v1.10.4 and v1.11.1 and reported my findings in kubernetes/kubeadm#857.

Thanks for taking a look!

Member

@neolit123 neolit123 Aug 2, 2018

@dlipovetsky @detiber
that's very odd. so for kube-proxy --hostname-override just overrides the default config:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-proxy/app/server.go#L144

i don't think anything should modify it at that point, right?

@stewart-yu
any ideas about this?

Contributor Author

@neolit123 I ran experiments and found that the config overrides flags (Please see my findings in kubernetes/kubeadm#857 (comment)).

Looking at the code, I find that the flags update a config struct, but later the configuration file is unmarshalled into a new config struct, and the pointer to the original config struct is overwritten, effectively discarding all the flag values. (Maybe this is by design, since the flags are deprecated.)

Member

hm, even if the flags are deprecated they have a period of N releases where they should still work as expected. if this is by design, then we have a bit of an issue here.

my problem is that i don't know who to poke about kube-proxy.
i usually ask @stewart-yu but let's ask the whole sig-network about this...

/cc @kubernetes/sig-network-bugs
please, have a look at the observations by @dlipovetsky above.

Contributor Author

The config file and flags appear to have been mutually exclusive starting with the commit that introduced support for the config file. That was in v1.7.

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. kind/bug Categorizes issue or PR as related to a bug. labels Aug 2, 2018
}
}
],
"image": "k8s.gcr.io/kube-proxy-amd64:v1.11.1",
Member

Might be good to add a note that the arch and version are hardcoded here.

Member

@neolit123 neolit123 Aug 7, 2018

JSON doesn't allow comments, so it should go below the JSON example.

Something along the lines of this, perhaps?

Please replace amd64 and v1.11.1 in kube-proxy-amd64:v1.11.1 with the architecture and version you wish to use for the kube-proxy image.

Contributor

We could use "busybox" instead, since all it's doing is invoking sed.

Contributor Author

I chose to use the same image for the init container and kube-proxy container because the busybox image might not be available. I am applying this patch on an air-gapped cluster where busybox is not present.

Air-gapped clusters are not the common use case, so I think it would be ok to use busybox by default and add a note to replace busybox with the kube-proxy image if busybox is not available.

I can also think of some way to derive the correct kube-proxy image at the time the patch is applied, but that might be over-engineering.

Contributor Author

@detiber Do you have an opinion on whether the patch should use the busybox image?

Member

@dlipovetsky On one hand I like the idea of using a minimal image for the initContainer, but on the other I do like the idea of not requiring an additional image for air-gapped environments. I'm not sure that either approach has a clear advantage for a temporary workaround...

Contributor

"Temporary" to a point: as of now we don't know who will work on making kube-proxy's --hostname-override flag work again.

Regarding the air-gapped situation, I build my machine images with busybox pulled, and use the "imagePullPolicy" field with value "IfNotPresent," so that the init container here won't pull the image again. We could achieve the same thing by specifying an image tag.
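The pinning approach described above could be sketched like this in the DaemonSet patch. This is an illustrative fragment only; the tag `1.28` is an assumption standing in for whatever tag was pre-pulled into the machine image.

```yaml
initContainers:
- name: update-config-file
  image: busybox:1.28        # pinned tag, pre-pulled into the machine image
  imagePullPolicy: IfNotPresent   # never pull again if the image is present
```

With a pinned tag and `IfNotPresent`, the kubelet uses the local copy and the init container starts even without registry access.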

Member

@seh are you able to submit a PR with your proposed changes?

Contributor

Yes, I can do that tomorrow morning.

Contributor

Please see #9878.

"path": "/spec/template/spec/volumes/-",
"value": {
"name": "shared-data",
"mountPath": "/shared-data"
Contributor

Should this volume definition mention it being of type "emptyDir"? The presence of the "mountPath" field looks wrong here.
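A volume entry of type "emptyDir" in the JSON patch would look something like the sketch below; the "mountPath" field belongs on the container's volumeMounts, not on the volume itself. This is an illustrative fragment, not the exact text of the patch under review.

```json
{
  "op": "add",
  "path": "/spec/template/spec/volumes/-",
  "value": {
    "name": "shared-data",
    "emptyDir": {}
  }
}
```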

@neolit123
Member

@dlipovetsky
so we had a talk about it at the sig-cluster-lifecycle meeting today and since this seems like a kube-proxy bug we should probably merge after resolving the review comments.

i will leave it to @detiber for the LGTM.

@seh
Contributor

seh commented Aug 8, 2018

It is unfortunate that kube-proxy tries to use a single, common configuration file on all nodes in the cluster, but gets every one of those node names wrong, so we have to override its use of the hostname, replacing it with the node name (which can't match the hostname in AWS). Then a flag-handling bug in kube-proxy forces us to make a unique copy of that common configuration file on every node. We're back where we'd be if kube-proxy never read a common configuration file at all, only with the brittleness of trying to patch that file.

While we're here, I'll note that kube-proxy's --hostname-override flag should really be called --node-name-override or --identification-override, since kube-proxy is detecting the hostname correctly. We just don't want it to use that hostname for its "identity," per the flag documentation:

If non-empty, will use this string as identification instead of the actual hostname.

seh referenced this pull request in kubernetes/kubernetes Aug 8, 2018
When kube-proxy was refactored to use a configuration file, the ability
to use 0 for conntrack min, max, max per core, and tcp timeouts was
inadvertently broken; if you specified 0, it would instead apply the
default value from defaults.go.

This change restores the ability to use 0 to mean 0.

Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
seh referenced this pull request in kubernetes/kubernetes Aug 8, 2018
Add support for configuring kube-proxy via a config file instead of
command line flags.
@Bradamant3
Contributor

Bradamant3 commented Aug 8, 2018

/assign

leaving docs approval in place, but it looks to me as though we could use some explanatory words in addition to the examples, whenever and however you decide how to proceed. If you add words, you could tag me and I'll take a look (or I can follow up after merge, either way).

@seh
Contributor

seh commented Aug 8, 2018

Since I apply this patch when each master instance starts, I wanted to use an idempotent operation, and hence need to avoid using a JSON patch like the proposed change here does. I found that I could use a strategic merge patch instead, so long as I am willing to repeat the command-line arguments for the "kube-proxy" container.

kubectl --namespace=kube-system patch daemonset kube-proxy \
        --patch='
spec:
  template:
    spec:
      volumes:
      - name: shared-data
        emptyDir: {}
      initContainers:
      - name: update-config-file
        image: busybox
        imagePullPolicy: IfNotPresent
        command:
        - sh
        - -c
        - "/bin/sed \"s/hostnameOverride: \\\"\\\"/hostnameOverride: $(NODE_NAME)/\" /var/lib/kube-proxy/config.conf > /shared-data/config.conf"
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        volumeMounts:
        - name: kube-proxy
          mountPath: /var/lib/kube-proxy
          readOnly: true
        - name: shared-data
          mountPath: /shared-data
      containers:
      - name: kube-proxy
        command:
        - /usr/local/bin/kube-proxy
        - --config=/shared-data/config.conf
        volumeMounts:
        - name: shared-data
          mountPath: /shared-data
          readOnly: true'

Omitted here are the other parts of this DaemonSet I've needed to patch to keep up with kubernetes/kubernetes#65931.

Notable differences from @dlipovetsky's version:

  • The "shared-data" volume is of type "emptyDir."
  • The init container uses the "busybox" image, not counting on the "k8s.gcr.io/kube-proxy-amd64" image to have sed available.
  • Two of the volume mounts are read-only to clarify their roles as sources and destinations.

@dlipovetsky
Contributor Author

@seh Thanks a lot for contributing your version of the patch. Because applying it should be idempotent, I think it's a better solution than the JSON patch in this PR.

However, when I tested your patch, I found it could not be applied more than once, and kubectl exited with a non-zero exit code. Please see the logs here: https://gist.github.com/dlipovetsky/f9d84c20827b025f646e2bf7193df75c

@seh
Contributor

seh commented Aug 8, 2018

Yes, @dlipovetsky, I did observe that, even with the previous version of my strategic merge patch workaround for kubernetes/kubernetes#65931. I didn't include it here since I thought it would obscure the focus of the proposal, but here's the additional harness I used to overcome kubectl patch's annoying exit-code choice.

patch_output=$( \
  kubectl --namespace=kube-system patch daemonset kube-proxy \
          --patch='
spec:
  template:
# ...
')
code=$?
if [ ${code} != 0 ] && [ "${patch_output}" != 'daemonset.extensions/kube-proxy not patched' ]; then
  echo "Failed to patch kube-proxy DaemonSet: ${patch_output}" 1>&2
  exit ${code}
fi

The script feels clumsy to me, so I'm open to suggestions for how to improve it. In short, when kubectl patch doesn't apply a patch because it determines that no change is necessary, it exits with code 1, even though it ostensibly succeeded in ensuring the target object conforms to the patch. I'm assuming that if kubectl patch fails for some more grave reason, it will print a different message than the one I've captured there.

Note that kubectl delete has the --ignore-not-found flag. Here, I'd like to see kubectl patch learn --treat-conformance-as-success or something to that effect.

@dlipovetsky
Contributor Author

@seh Thanks for explaining! Like you, I apply the patch across multiple masters, and I have a similar wrapper that makes applying the patch in the PR idempotent. (I agree that having --treat-conformance-as-success would be very useful!)

Both the JSON patch and strategic merge patch require a script in order to make the apply idempotent. If the strategic merge did not require a script, I would prefer it. As it stands, I prefer the JSON patch because it makes very explicit what is changed, added, or removed.

If it's ok with you, I will update the JSON patch (in the PR) with your suggestions to use emptyDir for the shared-data volume and make the volume mounts read-only.


@detiber, @neolit123 : If you prefer the strategic merge patch @seh suggested in #9663 (comment), I am happy to close this PR and perhaps @seh can create a PR with the strategic merge patch.

@detiber
Member

detiber commented Aug 13, 2018

@dlipovetsky I'm fine with the current approach as it is only meant to be a workaround.

@dlipovetsky
Contributor Author

@detiber Ok, thanks!

@dlipovetsky
Contributor Author

Closing in favor of #9878.
