Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix kuberouter for k8s 1.16+ #8697

Merged
merged 1 commit into from
Mar 12, 2020

Conversation

UnderMyBed
Copy link
Contributor

fixes #8695

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 8, 2020
@k8s-ci-robot
Copy link
Contributor

Welcome @UnderMyBed!

It looks like this is your first PR to kubernetes/kops 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kops has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @UnderMyBed. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 8, 2020
@hakman
Copy link
Member

hakman commented Mar 8, 2020

Thanks @UnderMyBed!
Why not just update the k8s-1.16.yaml.template file?
/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 8, 2020
@UnderMyBed
Copy link
Contributor Author

Why not just update the k8s-1.16.yaml.template file?

I am assuming you meant k8s-1.12.yaml.template?

There were a few changes to the manifest and cni config that I made, beyond just adding the cniVersion, to bring it more inline with the official manifests. Since this was a non trivial change it felt "right" to version it here.

@UnderMyBed
Copy link
Contributor Author

/assign @justinsb @mikesplain

@rifelpet
Copy link
Member

for reference, here is a naive diff between this new template and the 1.12 template:

12,18c12,24
<       "name":"kubernetes",
<       "type":"bridge",
<       "bridge":"kube-bridge",
<       "isDefaultGateway":true,
<       "ipam": {
<         "type":"host-local"
<       }
---
>       "cniVersion":"0.3.0",
>       "name":"mynet",
>       "plugins":[
>         {
>           "name":"kubernetes",
>           "type":"bridge",
>           "bridge":"kube-bridge",
>           "isDefaultGateway":true,
>           "ipam":{
>               "type":"host-local"
>             }
>         }
>       ]
55a62,63
>         - name: KUBE_ROUTER_CNI_CONF_FILE
>           value: /etc/cni/net.d/10-kuberouter.conflist
84c92,95
<           if [ ! -f /etc/cni/net.d/10-kuberouter.conf ]; then
---
>           if [ ! -f /etc/cni/net.d/10-kuberouter.conflist ]; then
>             if [ -f /etc/cni/net.d/*.conf ]; then
>               rm -f /etc/cni/net.d/*.conf;
>             fi;
87c98,103
<             mv ${TMP} /etc/cni/net.d/10-kuberouter.conf;
---
>             mv ${TMP} /etc/cni/net.d/10-kuberouter.conflist;
>           fi;
>           if [ ! -f /var/lib/kube-router/kubeconfig ]; then
>             TMP=/var/lib/kube-router/.tmp-kubeconfig;
>             cp /etc/kube-router/kubeconfig ${TMP};
>             mv ${TMP} /var/lib/kube-router/kubeconfig;

Unfortunately we don't have an e2e test for kuberouter, let me put something together right now and we can get this merged asap. Ideally we can get the e2e job passing before this is merged, and then see it is still passing after this is merged.

@rifelpet
Copy link
Member

actually, i'm assuming kuberouter is broken for 1.16+ correct? that means I wont be able to get the job to pass. I'm inclined to merge this now and then we can use the new job to confirm it works. thoughts @mikesplain @hakman ?

@hakman
Copy link
Member

hakman commented Mar 12, 2020

You were right, I meant the k8s-1.12.yaml.template.
The changes that I see are not that big to require a new template.

Comment on lines 100 to 102
if [ ! -f /var/lib/kube-router/kubeconfig ]; then
TMP=/var/lib/kube-router/.tmp-kubeconfig;
cp /etc/kube-router/kubeconfig ${TMP};
mv ${TMP} /var/lib/kube-router/kubeconfig;
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part should not be needed. It would only work when the ConfigMap contains a kubeconfig file like here:
https://github.com/cloudnativelabs/kube-router/blob/master/daemonset/generic-kuberouter-all-features.yaml#L26-L44

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, Good call. I can remove that section

@rifelpet
Copy link
Member

We have the e2e job up and running and I can confirm it is failing.

You can search the logs for kube-router and see that the container crashes and kubelet doesnt recognize the cni config that was added:

Mar 12 12:51:26.162873 ip-172-20-51-192 kubelet[9677]: W0312 12:51:26.162848    9677 cni.go:202] Error validating CNI config list {"cniVersion":"","name":"kubernetes","plugins":[{"bridge":"kube-bridge","ipam":{"subnet":"100.96.0.0/24","type":"host-local"},"isDefaultGateway":true,"name":"kubernetes","type":"bridge"}]}: [plugin bridge does not support config version ""]
Mar 12 12:51:26.162873 ip-172-20-51-192 kubelet[9677]: W0312 12:51:26.162870    9677 cni.go:237] Unable to update cni config: no valid networks found in /etc/cni/net.d/

I noticed that kube-router has many suggested daemonset manifests, perhaps a comment at the top of this template that indicates which kube-router yaml file it was copied from? otherwise this looks good to me. Thanks for getting this fixed!

@hakman
Copy link
Member

hakman commented Mar 12, 2020

Thanks for removing the extra section @UnderMyBed. The changes seem quite small to limit them to 1.16+. Would it be ok with you to do them for 1.12+ or there is some compatibility issue that I'm missing?

@UnderMyBed
Copy link
Contributor Author

Thanks for removing the extra section @UnderMyBed. The changes seem quite small to limit them to 1.16+. Would it be ok with you to do them for 1.12+ or there is some compatibility issue that I'm missing?

Likely not, My motivation behind the limit was the changes and not being able to easily regression test the cni config changes back to 1.12. I'll refactor this to remove the 1.16 limitation.

@hakman
Copy link
Member

hakman commented Mar 12, 2020

Thanks for removing the extra section @UnderMyBed. The changes seem quite small to limit them to 1.16+. Would it be ok with you to do them for 1.12+ or there is some compatibility issue that I'm missing?

Likely not, My motivation behind the limit was the changes and not being able to easily regression test the cni config changes back to 1.12. I'll refactor this to remove the 1.16 limitation.

Cool. Thanks. Will do a quick test after this gets merged to master to see if there is any issue with 1.12.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 12, 2020
@hakman
Copy link
Member

hakman commented Mar 12, 2020

One more thing, this should be changed to 0.3.1-kops.3:

version := "0.3.1-kops.2"

Starting in k8s 1.16 the kublet requires that cniVersion is set in the cni config
@hakman
Copy link
Member

hakman commented Mar 12, 2020

Thanks @UnderMyBed
/lgtm
/assign @rifelpet

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 12, 2020
@rifelpet
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rifelpet, UnderMyBed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 12, 2020
@k8s-ci-robot k8s-ci-robot merged commit e902c45 into kubernetes:master Mar 12, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Mar 12, 2020
@UnderMyBed
Copy link
Contributor Author

Thanks everyone for helping me out on my first PR 😄

@rifelpet
Copy link
Member

rifelpet commented Mar 12, 2020

No problem! Looks like the e2e job is now successfully creating a cluster too. Though its failing a couple tests, the cluster appears mostly functional.

@hakman
Copy link
Member

hakman commented Mar 13, 2020

No problem! Looks like the e2e job is now successfully creating a cluster too. Though its failing a couple tests, the cluster appears mostly functional.

@rifelpet we should think about updating the kube-router version on master before doing too much debugging about the failed tests.

@UnderMyBed UnderMyBed deleted the kuberouter-1.16-fix branch March 13, 2020 14:20
@rifelpet
Copy link
Member

Agreed, I see a 0.4.0 is out. I don't see any k8s version compatibility matrix though.

k8s-ci-robot added a commit that referenced this pull request Apr 7, 2020
…8735-#8742-#8751-upstream-release-1.17

Automated cherry pick of #8697 #8735 #8742 #8751: Update kube-router to v0.4.0
k8s-ci-robot added a commit that referenced this pull request Apr 7, 2020
…8735-#8742-#8751-upstream-release-1.16

Automated cherry pick of #8697 #8735 #8742 #8751: Update kube-router to v0.4.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1.16 Cluster fails to become healthy using kuberouter networking
6 participants