
Created a 1.8 cluster in the sa-east-1 region with networking=kube-router and the cluster does not work #3986

Closed
felipejfc opened this issue Dec 2, 2017 · 18 comments
Labels: area/cni, area/networking, lifecycle/rotten
Milestone: 1.10

Comments

@felipejfc
Contributor

kops release 1.8.0-beta.1

kops create cluster --node-count 1 --master-zones sa-east-1c --zones sa-east-1c,sa-east-1a --node-size c4.2xlarge --master-size m4.large --networking kube-router --cloud-labels "Cost\ Center=Backend,Owner=Backend,Environment=Production,Application=kubernetes" --network-cidr="172.23.0.0/16" --name mykube.com --ssh-public-key ~/.ssh/mykey.pub --kubernetes-version 1.8.4
  • Master node is not ready
  • No other node showing
$ kubectl get nodes
NAME                                          STATUS     AGE       VERSION
ip-172-23-91-116.sa-east-1.compute.internal   NotReady   19m       v1.8.4
$ kubectl describe node ip-172-23-91-116.sa-east-1.compute.internal 
...
KubeletNotReady 		runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
...
$ kubectl logs -n kube-system -f kube-router-pnqm2
I1202 02:51:11.931274       1 network_policy_controller.go:106] Starting network policy controller
I1202 02:51:11.931433       1 network_policy_controller.go:118] Performing periodic syn of the iptables to reflect network policies
I1202 02:51:11.932800       1 network_policy_controller.go:298] Iptables chains in the filter table are synchronized with the network policies.
I1202 02:51:11.936479       1 network_policy_controller.go:189] sync iptables took 5.037163ms
E1202 02:51:11.943658       1 network_routes_controller.go:82] Failed to get pod CIDR from CNI conf file: Failed to load CNI conf file: error reading /etc/cni/net.d/10-kuberouter.conf: open /etc/cni/net.d/10-kuberouter.conf: no such file or directory
E1202 02:51:11.946053       1 network_routes_controller.go:95] Failed to insert pod CIDR into CNI conf file: Failed to load CNI conf file: open /etc/cni/net.d/10-kuberouter.conf: no such file or directory
I1202 02:51:11.946067       1 network_routes_controller.go:99] Populating ipsets.
...
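
For context, the two errors above mean the CNI config file that kube-router's install container normally writes on each node is missing. A minimal sketch of roughly what that file should contain, based on kube-router's documented bridge CNI config (contents are illustrative, not taken from this cluster):

# hypothetical hand-written conf for debugging only; the real file is managed by the kube-router daemonset
$ cat <<'EOF' | sudo tee /etc/cni/net.d/10-kuberouter.conf
{
  "name": "kubernetes",
  "type": "bridge",
  "bridge": "kube-bridge",
  "isDefaultGateway": true,
  "ipam": {
    "type": "host-local"
  }
}
EOF

kube-router then injects the node's pod CIDR into that file at runtime, which is what the second error above is failing to do.
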
@felipejfc
Contributor Author

The same setup using flannel instead of kube-router works.

@justinsb justinsb added this to the 1.8.1 milestone Dec 2, 2017
@chrislovecnm
Contributor

/assign @murali-reddy

@k8s-ci-robot
Contributor

@chrislovecnm: GitHub didn't allow me to assign the following users: murali-reddy.

Note that only kubernetes members can be assigned.

In response to this:

/assign @murali-reddy

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chrislovecnm
Contributor

/cc @murali-reddy

/area cni

@murali-reddy
Contributor

@felipejfc please update the daemonset definition to the latest manifest.

There was a fix to support init containers that works with 1.8.

I think the latest kops release should have the fix.
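
If it helps to narrow things down, a quick way to check whether the daemonset currently running on the cluster already has the spec-level init container that writes that CNI conf (a sketch; the daemonset name is taken from the pod names above, and the jsonpath is plain kubectl, nothing kops-specific):

# list init container names on the kube-router pod template (empty output = no init container)
$ kubectl -n kube-system get daemonset kube-router \
    -o jsonpath='{.spec.template.spec.initContainers[*].name}'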

@chrislovecnm
Contributor

Can someone try kops beta 2?

@chrislovecnm
Contributor

Or master

@khelll

khelll commented Dec 5, 2017

I applied the daemonset definition as @murali-reddy instructed and things worked out.
Basically, and to be on the safe side, I pulled the definition from this file into a new local file, then applied it:

kubectl apply -f definition.yml -n kube-system

I had to do this even after the new kops 1.8 release.

These are the logs now:

$ kubectl describe daemonset kube-router -n kube-system

#.....
Events:
  Type     Reason            Age   From        Message
  ----     ------            ----  ----        -------
  Warning  FailedCreate      49m   daemon-set  Error creating: Pod "kube-router-f5mvh" is invalid: spec.initContainers[0].volumeMounts[0].name: Not found: "cni"
  Warning  FailedCreate      32m   daemon-set  Error creating: Pod "kube-router-kd2sf" is invalid: spec.initContainers[0].volumeMounts[0].name: Not found: "cni"
  Warning  FailedCreate      15m   daemon-set  Error creating: Pod "kube-router-v58bm" is invalid: spec.initContainers[0].volumeMounts[0].name: Not found: "cni"
  Normal   SuccessfulCreate  12m   daemon-set  Created pod: kube-router-6wn9w
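
The FailedCreate events above typically mean the init container's volumeMount references a volume name ("cni") that is not declared under spec.template.spec.volumes, i.e. the applied definition updated the init containers but not the volumes list. A quick check (sketch, assuming the daemonset is still named kube-router):

# list declared volume names on the pod template; "cni" should appear here
$ kubectl -n kube-system get daemonset kube-router \
    -o jsonpath='{.spec.template.spec.volumes[*].name}'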

@murali-reddy
Contributor

Thanks @khelll. That's strange. I will take a look.

@murali-reddy
Contributor

murali-reddy commented Dec 14, 2017

@khelll just tested with the latest kops

kops version
Version 1.8.0 (git-5099bc5)

and with 1.8.4

Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

I don't see any issue.

$ kubectl get pods -n kube-system -l k8s-app=kube-router -o wide
NAME                READY     STATUS    RESTARTS   AGE       IP              NODE
kube-router-vrtpj   1/1       Running   0          7m        172.20.46.147   ip-172-20-46-147.us-west-2.compute.internal
kube-router-z5fm2   1/1       Running   0          6m        172.20.51.139   ip-172-20-51-139.us-west-2.compute.internal
kube-router-z6w2z   1/1       Running   0          6m        172.20.90.208   ip-172-20-90-208.us-west-2.compute.internal

@khelll

khelll commented Dec 14, 2017

If I recall correctly, this happened because I upgraded my kops to 1.8 and then upgraded the cluster via a rolling update.

However, when I create a new cluster, I don't get that error.
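
For reference, the upgrade path being described is roughly the standard kops flow (sketch only; the cluster name variable is a placeholder, and the exact flags used on this cluster are not shown in the thread):

$ kops upgrade cluster $CLUSTER_NAME --yes         # bump the cluster spec to the new version
$ kops update cluster $CLUSTER_NAME --yes          # apply the changes to AWS
$ kops rolling-update cluster $CLUSTER_NAME --yes  # replace nodes so they pick up the new manifests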

@chrislovecnm
Contributor

@khelll upgrade should work as well. Can you provide more details?

@ghost

ghost commented Dec 15, 2017

I ran into this issue when upgrading as well. It only happened on a multi-master upgrade; our single-master staging environment didn't have the same issue.

I'm just guessing here, but what I assume happened is that the legacy annotation-format init-container setup took precedence over the new syntax. I noticed that my daemonset had all three formats: the alpha and beta annotations as well as the spec-defined initContainers.

If I manually tried to update only the spec in the daemonset, I got the error about volumes. On the other hand, if I updated all three variants, everything started working again.

Sadly I didn't try simply deleting the annotations and updating the spec-defined initContainers, but in my mind this is related to the legacy definitions.
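
A minimal sketch of how one might inspect for the legacy init-container annotations described above and strip them by hand (the annotation keys shown are the standard pre-1.8 pod.alpha/pod.beta ones; verify against what your object actually carries before deleting anything):

# show any init-container annotations still present on the pod template
$ kubectl -n kube-system get daemonset kube-router \
    -o jsonpath='{.spec.template.metadata.annotations}'
# then remove the pod.alpha.kubernetes.io/init-containers and
# pod.beta.kubernetes.io/init-containers keys by hand, leaving spec.initContainers in place
$ kubectl -n kube-system edit daemonset kube-router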

@murali-reddy
Contributor

@chrislovecnm is there any prescribed way to do a CNI upgrade? I see your comment in #3620:

We will need bootstrapbuilder.go version bumped as well

I have not addressed that comment. Does it have anything to do with this issue?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2018
@justinsb justinsb modified the milestones: 1.9.0, 1.10 May 26, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 25, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@BobbyJohansen

@murali-reddy I am having this issue when creating new instance groups with kubernetes 1.8.5 (kube-router) and kops 1.8.0. Any suggestions?
