Multi AZ on existing VPC not working with Calico #4466
I wonder if this has anything to do with source/destination checking in AWS... Reading the docs, it mentions a custom deployment to switch this off. Can you check whether it has been switched off, or share the output from that pod?
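For anyone checking this by hand, something along these lines should show and clear the attribute from the AWS CLI (the instance ID is a placeholder):

```sh
# Inspect the source/destination check on a node (instance ID is a placeholder).
aws ec2 describe-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --attribute sourceDestCheck

# Disable it manually; on Calico clusters this is normally handled for you by
# the k8s-ec2-srcdst controller discussed later in this thread.
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --no-source-dest-check
```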
I'm actually running into the same issue after upgrading to 1.8.1. I'm rolling back now to see if that fixes it. My src/dest check was on.
I also run kube2iam so it may be related... looking into it.

cc @ottoyiu
I was able to replicate this. The only other useful info I can provide is that I was rolling-updating masters from 1.8.0 to 1.8.1 when I noticed this. Based on timing, I think the master that...
Seems like k8s-ec2-srcdst can't access 100.64.0.1, the kubernetes apiserver. Is it being scheduled on one of the masters? If so, is kube-proxy working as expected? That said, this controller is due for a rewrite to something more modern, since it's written in a very old informer controller style...
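A rough way to answer both questions with kubectl (the kube-proxy label and the grep pattern below are assumptions; adjust to whatever your cluster actually uses):

```sh
# Which node is the srcdst controller scheduled on? (pod-name pattern is a guess)
kubectl -n kube-system get pods -o wide | grep srcdst

# Recent kube-proxy logs, assuming the usual k8s-app=kube-proxy label.
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=50

# From a master, confirm the apiserver answers on the cluster service IP.
curl -k https://100.64.0.1/healthz
```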
@ottoyiu yep, it's on the masters. Nothing odd being logged in kube-proxy, everything else seems fine. This doesn't seem to recover unless it is recreated, for me at least. Any way we can have the container fail when an error like that occurs, so kubernetes will restart it?
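One possible stopgap, sketched below, is a liveness probe that fails when the controller can't reach the apiserver, so the kubelet restarts it automatically. The probe command assumes the image ships a shell and an HTTPS-capable wget, which may well not be true for this image.

```yaml
# Sketch only: exec liveness probe for the k8s-ec2-srcdst container.
# Assumes /bin/sh and an HTTPS-capable wget exist in the image (unverified).
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - wget -q --no-check-certificate -O /dev/null https://100.64.0.1/healthz
  initialDelaySeconds: 30
  periodSeconds: 60
  failureThreshold: 3
```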
What version of kops are you running?
1.8.1

What cloud provider are you using?
AWS

What commands did you run? What happened?
Sorry, this isn't a very simple setup. I'm installing kops into existing VPCs using terraform output. I generate the initial config, then run `kops edit cluster` and alter the `subnet` section to reference my existing VPCs/subnets. Then I run `kops update cluster --out . --target=terraform mycluster.com`, and finally `terraform plan` and `terraform apply`.

The cluster was created, but containers were unable to network with each other across availability zones. This mostly manifested as DNS errors, because containers couldn't reach kube-dns running on other nodes. Nodes in the same AZ could talk to each other.
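Roughly, the whole sequence looks like the sketch below; the zones, VPC ID, and the exact `kops create cluster` flags are illustrative placeholders rather than my real values.

```sh
# Rough sketch of the workflow described above; zones, VPC ID, and cluster
# name are placeholders.
kops create cluster \
  --zones us-east-1a,us-east-1b,us-east-1c \
  --vpc vpc-0123456789abcdef0 \
  --networking calico \
  mycluster.com

# Point the subnet section at the pre-existing subnets, then render the
# Terraform and apply it.
kops edit cluster mycluster.com
kops update cluster --out . --target=terraform mycluster.com
terraform plan
terraform apply
```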
I redid the above process with weave networking and everything worked OK.

What did you expect to happen?
I expected containers across availability zones to be able to talk to each other.

I have replaced the cluster with one running weave to get unblocked. If it would be really helpful, I can kill my cluster and recreate a broken one.
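A quick way to check whether the DNS symptom above is present from inside the cluster (the image and pod name are just placeholders):

```sh
# Run a throwaway pod and try to resolve an in-cluster service name.
# When cross-AZ pod traffic is broken, this tends to time out on nodes whose
# AZ does not host a kube-dns replica.
kubectl run dnstest --rm -it --image=busybox --restart=Never -- nslookup kubernetes.default
```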