calico/node fails to start due to config race #1009

Closed
caseydavenport opened this issue Aug 9, 2017 · 2 comments · Fixed by #1070

@caseydavenport
Member

$ kubectl logs calico-node-jgp8b -n kube-system -c calico-node -p
Checking datastore connection
Datastore connection verified
ERROR: Unable to set global default configuration: resource already exists: GlobalBGPConfig(name=LogLevel)
Terminating
Calico node failed to start

Expected Behavior

calico/node should be resilient to the resource already existing, and carry on starting.

Current Behavior

calico/node fails on its first start attempt, then restarts successfully.

Possible Solution

Steps to Reproduce (for bugs)

Start a lot of calico/node instances at once and watch for some of them to restart.

Context

This was on GCE, 1k nodes starting at once. Looks like only a few hit this issue and started up just fine afterwards.

@bcreane
Contributor

bcreane commented Aug 21, 2017

See #1038 for very useful (and quick to run) instructions for reproducing this problem with just a three node cluster.

@caseydavenport
Member Author

caseydavenport commented Sep 1, 2017

I noticed another race on my latest large scale test run:

$ k logs calico-node-r143m -n kube-system -c calico-node -p
Checking datastore connection
Datastore connection verified
ERROR: Unable to set node resource configuration: update conflict: 'Node(name=gke-casey-2k-test-default-pool-e6a5c621-479x)'
Terminating
time="2017-09-01T17:22:15Z" level=error msg="Failed to apply object: too many retries" Key=Node(name=gke-casey-2k-test-default-pool-e6a5c621-479x)
Calico node failed to start

I think this is Calico fighting with something else in this case, so it's probably a separate issue.

I'll raise another.

EDIT: https://github.com/projectcalico/libcalico-go/issues/505


3 participants