calico/node fails to start due to config race #1009

caseydavenport · 2017-08-09T18:09:46Z

$ kubectl logs calico-node-jgp8b -n kube-system -c calico-node -p
Checking datastore connection
Datastore connection verified
ERROR: Unable to set global default configuration: resource already exists: GlobalBGPConfig(name=LogLevel)
Terminating
Calico node failed to start

Expected Behavior

calico/node should be resilient to the resource already existing, and carry on starting.

Current Behavior

calico/node fails to start, then starts successfully after being restarted.

Possible Solution

Steps to Reproduce (for bugs)

Start many calico/node instances at once and observe that some of them restart.

Context

This was on GCE, with 1k nodes starting at once. Only a few nodes appear to have hit this issue, and they started up fine afterwards.


bcreane · 2017-08-21T22:07:55Z

See #1038 for very useful (and quick to run) instructions for reproducing this problem with just a three node cluster.

caseydavenport · 2017-09-01T19:51:46Z

I noticed another race on my latest large scale test run:

$ k logs calico-node-r143m -n kube-system -c calico-node -p
Checking datastore connection
Datastore connection verified
ERROR: Unable to set node resource configuration: update conflict: 'Node(name=gke-casey-2k-test-default-pool-e6a5c621-479x)'
Terminating
time="2017-09-01T17:22:15Z" level=error msg="Failed to apply object: too many retries" Key=Node(name=gke-casey-2k-test-default-pool-e6a5c621-479x)
Calico node failed to start

I think this is Calico fighting with something else in this case, so it's probably a separate issue.

I'll raise another.

EDIT: https://github.com/projectcalico/libcalico-go/issues/505

caseydavenport added the priority/P1 label Aug 9, 2017

caseydavenport added this to the Calico v2.6.0 milestone Aug 9, 2017

bcreane mentioned this issue Aug 21, 2017 in "calico-node restarts on master after apply -f calico.yaml" #1038 (closed)

heschlie self-assigned this Aug 30, 2017

heschlie mentioned this issue Aug 31, 2017 in "Resolving race condition with Global configs in KDD" #1070 (merged)

caseydavenport closed this as completed in #1070 Sep 13, 2017
