Nomad thinks address of all other servers is 127.0.0.1 #1140

Closed · ghost opened this issue May 2, 2016 · 8 comments

ghost commented May 2, 2016

Nomad version

Nomad v0.3.2

Operating system and Environment details

Linux consul-master-eu-west-1 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) x86_64 GNU/Linux

Issue

I've upgraded Nomad from v0.3.1 to v0.3.2 and restarted each of the servers in turn. Nomad is now unable to elect a leader because it thinks all other servers are reachable on 127.0.0.1:

Name                            Address    Port  Status  Leader  Protocol  Build  Datacenter    Region
consul-master-eu-west-1.europe  127.0.0.1  4648  alive   false   2         0.3.2  europe-west1  europe
consul-master-eu-west-2.europe  127.0.0.1  4648  failed  false   2         0.3.2  europe-west1  europe
consul-master-eu-west-3.europe  127.0.0.1  4648  failed  false   2         0.3.2  europe-west1  europe

The configuration of each node has the following format (substituting the correct IP address):

{
  "advertise": {
    "rpc": "10.133.0.4:4647"
  },
  "bind_addr": "0.0.0.0",
  "client": {
    "enabled": false
  },
  "data_dir": "/var/nomad",
  "datacenter": "europe-west1",
  "region": "europe",
  "server": {
    "bootstrap_expect": 3,
    "enabled": true
  }
}

Nomad Server logs (if appropriate)

The same pattern of errors is visible in the logs of all server nodes:

May  2 10:06:32 consul-master-eu-west-1 nomad[24615]: 2016/05/02 10:06:32 [INFO] serf: attempting reconnect to consul-master-eu-west-3.europe 127.0.0.1:4648
May  2 10:07:02 consul-master-eu-west-1 nomad[24615]: 2016/05/02 10:07:02 [INFO] serf: attempting reconnect to consul-master-eu-west-2.europe 127.0.0.1:4648
May  2 10:07:32 consul-master-eu-west-1 nomad[24615]: 2016/05/02 10:07:32 [INFO] serf: attempting reconnect to consul-master-eu-west-2.europe 127.0.0.1:4648

ghost (Author) commented May 2, 2016

Rolling back to v0.3.1 fixes the issue.

Name                            Address     Port  Status  Protocol  Build  Datacenter    Region
consul-master-eu-west-1.europe  10.133.0.4  4648  alive   2         0.3.1  europe-west1  europe
consul-master-eu-west-2.europe  10.133.0.6  4648  alive   2         0.3.1  europe-west1  europe
consul-master-eu-west-3.europe  10.133.0.7  4648  alive   2         0.3.1  europe-west1  europe

dadgar (Contributor) commented May 4, 2016

Can you add the serf key to your advertise block?
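
In the JSON form used by the reporter's config above, that amounts to adding a serf entry alongside rpc in the advertise block. A minimal sketch, with the rest of the file unchanged (the IP and rpc line come from the config above; 4648 is the serf port shown in the server-members output and logs):

  "advertise": {
    "rpc": "10.133.0.4:4647",
    "serf": "10.133.0.4:4648"
  },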

igrayson commented May 22, 2016

I ran into this just now. Rolling back also fixed it for me.

Adding the serf key also fixes it on 0.3.2:

advertise {
  rpc  = "10.10.36.2:4647"
  http = "10.10.36.2:4648"
  serf = "10.10.36.2:4648"
}

I haven't gotten a 3-node leader election working yet, so I can't confirm a full fix.

igrayson commented

Election started working with the addition of the serf clause, after I wiped each node's data directory (sketched below, after the log excerpt).

Before doing so, this general pattern repeated:

...
    2016/05/22 06:30:05 [INFO] raft: Duplicate RequestVote for same term: 778
    2016/05/22 06:30:05 [WARN] raft: Duplicate RequestVote from candidate: 10.10.3.95:4647
    2016/05/22 06:30:05 [WARN] raft: Remote peer 127.0.0.1:4647 does not have local node 10.10.3.95:4647 as a peer
    2016/05/22 06:30:05 [DEBUG] raft: Vote granted from 127.0.0.1:4647. Tally: 2
    2016/05/22 06:30:05 [INFO] raft: Election won. Tally: 2
    2016/05/22 06:30:05 [INFO] raft: Node at 10.10.3.95:4647 [Leader] entering Leader state
    2016/05/22 06:30:05 [INFO] nomad: cluster leadership acquired
    2016/05/22 06:30:05 [WARN] raft: Clearing log suffix from 925 to 926
    2016/05/22 06:30:05 [INFO] raft: Node at 10.10.3.95:4647 [Follower] entering Follower state
    2016/05/22 06:30:05 [INFO] nomad: cluster leadership lost
    2016/05/22 06:30:05 [ERR] nomad: failed to wait for barrier: leadership lost while committing log
    2016/05/22 06:30:05 [INFO] raft: pipelining replication to peer 127.0.0.1:4647
    2016/05/22 06:30:05 [INFO] raft: aborting pipeline replication to peer 127.0.0.1:4647
    2016/05/22 06:30:05 [ERR] worker: failed to dequeue evaluation: rpc error: rpc error: rpc error: rpc error: < snipped hundreds of lines of this > rpc error: rpc error: rpc error: rpc error: rpc error: No cluster leader
...
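
For anyone retracing this, a rough sketch of the wipe-and-restart step described above, run on each server in turn. It assumes the /var/nomad data_dir from the original config and a systemd-managed agent; both are assumptions, so adjust the path and service commands to your setup:

sudo systemctl stop nomad     # assumption: agent runs under systemd; stop it however it is supervised
sudo rm -rf /var/nomad/*      # data_dir from the config above; clears the stale raft/serf state
sudo systemctl start nomad    # restart with the corrected advertise block in place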

dadgar (Contributor) commented May 24, 2016

Going to close this as you have to include the advertise address.

dadgar closed this as completed May 24, 2016
clumsy commented Oct 13, 2016

@dadgar Doesn't it look like a bug that you need to wipe the data dir clean after changing the advertise options? I spent quite a while figuring out the problem before I found this single helpful comment from @igrayson.

dadgar (Contributor) commented Oct 13, 2016

@clumsy Got a chuckle out of your handle and the situation. I think it needs to be a validation error so you can never even get into this state.

github-actions bot commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 19, 2022