DNS not resolved in '-raft-adv-addr' after leader goes down #695

Closed

adrianchifor opened this issue Nov 26, 2020 · 18 comments · Fixed by #993

Comments

adrianchifor commented Nov 26, 2020

I've managed to create an rqlite (5.5.0) cluster on Kubernetes (as a StatefulSet) and it comes up perfectly fine. It can even handle followers going down and it re-registers them with the new pod IPs, which is great.

However, when all the nodes are killed at once, the leader comes back and tries to contact the old nodes at IPs that no longer exist (pod IPs change on restart), and the cluster gets stuck. I was expecting the leader to resolve DNS again for the other Raft nodes and try to re-establish the cluster, but it looks like DNS is only resolved the first time, or when followers re-join. -raft-adv-addr should probably be resolved on demand, and the IP not saved to the Raft DB.

Leader errors after killing all nodes and getting new IPs:

2020-11-26T09:05:49.321Z [INFO]  raft: Node at [::]:4002 [Candidate] entering Candidate state in term 29043
2020-11-26T09:05:49.322Z [ERROR] raft: Failed to make RequestVote RPC to {Voter node1 100.72.3.247:4002}: dial tcp 100.72.3.247:4002: connect: no route to host
2020-11-26T09:05:49.322Z [ERROR] raft: Failed to make RequestVote RPC to {Voter node1 100.72.3.247:4002}: dial tcp 100.72.3.247:4002: connect: no route to host
2020-11-26T09:05:49.323Z [ERROR] raft: Failed to make RequestVote RPC to {Voter node2 100.74.1.150:4002}: dial tcp 100.74.1.150:4002: connect: no route to host
2020-11-26T09:05:49.323Z [ERROR] raft: Failed to make RequestVote RPC to {Voter node2 100.74.1.150:4002}: dial tcp 100.74.1.150:4002: connect: no route to host
2020-11-26T09:05:50.589Z [WARN]  raft: Election timeout reached, restarting election

node0

rqlited \
  -node-id=node0 \
  -http-addr=0.0.0.0:4001 \
  -raft-addr=0.0.0.0:4002 \
  -http-adv-addr=rqlite-0.rqlite:4001 \
  -raft-adv-addr=rqlite-0.rqlite:4002 \
  /data

node1 and node2

rqlited \
  -node-id=node1 \
  -http-addr=0.0.0.0:4001 \
  -raft-addr=0.0.0.0:4002 \
  -http-adv-addr=rqlite-1.rqlite:4001 \
  -raft-adv-addr=rqlite-1.rqlite:4002 \
  -join=http://rqlite-0.rqlite:4001 \
  /data

rqlited \
  -node-id=node2 \
  -http-addr=0.0.0.0:4001 \
  -raft-addr=0.0.0.0:4002 \
  -http-adv-addr=rqlite-2.rqlite:4001 \
  -raft-adv-addr=rqlite-2.rqlite:4002 \
  -join=http://rqlite-0.rqlite:4001 \
  /data

/status?pretty

...
"leader": {
    "addr": "[::]:4002",
    "node_id": "node0"
},
"metadata": {
    "node0": {
        "api_addr": "rqlite-0.rqlite:4001",
        "api_proto": "http"
    },
    "node1": {
        "api_addr": "rqlite-1.rqlite:4001",
        "api_proto": "http"
    },
    "node2": {
        "api_addr": "rqlite-2.rqlite:4001",
        "api_proto": "http"
    }
},
"node_id": "node0",
"nodes": [
    {
        "id": "node0",
        "addr": "[::]:4002"
    },
    {
        "id": "node1",
        "addr": "100.72.3.247:4002"  <--- Should be DNS
    },
    {
        "id": "node2",
        "addr": "100.74.1.150:4002"  <--- Should be DNS
    }
],
...

Is this expected behavior, or am I just using the flags wrong? Any advice would be much appreciated!

adrianchifor (Author) commented:

Found this workaround https://github.com/techyugadi/kubestash/blob/master/stateful/rqlite/rqlitests.yml#L80

Followers remove themselves from the Raft nodes list before dying, but this fails if the leader is down or not responding.

It also feels like a hack to paper over the underlying problem: the -raft-adv-addr DNS name is never re-resolved.
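For illustration, here is a Go sketch of the idea behind that preStop hook; the /remove endpoint, node ID, and leader URL are assumptions for illustration, not taken from the manifest:

package main

import (
    "bytes"
    "fmt"
    "net/http"
)

func main() {
    // Before the pod dies, ask the cluster to drop this node's ID from
    // the Raft configuration. "node1" and the URL are placeholders.
    body := bytes.NewBufferString(`{"id": "node1"}`)
    req, err := http.NewRequest(http.MethodDelete, "http://rqlite-0.rqlite:4001/remove", body)
    if err != nil {
        panic(err)
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        // The weakness noted above: if the leader is down or not
        // responding, the removal never happens.
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}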

otoolep commented Nov 26, 2020

So this is a Hashicorp Raft thing, not an rqlite thing. For some reason this is the way it's always worked -- it doesn't keep hostnames at the Raft layer, but keeps resolved IP addresses.

What you are indicating with "Should be DNS" is coming from Hashicorp code.

I'm not entirely sure why it works like this, but it always has. The code in question is what powers Hashicorp Consul, which is a well-established piece of software. Perhaps some research on how Consul handles this might be the answer? Presumably whatever is the right way to handle nodes coming back up with different IP addresses in Consul can be applied to rqlite.

otoolep commented Nov 26, 2020

Here is the rqlite code that creates that output you reference:

https://github.com/rqlite/rqlite/blob/v5.6.0/store/store.go#L394

Note the call to GetConfiguration(). At no point does rqlite resolve hostnames and pass the resultant IP addresses to the Raft layer. It is the Hashicorp layer doing that.
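For illustration, a minimal sketch (not rqlite's actual code) of reading that configuration back through the hashicorp/raft API:

package example

import (
    "fmt"

    "github.com/hashicorp/raft"
)

// printMembership dumps cluster membership exactly as the Raft layer
// stores it; r is an already-running *raft.Raft instance.
func printMembership(r *raft.Raft) error {
    future := r.GetConfiguration()
    if err := future.Error(); err != nil {
        return err
    }
    for _, srv := range future.Configuration().Servers {
        // srv.Address is whatever ServerAddress the Raft layer recorded;
        // rqlite reports it verbatim in /status, so an IP here means the
        // Raft layer stored an IP, not a hostname.
        fmt.Println(srv.ID, srv.Address)
    }
    return nil
}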

otoolep commented Nov 26, 2020

I'll double-check my work, just to be sure. It's been a while since I looked at the networking layer of rqlite.

otoolep commented Nov 26, 2020

Well, well, I forgot how my own code works:

https://github.com/rqlite/rqlite/blob/master/cluster/join.go#L58

The rqlite layer does resolve addresses before sending the details to the node it is joining. However, I still believe Hashicorp Raft takes this address, resolves it, and stores the result in its internal config.
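Roughly the kind of up-front resolution that join code performs, as a standalone sketch (a paraphrase, not the actual rqlite source):

package main

import (
    "fmt"
    "net"
)

func main() {
    // The advertised Raft address, as passed via -raft-adv-addr
    // (the hostname here is illustrative).
    advAddr := "rqlite-1.rqlite:4002"

    // Resolve the hostname once, up front...
    tcpAddr, err := net.ResolveTCPAddr("tcp", advAddr)
    if err != nil {
        panic(err)
    }

    // ...and it is this IP:port form (e.g. "100.72.3.247:4002") that
    // is sent on in the join request -- the hostname is lost.
    fmt.Println(tcpAddr.String())
}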

otoolep commented Nov 26, 2020

Specific source code in v5.6.0: https://github.com/rqlite/rqlite/blob/v5.6.0/cluster/join.go#L58

otoolep commented Nov 26, 2020

Trying out removing the resolution in this PR: #697

otoolep commented Nov 26, 2020

I have removed the resolve operation. However, the original statement I made about the Hashicorp Raft layer still applies. rqlite now calls this function:

https://godoc.org/github.com/hashicorp/raft#Raft.AddVoter

with whatever is the advertised address for the joining node (hostname or IP -- hostname in your case). See:

f = s.raft.AddVoter(raft.ServerID(id), raft.ServerAddress(addr), 0, 0)

I might be misinterpreting what I'm seeing; I'll continue looking into this.

adrianchifor (Author) commented:

Wow, that was quick -- thanks so much for looking into it! I had a suspicion it was the Raft lib usage and was about to dig deeper, so I'm happy to see it's sorted.

I assume this will go into v5.6.1? I'll test it tomorrow morning if you can publish the Docker image.

otoolep commented Nov 27, 2020

I can release a new version, but I'm not convinced you'll see anything different. When I test with this change in place, it makes no difference. The Raft layer is still using IP addresses. That is why I need to look into it more, and see if there is still something I'm doing wrong.

In the meantime you might like to do some research and see how the community works with Hashicorp Consul, since it is built on the same Raft library.

https://www.consul.io/docs/agent/options.html

adrianchifor (Author) commented:

I have Consul running in the same cluster, so I'll check its configuration and report back if I find a resolution. It would still be worth testing with the new version.

otoolep commented Nov 28, 2020

Thanks @adrianchifor -- that would be great. You're working in an area I don't know a huge amount about (rqlite and k8s), so any guidance you can provide to make rqlite work better there would be much appreciated.

otoolep commented Nov 29, 2020

I looked into the Hashicorp Raft code, and when it comes to the leader it deals in network addresses, not hostnames. You can see this here:

Creation of Raft node: https://github.com/hashicorp/raft/blob/v1.2.0/api.go#L445
Where a node sets its own "Server Address": https://github.com/hashicorp/raft/blob/v1.2.0/api.go#L489

This second link is a call into the Go networking library, specifically the Addr() method of:

https://golang.org/pkg/net/#Listener

This returns a network address, not a hostname.

So the latest change I put in place will "fix" it for followers, but not for the leader. I'm not sure why the Raft library works like this.
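A tiny example of the Listener behaviour in question:

package main

import (
    "fmt"
    "net"
)

func main() {
    // Listen on a hostname, then ask the listener for its own address.
    ln, err := net.Listen("tcp", "localhost:4002")
    if err != nil {
        panic(err)
    }
    defer ln.Close()

    // Prints a resolved address such as "127.0.0.1:4002", never
    // "localhost:4002" -- Addr() returns the network address, which is
    // what Raft ends up storing for the local node.
    fmt.Println(ln.Addr().String())
}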

otoolep commented Nov 29, 2020

Here is the status output with my latest changes in place (on master):

        "nodes": [
            {
                "id": "localhost:4002",
                "addr": "127.0.0.1:4002"
            },
            {
                "id": "localhost:4004",
                "addr": "localhost:4004"
            }
        ],

As you can see, node 2's addr is set to the hostname, but the addr for the first node (the leader, which was brought up first) is a network address.

otoolep commented Nov 29, 2020

One clean way to deal with this, assuming the nodes know where the leader is when they come up, is to attempt an explicit re-join using the same node ID but the new IP address. When the leader sees a join request with a known ID but a new IP address, it removes that node first and then re-adds it -- all as part of the Join operation. You can see that logic here:

https://github.com/rqlite/rqlite/blob/v5.6.0/store/store.go#L647

This means you don't need to worry about whether the leader is up when the nodes die -- you can clean up when you re-join (and if the leader isn't up when you attempt to re-join, your join attempt is moot anyway).
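Sketched against the hashicorp/raft API (a paraphrase of the linked store.go logic, not a quote):

package example

import (
    "github.com/hashicorp/raft"
)

// rejoin sketches the leader-side handling of a join request from a
// node whose ID is already known but whose address has changed:
// remove the stale entry, then re-add the node at its new address.
func rejoin(r *raft.Raft, id, newAddr string) error {
    // In the linked store.go this removal only happens when the stored
    // address differs from the one in the incoming join request.
    if err := r.RemoveServer(raft.ServerID(id), 0, 0).Error(); err != nil {
        return err
    }
    return r.AddVoter(raft.ServerID(id), raft.ServerAddress(newAddr), 0, 0).Error()
}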

FWIW, it's safe to tell a node to join a cluster it's already a member of.

Might that help?

otoolep commented Nov 29, 2020

I'm going to revert the previous changes for now, as folks may be relying on the current behaviour, and the change isn't doing what we hope. I feel like I'm missing something here, and there is a different way to address the issue you are seeing.

otoolep commented Feb 5, 2022

Fixed in 7.3.0.
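For reference, hashicorp/raft does expose a hook for exactly this kind of on-demand lookup: a transport built with a ServerAddressProvider asks for a node's address each time it dials, instead of trusting the address stored in the Raft configuration. A minimal sketch of that hook (an illustration of the mechanism, not necessarily how #993 implemented the fix; the ID-to-DNS mapping is hypothetical):

package example

import (
    "net"
    "time"

    "github.com/hashicorp/raft"
)

// dnsProvider maps Raft node IDs to stable DNS names, so the transport
// re-resolves a node's hostname on every dial.
type dnsProvider struct {
    names map[raft.ServerID]raft.ServerAddress
}

func (p dnsProvider) ServerAddr(id raft.ServerID) (raft.ServerAddress, error) {
    // Returning a hostname here means a pod that comes back with a new
    // IP is found again via DNS, regardless of what Raft has stored.
    return p.names[id], nil
}

func newTransport(bind string) (*raft.NetworkTransport, error) {
    advertise, err := net.ResolveTCPAddr("tcp", bind)
    if err != nil {
        return nil, err
    }
    provider := dnsProvider{names: map[raft.ServerID]raft.ServerAddress{
        "node0": "rqlite-0.rqlite:4002", // hypothetical StatefulSet DNS names
        "node1": "rqlite-1.rqlite:4002",
        "node2": "rqlite-2.rqlite:4002",
    }}
    return raft.NewTCPTransportWithConfig(bind, advertise, &raft.NetworkTransportConfig{
        ServerAddressProvider: provider,
        MaxPool:               3,
        Timeout:               10 * time.Second,
    })
}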
