Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HA dqlite seems to require the first server to be always UP #1391

Open
eleaner opened this issue Feb 6, 2020 · 24 comments
Open

HA dqlite seems to require the first server to be always UP #1391

eleaner opened this issue Feb 6, 2020 · 24 comments

Comments

@eleaner
Copy link

@eleaner eleaner commented Feb 6, 2020

Version:
k3s version v1.17.2+k3s1 (cdab19b)

Describe the bug
three nodes ha cluster

kubectl get node
NAME   STATUS   ROLES    AGE   VERSION
cbo1   Ready    master   38m   v1.17.2+k3s1
cbo3   Ready    master   35m   v1.17.2+k3s1
cbo2   Ready    master   37m   v1.17.2+k3s1

cbo1 is the first server
cbo2 and cbo3 joined to him

when either cbo2 or cbo3 are down - cluster is up

get node
NAME   STATUS     ROLES    AGE   VERSION
cbo1   Ready      master   42m   v1.17.2+k3s1
cbo2   Ready      master   41m   v1.17.2+k3s1
cbo3   NotReady   master   38m   v1.17.2+k3s1

however when cbo1 is down

kubectl get node
Error from server: rpc error: code = Unknown desc = failed to create dqlite connection: no available dqlite leader server found

To Reproduce
cbo1
/usr/local/bin/k3s server --cluster-init --tls-san cbo1.ip --flannel-backend ipsec
cbo2 and cbo3
/usr/local/bin/k3s server --server https://cbo1.ip:6443 --flannel-backend ipsec

Expected behavior
kubectl get node

NAME   STATUS   ROLES    AGE   VERSION
cbo1   NotReady    master   38m   v1.17.2+k3s1
cbo3   Ready    master   35m   v1.17.2+k3s1
cbo2   Ready    master   37m   v1.17.2+k3s1

Actual behavior

kubectl get node
Error from server: rpc error: code = Unknown desc = failed to create dqlite connection: no available dqlite leader server found

Additional context

@brandond
Copy link
Contributor

@brandond brandond commented Feb 6, 2020

Is your k3s.yaml pointed at just the one node, or does it point at an address or hostname that will remain available when the first node is down? You might have multiple masters but it doesn't do you much good if you can't talk to any of them but the first one.

@eleaner
Copy link
Author

@eleaner eleaner commented Feb 6, 2020

thanks, @brandond. I didn't think it is worth to mention

k3s.yaml points to all three servers
technically there re three A records in the DNS with 1s TTL
but I also addressed each one of the active servers directly - the result is the same

however, if I address the first server that is completely down I get
Unable to connect to the server: dial tcp cbo1.ip:6443: connect: operation timed out
and if I just stop k3s
The connection to the server cbo1.ip:6443 was refused - did you specify the right host or port?
that's as expected

@brandond
Copy link
Contributor

@brandond brandond commented Feb 7, 2020

What do the logs say?

I built a 3 node dqlite cluster on Ubuntu in VirtualBox just a day or two ago so I know it should work, mostly. CPU usage seemed to be unusual high so I went back to single master.

@eleaner
Copy link
Author

@eleaner eleaner commented Feb 7, 2020

I never said the cluster is not working
I only said that it cannot find dqlite leader when the first server is down

the most interesting thing in the logs seems to be
ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
I can see it every second, even if the cluster is up

Feb 07 07:52:43 cbo2 k3s[24261]: E0207 07:52:43.717128   24261 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.30.222:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.30.222:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Feb 07 07:52:43 cbo2 k3s[24261]: E0207 07:52:43.778638   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:43 cbo2 k3s[24261]: I0207 07:52:43.778679   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:44 cbo2 k3s[24261]: E0207 07:52:44.647233   24261 controller.go:114] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error trying to reach service: 'dial tcp 10.43.30.222:443: i/o timeout', Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
Feb 07 07:52:44 cbo2 k3s[24261]: I0207 07:52:44.647272   24261 controller.go:127] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
Feb 07 07:52:44 cbo2 k3s[24261]: E0207 07:52:44.779024   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:44 cbo2 k3s[24261]: I0207 07:52:44.779067   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:45 cbo2 k3s[24261]: E0207 07:52:45.779385   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:45 cbo2 k3s[24261]: I0207 07:52:45.779430   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:46 cbo2 k3s[24261]: E0207 07:52:46.779827   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:46 cbo2 k3s[24261]: I0207 07:52:46.779877   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:47 cbo2 k3s[24261]: E0207 07:52:47.780244   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:47 cbo2 k3s[24261]: I0207 07:52:47.780289   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:48 cbo2 k3s[24261]: E0207 07:52:48.722943   24261 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.30.222:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.30.222:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Feb 07 07:52:48 cbo2 k3s[24261]: E0207 07:52:48.780575   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:48 cbo2 k3s[24261]: I0207 07:52:48.780622   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:49 cbo2 k3s[24261]: E0207 07:52:49.780820   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:49 cbo2 k3s[24261]: I0207 07:52:49.780852   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:50 cbo2 k3s[24261]: E0207 07:52:50.781207   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:52:50 cbo2 k3s[24261]: I0207 07:52:50.781250   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:52:51 cbo2 k3s[24261]: I0207 07:52:51.353469   24261 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io

when the first server is down it adds information on failing dqlite

Feb 07 07:58:10 cbo2 k3s[24261]: time="2020-02-07T07:58:10.550290916Z" level=error msg="error while range on /registry/storageclasses /registry/storageclasses: failed to create dqlite connection: no available dqlite leader server found"
Feb 07 07:58:10 cbo2 k3s[24261]: time="2020-02-07T07:58:10.812393865Z" level=info msg="Connecting to proxy" url="wss://167.86.89.20:6443/v1-k3s/connect"
Feb 07 07:58:10 cbo2 k3s[24261]: time="2020-02-07T07:58:10.813253780Z" level=error msg="Failed to connect to proxy" error="dial tcp 167.86.89.20:6443: connect: connection refused"
Feb 07 07:58:10 cbo2 k3s[24261]: time="2020-02-07T07:58:10.813287562Z" level=error msg="Remotedialer proxy error" error="dial tcp 167.86.89.20:6443: connect: connection refused"
Feb 07 07:58:11 cbo2 k3s[24261]: E0207 07:58:11.108462   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:58:11 cbo2 k3s[24261]: I0207 07:58:11.108506   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:58:12 cbo2 k3s[24261]: E0207 07:58:12.108723   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:58:12 cbo2 k3s[24261]: I0207 07:58:12.108766   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:58:12 cbo2 k3s[24261]: time="2020-02-07T07:58:12.333153888Z" level=error msg="error while range on /registry/leases /registry/leases: failed to create dqlite connection: no available dqlite leader server found"
Feb 07 07:58:13 cbo2 k3s[24261]: time="2020-02-07T07:58:13.085579967Z" level=error msg="error while range on /registry/clusterrolebindings /registry/clusterrolebindings: failed to create dqlite connection: no available dqlite leader server found"
Feb 07 07:58:13 cbo2 k3s[24261]: E0207 07:58:13.108975   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:58:13 cbo2 k3s[24261]: I0207 07:58:13.109016   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:58:14 cbo2 k3s[24261]: E0207 07:58:14.109329   24261 handle_charon.go:84] ClientConnection failed: dial unix /var/run/charon.vici: connect: connection refused
Feb 07 07:58:14 cbo2 k3s[24261]: I0207 07:58:14.109392   24261 handle_charon.go:87] Retrying in a second ...
Feb 07 07:58:14 cbo2 k3s[24261]: time="2020-02-07T07:58:14.399545671Z" level=error msg="error while range on /registry/csinodes /registry/csinodes: failed to create dqlite connection: no available dqlite leader server found"
Feb 07 07:58:14 cbo2 k3s[24261]: time="2020-02-07T07:58:14.482484472Z" level=error msg="error while range on /registry/minions /registry/minions: failed to create dqlite connection: no available dqlite leader server found"

@eleaner
Copy link
Author

@eleaner eleaner commented Feb 7, 2020

oh, and now it seems I have a different message returned by kubectl

kubectl get node
Error from server (ServiceUnavailable): the server is currently unable to handle the request

@brandond
Copy link
Contributor

@brandond brandond commented Feb 8, 2020

How long did you wait after adding the other servers before shutting down the first master? Is it possible that it hadn't finished replicating all the data and reaching quorum? Unfortunately the various k3s controllers don't export any metrics or write many non-error logs so it's quite hard to figure out what's going on, but I wonder if dqlite wasn't ready to run without the master when you shut it down.

@mikklfr
Copy link

@mikklfr mikklfr commented Feb 9, 2020

Hey ! I'm also having some issues by stress testings the nodes in HA embedded with 3 masters and 2 workers.
There is one open PR that seems to be related: #1377

@philkry
Copy link

@philkry philkry commented Feb 25, 2020

I am facing the exact same issue. Kinda defies the purpose of HA entirely as the first node becomes single point of failure.
I cannot add much more info than @eleaner already did. my logs look pretty much the same😕

@Kampe
Copy link

@Kampe Kampe commented Feb 27, 2020

I see the same issues with my HA cluster, raft gets really upset when you start knocking out master nodes, especially if one happened to be the leader at the time.

@castironclay
Copy link

@castironclay castironclay commented Mar 2, 2020

I am also experiencing the same problem on a 2 master ARM cluster. I even have a floating IP using heartbeat so the advertised IP of the cluster has moved over to Master2. I attempted to stop/start k3s on Master2 but it hangs and never successfully resumes. After starting the service back up on Master1 I can start the service on Master2.

@wilmardo
Copy link

@wilmardo wilmardo commented Mar 19, 2020

@castironclay 2 master nodes does not have a quorum, please see:
#1407

@txomon
Copy link

@txomon txomon commented Mar 22, 2020

I have hit this bug in my k3s setup in DO. I have 2 nodes in EU and 1 in US. I started the cluster with the node in US, and then added the other two nodes.

After stopping k3s service in US node, cluster became unresponsive, and although after restarting the service a few times workloads have shifted to the other 2 nodes, it seems that dqlite is still looking for the US node as a leader, and is incapable of handing over the leader role.

Is there any information you are currently lacking?

@fabiodcorreia
Copy link

@fabiodcorreia fabiodcorreia commented Apr 15, 2020

I have something similar with the latest version but exactly the same. I start 3 node master with vagrant. First start node1 wait to boot then node2 and 3.

When I shutdown node1 the nodes 2 and 3 keep working but soon as I shutdown node 2 then on node 3 I get this error every time I call kubectl. "failed to create dqlite connection: no available dqlite leader server found"

If I only have 2 nodes for example node 1 and 2 and I shutdown node 1 the same happens

@brontide
Copy link

@brontide brontide commented Apr 26, 2020

Ran into this as well, take down the initial node, the remaining masters are unable to determine a leader. I feel like this might be a problem with the documentation as the quickstart for dqlite is different from the rest of the multi-master configs because there is no fixed registration address, just instructions to point other nodes to the initial node by IP address. The remaining server processes are unable to mesh because they all point to one master and can't "figure it out".

@philkry
Copy link

@philkry philkry commented Apr 26, 2020

@brontide I also noticed the different documentation so i setup a virtual IP with keepalived and pointed the individual nodes to that IP upon cluster creation, however the behaviour stayed the same, even though the IP switched to another node when i took the first node offline. the nodes fail to find a leader..

@brontide
Copy link

@brontide brontide commented Apr 26, 2020

@philkry yeah, nodes would not even come up. Pretty clear that the bootstrapping of the dqlite cHA cluster is lacking. I've gone back to single master for now just to get off the ground.

@srgvg
Copy link
Contributor

@srgvg srgvg commented Apr 27, 2020

@brontide @philkry I can confirm, I set up three master nodes, with a loadbalancer in front of them, used as virtual ip to join them. Problem persists then, too.

@stellirin
Copy link
Contributor

@stellirin stellirin commented Apr 27, 2020

I tried working around this by chaining the master nodes (master3 -> master2 -> master1) and it failed completely, dqlite was continuously restarting on each node shortly after I connected the third node.

Actually I was hoping to make a ring of 5 master nodes this way, but it was already broken with three nodes so I gave up.

@brontide
Copy link

@brontide brontide commented Apr 27, 2020

I'm hoping that they eventually fix the underlying bootstrapping issue in the meantime I've switched back to single master and looking to backup the db via script in case I ever need to rebuild or replace the master. Seems like they have a good idea but the execution is still lacking.

@remkolems
Copy link

@remkolems remkolems commented May 10, 2020

Update 11 May 2020: still checking out if HA is indeed true by analyzing log files and testing.

Update: unsure if fully solved with v1.18.2+k3s1

  • Starting, restarting and stopping of k3s services seems much more robust across the master nodes.
  • Command used to install v1.18.2 -> sudo curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=latest INSTALL_K3S_EXEC="--flannel-backend=wireguard" sh -

I also have the same issue (/w v1.17.4+k3s1).

Question:
What is or should be the proper k3s.yaml config in a three node master HA with DQLite? Can't seem to find the proper location for documentation.

My k3s.yaml on all three master nodes (they are identical):

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <removed>
    server: https://127.0.0.1:6443
  name: default
contexts:
- context:
    cluster: default
    user: default
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: default
  user:
    password: <removed>
    username: admin

Isn't the --server part the problem? All my other subsequent master nodes starts/executes with:
ExecStart=/usr/local/bin/k3s server --flannel-backend=wireguard --server=https://k3s-mn-01.biz:6443 --token=<removed>

This is according to https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/

My first master node starts with:
ExecStart=/usr/local/bin/k3s server --flannel-backend=wireguard

@jamesorlakin
Copy link

@jamesorlakin jamesorlakin commented May 10, 2020

I can confirm a new cluster running v1.18.2+k3s1 on 3 Pi 4s exhibits this issue. I spun this up using k3sup but all the behaviours seem the same

Like @remkolems I'm not sure how later nodes should be joined in terms of configuration. The init script starts k3s with --server https://192.168.0.103:6443 (the first/initial node with --cluster-init). Does that mean it only ever looks for this node?

I would guess not, as /var/lib/rancher/k3s/server/db/state.dqlite/peers.db contains all the other peers, and logs talk about connecting WebSockets to other peers but nothing seems immediately dqlite-specific otherwise (in terms of reporting status etc).

(Also at the time of writing my node 2 appears to be whining about not finding node 3, all after rebooting node 1 - level=warning msg="no known leader address=192.168.0.105:6443 attempt=0". A simple find in the codebase shows this is from dqlite itself. kubectl get nodes reports everything as good though.)

@remkolems
Copy link

@remkolems remkolems commented May 10, 2020

I can confirm a new cluster running v1.18.2+k3s1 on 3 Pi 4s exhibits this issue. I spun this up using k3sup but all the behaviours seem the same

Like @remkolems I'm not sure how later nodes should be joined in terms of configuration. The init script starts k3s with --server https://192.168.0.103:6443 (the first/initial node with --cluster-init). Does that mean it only ever looks for this node?

I would guess not, as /var/lib/rancher/k3s/server/db/state.dqlite/peers.db contains all the other peers, and logs talk about connecting WebSockets to other peers but nothing seems immediately dqlite-specific otherwise (in terms of reporting status etc).

(Also at the time of writing my node 2 appears to be whining about not finding node 3, all after rebooting node 1 - level=warning msg="no known leader address=192.168.0.105:6443 attempt=0". A simple find in the codebase shows this is from dqlite itself. kubectl get nodes reports everything as good though.)

Just tested by restarting my first node, during restart/after successful restart of node (and services) used the following command multiple times on all other master nodes:

sudo kubectl get nodes -o wide

I still get the expected results (Ready/NotReady/Ready). Although, I didn't dive into the log files yet.
At what log files should we look at precisely to validate k3s?

image

Even after restarting my second node, I get the above result.

System specification
UP Squared ATOM Quad Core (memory 4GB / eMMC 32GB)
image

@philkry
Copy link

@philkry philkry commented May 11, 2020

behaviour changed for me with v1.18.2+k3s1 as well, but still no HA.
when i restart the first node, the cluster reassembles successfully after the node is back up (was not the case with previous version), but during the restart the control plane was not available.

during the restart logs are flooded with

May 11 10:06:25 node3 k3s[21207]: time="2020-05-11T10:06:25.160331591+02:00" level=warning msg="server unavailable err=failed to establish network connection: tls dial: dial tcp 192.168.64.3:6443: connect: connection refused address=192.168.64.3:6443 attempt=20"

followed by

May 11 10:06:26 node3 k3s[21207]: I0511 10:06:26.557062   21207 trace.go:116] Trace[725908169]: "Create" url:/api/v1/namespaces,user-agent:k3s/v1.18.2+k3s1 (linux/amd64) kubernetes/698e444,client:127.0.0.1 (started: 2020-05-11 10:05:52.547149949 +0200 CEST m=+135.705775041) (total time: 34.009764336s):
May 11 10:06:26 node3 k3s[21207]: Trace[725908169]: [34.009764336s] [34.009468004s] END
May 11 10:06:26 node3 k3s[21207]: E0511 10:06:26.559413   21207 controller.go:207] unable to create required kubernetes system namespace kube-system: Timeout: request did not complete within requested timeout 34s
May 11 10:06:27 node3 k3s[21207]: time="2020-05-11T10:06:27.869245125+02:00" level=error msg="Failed to connect to proxy" error="websocket: bad handshake"
May 11 10:06:27 node3 k3s[21207]: time="2020-05-11T10:06:27.871508631+02:00" level=error msg="Remotedialer proxy error" error="websocket: bad handshake"
May 11 10:06:27 node3 k3s[21207]: time="2020-05-11T10:06:27.881083327+02:00" level=warning msg="no known leader address=192.168.64.3:6443 attempt=49"
May 11 10:06:27 node3 k3s[21207]: time="2020-05-11T10:06:27.881965120+02:00" level=warning msg="no known leader address=192.168.64.3:6443 attempt=68"

...

May 11 10:06:29 node3 k3s[21207]: I0511 10:06:29.169976   21207 trace.go:116] Trace[513043459]: "List etcd3" key:/minions,resourceVersion:,limit:500,continue: (started: 2020-05-11 10:05:29.165133706 +0200 CEST m=+112.323763994) (total time: 1m0.004767816s):
May 11 10:06:29 node3 k3s[21207]: Trace[513043459]: [1m0.004767816s] [1m0.004767816s] END
May 11 10:06:29 node3 k3s[21207]: E0511 10:06:29.175960   21207 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
May 11 10:06:29 node3 k3s[21207]: E0511 10:06:29.177627   21207 writers.go:105] apiserver was unable to write a JSON response: http: Handler timeout
May 11 10:06:29 node3 k3s[21207]: E0511 10:06:29.178796   21207 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}
May 11 10:06:29 node3 k3s[21207]: E0511 10:06:29.179879   21207 writers.go:118] apiserver was unable to write a fallback JSON response: http: Handler timeout
May 11 10:06:29 node3 k3s[21207]: I0511 10:06:29.182012   21207 trace.go:116] Trace[16562761]: "List" url:/api/v1/nodes,user-agent:kubectl/v1.18.2+k3s1 (linux/amd64) kubernetes/698e444,client:127.0.0.1 (started: 2020-05-11 10:05:29.164887063 +0200 CEST m=+112.323517550) (total time: 1m0.016441022s):
May 11 10:06:29 node3 k3s[21207]: Trace[16562761]: [1m0.016441022s] [1m0.016235552s] END
May 11 10:06:29 node3 k3s[21207]: time="2020-05-11T10:06:29.188705508+02:00" level=warning msg="no known leader address=192.168.64.3:6443 attempt=41"
May 11 10:06:29 node3 k3s[21207]: time="2020-05-11T10:06:29.293854107+02:00" level=warning msg="no known leader address=192.168.64.3:6443 attempt=39"

until the first node is back up, i cannot connect on any of the remaining master nodes:

ubuntu@node2:~$ sudo kubectl get nodes
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)

@cubic3d
Copy link

@cubic3d cubic3d commented May 18, 2020

I assume #1770 meant to replace dqlite by the embedded version of etcd - so I don't think any further fixes will be made to dqlite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development
To Triage
Development

No branches or pull requests