cluster connect/reconnect timeout #7131

Open
ptorrent opened this issue Nov 14, 2023 · 1 comment

ptorrent commented Nov 14, 2023

Hello there,

Last night one of my servers lost its internet connection for about 2 hours. I have the feeling that, after a while, RethinkDB stops trying to reconnect to the other servers.

As there is very little log output, it's hard to know what the server is trying to do.

This is the RethinkDB output when this happens:

INTERNET LOST ! (rethinkdb is restarted by us)

2023-11-14 01:08:14.875 starting => -d /rethinkdb_data --cache-size 4096 --canonical-address 27.0.0.1 --canonical-address 127.0.250.2 --join 127.0.250.1:29015 --join 127.0.250.6:29015 --join 127.0.250.9:29015 --join 127.0.250.10:29015 --join 127.0.250.12:29015 --server-name YYYY --http-port 30620 --bind-http 26.0.0.1 --bind-cluster 127.0.0.1 --bind-driver 127.0.0.1
2023-11-14 01:08:14.931 stderr WARNING: ignoring --server-name because this server already has a name.
2023-11-14 01:08:14.982 stdout Running rethinkdb 2.4.2 (GCC 10.2.1)...
2023-11-14 01:08:14.984 stdout Running on Linux 5.10.0-26-amd64 x86_64
2023-11-14 01:08:14.985 stdout Loading data from directory /rethinkdb_data
2023-11-14 01:08:15.047 stdout Listening for intracluster connections on port 29015

INTERNET READY

2023-11-14 03:54:58.563 stdout Connected to server "XXXX" ae839fc7-0a92-4c80-8863-cfecdf5b3e40
2023-11-14 03:54:59.232 stdout Listening for client driver connections on port 28015
2023-11-14 03:54:59.232 stdout Listening for administrative HTTP connections on port 30620
2023-11-14 03:54:59.232 stdout Listening on cluster addresses: 127.0.0.1, ::1
2023-11-14 03:54:59.232 stdout Listening on driver addresses: 127.0.0.1, ::1
2023-11-14 03:54:59.233 stdout Listening on http addresses: 26.0.0.1, 127.0.0.1, ::1
2023-11-14 03:54:59.233 stdout To fully expose RethinkDB on the network, bind to all addresses by running rethinkdb with the `--bind all` command line option.
2023-11-14 03:54:59.234 stdout Server ready, "YYYY" 2581ed80-cd44-4113-a55e-20c8dfb2d08a
2023-11-14 03:54:59.448 stdout Connected to server "ZZZZ" 82fb7ce4-ad40-41b0-a1fd-0a327fe65f62
2023-11-14 03:54:59.479 stdout Connected to proxy proxy-204e4c67-e93f-402c-b341-4cc1267b5898
2023-11-14 03:54:59.899 stdout Connected to proxy proxy-ff104d35-5312-40d0-b832-1bafd82ca248
2023-11-14 03:55:00.129 stdout Connected to proxy proxy-e813331b-41d9-48d6-807b-1a1585d6678f
2023-11-14 03:55:04.675 stderr warn: We were unable to connect to the following peers, or the --join address does not match the peer's canonical address: 127.0.250.6:29015, 127.0.250.9:29015, 127.0.250.10:29015

NOT RECONNECTING UNTIL

7:30 am (the time I restarted rethinkdb)

The process restarted at 01:08 am because we automatically restart it when the cluster has been disconnected for too long. There was then no internet connection for about 2 hours. At 03:54 the server seems to have its internet connection back. All proxies are able to connect, and the server now seems to be connected to 2 other cluster servers (ZZZZ and XXXX). But what about the other servers? 3 cluster servers are still missing.
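For anyone who wants to detect this state automatically, here is a minimal sketch that pulls the missing peers out of the warning line shown in the log above. The function name `unreachable_peers` is hypothetical, and the matched wording is taken only from this one 2.4.2 log, so it may not match other versions.

```python
import re

# Wording copied from the warning in the log above (RethinkDB 2.4.2).
# Assumption: other versions may phrase this message differently.
WARN_MARKER = "We were unable to connect to the following peers"

# IPv4 address followed by ":port", e.g. 127.0.250.6:29015
PEER_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def unreachable_peers(log_line):
    """Return the (host, port) pairs listed in an 'unable to connect' warning.

    Lines that do not contain the warning yield an empty list.
    """
    if WARN_MARKER not in log_line:
        return []
    # Only scan the text after the marker, so the timestamp is never considered.
    tail = log_line.split(WARN_MARKER, 1)[1]
    return [(host, int(port)) for host, port in PEER_RE.findall(tail)]
```

Feeding it the 03:55:04 warning line from the log gives the three peers that never came back, which an external watchdog could then act on.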

This is our rethinkdb network:

3x proxy
6x cluster server

To make it work again, I have to restart RethinkDB.

Any ideas?

Thanks for your support.

ptorrent changed the title from "cluster-reconnect-timeout" to "cluster connect/reconnect timeout" on Nov 14, 2023

ptorrent commented Nov 21, 2023

Another point: if the cluster servers are not available when RethinkDB starts (in proxy mode), RethinkDB never tries to reconnect.

stderr warn: We were unable to connect to the following peers, or the --join address does not match the peer's canonical address: 127.0.250.6:29015, 127.0.250.9:29015, 127.0.250.10:29015

RethinkDB will never connect to 127.0.250.6:29015, 127.0.250.9:29015, 127.0.250.10:29015 again.

I think it's a big issue. I did a lot of tests with TCP proxies, and I have the feeling that a proxy must have connected at least once to be able to reconnect.
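To separate "the network path is still down" from "RethinkDB is no longer retrying", a plain TCP check against the `--join` addresses can help. This is only a rough sketch using the Python standard library. The names `peer_reachable` and `probe_all` are hypothetical, and a successful TCP connect only shows that the port accepts connections, not that the cluster handshake would succeed.

```python
import socket


def peer_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def probe_all(peers, timeout=3.0):
    """Map each 'host:port' string to whether the peer accepted a connection."""
    return {
        f"{host}:{port}": peer_reachable(host, port, timeout)
        for host, port in peers
    }
```

For example, `probe_all([("127.0.250.6", 29015), ("127.0.250.9", 29015), ("127.0.250.10", 29015)])` run on the affected server would show whether the three peers from the warning are reachable at the TCP level while RethinkDB still reports them as missing.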
