Session hosts are not being updated #1668

Closed
sseidman opened this issue Jan 6, 2023 · 1 comment · Fixed by #1669 or DataDog/gocql#1


sseidman commented Jan 6, 2023

What version of Cassandra are you using?

3.11.13 deployed on cloud infrastructure

What version of Gocql are you using?

1.3.1

What version of Go are you using?

go version go1.19.2 darwin/arm64

What did you do?

Created a simple application that opens 500 concurrent gocql sessions, all connected to the same Cassandra cluster of 3 nodes, each node running on its own cloud instance. The sessions connect to the cluster via a service address rather than the nodes' actual IPs. Each session continuously executes read queries against the cluster. All Cassandra nodes are then replaced one at a time by new cloud instances, resulting in the same 3-node cluster but with new IPs and host IDs.

What did you expect to see?

The expectation is that the gocql sessions receive Topology Change Events and update the session pool accordingly: adding the IP/Host ID of each new node and removing the IP/Host ID of each node that is removed. As Cassandra nodes are replaced, we don't expect clients to lose connectivity, because they maintain an updated view of the cluster ring.
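To make the expected behavior concrete, here is a minimal sketch of a host pool that tracks the ring by applying add/remove events. It is a toy model only, not gocql's implementation: `topologyEvent`, `hostPool`, the host IDs, and the IP addresses are all hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// eventKind models the two topology changes discussed in this issue.
type eventKind int

const (
	newNode eventKind = iota
	removedNode
)

// topologyEvent is a hypothetical, simplified event. Events in a real
// driver carry more detail than this.
type topologyEvent struct {
	kind   eventKind
	hostID string
	addr   string
}

// hostPool is a toy stand-in for a session's view of the ring,
// keyed by host ID.
type hostPool struct {
	hosts map[string]string // host ID -> IP address
}

func newHostPool() *hostPool {
	return &hostPool{hosts: make(map[string]string)}
}

// apply adds or removes a host according to the event kind.
func (p *hostPool) apply(ev topologyEvent) {
	switch ev.kind {
	case newNode:
		p.hosts[ev.hostID] = ev.addr
	case removedNode:
		delete(p.hosts, ev.hostID)
	}
}

// addrs returns the known IP addresses in sorted order.
func (p *hostPool) addrs() []string {
	out := make([]string, 0, len(p.hosts))
	for _, a := range p.hosts {
		out = append(out, a)
	}
	sort.Strings(out)
	return out
}

func main() {
	pool := newHostPool()
	oldAddrs := []string{"192.0.2.1", "192.0.2.2", "192.0.2.3"}
	newAddrs := []string{"198.51.100.1", "198.51.100.2", "198.51.100.3"}

	// Initial 3-node cluster.
	for i, addr := range oldAddrs {
		pool.apply(topologyEvent{kind: newNode, hostID: fmt.Sprintf("old-%d", i), addr: addr})
	}

	// Replace the nodes one at a time: each replacement removes an old
	// host and adds a new one with a new IP and host ID.
	for i, addr := range newAddrs {
		pool.apply(topologyEvent{kind: removedNode, hostID: fmt.Sprintf("old-%d", i)})
		pool.apply(topologyEvent{kind: newNode, hostID: fmt.Sprintf("new-%d", i), addr: addr})
	}

	// After the rolling replacement only the new addresses should remain.
	fmt.Println(pool.addrs())
}
```

If a pool like this stops receiving events partway through the replacement, it is left holding only addresses of nodes that no longer exist, which matches the symptom reported below.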

What did you see instead?

By the time the last Cassandra node was replaced, no client session could complete a query; every query returned the following error: gocql: no hosts available in the pool. Clients also logged the following errors:

2023/01/06 00:24:42 gocql: unable to dial control conn {IP_1}:9042: dial tcp {IP_1}:9042: connect: connection refused
2023/01/06 00:24:42 gocql: unable to dial control conn {IP_2}:9042: dial tcp {IP_2}:9042: connect: connection refused
2023/01/06 00:24:42 gocql: control unable to register events: dial tcp {IP_2} connect: connection refused

The IP addresses in the above logs were the IP addresses of the original Cassandra nodes before replacement and not the current values for the nodes.

I tested the same application against various versions/commits of gocql and believe that this error was introduced by the commit merged in #1632. The same application run with versions of gocql before that commit keeps its session pools updated and maintains connectivity as Cassandra nodes are replaced.

@mr-andreas

I think that this is the same problem that I found and documented in #1582 (comment) and #1582 (comment). I also ended up identifying 64cda7b as the commit that introduced the problem.

The bug is consistently reproducible by following the steps in the first comment I linked above :)
