Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

goCQL driver accepts wrong case datacenter which disables lwt partition routing optimization #201

Closed
pdbossman opened this issue Jun 29, 2024 · 14 comments
Assignees

Comments

@pdbossman
Copy link

Please answer these questions before submitting your issue. Thanks!

ScyllaDB 2024.1.4
goCQL v1.14.1

When data center in schema is lower case and goCQL is passed an uppercase dc, lwt routing optimization is lost.
Eg.
cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("DC1"))

I recommend you set it as this:
cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("dc1"))

goCQL should error with incorrect data center name.

cc: @hopugop can you attach the reproducer please.

@roydahan
Copy link
Collaborator

@mykaul what's "high" here?
To issue an ERROR message when the DC name isn't correct?

@roydahan
Copy link
Collaborator

@sylwiaszunejko let's start by understanding what other drivers do and why gocql swallows it or fallback to this kind of behavior (could be that someone asked for it).

@hopugop
Copy link

hopugop commented Jun 30, 2024

For my tests I created a 3-node cluster in a DC named datacenter1.
Here's the reproducer:

package main

import (
	"context"
	"fmt"
	"log"
	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("localhost:9042")

        // Correct datacenter name, should work as expected - routes LWT to the primary replica.
	//cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("datacenter1"))

        // Incorrect case, will fallback to route LWTs to all replicas.
	cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("DaTaCeNtEr1"))

	session, err := cluster.CreateSession()
	defer session.Close()

	ctx := context.Background()
	session.Query("create keyspace example with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }").WithContext(ctx).Exec()
	session.Query("create table example.my_lwt_table(pk int, ck int, value text, PRIMARY KEY(pk,ck))").WithContext(ctx).Exec()

	for i := 1; i <= 10000; i++ { // 10k is enough to see impact on metrics over several seconds
		m := make(map[string]interface{})
                // Send all writes to partition "1", but different CKs.
		applied, err := session.Query("INSERT INTO example.my_lwt_table (pk, ck, value) VALUES (?, ?, ?) IF NOT EXISTS", 1, i, "a").WithContext(ctx).MapScanCAS(m)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(i, applied, m)
	}
}

@sylwiaszunejko
Copy link
Collaborator

@pdbossman do I understand correctly that you want the driver to check id the dc passed to the policy exists in the cluster? I checked in other drivers (python-driver, rust driver, java-driver 4.x) and I can't find any checking like that. I guess every driver assumes that the user will pass the dc in the correct case.

@roydahan
Copy link
Collaborator

roydahan commented Jul 1, 2024

In the original issue, someone claimed that with Python driver it immediately fails with an error.

@sylwiaszunejko
Copy link
Collaborator

@roydahan Could you provide error message? I cannot find this part of the code

@roydahan
Copy link
Collaborator

roydahan commented Jul 1, 2024

No one provided the error message.
Best would be to try quickly reproduce it with Python driver and see it.

@sylwiaszunejko
Copy link
Collaborator

OK, I see. In python-driver it will fail, but not because the dc name provided is wrong, but because in python-driver there is an additional field in the DCAwareRoundRobinPolicy named used_hosts_per_remote_dc and it is 0 by default, so if there is no hosts in the local dc (it is because the name is wrong) the driver cannot contact any node so the request will fail. But if we use different number e.g. TokenAwarePolicy(DCAwareRoundRobinPolicy("WRONG_DC", used_hosts_per_remote_dc=3)) everything will work as in gocql.

In java-driver 4.x there is also similar mechanism as in python-driver, there is a field usedHostsPerRemoteDc so if it is set to 0, request will fail but not because of the wrong dc.

In rust driver there is permit_dc_failover option in the DefaultPolicy. If it is set to true driver can use replicas in dcs other than local. By default it is false.

So rather than checking if the dc is correct we need some kind of control over the use of remote dcs.

@pdbossman
Copy link
Author

The behavior that was most undesirable was the loss of lwt optimized routing. This resulted in extremely high contention and it was very difficult to identify the cause. In addition, monitoring was displaying the dc as upper case data center, while the data center was defined as lower case in the schema.

@mykaul
Copy link

mykaul commented Jul 3, 2024

@sylwiaszunejko - the regretful default behavior should be not to connect to the DC, if the DC name is different than entered. Let's make sure this is the case.

@sylwiaszunejko
Copy link
Collaborator

@sylwiaszunejko - the regretful default behavior should be not to connect to the DC, if the DC name is different than entered. Let's make sure this is the case.

After your comments here #202 I thought that you agree that by default the driver should use remote hosts if there is no host available in the DC provided, now I am confused to be honest.

Or do you mean we should add checking if the DC name provided is present in the cluster even though there is no such check in any other driver?

@mykaul
Copy link

mykaul commented Jul 3, 2024

@sylwiaszunejko - the regretful default behavior should be not to connect to the DC, if the DC name is different than entered. Let's make sure this is the case.

After your comments here #202 I thought that you agree that by default the driver should use remote hosts if there is no host available in the DC provided, now I am confused to be honest.

Or do you mean we should add checking if the DC name provided is present in the cluster even though there is no such check in any other driver?

Indeed, the latter. It's a configuration error that should be caught immediately and fixed by the user.

Other drivers might need fixes as well.

@roydahan
Copy link
Collaborator

roydahan commented Jul 3, 2024

Even if we choose to do this enhancement of checking the DC name, it should be a new RFE, cross drivers that should start a discussion and identify possible corners we don't think of.

I think that the current PR should be merged as is, without changing the default and possibly breaking availability for current users.

@Lorak-mmk
Copy link

I think this was fixed by #206

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants