Add ability to connect to nodes through proxy #97

Merged: 3 commits into scylladb:master on Jun 30, 2022

Conversation

zimnx (Collaborator) commented Jun 6, 2022

Services deployed in the cloud are often hidden behind a proxy which
dispatches connections based on the Server Name Indication (SNI)
taken from the TLS Client Hello packet.
A new method of creating ClusterConfig, NewCloudCluster, allows connecting
to nodes behind an SNI proxy based on a provided configuration file.

Because each datacenter may have a different TLS configuration (CA, proxy
address, etc.), a more granular method of configuring connection details was
needed. CloudCluster uses a special HostDialer which connects to nodes using
information taken from HostInfo (datacenter, host_id) to go through the SNI
proxy.

Currently the driver identifies nodes based on their broadcasted IP
address. In the cloud case, broadcasted IP addresses are private and are not
meant to be used as contact points, and they may change over time.
Hence the driver internals were changed to identify nodes based on their
host_id, which is unique per node and persistent throughout the entire
node lifecycle.
Because CQL Events still use broadcasted IP addresses, the driver
keeps a mapping between already known IP addresses and host_ids.
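
To make the SNI idea in the description concrete, here is a minimal sketch using only the Go standard library. It is not the dialer from this PR: the helper names `sniName` and `dialViaProxy`, the `<host_id>.<node domain>` server-name layout, and the example values are all assumptions made for illustration.

```go
package main

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net"
	"time"
)

// sniName builds the server name placed in the TLS Client Hello.
// The "<host_id>.<node domain>" layout is an assumption of this sketch.
func sniName(hostID, nodeDomain string) string {
	return hostID + "." + nodeDomain
}

// dialViaProxy connects to the proxy address and uses SNI to tell the
// proxy which node the connection is meant for.
func dialViaProxy(ctx context.Context, proxyAddr, hostID, nodeDomain string, roots *x509.CertPool) (net.Conn, error) {
	d := &tls.Dialer{
		NetDialer: &net.Dialer{Timeout: 5 * time.Second},
		Config: &tls.Config{
			ServerName: sniName(hostID, nodeDomain), // sent as SNI and used for certificate verification
			RootCAs:    roots,                       // per-datacenter CA pool
		},
	}
	return d.DialContext(ctx, "tcp", proxyAddr)
}

func main() {
	// Hypothetical values; no connection is made here.
	fmt.Println(sniName("a1b2c3d4-0000-0000-0000-000000000001", "cql.example.com"))
}
```

The point of the sketch is that the TCP address always belongs to the proxy, while the node identity travels in `tls.Config.ServerName`, which is why per-datacenter TLS settings are needed.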

cloud_config.go Outdated
ProxyURL string `yaml:"proxyUrl,omitempty"`
}

type Context struct {

Please add Cloud prefix to everything defined in the cloud context or move the code to cloud package for namespacing.

Collaborator Author

moved to cloud pkg

Collaborator Author

removed pkg dir, cloud is now in root dir

cloud_config.go Outdated
func (cc *CloudConnectionConfig) GetDatacenterCAPool(datacenterName string) (*x509.CertPool, error) {
	dc, ok := cc.Datacenters[datacenterName]
	if !ok {
		return nil, fmt.Errorf("datacenter %s not found in cloud connection config", datacenterName)

Use %q everywhere; if you want to be super nice, you can list the available DCs.

Collaborator Author

done
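
A sketch of what the review suggestion could look like: quote the name with `%q` and list the known datacenters in a stable order. The helper name `unknownDatacenterError` and the datacenter names are hypothetical, and this is not necessarily the wording that ended up in the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// unknownDatacenterError reports a missing datacenter and lists the known ones.
func unknownDatacenterError(name string, known map[string]struct{}) error {
	names := make([]string, 0, len(known))
	for n := range known {
		names = append(names, fmt.Sprintf("%q", n))
	}
	sort.Strings(names) // map iteration order is random, so sort for stable messages
	return fmt.Errorf("datacenter %q not found in cloud connection config, available datacenters: [%s]",
		name, strings.Join(names, ", "))
}

func main() {
	known := map[string]struct{}{"us-east-1": {}, "eu-west-1": {}}
	fmt.Println(unknownDatacenterError("us-west-2", known))
}
```

Using `%q` makes empty or whitespace-only names visible in the message, which `%s` would hide.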

mmatczuk commented Jun 7, 2022

In a library there is no need to add pkg

// Servers may infer this from the endpoint the client submits requests to.
// In CamelCase.
// +optional
Kind string `yaml:"kind,omitempty"`

Why do we need it? I find no usages.

Collaborator Author

Both Kind and APIVersion will be used to distinguish the version of the configuration. It's not decided yet what we are going to call it (Kind), and which version (APIVersion) we are going to start with (v1alpha1, v1beta1, v1, etc.). Once we progress with the API, client code will be able to parse just these two fields to find out which version of the API is used and pass the right data structure for unmarshalling.
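
The two-step decoding described in this reply can be sketched as follows. To stay within the standard library the sketch uses `encoding/json` instead of the YAML decoder used in the PR; the type names, the `"v1"` version string and the `ExampleConfig` kind are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeMeta holds only the two fields needed to pick a schema version.
type typeMeta struct {
	Kind       string `json:"kind"`
	APIVersion string `json:"apiVersion"`
}

// configV1 is a hypothetical versioned schema used only in this sketch.
type configV1 struct {
	Kind           string `json:"kind"`
	APIVersion     string `json:"apiVersion"`
	CurrentContext string `json:"currentContext"`
}

// decodeConfig first reads kind/apiVersion, then decodes into the matching type.
func decodeConfig(data []byte) (interface{}, error) {
	var meta typeMeta
	if err := json.Unmarshal(data, &meta); err != nil {
		return nil, fmt.Errorf("can't decode type metadata: %w", err)
	}
	switch meta.APIVersion {
	case "v1":
		var c configV1
		if err := json.Unmarshal(data, &c); err != nil {
			return nil, fmt.Errorf("can't decode %q config: %w", meta.APIVersion, err)
		}
		return &c, nil
	default:
		return nil, fmt.Errorf("unsupported apiVersion %q for kind %q", meta.APIVersion, meta.Kind)
	}
}

func main() {
	cfg, err := decodeConfig([]byte(`{"kind":"ExampleConfig","apiVersion":"v1","currentContext":"default"}`))
	fmt.Println(cfg, err)
}
```

Adding a new schema version then means adding one more `case`, without touching callers that only care about the decoded result.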

// Servers should convert recognized schemas to the latest internal value, and
// may reject unrecognized values.
// +optional
APIVersion string `yaml:"apiVersion,omitempty"`

Likewise.

Comment on lines 26 to 28
Contexts map[string]*Context `yaml:"contexts"`
// CurrentContext is the name of the context that you would like to use by default.
CurrentContext string `yaml:"currentContext"`

What is the purpose of dynamically changing the context?
It is not even a use case; if you try changing it, there is a race.

Collaborator Author

@zimnx zimnx Jun 7, 2022

It's not meant to be dynamic. This is a representation of the configuration file taken from disk, which is read at the time the connection to the cluster is created, usually on application startup.


Gotcha.
See my suggestion on making this package capable of producing gocql.Cluster and being just a helper.

Comment on lines 49 to 52
Username string `yaml:"username,omitempty"`
// Password is the password for basic authentication to the Scylla cluster.
// +optional
Password string `yaml:"password,omitempty"`

Can we use the username and password from Authenticator?

Collaborator Author

You can overwrite it after calling NewCloudCluster. This is meant to set up a PasswordAuthenticator from configuration file without any code change.

Comment on lines 91 to 112
type Parameters struct {
	// DefaultConsistency is the default consistency level used for queries.
	// +optional
	DefaultConsistency ConsistencyString `yaml:"defaultConsistency,omitempty"`
}

type ConsistencyString string

// just AnyConsistency etc is better, but there's already SerialConsistency defined elsewhere.
const (
	DefaultAnyConsistency         ConsistencyString = "ANY"
	DefaultOneConsistency         ConsistencyString = "ONE"
	DefaultTwoConsistency         ConsistencyString = "TWO"
	DefaultThreeConsistency       ConsistencyString = "THREE"
	DefaultQuorumConsistency      ConsistencyString = "QUORUM"
	DefaultAllConsistency         ConsistencyString = "ALL"
	DefaultLocalQuorumConsistency ConsistencyString = "LOCAL_QUORUM"
	DefaultEachQuorumConsistency  ConsistencyString = "EACH_QUORUM"
	DefaultSerialConsistency      ConsistencyString = "SERIAL"
	DefaultLocalSerialConsistency ConsistencyString = "LOCAL_SERIAL"
	DefaultLocalOneConsistency    ConsistencyString = "LOCAL_ONE"
)

@mmatczuk mmatczuk Jun 7, 2022

Why not use the consistency settings in the driver?
Also, SerialConsistency and Consistency are distinct things.

Collaborator Author

The driver's consistency settings use uint16, whereas in the configuration file this option should be human-readable. It is unmarshalled to a uint16 Consistency here: https://github.com/zimnx/gocql/blob/e6c71c39c4fba48984118757290a2697cf1105dc/cluster.go#L315-L321

Yeah, now I see there are two fields for query consistency, so I guess the Config API should reflect that too.

Collaborator Author

Added second consistency to config and validation
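
As an illustration of the thread above, the following sketch maps the human-readable strings to uint16 codes and validates the serial consistency separately. The numeric codes are the ones defined by the CQL binary protocol; the type and function names are local to this sketch and are not the driver's API.

```go
package main

import "fmt"

// consistency mirrors a uint16 wire value; the type name is local to this sketch.
type consistency uint16

// consistencyCodes lists the codes defined by the CQL binary protocol.
var consistencyCodes = map[string]consistency{
	"ANY":          0x00,
	"ONE":          0x01,
	"TWO":          0x02,
	"THREE":        0x03,
	"QUORUM":       0x04,
	"ALL":          0x05,
	"LOCAL_QUORUM": 0x06,
	"EACH_QUORUM":  0x07,
	"SERIAL":       0x08,
	"LOCAL_SERIAL": 0x09,
	"LOCAL_ONE":    0x0A,
}

// parseConsistency accepts any known consistency name.
func parseConsistency(s string) (consistency, error) {
	c, ok := consistencyCodes[s]
	if !ok {
		return 0, fmt.Errorf("unknown consistency %q", s)
	}
	return c, nil
}

// parseSerialConsistency accepts only the two serial levels.
func parseSerialConsistency(s string) (consistency, error) {
	if s != "SERIAL" && s != "LOCAL_SERIAL" {
		return 0, fmt.Errorf("serial consistency must be SERIAL or LOCAL_SERIAL, got %q", s)
	}
	return consistencyCodes[s], nil
}

func main() {
	c, err := parseConsistency("LOCAL_QUORUM")
	fmt.Println(c, err)
	_, err = parseSerialConsistency("QUORUM")
	fmt.Println(err)
}
```

Keeping two parsers reflects the reviewer's point that the regular and the serial consistency are distinct settings with different sets of valid values.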

return caPool, nil
}

func (cc *ConnectionConfig) GetInitialContactPoints() []string {

There is a naming mismatch here and in Datacenter.Server

Collaborator Author

An initial contact point is something related to how drivers work: initially they contact a list of endpoints and discover peers from the system table. From the API point of view, it's just an address of a datacenter, hence the name mismatch.

ring.go Outdated
@@ -15,7 +14,7 @@ type ring struct {

 	// hosts are the set of all hosts in the cassandra ring that we know of
 	mu    sync.RWMutex
-	hosts map[string]*HostInfo
+	hosts map[UUID]*HostInfo

This change could go to a separate commit.

Collaborator Author

done

ring.go Outdated
Comment on lines 41 to 46
for _, hi := range r.hosts {
	if hi.connectAddress.String() == ip {
		return hi, true
	}
}
return nil, false

Most likely this code is in a driver hot loop; did you profile the changes?
Use something with O(1) runtime.


I see now it's in events only, so not a biggie I guess. Still, it would be nice to preserve the mapping from IP to ID and fall back to getHost.

Collaborator Author

fixed
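
One way to get the O(1) lookup requested in this thread is a secondary index from IP address to host_id, kept next to the primary host_id map. The sketch below is an assumption about the shape of such a structure, with hypothetical type and method names and plain strings in place of the driver's UUID and HostInfo types; it does not include the fallback to getHost mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

// hostInfo is a hypothetical stand-in for the driver's HostInfo.
type hostInfo struct {
	hostID string
	ip     string
}

// ring indexes hosts by host_id and keeps a secondary IP index for events.
type ring struct {
	mu       sync.RWMutex
	hosts    map[string]*hostInfo // host_id -> host
	ipToHost map[string]string    // broadcasted IP -> host_id
}

func newRing() *ring {
	return &ring{hosts: map[string]*hostInfo{}, ipToHost: map[string]string{}}
}

// addOrUpdate stores the host and drops the stale IP entry if its address changed.
func (r *ring) addOrUpdate(h *hostInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if old, ok := r.hosts[h.hostID]; ok && old.ip != h.ip && r.ipToHost[old.ip] == h.hostID {
		delete(r.ipToHost, old.ip)
	}
	r.hosts[h.hostID] = h
	r.ipToHost[h.ip] = h.hostID
}

// getHostByIP resolves an event's IP address with two map lookups.
func (r *ring) getHostByIP(ip string) (*hostInfo, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	id, ok := r.ipToHost[ip]
	if !ok {
		return nil, false
	}
	h, ok := r.hosts[id]
	return h, ok
}

func main() {
	r := newRing()
	r.addOrUpdate(&hostInfo{hostID: "id-1", ip: "10.0.0.1"})
	r.addOrUpdate(&hostInfo{hostID: "id-1", ip: "10.0.0.2"}) // same node, new IP
	_, okOld := r.getHostByIP("10.0.0.1")
	h, okNew := r.getHostByIP("10.0.0.2")
	fmt.Println(okOld, okNew, h.hostID)
}
```

Because the host_id stays the primary key, an IP change only touches the secondary index and the node keeps its identity.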

events.go Outdated
@@ -159,57 +159,57 @@ func (s *Session) handleNodeEvent(frames []frame) {
 		}
 	}

-	for _, f := range events {
+	for _, event := range events {

Can we keep the original name? It would be easier to maintain.

Collaborator Author

It's no longer a frame but a parsed event, so f has nothing in common with the variable's semantics. I stumbled upon it and fixed it to keep the code quality higher.


I understand that but it also modifies lines that would not be otherwise modified and thus makes it harder to review / maintain in the future.

Collaborator Author

brought it back

Feel free to submit the renaming in a separate commit/pull request though, we can update it both in gocql/gocql and scylladb/gocql.

events.go Outdated
Comment on lines 227 to 230
if ok && host.IsUp() {
	if host.IsUp() {
		return
	}

The inner if looks like dead code


Or a bug?

Collaborator Author

bug/leftover, fixed

// Get host info and apply any filters to the host
hostInfo, err := s.hostSource.getHostInfo(ip, port)

Let's not rename variables if we do not have to.

Collaborator Author

done

cluster.go Outdated
Comment on lines 257 to 289
connConf := &cloud.ConnectionConfig{}

bundleFile, err := os.Open(bundlePath)
if err != nil {
	return nil, fmt.Errorf("can't open bundle path: %w", err)
}
defer bundleFile.Close()

if err := yaml.NewDecoder(bundleFile).Decode(connConf); err != nil {
	return nil, fmt.Errorf("can't decode bundle file at %q: %w", bundlePath, err)
}

if _, ok := connConf.Contexts[connConf.CurrentContext]; !ok {
	return nil, fmt.Errorf("current context points to unknown context")
}

confContext := connConf.Contexts[connConf.CurrentContext]

if _, ok := connConf.AuthInfos[confContext.AuthInfoName]; !ok {
	return nil, fmt.Errorf("context %q auth info points to unknown authinfo", connConf.CurrentContext)
}

if _, ok := connConf.Datacenters[confContext.DatacenterName]; !ok {
	return nil, fmt.Errorf("context %q datacenter points to unknown datacenter", connConf.CurrentContext)
}

authInfo := connConf.AuthInfos[confContext.AuthInfoName]

caPool, err := connConf.GetRootCAPool()
if err != nil {
	return nil, fmt.Errorf("can't create root CA pool: %w", err)
}


Please move the parsing to cloud pkg.

Collaborator Author

I can't, due to an import cycle: I'm reusing default values from NewCluster here.

Collaborator Author

done
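
The validation in the excerpt above could be gathered into a single helper that lives next to the config types, which is roughly what moving the parsing to the cloud package asks for. The sketch below is only an illustration under that assumption: the types are simplified stand-ins that reuse the field names visible in the excerpt, and `resolveCurrent` is a hypothetical name.

```go
package main

import "fmt"

// Simplified stand-ins; field names follow the excerpt above.
type authInfo struct{ Username, Password string }
type datacenter struct{ Server string }
type contextRef struct{ AuthInfoName, DatacenterName string }

type connectionConfig struct {
	Datacenters    map[string]*datacenter
	AuthInfos      map[string]*authInfo
	Contexts       map[string]*contextRef
	CurrentContext string
}

// resolveCurrent checks that the current context and everything it refers
// to exist, looking each map entry up once.
func (c *connectionConfig) resolveCurrent() (*authInfo, *datacenter, error) {
	cur, ok := c.Contexts[c.CurrentContext]
	if !ok || cur == nil {
		return nil, nil, fmt.Errorf("current context %q not found", c.CurrentContext)
	}
	ai, ok := c.AuthInfos[cur.AuthInfoName]
	if !ok || ai == nil {
		return nil, nil, fmt.Errorf("context %q points to unknown authinfo %q", c.CurrentContext, cur.AuthInfoName)
	}
	dc, ok := c.Datacenters[cur.DatacenterName]
	if !ok || dc == nil {
		return nil, nil, fmt.Errorf("context %q points to unknown datacenter %q", c.CurrentContext, cur.DatacenterName)
	}
	return ai, dc, nil
}

func main() {
	cfg := &connectionConfig{
		Datacenters:    map[string]*datacenter{"dc1": {Server: "proxy.example.com:443"}},
		AuthInfos:      map[string]*authInfo{"admin": {Username: "user"}},
		Contexts:       map[string]*contextRef{"default": {AuthInfoName: "admin", DatacenterName: "dc1"}},
		CurrentContext: "default",
	}
	ai, dc, err := cfg.resolveCurrent()
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(ai.Username, dc.Server)
}
```

A helper like this depends only on the config types, so it would not need anything from the driver package and would not contribute to an import cycle.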

@martin-sucha

Would something like kiwicom@5b9259c work for you?

@martin-sucha If you can merge it upstream we would rebase and use HostDialer.

I tried cherry-picking your commit and using it here; unfortunately it's not going to work with our shard-aware ports, where we use a custom dialer created at runtime. We would need to somehow integrate HostDialer with the shard-aware port logic, but that's outside the scope of this PR.

Yes, I tried cherry-picking it last week, but didn't manage to post a response here. I have unfinished experimental code to make the HostDialer work with shard-aware ports; I will try to finish it and send it as a separate PR.

@martin-sucha

I've opened gocql#1629 and #98 for the HostDialer. Please let me know in those PRs what you think about it.

@zimnx
Collaborator Author

zimnx commented Jun 22, 2022

Rebased on top of #98, Endpoint structure was removed and code looks cleaner now.

Requires #98 before merge.

@martin-sucha

@zimnx It seems commits

  • Fix race condition in HostInfo.HostnameAndPort
  • Identify nodes by their host_id instead broadcasted_address

would be useful upstream, would you please submit them to gocql/gocql as well?

@zimnx
Collaborator Author

zimnx commented Jun 22, 2022

Done:
gocql#1632
gocql#1631

@martin-sucha

Thanks!

@zimnx zimnx force-pushed the mz/scaas branch 7 times, most recently from 6aa6646 to 337e28e Compare June 29, 2022 10:10
Function changed HostInfo.hostname without holding write lock.
Added write lock around it.

Currently the driver identifies nodes based on their broadcasted IP
address. In the cloud case, broadcasted IP addresses are private and are not
meant to be used as contact points, and they may change over time.
Hence the driver internals were changed to identify nodes based on their
host_id, which is unique per node and persistent throughout the entire
node lifecycle.

Because CQL Events still use broadcasted IP addresses, the driver
keeps a mapping between already known IP addresses and host_ids.

The prepared statement cache key was also changed to host_id, so that it
is not invalidated upon an IP change.

Services deployed in the cloud are often hidden behind a proxy which
dispatches connections based on the Server Name Indication (SNI)
taken from the TLS Client Hello packet.
A new method of creating ClusterConfig, NewCloudCluster, allows connecting
to nodes behind an SNI proxy based on a provided configuration file.

Because each datacenter may have a different TLS configuration (CA, proxy
address, etc.), a more granular method of configuring connection details was
needed. CloudCluster uses a special HostDialer which connects to nodes using
information taken from HostInfo (datacenter, host_id) to go through the SNI
proxy.
@mmatczuk

LGTM

@mmatczuk mmatczuk self-requested a review June 30, 2022 09:49
@mmatczuk mmatczuk merged commit 751bff9 into scylladb:master Jun 30, 2022
fruch added a commit to fruch/scylla-ccm that referenced this pull request Jul 5, 2022
So that the different drivers can test their new
implementations of sni-proxy.

To use it, start the cluster from the command line
like this:
```
❯ ccm start --sni-proxy
sni_proxy listening on: 127.0.0.1:443
```

Using it from Python code is a bit different:

```python
nodes_info = get_cluster_info(self.cluster.get_path(),
                              address=self.cluster.nodelist()[0].address(),
                              port=9142)
docker_id, listen_address, listen_port = \
  start_sni_proxy(self.cluster.get_path(), nodes_info=nodes_info)
```

Ref: scylladb/gocql#97
fruch added a commit to scylladb/scylla-ccm that referenced this pull request Aug 7, 2022
fruch added a commit to fruch/scylla-bench that referenced this pull request Oct 2, 2022
Using code introduced in scylladb/gocql#97
to connect via sni_proxy to the serverless operator

Ref: scylladb/gocql@751bff9
fruch added a commit to scylladb/scylla-bench that referenced this pull request Nov 2, 2022
Using code introduced in scylladb/gocql#97
to connect via sni_proxy to the serverless operator

Ref: scylladb/gocql@751bff9
5 participants