Add support for establishing per-shard connections using Scylla shard-aware ports #52

piodul · 2020-08-28T16:22:46Z

In commit 1572b9e41cd02e8b676404ba72a549398d281a66, Scylla gained
support for so-called "shard-aware" ports. These ports support the same
Cassandra native protocol as the regular native transport ports.
However, unlike the regular ports, shard-aware ports allow the driver to
choose the shard that will handle the connection by choosing an
appropriate source port.

Currently, regular native transport ports employ a load-balancing
strategy that assigns each incoming connection to the shard that has
the fewest connections. Until now, the driver worked by opening new
connections until a single connection per shard was established. When a
connection to an already connected shard is opened, it is not closed
immediately, only once the number of open connections exceeds 10x the
number of shards. This behavior does not work well in some realistic
scenarios:

- When many shard-aware clients connect at once, the load-balancing
  strategy will not work well. Clients will open a very large number of
  connections before reaching all of the shards.
- When there are non-shard-aware clients connected to a node, the
  distribution of their connections per shard may be uneven. If one
  shard is occupied by many more connections than the others, it will
  prevent the current load-balancing strategy from assigning a
  connection to that shard, unless a large number of connections to
  other shards is established first.
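
The second scenario can be illustrated with a small simulation. This is only a sketch, not Scylla's code: it assumes that ties go to the lowest shard index and that the client keeps every connection open while it searches (which holds as long as it stays below the 10x limit). The names `assignLeastLoaded` and `connectionsUntilAllShards` are hypothetical.

```go
package main

import "fmt"

// assignLeastLoaded mimics the server-side strategy described above: the new
// connection goes to the shard with the fewest connections. Ties go to the
// lowest shard index, which is an assumption of this sketch.
func assignLeastLoaded(conns []int) int {
	best := 0
	for shard, n := range conns {
		if n < conns[best] {
			best = shard
		}
	}
	conns[best]++
	return best
}

// connectionsUntilAllShards counts how many connections a single client must
// open before it holds at least one connection to every shard, given the
// connections that other clients already hold on each shard.
func connectionsUntilAllShards(existing []int) int {
	conns := append([]int(nil), existing...) // do not modify the caller's slice
	reached := make(map[int]bool)
	opened := 0
	for len(reached) < len(conns) {
		reached[assignLeastLoaded(conns)] = true
		opened++
	}
	return opened
}

func main() {
	// Four idle shards: four connections are enough.
	fmt.Println(connectionsUntilAllShards([]int{0, 0, 0, 0})) // prints 4
	// Shard 0 already serves five non-shard-aware connections: the other
	// shards must be filled up to the same level first.
	fmt.Println(connectionsUntilAllShards([]int{5, 0, 0, 0})) // prints 16
}
```

With one shard that is five connections ahead, the client needs sixteen connections to reach four shards, twelve of which are redundant.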

Shard-aware ports allow the driver to completely avoid those problems by
choosing which shard it connects to. The first connection is still
established towards a regular native transport port, and the following
connections for the remaining shards can be established towards
shard-aware ports.

This PR adds support for establishing per-shard connections using
the shard-aware ports.

piodul · 2020-08-28T16:32:40Z

connectionpool.go

+		// Something's wrong, we got a connection to the wrong shard.
+		// It may be caused by network configuration issues (e.g. NAT
+		// modifies the source port).
+		return errors.New("gocql: connected to a wrong shard; make sure that the source port is not translated by NAT")


I'm not sure what to do in this case. If there is a NAT that consistently changes the source port so that a wrong shard keeps being assigned, the driver will get stuck in a loop where it keeps opening new connections.

One idea I have is to temporarily disable the shard-aware port for new connections if a misconfigured NAT is detected. This way the client will not get stuck in a loop, and it will still get the opportunity to switch to the better method once the NAT is fixed.

Is it a good idea? How long should the timeout be?

I added a hardcoded one-minute timeout. I also added a test which verifies that this fallback behavior works.
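
A minimal sketch of such a time-based fallback, assuming a single goroutine (no locking) and hypothetical names (`shardAwareFallback`, `reportWrongShard`, `useShardAwarePort`); it is not the code in this PR. The one-minute period matches the hardcoded timeout mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

// shardAwareFallback remembers until when the shard-aware port should be
// avoided. Hypothetical helper; not safe for concurrent use.
type shardAwareFallback struct {
	period        time.Duration
	disabledUntil time.Time
}

// reportWrongShard is called when a connection landed on an unexpected shard,
// for example because NAT rewrote the source port.
func (f *shardAwareFallback) reportWrongShard(now time.Time) {
	f.disabledUntil = now.Add(f.period)
}

// useShardAwarePort reports whether the shard-aware port may be tried at the
// given moment.
func (f *shardAwareFallback) useShardAwarePort(now time.Time) bool {
	return !now.Before(f.disabledUntil)
}

// demoTimedFallback returns the decision before the problem is detected,
// 30 seconds after it, and once the period has elapsed.
func demoTimedFallback() [3]bool {
	start := time.Date(2020, 8, 28, 16, 0, 0, 0, time.UTC)
	f := &shardAwareFallback{period: time.Minute}
	before := f.useShardAwarePort(start)
	f.reportWrongShard(start)
	during := f.useShardAwarePort(start.Add(30 * time.Second))
	after := f.useShardAwarePort(start.Add(time.Minute))
	return [3]bool{before, during, after}
}

func main() {
	fmt.Println(demoTimedFallback()) // prints [true false true]
}
```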

piodul · 2020-08-28T16:33:00Z

scylla.go

+	scyllaPortBasedBalancingMin = 0x8000
+
+	// the maximum port number that can be chosen for port-based load balancing
+	scyllaPortBasedBalancingMax = 0xFFFF


Should this range be configurable?

piodul · 2020-08-28T16:50:29Z

connectionpool.go

+					i--
+					continue
+				}
+			}


Go does not seem to provide separate bind and connect operations. The net.Dialer allows specifying a source address (including the port), and has a Dial function which both binds to the local address and tries to establish a connection. It's a little awkward for our case, as we'd like to know that the bind succeeded before trying to connect.

The problem is compounded by the fact that gocql also provides a Dialer interface which allows providing custom logic for dialing. I provided an extension DialerExt which accepts a source port. I don't really like the fact that we need to try multiple times in search of a free port, each time calling custom client logic, but it seems to work.

Perhaps a different interface would be better? E.g. we would provide an iterator over a range of acceptable source ports, and the DialerExt would differentiate bind error from other errors internally?

Apart from that, is it the right way to plug all of this into the retry logic?
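
For reference, this is roughly how a source port is pinned with the standard library. It is a sketch: `sourceDialer` is a hypothetical helper and port 40000 is an arbitrary example that may already be taken on a given machine.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// sourceDialer returns a net.Dialer whose connections are bound to the given
// local port. The bind and the connect both happen inside DialContext, so a
// failure of either step surfaces as the same returned error.
func sourceDialer(sourcePort int) *net.Dialer {
	return &net.Dialer{LocalAddr: &net.TCPAddr{Port: sourcePort}}
}

func main() {
	// A throwaway listener on the loopback interface stands in for the node.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()

	const sourcePort = 40000 // arbitrary example port
	conn, err := sourceDialer(sourcePort).DialContext(context.Background(), "tcp", ln.Addr().String())
	if err != nil {
		fmt.Println("dial failed (bind or connect):", err)
		return
	}
	defer conn.Close()
	fmt.Println("local address:", conn.LocalAddr())
}
```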

piodul · 2020-08-28T17:07:50Z

I have some ideas for the integration tests:

- Start 10 Sessions concurrently and observe that no excessive connections were closed.
- Provide a malicious DialerExt that changes source port -> source port + 1 (so that it always connects to a wrong shard), and see how the driver works around that.
- Observe that no connections are made to the shard-aware port if it is not enabled in config.

Previously, I had some doubts on how to approach doing them, but I think I have an idea. I'll try to send them over the weekend.

piodul · 2020-09-04T12:09:30Z

I added three tests according to the scenarios listed in my previous comment. Those three scenarios run in two setups: with mocked and real Scylla. I made the mocked version because AFAIK we don't have a released version yet that supports shard-aware ports.

piodul · 2020-09-04T12:13:15Z

@mmatczuk @zimnx I added tests for the mechanism of establishing per-shard connection. I consider this PR to be ready for review now - could you take a look?

I left some comments with questions about the parts of the implementation I was unsure about.

PS: CI tests passed https://travis-ci.org/github/scylladb/gocql/builds/724127399

piodul · 2020-09-10T07:41:23Z

@mmatczuk @zimnx ping

zimnx · 2020-09-10T14:15:53Z

I don't like that non-Scylla-specific code (host connection pool, session) is bloated with Scylla-specific logic. It makes it harder to keep the fork updated with upstream changes.

Maybe I'm not seeing something but perhaps all logic around establishing connection to specific shard could be moved to scyllaDialer.
When control connection finds out that cluster is Scylla not Cassandra, we could wrap config dialer with our own.
ScyllaConnPicker is responsible for telling host connection pool how many connections it needs. So we are kinda guaranteed that scyllaDialer will be called enough times.

@piodul WDYT?

piodul · 2020-09-10T15:36:44Z

@zimnx Yes, I agree that it will be cleaner. I probably needlessly fixated on the idea that shard/port information should be passed from the top to the connect function. I also think it is possible to hide the shard assignment/port allocation logic inside the dialer.

Thanks for the push in the right direction, I'll update the PR according to your suggestion.

piodul · 2020-09-11T14:23:17Z

@zimnx

> When control connection finds out that cluster is Scylla not Cassandra, we could wrap config dialer with our own.

After closer inspection it turns out that it's not that simple. We have a separate dialer for each session, not for each host. The Dialer.DialContext method is called from within a method of the Session object, and we have no information about the host pool there. To connect to the shard aware port, the dialer needs to know what the shard aware port is, and towards which shards we do not have a connection yet.

I still think that it is possible to reduce the amount of changes in non-scylla specific code. I'll try to move the logic of allocating shard number to connect to outside hostConnPool. I don't know how to get rid of scyllaConnParams, though.

piodul · 2020-09-17T05:00:43Z

@zimnx I adjusted the PR so that it intersects the upstream code less. I could not get rid of the scConnParams for the reasons given in my previous comment. If you have any suggestions on how to decrease changes in non-Scylla-specific code - I'm all ears.

Tests are green: https://travis-ci.org/github/scylladb/gocql/builds/727829237

cluster.go

conn.go

scylla.go

Allows customizing what data will be included by TestServer in SUPPORTED message. This is done by supplying a testSupportedFactory function, which takes a connection and returns data to be included in SUPPORTED message for that connection. Because the driver gets information about the shard assigned to connection through a SUPPORTED message, this change enables using the TestServer mock in tests which verify that establishing a connection per shard works.
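
As an illustration of what such a factory would have to supply, here is a sketch that extracts the shard information from the options of a SUPPORTED message (a string multimap in the native protocol). The option keys `SCYLLA_SHARD`, `SCYLLA_NR_SHARDS` and `SCYLLA_SHARD_AWARE_PORT` are taken from Scylla's protocol extensions as I understand them, and `shardInfo` and `parseShardInfo` are hypothetical names, not the driver's.

```go
package main

import (
	"fmt"
	"strconv"
)

// shardInfo holds what the driver learns about a connection from SUPPORTED.
type shardInfo struct {
	shard          int
	nrShards       int
	shardAwarePort uint16 // 0 when the server does not advertise one
}

// firstInt parses the first value of the given option as an integer.
func firstInt(supported map[string][]string, key string) (int, error) {
	values := supported[key]
	if len(values) == 0 {
		return 0, fmt.Errorf("SUPPORTED option %s is missing", key)
	}
	return strconv.Atoi(values[0])
}

// parseShardInfo reads the shard assigned to the connection, the shard count
// and, if present, the shard-aware port. The keys are assumptions, see above.
func parseShardInfo(supported map[string][]string) (shardInfo, error) {
	var info shardInfo
	var err error
	if info.shard, err = firstInt(supported, "SCYLLA_SHARD"); err != nil {
		return shardInfo{}, err
	}
	if info.nrShards, err = firstInt(supported, "SCYLLA_NR_SHARDS"); err != nil {
		return shardInfo{}, err
	}
	// The port is treated as optional so that older servers keep working.
	if port, err := firstInt(supported, "SCYLLA_SHARD_AWARE_PORT"); err == nil && port > 0 && port <= 65535 {
		info.shardAwarePort = uint16(port)
	}
	return info, nil
}

func main() {
	info, err := parseShardInfo(map[string][]string{
		"SCYLLA_SHARD":            {"3"},
		"SCYLLA_NR_SHARDS":        {"8"},
		"SCYLLA_SHARD_AWARE_PORT": {"19042"},
	})
	fmt.Println(info, err)
}
```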

Until now, HostInfo translated its connectAddress using the AddressTranslator on the first occasion, and did not keep the untranslated address. Because the AddressTranslator interface translates not only an IP, but a pair of IP and a port, this interface could also be used to translate the connect address paired with the shard-aware port. We learn about the shard-aware port only after the first connection is established, which means that the translation of shard-aware address can only be done much later after the original connectAddress - hence the need to keep the untranslatedAddress.

scyllaConnPicker will keep information about the shard-aware port of the host that it is associated with. It will be used later to establish further connections using the shard-aware port. The shard-aware address (connect address + shard-aware port) is translated using the AddressTranslator interface.

Adds an object which, for a given shard and total shard count, allows iterating over the source ports which can be used to connect to that particular shard.
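
A sketch of what such an iterator could look like. It assumes the rule, taken from Scylla's documentation as I understand it and not stated in this thread, that a connection arriving from source port `p` is handled by shard `p % shardCount`. The range constants are the ones from scylla.go above; `shardPortIterator` is a hypothetical name.

```go
package main

import "fmt"

const (
	scyllaPortBasedBalancingMin = 0x8000
	scyllaPortBasedBalancingMax = 0xFFFF
)

// shardPortIterator yields, in increasing order, the source ports in the
// allowed range that map to one shard, assuming shard = port % shardCount.
type shardPortIterator struct {
	next       int
	shardCount int
}

// newShardPortIterator expects 0 <= shard < shardCount.
func newShardPortIterator(shard, shardCount int) *shardPortIterator {
	// Smallest port >= the minimum that is congruent to shard.
	offset := (shard - scyllaPortBasedBalancingMin%shardCount + shardCount) % shardCount
	return &shardPortIterator{
		next:       scyllaPortBasedBalancingMin + offset,
		shardCount: shardCount,
	}
}

// Next returns the next candidate port, or ok == false once the range is
// exhausted.
func (it *shardPortIterator) Next() (port int, ok bool) {
	if it.next > scyllaPortBasedBalancingMax {
		return 0, false
	}
	port = it.next
	it.next += it.shardCount
	return port, true
}

func main() {
	it := newShardPortIterator(5, 6)
	for i := 0; i < 3; i++ {
		port, _ := it.Next()
		fmt.Println(port, port%6) // 32771 5, then 32777 5, then 32783 5
	}
}
```

A caller would walk the iterator and move to the next port whenever the bind fails because the port is already in use.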

Implements the Dialer interface for the scyllaConnPicker. scyllaConnPicker has all the information needed to connect to the shard aware port of the host that it is associated with. It will be used in later commits exactly for that purpose. Additionally, adds the framework for implementing own shard-aware dialers: ScyllaShardAwareDialer (a net.Dialer wrapper), and ScyllaGetSourcePort (which can be used in completely custom dialers to get information on the source port from the context).

In some rare cases, it might not be possible to correctly use the shard-aware port feature, e.g. due to configuration issues. This commit adds an opt-out switch to disable the new functionality and switch back to the non-shard-aware native transport port.

This commit actually causes the scyllaConnPicker to be used as a dialer for establishing new connections.

Up until now, for a particular host, each Conn used to have the same remote address. Now, it's no longer true because, for a host, we first establish a connection to the non-shard-aware port, and then the rest of the connections is established to the shard-aware port - therefore, both kinds of connections have different remote addresses. This had an undesirable effect on the prepared statement cache - it used to identify prepared statements by a key that included the remote address of the connection, which resulted in the statement cache being effectively reduced by half. This commit changes that - now, instead of using the remote address in the key, (*HostInfo).HostnameAndPort() is used, which indicates the non-shard-aware address of the replica.
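
The effect on the cache can be shown with a toy model. Everything here is illustrative: the `conn` and `stmtKey` types are hypothetical stand-ins for the driver's structures, and ports 9042 and 19042 are only example values for the regular and the shard-aware port.

```go
package main

import "fmt"

// conn models only the two addresses that matter for the cache key.
type conn struct {
	hostnameAndPort string // non-shard-aware address of the host
	remoteAddr      string // address this particular connection was dialed to
}

// stmtKey is a simplified prepared statement cache key.
type stmtKey struct{ addr, keyspace, statement string }

func byRemoteAddr(c conn) string { return c.remoteAddr }
func byHost(c conn) string       { return c.hostnameAndPort }

// sameHostConns returns two connections to one host: the first through the
// regular port, the second through the shard-aware port.
func sameHostConns() []conn {
	return []conn{
		{hostnameAndPort: "10.0.0.1:9042", remoteAddr: "10.0.0.1:9042"},
		{hostnameAndPort: "10.0.0.1:9042", remoteAddr: "10.0.0.1:19042"},
	}
}

// cacheEntries prepares the same statement on every connection and returns
// how many cache entries that produces for the given key strategy.
func cacheEntries(conns []conn, keyAddr func(conn) string) int {
	cache := make(map[stmtKey]bool)
	for _, c := range conns {
		cache[stmtKey{keyAddr(c), "ks", "SELECT * FROM t WHERE pk = ?"}] = true
	}
	return len(cache)
}

func main() {
	fmt.Println(cacheEntries(sameHostConns(), byRemoteAddr)) // prints 2
	fmt.Println(cacheEntries(sameHostConns(), byHost))       // prints 1
}
```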

… description

This commit reorganizes the logic responsible for establishing connections to the shard-aware port and makes it more robust. Now, gocql will temporarily fall back to using the non-shard-aware port when it detects a situation in which the shard-aware port cannot be properly used - which is either when the shard-aware port is unreachable, or the source port is translated by NAT and connection to the wrong shard is established. README.md is updated in order to explain the falling-back behavior.

This commit adds a test which checks that the driver properly falls back to using the non-shard-aware port when the shard-aware port is unreachable.

piodul · 2020-10-30T16:53:09Z

@mmatczuk Sorry for the delay. I pushed an updated version with fixes to review comments. All fixes are in new, mostly separate commits.

Apart from that, I would like to discuss once more the idea of falling back to the non-shard-aware port when we detect a case in which the shard-aware port cannot be used. I preemptively updated my PR so that it implements the fallback behavior.

These are the cases that I know of when the shard-aware port won't be usable:

1. The shard-aware port is unreachable (but the non-shard-aware port is reachable),
2. The shard-aware port is behind NAT which translates source ports,
3. The user did not update their custom implementation of Dialer.

If we don't fall back to the non-shard-aware port and the driver keeps trying to use the shard-aware one, it may get stuck trying to fill hostConnPools without ever succeeding. Some users who don't know about the feature might get annoyed if they upgrade their client and the cluster and suddenly get worse performance. Even if we warn about it in the logs, it might be inconvenient to have to disable the shard-aware port manually.

- In case 1, we can try connecting to the shard-aware port first, and if it doesn't succeed, then we can immediately try the non-shard-aware port. This kind of problem can't be detected without false positives, so I think it's better always to try both ports than to disable the port temporarily.
- In cases 2 and 3, we can stop calling the shard-aware port for a while if we detect that a wrong shard is chosen. This is surely a problem with configuration, so it's expected to persist for some time, and it's easier to implement via timeout than retry.

I documented those issues inside README.md so that the user can become aware of them when they read it. I also print a warning in case situation 2 or 3 is detected (I don't do it for 1 because there is no way to do it without false positives).

Please let me know what you think. I can change it back to not using the fallbacks, but I think that the driver will behave a little nicer now.
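
The "try both ports" strategy for case 1 boils down to something like the sketch below. The names `dialFunc` and `dialWithFallback` and the example addresses are hypothetical, and the demo uses a fake dial function instead of a real network.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// dialFunc abstracts over whatever dialer the session is configured with.
type dialFunc func(ctx context.Context, addr string) (net.Conn, error)

// dialWithFallback tries the shard-aware address first and, if that fails for
// any reason, immediately tries the regular address. The boolean result tells
// the caller which of the two was used.
func dialWithFallback(ctx context.Context, dial dialFunc, shardAwareAddr, regularAddr string) (net.Conn, bool, error) {
	conn, err := dial(ctx, shardAwareAddr)
	if err == nil {
		return conn, true, nil
	}
	conn, err = dial(ctx, regularAddr)
	if err != nil {
		return nil, false, err
	}
	return conn, false, nil
}

// demoFallback simulates an unreachable shard-aware port.
func demoFallback() (bool, error) {
	const shardAwareAddr, regularAddr = "10.0.0.1:19042", "10.0.0.1:9042"
	fakeDial := func(ctx context.Context, addr string) (net.Conn, error) {
		if addr == shardAwareAddr {
			return nil, errors.New("connection refused")
		}
		client, server := net.Pipe() // in-memory stand-in for a TCP connection
		server.Close()
		return client, nil
	}
	conn, usedShardAware, err := dialWithFallback(context.Background(), fakeDial, shardAwareAddr, regularAddr)
	if err != nil {
		return false, err
	}
	conn.Close()
	return usedShardAware, nil
}

func main() {
	fmt.Println(demoFallback()) // prints false <nil>
}
```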

martin-sucha · 2020-11-01T11:02:23Z

README.md

+- If you are using a custom type implementing `gocql.Dialer`, you can get the source port by using the `gocql.ScyllaGetSourcePort` function:
+  ```go
+  func (d *myDialer) DialContext(ctx context.Context, network, addr string) (net.Conn, error) {
+	  sourcePort := gocql.ScyllaGetSourcePort(ctx)


This hides the source port parameter in the context. Context values should only hold information that is not necessary for correct operation of the code (this basically boils down to information for metrics and tracing).

Why not introduce a new interface instead? That way we could disable the shard aware port at session creation if the dialer does not implement the interface:

    type LocalAddressDialer interface {
        // DialWithLocalAddress establishes a new connection but the local address must be set to the provided value.
        // You can use net.Dialer.LocalAddr in your implementation.
        DialWithLocalAddress(ctx context.Context, network, localAddr, remoteAddr string) (net.Conn, error)
    }

I see DialerExt mentioned in some previous comments, but I don't see it in the code now. Why has it been removed?

Also, we could provide some error value for LocalAddressDialer to return so that we can differentiate bind errors from connect errors (and retry with backoff in the latter case).

Ah, I missed comment #52 (comment) previously.

I still think having an interface is better than putting the port into context. We still can have the current ScyllaShardAwareDialer - it could implement the DialWithLocalAddress interface and still work as intended (for wrapping net.Dialer).
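
To make the proposal concrete, here is a sketch of how a `net.Dialer` wrapper could implement the interface and how support could be detected with a type assertion. The `Dialer` interface mirrors the `DialContext` signature shown in the README snippet; `localAddrNetDialer`, `legacyDialer` and `supportsShardAwarePort` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// Dialer mirrors the existing custom dialer interface.
type Dialer interface {
	DialContext(ctx context.Context, network, addr string) (net.Conn, error)
}

// LocalAddressDialer is the extension interface proposed above.
type LocalAddressDialer interface {
	DialWithLocalAddress(ctx context.Context, network, localAddr, remoteAddr string) (net.Conn, error)
}

// localAddrNetDialer wraps net.Dialer and supports both interfaces.
type localAddrNetDialer struct{ net.Dialer }

func (d *localAddrNetDialer) DialWithLocalAddress(ctx context.Context, network, localAddr, remoteAddr string) (net.Conn, error) {
	laddr, err := net.ResolveTCPAddr(network, localAddr)
	if err != nil {
		return nil, err
	}
	dialer := d.Dialer // work on a copy so the shared dialer is not mutated
	dialer.LocalAddr = laddr
	return dialer.DialContext(ctx, network, remoteAddr)
}

// legacyDialer stands for a user dialer that was not updated.
type legacyDialer struct{ net.Dialer }

// supportsShardAwarePort reports whether the dialer can bind a source port,
// which is what the session could check once at creation time.
func supportsShardAwarePort(d Dialer) bool {
	_, ok := d.(LocalAddressDialer)
	return ok
}

func main() {
	fmt.Println(supportsShardAwarePort(&localAddrNetDialer{})) // prints true
	fmt.Println(supportsShardAwarePort(&legacyDialer{}))       // prints false
}
```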

@martin-sucha

> This hides the source port parameter in the context. Context values should only hold information that is not necessary for correct operation of the code (this basically boils down to information for metrics and tracing).
>
> Why not introduce a new interface instead? That way we could disable the shard aware port at session creation if the dialer does not implement the interface:

Technically speaking, the source port parameter is optional. From the perspective of the driver there is no difference if the application did not update their Dialer to connect from specified source port, or it's just behind NAT - new connections will be assigned to wrong shards. The driver will fall back to using non-shard-aware port if it realizes that connections are being handled by wrong shards.

I see your point that introducing a LocalAddressDialer interface will allow the driver to check if the Dialer supports setting source port so that it won't bother trying to connect to shard-aware port if it doesn't - that's why I included DialerExt in the first version.

An option would be to add an interface which takes a struct with Scylla-specific parameters (so that we don't need to add a new interface each time we add a parameter):

    type ScyllaDialer interface {
        DialContextWithParams(ctx context.Context, network, addr string, params *ScyllaDialerParams) (net.Conn, error)
    }

    type ScyllaDialerParams struct {
        SourcePort uint16
        // other parameters...
    }

@mmatczuk WDYT?

> Also, we could provide some error value for LocalAddressDialer to return so that we can differentiate bind errors from connect errors (and retry with backoff in the latter case).

We already do differentiate such errors with this function, used here. This code assumes that user's Dialer will use net.Dialer.DialContext, and will work correctly with errors returned from that function.

However, a custom dialer which does not use net.Dialer.DialContext might not be able to indicate through an error that a port is already in use. I'll add an error value for that purpose, as you suggested.

> An option would be to add an interface which takes a struct with Scylla-specific parameters (so that we don't need to add a new interface each time we add a parameter)

Having a struct defeats the purpose of the interface though. If we add a parameter later that the dialer does not know about, we couldn't check whether it implements the old or new behavior.

I think it's safer to introduce yet another interface in the future if the need arises.

> We already do differentiate such errors with this function, used here. This code assumes that user's Dialer will use net.Dialer.DialContext, and will work correctly with errors returned from that function.

Ah, I missed this part, thanks for pointing it out!

> However, a custom dialer which does not use net.Dialer.DialContext might not be able to indicate through an error that a port is already in use. I'll add an error value for that purpose, as you suggested.

Thanks!

We recently moved to Go versions 1.14 and 1.15, so we can use the `errors.Is` function to check for the EADDRINUSE error. This change also adds the `ErrScyllaSourcePortAlreadyInUse` error which can be returned from dialers which do not use `net.Dialer` internally in order to indicate that the source port is already bound.
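
A sketch of such a check, assuming a Unix-like platform where a failed bind surfaces as `syscall.EADDRINUSE`. The sentinel below is a local stand-in that reuses the name from the commit message, its message text is made up, and `isSourcePortInUse` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// ErrScyllaSourcePortAlreadyInUse is a local stand-in for the driver's error
// value, for dialers that are not built on net.Dialer.
var ErrScyllaSourcePortAlreadyInUse = errors.New("source port is already in use")

// isSourcePortInUse reports whether a dial failed because the requested
// source port could not be bound, as opposed to a failed connect.
func isSourcePortInUse(err error) bool {
	return errors.Is(err, syscall.EADDRINUSE) ||
		errors.Is(err, ErrScyllaSourcePortAlreadyInUse)
}

// demoErrors classifies three representative errors: a bind failure shaped
// like the one net.Dialer returns, a custom dialer's wrapped sentinel, and a
// refused connection.
func demoErrors() [3]bool {
	bindErr := &net.OpError{Op: "dial", Net: "tcp", Err: os.NewSyscallError("bind", syscall.EADDRINUSE)}
	customErr := fmt.Errorf("my dialer: %w", ErrScyllaSourcePortAlreadyInUse)
	connectErr := &net.OpError{Op: "dial", Net: "tcp", Err: os.NewSyscallError("connect", syscall.ECONNREFUSED)}
	return [3]bool{
		isSourcePortInUse(bindErr),
		isSourcePortInUse(customErr),
		isSourcePortInUse(connectErr),
	}
}

func main() {
	fmt.Println(demoErrors()) // prints [true true false]
}
```

Only the first two cases should make the driver move on to the next candidate source port; a refused connection is a different kind of failure.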

mmatczuk · 2020-11-06T16:16:29Z

I'm not sure how effective scyllaShardAwarePortFallbackDuration is or what it actually means.
Typically an application will start, fill the pools and use the connections; when the session is closed, so are the pools and connections.
I'd suggest removing the backoff functionality and replacing it with a hard fallback.

For example, you could set shardAwarePortDisabled to true, and drop all time based code.

piodul · 2020-11-12T11:34:21Z

@mmatczuk
My idea was that the fallback was supposed to help in the following situation:

1. Client application connects to a node, but detects that source ports are translated by NAT, so it falls back to using the old port,
2. The NAT issue is fixed in the meantime,
3. Fallback period ends,
4. Next time the node restarts, the shard-aware port will be used to connect.

I admit that this scenario is a bit contrived, and I am not knowledgeable enough in networking to tell whether it is realistic. If it isn't, I can change it to the hard fallback. In that case, as a last resort, the client application can be restarted to clear the hard fallback flag.

mmatczuk

LGTM

piodul force-pushed the port-based-load-balancing branch from f782dad to 0a64dd2 Compare August 28, 2020 16:52

piodul requested review from mmatczuk, haaawk, jul-stas and zimnx August 28, 2020 17:08

piodul force-pushed the port-based-load-balancing branch 3 times, most recently from cbdb720 to 5649c2b Compare September 4, 2020 11:46

piodul marked this pull request as ready for review September 4, 2020 12:13

piodul mentioned this pull request Sep 10, 2020

A deadlock may occur if closing excess connections encounter an error #53

Closed

piodul force-pushed the port-based-load-balancing branch 2 times, most recently from 1fa0e0a to d489b8a Compare September 16, 2020 20:45