Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for establishing per-shard connections using Scylla shard-aware ports #52

Merged
merged 18 commits into from
Nov 12, 2020

Conversation

piodul
Copy link
Collaborator

@piodul piodul commented Aug 28, 2020

In commit 1572b9e41cd02e8b676404ba72a549398d281a66, Scylla gained
support for so-called "shard-aware" ports. These ports support the same
Cassandra native protocol as the regular native transport ports.
However, unlike the regular ports, shard-aware ports allow the driver to
choose the shard that will handle the connection by choosing an
appropriate source port.

Currently, regular native transport ports employ a load-balancing
strategy that assigns, for an incoming connection, the shard that has
the least amount of connections assigned. Until now, the driver worked
by opening new connections until a single connection per shard is
established. When a connection to an already connected shard is opened,
it is not immediately closed - only after the number of opened
connections exceeds 10x the number of shards. This behavior does not
work well in some realistic scenarios:

  1. When many shard-aware clients connect at once, the load-balancing
    strategy will not work well. Clients will open a very large number of
    connections before reaching all of the shards.
  2. When there are non-shard-aware clients connected to a node, the
    distribution of their connections per shard may be uneven. If one
    shard is occupied by many more connections than the others, it will
    prevent the current load-balancing strategy from assigning a
    connection to that shard, unless a large number of connections to
    other shards is established first.

Shard-aware ports allow the driver to completely avoid those problems by
choosing towards which shard it wishes to connect to. The first
connection is still established towards a regular native transport port,
and the following connections for remaining shards can be established
towards shard-aware ports.

This PR adds support for establishing per-shard connections using
the shard-aware ports.

// Something's wrong, we got a connection to the wrong shard.
// It may be caused by network configuration issues (e.g. NAT
// modifies the source port).
return errors.New("gocql: connected to a wrong shard; make sure that the source port is not translated by NAT")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to do in this case. If there is a NAT that consistently changes the source port so that a wrong shard keeps being assigned, the driver will get stuck in a loop where it keeps opening new connections.

One idea I have is to temporarily disable using shard-aware port for new connections if misconfigured NAT is detected. This way the client will not get stuck in a loop, and it will give opportunity to switch to the better method when NAT gets fixed.

Is it a good idea? How long should the timeout be?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a hardcoded one-minute timeout. I also added a test which verifies that this fallback behavior works.

scylla.go Outdated
scyllaPortBasedBalancingMin = 0x8000

// the maximum port number that can be chosen for port-based load balancing
scyllaPortBasedBalancingMax = 0xFFFF
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this range be configurable?

i--
continue
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go does not seem to provide separate bind and connect operations. The net.Dialer allows specifying source address (including port), and has Dial function which both binds to the local address and tries to establish a connection. It's a little awkward for our case, as we'd like to know that a bind succeeded, and then tried to connect.

The problem is compounded that the fact that gocql also provides a Dialer interface which allows providing custom logic for dialing. I provided an extension DialerExt which accepts a source port. I don't really like the fact that we need to try multiple times in search of the port, each time calling custom client logic, but it seems to work.

Perhaps a different interface would be better? E.g. we would provide an iterator over a range of acceptable source ports, and the DialerExt would differentiate bind error from other errors internally?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from that, is it the right way to plug all of this into the retry logic?

@piodul
Copy link
Collaborator Author

piodul commented Aug 28, 2020

I have some ideas for the integration tests:

  1. Start 10 Sessions concurrently and observe that no excessive connections were closed.
  2. Provide a malicious DialerExt that changes source port -> source port + 1 (so that it always connects to a wrong shard), and see how the driver works around that.
  3. Observe that no connections are made to the shard-aware port if it is not enabled in config.

Previously, I had some doubts on how to approach doing them, but I think I have an idea. I'll try to send them over the weekend.

@piodul piodul force-pushed the port-based-load-balancing branch 3 times, most recently from cbdb720 to 5649c2b Compare September 4, 2020 11:46
@piodul
Copy link
Collaborator Author

piodul commented Sep 4, 2020

I added three tests according to the scenarios listed in my previous comment. Those three scenarios run in two setups: with mocked and real Scylla. I made the mocked version because AFAIK we don't have a released version yet that support shard aware ports.

@piodul
Copy link
Collaborator Author

piodul commented Sep 4, 2020

@mmatczuk @zimnx I added tests for the mechanism of establishing per-shard connection. I consider this PR to be ready for review now - could you take a look?

I left some comments with questions about things that I had doubts about how they should be implemented.

PS: CI tests passed https://travis-ci.org/github/scylladb/gocql/builds/724127399

@piodul piodul marked this pull request as ready for review September 4, 2020 12:13
@piodul
Copy link
Collaborator Author

piodul commented Sep 10, 2020

@mmatczuk @zimnx ping

@zimnx
Copy link
Collaborator

zimnx commented Sep 10, 2020

I don't like that non-scylla specific code (host connection pool, session) is bloated with scylla. It makes harder to keep fork updated with upstream changes.

Maybe I'm not seeing something but perhaps all logic around establishing connection to specific shard could be moved to scyllaDialer.
When control connection finds out that cluster is Scylla not Cassandra, we could wrap config dialer with our own.
ScyllaConnPicker is responsible for telling host connection pool how many connections it needs. So we are kinda guaranteed that scyllaDialer will be called enough times.

@piodul WDYT?

@piodul
Copy link
Collaborator Author

piodul commented Sep 10, 2020

@zimnx Yes, I agree that it will be cleaner. I probably needlessly fixated on the idea that shard/port information should be passed from the top to the connect function. I also think it is possible to hide the shard assignment/port allocation logic inside the dialer.

Thanks for the push in the right direction, I'll update the PR according to your suggestion.

@piodul
Copy link
Collaborator Author

piodul commented Sep 11, 2020

@zimnx

When control connection finds out that cluster is Scylla not Cassandra, we could wrap config dialer with our own.

After closer inspection it turns out that it's not that simple. We have a separate dialer for each session, not for each host. The Dialer.DialContext method is called from within a method of the Session object, and we have no information about the host pool there. To connect to the shard aware port, the dialer needs to know what the shard aware port is, and towards which shards we do not have a connection yet.

I still think that it is possible to reduce the amount of changes in non-scylla specific code. I'll try to move the logic of allocating shard number to connect to outside hostConnPool. I don't know how to get rid of scyllaConnParams, though.

@piodul piodul force-pushed the port-based-load-balancing branch 2 times, most recently from 1fa0e0a to d489b8a Compare September 16, 2020 20:45
@piodul
Copy link
Collaborator Author

piodul commented Sep 17, 2020

@zimnx I adjusted the PR so that it intersects the upstream code less. I could not get rid of the scConnParams due to the reasons in my previous comment. If you have any suggestions how to decrease changes in non-Scylla specific code - I'm all ears.

Tests are green: https://travis-ci.org/github/scylladb/gocql/builds/727829237

cluster.go Outdated Show resolved Hide resolved
cluster.go Outdated Show resolved Hide resolved
cluster.go Outdated Show resolved Hide resolved
conn.go Outdated Show resolved Hide resolved
scylla.go Outdated Show resolved Hide resolved
scylla.go Outdated Show resolved Hide resolved
scylla.go Outdated Show resolved Hide resolved
@mmatczuk mmatczuk force-pushed the master branch 2 times, most recently from 1e593d8 to e6cbbf9 Compare October 29, 2020 14:01
Allows customizing what data will be included by TestServer in SUPPORTED
message. This is done by supplying a testSupportedFactory function,
which takes a connection and returns data to be included in SUPPORTED
message for that connection.

Because the driver gets information about the shard assigned to
connection through a SUPPORTED message, this change enables using
the TestServer mock in tests which verify that establishing a connection
per shard works.
Until now, HostInfo translated its connectAddress using the
AddressTranslator on the first occasion, and did not keep the
untranslated address.

Because the AddressTranslator interface translates not only an IP, but a
pair of IP and a port, this interface could also be used to translate
the connect address paired with the shard-aware port. We learn about the
shard-aware port only after the first connection is established, which
means that the translation of shard-aware address can only be done much
later after the original connectAddress - hence the need to keep the
untranslatedAddress.
scyllaConnPicker will keep information about the shard-aware port of the
host that it is associated with. It will be used later to establish
further connections using the shard-aware port.

The shard-aware address (connect address + shard-aware port) is
translated using the AddressTranslator interface.
Adds an object which, for a given shard and total shard count, will
allow to iterate over source ports which can be used to connect to that
particular shard.
Implements the Dialer interface for the scyllaConnPicker.

scyllaConnPicker has all the information needed to connect to the shard
aware port of the host that it is associated with. It will be used in
later commits exactly for that purpose.

Additionally, adds the framework for implementing own shard-aware
dialers: ScyllaShardAwareDialer (a net.Dialer wrapper),
and ScyllaGetSourcePort (which can be used in completely custom dialers
to get information on the source port from the context).
In some rare cases, it might not be possible to correctly use the
shard-aware port feature, e.g. due to configuration issues. This commit
adds an opt-out switch to disable the new functionality and switch back
to the non-shard-aware native transport port.
This commit actually causes the scyllaConnPicker to be used as a dialer
for establishing new connections.
Up until now, for a particular host, each Conn used to have the same
remote address. Now, it's no longer true because, for a host, we first
establish a connection to the non-shard-aware port, and then the rest of
the connections is established to the shard-aware port - therefore, both
kinds of connections have different remote addresses.

This had an undesirable effect on the prepared statement cache - it used
to identify prepared statements by a key that included the remote
address of the connection, which resulted in the statement cache being
effectively reduced by half.

This commit changes that - now, instead of using the remote address in
the key, (*HostInfo).HostnameAndPort() is used, which indicates the
non-shard-aware address of the replica.
This commit reorganizes the logic responsible for establishing
connections to the shard-aware port and makes it more robust. Now, gocql
will temporarily fall back to using the non-shard-aware port when it
detects a situation in which the shard-aware port cannot be properly
used - which is either when the shard-aware port is unreachable, or the
source port is translated by NAT and connection to the wrong shard is
established.

README.md is updated in order to explain the falling-back behavior.
This commit adds a test which checks that the driver properly falls back
to using the non-shard-aware port when the shard-aware port is
unreachable.
@piodul
Copy link
Collaborator Author

piodul commented Oct 30, 2020

@mmatczuk Sorry for the delay. I pushed an updated version with fixes to review comments. All fixes are in new, mostly separate commits.

Apart from that, I would like to once more discuss the idea of falling back to the use of non-shard-aware port when we detect a case in which the shard-aware port cannot be used. I preemptively updated my PR so that it implements the fall back behavior.

These are the cases that I know of when the shard-aware port won't be usable:

  1. The shard-aware port is unreachable (but the non-shard-aware port is reachable),
  2. The shard-aware port is behind NAT which translates source ports,
  3. User did not update their custom implementation of Dialer.

If we won't fall back to the non-shard-aware port but the driver keeps trying to use it, we may get stuck trying to fill hostConnPools but won't succeed in doing so. Some users who don't know about the feature might get annoyed if they upgrade their client and the cluster and will suddenly get worse performance. Even if we warn about it in the logs, it might be inconvenient to have to disable the shard-aware port manually.

  • In case of 1, we can try connecting to the shard-aware-port first, and if it doesn't succeed, then we can immediately try the non-shard-aware port. This kind of problem can't be detected without false positives, so I think it's better always to try both ports than disable the port temporarily.
  • In case of 2 and 3, we can stop calling the shard-aware port for a while if we detect that a wrong shard is chosen. This is surely a problem with configuration, so it's expected to persist for some time, and it's easier to implement via timeout than retry.

I documented those issues inside README.md so that the user can become aware of them when they read it. I also print a warning in case situation 2 or 3 is detected (I don't do it for 1 because there is no way to do it without false positives).

Please let me know what you think. I can change it back to not using the fall backs, but I think that the driver will behave a little nicer now.

README.md Outdated
- If you are using a custom type implementing `gocql.Dialer`, you can get the source port by using the `gocql.ScyllaGetSourcePort` function:
```go
func (d *myDialer) DialContext(ctx context.Context, network, addr string) (net.Conn, error) {
sourcePort := gocql.ScyllaGetSourcePort(ctx)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hides the source port parameter to the context. Context values should only hold information that is not necessary for correct operation of the code (this basically boils down to information for metrics and tracing).

Why not introduce a new interface instead? That way we could disable the shard aware port at session creation if the dialer does not implement the interface:

type LocalAddressDialer interface {
    // DialWithLocalAddress establishes a new connection but the local address must be set to the provided value.
    // You can use net.Dialer.LocalAddr in your implementation.
    DialWithLocalAddress(ctx context.Context, network, localAddr, remoteAddr string) (net.Conn, error);
}

I see DialerExt mentioned in some previous comments, but I don't see it in the code now. Why has it been removed?

Also, we could provide some error value for LocalAddressDialer to return so that we can differentiate bind errors from connect errors (and retry with backoff in the latter case).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed comment #52 (comment) previously.

I still think having an interface is better than putting the port into context. We still can have the current ScyllaShardAwareDialer - it could implement the DialWithLocalAddress interface and still work as intended (for wrapping net.Dialer).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-sucha

This hides the source port parameter to the context. Context values should only hold information that is not necessary for correct operation of the code (this basically boils down to information for metrics and tracing).

Why not introduce a new interface instead? That way we could disable the shard aware port at session creation if the dialer does not implement the interface:

Technically speaking, the source port parameter is optional. From the perspective of the driver there is no difference if the application did not update their Dialer to connect from specified source port, or it's just behind NAT - new connections will be assigned to wrong shards. The driver will fall back to using non-shard-aware port if it realizes that connections are being handled by wrong shards.

I see your point that introducing a LocalAddressDialer interface will allow the driver to check if the Dialer supports setting source port so that it won't bother trying to connect to shard-aware port if it doesn't - that's why I included DialerExt in the first version.

An option would be add an interface which takes a struct with Scylla-specific parameters (so that we don't need to add a new interface each time we add a parameter):

type ScyllaDialer interface {
    DialContextWithParams(ctx context.Context, network, addr string, params *ScyllaDialerParams) (net.Conn, error)
}

type ScyllaDialerParams struct {
    SourcePort uint16
    // other parameters...
}

@mmatczuk WDUT?

Also, we could provide some error value for LocalAddressDialer to return so that we can differentiate bind errors from connect errors (and retry with backoff in the latter case).

We already do differentiate such errors with this function, used here. This code assumes that user's Dialer will use net.Dialer.DialContext, and will work correctly with errors returned from that function.

However, a custom dialer which does not use net.Dialer.DialContext might not be able to indicate through an error that a port is already in use. I'll add an error value for that purpose, as you suggested.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An option would be add an interface which takes a struct with Scylla-specific parameters (so that we don't need to add a new interface each time we add a parameter)

Having a struct defeats the purpose of the interface though. If we add a parameter later that the dialer does not know about, we couldn't check whether it implements the old or new behavior.

I think it's safer to introduce yet another interface in the future if the need arises.

We already do differentiate such errors with this function, used here. This code assumes that user's Dialer will use net.Dialer.DialContext, and will work correctly with errors returned from that function.

Ah, I missed this part, thanks for pointing it out!

However, a custom dialer which does not use net.Dialer.DialContext might not be able to indicate through an error that a port is already in use. I'll add an error value for that purpose, as you suggested.

Thanks!

We recently moved to Go versions 1.14 and 1.15, so we can use the
`errors.Is` function to check for the EADDRINUSE error.

This change also adds the `ErrScyllaSourcePortAlreadyInUse` error which
can be returned from dialers which do not use `net.Dialer` internally in
order to indicate that the source port is already bound.
@piodul piodul requested a review from mmatczuk November 6, 2020 15:05
@mmatczuk
Copy link

mmatczuk commented Nov 6, 2020

I'm not sure how effective scyllaShardAwarePortFallbackDuration is and what does it actually mean.
Typically application will start fill pools and use connections, when session is closed so are pools and connections.
I'd suggest to remove the backoff functionality and replace it with a hard fallback.

For example, you could set shardAwarePortDisabled to true, and drop all time based code.

@piodul
Copy link
Collaborator Author

piodul commented Nov 12, 2020

@mmatczuk
My idea was that the fallback was supposed to help in the following situation:

  • Client application connects to a node, but detects that source ports are translated by NAT, so it falls back to using the old port,
  • The NAT issue is fixed in the meantime,
  • Fallback period ends,
  • Next time the node restarts, the shard-aware port will be used to connect.

I admit that this scenario is a bit contrived, and I am not knowledgeable enough in networking if it is realistic - if it isn't, I can change it to the hard fallback. In such case, as a last resort, the client application can be restarted to clear the hard fallback flag.

Copy link

@mmatczuk mmatczuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mmatczuk mmatczuk merged commit c1903f5 into scylladb:master Nov 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants