Avoid port collisions in tests, start CassandraLauncher first #2804
Conversation
We had an internal CassandraLauncher.scala that was not used. This is removed.
I'm wrong about this. The internal CassandraLauncher is used in dev-mode. I will re-add it.
Force-pushed from 82b8cbb to 4d89265
@@ -34,13 +37,19 @@ class CassandraPersistenceSpec(system: ActorSystem) extends ActorSystemSpec(syst
  override def beforeAll(): Unit = {
    super.beforeAll()

    // Join ourselves - needed because the Cassandra offset store uses cluster startup task
    val cluster = Cluster(system)
    cluster.join(cluster.selfAddress)
I don't understand why this would be needed. It must be better to have Cassandra running as early as possible, so that anything in the test, or Lagom itself, can connect if it starts up as part of the cluster formation.
Both the write and read sides depend on the cluster, and therefore it's ok to start the cluster first.
My point is that if they start before Cassandra is started, the initial connection to Cassandra may fail. That may cause other problems.
    val cluster = Cluster(system)
    cluster.join(cluster.selfAddress)

    // first ensure that this node is Up (ie: remote port is properly bound)
The remote port is bound when the ActorSystem is started (before apply returns).
You are completely right about this. Somehow I convinced myself that the port would only bind when trying to form the cluster. This doesn't make any sense.
That's getting really puzzling.
The failing test log says:
Remoting started with transport [Artery tcp]; listening on address [akka://PersistentEntityTestDriverCompatSpec@127.0.0.1:34359] with UID [3966907252480615512]
[info] PersistentEntityTestDriverCompatSpec: Starting Cassandra on port client port: 34359
So, artery bound on TCP port 34359 at actor system bootstrap. Fine.
Then Cassandra launcher tries to pick a free port and OS returns 34359, which then conflicts when we try to start the Cassandra client.
That doesn't make sense to me. Unless the actor system is not yet bootstrapped when we start the CassandraLauncher.
I can only explain it with the following order of events:
- ActorSystem reserves port 34359 and releases the socket
- CassandraLauncher reserves port 34359 and releases the socket
- Artery starts on port 34359
- CassandraLauncher starts and fails to bind on port 34359
(the first two steps can be swapped)
Maybe forming the cluster before the CassandraLauncher is overkill, but we should not try to reserve a port for Cassandra before we can confirm that Artery is running.
If that scenario is true there is something wrong with the initialization in Artery. It's supposed to bind (and block until completed) before ActorSystem.apply returns.
Oh, that's useful information. Although it makes it scarier now.
I found it!
In the init of the test, we reserve the Cassandra port using CassandraLauncher.randomPort
(which opens a socket, picks a port, and closes the socket).
Then the ActorSystem bootstraps and takes the same port. Artery binds to it.
Then the CassandraLauncher, with the port already fixed, tries to bind to it and fails.
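This race can be reproduced with a plain JVM sketch, assuming CassandraLauncher.randomPort works essentially like the probeFreePort helper below (the helper is illustrative, not the actual implementation):

```scala
import java.net.ServerSocket

object PortRaceSketch {
  // Probe a free port the way launcher-style helpers typically do:
  // bind to port 0, read the OS-assigned port, then close the socket.
  // The port is only probed, not held, so the "reservation" is advisory.
  def probeFreePort(): Int = {
    val socket = new ServerSocket(0)
    try socket.getLocalPort
    finally socket.close() // race window opens here
  }

  def main(args: Array[String]): Unit = {
    val cassandraPort = probeFreePort()
    // Simulate the ActorSystem (Artery) starting afterwards and binding
    // the very same port before Cassandra gets a chance to.
    val artery = new ServerSocket(cassandraPort)
    // Cassandra now tries to bind its "reserved" port and fails.
    val conflict =
      try { new ServerSocket(cassandraPort).close(); false }
      catch { case _: java.net.BindException => true }
    artery.close()
    println(s"conflict=$conflict")
  }
}
```

Nothing guarantees that the port stays free between probeFreePort's close() and the eventual real bind, which is exactly the ordering problem described above.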
ah, nice catch
Force-pushed from 4d89265 to 090074a
This is ready for another review. I searched for every place where we were using the CassandraLauncher and made sure that we start it as soon as possible, before the ActorSystem. I'm confident that this and akka/akka-persistence-cassandra#765 will remove a lot of the flakiness we have seen in the past. 🤞
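The fixed ordering can be sketched without Akka. Here startCassandra and startActorSystem are hypothetical stand-ins for the real launcher and ActorSystem boot, not Lagom APIs; the point is that Cassandra actually binds its port first, so the ActorSystem's ephemeral port can no longer collide with it:

```scala
import java.net.ServerSocket

object StartupOrderSketch {
  // Stand-in for the CassandraLauncher: picks a free port and keeps it
  // bound, as the real launcher does once started.
  def startCassandra(): ServerSocket = new ServerSocket(0)

  // Stand-in for ActorSystem/Artery startup: binds its own ephemeral port.
  def startActorSystem(): ServerSocket = new ServerSocket(0)

  def main(args: Array[String]): Unit = {
    // Fixed ordering: Cassandra first, ActorSystem second. Because the
    // Cassandra port is actually bound (not just probed and released),
    // the OS cannot hand the same port to Artery.
    val cassandra = startCassandra()
    val system = startActorSystem()
    assert(cassandra.getLocalPort != system.getLocalPort)
    system.close()
    cassandra.close()
  }
}
```

The design choice is to hold the resource rather than remember a number: a bound socket is a real reservation, while a probed-then-released port is only a hint.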
LGTM, that looks like the right thing to do
changed title of PR
@Mergifyio backport 1.6.x 1.5.x
Command
Manual backport of #2804 on 1.6.x branch
Manual backport #2804 on 1.5.x branch
Refs: #2778