Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid port collisions in tests, start CassandraLauncher first #2804

Merged
merged 1 commit into from Apr 7, 2020

Conversation

octonato
Copy link
Member

@octonato octonato commented Apr 6, 2020

Refs: #2778

@octonato
Copy link
Member Author

octonato commented Apr 6, 2020

We had an internal CassandraLauncher.scala that was not used. This is removed.

@octonato
Copy link
Member Author

octonato commented Apr 6, 2020

We had an internal CassandraLauncher.scala that was not used. This is removed.

I'm wrong about this. The internal CassandraLauncher is used in dev-mode. I will re-add it.

@@ -34,13 +37,19 @@ class CassandraPersistenceSpec(system: ActorSystem) extends ActorSystemSpec(syst
override def beforeAll(): Unit = {
super.beforeAll()

// Join ourselves - needed because the Cassandra offset store uses cluster startup task
val cluster = Cluster(system)
cluster.join(cluster.selfAddress)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this would be needed? It must be better to have Cassandra running as early as possible so that any things in the test or Lagom itself can connect if they startup as part of the cluster formation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both write and read sides depend on cluter and therefore it's ok to start the cluter first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point is that if they start before Cassandra is started the initial connect to Cassandra may fail. That may cause other problems.

val cluster = Cluster(system)
cluster.join(cluster.selfAddress)

// first ensure that this node is Up (ie: remote port is properly bound)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remote port is bound when the ActorSystem is started (before apply returns)

Copy link
Member Author

@octonato octonato Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are completely right about this. Somehow I convinced myself that the port would only bind when trying to form the cluster. This doesn't make any sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's getting really puzzling.

The erroing test logs says:

 Remoting started with transport [Artery tcp]; listening on address [akka://PersistentEntityTestDriverCompatSpec@127.0.0.1:34359] with UID [3966907252480615512]
[info] PersistentEntityTestDriverCompatSpec: Starting Cassandra on port client port: 34359

So, artery bound on TCP port 34359 at actor system bootstrap. Fine.
Then Cassandra launcher tries to pick a free port and OS returns 34359, which then conflicts when we try to start the Cassandra client.
That doesn't make sense to me. Unless the actor system is not yet bootstrapped when we start the CassandraLauncher.

I can only explain it with the following order of events:

  1. ActorSystem reserves port 34359 and releases the socket
  2. CassandraLauncher reserves port 34359 and releases the socket
  3. Artery starts on port 34359
  4. CassandraLauncher starts and fails to bind on port 34359

(1 and 2 can be swapped)

Maybe forming the cluster before CassandraLauncher is exaggerated, but we should not try to reserve a port for Cassandra before we can confirm that Artery is running.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that scenario is true there is something wrong with the initialization in Artery. It's supposed to bind (and block until completed) before ActorSystem.apply returns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's useful information. Although it makes it scarier now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it!

In the init of the test, we reserve the Cassandra port using CassandraLauncher.randomPort (opens socket, pick port, close socket).

Then the ActorSystem bootstraps and take the same port. Artery bounds to it.

Then CassandraLauncher has the port already fixed and tries to bound to it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, nice catch

@octonato
Copy link
Member Author

octonato commented Apr 7, 2020

This is ready for another review.

Search overall where we were using the CassandraLauncher and made sure that we start it as soon as possible and before the ActorSystem.

We do need to access randomPort in many places in order add it to the config, but better to only do it when Cassandra is already running. First start Cassandra, then read randomPort and then start the ActorSystem.

I'm confident that this and akka/akka-persistence-cassandra#765 will remove a lot of flakiness we have seen in the past. 🤞

Copy link
Contributor

@patriknw patriknw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, that looks like the right thing to do

@octonato octonato changed the title Form Cluster before starting CassandraLauncher Start CassandraLauncher before anything else Apr 7, 2020
@patriknw patriknw changed the title Start CassandraLauncher before anything else Avoid port collisions in tests, start CassandraLauncher first Apr 7, 2020
@patriknw
Copy link
Contributor

patriknw commented Apr 7, 2020

changed title of PR

@mergify mergify bot merged commit 2af24e0 into lagom:master Apr 7, 2020
@octonato
Copy link
Member Author

octonato commented Apr 7, 2020

@Mergifyio backport 1.6.x 1.5.x

@octonato octonato deleted the cassandra-flaky-tests branch April 7, 2020 11:55
@mergify
Copy link
Contributor

mergify bot commented Apr 7, 2020

Command backport 1.6.x 1.5.x: failure

No backport have been created

  • Backport to branch 1.6.x failed

Cherry-pick of 090074a has failed:

On branch mergify/bp/1.6.x/pr-2804
Your branch is up to date with 'origin/1.6.x'.

You are currently cherry-picking commit 090074a50.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:

	modified:   persistence-cassandra/javadsl/src/test/java/com/lightbend/lagom/javadsl/persistence/PersistentEntityRefTest.java
	modified:   persistence-cassandra/javadsl/src/test/scala/com/lightbend/lagom/javadsl/persistence/cassandra/CassandraPersistenceSpec.scala
	modified:   persistence-cassandra/scaladsl/src/test/scala/com/lightbend/lagom/scaladsl/persistence/cassandra/PersistentEntityRefSpec.scala
	modified:   persistence-jdbc/javadsl/src/test/scala/com/lightbend/lagom/javadsl/persistence/jdbc/JdbcPersistenceSpec.scala
	modified:   persistence-jdbc/scaladsl/src/test/scala/com/lightbend/lagom/scaladsl/persistence/jdbc/JdbcPersistenceSpec.scala
	modified:   persistence-jdbc/scaladsl/src/test/scala/com/lightbend/lagom/scaladsl/persistence/slick/SlickPersistenceSpec.scala
	modified:   persistence/core/src/test/scala/com/lightbend/lagom/persistence/ActorSystemSpec.scala
	modified:   testkit/core/src/main/scala/com/lightbend/lagom/internal/testkit/CassandraTestServer.scala

Unmerged paths:
  (use "git add <file>..." to mark resolution)

	both modified:   persistence-cassandra/scaladsl/src/test/scala/com/lightbend/lagom/scaladsl/persistence/cassandra/CassandraPersistenceSpec.scala

  • Backport to branch 1.5.x failed

Cherry-pick of 090074a has failed:

On branch mergify/bp/1.5.x/pr-2804
Your branch is up to date with 'origin/1.5.x'.

You are currently cherry-picking commit 090074a50.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:

	modified:   persistence-cassandra/javadsl/src/test/java/com/lightbend/lagom/javadsl/persistence/PersistentEntityRefTest.java
	modified:   testkit/core/src/main/scala/com/lightbend/lagom/internal/testkit/CassandraTestServer.scala

Unmerged paths:
  (use "git add <file>..." to mark resolution)

	both modified:   persistence-cassandra/javadsl/src/test/scala/com/lightbend/lagom/javadsl/persistence/cassandra/CassandraPersistenceSpec.scala
	both modified:   persistence-cassandra/scaladsl/src/test/scala/com/lightbend/lagom/scaladsl/persistence/cassandra/CassandraPersistenceSpec.scala
	both modified:   persistence-cassandra/scaladsl/src/test/scala/com/lightbend/lagom/scaladsl/persistence/cassandra/PersistentEntityRefSpec.scala
	both modified:   persistence-jdbc/javadsl/src/test/scala/com/lightbend/lagom/javadsl/persistence/jdbc/JdbcPersistenceSpec.scala
	both modified:   persistence-jdbc/scaladsl/src/test/scala/com/lightbend/lagom/scaladsl/persistence/jdbc/JdbcPersistenceSpec.scala
	both modified:   persistence-jdbc/scaladsl/src/test/scala/com/lightbend/lagom/scaladsl/persistence/slick/SlickPersistenceSpec.scala
	both modified:   persistence/core/src/test/scala/com/lightbend/lagom/persistence/ActorSystemSpec.scala

mergify bot added a commit that referenced this pull request Apr 7, 2020
mergify bot added a commit that referenced this pull request Apr 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants