Build is sporadically hanging on unit tests #29
zkcluster blocks the execution thread until it connects to ZooKeeper. Since we start an embedded ZooKeeper instance locally, there might be a timing issue.
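A minimal sketch of one way to guard against this timing issue. It assumes the test only needs to know when the embedded server accepts TCP connections, and polls the port with a bounded deadline before starting anything that connects. `ZkReadiness` and `awaitPortOpen` are hypothetical names, not part of squbs or ZooKeeper. An open port is also only a weak proxy for a ZooKeeper server that is ready to serve sessions.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.concurrent.duration._
import scala.util.Try

object ZkReadiness {
  // Hypothetical helper: poll until something accepts TCP connections on
  // host:port, giving up once `timeout` has elapsed.
  // Returns true if the port became reachable before the deadline.
  def awaitPortOpen(host: String, port: Int, timeout: FiniteDuration,
                    interval: FiniteDuration = 100.millis): Boolean = {
    val deadline = timeout.fromNow

    // One connection attempt; any exception (e.g. connection refused) counts as "not yet".
    def reachable(): Boolean = Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), 500) // 500 ms connect timeout
      finally socket.close()
    }.isSuccess

    var ok = reachable()
    while (!ok && deadline.hasTimeLeft()) {
      Thread.sleep(interval.toMillis)
      ok = reachable()
    }
    ok
  }
}
```

In a test setup this could be used as `require(ZkReadiness.awaitPortOpen("127.0.0.1", zkPort, 30.seconds), "embedded ZooKeeper did not start")`, where `zkPort` stands for whatever port the embedded instance was given. The bounded wait turns a potential hang into an explicit failure.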
Good point. How do we make it safe?
Here is the code: https://github.com/paypal/squbs/blob/master/squbs-zkcluster/src/test/scala/org/squbs/cluster/ZkClusterMultiActorSystemTestKit.scala#L107
I do not think the builds are hanging because of zkcluster. Some tests in unicomplex (MultilistenerSpec) and in test-kit (CustomTestkitSpec) have issues. In my local environment, I even see compile issues with some test classes. I will update when I have more information.
The issues I mentioned in my previous comment seem to be unrelated, though they still need to be addressed. I built zkcluster by itself many times and saw no hanging at all. Builds with zkcluster excluded also hung. The problem seems to be caused by ActorMonitorSpec, which gets stuck in this loop:
By the time you get to L65, the system is already fully initialized. Individual actor initialization is asynchronous, however, so some actors may not have started yet. You're right, we could not use an event to wake it up. Still, an infinite loop does not seem right either; we should fail the build instead. Failing solves the hang, but would leave us with sporadic failures, since a failure would just mean that fewer than 12 actors are currently active. So we need to make sure all 12 actors are active before checking. We can do this by sending a message to each actor and awaiting its response, and then checking that all 12 actors expose their stats through JMX. Let me try this out.
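The "ping first, then check" idea can be sketched with plain Scala futures. This is a hedged illustration only: `Ping` is a hypothetical stand-in for an Akka `ask` to one actor, and `WarmUp.pingAll` is not an existing squbs or Akka API. The point it shows is that the wait is bounded, so a missing reply fails the test instead of hanging it.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object WarmUp {
  // Hypothetical stand-in for asking one actor and getting its reply.
  type Ping = () => Future[String]

  // Send one message to every actor, then block until all have replied.
  // Await.result throws a TimeoutException if any reply is still missing
  // after `timeout`, so the test fails rather than looping forever.
  def pingAll(pings: Seq[Ping], timeout: FiniteDuration)
             (implicit ec: ExecutionContext): Seq[String] =
    Await.result(Future.sequence(pings.map(p => p())), timeout)
}
```

Assuming each `Ping` really round-trips through its actor, every actor has processed at least one message once `pingAll` returns, so the JMX check that follows no longer races actor start-up.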
We already used awaitAssert here: https://github.com/paypal/squbs/blob/master/squbs-actormonitor/src/test/scala/org/squbs/actormonitor/ActorMonitorSpec.scala#L114
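For readers without the test kit at hand, the retry pattern behind `awaitAssert` can be approximated with the standard library, together with a JMX query that counts registered MBeans. This is a sketch only: `JmxCheck` is a hypothetical helper, the real test should keep using the test kit's own `awaitAssert`, and the ObjectName pattern under which squbs registers its actor beans is not given in this thread.

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName
import scala.concurrent.duration._

object JmxCheck {
  // Count MBeans on the platform MBeanServer whose names match `pattern`.
  def countMBeans(pattern: String): Int =
    ManagementFactory.getPlatformMBeanServer
      .queryNames(new ObjectName(pattern), null)
      .size

  // Minimal stand-in for a test kit's awaitAssert: re-run `assertion` until it
  // stops throwing AssertionError; once `max` has elapsed, the last failure propagates.
  def awaitAssert(max: FiniteDuration, interval: FiniteDuration = 100.millis)
                 (assertion: => Unit): Unit = {
    val deadline = max.fromNow
    var done = false
    while (!done) {
      try { assertion; done = true }
      catch {
        case _: AssertionError if deadline.hasTimeLeft() => Thread.sleep(interval.toMillis)
      }
    }
  }
}
```

A check such as `JmxCheck.awaitAssert(5.seconds) { assert(JmxCheck.countMBeans(pattern) == 12) }`, with `pattern` matching the actor monitor's beans, would retry until all 12 beans appear and otherwise fail with the last assertion error. Retrying alone does not make a lazily created actor start, which is why pinging the actors first still matters.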
I really like this approach. My only concern is that awaitAssert itself does not force the actors to become active. Is it possible for some of these actors to get stuck in actor shell creation (an empty shell)? Because they never received a message, the actor itself may be created a bit later, or its creation may even be optimized into lazy initialization. I'd still want to ping each of the critical actors in this test once, just to make sure they're good. The Identify message is hopefully enough to ensure the actor is actually created (causing its JMX beans to be created). But if we want a sure path, we probably need to hit each actor with an application message. That can be done, too.
I am open to that.
I think this issue is resolved. Please let me know before I close it. Thx!
It looks like we still have sporadic failures around those lines, even though they are much less frequent. I would keep this open until we fully fix it. Here is the related Gitter message from you: https://gitter.im/paypal/squbs?at=55ea283e0b6aa72b12ffd02d.
Failures resolved. We should not have sporadic build failures any longer.
Need to investigate hanging build.