Build is sporadically hanging on unit tests #29

akara · 2015-08-21T01:35:54Z

Need to investigate hanging build.

zhuchenwang · 2015-08-21T17:57:15Z

zkcluster blocks the execution thread until it connects to ZooKeeper. Since we start an embedded ZooKeeper instance locally, there might be a timing issue.

akara · 2015-08-21T17:59:55Z

Good point. How do we make it safe?

zhuchenwang · 2015-08-21T18:02:58Z

Here is the code: https://github.com/paypal/squbs/blob/master/squbs-zkcluster/src/test/scala/org/squbs/cluster/ZkClusterMultiActorSystemTestKit.scala#L107
I might not have much time to look at this in the near term. If you guys can't figure it out, I can take care of it later. BTW, can I access Travis CI now?

anilgursel · 2015-08-22T18:42:52Z

I do not think the builds are hanging because of zkcluster. Some tests in unicomplex (MultilistenerSpec) and in test-kit (CustomTestkitSpec) have issues. On my local machine, I even see compile issues with some test classes. Will update when I have more information.

anilgursel · 2015-08-22T21:40:28Z

The issues I mentioned in my previous comment seem to be unrelated. They still need to be addressed, though.

I built zkcluster on its own many times and it never hung. Builds with zkcluster excluded were also hanging.

The problem seems to be caused by ActorMonitorSpec. It gets stuck in this loop:

https://github.com/paypal/squbs/blob/master/squbs-actormonitor/src/test/scala/org/squbs/actormonitor/ActorMonitorSpec.scala#L65

akara · 2015-08-24T17:13:00Z

By the time you get to L65, the system is already fully initialized. Individual actor initialization is asynchronous and therefore some actors are not yet started. You're right, we could not use an event to wake it up. Yet, an infinite loop also does not seem to be the right thing. We definitely should fail the build instead.

This solves the hang, but would get you into sporadic failures. A failure there just means that fewer than 12 actors are currently active. So we need to make sure all 12 actors are active before checking. This can be done by sending a message to the actors and awaiting their responses. Then we check that the 12 actors expose their stats through JMX.

Let me try this out.

zhuchenwang · 2015-08-24T19:11:04Z

We already use awaitAssert here: https://github.com/paypal/squbs/blob/master/squbs-actormonitor/src/test/scala/org/squbs/actormonitor/ActorMonitorSpec.scala#L114
Shall we just change all the test cases to use awaitAssert? Then we can probably get rid of the infinite loop.
The idea here is that if everything goes correctly, the actors will be started at some point in time, so we just give the test more chances to get the bean value. If some actors were not started correctly, that means something is wrong, and the build should fail.
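
For illustration only, here is a minimal sketch of the bounded-retry idea, using just the Scala standard library. It is not the code in ActorMonitorSpec and not how Akka implements `awaitAssert`; the names `BoundedRetry` and `awaitCondition` are hypothetical. The point is that the check is retried until a deadline and then reports failure, instead of looping forever.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

// Hypothetical helper, not part of squbs or Akka.
object BoundedRetry {

  /** Re-evaluates `check` every `interval` until it returns true or `max` has elapsed.
    * Returns true if the check passed in time, false otherwise.
    */
  def awaitCondition(max: FiniteDuration, interval: FiniteDuration)(check: => Boolean): Boolean = {
    val deadline = max.fromNow

    @tailrec
    def poll(): Boolean =
      if (check) true                       // condition met, stop polling
      else if (deadline.isOverdue()) false  // out of time, report failure instead of hanging
      else {
        Thread.sleep(interval.toMillis)     // wait a bit before the next attempt
        poll()
      }

    poll()
  }
}
```

A test would then assert on the result, for example `assert(BoundedRetry.awaitCondition(5.seconds, 100.millis)(beanCount == 12))`, where `beanCount` stands in for whatever reads the bean value. If the actors never start, the assertion fails and the build fails rather than hangs.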

akara · 2015-08-24T19:47:49Z

I really like this path. The only concern I have is that awaitAssert itself does not force the actor to become active. So is it possible for some of these actors to get stuck in actor shell creation (an empty shell)? Because they never received a message, could the creation of the actor itself happen a bit later, or even get optimized into lazy initialization?

I'd still want to ping each of the critical actors in this test once, just to make sure they're good. The identify message is hopefully good enough to ensure the actor is indeed created (causing the JMX beans to be created). But if we want a sure path, we probably need to hit each actor with an app message. That can be done, too.

zhuchenwang · 2015-08-24T20:52:34Z

I am open to that.

akara · 2015-09-08T06:17:19Z

I think this issue is resolved. Please let me know before I close it. Thx!

anilgursel · 2015-09-08T16:30:30Z

It looks like we still have sporadic failures around those lines, even though they are much less frequent. I would keep this open until we fully fix it. Link to your related Gitter message: https://gitter.im/paypal/squbs?at=55ea283e0b6aa72b12ffd02d

akara · 2015-09-10T21:21:50Z

Failures resolved. We should not have sporadic build failures any longer.

akara added the bug label Aug 21, 2015

akara added this to the RELEASE-0.7.X milestone Aug 21, 2015

anilgursel changed the title ~~Build is sporadically hanging on zkcluster~~ Build is sporadically hanging Aug 24, 2015

anilgursel changed the title ~~Build is sporadically hanging~~ Build is sporadically hanging on unit tests Aug 24, 2015

anilgursel self-assigned this Aug 24, 2015

akara added a commit that referenced this issue Aug 25, 2015

Merge pull request #41 from anilgursel/master

f9ac22b

Build is sporadically hanging #29

az-qbradley added a commit that referenced this issue Sep 9, 2015

Merge pull request #67 from akara/codacy

deaff9f

#29: Fix rest of sporadic build…

akara closed this as completed Sep 10, 2015
