Build is sporadically hanging on unit tests #29

Closed
akara opened this issue Aug 21, 2015 · 12 comments

@akara
Contributor

akara commented Aug 21, 2015

Need to investigate hanging build.

@akara akara added the bug label Aug 21, 2015
@akara akara added this to the RELEASE-0.7.X milestone Aug 21, 2015
@zhuchenwang
Collaborator

zkcluster blocks the execution thread until it connects to ZooKeeper. Since we start an embedded ZooKeeper instance locally, there might be a timing issue.

@akara
Contributor Author

akara commented Aug 21, 2015

Good point. How do we make it safe?

@zhuchenwang
Collaborator

Here is the code https://github.com/paypal/squbs/blob/master/squbs-zkcluster/src/test/scala/org/squbs/cluster/ZkClusterMultiActorSystemTestKit.scala#L107
I might not have much time to look at this in the near future. If you can't figure it out, I can take care of it later. BTW, can I access Travis CI now?

@anilgursel
Collaborator

I do not think the builds are hanging because of zkcluster. Some tests in unicomplex (MultilistenerSpec) and in test-kit (CustomTestkitSpec) have issues. Locally, I even see compile issues with some test classes. Will update when I have more information.

@anilgursel
Collaborator

The issues I mentioned in my previous comment seem to be unrelated. They still need to be addressed, though.

I built zkcluster on its own many times with no hangs at all. Builds with zkcluster excluded were also hanging.

The problem seems to be caused by ActorMonitorSpec. It gets stuck in this loop:

https://github.com/paypal/squbs/blob/master/squbs-actormonitor/src/test/scala/org/squbs/actormonitor/ActorMonitorSpec.scala#L65

@akara
Contributor Author

akara commented Aug 24, 2015

By the time you get to L65, the system is already fully initialized. Individual actor initialization is asynchronous, however, so some actors may not have started yet. You're right, we could not use an event to wake it up. Yet an infinite loop does not seem to be the right thing either. We definitely should fail the build instead.

Failing the build solves the hang, but would get you into sporadic failures: a failure just means fewer than 12 actors are active at that moment. So we need to make sure all 12 actors are active before checking. This can be done by sending a message to each actor and awaiting the responses. Then we check that the 12 actors expose their stats through JMX.
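
A minimal sketch of the "fail instead of looping forever" part, in plain Java against the JVM's JMX API rather than the actual ActorMonitorSpec code. The class name `MBeanWait`, the method `awaitMBeanCount`, and the idea of counting beans by an `ObjectName` pattern are assumptions for illustration; the real test may look up its beans differently.

```java
import java.time.Duration;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper, not part of squbs: a bounded replacement for an
// unbounded "loop until the beans show up" check.
public final class MBeanWait {

    private MBeanWait() {}

    /**
     * Polls the MBean server until at least `expected` registered names match
     * `pattern`. Returns the number found, or throws AssertionError once the
     * timeout has elapsed, so a build fails rather than hangs.
     */
    public static int awaitMBeanCount(MBeanServer server,
                                      ObjectName pattern,
                                      int expected,
                                      Duration timeout,
                                      Duration interval) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            // queryNames with a null query returns every name matching the pattern.
            int found = server.queryNames(pattern, null).size();
            if (found >= expected) {
                return found;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("Expected at least " + expected
                        + " MBeans matching " + pattern + " but found " + found);
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

With the platform MBean server from `ManagementFactory.getPlatformMBeanServer()`, a call such as `awaitMBeanCount(server, pattern, 12, Duration.ofSeconds(10), Duration.ofMillis(100))` would either see the 12 beans or fail within about ten seconds. The timeout and interval values here are placeholders.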

Let me try this out.

@zhuchenwang
Collaborator

We already use awaitAssert here: https://github.com/paypal/squbs/blob/master/squbs-actormonitor/src/test/scala/org/squbs/actormonitor/ActorMonitorSpec.scala#L114
Shall we just change the whole test case to use awaitAssert? Then we can probably get rid of the infinite loop.
The idea is that if everything goes correctly, the actors will be started at some point, so we just give the test more chances to get the bean value. If some actors were not started correctly, something must be wrong, and the build should fail.
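
For readers unfamiliar with the pattern, here is a rough plain-Java illustration of the retry-until-deadline behaviour described above. It is not Akka TestKit's implementation; the class `RetryAssert` and its parameters are made up for this sketch.

```java
import java.time.Duration;

// Hypothetical stand-in that mimics the idea behind awaitAssert:
// re-run an assertion until it passes or the time budget is used up.
public final class RetryAssert {

    private RetryAssert() {}

    /**
     * Runs `assertion` repeatedly. Returns as soon as it completes without
     * throwing AssertionError; rethrows the last AssertionError once `max`
     * has elapsed.
     */
    public static void awaitAssert(Runnable assertion,
                                   Duration max,
                                   Duration interval) throws InterruptedException {
        long deadline = System.nanoTime() + max.toNanos();
        while (true) {
            try {
                assertion.run();
                return; // assertion held, nothing more to do
            } catch (AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: surface the real failure
                }
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

The useful property is that a slow start only costs a few extra retries, while an actor that never starts turns into an ordinary test failure with the original assertion message.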

@akara
Contributor Author

akara commented Aug 24, 2015

I really like this path. The only concern I have is that awaitAssert itself does not force the actor to become active. So is it possible for some of these actors to get stuck in actor shell creation (an empty shell)? Because they never received a message, could the creation of the actor itself happen a bit later, or even get optimized into lazy initialization?

I'd still want to ping each of the critical actors in this test once, just to make sure they're good. The identify message is hopefully good enough to ensure the actor is indeed created (causing the JMX beans to be created). But if we want a sure path, we probably need to hit each actor with an app message. That can be done, too.
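
The "ping each actor, then check" step can be pictured without Akka as waiting on one pending reply per target. In this sketch each `CompletableFuture` stands in for the reply to an identify or app message; the class `PingAll`, the method `awaitAllReplies`, and the map of names to futures are all hypothetical.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: each future models the reply from one pinged actor.
public final class PingAll {

    private PingAll() {}

    /**
     * Waits until every reply has arrived. If the timeout elapses or a reply
     * failed, throws AssertionError naming the targets that did not answer.
     */
    public static void awaitAllReplies(Map<String, CompletableFuture<?>> replies,
                                       Duration timeout) throws InterruptedException {
        CompletableFuture<Void> all = CompletableFuture.allOf(
                replies.values().toArray(new CompletableFuture<?>[0]));
        try {
            all.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            List<String> missing = new ArrayList<>();
            for (Map.Entry<String, CompletableFuture<?>> entry : replies.entrySet()) {
                CompletableFuture<?> reply = entry.getValue();
                if (!reply.isDone() || reply.isCompletedExceptionally()) {
                    missing.add(entry.getKey());
                }
            }
            throw new AssertionError("No reply from: " + missing, e);
        }
    }
}
```

Only after this returns would the test go on to look for the JMX beans, so a missing bean then points at the monitoring code rather than at an actor that had not started yet.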

@zhuchenwang
Collaborator

I am open to that.

@anilgursel anilgursel changed the title from "Build is sporadically hanging on zkcluster" to "Build is sporadically hanging" Aug 24, 2015
@anilgursel anilgursel changed the title from "Build is sporadically hanging" to "Build is sporadically hanging on unit tests" Aug 24, 2015
@anilgursel anilgursel self-assigned this Aug 24, 2015
akara added a commit that referenced this issue Aug 25, 2015
Build is sporadically hanging #29
@akara
Contributor Author

akara commented Sep 8, 2015

I think this issue is resolved. Please let me know before I close it. Thx!

@anilgursel
Collaborator

It looks like we still have sporadic failures around those lines, even though the frequency is much lower. I would keep this open until we fully fix it. Link to your related Gitter message: https://gitter.im/paypal/squbs?at=55ea283e0b6aa72b12ffd02d.

az-qbradley added a commit that referenced this issue Sep 9, 2015
#29: Fix rest of sporadic build…
@akara
Contributor Author

akara commented Sep 10, 2015

Failures resolved. We should not have sporadic build failures any longer.

@akara akara closed this as completed Sep 10, 2015

3 participants