Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cancel of TestCaseExecutor interruption job if test succeeds #590

Conversation

LeoColman
Copy link
Member

As it is, the current TestCaseExecutor will by mistake recycle a thread from a threadpool between multiple test executions. That shouldn't be a problem. However, the scheduler will cancel the executor after a timeout has passed, and if another thread is running there, it will be interrupted.

This interruption is undesired, as it had nothing to do with the thread that successfully passed without a timeout.

This commit cancels the interruption job if the test ran successfully, avoiding this issue.

Fixes #588

@LeoColman
Copy link
Member Author

@sksamuel @sschuberth
Here's what I did here:
I created a scenario that I believe is a reproduction of the bug, however I'm not sure if it's actually a reproduction or another bug that was found while trying it.

This scenario exercises my main hypothesis of "I believe the tests are sharing a timeout". It works by creating a big timeout that will be "accumulated" between tests, and one of the tests will have a "Thread.sleep interrupted" message. This, I believe, is the executor being shutdown from another test.

When I added a job.cancel() inside TestCaseExecutor, this behavior stopped happening, and all tests complete flawlessly, because I won't run the interruption if the test succeeds.

@LeoColman
Copy link
Member Author

If you remove the cancel, a test from the added class will fail for apparently no reason.

@LeoColman LeoColman force-pushed the bugfix/588-tests-are-being-interrupted-before-timeout-is-reached branch from c44672f to fb5b43e Compare January 18, 2019 00:43
@LeoColman LeoColman force-pushed the bugfix/588-tests-are-being-interrupted-before-timeout-is-reached branch from fb5b43e to 40f91b1 Compare January 18, 2019 01:20
@LeoColman LeoColman force-pushed the bugfix/588-tests-are-being-interrupted-before-timeout-is-reached branch from 40f91b1 to 2b8d7fe Compare January 18, 2019 10:39

// This cancel is necessary, or else the scheduler is reutilized by another test, and the shutdown will happen
// in the other test. This leads to tests "sharing" a timeout, and another test being timed out due to this task.
scheduledInterrupt.cancel(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this. If the scheduled runnable still runs, all it does is shutdown the executor. The executor that will have already been shutdown, so what's the issue? There's no problem if it still runs right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also pretty confused by this, as the Executor should be a different one every time. However, the issue seems to be on TestEngineListener, as it's shared between tests. executor will be listenerExecutor for single-threaded tests, and these will be interrupted as the executor shuts down

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I'm not completely sure if this is a Threads issue or a Coroutines issue of some sort, preventing the executor from shutting down if the test completes will fix the scenario that I created in TestCaseExecutorTest

@LeoColman
Copy link
Member Author

Running more tests now it made me realize this error has nothing to do with a threadpool. I'll remove it from the test scenario

@LeoColman
Copy link
Member Author

The error is specifically with TestEngineListener

As it is, the current TestCaseExecutor will by mistake recycle a thread from a threadpool between multiple test executions. That shouldn't be a problem. However, the scheduler will cancel the executor after a timeout has passed, and if another thread is running there, it will be interrupted.

This interruption is undesired, as it had nothing to do with the thread that successfully passed without a timeout.

This commit cancels the interruption job if the test ran successfully, avoiding this issue.

Fixes #588
@LeoColman LeoColman force-pushed the bugfix/588-tests-are-being-interrupted-before-timeout-is-reached branch from 2b8d7fe to ce3e124 Compare January 18, 2019 11:12
@sksamuel
Copy link
Member

The error is what you said earlier I think.
The executor that we interrupt is not always a unique executor.
Sometimes it's the shared listener executor, which we cannot shut down.

@LeoColman
Copy link
Member Author

Ok, so there are two options.
We either cancel the task if the test succeeds (current option)
Or only cancel the task if it's not ListenerExecutor.

I think it makes more sense to cancel the task

@sksamuel
Copy link
Member

Cancel won't do anything unless the user checks for it in their code. Which they won't 99% of the time.

The solution is to not share the listener executor. Always create a new one every time.
Then we can interrupt it and all will be well but... we can't call shutdown on the executor as that will prelude submitting the "after test" listener callbacks in the same thread.

Hmm...

@LeoColman
Copy link
Member Author

Cancel won't do anything unless the user checks for it in their code.

What do you mean with this?
Cancel is used to stop the interruption job, not to cancel the thread/coroutine

This have nothing to do with user code, unless the code doesn't respons to a Interrupt call from thread, but if it's the case, a non-shared executor won't help anyways

@sksamuel
Copy link
Member

Cancel won't do anything unless the user checks for it in their code.

What do you mean with this?
Cancel is used to stop the interruption job, not to cancel the thread/coroutine

This have nothing to do with user code, unless the code doesn't respons to a Interrupt call from thread, but if it's the case, a non-shared executor won't help anyways

Yes I meant the coroutine. We can cancel the scheduler but it won't help.

@sksamuel
Copy link
Member

I think my original approach is wrong, and kotlintest 3.2.0 is fundamentally broken on test suites that run for over 10 minutes that use threads=1 because they all share the listener executor.

Calling cancel on the scheduler - there's no point even adding it in the first place as we cannot interrupt this executor full stop.

This is my proposal. For each spec we create a new single threaded listener executor (delete the old shared executor).
We launch the before callbacks in this executor.
We then launch the tests using this executor or if threads>1 we create a new executor.
We add a scheduled event into the shared scheduler, which upon timeout fails that test.
We launch the after callbacks in the listener executor.

Only then do we call shutdown on the listener executor (and the test executor if threads>1).
Not shutdownNow but simply shutdown.
If the user has infinite loops then they're on their own.

I don't like it as it's just not correct to leave the threads running in the background but I'm not sure what else we can do.

@LeoColman
Copy link
Member Author

I don't think leaving threads running isn't a biiiig problem. As you said, users are on their own if their threads are not interruptable and run indefinetly. This is such a rare scenario that I don't think will even happen.

I'll try to implement your proposal and add it to this commit if you won't focus on this now.

@LeoColman
Copy link
Member Author

But if you want to take it and use this branch, let me know

@LeoColman
Copy link
Member Author

We should fix it ASAP, as this will break a lot of test suites in 3.2.0

@LeoColman
Copy link
Member Author

If you want to add the cancel on the scheduler as a hotfix now, while we implement the correct fix, I'm for it aswell.

@LeoColman
Copy link
Member Author

I'm trying to ignore the ExecutorListener, but even when doing this, tests in a same spec will still fail if the cancel doesn't happen. Unless we give an executor to each Test, and ignore the threads. I think cancel on the Interrupt is kinda OK for now.

Let me know if you think differently

@sksamuel
Copy link
Member

I can fix and release tonight if you want me to do it.
No point hot fixing, I'll do it properly for 3.2.1

@LeoColman
Copy link
Member Author

LeoColman commented Jan 18, 2019

Sure thing, I think it will be better if you do it.
I don't have a lot of experience on these runners.

Use this branch if you want, as the test can be used from it.

@sksamuel
Copy link
Member

sksamuel commented Jan 18, 2019 via email

@sksamuel
Copy link
Member

All tests now pass using an implementation as I described earlier.

@sksamuel sksamuel merged commit 71eb4d3 into master Jan 19, 2019
@sksamuel sksamuel deleted the bugfix/588-tests-are-being-interrupted-before-timeout-is-reached branch January 19, 2019 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants