Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition between meck:unload/1 and calls to the mocked module #150

Merged
merged 2 commits into from
Aug 22, 2015
Merged

Fix race condition between meck:unload/1 and calls to the mocked module #150

merged 2 commits into from
Aug 22, 2015

Conversation

dszoboszlay
Copy link
Contributor

When meck:unload(M) and a call to the mocked module being unloaded (e.g. M:foo()) happens concurrently, the call may fail with error {not_mocked, M} (similar to e.g. #144, although that may be a different issue).

The correct behaviour would be for M:foo() to either procude the mocked result or execute the original, unmocked code, but it shouldn't fail.

This could be achieved by meck_code_gen:exec/4 catching the error thrown when meck_proc is down, and failing back to a regular apply into the (no longer mocked) code instead. (Turning meck_proc:invalidate/1 into a cast from a call ensures only meck_proc:get_result_spec/3 could fail with said error from the functions used within meck_code_gen:exec/4.)

@eproxus
Copy link
Owner

eproxus commented Aug 21, 2015

Hi! I agree this edge case needs to be handled.

I looked at the solution, and I'll try to summarize how the test case works. Correct me if I'm wrong.

  1. It suspends the mocked process.
  2. It runs concurrent_req twice, once with a function that unloads the mock and once with a function that calls the module.
  3. These functions are spawned and monitored, and it is checked that their messages end up in the message queue of the mock process.
  4. Once they are in the message queue, the mock process is started and it is verified that both request processes have exited normally.

Isn't there a race condition, because both processes are spawned, and then check for Msgs + 1 messages being in the queue? E.g. one message is delivered and both processes think this is their message?

Also, what does this test in practice? The order in the message queue is not guaranteed (because of the above problem) and it is not guaranteed that the test will actually call the mock in the state where the process is gone but the module not yet reloaded?

@dszoboszlay
Copy link
Contributor Author

You are correct in bullet points 1-4, however there's a mistake in your conclusion.

concurrent_req waits for the spawned process to place its message into the meck_proc's message queue before returning. Therefore the first message (from meck:unload(meck_test_module)) will already be in the message queue when concurrent_req is called the second time (with meck_test_module:a()).

So this construct will guarantee that there will be two messages in the meck_proc's message queue in the intended order when the process is resumed. I know this is not a super easy to read test, but it is at least a reliable way of testing a race condition that would otherwise happen rarely. (Except on our Jenkins server, where this bug keeps hitting us surprisingly and annoyingly often...)

Regarding the what do we test in practice question: there are many possible ordering of events that could lead to this race condition. The test picks one possible ordering (which we can reproduce deterministically thanks to the suspend/resume mechanism):

  1. Process A sends a {stop, M} message to meck_proc.
  2. Process B calls into (the mocked version of) M.
  3. Process B sends a get_result_spec message to meck_proc.
  4. meck_proc starts processing the {stop, M} message.
  5. meck_proc deletes the mocked version of M.
  6. meck_proc terminates.
  7. The gen_server call in process B crashes.

But the following scenario would also lead to the same result:

  1. Process B calls into (the mocked version of) M.
  2. Process A sends a {stop, M} message to meck_proc.
  3. Process B sends a get_result_spec message to meck_proc.
  4. meck_proc starts processing the {stop, M} message.
  5. meck_proc deletes the mocked version of M.
  6. meck_proc terminates.
  7. The gen_server call in process B crashes.

Or:

  1. Process B calls into (the mocked version of) M.
  2. Process A sends a {stop, M} message to meck_proc.
  3. meck_proc starts processing the {stop, M} message.
  4. Process B sends a get_result_spec message to meck_proc.
  5. meck_proc deletes the mocked version of M.
  6. meck_proc terminates.
  7. The gen_server call in process B crashes.

Or:

  1. Process A sends a {stop, M} message to meck_proc.
  2. meck_proc starts processing the {stop, M} message.
  3. Process B calls into (the mocked version of) M.
  4. meck_proc deletes the mocked version of M.
  5. meck_proc terminates.
  6. Process B sends a get_result_spec message to meck_proc.
  7. The gen_server call in process B crashes.

And so on. Basically the only constraints for the issue to turn up are that:

  • Process B must call into M before meck_proc would delete the mocked version of M.
  • Process A must send the {stop, M} message before process B would send the get_result_spec message.

Btw. you can check that this test deterministically fails in 6b3eaf9, where the fix is not yet implemented. Which sort of proves the correctness of the test. 😄

@eproxus
Copy link
Owner

eproxus commented Aug 21, 2015

I think I get it: the call to meck_test_module is done while that module is still the mocked shim. This guarantees that the code will make a call to the mock process, which in turn has a stop message waiting before the call message, which in turn guarantees that the process will terminate before the call message arrives, thus triggering the bug.

@eproxus
Copy link
Owner

eproxus commented Aug 21, 2015

I think turning invalidate into a cast is fine, if the process is down by that point getting the state of the mock won't work anyway.

Now on to the more philosophical question: Is this actually a nice behavior? Getting the not_mocked error when trying to call a mock that is shut down (but not yet unloaded) is happening because you have test code that still thinks there is a valid mock. I.e. you've called unload before you were done with the test. This fix would obscure such issues and perhaps even produce "false" test results.

@dszoboszlay
Copy link
Contributor Author

I think the nice behaviour would be that while meck mocks/unmocks module M any other concurrently happening call into M would either execute the original code of M or return the mocked response. Which one would happen out of these two options can be left unspecified, but crashing the call because meck was doing something in the meantime is not reasonable.

Please also consider that once meck:unload(M) returns I can keep on using M - it will execute the original implementation of M, and won't throw {not_mocked, M}.

I also doubt this change would produce "false" test results. First of all, most tests are synchronous, they execute all operations in the same process that calls meck:new/1 and meck:unload/1. And even if the test would start concurrent processes to do calls into a mocked module, it would be definitely the test's responsibility to stop all such processes before cleaning up the mocks.

In my particular case when this issue comes up I start application A to test application B. I change A's behaviour slightly with meck, using the passthrough option to leave most of its functionality intact. However, A does some periodic tasks on its own (independent from the tests) that may concurrently call into the mocked module. You may of course argue that in that case I should first stop A, then call meck:unload/1, but that approach is not practical due to the huge overhead of starting and stopping A all the time.

And my final argument is that before this patch if your test happens to make a concurrent call to a module being unloaded three things may occur:

  1. You get the mocked reply.
  2. You execute the module's original code.
  3. You get a {not_mocked, M} error, which is the least likely to happen, and you probably won't even consider this option at all. So it would be a weird timing issue in your tests.

This patch would only eliminate the option 3, the weird timing issue. You may still have timing issues (if your test depends on specifically either option 1 or option 2), but that would be more likely to turn up and easier to debug.

@eproxus
Copy link
Owner

eproxus commented Aug 22, 2015

True, it is a good point that the behavior is the same after the module is fully unloaded anyway.

eproxus added a commit that referenced this pull request Aug 22, 2015
Fix race condition between meck:unload/1 and calls to the mocked module
@eproxus eproxus merged commit 710ad9a into eproxus:master Aug 22, 2015
@jsvisa
Copy link

jsvisa commented Sep 11, 2015

💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants