
Common Test hooks mishandle exit signals with link #7119

Closed
jcpetruzza opened this issue Apr 13, 2023 · 5 comments · Fixed by #7246
Assignees
Labels
bug Issue is reported as a bug priority:low team:PS Assigned to OTP team PS

Comments

@jcpetruzza

I have a Common Test hook that looks something like this:

pre_end_per_testcase(Suite, TC, ConfigOrSkipOrFail, State) ->
    case ConfigOrSkipOrFail of
        Config0 when is_list(Config0) ->
            Handle = start_something_that_MUST_be_closed_at_the_end(Suite, TC, Config0),
            Config1 = [{please_close_me, Handle} | Config0],
            {Config1, State};

        SkipOrFail ->
            {SkipOrFail, State}
    end.

post_end_per_testcase(Suite, TC, Config, Return, State) ->
    case proplists:get_value(please_close_me, Config) of
        undefined -> ok;
        Handle -> close_that_thing(Handle)
    end,
    {Return, State}.

Consider these possible testcases:

test_ok(Config) ->
    ok.

test_normal_failure(Config) ->
    ?assert(false).

test_timetrap(Config) ->
    ct:timetrap({seconds, 0}),
    ct:sleep(infinity).

test_exit_signal(Config) ->
    TestProc = self(),
    spawn(fun() -> exit(TestProc, boom) end),
    ct:sleep(infinity).

test_exit_signal_with_link(Config) ->
    spawn_link(fun() -> error(bam) end),
    ct:sleep(infinity).

I observe 3 possible behaviours of CT with respect to the hook above (using OTP 25), depending on the testcase:

  1. The good: for test_ok or test_normal_failure we get:
    - pre_end_per_testcase/4 is called, then
    - end_per_testcase/2 is called, and finally
    - post_end_per_testcase/5 is called, with Config bound to the return value of pre_end_per_testcase/4.
  2. The ugly: for test_timetrap or test_exit_signal we get:
    - pre_end_per_testcase/4 is not called,
    - end_per_testcase/2 is not called, and
    - post_end_per_testcase/5 is called, with Config bound to the same value the testcase received.
  3. The bad: for test_exit_signal_with_link we get:
    - pre_end_per_testcase/4 is called, then
    - end_per_testcase/2 is called, and finally
    - post_end_per_testcase/5 is called, but with Config bound to the same value the testcase received!

As a hook author, case 2 is a bit ugly, since one needs to account for the fact that pre_end_per_testcase may sometimes not get called, but the problematic one is case 3:

  • It can be a bit surprising, imo, that the behaviour here is not the same as for the other case of exit signals, but I assume there are good reasons for this.
  • Now, the fact that the Config passed to post_end_per_testcase/5 is not what pre_end_per_testcase/4 returned sounds like a bug? In the hook above, it means that there is a Handle that would not get closed. It is possible to work around this, I think, but it is not nice.
@jcpetruzza jcpetruzza added the bug Issue is reported as a bug label Apr 13, 2023
@u3s u3s added the team:PS Assigned to OTP team PS label Apr 14, 2023
u3s (Contributor) commented Apr 26, 2023

  • can you try using CTHState for storing the handle to a resource created and cleaned by CTH code? it seems to be maintained in the case when the test does not execute as expected; I think it also makes more sense for a resource managed by the CTH;
  • additionally, you can share the handle with test code in Config, so that for example end_per_testcase can do something with the resource if that is wanted;
  • why are you setting ct:timetrap({seconds, 0})? this is aggressive and prevents executing pre_end_per_testcase - at least in my environment. if I increase it to something bigger, I see pre_end_per_testcase called for every test case from your example ... can you confirm that observation?

Apart from the above, losing Config in case of a test process crash does not surprise me that much in this scenario ... I don't think this is a bug at this stage.

@u3s u3s removed the bug Issue is reported as a bug label Apr 27, 2023
@u3s u3s added the waiting waiting for changes/input from author label May 5, 2023
u3s (Contributor) commented May 10, 2023

ping @jcpetruzza

@u3s u3s added not a bug Issue is determined as not a bug by OTP bug Issue is reported as a bug labels May 10, 2023
jcpetruzza (Author) commented

Thanks for looking into this, @u3s, and sorry for the very late reply; I just returned from some holidays :)

why are you setting ct:timetrap({seconds, 0})? this is aggressive and prevents executing pre_end_per_testcase - at least in my environment. if I increase it to something bigger, I see pre_end_per_testcase called for every test case from your example ... can you confirm that observation?

I agree that I shouldn't have used ct:timetrap({seconds, 0}) to reproduce the issue, especially since it has the side-effect of setting the timetrap for all the testcases that follow 🤦 Sorry about that!

Anyway, after correcting that, we see that test_timetrap and test_exit_signal also behave badly, just like test_exit_signal_with_link. So ultimately there are only two cases to consider, "the good" and "the bad", with the latter being the buggy one.

Apart from the above, losing Config in case of a test process crash does not surprise me that much in this scenario ... I don't think this is a bug at this stage.

I disagree. Consider this concrete example:

% my_cth.erl
-module(my_cth).

-export([
    init/2,
    pre_end_per_testcase/4,
    post_end_per_testcase/5
]).

init(_Id, _Opts) ->
    {ok, []}.

pre_end_per_testcase(_Suite, TC, Config1, State) ->
    ct:pal("~p: pre_end_per_testcase called!", [TC]),
    Config2 = [{pre_end_per_testcase_was_here, true} | Config1],
    {Config2, State}.

post_end_per_testcase(_Suite, TC, Config, _Return, State) ->
    ct:pal(
        "~p: post_end_per_testcase got injected resource: ~p",
        [TC, proplists:get_value(pre_end_per_testcase_was_here, Config, false)]
    ),
    {ok, State}.

% my_SUITE.erl
-module(my_SUITE).

-include_lib("stdlib/include/assert.hrl").
-include_lib("common_test/include/ct.hrl").

-export([
    all/0,
    groups/0,

    suite/0,

    init_per_testcase/2,
    end_per_testcase/2,

    test_ok/1,
    test_normal_failure/1,

    test_timetrap/0, test_timetrap/1,

    test_exit_signal/1,
    test_exit_signal_with_link/1
]).

all() ->
    [
        {group, the_good},
        {group, the_bad}
    ].

groups() ->
    [
        { the_good, [], [
              test_ok,
              test_normal_failure
          ]
        },
        { the_bad, [], [
              test_timetrap,
              test_exit_signal,
              test_exit_signal_with_link
          ]
        }
    ].

suite() ->
    [
        {ct_hooks, [my_cth]}
    ].

init_per_testcase(_TC, Config) ->
    Config.

end_per_testcase(TC, Config) ->
    ct:pal(
        "~p: end_per_testcase got injected resource: ~p",
        [TC, proplists:get_value(pre_end_per_testcase_was_here, Config, false)]
    ),
    ok.

test_ok(_Config) ->
    ok.

test_normal_failure(_Config) ->
    ?assert(false).

test_timetrap() ->
    [{timetrap, {seconds, 1}}].
test_timetrap(_Config) ->
    ct:sleep(infinity).

test_exit_signal(_Config) ->
    TestProc = self(),
    spawn(fun() -> exit(TestProc, boom) end),
    ct:sleep(infinity).

test_exit_signal_with_link(_Config) ->
    spawn_link(fun() -> error(bam) end),
    ct:sleep(infinity).

We can see how CT behaves differently (at least in OTP 25) by looking at the log output:

A "good" case:

----------------------------------------------------
2023-05-14 17:01:14.351
test_normal_failure: pre_end_per_testcase called!

----------------------------------------------------
2023-05-14 17:01:14.351
test_normal_failure: end_per_testcase got injected resource: true

----------------------------------------------------
2023-05-14 17:01:14.351
test_normal_failure: post_end_per_testcase got injected resource: true

A "bad" case:

----------------------------------------------------
2023-05-14 17:01:15.492
test_exit_signal_with_link: pre_end_per_testcase called!

----------------------------------------------------
2023-05-14 17:01:15.492
test_exit_signal_with_link: end_per_testcase got injected resource: true

----------------------------------------------------
2023-05-14 17:01:15.493
test_exit_signal_with_link: post_end_per_testcase got injected resource: false

Why I think the latter is a bug:

  1. CT managed to successfully call pre_end_per_testcase/4 and passed the returned Config to end_per_testcase/2, which succeeded; there is no reason why it can't pass that same Config to post_end_per_testcase/5, as it does in the "good" case!
  2. As it is, it just violates, for no good reason, the protocol that says that the Config returned by pre_xxx is passed to the corresponding post_xxx, which hook authors need to be able to rely on.

Regarding your other comments:

  • can you try using CTHState for storing the handle to a resource created and cleaned by CTH code? it seems to be maintained in the case when the test does not execute as expected;

We do leverage the CTHState to work around the issue. We use it as a lookup table to see if there is a "more recent" Config we should use. It is tricky, since it can lead to space leaks and introduces a lot of unnecessary complexity, given that handling the Config lifetime correctly is CT's responsibility.

I think [using CTHState] also makes more sense for a resource managed by the CTH;

Not when the hook is managing individual resources for each testcase; one needs to account for parallel execution, the same testcase occurring in multiple groups, etc. The Config is the only reliable source of truth in that case, afaics.

additionally, you can share the handle with test code in Config, so that for example end_per_testcase can do something with the resource if that is wanted;

Well, iiuc, that's precisely what we are trying to do, and why "the bad" case is problematic: pre_end_per_testcase/4 creates a resource and puts it in the Config so that end_per_testcase/2 can use it... but then in this case we can't dispose of it, since post_end_per_testcase/5 doesn't get its handle in the Config!
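For reference, a minimal sketch of that lookup-table workaround, keeping a per-testcase copy of the handle in the CTHState (this assumes init/2 returns {ok, #{}} rather than {ok, []}, and reuses the hypothetical helper names from the original example):

```erlang
%% Sketch: mirror the injected handle in the CTHState, keyed by {Suite, TC},
%% so post_end_per_testcase/5 can recover it even when Config was reset.
pre_end_per_testcase(Suite, TC, Config, State) when is_list(Config) ->
    Handle = start_something_that_MUST_be_closed_at_the_end(Suite, TC, Config),
    {[{please_close_me, Handle} | Config],
     maps:put({Suite, TC}, Handle, State)};
pre_end_per_testcase(_Suite, _TC, SkipOrFail, State) ->
    {SkipOrFail, State}.

post_end_per_testcase(Suite, TC, Config, Return, State) ->
    %% Prefer the handle from Config; fall back to the CTHState copy.
    case proplists:get_value(please_close_me, Config,
                             maps:get({Suite, TC}, State, undefined)) of
        undefined -> ok;
        Handle -> close_that_thing(Handle)
    end,
    %% Drop the entry again to avoid the space leak mentioned above.
    {Return, maps:remove({Suite, TC}, State)}.
```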

u3s (Contributor) commented May 15, 2023

thanks for re-phrasing and updating the description. I now agree with your observation.

when looking at pids:

  • the_good test cases are executed in a single process (<A>: testcase, <A>: end_per_testcase, <A>: post_end_per_testcase)
  • the_bad test cases are executed in 3 processes (<A>: testcase, <B>: end_per_testcase, <C>: post_end_per_testcase)
  • B is provided with the latest Config owned by A, but C does not receive it

this looks strange. I will need to check source code to figure out the root cause of this behavior.
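For anyone wanting to reproduce the pid observation above, it can be seen by logging self() at each step, e.g. in the suite's testcase and in the hook callback (a sketch, not the exact code used here):

```erlang
%% In my_SUITE.erl: print the pid the testcase runs in.
test_exit_signal_with_link(_Config) ->
    ct:pal("testcase running in ~p", [self()]),
    spawn_link(fun() -> error(bam) end),
    ct:sleep(infinity).

%% In my_cth.erl: print the pid the hook callback runs in.
post_end_per_testcase(_Suite, TC, _Config, Return, State) ->
    ct:pal("~p: post_end_per_testcase running in ~p", [TC, self()]),
    {Return, State}.
```

Comparing the printed pids across the "good" and "bad" cases shows whether CT ran the callbacks in the same process as the testcase.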

@u3s u3s linked a pull request May 16, 2023 that will close this issue
@u3s u3s removed not a bug Issue is determined as not a bug by OTP waiting waiting for changes/input from author labels May 16, 2023
u3s (Contributor) commented Jul 3, 2023

The #7246 fix has been merged to the maint branch.

@u3s u3s closed this as completed Jul 3, 2023