
Common Test hooks mishandle exit signals with link #7119

Closed
jcpetruzza opened this issue Apr 13, 2023 · 5 comments · Fixed by #7246
Assignees
Labels
bug Issue is reported as a bug priority:low team:PS Assigned to OTP team PS

Comments

@jcpetruzza

I have a Common Test hook that looks something like this:

pre_end_per_testcase(Suite, TC, ConfigOrSkipOrFail, State) ->
    case ConfigOrSkipOrFail of
        Config0 when is_list(Config0) ->
            Handle = start_something_that_MUST_be_closed_at_the_end(Suite, TC, Config0),
            Config1 = [{please_close_me, Handle} | Config0],
            {Config1, State};

        SkipOrFail ->
            {SkipOrFail, State}
    end.

post_end_per_testcase(Suite, TC, Config, Return, State) ->
    case proplists:get_value(please_close_me, Config) of
        undefined -> ok;
        Handle -> close_that_thing(Handle)
    end,
    {Return, State}.

Consider these possible testcases:

test_ok(Config) ->
    ok.

test_normal_failure(Config) ->
    ?assert(false).

test_timetrap(Config) ->
    ct:timetrap({seconds, 0}),
    ct:sleep(infinity).

test_exit_signal(Config) ->
    TestProc = self(),
    spawn(fun() -> exit(TestProc, boom) end),
    ct:sleep(infinity).

test_exit_signal_with_link(Config) ->
    spawn_link(fun() -> error(bam) end),
    ct:sleep(infinity).

I observe 3 possible behaviours of CT with respect to the hook above (using OTP 25), depending on the testcase:

  1. The good: for test_ok or test_normal_failure we get:
    - pre_end_per_testcase/4 is called, then
    - end_per_testcase/2 is called, and finally
    - post_end_per_testcase/5 is called, with Config bound to the return value of pre_end_per_testcase/4.
  2. The ugly: for test_timetrap or test_exit_signal we get:
    - pre_end_per_testcase/4 is not called,
    - end_per_testcase/2 is not called, and
    - post_end_per_testcase/5 is called, with Config bound to the same value the testcase received.
  3. The bad: for test_exit_signal_with_link we get:
    - pre_end_per_testcase/4 is called, then
    - end_per_testcase/2 is called, and finally
    - post_end_per_testcase/5 is called, but with Config bound to the same value the testcase received!

As a hook author, case 2 is a bit ugly, since one needs to account for the fact that pre_end_per_testcase may sometimes not get called, but the problematic one is case 3:

  • It can be a bit surprising, imo, that the behaviour here is not the same as for the other case of exit signals, but I assume there are good reasons for this.
  • Now, the fact that the Config passed to post_end_per_testcase/5 is not what pre_end_per_testcase/4 returned sounds like a bug? In the hook above, it means that there is a Handle that would not get closed. It is possible to work around this, I think, but it is not nice.
@jcpetruzza jcpetruzza added the bug Issue is reported as a bug label Apr 13, 2023
@u3s u3s added the team:PS Assigned to OTP team PS label Apr 14, 2023
u3s (Contributor) commented Apr 26, 2023

  • can you try using CTHState for storing the handle to a resource created and cleaned by CTH code? it seems to be maintained in the case when the test does not execute as expected; I think it also makes more sense for a resource managed by the CTH;
  • additionally, you can share the handle with test code in Config, so that for example end_per_testcase can do something with the resource if that is wanted;
  • why are you setting ct:timetrap({seconds, 0})? this is aggressive and prevents executing pre_end_per_testcase - at least in my environment. if I increase it to something bigger, I see pre_end_per_testcase called for every test case from your example ... can you confirm that observation?

Apart from the above, losing Config in case of a test process crash does not surprise me that much in this scenario ... I don't think this is a bug at this stage.

@u3s u3s removed the bug Issue is reported as a bug label Apr 27, 2023
@u3s u3s added the waiting waiting for changes/input from author label May 5, 2023
u3s (Contributor) commented May 10, 2023

ping @jcpetruzza

@u3s u3s added not a bug Issue is determined as not a bug by OTP bug Issue is reported as a bug labels May 10, 2023
jcpetruzza (Author) commented

Thanks for looking into this, @u3s, and sorry for the very late reply; I just returned from some holidays :)

why are you setting ct:timetrap({seconds, 0})? this is aggressive and prevents executing pre_end_per_testcase - at least in my environment. if I increase it to something bigger, I see pre_end_per_testcase called for every test case from your example ... can you confirm that observation?

I agree that I shouldn't have used ct:timetrap({seconds, 0}) to reproduce the issue, especially since it has the side-effect of setting the timetrap for all the testcases that follow 🤦 Sorry about that!

Anyway, after correcting that, we see that test_timetrap and test_exit_signal also behave badly, just like test_exit_signal_with_link. So ultimately there are only two cases to consider, "the good" and "the bad", with the latter being the buggy one.

Apart from the above, losing Config in case of a test process crash does not surprise me that much in this scenario ... I don't think this is a bug at this stage.

I disagree. Consider this concrete example:

% my_cth.erl
-module(my_cth).

-export([
    init/2,
    pre_end_per_testcase/4,
    post_end_per_testcase/5
]).

init(_Id, _Opts) ->
    {ok, []}.

pre_end_per_testcase(_Suite, TC, Config1, State) ->
    ct:pal("~p: pre_end_per_testcase called!", [TC]),
    Config2 = [{pre_end_per_testcase_was_here, true} | Config1],
    {Config2, State}.

post_end_per_testcase(_Suite, TC, Config, _Return, State) ->
    ct:pal(
        "~p: post_end_per_testcase got injected resource: ~p",
        [TC, proplists:get_value(pre_end_per_testcase_was_here, Config, false)]
    ),
    {ok, State}.

% my_SUITE.erl
-module(my_SUITE).

-include_lib("stdlib/include/assert.hrl").
-include_lib("common_test/include/ct.hrl").

-export([
    all/0,
    groups/0,

    suite/0,

    init_per_testcase/2,
    end_per_testcase/2,

    test_ok/1,
    test_normal_failure/1,

    test_timetrap/0, test_timetrap/1,

    test_exit_signal/1,
    test_exit_signal_with_link/1
]).

all() ->
    [
        {group, the_good},
        {group, the_bad}
    ].

groups() ->
    [
        { the_good, [], [
              test_ok,
              test_normal_failure
          ]
        },
        { the_bad, [], [
              test_timetrap,
              test_exit_signal,
              test_exit_signal_with_link
          ]
        }
    ].

suite() ->
    [
        {ct_hooks, [my_cth]}
    ].

init_per_testcase(_TC, Config) ->
    Config.

end_per_testcase(TC, Config) ->
    ct:pal(
        "~p: end_per_testcase got injected resource: ~p",
        [TC, proplists:get_value(pre_end_per_testcase_was_here, Config, false)]
    ),
    ok.

test_ok(_Config) ->
    ok.

test_normal_failure(_Config) ->
    ?assert(false).

test_timetrap() ->
    [{timetrap, {seconds, 1}}].
test_timetrap(_Config) ->
    ct:sleep(infinity).

test_exit_signal(_Config) ->
    TestProc = self(),
    spawn(fun() -> exit(TestProc, boom) end),
    ct:sleep(infinity).

test_exit_signal_with_link(_Config) ->
    spawn_link(fun() -> error(bam) end),
    ct:sleep(infinity).

We can see how CT behaves differently (at least in OTP 25) by looking at the log output:

A "good" case:

----------------------------------------------------
2023-05-14 17:01:14.351
test_normal_failure: pre_end_per_testcase called!

----------------------------------------------------
2023-05-14 17:01:14.351
test_normal_failure: end_per_testcase got injected resource: true

----------------------------------------------------
2023-05-14 17:01:14.351
test_normal_failure: post_end_per_testcase got injected resource: true

A "bad" case:

----------------------------------------------------
2023-05-14 17:01:15.492
test_exit_signal_with_link: pre_end_per_testcase called!

----------------------------------------------------
2023-05-14 17:01:15.492
test_exit_signal_with_link: end_per_testcase got injected resource: true

----------------------------------------------------
2023-05-14 17:01:15.493
test_exit_signal_with_link: post_end_per_testcase got injected resource: false

Why I think the latter is a bug:

  1. CT managed to successfully call pre_end_per_testcase/4 and passed the returned Config to end_per_testcase/2, which succeeded; there is no reason why it can't pass that same Config to post_end_per_testcase/5, as it does in the "good" case!
  2. As it is, it just violates, for no good reason, the protocol that says that the Config returned by pre_xxx is passed to the corresponding post_xxx, which hook authors need to be able to rely on.

Regarding your other comments:

  • can you try using CTHState for storing the handle to a resource created and cleaned by CTH code? it seems to be maintained in the case when the test does not execute as expected;

We do leverage the CTHState to work around the issue. We use it as a lookup table to see if there is a "more recent" Config we should use. It is tricky, since it can lead to space leaks and introduces a lot of unnecessary complexity, given that handling the Config lifetime correctly is CT's responsibility.

I think [using CTHState] also makes more sense for a resource managed by the CTH;

Not when the hook is managing individual resources for each testcase; one needs to account for parallel execution, the same testcase occurring in multiple groups, etc. The Config is the only reliable source of truth in that case, afaics.

additionally, you can share the handle with test code in Config, so that for example end_per_testcase can do something with the resource if that is wanted;

Well, iiuc, that's precisely what we are trying to do, and why "the bad" case is problematic: pre_end_per_testcase/4 creates a resource and puts it in the Config so that end_per_testcase/2 can use it... but then in this case we can't dispose of it, since post_end_per_testcase/5 doesn't get its handle in the Config!
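For reference, a minimal sketch of that lookup-table workaround, keeping a per-testcase copy of the handle in the CTHState (this assumes init/2 returns {ok, #{}} rather than {ok, []}, and reuses the hypothetical helper names from the original example):

```erlang
%% Sketch: mirror the injected handle in the CTHState, keyed by {Suite, TC},
%% so post_end_per_testcase/5 can recover it even when Config was reset.
pre_end_per_testcase(Suite, TC, Config, State) when is_list(Config) ->
    Handle = start_something_that_MUST_be_closed_at_the_end(Suite, TC, Config),
    {[{please_close_me, Handle} | Config],
     maps:put({Suite, TC}, Handle, State)};
pre_end_per_testcase(_Suite, _TC, SkipOrFail, State) ->
    {SkipOrFail, State}.

post_end_per_testcase(Suite, TC, Config, Return, State) ->
    %% Prefer the handle from Config; fall back to the CTHState copy.
    case proplists:get_value(please_close_me, Config,
                             maps:get({Suite, TC}, State, undefined)) of
        undefined -> ok;
        Handle -> close_that_thing(Handle)
    end,
    %% Drop the entry again to avoid the space leak mentioned above.
    {Return, maps:remove({Suite, TC}, State)}.
```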

u3s (Contributor) commented May 15, 2023

thanks for re-phrasing and updating the description. I now agree with your observation.

when looking at pids:

  • the_good test cases are executed in a single process (<A>: testcase, <A>: end_per_testcase, <A>: post_end_per_testcase)
  • the_bad test cases are executed in 3 processes (<A>: testcase, <B>: end_per_testcase, <C>: post_end_per_testcase)
  • B is provided with the latest Config owned by A, but C does not receive it

this looks strange. I will need to check source code to figure out the root cause of this behavior.
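For anyone wanting to reproduce the pid observation above, it can be seen by logging self() at each step, e.g. in the suite's testcase and in the hook callback (a sketch, not the exact code used here):

```erlang
%% In my_SUITE.erl: print the pid the testcase runs in.
test_exit_signal_with_link(_Config) ->
    ct:pal("testcase running in ~p", [self()]),
    spawn_link(fun() -> error(bam) end),
    ct:sleep(infinity).

%% In my_cth.erl: print the pid the hook callback runs in.
post_end_per_testcase(_Suite, TC, _Config, Return, State) ->
    ct:pal("~p: post_end_per_testcase running in ~p", [TC, self()]),
    {Return, State}.
```

Comparing the printed pids across the "good" and "bad" cases shows whether CT ran the callbacks in the same process as the testcase.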

@u3s u3s linked a pull request May 16, 2023 that will close this issue
@u3s u3s removed not a bug Issue is determined as not a bug by OTP waiting waiting for changes/input from author labels May 16, 2023
u3s (Contributor) commented Jul 3, 2023

The #7246 fix has been merged to the maint branch.

@u3s u3s closed this as completed Jul 3, 2023