Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handle invalid timeout values returned from gen_server callback functions #5683

Merged
merged 3 commits into from
Mar 18, 2022
Merged

fix: handle invalid timeout values returned from gen_server callback functions #5683

merged 3 commits into from
Mar 18, 2022

Conversation

pouriya
Copy link
Contributor

@pouriya pouriya commented Feb 5, 2022

gen_server module currently does not check returned timeout from init and all other handle_* callback functions. IMO in this case it should terminate with a {bad_return_value, _} too.

Example

-module(m).
-compile(export_all).
-compile(nowarn_export_all).

start_link(Timeout) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Timeout, []).

init(Timeout) ->
    {ok, undefined, Timeout}.

handle_info(_, S) ->
    % Max 'receive' timeout value + 1
    {noreply, S, 16#FFFFFFFF+1}.

terminate(_, _) ->
    ok.

Before this change

1> erlang:process_flag(trap_exit, true).
false

2> m:start_link(-1).
{ok,<0.80.0>} % Dies right after entering the gen_server loop
=CRASH REPORT==== 5-Feb-2022::03:18:40.013997 ===
  crasher:
    initial call: m:init/1
    pid: <0.80.0>
    registered_name: m
    exception error: bad receive timeout value
      in function  gen_server:loop/7 (gen_server.erl, line 411)
    ancestors: [<0.77.0>]
    message_queue_len: 0
    messages: []
    links: [<0.77.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 207
  neighbours:

3> flush().
Shell got {'EXIT',<0.80.0>,
              {timeout_value,
                  [{gen_server,loop,7,[{file,"gen_server.erl"},{line,411}]},
                   {proc_lib,init_p_do_apply,3,
                       [{file,"proc_lib.erl"},{line,249}]}]}}
ok


% let it start and live:
4> m:start_link(infinity).
{ok,<0.83.0>}

5> m ! oops.
oops
=CRASH REPORT==== 5-Feb-2022::03:19:09.405236 ===
  crasher:
    initial call: m:init/1
    pid: <0.83.0>
    registered_name: m
    exception error: bad receive timeout value
      in function  gen_server:loop/7 (gen_server.erl, line 411)
    ancestors: [<0.77.0>]
    message_queue_len: 0
    messages: []
    links: [<0.77.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 199
  neighbours:

6> flush().
Shell got {'EXIT',<0.83.0>,
              {timeout_value,
                  [{gen_server,loop,7,[{file,"gen_server.erl"},{line,411}]},
                   {proc_lib,init_p_do_apply,3,
                       [{file,"proc_lib.erl"},{line,249}]}]}}
ok

After this change

1> erlang:process_flag(trap_exit, true).
false

2> m:start_link(-1).
{error,{bad_return_value,{ok,undefined,-1}}}

=CRASH REPORT==== 5-Feb-2022::03:24:44.849737 ===
  crasher:
    initial call: m:init/1
    pid: <0.81.0>
    registered_name: m
    exception exit: {bad_return_value,{ok,undefined,-1}}
      in function  gen_server:init_it/6 (gen_server.erl, line 403)
    ancestors: [<0.77.0>]
    message_queue_len: 0
    messages: []
    links: [<0.77.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 203
  neighbours:


3> flush().
Shell got {'EXIT',<0.81.0>,{bad_return_value,{ok,undefined,-1}}}
ok


% let it start and live:
4> m:start_link(infinity).
{ok,<0.84.0>}

5> m ! oops.
oops

=ERROR REPORT==== 5-Feb-2022::03:25:09.760708 ===
** Generic server m terminating 
** Last message in was oops
** When Server state == undefined
** Reason for termination ==
** {bad_return_value,{noreply,undefined,4294967296}}

=CRASH REPORT==== 5-Feb-2022::03:25:09.761283 ===
  crasher:
    initial call: m:init/1
    pid: <0.84.0>
    registered_name: m
    exception exit: {bad_return_value,{noreply,undefined,4294967296}}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 825)
    ancestors: [<0.77.0>]
    message_queue_len: 0
    messages: []
    links: [<0.77.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 2586
    stack_size: 27
    reductions: 4652
  neighbours:


6> flush().
Shell got {'EXIT',<0.84.0>,{bad_return_value,{noreply,undefined,4294967296}}}
ok

@github-actions
Copy link
Contributor

github-actions bot commented Feb 5, 2022

CT Test Results

       2 files       83 suites   30m 13s ⏱️
1 576 tests 1 524 ✔️ 52 💤 0
1 833 runs  1 778 ✔️ 55 💤 0

Results for commit 45ca24d.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@rickard-green rickard-green added the team:PS Assigned to OTP team PS label Feb 7, 2022
@RaimoNiskanen
Copy link
Contributor

I think a macro ?is_timeout/1 should test what the type timeout() defines
i.e ((X) =:= infinity orelse (is_integer(X) andalso 0 =< (X))).
That there is an upper limit for a time-out value is a system limit that should
have been documented as such, and that may be increased in the future.
Better to let it crash in the receive statement is that upper limit is violated.
Checking for the basic type and range catches practically all errors.

IIRCC (If I Read the Code Correctly), it seems that your changes in handle_msg/6
and in handle_common_reply/8 does not handle the hibernate and {continue,_} cases.

Should enter_loop/4,5 also check the Timeout argument?

@pouriya
Copy link
Contributor Author

pouriya commented Feb 12, 2022

Checking for the basic type and range catches practically all errors.

What if some users calculate timeout dynamically (e.g. read the value from config file)?


Better to let it crash in the receive statement is that upper limit is violated.

In any case (With or without these changes) if we violate the limit the code crashes. The only difference is that with these changes we have a normal crash with complete report:

=ERROR REPORT==== 5-Feb-2022::03:25:09.760708 ===
** Generic server m terminating 
** Last message in was oops
** When Server state == undefined
** Reason for termination ==
** {bad_return_value,{noreply,undefined,4294967296}}



it seems that your changes in handle_msg/6 and in handle_common_reply/8 does not handle the hibernate and {continue,_} cases.

Yes you're right! I think that the variable name Time1 confused me!


Questions for next changes

We should just go for ((X) =:= infinity orelse (is_integer(X) andalso (X) >= 0)) and do not check upper bound. Am I right?
We need new clauses in handle_msg/6 and in handle_common_reply/8 to check for hibernate and {continue, _}. Do we need to check more details too?
For example:

% inside handle_msg/6 or `handle_common_reply/8 checks:
    {noreply, S, Timeout} when ?is_timeout(Timeout) ->
        ...;
    {noreply, S, hibernate} ->
        ...;
    {noreply, S, {continue, _}} ->
        ...;

% or
-define(
    is_timeout_or_hibernate_or_continue(X)
    (((X) =:= infinity orelse (is_integer(X) andalso (X) >= 0)) orelse X =:= hibernate orelse (tuple_size(X) == 2 andalso element(1, X) == continue))
).
% ...
% inside handle_msg/6 or `handle_common_reply/8 checks:
    {noreply, S, TimeoutOrHibernateOrContinue} when ?is_timeout_or_hibernate_or_continue(TimeoutOrHibernateOrContinue) ->
        ...;

Should enter_loop/4-5 also check the Timeout argument? I think yes they should but I forgot to add the check. Should I add it?

@RaimoNiskanen
Copy link
Contributor

What if some users calculate timeout dynamically (e.g. read the value from config file)?

If it violates the type (integer() >= 0 | 'infinity') an informative error is in order. But it is (theoretically) very hard or impossible to predict what the upper integer limit is since it is in fact a system limit and how that behaves is subject to change. For instance; a reasonable future extension would be to allow absolute time-outs, and then we have a dynamically calculated time-out value that for a specific absolute time-out time might or might not be accepted by receive. A static type check of the value can not know if it is ok, and then I think a {bad_return_value,_} error exception is misguiding.

The simplest solution is to let receive do its thing and crash for a too large time-out if that should happen in run time.

Do we need to check more details too?

Checking for timeout() | 'hibernate' | {'continue',_} would be sufficient.

I would prefer to not have one gigantic macro for "is timeout or hibernate or continue", and instead separate them and/or use multiple clauses. I think {continue,_} is more readable than is_continue(Time). is_timeout(Time) orelse Time =:= hibernate could be on one clause.

A warning: tuple_size(T), element(1, T) =:= continue is a neat trick in a guard, but if you write a macro is_continue(T) as (tuple_size(T) =:= 2 andalso element(1, T) =:= continue) and later use it as is_continue(T) orelse T =:= hibernate I think you would have a bug; tuple_size(hibernate) should crash and the whole guard fail before testing T =:= hibernate. But T =:= hibernate orelse is_continue(T) would accidentally work. So a general guard macro using andalso/orelse should not skip the is_tuple(T) test before tuple_size(T)...

Should enter_loop/4-5 also check the Timeout argument? I think yes they should but I forgot to add the check. Should I add it?

I think that would be nice to have. What should enter_loop/5 check, then...?
That Mod is an atom, that Options is a list, and that Timeout is a time-out, 'hibernate' or {continue,_}, I'd say. ServerName will cause a crash in gen:get_proc_name/1 if incorrect; that should be ok.

The type spec now only allows Timeout :: timeout()but that is not what the code does... There is no apparent reason to prohibit hibernate - one might want to enter hibernation immediately, and I do not realize why it should be impossible to go right into handle_continue/2.

@IngelaAndin IngelaAndin added the waiting waiting for changes/input from author label Feb 23, 2022
@pouriya
Copy link
Contributor Author

pouriya commented Mar 11, 2022

Another question: For {'$gen_call', _, _} message checks we have these checks and we have the same checks in handle_common_reply here. Why do we need them in handle_msg({'$gen_call', From, Msg} ...) function when this function also calls handle_common_reply here?

@RaimoNiskanen
Copy link
Contributor

Why do we need them in handle_msg({'$gen_call', From, Msg} ...) function when this function also calls handle_common_reply here?

I see no point. It might be an optimization. I do not know how much of a performance gain it might give... It sure is tempting to remove the clauses from handle_msg, to improve clarity!

@RaimoNiskanen
Copy link
Contributor

Unfortunately, now when the last 24 maintenance release is out, this PR will have to be rebased to 'master'... After that we have code freeze for 25.rc2 on Friday. Hopefully we can get this in.

@pouriya
Copy link
Contributor Author

pouriya commented Mar 14, 2022

Here (in Iran) we're getting close to Nowruz (It's celebrated on Monday). So l can change things over the next two days and before the holidays. Anything else?

@RaimoNiskanen
Copy link
Contributor

I can not think of anything else to fix.

Regarding optimization I sense that the argument order of handle_common_reply/8,9 could be improved to minimize register moves, but it is better to see what our benchmarks say before changing that, and I can do it myself on top of your branch.

So, go ahead and rebase...

… returned value of callback and enter_loop functions
@CLAassistant
Copy link

CLAassistant commented Mar 15, 2022

CLA assistant check
All committers have signed the CLA.

@RaimoNiskanen RaimoNiskanen changed the base branch from maint to master March 16, 2022 15:30
@RaimoNiskanen RaimoNiskanen added the testing currently being tested, tag is used by OTP internal CI label Mar 16, 2022
@@ -617,23 +617,23 @@ enter_loop(Mod, Options, State, {continue, _}=Continue) when is_atom(Mod) andals
Options :: [enter_loop_opt()],
State :: term(),
ServerName :: server_name() | pid(),
Timeout :: timeout() | hibernate | {continue, term()}
Timeout :: timeout()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Thank you 👍🏼

@RaimoNiskanen
Copy link
Contributor

I forgot to mention that the branch base should also be changed from 'maint' to 'master' in GitHubs GUI, so it looked like a mess for a while...

Also, documentation build failed due to -spec changes, which is a blocking issue for our daily builds.

@RaimoNiskanen RaimoNiskanen removed the testing currently being tested, tag is used by OTP internal CI label Mar 17, 2022
@RaimoNiskanen
Copy link
Contributor

After an internal discussion we decided to change the code style of guard expressions to use the same style as the rest of the module, that is: traditional comma-and-semicolon separated.

@RaimoNiskanen RaimoNiskanen added the testing currently being tested, tag is used by OTP internal CI label Mar 17, 2022
@pouriya
Copy link
Contributor Author

pouriya commented Mar 17, 2022

I think that andalso and orelse are simpler to understand. What is your opinion? What if we update all traditional comma-and-semicolon separated guards?

@RaimoNiskanen
Copy link
Contributor

My opinion is that comma-and-semicolon guards are easier to read, when line broken consistently. Probably since I have used Erlang since before andalso and orelse were usable in guards...

So this is a matter of taste, therefore we prefer to keep the style of the module.

@RaimoNiskanen RaimoNiskanen removed the testing currently being tested, tag is used by OTP internal CI label Mar 18, 2022
@RaimoNiskanen RaimoNiskanen merged commit f096a54 into erlang:master Mar 18, 2022
@RaimoNiskanen
Copy link
Contributor

Thank you for your pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:PS Assigned to OTP team PS waiting waiting for changes/input from author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants