Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix io:columns/0 timeout when invoked from application callback #351

Closed
wants to merge 1 commit into from

Conversation

josevalim
Copy link
Contributor

This patch fixes an issue where io:columns/0 times out when
invoked from any application callback (or any supervisor/supervised module
since the group leader is inherited).

To reproduce the issue, one must simply call io:columns() from any
application callback. You will notice the process will block for 2 seconds
which then times out and returns {:error, :enotsup}.

Note this bug only happens inside the erlang shell (using -noshell or
escripts do not trigger the bug).

To fix the bug, it is important to understand how io requests flow from
application callback processes. Here are the steps followed:

  1. Since io:columns/1 is timing out, the first step is to find out who is the
    group leader for the application callback process. Using process_info/1, we
    can see the parent process is the application_master and it does handle
    io_requests by delegating to the group_leader:

    IoReq when element(1, IoReq) =:= io_request ->
    State#state.gleader ! IoReq,
    main_loop(Parent, State);

  2. By inspecting the application_master process, we can find the group_leader
    the message is sent above is the registered process named user. The process is
    running the group module which does handle io:columns/1 requests:

    io_request({get_geometry,columns},Drv,Buf) ->
    case get_tty_geometry(Drv) of
    {W,_H} ->
    {ok,W,Buf};
    _ ->
    {error,{error,enotsup},Buf}
    end;
    io_request({get_geometry,rows},Drv,Buf) ->
    case get_tty_geometry(Drv) of
    {_W,H} ->
    {ok,H,Buf};
    _ ->
    {error,{error,enotsup},Buf}
    end;

  3. Those requests are delegated to the underlying driver defined here:

    put(user_drv, Drv),

  4. The driver is the user_drv process. The user_drv process does handle
    tty_geometry requests, except that a clause above ends up ignoring the
    tty_geometry requests from the user process:

    {User,Req} -> % never block from user!
    io_request(Req, Iport, Oport),
    server_loop(Iport, Oport, Curr, User, Gr);
    {Curr,tty_geometry} ->
    Curr ! {self(),tty_geometry,get_tty_geometry(Iport)},
    server_loop(Iport, Oport, Curr, User, Gr);

This patch moves the user clause below the specific driver messages.

This patch fixes an issue where io:columns/0 times out when invoked from
any application callback (or any supervisor/supervised module since the
group leader is inherited).

To reproduce the issue, one must simply call io:columns() from any
application callback. You will notice the process will block for 2 seconds
which then times out and returns {:error, :enotsup}.

Note this bug only happens inside the erlang shell (using -noshell or
escripts do not trigger the bug).

To fix the bug, it is important to understand how io requests flow from
application callback processes. Here are the steps followed:

1. Since io:columns/1 is timing out, the first step is to find out who is the
group leader for the application callback process. Using process_info/1, we
can see the parent process is the application_master and handles
io_requests by delegating them to the group_leader.

2. By inspecting the application_master process, we can find the group_leader
the message is sent is the registered process named user. The process is
running the group module which does handle io:columns/1 requests delegating
them to user_drv process.

3. The user_drv process does handle tty_geometry requests, except that a
clause above ends up short-circuiting all tty_geometry requests from the
user process.

This patch moves the user clause below the specific driver messages.
@ferd
Copy link
Contributor

ferd commented May 1, 2014

I'm not against this patch, this sounds like a true bug.

However, I'm wondering what design decision leads to fetching the number of IO columns in an application callback. Is this done in one of the start (or start phases) on behalf of a supervisor or other process, and is there an expectation that the terminal is never resized? If so, this sounds like the logic should be moved elsewhere where it can be modified more frequently.

(if it's done for a one-off message that happens to need the width just for itself, then I have pretty much nothing to say about it)

@josevalim
Copy link
Contributor Author

Notice it is not only the application callback. Since the supervisor and all supervisor children "inherit" from the application callback process, all of them will have the same issue. Calling io:columns/0 in a GenServer or inside a Cowboy handler will all trigger the 2 seconds wait. I mentioned the application callback because it is the easiest way to reproduce the issue.

@OTP-Maintainer
Copy link

Patch has passed first testings and has been assigned to be reviewed

@josevalim
Copy link
Contributor Author

This patch also fix a bug where there is a 2 seconds delay to read the shell encoding (via io:getopts/1) when starting an application. It is exactly what happens with io:colums/1 since the group_leader process also sends get_unicode_state to the shell. You can see this clause is also being reordered in the patch.

@garazdawi
Copy link
Contributor

merged, will be visible on github soon (tm)

@garazdawi garazdawi closed this Oct 20, 2014
@josevalim
Copy link
Contributor Author

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants