Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io:get_line/1 cannot read cyrillic input on Windows with -noshell #8113

Closed
josevalim opened this issue Feb 13, 2024 · 14 comments
Closed

io:get_line/1 cannot read cyrillic input on Windows with -noshell #8113

josevalim opened this issue Feb 13, 2024 · 14 comments
Assignees
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM
Milestone

Comments

@josevalim
Copy link
Contributor

Describe the bug

Take the following program foo.erl:

-module(foo).
-compile(export_all).

start() ->
  io:setopts([{binary, true}]),
  io:format("~p~n", [io:getopts()]),
  loop().

loop() ->
  Res = io:get_line(">> "),
  %% erlang:display(<<0, Res/binary>>),
  io:put_chars(["---> ", Res, $\n]),
  loop().

And then:

$ erlc foo.erl
$ erl -noshell -s foo
>> Привет
---> ਢ

Once we typed "Привет" as input, it printed the wrong output. When I uncommented erlang:display/1, I could see that the issue is that we are reading the wrong input. We got the following bytes:

<<0,194,143,195,160,194,168,194,162,194,165,195,162,10>>

But we expected:

<<0,208,191,209,128,208,184,208,178,208,181,209,130,33,10>>

Affected versions

Erlang/OTP 26.2. The reporter also tried the Windows installer provided by pull request #8103 with the same results.

Additional context

This issue was originally submitted here: elixir-lang/elixir#13301

To quote the reporter:

I have tested consuming Cyrillic input in Erlang and Elixir from CMD stand-alone, on Windows Terminal and VS Code, as well from Powershell stand-alone and on WT and VSC. No option seemed to work well. As for dotnet, the issue with encoding is only present in PS on WT, Cmd stand-alone required me to type chcp 65001 to turn on UTF8, while Cmd on WT has no such problem

@garazdawi
Copy link
Contributor

This is a known limitation/bug of the current implementation: https://github.com/erlang/otp/blob/OTP-26.2.2/erts/emulator/nifs/common/prim_tty_nif.c#L425-L427

If anyone has any tips on how to solve this problem feel free to help out. My current idea is to disallow migration from -noshell to any other mode using shell:start_interactive/.... I've yet to check and see if that will cause any issues.

The gist of the issue is that we need to be able to go from line-mode to character-mode when reading from stdin. This means that if we use ReadFile in blocking mode, we will loose input when switching as it cannot be interrupted. So we use ReadFile in overlapped mode, but then Unicode does not work :(

@josevalim
Copy link
Contributor Author

I see, thanks. @garazdawi, do you mean we lose intermediate input that you type but do not press enter? Or any input? For what is worth, it was never possible to type into iex before the shell effectively started, so I think we would be fine with discard input.

@garazdawi
Copy link
Contributor

You will loose any input until enter is pressed the first time after switching. So if you start iex with erl -noshell and the upgrade to a shell, then anything the user types until the first enter till be ignored. It would however still be possible to start from -noinput and go to a shell or "-noshell".

@josevalim
Copy link
Contributor Author

I see, perhaps in IEx we can start with -noinput and then immediately start the shell. Please keep me posted. Feel free to close this if this is being tracked elsewhere. Thanks!

@garazdawi
Copy link
Contributor

I was thinking a bit more about this issue and maybe it would be possible to fix this if we update user_drv+group to do lazy reads when in read line mode. That is when group is in echo_off mode and we need more data, we change it so that it needs to explicitly ask for it from user_drv. I think this would make it possible for us to use ReadFile in blocking mode and switch to ReadConsoleInput when needed. Come to think of it, this would probably also have solved the problem in #7230 in a much better way...

@josevalim
Copy link
Contributor Author

I am not sure how much these are related but "lazy reads" sound interesting to me because Erlang buffers all IO, even when shell is disabled. Compare this:

~/OSS/ecto[master]$ ruby -e "sleep 5"
jkldsjakldjsa
dsajkldjsaldjsa
jdsklajdlsa
~/OSS/ecto[master]$ jkldsjakldjsa
-bash: jkldsjakldjsa: command not found
~/OSS/ecto[master]$ dsajkldjsaldjsa
-bash: dsajkldjsaldjsa: command not found
~/OSS/ecto[master]$ jdsklajdlsa
-bash: jdsklajdlsa: command not found

Because the Ruby program did not read anything, it is left in the buffer and then sent to the shell once the Ruby program terminates.

In Erlang's case, all IO is buffered:

~/OSS/ecto[master]$ erl -noshell -eval "timer:sleep(5000), erlang:halt()."
jhkldsjakldsa
jkldjdsklajkol
jkldjskla
~/OSS/ecto[master]$

This leads to issues such as yes | some_erlang_or_elixir_script eventually running out of memory because it keeps buffering it. Ultimately, I don't think it is a major issue, but maybe it serves as food for thought. :)

@josevalim
Copy link
Contributor Author

@garazdawi do you think this is related to #7621? If we fix one, we would fix both?

@garazdawi
Copy link
Contributor

Maybe. It depends a bit on what is triggering the interrupt in #7621.

I’m planning to try to at least do a prototype for this in the next month or so now that 27 is finally released.

@garazdawi
Copy link
Contributor

I did a prototype for lazy reads, but I hit a snag... today the Erlang emulator will terminate if stdin is closed. Unfortunately it is rather hard to detect that stdin has been closed (that I can find) without reading from it, maybe I can do something creative with poll+POLLHUP on Unix and PeekNamedPipe on Windows.

@josevalim
Copy link
Contributor Author

@garazdawi in my experience, most programming languages do not terminate when stdin is closed. I happen to run into this because Erlang ports close stdin, but since most runtimes do not terminate, they may leave zombie processes.

Would it be an option to move the termination logic to the Erlang code, such as the Erlang shell, since that's the one reading input in a loop? So the shell still terminates, but those running erl -noshell have decide by themselves if it terminates on stdin close or not.

@garazdawi
Copy link
Contributor

Yeah, we are a bit different here (and in many other places...). We could possibly update -noshell to work the way you describe if stdin is a tty, but I don't think we can do that for when it is a pipe without breaking a lot of testcases (not just ours, but anyone running an Erlang node in os:cmd or open_port).

I'll tinker around a bit with POLLHUP and PeekNamedPipe after my summer holiday and see if I can get that to work. Since the original problem was with Windows and switching from cooked to raw mode, then maybe I'll just focus on making Windows lazy and leave Unix as is for now.

@garazdawi
Copy link
Contributor

Just realized that using POLLHUP/PeekNamedPipe cannot work as we would end up spinning checking it if there is unread data on stdin but it is not closed...

So the remaining options that I can see right now are:

  • "fix" -noshell to not automatically terminate when stdin is closed, breaking backwards compatability.
  • Add yet another option to get lazy reads.

A potential compromise is to change -noshell to not automatically terminate when stdin is a tty and it is closed, but leave it as is when it is a pipe and also add an option to make -noshell be lazy when it is a pipe.

Doing the compromise would solve the problem on Windows when we want to switch from cooked (-noshell) to raw (start:interactive/0) and adds an option to get a more "normal" behaviour when in a pipe.

Anyway, I'll get back to this after summer.

@josevalim
Copy link
Contributor Author

FWIW, all of these options work for me. If there is an option for lazy reads, I would probably enable that by default in Elixir, and that may be less work for everyone! Thanks.

@garazdawi
Copy link
Contributor

Fixed in #8962

@garazdawi garazdawi added this to the OTP-28.0 milestone Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

3 participants