Erlang/OTP 26 stdin fails on file:read/file:read_line #7230

Closed
josevalim opened this issue May 15, 2023 · 14 comments · Fixed by #7384

@josevalim
Contributor

Describe the bug

$ echo "duπa" | erl -noshell -eval "io:setopts(standard_io, [{encoding, latin1}, {binary, true}]), io:format('~p~n', [file:read_line(standard_io)]), erlang:halt()"
{error, collect_line}

Expected behavior

On Erlang/OTP 25 it worked without problems:

$ echo "duπa" | erl -noshell -eval "io:setopts(standard_io, [{encoding, latin1}, {binary, true}]), io:format('~p~n', [file:read_line(standard_io)]), erlang:halt()"
{ok,<<100,117,207,128,97,10>>}

The goal is to read a byte stream; "duπa" is just an example.

Affected versions

Only on Erlang/OTP 26. This patch fixes stdout; perhaps something similar is needed for stdin?

@josevalim josevalim added the bug Issue is reported as a bug label May 15, 2023
@rickard-green rickard-green added the team:VM Assigned to OTP team VM label May 15, 2023
@frazze-jobb frazze-jobb self-assigned this May 15, 2023
@lukaszsamson
Contributor

@frazze-jobb when can we expect a release with a fix for this? It is breaking ElixirLS (the Elixir Language Server).

@frazze-jobb
Contributor

Yes, I am on parental leave this week, but I will look into it next week.

@frazze-jobb
Contributor

Would it be okay to start the shell later?
echo "duπa" | erl -noinput -eval "io:setopts(standard_io, [{encoding, latin1}, {binary, true}]), shell:start_interactive(noshell), io:format('~p~n', [file:read_line(standard_io)]), erlang:halt()"

In that case it would be possible to solve this similarly to #7211

@lukaszsamson
Contributor

lukaszsamson commented Jun 7, 2023

@frazze-jobb I can start the shell later, but I need something that also works on lower OTP versions. Your example crashes on OTP 25:

echo "duπa" | erl -noinput -eval "io:setopts(standard_io, [{encoding, latin1}, {binary, true}]), shell:start_interactive(noshell), io:format('~p~n', [file:read_line(standard_io)]), erlang:halt()"
{"init terminating in do_boot",{undef,[{shell,start_interactive,[noshell],[]},{erl_eval,do_apply,7,[{file,"erl_eval.erl"},{line,744}]},{erl_eval,exprs,6,[{file,"erl_eval.erl"},{line,136}]},{init,start_it,1,[]},{init,start_em,1,[]},{init,do_boot,3,[]}]}}
init terminating in do_boot ({undef,[{shell,start_interactive,[noshell],[]},{erl_eval,do_apply,7,[{_},{_}]},{erl_eval,exprs,6,[{_},{_}]},{init,start_it,1,[]},{init,start_em,1,[]},{init,do_boot,3,[]}]})

Crash dump is being written to: erl_crash.dump...done

@josevalim
Contributor Author

@frazze-jobb just so I understand the problem better: is it not possible to change the encoding after the shell is started, even with -noshell?

@frazze-jobb
Contributor

It is possible to change the encoding. The problem is that the driver has already read from stdin before "io:setopts" has been evaluated.

@lukaszsamson
Contributor

Alternatively, is it possible to read from the stdin device without starting a shell?

@josevalim
Contributor Author

@frazze-jobb I see! What do you think about a VM flag or app configuration? -kernel stdin_encoding latin1? This way you can read the default configuration, it can be set early on, and it should be backwards compatible.

@ferd
Contributor

ferd commented Jun 7, 2023

For what it's worth, you can access the raw binary stream by opening stdio as a file descriptor with the erlang:open_port/2 BIF:

$ cat ~/Desktop/Audio.wav | head -n1
RIFF�sWAVEJUNK4fmt D��data��s������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

$ cat ~/Desktop/Audio.wav | head -n1 | erl -noinput -eval 'P = erlang:open_port({fd,0,1}, [binary]), receive {P,{data,X}} -> erlang:port_command(P, ["input was: ", X]) end, halt()'
input was: RIFF�sWAVEJUNK4fmt D��data��s������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

This will preserve the exact bytes that were passed in the input. If it was unicode, it remains unicode; if it was binary garbage, it remains binary garbage. The 'port_command' call does expect iodata() as output, so any unicode output you have has to be flattened to valid binaries before being sent out (so avoid any charlists that contain unicode codepoints, and convert them to valid UTF-8 binaries with unicode:characters_to_binary if you do have unicode).
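
To illustrate the output side of the point above, here is a minimal sketch of flattening chardata before handing it to the port. The module name port_out and its functions are hypothetical helpers, not part of OTP; only unicode:characters_to_binary/1 and erlang:port_command/2 are real calls.

```erlang
-module(port_out).
-export([to_bytes/1, write/2]).

%% Convert chardata that may contain unicode codepoints above 255 into a
%% UTF-8 binary, so the result is valid iodata() for erlang:port_command/2.
to_bytes(Chardata) ->
    case unicode:characters_to_binary(Chardata) of
        Bin when is_binary(Bin) ->
            Bin;
        {error, _Encoded, _Rest} ->
            erlang:error(badarg, [Chardata]);
        {incomplete, _Encoded, _Rest} ->
            erlang:error(badarg, [Chardata])
    end.

%% Write chardata to a port opened with erlang:open_port({fd,0,1}, [binary]).
write(Port, Chardata) ->
    erlang:port_command(Port, to_bytes(Chardata)).
```

With this helper, a charlist such as [$d, $u, 960, $a] (where 960 is the codepoint for π) is encoded to UTF-8 bytes before it reaches the port, instead of being rejected as invalid iodata.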

@lukaszsamson
Contributor

@ferd will it work on Windows?

@ferd
Contributor

ferd commented Jun 7, 2023

Hm, I think Windows is unfortunately pickier about drivers hijacking stdio, even though Erlang is told not to use them. It might fail there, but I haven't had a Windows computer in a functional state in a long while to check.

@garazdawi
Contributor

Nice idea @ferd !

@ferd will it work on Windows?

I tried it on Windows 10 using cmd.exe and it works there, so I would expect it to work anywhere. This is basically how "oldshell" used to work before the OTP 26 rewrite.

@lukaszsamson
Contributor

The downside is that, as far as I understand, it just passes byte blobs with no concept of lines. In order to use it in ElixirLS I'd need to implement buffering and line-based reading.

@garazdawi
Contributor

You can pass {line, 80} to open_port to make it send messages based on lines. And the buffer can just be the process mailbox :)
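
A rough sketch of what that could look like, assuming the port is opened on fd 0/1 with the line and eof options. The module stdin_lines and its functions are hypothetical names for illustration; the {eol, _} / {noeol, _} message shapes are what open_port delivers in line mode, and lines longer than the limit arrive as noeol chunks that are joined here.

```erlang
-module(stdin_lines).
-export([main/0, read_line/1]).

%% Echo each input line back, prefixed with "line: ".
main() ->
    Port = erlang:open_port({fd, 0, 1}, [binary, eof, {line, 80}]),
    loop(Port).

loop(Port) ->
    case read_line(Port) of
        {ok, Line} ->
            erlang:port_command(Port, ["line: ", Line, "\n"]),
            loop(Port);
        eof ->
            ok
    end.

%% Return {ok, Binary} for the next complete line (newline stripped),
%% or eof once the input is exhausted. The mailbox acts as the buffer.
read_line(Port) ->
    read_line(Port, []).

read_line(Port, Acc) ->
    receive
        {Port, {data, {noeol, Chunk}}} ->
            %% Line is longer than the limit: keep collecting.
            read_line(Port, [Chunk | Acc]);
        {Port, {data, {eol, Chunk}}} ->
            {ok, iolist_to_binary(lists:reverse([Chunk | Acc]))};
        {Port, eof} when Acc =:= [] ->
            eof;
        {Port, eof} ->
            %% Unterminated last line: return it, and re-queue eof
            %% so that the next call reports end of input.
            self() ! {Port, eof},
            {ok, iolist_to_binary(lists:reverse(Acc))}
    end.
```

After compiling with erlc, something like `printf 'a\nb\n' | erl -noinput -eval 'stdin_lines:main(), halt()'` should exercise it.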

That being said, I think we will add an option as José described so that you can set the encoding of standard_io when you start Erlang.

@garazdawi garazdawi self-assigned this Jun 9, 2023
garazdawi pushed a commit to garazdawi/otp that referenced this issue Jun 16, 2023
As group now acts as the proxy when running "oldshell" or
"noshell" it needs to be able to read and write raw binaries.
Latin1 encoding allows all possible bytes, so by fixing latin1
we allow any bytes to be passed into and out of Erlang unmodified.

fixes erlang#7230
@garazdawi garazdawi added this to the OTP-26.0.2 milestone Jun 20, 2023