Very slow file:write to stdout with latin1 encoding on OTP 26.2.3 #8305

Open
VLanvin opened this issue Mar 25, 2024 · 4 comments · May be fixed by #9013
Labels: bug (Issue is reported as a bug), team:VM (Assigned to OTP team VM)

Comments

@VLanvin

VLanvin commented Mar 25, 2024

Describe the bug
file:write/2 to erlang:group_leader() with latin1 encoding is unexpectedly slow on OTP 26.2.3. It is fine on OTP 26.0.2.

To Reproduce
Run the following script on OTP 26.2.3 with the input file (3MB) provided in this archive (redirect output to /dev/null).

main(_) ->
  Data = binary_to_term(element(2, file:read_file("input"))),
  io:setopts(erlang:group_leader(), [binary, {encoding, latin1}]),
  file:write(erlang:group_leader(), Data),
  ok.

time escript test.erl >> /dev/null takes ~1.6s. Commenting out the file:write or removing latin1 encoding reduces runtime to ~0.3s.

Expected behavior
file:write should be near-instantaneous there.

Affected versions
OTP 26.2.3
OTP 26.1.2
Doesn't happen on OTP 26.0.2

Additional context
This is a minimal repro of what happens in ELP's parse server.
The input file provided in the archive was obtained by running the parse server on OTP's unicode_util and dumping the resulting term.

VLanvin added the bug label on Mar 25, 2024
@garazdawi
Contributor

I haven't checked, but I assume this is because the new stdio implementation in OTP 26 keeps all internal data as unicode, so the data is first converted from latin1 to unicode and then converted back again before being written to stdout. Prior to OTP 26, the stdio used when redirecting to a non-terminal was not unicode aware, so it would just shuffle the bytes.

If my hypothesis is correct, the solution would be to make group aware that user_drv is currently in latin1 mode and then skip the conversion. A PR would be very welcome!
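
To make the suspected double conversion concrete, here is a minimal sketch of a latin1 to unicode and back round trip using the `unicode` module. It is only an illustration of the hypothesis above, not the actual code path inside group or user_drv; the module name `latin1_roundtrip` and both function names are made up for this example.

```erlang
-module(latin1_roundtrip).
-export([to_unicode/1, roundtrip/1]).

%% Hypothetical helper: widen latin1 bytes to UTF-8.
%% Bytes above 127 become two bytes each.
to_unicode(Latin1) when is_binary(Latin1) ->
    unicode:characters_to_binary(Latin1, latin1, unicode).

%% Hypothetical helper: convert to UTF-8 and straight back to latin1.
%% The result equals the input, so both passes are pure overhead
%% when the output device is in latin1 mode anyway.
roundtrip(Latin1) when is_binary(Latin1) ->
    unicode:characters_to_binary(to_unicode(Latin1), unicode, latin1).
```

Since `roundtrip/1` returns the same bytes it was given, skipping both conversions when the output is known to be latin1 should not change what ends up on stdout.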

garazdawi self-assigned this on Mar 25, 2024
@michalmuskala
Contributor

Ah, now that we know where to look we'll work on sending a PR

IngelaAndin added the team:VM label on Mar 26, 2024
@michalmuskala
Contributor

Unfortunately, I have not yet had time to look into this more deeply, but I'm leaving this comment for others who might be struggling with the regression as well.

Inside ELP we've "solved" this by migrating to "raw" stdio through a port. This involves starting the node with -noinput (note that -noinput -noshell causes some spurious error logs), and later on "opening" the stdin/out as a port with:

open_port({fd, 0, 1}, Opts)

with file descriptors 0 and 1 being standard for stdin/out.

It actually ends up more performant than before, and allowing for options such as active and {packet, N} on the port makes it somewhat more convenient to use.
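
For anyone wanting to try the same work-around, a rough sketch of what it could look like is below. This is not ELP's actual code: the module name `raw_stdio`, the function names, and the option list `[binary, stream]` are assumptions chosen for illustration, and it assumes the node was started with -noinput as described above.

```erlang
-module(raw_stdio).
-export([open/0, write/2]).

%% Hypothetical helper: open fd 0 (stdin) and fd 1 (stdout) as one port.
%% Assumes the node was started with -noinput so that the standard
%% I/O server is not competing for the same file descriptors.
open() ->
    open_port({fd, 0, 1}, [binary, stream]).

%% Hypothetical helper: send iodata to stdout as raw bytes,
%% without any encoding conversion. Returns true.
write(Port, Data) ->
    port_command(Port, Data).
```

Data read from stdin arrives at the process that opened the port as `{Port, {data, Bin}}` messages, so the port would normally be opened once by a long-lived process.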

I still plan on working on a fix for this in OTP, but it's somewhat less of a priority for us right now with this work-around.

@garazdawi
Contributor

Can you check and see if #9013 solves the issue for you?
