Let `File.write` and `File.write!` take advantage of `writev` #5154
As a performance optimization, have these functions use `:file.write/3`, usually with `:raw` mode, instead of `:file.write_file/3`. This will cause the BEAM to use the `writev` system call instead of `write`. The advantage of this is that, when writing an iolist, the BEAM need not concatenate the items and produce a single binary for writing. Instead, it can let the operating system read each item from memory individually; the final output is produced only in the target file. Skipping the binary concatenation saves work, saves memory, and creates less garbage for later collection.

Note that `:raw` mode cannot be used when a character encoding is given. In `:raw` mode, the BEAM does not create a separate process to handle the file, and (I think) it would normally be this process which performs the character encoding. So if an encoding like `:utf8` is given, we do not use `:raw` mode and therefore do not use `writev`.
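For readers who want to try the two code paths described above, here is a minimal sketch. The file names are made up for illustration, and the script only shows that both paths accept the same iodata and produce the same bytes; which system call the BEAM issues is not visible from Elixir and would need an external tracer.

```elixir
# Sketch only: the temp file names below are hypothetical.
path_a = Path.join(System.tmp_dir!(), "writev_demo_a.txt")
path_b = Path.join(System.tmp_dir!(), "writev_demo_b.txt")

# Nested iodata; nothing is concatenated on the Elixir side.
iodata = ["foo", [?b, "ar"], "\n"]

# Path A: the single call the old code used.
:ok = :file.write_file(path_a, iodata, [:raw])

# Path B: open a raw file descriptor, write the iodata, then close.
{:ok, file} = :file.open(path_b, [:write, :raw])
:ok = :file.write(file, iodata)
:ok = :file.close(file)

# Read both files back so the results can be compared.
contents_a = File.read!(path_a)
contents_b = File.read!(path_b)
IO.inspect({contents_a, contents_b})

# Clean up the temporary files.
File.rm(path_a)
File.rm(path_b)
```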
@@ -1340,6 +1355,28 @@ defmodule File do
  defp normalize_modes([], true), do: [:binary]
  defp normalize_modes([], false), do: []

  defp normalize_write_modes(modes) do
    normalize_modes([:write] ++ modes, true)
Here I'm setting `:binary` because, even though I'm not sure it's necessary, it's more like what the code used to do. We previously used `:file.write_file/3`, and the docs say that "the mode flags binary and write are implicit" when calling that function.
👍
  # and raw mode is also requested.
  # http://erlang.org/doc/man/file.html#open-2
  defp set_raw_unless_encoding_specified(modes) do
    case specifies_encoding?(modes) do
`Keyword.has_key?(modes, :encoding)`?
Oh, interesting. Docs say "A keyword is a list of two-element tuples" - I didn't know that would work if not all the elements in the list were tuples.
Probably `List.keymember?/3` will better communicate what data structure we are working with here (even if it's totally the same as `Keyword.has_key?/2` technically).
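A quick sketch of the point being discussed: a file mode list mixes bare atoms with `{key, value}` tuples, and both lookups appear to tolerate that. The `modes` value below is an illustrative example, not taken from the patch.

```elixir
# A mode list mixing bare atoms and a two-element tuple (illustrative).
modes = [:write, :binary, {:encoding, :utf8}]

# Both functions look for a tuple whose first element is :encoding
# and skip the elements that are not tuples.
via_keyword = Keyword.has_key?(modes, :encoding)
via_list = List.keymember?(modes, :encoding, 0)

# With no encoding tuple present, the lookup finds nothing.
without_encoding = List.keymember?([:write, :binary], :encoding, 0)

IO.inspect({via_keyword, via_list, without_encoding})
```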
  # and raw mode is also requested.
  # http://erlang.org/doc/man/file.html#open-2
  defp set_raw_unless_encoding_specified(modes) do
    case Keyword.has_key?(modes, :encoding) do
This `case` looks like an `if`.
I debated about that. This was one line shorter and seemed readable, but I'm happy to change it if you prefer.
How about:
if Keyword.has_key?(modes, :encoding), do: modes, else: [:raw | modes]
I prefer the `if` as well :)
K, changing. 😄
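For reference, the `if` form agreed on in this thread could look roughly like the sketch below. The wrapping module name is hypothetical and the function is made public here only so it can be called from a script; in the patch it is a private helper inside `File`.

```elixir
defmodule RawModeSketch do
  # Hypothetical module; in the PR this is a defp inside File.
  # Adds :raw unless the caller asked for a character encoding,
  # since :raw cannot be combined with an encoding.
  def set_raw_unless_encoding_specified(modes) do
    if Keyword.has_key?(modes, :encoding), do: modes, else: [:raw | modes]
  end
end

# No encoding requested: :raw is prepended.
plain = RawModeSketch.set_raw_unless_encoding_specified([:write])

# Encoding requested: the modes are returned unchanged.
encoded = RawModeSketch.set_raw_unless_encoding_specified([:write, {:encoding, :utf8}])

IO.inspect({plain, encoded})
```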
I don't understand how this is different to using Note that there is a comment in
Good question. I agree that it shouldn't be different, but it seems to be. If I do this in

    :file.write_file("/tmp/tmp.txt", ["foo", "bar"], [:raw])

...and watch it with Evan Miller's dtrace script, I see it using

    {:ok, file} = :file.open("/tmp/tmp.txt", [:write, :raw])
    :file.write(file, ["foo", "bar"])

... it uses
@fishcakez write_file is converting everything into a binary beforehand. That said, @nathanl, we should convert everything to a binary if an encoding was given (i.e. we are not in raw mode).
Ah this is a bug in
Yup, that's pretty much it. Should we instead fix Erlang/OTP?
Because we don't trust Erlang's UTF-8 encoding? I added a test for the conversion of charlists, but noticed that Erlang complained if I gave it a larger codepoint, like 9731 (☃).
Yeah I think we should fix this in Erlang/OTP whatever we do here.
Ok, we can fix it in here meanwhile.
Sounds good.
That would be great. When we're ready to merge this, I'll make sure that the relevant change is one commit so that it can be easily reverted later if Erlang's bug is fixed.
      {:ok, file} ->
        case F.write(file, content) do
          :ok ->
            F.close(file)
Please check `:file.write_file` and copy the close handling.
@fishcakez Pushed a commit, copying what I see at https://github.com/erlang/otp/blob/5492ce9951aced8686dbef99d0693e7c6da50c7d/lib/kernel/src/file.erl#L400-L419
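The close handling being asked for can be sketched as follows. This is an illustration of the general pattern (always close the descriptor, return the result of `close` on success, return the original error on a failed write), not the code that was pushed; the module name, function name, and temp file name are all hypothetical.

```elixir
defmodule WriteCloseSketch do
  # Hypothetical helper, not the code from this PR.
  def write_and_close(path, content, modes) do
    case :file.open(path, [:write | modes]) do
      {:ok, file} ->
        case :file.write(file, content) do
          :ok ->
            # Success: the result of close/1 becomes the return value,
            # so a failure while closing is still reported.
            :file.close(file)

          error ->
            # Failed write: close anyway, but return the write error.
            _ = :file.close(file)
            error
        end

      error ->
        # Could not open the file; there is nothing to close.
        error
    end
  end
end

# Usage with a hypothetical temp file.
path = Path.join(System.tmp_dir!(), "write_close_sketch.txt")
result = WriteCloseSketch.write_and_close(path, ["foo", "bar"], [:raw])
written = File.read!(path)
File.rm(path)
IO.inspect({result, written})
```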
write_file expects IO data, so there is no Unicode conversion happening here. I would expect any list containing an integer greater than 255 to fail. The reason we convert to binary is that, when raw mode is not used, we spawn another process, and sending a single binary to that process will be more performant. The conversion can be done by calling
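To make the iodata restriction concrete, here is a small sketch. It uses `IO.iodata_to_binary/1` and `List.to_string/1` as examples of conversions that exist in Elixir; it does not claim that either is the call named in the comment above.

```elixir
# Valid iodata: binaries, bytes (0..255), and nested lists of those.
binary = IO.iodata_to_binary(["foo", [?b, "ar"]])

# A charlist holding a code point above 255 (the snowman from this
# thread) is chardata, not iodata.
snowman = [9731]

# Treating it as iodata fails, because 9731 does not fit in a byte.
iodata_result =
  try do
    IO.iodata_to_binary(snowman)
  rescue
    ArgumentError -> :not_iodata
  end

# Treating it as Unicode chardata works and yields UTF-8 bytes.
utf8 = List.to_string(snowman)

IO.inspect({binary, iodata_result, utf8})
```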
Also, when not using raw mode, convert data to binary.
Based on @fishcakez's feedback, this no longer sets
Once all concerns are addressed, I'll squash this down and revise the commit message. Like I said, I'll make the code changes one commit so that it can easily be reverted once
@nathanl, at this point, given we have basically reimplemented file:write_file, I believe we should send this fix directly to Erlang/OTP. The reason why I was OK with the previous patch was because we were defaulting to That said, I don't believe we should merge this, sorry. On the positive side, there is no way we could have reached this conclusion without your patch and work, so the work here is definitely not being wasted. Let me know if you need help in providing a patch to Erlang/OTP or if you prefer me to do it. Thank you!
@josevalim No worries! I think I will give it a try, but I may come back for help, since I've not written any Erlang yet. Do I understand correctly that this line is where they convert the input to a binary, and that should be changed to only happen if
Yes!
@nathanl feel free to ping me on IRC, during the conf, or by e-mail any time then! Closing this as mentioned!
@josevalim and @fishcakez Please see my PR on Erlang/OTP at erlang/otp#1149
@nathanl thank you, it looks great. Btw, Erlang's indentation rules are a bit weird, so you may have to amend your patch for that. They use 4-space indentation, but every 8 spaces becomes a tab.
@josevalim @fishcakez My PR on OTP was merged!! 😲 🎊 🎆 ❗
@nathanl yay! congratulations! 🎉
@nathanl nice work ❤️
Closes #5150