Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let File.write and File.write! take advantage of writev #5154

Closed
wants to merge 6 commits into from
Closed

Let File.write and File.write! take advantage of writev #5154

wants to merge 6 commits into from

Conversation

nathanl
Copy link
Contributor

@nathanl nathanl commented Aug 24, 2016

Closes #5150

As a performance optimization, have these functions use :file.write/3,
usually with :raw mode, instead of :file.write_file/3. This will
cause the BEAM to use the writev system call instead of write.

The advantage of this is that, when writing an iolist, the BEAM need not
concatenate the items and produce a single binary for writing. Instead,
it can let the operating system read each item from memory individually;
the final output is produced only in the target file. Skipping the
binary concatenation saves work, saves memory, and creates less garbage
for later collection.

Note that :raw mode cannot be used when a character encoding is given.
In :raw mode, the BEAM does not create a separate process to handle
the file, and (I think) it would normally be this process which performs
the character encoding. So if an encoding like :utf8 is given, we do
not use :raw mode and therefore do not use writev.

As a performance optimization, have these functions use `:file.write/3`,
usually with `:raw` mode, instead of `:file.write_file/3`. This will
cause the BEAM to use the `writev` system call instead of `write`.

The advantage of this is that, when writing an iolist, the BEAM need not
concatenate the items and produce a single binary for writing. Instead,
it can let the operating system read each item from memory individually;
the final output is produced only in the target file. Skipping the
binary concatenation saves work, saves memory, and creates less garbage
for later collection.

Note that `:raw` mode cannot be used when a character encoding is given.
In `:raw` mode, the BEAM does not create a separate process to handle
the file, and (I think) it would normally be this process which performs
the character encoding. So if an encoding like `:utf8` is given, we do
not use `:raw` mode and therefore do not use `writev`.
@@ -1340,6 +1355,28 @@ defmodule File do
defp normalize_modes([], true), do: [:binary]
defp normalize_modes([], false), do: []

defp normalize_write_modes(modes) do
normalize_modes([:write] ++ modes, true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm setting :binary because, even though I'm not sure it's necessary, it's more like what the code used to do. We previously used :file.write_file/3, and the docs say that "the mode flags binary and write are implicit" when calling that function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

# and raw mode is also requested.
# http://erlang.org/doc/man/file.html#open-2
defp set_raw_unless_encoding_specified(modes) do
case specifies_encoding?(modes) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keyword.has_key?(modes, :encoding)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting. Docs say "A keyword is a list of two-element tuples" - I didn't know that would work if not all the elements in the list were tuples.

Copy link
Member

@lexmag lexmag Aug 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably List.keymember?/3 will better communicate what data structure we are working with here (even if it's totally the same as Keyword.has_key?/2 technically). :bowtie:

# and raw mode is also requested.
# http://erlang.org/doc/man/file.html#open-2
defp set_raw_unless_encoding_specified(modes) do
case Keyword.has_key?(modes, :encoding) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case looks like an if.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debated about that. This was one line shorter and seemed readable, but I'm happy to change it if you prefer.

Copy link
Contributor Author

@nathanl nathanl Aug 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

if Keyword.has_key?(modes, :encoding), do: modes, else: [:raw | modes]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the if as well :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K, changing. 😄

@fishcakez
Copy link
Member

I don't understand how this is different to using :file.write_file/3 with :raw mode added?

Note that there is a comment in :file saying :file.write_file/3 should be moved to the file server to be more efficient. Whereas :file.write_file/2 is carried out on the file server using the equivalent of raw mode.

@nathanl
Copy link
Contributor Author

nathanl commented Aug 24, 2016

I don't understand how this is different to using :file.write_file/3 with :raw mode added?

Good question. I agree that it shouldn't be different, but it seems to be. If I do this in iex:

:file.write_file("/tmp/tmp.txt", ["foo", "bar"], [:raw])

...and watch it with Evan Miller's dtrace script, I see it using write. But if I do this:

{:ok, file}  = :file.open("/tmp/tmp.txt", [:write, :raw])
:file.write(file, ["foo", "bar"])

... it uses writev. I don't know why this difference exists.

@josevalim
Copy link
Member

josevalim commented Aug 24, 2016

@fishcakez write_file is converting everything into a binary before hand.

That said, @nathanl, we should convert everything to a binary if a encoding was given (i.e. we are not in raw mode).

@fishcakez
Copy link
Member

Ah this is a bug in :file, shouldn't call make_binary/1 if raw file.

@josevalim
Copy link
Member

josevalim commented Aug 24, 2016

Ah this is a bug in :file, shouldn't call make_binary/1 if raw file.

Yup, that's pretty much it. Should we instead fix Erlang/OTP?

@nathanl
Copy link
Contributor Author

nathanl commented Aug 24, 2016

That said, @nathanl, we should convert everything to a binary if a encoding was given (i.e. we are not in raw mode).

Because we don't trust Erlang's UTF-8 encoding? I added a test for the conversion of charlists, but noticed that Erlang complained if I gave it a larger codepoint, like 9731 (☃).

@fishcakez
Copy link
Member

Yeah I think we should fix this in Erlang/OTP whatever we do here.

@josevalim
Copy link
Member

Yeah I think we should fix this in Erlang/OTP whatever we do here.

Ok, we can fix it in here meanwhile.

@fishcakez
Copy link
Member

Ok, we can fix it in here meanwhile.

Sounds good.

@nathanl
Copy link
Contributor Author

nathanl commented Aug 24, 2016

Yeah I think we should fix this in Erlang/OTP whatever we do here.

That would be great. When we're ready to merge this, I'll make sure that the relevant change is one commit so that it can be easily reverted later if Erlang's bug is fixed.

{:ok, file} ->
case F.write(file, content) do
:ok ->
F.close(file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check :file.write_file and copy the close handling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josevalim
Copy link
Member

Because we don't trust Erlang's unicode conversion?

write_file expects IO data. So there is no unicode conversion happening here. I would expect any list with more than > 255 to fail. The reason we convert to binary is because when raw mode is not used, we spawn another process, and sending the binary to the other process will be more performant. The conversion can be done by calling IO.iodata_to_binary.

Also, when not using raw mode, convert data to binary.
@nathanl
Copy link
Contributor Author

nathanl commented Aug 24, 2016

Based on @fishcakez feedback, this no longer sets :raw mode automatically. So we do now get the advantage of writev, but only if the user does File.write("/tmp/tmp.txt", [foo, bar, foo], [:raw]).

@nathanl
Copy link
Contributor Author

nathanl commented Aug 24, 2016

Once all concerns are addressed, I'll squash this down and revise the commit message. Like I said, I'll make the code changes one commit so that it can easily be reverted once :file.write_file is fixed in Erlang/OTP.

@josevalim
Copy link
Member

@nathanl, at this point, given we have basically reimplemented file:write_file, I believe we should send this fix directly to Erlang/OTP. The reason why I was OK with previous patch was because we were defaulting to raw but @fishcakez convinced us it was a bad idea.

That said, I don't believe we should merge this, sorry. On the positive side, there is no way we could get to this conclusion without your patch and work, so the work here is definitely not being wasted. Let me know if you need help in providing a patch to Erlang/OTP or if you prefer me to do it.

Thank you!

@nathanl
Copy link
Contributor Author

nathanl commented Aug 24, 2016

@josevalim No worries! I think I will give it a try, but I may come back for help, since I've not written any Erlang yet.

Do I understand correctly that this line is where they convert the input to a binary, and that should be changed to only happen if raw is not in the list of modes?

@josevalim
Copy link
Member

Do I understand correctly that this line is where they convert the input to a binary, and that should be changed to only happen if raw is not in the list of modes?

Yes!

@josevalim
Copy link
Member

@nathanl feel free to ping me on IRC, during the conf, or by e-mail any time then! Closing this as mentioned!

@nathanl
Copy link
Contributor Author

nathanl commented Aug 26, 2016

@josevalim and @fishcakez Please see my PR on Erlang/OTP at erlang/otp#1149

@josevalim
Copy link
Member

@nathanl thank you, it looks great. Btw, Erlang indentation rules is a bit weird, you may have to amend your patch for that. They use 4 spaces indentation but every 8 spaces becomes a tab.

@nathanl
Copy link
Contributor Author

nathanl commented Sep 29, 2016

@josevalim @fishcakez My PR on OTP was merged!! 😲 🎊 🎆 ❗

@josevalim
Copy link
Member

@nathanl yay! congratulations! 🎉

@fishcakez
Copy link
Member

@nathanl nice work ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants