Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct usage note on OpenOptions::append() #120781

Merged
merged 1 commit into from Apr 16, 2024
Merged

Conversation

hniksic
Copy link
Contributor

@hniksic hniksic commented Feb 8, 2024

This PR aims to correct the following usage note in OpenOptions::append(), which currently contains misleading information:

One maybe obvious note when using append-mode: make sure that all data that belongs together is written to the file in one operation. This can be done by concatenating strings before passing them to write(), or using a buffered writer (with a buffer of adequate size), and calling flush() when the message is complete.

The above is misleading because, despite appearances, neither concatenating data before passing it to write(), nor delaying writes using BufWriter, ensures atomicity. File::write(), as well as the underlying write(2) system call, makes no guarantees that the data passed to it will be written out in full. It is allowed to write out only a part of the data, and has a return value that tells you how much it has written, at which point it has already returned and modified the file with partial data. Given this limitation, the only way to ensure atomicity of appends is through external locking.

Attempting to ensure atomicity by issuing data in a single write() is a footgun often stumbled upon by beginners, which shouldn't be advertised in the docs. The worst thing about the footgun is that it appears to work at first, only failing when the string becomes sufficiently large, or when some internal properties of the output file descriptor change (e.g. it is switched from regular file to a special file that talks to a socket or TTY), making it accept smaller writes. Additionally, the suggestion to use BufWriter skims over the issue of buffer sizes, as well as the fact that BufWriter::flush() contains a loop that can happily issue multiple writes. This loop is completely opaque to the caller, so you can't even assert atomicity after-the-fact.

The PR makes the following changes:

  • removes the paragraph that suggests concatenating strings to pass them to write() for atomicity or using BufWriter
  • adds a paragraph explaining why attempting to use write() to append atomically is not a good idea.

@rustbot
Copy link
Collaborator

rustbot commented Feb 8, 2024

r? @m-ou-se

rustbot has assigned @m-ou-se.
They will have a look at your PR within the next two weeks and either review your PR or
reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 8, 2024
@rust-log-analyzer

This comment has been minimized.

@the8472
Copy link
Member

the8472 commented Feb 8, 2024

I think this is now overcompensating, going too far in the other direction.

It is still necessary to coalesce buffers into a single write syscall to have a chance of it being atomic. Certainly, whether it'll end a full or a short write will be platform- and filedescriptor-type-dependent. But for example regular, disk-backed filesystems make an effort to not perform short writes.

For example if multiple threads are logging to a single file then tearing may be tolerable in edge-cases but it's preferable that it doesn't happen. So it'll still be important for the logger to perform a single write() call per log statement.

@hniksic
Copy link
Contributor Author

hniksic commented Feb 8, 2024

It is still necessary to coalesce buffers into a single write syscall to have a chance of it being atomic. Certainly, whether it'll end a full or a short write will be platform- and filedescriptor-type-dependent. But for example regular, disk-backed filesystems make an effort to not perform short writes.

The problem is that there is absolutely no documented guarantee for this, nor for how large a write is "short".

Maybe OpenOption::append() should simply refrain from discussing this topic - would that be acceptable as a compromise?

@the8472
Copy link
Member

the8472 commented Feb 8, 2024

The problem is that there is absolutely no documented guarantee for this, nor for how large a write is "short".

At least on linux length of the write primarily matters for pipes. For disk-backed filesystems most of them will (afaik) even take a gigabyte-sized write buffer and try to make that append atomic. Which might be good enough for some uses (logging) but not for others (database journal).

Maybe OpenOption::append() should simply refrain from discussing this topic - would that be acceptable as a compromise?

I don't think that'd help anyone.

Instead we should describe the properties. That it's only atomic for a single write and that whether writes get cut short will be platform and file-dependent. And that for some uses coalescing multiple writes into one can be "good enough" and that for more stringent requirements locking or careful study of platform-specific documentation would be necessary.

@m-ou-se
Copy link
Member

m-ou-se commented Feb 15, 2024

Instead of the "less useful than it appears" wording, how about something like this?

"However, this does not necessarily guarantee that data appended by different processes or threads does not interleave. The amount of data accepted a single write() call depends on the operating system and file system."

@m-ou-se m-ou-se added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Feb 15, 2024
@hniksic
Copy link
Contributor Author

hniksic commented Feb 15, 2024

@m-ou-se Thanks for the suggestion. I've now toned down the paragraph by:

  • replacing the "less useful" remark with the text suggested by @m-ou-se
  • removing the last sentence (mentioning that only single-byte data can be sent without locking) which wasn't adding much value

I can edit it further as needed. @the8472 please let me know if this looks better. I'm not sure if you insist that we explicitly state that interleaving is "good enough" for some cases - I'd expect the user to understand that anyway. The current wording tries to explain the properties impartially, hopefully fulfilling the spirit of your suggestion.

Comment on lines 983 to 985
/// successful `write()` is allowed to write only part of the given data, so even if
/// you're careful to provide the whole message in a single call to `write()`, there
/// is no guarantee that it will written out in full.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still steer users towards coalescing their writes (as the previous version did), even if it's not sufficient for some use-cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now restored a mention of coalescing of writes.

@bors
Copy link
Contributor

bors commented Apr 15, 2024

☔ The latest upstream changes (presumably #123945) made this pull request unmergeable. Please resolve the merge conflicts.

@the8472 the8472 assigned the8472 and m-ou-se and unassigned m-ou-se and the8472 Apr 15, 2024
Copy link
Member

@m-ou-se m-ou-se left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay. This looks good now. (Other than that one extra `.)

Can you rebase it and squash your commits?

library/std/src/fs.rs Outdated Show resolved Hide resolved
@hniksic
Copy link
Contributor Author

hniksic commented Apr 16, 2024

@m-ou-se I'll do so shortly, thanks for the review.

Avoid implying that concatenating data before passing it to `write()` (with
or without `BufWriter`) ensures atomicity.
@hniksic
Copy link
Contributor Author

hniksic commented Apr 16, 2024

I've now rebased and squashed the commits, and fixed a typo ("will written out" -> "will be written out").

@m-ou-se
Copy link
Member

m-ou-se commented Apr 16, 2024

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Apr 16, 2024

📌 Commit d8745f9 has been approved by m-ou-se

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Apr 16, 2024
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Apr 16, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 16, 2024
…llaumeGomez

Rollup of 14 pull requests

Successful merges:

 - rust-lang#120781 (Correct usage note on OpenOptions::append())
 - rust-lang#121694 (sess: stabilize `-Zrelro-level` as `-Crelro-level`)
 - rust-lang#122521 (doc(bootstrap): add top-level doc-comment to utils/tarball.rs)
 - rust-lang#123491 (Fix ICE in `eval_body_using_ecx`)
 - rust-lang#123574 (rustdoc: rename `issue-\d+.rs` tests to have meaningful names (part 6))
 - rust-lang#123687 (Update ar_archive_writer to 0.2.0)
 - rust-lang#123721 (Various visionOS fixes)
 - rust-lang#123797 (Better graphviz output for SCCs and NLL constraints)
 - rust-lang#123990 (Make `suggest_deref_closure_return` more idiomatic/easier to understand)
 - rust-lang#123995 (Make `thir_tree` and `thir_flat` into hooks)
 - rust-lang#123998 (Opaque types have no namespace)
 - rust-lang#124001 (Fix docs for unstable_features lint.)
 - rust-lang#124006 (Move size assertions for `mir::syntax` types into the same file)
 - rust-lang#124011 (rustdoc: update the module-level docs of `rustdoc::clean`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 17d62bc into rust-lang:master Apr 16, 2024
12 checks passed
@rustbot rustbot added this to the 1.79.0 milestone Apr 16, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Apr 16, 2024
Rollup merge of rust-lang#120781 - hniksic:master, r=m-ou-se

Correct usage note on OpenOptions::append()

This PR aims to correct the following usage note in `OpenOptions::append()`, which currently contains misleading information:

> One maybe obvious note when using append-mode: make sure that all data that belongs together is written to the file in one operation. This can be done by concatenating strings before passing them to [write()](https://doc.rust-lang.org/std/io/trait.Write.html#tymethod.write), or using a buffered writer (with a buffer of adequate size), and calling [flush()](https://doc.rust-lang.org/std/io/trait.Write.html#tymethod.flush) when the message is complete.

The above is misleading because, despite appearances, neither concatenating data before passing it to `write()`, nor delaying writes using `BufWriter`, ensures atomicity. `File::write()`, as well as the underlying `write(2)` system call, makes no guarantees that the data passed to it will be written out in full. It is allowed to write out only a part of the data, and has a return value that tells you how much it has written, at which point it has already returned and modified the file with partial data. Given this limitation, the only way to ensure atomicity of appends is through external locking.

Attempting to ensure atomicity by issuing data in a single `write()` is a footgun often stumbled upon by beginners, which shouldn't be advertised in the docs. The worst thing about the footgun is that it *appears* to work at first, only failing when the string becomes sufficiently large, or when some internal properties of the output file descriptor change (e.g. it is switched from regular file to a special file that talks to a socket or TTY), making it accept smaller writes. Additionally, the suggestion to use `BufWriter` skims over the issue of buffer sizes, as well as the fact that `BufWriter::flush()` contains a *loop* that can happily issue multiple writes. This loop is completely opaque to the caller, so you can't even assert atomicity after-the-fact.

The PR makes the following changes:

* removes the paragraph that suggests concatenating strings to pass them to `write()` for atomicity or using `BufWriter`
* adds a paragraph explaining why attempting to use `write()` to append atomically is not a good idea.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants