Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document FromRawFd unsafety #72175

Closed
mahkoh opened this issue May 13, 2020 · 18 comments
Closed

Document FromRawFd unsafety #72175

mahkoh opened this issue May 13, 2020 · 18 comments
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one.

Comments

@mahkoh
Copy link
Contributor

mahkoh commented May 13, 2020

Currently the documentation of FromRawFd uses vague language:

This function is also unsafe as the primitives currently returned have the contract that they are the sole owner of the file descriptor they are wrapping. Usage of this function could accidentally allow violating this contract which can cause memory unsafety in code that relies on it being true.

What kind of memory safety issues is this quote referring to?

Using the nix crate it is not hard to write FromRawFd using only safe code:

use nix::{
    fcntl::{open, OFlag},
    sys::stat::Mode,
    unistd::dup2,
};
use std::{
    fs::{self, File},
    io::Read,
    os::unix::io::AsRawFd,
};

fn main() {
    fs::write("a", "a").unwrap();
    fs::write("b", "b").unwrap();

    let mut a = File::open("a").unwrap();
    let b = open("b", OFlag::O_RDONLY, Mode::empty()).unwrap();

    dup2(b, a.as_raw_fd()).unwrap();

    let mut a_contents = String::new();
    a.read_to_string(&mut a_contents).unwrap();

    assert!(a_contents == "b");
}

This is the simplest way but you can also use close on the returned file descriptor and then call File::open again to create two Files with the same underlying file descriptor.

What about performing the various fcntl operation on the value returned by as_raw_fd? Can these also cause memory unsafety? Must all methods operating on file descriptors in nix be marked unsafe because as_raw_fd is a safe function?

Given the current state of the AsRawFd and the existence of other safe crates which allow manipulating the returned file descriptor, there should be very good reasons to keep from_raw_fd unsafe and these should be documented. Otherwise from_raw_fd should be made safe with a clear warning that odd (but not unsafe) things might happen if the assumptions about ownership are violated.

@sfackler
Copy link
Member

Otherwise from_raw_fd should be made safe

That would be a breaking change.

@mahkoh
Copy link
Contributor Author

mahkoh commented May 14, 2020

You're right.

Maybe the correct way would be to add a new trait FromRawFdSafe and a blanket impl of FromRawFd for T: FromRawFdSafe? Or maybe a new safe default function from_raw_fd_safe in FromRawFd? Both ideas seem extremely ugly.

Assuming that dup2 should not be unsafe, I don't think there is currently a correct way to implement an actually unsafe FromRawFd. You can always create a new thread that sprays the file descriptors namespace with dup2 calls so no code that works with file descriptors can assume anything about the state of file descriptors.

If we assume this then the second idea (a new safe default function from_raw_fd_safe which forwards to from_raw_fd) seems like the better solution.

@Elinvynia Elinvynia added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools labels May 20, 2020
@sfackler
Copy link
Member

#74699 would provide a more concrete source of UB if FromRawFd is misused.

@mahkoh
Copy link
Contributor Author

mahkoh commented Jul 24, 2020

I don't think there is any UB here. The constructor is guarded with an assert. But it does split the issue in two parts:

  • UB for fds that are not owned by the object or manipulated via fcntl, dup2, etc.
  • UB for integers that were never valid file descriptors

@notriddle
Copy link
Contributor

dup2 does the same check that #74699 does.

According to the Single Unix Specification dup2 page:

[EBADF]
The fildes argument is not a valid open file descriptor or the argument fildes2 is negative or greater than or equal to {OPEN_MAX}.

@mahkoh
Copy link
Contributor Author

mahkoh commented Jul 24, 2020

dup2 can be used to replace the meaning of the integer stored in an object implementing FromRawFd without using FromRawFd directly. E.g. let file = File::open(), dup2(xyz, file.as_raw_fd()).

In any case, I maintain that there is no unsafety and that there never will be. The value of the standard unix functions dup2, fcntl, etc. being safe to use is much higher than the value of being able to do some obscure optimization based on sole ownership. If there even is such an optimization.

@mahkoh
Copy link
Contributor Author

mahkoh commented Jul 24, 2020

FromRawFd was made unsafe in 2705051. The justification was providing a safe version of mmap. Since nobody has been able to do this since, the justification seems to have been misguided.

@mahkoh
Copy link
Contributor Author

mahkoh commented Jul 24, 2020

It's actually more ridiculous than that. The referenced issue rust-lang/rfcs#1043 contains an elementary mistake:

fn main() {
    let f = File::create(...);
    let f2 = File::from_raw_fd(f.as_raw_fd());
    let m = MemoryMap::new(f).unwrap();
    drop(f2);
    // Use m and segfault...
}

Since mmap performs the equivalent of dup, there would not have been a segfault in the first place. Furthermore, the use of file-backed mmap can never be safe because the file (and thus the mapped memory) can be changed by other processed at will.

@MikailBag
Copy link
Contributor

I think the problem is: it is possible to break Rust program by external means.
I don't think marking from_raw_fd as unsafe solves this problem: if rust program can break itself using some syscalls, then some other process can execute the same syscalls on behalf of our program (e.g. using ptrace) and the program will break in the exact same way.
I think this should be guarded against either with a clause like "If OS changes program address space in a weird way, its behavior is undefined", or maybe with a very clever definition of "data races" which treats such a manipulations as write access.

@sunfishcode
Copy link
Member

My understanding of the meaning of from_raw_fd being unsafe comes from these comments from @RalfJung. I was unsure at the time, but have come to see how it makes sense.

I agree with the assessment above about mmap, however there are other problems. A file descriptor is essentially a pointer into a different memory. File descriptors can dangle, leak, alias, be shared between threads, and be forged, just like regular pointers. "Dereferencing" a file descriptor can race with other dereferences, corrupt I/O, or cause data loss.

And, misuse of file descriptors can break encapsulation. A library in Rust can have non-pub data which the rest of the program can't directly access. However, if that data is loaded from a file or a socket, the existence of a safe File::from_raw_fd would mean that other code in the program could accidentally intercept that data. POSIX aggressively recycles file descriptors, so if a program has a dangling file descriptor somewhere, there's a decent chance it'll alias a newly allocated file descriptor somewhere else.

It's an arbitrary choice, because it's not about UB of the Rust code itself, but it feels within the spirit of Rust's unsafe. So I agree it'd be good to revise the documentation for from_raw_fd, and suggest the following wording:

This function is also unsafe as the primitives currently returned have the contract that they are the sole owner of the file descriptor they are wrapping. Usage of this function could accidentally allow violating this contract which can cause I/O corruption or broken encapsulation in code that relies on it being true.

(This ignores that as of #76969, RawFd implements FromRawFd and doesn't own the file descriptor, but that's a separate topic.)

@RalfJung
Copy link
Member

RalfJung commented Apr 8, 2021

Using the nix crate it is not hard to write FromRawFd using only safe code:

At the time that comment was written, nix also had a safe fork. Luckily, this was fixed since then. But based on this I'd be cautious to give much weight to the argument that "if nix has a safe way to do this, it's probably okay" -- there's also a high chance that nix should make more operations unsafe.

Specifically, dup2 can be used to mess with file descriptors you do not own, and thus should be unsafe.

It's an arbitrary choice, because it's not about UB of the Rust code itself, but it feels within the spirit of Rust's unsafe.

Yeah, that is also what I said back then and I still think that way. If you want to resolve this officially, we probably need a T-libs and/or T-lang FCP: ultimately the question is if we want to consider file descriptors as "a resource that can be owned and exclusively controlled as part of a type's safety invariant" (if yes, we need to make from_raw_fd unsafe), or if we consider file descriptors as "just integers and no library may attach any special invariants to them or consider them exclusively owned".

@sunfishcode
Copy link
Member

sunfishcode commented Apr 9, 2021

If you want to resolve this officially, we probably need a T-libs and/or T-lang FCP

Thanks for the suggestion; I may be interested in doing this. I wrote up a draft of an RFC here:

https://github.com/sunfishcode/rfcs/blob/main/text/0000-rawfd-ownership.md

As a caveat, I haven't written a Rust RFC before, and I'd appreciate any feedback!

Edit: The URL to the draft changed; see below.

@RalfJung
Copy link
Member

Looks like a good start!

RawFd, RawHandle, and RawSocket represent resources that can be owned

The point is that the "raw" versions are not owned, right? Creating them is safe. Turning them into something that can actually access a file/socket/... is unsafe.

A new entry in the "Behavior considered undefined" section of The Rust Reference stating that functions which assume ownership or rely in the validity of RawFd, RawHandle, and RawSocket values are unsafe.

I don't think this quite fits. That page documents aspects of the operational semantics of Rust. Your RFC does not touch the operational semantics, it only affects the logic that one can use to reason about type invariants in Rust. Using the terminology of safety and validity, the UB page is all about validity invariants, but your RFC is about safety invariants.

Another way to think about this is that we do not want to add new UB to Rust for this, that seems wrong. We just want to provide the option of claiming ownership of a file descriptor. These are two wholly unrelated concepts.

We should definitely document this somewhere in the reference, but I do not think this is the right page.

@sunfishcode
Copy link
Member

The point is that the "raw" versions are not owned, right? Creating them is safe. Turning them into something that can actually access a file/socket/... is unsafe.

Thanks! Thinking about this more, I've tentatively concluded that ownership is not the right concept for this RFC. The really important thing is to avoid accidental aliasing. There are some interesting questions about ownership here, but I think we can treat those as a separate issue along with the discussion of how #76969 interacts with ownership.

Another way to think about this is that we do not want to add new UB to Rust for this, that seems wrong. We just want to provide the option of claiming ownership of a file descriptor. These are two wholly unrelated concepts.

Makes sense. I updated the doc to work in terms of the standard library's description of the unsafe keyword, though I'm not sure that's the right place either yet.

With both these changes, I've now substantially revised the RFC in terms of what I'm calling "I/O safety", and it now has a new name:

https://github.com/sunfishcode/rfcs/blob/main/text/0000-io-safety.md

@RalfJung
Copy link
Member

Without having read it in full detail yet, that looks great to me. :) I think the details of the wording are best hashed out during the RFC process.

I don't understand the part in the "Say that handles can be owned" section though. That example is not in contradiction with saying that handles can be owned, just like the existence of raw pointers is not in contradiction with saying that memory can be owned.

To the contrary: the way that the stdlib achieves the property of "I/O safety" is through ownership of file handles. If one were to set out and formally prove I/O safety, ownership of file handles would be a necessary part of the story -- just like when we set out to formally prove memory safety, ownership of memory is necessarily part of the story, despite the fact that the Rust ownership and borrowing system does not even have a notion of "memory".

@sunfishcode
Copy link
Member

My idea there was to say that this RFC itself doesn't define ownership of resource handles, however it doesn't preclude it either. To hopefully reduce confusion, I've now removed that section and added some ideas to the Unresolved questions section.

@sunfishcode
Copy link
Member

Based on feedback in rust-lang/socket2#218, I've now updated the proposal to add a std::io::OwnsRaw trait, which simplifies use cases with AsRawFd/IntoRawFd arguments. I've also revised the motivation section to emphasize encapsulation.

@sunfishcode
Copy link
Member

Since the last comment here, OwnsRaw is now replaced by a new set of types and traits (OwnedFd, BorrowedFd, AsFd) which explicitly model ownership and borrowing.

#93562 removes the language quoted above about memory unsafety and adds new language describing what from_raw_fd means.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one.
Projects
None yet
Development

No branches or pull requests

7 participants