
Determine and document the distinction between I/O safety and regular safety on unsafe stdlib APIs #434

Closed
Manishearth opened this issue Jul 26, 2023 · 62 comments · Fixed by rust-lang/rust#114780


@Manishearth
Member

Previously: rust-lang/rust#72175, rust-lang/rust#93562

The stdlib has a bunch of unsafe APIs (e.g. FromRawFd) that are primarily unsafe because they care about "I/O safety".

Rust libraries are free to expand the scope of what they consider unsafe, but this is typically a crate-local decision. The stdlib due to its special status risks imposing this on other crates.

Basically, if a crate or project wishes to not consider I/O safety a problem (which is often necessary in more complicated I/O code! Exactly the kind of code that would wish to use these APIs) these APIs are not useful to them: it is currently unclear as to what usages of these APIs are undefined behavior vs a violation of I/O safety.

There is a valid optimization that could be performed here in the future which would be to use niches for -1 fds on Unix (etc), so there is a potential for this API being Actually UB, but that's actually something that can be checked (unlike "is a real owned FD").
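As an aside, std's owned descriptor types already carve out exactly this niche today: `OwnedFd` is documented to never hold `-1`, so `Option<OwnedFd>` needs no extra space. A quick Unix-only check (a sketch, not part of the original comment):

```rust
use std::mem::size_of;
use std::os::fd::{OwnedFd, RawFd};

fn main() {
    // OwnedFd is documented to never hold -1, so Option<OwnedFd>
    // fits in the same representation as a plain RawFd (the -1 niche).
    assert_eq!(size_of::<Option<OwnedFd>>(), size_of::<RawFd>());
}
```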

It would be nice if we could settle on what is Actually UB in these APIs, and what is "a violation of I/O safety", and document it.

cc @thomcc

@Manishearth
Member Author

Manishearth commented Jul 26, 2023

The API that triggered this question is the following:

https://github.com/google/fleetspeak-rs/blob/d031cfc8ee07aa0670e0040d24ab99dd689e4124/crates/fleetspeak/src/lib.rs#L264-L282

(Constructing a file from an fd taken from an env var. Reasonable to do for IPC cases, but currently a violation of the API's documented requirements.)

@bjorn3
Member

bjorn3 commented Jul 26, 2023

Violating I/O safety can trigger UB. For example, if some code has a userfaultfd open and some other code violates I/O safety by closing that fd (e.g. by creating a File from it which is then dropped), the memory backed by the userfaultfd will be unmapped, resulting in UB on the next access.

@ChrisDenton

ChrisDenton commented Jul 26, 2023

I think "I/O safety" is a red herring here. It has nothing to do with the reasons that from_raw_fd is unsafe (see above). It was unsafe long before the term "I/O safety" was coined.

@Manishearth
Member Author

@bjorn3 Okay, but isn't that analogous to the /proc/self/mem issue? There are lots of weird things you can do with specific kinds of fds that are UB in Rust already.

@the8472
Member

the8472 commented Jul 26, 2023

Not quite: opening /proc/self/mem is reaching outside Rust.
Using from_raw_fd is passing something around inside Rust, into a type that has preconditions, or from a type that says "you can use this fd only for specific operations".
You can make up any RawFd you want; they're inert, like invalid pointers. To actually use one you need to convert it into some type, and that conversion could cause any kind of effect depending on what that fd is. It might even be one that's already open and in use for memory-mapped stuff.

@Manishearth
Member Author

It might even be one that's already open and in use for memory-mapped stuff.

Right but ... I can do that by opening the memory-mapped file directly with safe File APIs, too. That's kinda what I'm getting at: we do not seem to have a principled approach to this, and it seems strange to single this API out as potentially creating undefined behavior.

The annoying thing here is not just that this method has preconditions, it has preconditions that cannot be checked.

@the8472
Member

the8472 commented Jul 26, 2023

You can hide your memory-mapped files: unlink them (on Unix) or make the directory non-traversable if you're paranoid. But again, that's basically reaching out to a system that has insufficient sandboxing. The OS offers you footguns; we can't do anything about that without shutting down all I/O. Inside Rust, we do care.

@the8472
Member

the8472 commented Jul 26, 2023

it has preconditions that cannot be checked.

You kind of can if you're main. No other threads are running yet, and only fds 0, 1, and 2 should be open, so anything else was explicitly passed from the parent and you can take ownership of it. Once you have a raw fd you can probe whether it is valid and then use it.
That still leaves the uncheckable condition that nothing else on the system is using it, but that does fall under the "insufficiently paranoid OS" problem.

@thomcc
Member

thomcc commented Jul 26, 2023

Yeah, I was kind of opposed to codifying I/O safety as a type of safety for these reasons. It is subtle, as has been discussed here, because there is real UB that can occur if I/O safety is violated.

I agree that it's not great that we have APIs with preconditions that cannot be checked (in practice anyway -- being early main is not an option most of the time).

@Manishearth
Member Author

You kind of can if you're main

And then you need to worry about other crates doing stuff; which you can control by not using other crates but it's still an action-at-a-distance thing.

(And yeah, even if you whittle this down to "you only call trusted APIs before doing this", your dependencies are free to link in life-before-main C code)

Yeah, I was kind of opposed to codifying I/O safety as a type of safety for these reasons. It is subtle, as has been discussed here, because there is real UB that can occur if I/O safety is violated.

I think now that this can of worms has been opened we need to either fully specify it or decide it is an additional library invariant that callers are not required to follow; the status quo does not make much sense.

@the8472
Member

the8472 commented Jul 26, 2023

your dependencies are free to link in life-before-main C code)

Well sure, but C is never safe. We can't do anything about that. C programs would also break if other threads start mucking around with random file descriptors they don't semantically own.

@thomcc
Member

thomcc commented Jul 26, 2023

Yeah, I think I lean towards the "decide it is an additional library invariant that callers are not required to follow", but with a long list of caveats, which definitely do need to be worked through.

I think it's worth CCing @sunfishcode, who was the author of the RFC, to see if they have a more nuanced take here; I suspect that they will.

@ChrisDenton

Note that to the extent that "I/O safety" is documented, it's documented here: https://doc.rust-lang.org/std/os/unix/io/index.html

One problem with file_from_env_var is that it allows creating two std::fs::Files from the same fd, with both expecting to have exclusive access (e.g. they close on drop).

@Manishearth
Member Author

with both expecting to have exclusive access (e.g. they close on drop).

That's still a library invariant though, not UB

@the8472
Member

the8472 commented Jul 26, 2023

AsRawFd is safe; a Mmap type can implement it and would be doing nothing wrong. Now you have a RawFd. This is analogous to vec.as_mut_ptr(): both are inert. The UB only happens when you pass it to something else that does things with it that it shouldn't. That's why that step is unsafe.

We could have put the unsafe on the extraction instead, but we didn't do that for raw pointers either.

The only difference is that there are some fds which are harmless to use with some syscalls. But I guess you could do the same with pointers, like calling madvise on them or something.
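The extraction/use split described above can be sketched with std's own types (Unix-only; `BorrowedFd::borrow_raw` is the unsafe "use" step):

```rust
use std::os::fd::{AsRawFd, BorrowedFd, RawFd};

fn main() {
    // Extraction is safe: a RawFd is inert, like `vec.as_mut_ptr()`.
    let fd: RawFd = std::io::stdout().as_raw_fd();
    let made_up: RawFd = 12345; // also fine; it's just an integer

    // *Using* one is the unsafe step: the caller vouches that the fd
    // stays open and owned for the borrow's lifetime.
    let borrowed: BorrowedFd<'_> = unsafe { BorrowedFd::borrow_raw(fd) };
    assert_eq!(borrowed.as_raw_fd(), fd);
    let _ = made_up; // never used, so never dangerous
}
```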

@sunfishcode
Member

sunfishcode commented Jul 27, 2023

Here's how I understand the situation:

I/O safety is not about the type of the fd. I/O safety is only about the lifetime of the fd. (Clarifying edit: by "type" here, I meant the dynamic type of the OS resource, such as file, pipe, socket, etc.)

I/O safety was added as part of a retroactive explanation for why from_raw_fd is unsafe.

/proc/self/mem is equivalent to a computer having a USB robotic arm with a soldering iron that can reach the computer's own motherboard. It can Do Things. And one can reasonably question the wisdom of such a device. But, it does not affect memory safety or I/O safety.

The safety of double mapping is addressed by slice::from_raw_parts's and other functions' safety conditions. You can create double mappings and it's fine. The moment you bring into existence a Rust slice pointing at one of those mappings, you must make a potentially tricky guarantee.

I/O safety hazards can almost always be reduced to memory safety hazards. For example, a double-close is sufficient to disrupt a tricky but otherwise sound mmap + slice::from_raw_parts guarantee in an unrelated part of the program, and cause memory corruption.

Now, looking at the code in question here, it's using lazy_static, which means this isn't running in early main; it runs on the first dereference, so it could be running after lots of arbitrary code. There's nothing in the Rust language preventing arbitrary safe code doing std::env::set_var and putting bogus numbers in those environment variables, at which point FromRawFd::from_raw_fd's documented safety condition would not be met. And if file_from_env_var is safe, nothing prevents other safe code from doing the same thing and claiming ownership of those fds first. So we have two theoretical sources of UB.

Unfortunately, I don't yet know what to recommend for the code in question. Ideally, there should be more ergonomic and safer ways to pass file descriptors between processes, though that's not a simple fix.

@thomcc
Member

thomcc commented Jul 27, 2023

The cc crate is in a similar position, due to trusting MAKEFLAGS when the parallel feature is enabled (https://github.com/rust-lang/cc-rs/blob/main/src/lib.rs#L1328-L1371). In theory we could require cc::Build::new() to be unsafe, but in practice this would be pretty undesirable.

@Manishearth
Member Author

Now, looking at the code in question here, it's using lazy_static, which means this isn't running in early main; it runs on the first dereference, so it could be running after lots of arbitrary code. There's nothing in the Rust language preventing arbitrary safe code doing std::env::set_var and putting bogus numbers in those environment variables, at which point FromRawFd::from_raw_fd's documented safety condition would not be met. And if file_from_env_var is safe, nothing prevents other safe code from doing the same thing and claiming ownership of those fds first. So we have two theoretical sources of UB.

I mean, I'd rather not focus on the code in question too much. There are a bunch of options there, including just using /proc/self/fd/<fd>. It's an example of a valid use case, is all.

I/O safety hazards can almost always be reduced to memory safety hazards

Yeah, but I guess my position is that all of that is outside of Rust's model, much like the "/proc/self/mem is when your computer has a robotic arm with a soldering iron".

It's the same thing with linker safety.

@sunfishcode
Member

There are a bunch of options there, including just using /proc/self/fd/<fd>.

/proc/self/fd/<fd> does avoid the set_var problem, however it doesn't avoid the double-ownership problem.

It's an example of a valid use case, is all.

👍

I/O safety hazards can almost always be reduced to memory safety hazards

Yeah, but I guess my position is that all of that is outside of Rust's model, much like the "/proc/self/mem is when your computer has a robotic arm with a soldering iron".

It's the same thing with linker safety.

With /proc/self/mem and linker safety, the boundaries where they go outside Rust's model are clearer. Link-time intrigue comes from special linker flags, global_asm, non-Rust libraries, or similar things. And /proc/self/mem is considered to be a debugger, and programs probably don't want to cause debuggers to attach to their own processes (even if Linux makes it easier than one might like to do so unwittingly).

But with fd numbers coming in from the outside, if I do something like:

fn foo() {
    // SAFETY: We're supposing we don't need to propagate this `unsafe` to our callers.
    let fd: RawFd = var("THE_FD").unwrap().parse().unwrap();
    let file = unsafe { File::from_raw_fd(fd) };
}

and I happen to call foo twice, then I've just caused a double-close, which potentially causes memory corruption elsewhere in the process, and I don't see what boundary was crossed.

@Manishearth
Member Author

however it doesn't avoid the double-ownership problem

Yes, I understand; I was illustrating how the same operation is possible in safe Rust anyway, without jumping through hoops.

But with fd numbers coming in from the outside, if I do something like:

Yes, and you have the same problem with taking in a filepath argument and someone passing in a special file.

My core point is that there is a line being drawn here between "things that are not rust's problem" and "things that are rust's problem" and I think it is arbitrary and I'm not sure it holds up to scrutiny. My request to the unsafe-code-guidelines group, in part, is to figure out how to make this less arbitrary, and more importantly, useful.

the boundaries where they go outside Rust's model are clearer

I kinda feel like "dealing with weird OS stuff" falls in the same bucket as this list. Or, at least, it does not clearly not fall in that bucket. It's very much similar to "linking to non-Rust crates" in effect in my mind.


This thread started because the standard library is documented as having an extra safety invariant it chooses to uphold, one that is called out as not being a part of the Rust language's memory safety model.

If the standard library chooses to do this, it should document the difference thoroughly in safety invariants.

However, I am being repeatedly informed that no, this is a part of the Rust language's memory safety model. Ok, fine, in that case we need to settle on and document that. Like yes, I very clearly see the sources of unsafety. I'm just not clear on the line being drawn, and that is a problem for people who write or review unsafe code, and also probably needs to be balanced against valid usecases in the IPC/etc world to ensure they're not left with no correct path forward.

I'm asking the group of people in charge of formally specifying this stuff to formally specify this stuff. (Yes, I understand that they can't do this right now and don't mind being patient, but there's a reason I filed the issue on this particular repo)

Also, I'll point out that this is the current safety comment on that API:

The fd passed in must be a valid and open file descriptor.

Here is code that fully satisfies that documentation, and exhibits exactly the memory safety problems that people have been mostly focusing on in this thread:

let fd = rand(); // or get it from `env()`, or from `args()`

unsafe {
    let status = fcntl(fd, F_GETFD); // just queries the fd's flags; harmless in itself
    if status == -1 || errno.is_error() { // invalid fd
        return;
    }
}

let file = unsafe { File::from_raw_fd(fd) };

As far as I can tell, from_raw_fd as used in std is perfectly safe with invalid/closed fds, it's everything else that's the issue!

(this is much like how segmentation faults are the safest thing that can happen to your program; it's better than not getting the segfault and instead having actual UAF/"mis"compiles/etc)

@digama0

digama0 commented Jul 27, 2023

I'm asking the group of people in charge of formally specifying this stuff to formally specify this stuff. (Yes, I understand that they can't do this right now and don't mind being patient, but there's a reason I filed the issue on this particular repo)

The reason why it is important to differentiate between a library invariant and language UB in this case is because only the latter is our (t-opsem's) jurisdiction. Having determined that this is a library UB issue, the onus for documenting this goes to t-libs-api, if I understand correctly, and is off-topic for this repo (or at least, off topic for t-opsem, I think we are still discovering what we want this repo to be about).

Here is code that fully satisfies that documentation, and exhibits exactly the memory safety problems that people have been mostly focusing on in this thread:

I assume your complaint is that the documentation says nothing about there not being any other extant Files to the same fd? Your example does not exhibit UB / memory unsafety AFAIK because the only thing you did is create a File object which is just a wrapper around an fd. Presumably you want to have another File with the same fd and write to it from two handles...? Is that even UB / causes data races? You should spell out this example a bit more so that actual memory errors result.

@sunfishcode
Member

[the standard library] should document the difference thoroughly in safety invariants.

I agree. You've reported an interesting problem which we hadn't considered before. It's not yet clear what the answer should be. My understanding of all of the ideas suggested so far is that they either don't fully solve the problem, or they introduce new problems. We have more work to do.

But with fd numbers coming in from the outside, if I do something like:

Yes, and you have the same problem with taking in a filepath argument and someone passing in a special file.

The same problem doesn't occur with filenames. With fds, you could get memory corruption from code patterns elsewhere in the program. With filenames, you could have a bug, but it won't cause memory corruption, unless some other boundary gets crossed.

Also, I'll point out that this is the current safety comment on that API:

The fd passed in must be a valid and open file descriptor.

Here is code that fully satisfies that documentation, and exhibits exactly the memory safety problems that people have been mostly focusing on in this thread:

That example would be the I/O safety analog of doing this:

   let addr = rand(); // or whatevs

   unsafe {
       let status = madvise(addr as _, 1024, MADV_NORMAL); // no-op advice that merely validates the range, "always safe"
       if status == -1 || errno.is_error() { // invalid addr
           return;
       }
   }

   let buf = unsafe { slice::from_raw_parts_mut(addr as *mut u8, 1024) };

This may avoid a segfault, but it still invokes UB. In the same way, under I/O safety, using fcntl that way may avoid a subsequent EBADF, but it still invokes UB. The unsafe block around the fcntl is the point at which the code assumed responsibility.

@digama0

Your example does not exhibit UB / memory unsafety AFAIK because the only thing you did is create a File object which is just a wrapper around an fd.

Creating a File object means it'll do a close when dropped. Unix reuses fds after they're closed, so if someone else in the process was using that fd, they could suddenly be doing I/O on the next fd that yet another entirely unrelated piece of code in the process happens to open. If either of those two places was doing mmap in a way that otherwise would have been safe, they can now corrupt memory.
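The reuse that makes a stale close dangerous is easy to observe. A single-threaded, Unix-only sketch (POSIX guarantees `open` returns the lowest unused descriptor number):

```rust
use std::fs::File;
use std::os::fd::AsRawFd;

fn main() {
    let first = File::open("/dev/null").unwrap();
    let number = first.as_raw_fd();
    drop(first); // close(2): the number is immediately free

    // POSIX `open` returns the lowest unused descriptor number, so in
    // this single-threaded program the same number comes right back:
    let second = File::open("/dev/null").unwrap();
    assert_eq!(second.as_raw_fd(), number);
}
```

Any code still holding the old number now silently operates on `second`'s file; that is the cross-module corruption described above.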

@Manishearth
Member Author

With filenames, you could have a bug, but it won't cause memory corruption, unless some other boundary gets crossed.

Given that you can always pass in /proc/self/fd/<fd> (or some other strange device) which is equivalent to what this API does; yes, you can cause memory corruption. It may be "outside the model", but that's exactly what I'm trying to tease apart here.

That example would be the I/O safety analog of doing this: [...] This may avoid a segfault, but it still invokes UB.

Yes, I understand. I explicitly said the example "exhibits exactly the memory safety problems that people have been mostly focusing on in this thread"; clearly I know there is memory unsafety there. What I'm saying is that the documentation is not correct. A "valid" fd is a specific preexisting concept, and here we are effectively inventing a new kind of validity around ownership, which we should have proper terminology for (some introduced by your RFC) and use in the documentation. The current documentation and semantics are precisely what this issue is about.


A thing that's a bit frustrating to me here is that people keep explaining the UB potential to me. Yes, I understand the kinds of things you can do with such APIs; I understand that you can break an mmap implementation, or break the allocator (if it uses mmap). I am not asking what the UB potential of this API is (beyond pinning down whether we care about niche optimizations and such).

Going by the text of the RFC, I made the very reasonable assumption that the standard library has been drawing a distinction between "safety" and "I/O safety". It's a distinction that makes a lot of sense to attempt in the context of the /proc/self/mem issue: it means "memory unsafety" can be very clearly bounded as a concept, while "I/O unsafety" becomes a nebulously bounded thing that has a clear line against "memory unsafety" but a less clear line against "safety", which might be an okay state of affairs. This distinction is also incredibly useful for practical applications dealing with I/O. This is the context in which I opened the issue.

I do still think that it's reasonable to try and slice this space that way1. I can see that a lot of people think we should slice it a different way; where "io unsafety" is a subset of memory unsafety. That's fine, that's reasonable as well, though we need to work on that fuzzy boundary between "memory unsafety" and "safety" then (and "io safety" may still work as a useful tool in that work!).

Footnotes

  1. And I'm not the only one: the nix crate, used by and partially maintained by tokio (a practical I/O application), has cropped up a couple times in past discussions around this since it does things like expose a safe dup, and does not seem to want to change that.

@Manishearth
Member Author

The reason why it is important to differentiate between a library invariant and language UB in this case is because only the latter is our (t-opsem's) jurisdiction. Having determined that this is a library UB issue, the onus for documenting this goes to t-libs-api, if I understand correctly, and is off-topic for this repo (or at least, off topic for t-opsem, I think we are still discovering what we want this repo to be about).

I do not see how the conclusion of this discussion is that it's a library invariant, perhaps I am using the term differently from you? It is absolutely an invariant upheld by the library, but the reasons it is upholding it seem to be language reasons.

In my view a library invariant is something another library doing the same thing may choose to not have. For example, is it kosher for a different library to expose this same type of API (or dup, etc.) without touching std, and mark it as safe? This was my original assumption when opening the thread, based on what the RFC was saying, and my request was to confirm that that was the case; then I could move forward on figuring out where it can be documented better. (And I cannot confirm that from just libs, because they don't decide what is and isn't language UB.)

But from that discussion it does not seem like that is actually the case; and people consider such APIs to be within Rust's safety model, in which case it becomes an opsem thing because the answer is relevant for people writing and reviewing unsafe code regardless of whether they use the standard library.

I assume your complaint is that the documentation says nothing about there not being any other extant Files to the same fd?

That's not correct either: this code can still be unsound if, say, the allocator were using that fd. It's not just Files.

Your example does not exhibit UB / memory unsafety AFAIK because the only thing you did is create a File object which is just a wrapper around an fd.

One that would be dropped, causing a close(), which might kill a file descriptor in use elsewhere in the program; that is exactly the example people have been focusing on in this thread. Yes, I didn't write both sides of the unsafety, since I assumed context from the thread.

But as I said this is already UB because it can mess with the allocator, so you don't need "both sides" anyway.

@ChrisDenton

It seems like there are two different things tangled up here. The issue of special files like /proc/self/mem, etc. should be properly documented. You can break a lot more than just File with direct memory access or the special abilities of other special files. This is an OS thing which, for pragmatic reasons, we just have to live with and document.

The other thing is single ownership and lifetimes for fds to uphold invariants we rely on. Obviously, given it's easy in Rust, we use this kind of model for a lot of library things. I'm not really clear what this has to do with opsem. Better guidelines around fd use are of course very welcome but this seems very clearly in the realm of being a libs issue.

@Manishearth
Member Author

I'm not really clear what this has to do with opsem

Concrete question for opsem: would opsem consider it a violation of Rust's rules for unsafe code if a different library, one that has nothing to do with the stdlib, exposed marked-safe APIs that perform the same logical operation that from_raw_fd or dup do?

Because the libs team does not maintain all crates that deal with IO, just one (well, two, if you count libc).

Either it is an issue for all crates, in which case I think opsem needs to define what I/O skullduggery it considers in scope vs out of scope for safety (since /proc/self/mem is clearly out of scope, and in this case we are arguing that file descriptor crimes are in scope). Or it is not an issue for all crates, and then I shall ask the libs team to document what is a library vs language invariant in their API. Until that question is answered here I can't quite ask that question of the libs team.

It does seem from this discussion that a lot of people believe this to be in scope for what is considered unsafe across Rust (not just as a stdlib invariant).

The issues of special files like /proc/self/mem, etc should be properly documented

To be clear, I know that's been a long-standing issue and I don't actually care too much about it being documented, it got brought up as an example of the very fuzzy boundary we're dealing with here.

@sunfishcode
Member

It'll take me some time to read and reply here, but I do want to quickly point out that Rust does already have documentation about /proc/self/mem. Suggestions for improving it are welcome.

@the8472
Member

the8472 commented Jul 27, 2023

It may be "outside the model", but that's exactly what I'm trying to tease apart here.

Indeed it is, I think this has already been covered in the zulip discussion and it would be great if we could not rehash it again. I'm not sure if there's still any open question regarding that?

I can see that a lot of people think we should slice it a different way; where "io unsafety" is a subset of memory unsafety. That's fine, that's reasonable as well, though we need to work on that fuzzy boundary between "memory unsafety" and "safety" then (and "io safety" may still work as a useful tool in that work!).

Hrrm. I think it's not a clear subset. You can get memory unsafety as a consequence of IO unsafety, sure. But you can also get other misbehaviors like clobbering files that should be in the exclusive ownership of another struct. It's a bit like memory safety, but it's not about pointer-addressable memory, it can also cover other bags of bytes (files in a filesystem).

And you cannot necessarily cause those kinds of corruptions on platforms which use capabilities-based filesystem access. Only the thing that has a handle to a directory should be allowed to access files in that directory tree. Accessing the wrong file descriptors violates that isolation.

For example, is it kosher for a different library to expose this same type of API (or, dup, etc) without touching std, and mark it as safe?

That's basically the old composing-levels-of-safety question, how different features may be safe in isolation but become unsafe if combined and then a decision needs to be made who is responsible. Mucking around with random file descriptors in a process is "fine" (assuming a sandboxed process which only has harmless file descriptors lying around). mmap is "fine" (perhaps with additional assumptions such as using a tempfile or proc not being mounted or whatever) if properly encapsulated inside rust. Once you have both you need to make at least one of them unsafe.

Either it is an issue for all crates, in which case I think opsem needs to define what IO skullduggery it considers in scope vs out of scope for safety (since /proc/self/mem is clearly out of scope, and in this case we are arguing that file descripter crimes are in scope). Or it is not an issue for all crates, and then I shall ask the libs team to document what is a library vs language invariant in their API. Until that question is answered here I can't quite ask that question of the libs team.

I think this is also a platform issue. Unixes (and especially linux) are just bad in this regard due to the dense ID space for file descriptors and the magic proc files. There unsafe can only protect you from accidents, not from deliberate misuse.

If

  • file descriptors were randomly assigned in a 128bit space
  • the process got killed every time you try to access an invalid one
  • the file system could only be accessed through capability objects (like cap-std)
  • enumerating open FDs required elevated privileges or at least a clearly labeled "there be dragons" API

then you'd only be able to violate the library safety conditions of various fd-wrapper types by using AsRawFd and FromRawFd incorrectly. Assuming close() or write() take an OwnedFd or BorrowedFd respectively, you need to get those from somewhere: either you call the unsafe fn borrow_raw() (safety contract!), or the methods take a RawFd and should then be unsafe themselves.
dup() is kinda harmless on its own. If you called it with a random number you'd ideally get your process killed. If you call it with a valid one you get a new fd, but you'd still have to turn that into something you can do evil things with, which again should require unsafe.

And yes the nix crate currently doesn't comply with that.
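For contrast, std already offers an ownership-preserving shape for dup: `OwnedFd::try_clone` is safe precisely because the duplicate is a fresh owned descriptor, so single ownership holds for both handles. A small Unix-only sketch:

```rust
use std::fs::File;
use std::os::fd::{AsRawFd, OwnedFd};

fn main() {
    let fd: OwnedFd = File::open("/dev/null").unwrap().into();

    // A "safe dup" in std's model: the clone is a *fresh* owned
    // descriptor, so each handle still has exactly one owner.
    let clone = fd.try_clone().unwrap();
    assert_ne!(fd.as_raw_fd(), clone.as_raw_fd());
}
```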

@Manishearth
Member Author

It'll take me some time to read and reply here, but I do want to quickly point out that Rust does already have documentation about /proc/self/mem. Suggestions for improving it are welcome.

Yep, I've seen that 😄. I think it's a decent amount of documentation defining the boundary between that and "actually safe", but I'm less satisfied about the boundary between that and "considered unsafe".

Indeed it is, I think this has already been covered in the zulip discussion and it would be great if we could not rehash it again. I'm not sure if there's still any open question regarding that?

I do not think that zulip discussion has produced a clear definition of "outside the model", so, no? I'm not trying to rehash it; but I do not think we have a clear distinction here that is useful to people writing and reviewing unsafe code.

(I also don't expect this is something we can hash out in a day, and that's fine, but I don't want to consider that "closed")

Hrrm. I think it's not a clear subset

Sorry, when I said it was a subset I was contrasting with it being non-overlapping. Pretend I said overlapping instead of "subset".

I think this is also a platform issue. Unixes (and especially linux) are just bad in this regard due to the dense ID space for file descriptors and the magic proc files.

Yeah and I'm open to documenting this as a per-platform thing too. I said this already at some point, but RawFds are already a platform thingy so it's fine in my book to tell people working with them that their safety invariants depend on platform. Probably ideal, even, because it would be quite nice to have very strong and clear guarantees on platforms that are more principled about these things (i imagine we can do way better on Fuchsia, for example).

That's basically the old composing-levels-of-safety question: how different features may be safe in isolation but become unsafe when combined, at which point a decision needs to be made about who is responsible. Mucking around with random file descriptors in a process is "fine" (assuming a sandboxed process which only has harmless file descriptors lying around). mmap is "fine" (perhaps with additional assumptions such as using a tempfile or proc not being mounted or whatever) if properly encapsulated inside rust. Once you have both you need to make at least one of them unsafe.

I'm coming at this from the perspective of libraries, since I both author unsafe code in libraries, and, more often, I review unsafe code in libraries. From that perspective, "how you use it" is not something you have an answer to, but it is something you must instruct your users about after marking an API as safe. As noted already the stdlib does not sufficiently do this. But it is also a question whether the pan-language definition of unsafe should be doing this, as choices made by the stdlib io module do not affect third-party io crates that are doing their own io.

(this is the "concrete question for opsem" that I posted above)

@RalfJung
Member

RalfJung commented Jul 28, 2023

@Manishearth I think maybe what you are looking for is documentation saying that as long as you don't give out the File to foreign code, no language UB can arise from a violation of the FD ownership discipline? Basically, as long as everyone who can get their hands on a File agrees that this File is actually not exclusively owned, but is really a RawFd in disguise, no UB can happen. That's like passing around an &str that violates UTF-8, but everyone who gets this reference knows that and just avoids calling any functions that would be affected by this. It is only when the &str/File passes to outside code that language UB can arise because that code assumes the library invariant to hold.

I think part of the confusion here is that people think of the "library invariant" as being a pure mathematical statement about the data that is stored at a particular value. That is the wrong image. The Rust type system has ownership powers, and invariants can (and do) express ownership. The File invariant chooses to claim ownership of the FD. This is not a statement about the bits that are stored inside this File at runtime, but in the context of the type system this statement is just as precise (and just as impactful) as str stating that its contents are valid UTF-8.

Basically, library invariants are written in separation logic, not just "pure" propositional logic. (Now I see @digama0 has made the same point while I was reading and writing.)
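The &str analogy is directly visible in std's two entry points: the checked constructor establishes the library invariant with a runtime check, while the unchecked one shifts that proof obligation onto the caller via unsafe. A minimal illustration:

```rust
fn main() {
    let bytes: &[u8] = &[0xf0, 0x9f, 0x92, 0x96]; // valid UTF-8: one 4-byte scalar

    // Checked: the UTF-8 library invariant is verified at runtime.
    let checked = std::str::from_utf8(bytes).unwrap();

    // Unchecked: the caller discharges the invariant. Handing this a
    // non-UTF-8 slice would violate str's library invariant, just as
    // from_raw_fd with an fd you don't own violates OwnedFd's.
    let unchecked = unsafe { std::str::from_utf8_unchecked(bytes) };

    assert_eq!(checked, unchecked);
}
```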

@Manishearth
Member Author

Manishearth commented Jul 28, 2023

will respond in full later but I will highlight that yes, I understand that callers are typically expected to follow library invariants, and I understand how that works for str, but there's something more subtle going on here. I also understand that libraries can turn a lot of things into a "type system problem", it's a pattern I have used countless times.

(highlighting this now because I'm worried that once again the conversation has shifted to a different level and I am having basic stuff explained to me, and I would rather it not veer too far down that path because it represents a misunderstanding of what I am trying to tease apart here. I have already spent some effort trying to clarify this before and am happy to do it again but it may feel like rehashing to others)

@RalfJung
Member

RalfJung commented Jul 28, 2023

I understand how that works for str, but there's something more subtle going on here.

Yes, that is true. There is a global decision that has been made by IO safety: that file descriptors are an "ownable resource" (in formal terms: that the separation logic that the Rust type system embeds into has a notion of "ownership of an FD"). This is not a decision that has anything to do with operational semantics or memory safety; it is a decision about the reasoning power of the ambient framework that the type system lives in. Clearly that logic has "ownership of a range of memory" as a concept (in various forms, such as immutable, read/write, read/write/dealloc, ...), and it can be freely extended with "ghost ownership" (e.g. to give meaning to things like GhostCell tokens), and now we also have "ownership of an FD". (At a sufficiently abstract level, FDs are a lot like pointers: where memory maps pointers to bytes, the file descriptor table maps FDs to kernel-managed resources.)

This decision can only be made globally: if any part of the program wants a guarantee that an FD that it freshly created is not messed with by anyone else (i.e., nobody is just calling close(rand()) in a loop), then the entire program must respect a basic FD ownership discipline. That is what IO safety codifies.

A consequence of this decision is that all syscalls that work on FDs have appropriate ownership as a precondition. (Without this, "ownership of an FD" would be meaningless as people could still call dup2(rand(), rand()) in a loop.) These preconditions need to be bubbled up the stack until they are eventually discharged. File discharges them with its library invariant. There are other ways to discharge them, e.g. you can have a pool of "shared FDs" where ownership is just publicly available to anyone who wants it, and then for any FD that is in that pool you are trivially safe to call the relevant syscalls.

Basically there is a global decision Rust has to make: does unsafe code have to be robust against a thread that does loop { dup2(rand(), rand()); }, in the same sense that the unsafe code in HashMap has to be robust against a "bad" Hash impl (i.e., there can be strange illogical behavior, but there cannot be UB)? The IO safety RFC says no, it does not have to be robust -- unsafe code may cause UB in the presence of such a thread.

(highlighting this now because I'm worried that once again the conversation has shifted to a different level and I am having basic stuff explained to me, and I would rather it not veer too far down that path because it represents a misunderstanding of what I am trying to tease apart here. I have already spent some effort trying to clarify this before and am happy to do it again but it may feel like rehashing to others)

I'm sorry if I am rehashing pre-trodden ground. Is there a particular one of your clarifying replies that I should look at in lieu of reading this entire enormous thread?

@RalfJung
Member

RalfJung commented Jul 28, 2023

I/O safety is not about the type of the fd. I/O safety is only about the lifetime of the fd.

I disagree with this, it is about ownership of the FD. Calling write on a random FD is not allowed, even though this has no lifetime implications.

EDIT: Ah, this is unfortunately less clear-cut than I had hoped. But still, I think I/O safety should allow users to define a type of "exclusively owned FD" that is guaranteed not to have duplicates, even if OwnedFd is not that type.

@Manishearth
Member Author

What do you mean by "cannot be checked"? In Box::from_raw, you also cannot check at runtime that this raw pointer is indeed sufficiently valid, and actually owned by the caller. Properly expressing this precondition requires a separation logic where I can prove ownership of memory, and use that to justify the from_raw call. With such a logic I can also prove ownership of an FD and use that to justify turning a raw FD into an owned one.

There's a very major difference between this API and Box::from_raw(): Box::from_raw() is not supposed to interact with pointers not created by the corresponding Rust API (and it is rare to see code doing this), whereas the primary purpose of this API is to work with fds that come from the kernel (or wherever), and it is exclusively what I have seen this API get used for. And there are a bunch of weird APIs in this space, including dup and fsync, where ownership gets murky. They can be reasonably documented: I'm not saying a model isn't feasible, just that it's currently not documented in a useful way.

(As mentioned before, I don't care about a formal definition here: I just want a useful one)

"borrowed" fd is also a tricky concept; what does it mean for an fd you plan to use being already borrowed, and what does it mean for you to borrow an fd? In general I think the relationship can be specified with basically the "lifetime" of close().

Note that some FromRawFd implementations borrow, and are only useful when borrowing (the stdio ones, mostly). That's another thing that needs documenting.

And that's another can of worms: you can't even say that it's just about "it's fine to borrow an fd as long as you know it will be closed after you borrow it" because that's perfectly compatible with a type that opens random fds and uses fcntl to ensure that they are still around before calling any operations1. Even though that might break the allocator and other things using mmap.

So it's really something like

  • it is fine to use such APIs to create owning objects if you are responsible for closing that fd ("an owned fd")
  • it is fine to use such APIs to create borrowing objects2 if you can be sure the fd will not be closed until after you are done with it ("a borrowed fd")
  • It is not fine to use such APIs for any fds that are already mapped into memory, either via Rust usage of mmap or the allocator
  • It is not fine to map fds that have been passed into such APIs into memory unless you can be sure that the IO object is not used or destroyed during the lifetime of the memory map
  • dup creates equivalence classes of fds for the purposes of this set of rules
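A sketch of how the first two bullets look in code. The function names are mine and the contracts in the comments restate the rules above, not documented std wording:

```rust
use std::fs::File;
use std::os::fd::{AsRawFd, BorrowedFd, FromRawFd, IntoRawFd, RawFd};

// Owning rule: create an owning object only if we are now responsible
// for closing the fd. Caller contract: `raw` is open and nothing else
// will close it.
unsafe fn take_ownership(raw: RawFd) -> File {
    unsafe { File::from_raw_fd(raw) }
}

// Borrowing rule: create a borrowing object only if the fd provably
// stays open for 'a, and we never close it through this handle.
unsafe fn borrow<'a>(raw: RawFd) -> BorrowedFd<'a> {
    unsafe { BorrowedFd::borrow_raw(raw) }
}

fn main() -> std::io::Result<()> {
    // Owned: into_raw_fd transfers the close responsibility to us...
    let raw = File::open("/dev/null")?.into_raw_fd();
    // ...and from_raw_fd hands it to the new File, which closes on drop.
    let owned = unsafe { take_ownership(raw) };

    // Borrowed: `owned` keeps the fd open for as long as we use it.
    let borrowed = unsafe { borrow(owned.as_raw_fd()) };
    let _ = borrowed;

    drop(owned); // the fd is closed exactly once, here
    Ok(())
}
```

(This compiles only on Unix, where std::os::fd exists; that platform-dependence is itself part of the point above.)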

At the end of the day, I need to be able to look at a usage of this API and convince myself that it is sound, especially when it involves another syscall.

This set of rules is checkable! It basically disallows the "get an fd from the environment" pattern in general purpose code, but that's fine, because it's clear about it, and with some more clarity (I expand on this below) on what kinds of violations cause what could still enable such patterns to be written with a clear conscience when needed without impacting the crate environment.

I keep bringing up /proc/self/mem because the "you borrowed an fd that was actually memory-mapped" case to me feels like something that ought to be "outside the model" in the same way: where we clearly document what can go wrong but ultimately don't really expect the ecosystem to consistently attempt to avoid it.

The rules above seem quite nice when talking about ownership and very weird when talking about memory mappings; it would be nice if the actual rules were just the ownership ones and the memory mapping stuff was Bonus Rules. Given that in general people are of the opinion here that "a different crate choosing to allow safe cratename::File creation from fd is also fine", it makes sense for the stdlib to be clearer about which things are these Bonus Rules that we would really like people to follow but if your codebase has made the global choice not to care, you can still use the std::fd APIs in a more relaxed way.

I'm sorry if I am rehashing pre-trodden ground. Is there a particular one of your clarifying replies that I should look at in lieu of reading this entire enormous thread?

Nah, I'm not worried about reexplaining my stance to you; that was more just a message to the other people in this thread that I may end up rehashing a bit when explaining things to you. I'm not really sure where I can find individual comments explaining the relevant bits of my stance without myself doing a bunch of digging.

Hold on a sec, callers are generally required to follow library invariants! Creating a non-UTF-8 str can cause UB later. I am not sure what you are talking about here.
...
How would you categorize the &str invariant? Breaking it can lead to UB and consequently to arbitrary memory safety violations.

Generally, yes, callers are required to follow library invariants. However in many specific cases, like you note for str, we provide leeway to the users so that they understand how the API can cause UB and can still avoid it themselves.

We have specified that the str safety invariant is one that can be broken "as long as you don't call these APIs on it", which is quite useful and workable.

I think maybe what you are looking for is documentation saying that as long as you don't give out the File to foreign code, no language UB can arise from a violation of the FD ownership discipline?

Yes, in part, but "FD ownership discipline" also needs to be sketched out in the docs.

As I mentioned I'd also somewhat want language that notes that other libraries and applications may choose to make the call differently. Ideally, it is written in such a way that libraries using std may also choose to violate this invariant; so it would be kosher to violate this invariant in your crate with from_raw_fd() provided you are clear that your crate does so. I can understand that the stdlib would be reluctant to say this, but I will repeat that one of the biggest IO crates out there — tokio — pulls in the nix dependency that by choice lets you violate this invariant in safe code (It has a safe close()!); so most people doing IO stuff already will have the potential for this behavior in safe code in their dep tree.

Also ideally a guarantee about what niche optimizations are present, if any. Really, just a list of what is absolutely an invalid fd for these APIs; saying "negative numbers are invalid on linux and windows" would be totally fine and easily checked, and leaves plenty of room for niches in the future.
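For what it's worth, a niche is in fact already present on Unix: the std docs for OwnedFd/BorrowedFd exclude -1, and that is observable in the layout. A quick check (of the current layout, not a statement about which further niches might be added):

```rust
use std::mem::size_of;
use std::os::fd::{BorrowedFd, OwnedFd, RawFd};

fn main() {
    // Because -1 is not a valid OwnedFd/BorrowedFd, `None` can occupy
    // that bit pattern: the Option wrapper costs no extra space.
    assert_eq!(size_of::<Option<OwnedFd>>(), size_of::<RawFd>());
    assert_eq!(size_of::<Option<BorrowedFd<'static>>>(), size_of::<RawFd>());
}
```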

Basically there is a global decision Rust has to make: does unsafe code have to be robust against a thread that does loop { dup2(rand(), rand()); }, in the same sense that the unsafe code in HashMap has to be robust against a "bad" Hash impl (i.e., there can be strange illogical behavior, but there cannot be UB)? The IO safety RFC says no, it does not have to be robust -- unsafe code may cause UB in the presence of such a thread.

Yeah, that's part of what I'm getting at here; I was trying to figure out if this was a global decision made for the ecosystem at large, or a stdlib-specific thing.

(There might be some things I have missed but hopefully this gives a clear idea of where I'm coming from)

Footnotes

  1. This does still allow for the fd to get closed and reused in the meantime, but there are other such patterns where you could potentially guarantee it isn't.

  2. Objects that are conceptually a borrow of an fd ("do not call close"), like Stdin. They may still be owned types in Rust.

@RalfJung
Member

RalfJung commented Jul 29, 2023

Given that in general people are of the opinion here that "a different crate choosing to allow safe cratename::File creation from fd is also fine"

I have to push back here a little, or I might be misunderstanding. The following two things are not sound:

  • an API that takes an arbitrary integer and calls close on it
  • an API that takes an arbitrary integer and calls read/write on it (or any other FD-based API, really)

The first one is a direct violation of what std's OwnedFd aims to provide. The 2nd one does not violate any std type, but we want to enable user-defined types to be defined that would be violated by this.

I was trying to figure out if this was a global decision made for the ecosystem at large, or a stdlib-specific thing.

There is an ecosystem-wide decision involved.

I will repeat that one of the biggest IO crates out there — tokio — pulls in the nix dependency that by choice lets you violate this invariant in safe code (It has a safe close()!); so most people doing IO stuff already will have the potential for this behavior in safe code in their dep tree.

That nix API can fundamentally not soundly coexist with any part of the program that uses I/O safety to justify its correctness. For any part of the program to use ownership-based reasoning all parts of the program need to respect the ownership discipline.


Regarding the broader questions... I am honestly not sure how to best answer them. The short, "low-entropy" answer is in terms of program logics (specifically separation logic) and their notion of ownership; it is probably not very useful to people that don't already have intuition for these things. The long answer tries to use examples to convey the same intuition, but I don't know which examples are best suited to build that intuition. Honestly I think a real-time meeting would be much more effective here than a long text thread.

The short summary is: we already have an idea of "memory that is exclusively owned". When thinking about Rust programs, not only can we make statements of the form "this location in memory has that value", we can also make statements of the form "this location in memory has that value and nobody else is making any kind of assertion about this location" -- we can say that we own some location in memory. To make statements of this form possible, a new obligation is imposed on all code: you may only access memory that you own! If there is a chance that someone else is claiming ownership of a piece of memory, then you may not access it. (The full version distinguishes between read and write accesses, I am simplifying.) Rust then provides types that conveniently wrap this kind of reasoning so it can all be checked by the compiler.

I/O safety adds two new kinds of statements that can be made:

  • This file descriptor is not closed, and nobody else can even attempt to close it. Let's call this "weak ownership" of an FD.
  • This file descriptor refers to an underlying object in a certain state, and nobody else can perform any operation on that object. (I'm not sure what to call the things file descriptors refer to. I mean the thing that is shared between all the dups that refer to the same... thing.) Let's call this "strong ownership" of an FD.

This also creates new obligations that all code must respect:

  • You may not call close on an FD if anyone else might (weakly or strongly) own it.
  • You may not call any operation on an FD if anyone else might strongly own it.

This is the underlying reason why the two kinds of APIs I listed at the top of this post are unsound. Note that this notion of ownership is completely a compile-time fiction. It has nothing to do with whether fcntl tells you that an FD exists or not.

So far this is all just establishing the terminology and discipline that we use to reason about programs. In a next step, we can then explain the meaning of Rust types in terms of that terminology: OwnedFd expresses that we have weak ownership of an FD. BorrowedFd<'a> expresses that we have borrowed weak ownership of an FD for lifetime 'a.

When reviewing code that works on FDs, fundamentally one needs to ensure that each call to an FD-consuming operation is performed by a caller that has ownership of that FD -- borrowed ownership is sufficient for non-destructive operations, full ownership is required to close the FD. This ownership will usually be carried in by Rust types that have such ownership as their invariant, such as OwnedFd or BorrowedFd. Further, dup and its variants (including sendmsg with an FD) weaken ownership: even if previously you had strong ownership, you generally end up with weak ownership afterwards (which also means they cannot be called on borrowed strong ownership, since borrowing requires you to preserve the original level of ownership).
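The "dup weakens ownership" point in concrete terms, using try_clone (std's safe wrapper around dup) -- a minimal sketch:

```rust
use std::fs::File;
use std::os::fd::OwnedFd;

fn main() -> std::io::Result<()> {
    // Weak ownership: nobody else may close this fd.
    let owned: OwnedFd = File::open("/dev/null")?.into();

    // dup (via try_clone) yields a second fd referring to the same file
    // description. Even if `owned` was the only handle before, any
    // "strong ownership" claim over the underlying object is now gone:
    // two independent owners can observe and mutate its state.
    let duped: OwnedFd = owned.try_clone()?;

    drop(owned); // closes one fd; the file description lives on
    drop(duped); // closes the other; now the description is released
    Ok(())
}
```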

A library can totally decide that it doesn't care much to protect the FDs it creates from actions being taken on them by other parts of the program -- but it must still respect the fact that other parts of the program might want to protect their FDs. Nix is currently failing to do that (Cc nix-rust/nix#1750).


The situation of an FD being passed in via an env var is an interesting one. In an ideal world, where the environment is immutable, this is how I'd think about it: whoever spawns this program has an obligation to actually set up some FD at the number indicated by the env var. (Weak) Ownership of that FD is then passed to the program and made available in a globally shared way -- this means everyone can read or write the FD but nobody is allowed to close it. Effectively its type is BorrowedFd<'static>. This is all it takes to justify calling from_raw_fd, and can explain why cc::Build::new can be safe. (Interestingly we don't seem to have BorrowedFd::from_raw_fd. That is unfortunate. I assume cc::Build::new still takes care not to close that FD.)

In the real world, set_var exists, and that kind of breaks everything. But that's kind of unsound anyway so I am inclined to blame set_var rather than cc::Build::new.
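Under that ideal-world reading, the inherited-fd pattern can be written so that the only unsafe step is asserting the spawner's contract. A sketch; the env var name and helper are hypothetical, not an std or cc API:

```rust
use std::os::fd::{BorrowedFd, RawFd};

// Hypothetical helper: interpret an fd number inherited from the
// spawning process as BorrowedFd<'static> -- reading and writing are
// fine, closing is not, matching the "globally shared weak ownership"
// reading above.
fn inherited_fd() -> Option<BorrowedFd<'static>> {
    let raw: RawFd = std::env::var("INHERITED_FD").ok()?.parse().ok()?;
    if raw < 0 {
        return None; // refuse obviously-invalid values
    }
    // SAFETY (by the spawner's contract, and assuming no set_var games):
    // the fd is open and stays open for the process lifetime.
    Some(unsafe { BorrowedFd::borrow_raw(raw) })
}

fn main() {
    match inherited_fd() {
        Some(fd) => println!("inherited fd: {:?}", fd),
        None => println!("no inherited fd"),
    }
}
```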

One potentially problematic consequence of this entire setup is that if whoever spawns this program sets the env var to an FD that doesn't actually exist, that causes library UB, which can balloon into language UB (namely, some part of the code could rely on an FD being private, but sadly it got exactly the FD that the env var was set to, and some other part of the code calls cc::Build::new and causes some interactions with this FD that break the assumptions made by the first component).

TL;DR if we want a guarantee that cc::Build::new is sound for arbitrary values of the env vars set up by whoever spawned the process, then we can't have any code rely on FD ownership for soundness. At this point I/O safety becomes similar to PartialEq -- there is an intended meaning to it, but you don't get to use it to justify unsafe operations.

@the8472
Member

the8472 commented Jul 29, 2023

(I'm not sure what to call the things file descriptors refer to. I mean the thing that is shared between all the dups that refer to the same... thing.)

It's complicated. The immediate level below the file descriptor is the file description (this gets shared by calling dup). But there are additional things like the inode (this gets shared by opening a file twice) and the underlying blocks (if it's a block-device backed filesystem; this would get violated if you directly write to a mounted block device). Unintentionally sharing any of those objects can lead to unsoundness.
So we can probably lump them together and call them "kernel objects" or "system resources backing the file descriptor" or something like that?

@Manishearth
Member Author

There is an ecosystem-wide decision involved.

Well, yes, it's an ecosystem-wide decision; I more meant: was the decision made in a way that constrains the entire ecosystem in how it can use such operations, or just stdlib implementations?

(I don't think everyone in this thread falls neatly on one side of this)

That nix API can fundamentally not soundly coexist with any part of the program that uses I/O safety to justify its correctness. For any part of the program to use ownership-based reasoning all parts of the program need to respect the ownership discipline.

This hits at the heart of my contention: if the people doing the most IO in the ecosystem don't feel like they can work with this, perhaps we've decided wrong?

The following two things are not sound:

  • an API that takes an arbitrary integer and calls close on it
  • an API that takes an arbitrary integer and calls read/write on it (or any other FD-based API, really)

The first one is a direct violation of what std's OwnedFd aims to provide. The 2nd one does not violate any std type, but we want to enable user-defined types to be defined that would be violated by this.

One of the questions being discussed here is: does the stdlib get to do that and invent new operations that are UB for everyone (not just people directly interfacing with the stdlib)? You can kinda make the same point about the rules of UB around allocation ("the existence of Box means that it is UB to call free() on random pointers"), but that feels definitely more like a language-level thing.

(and yes, before everyone re-re-explains to me how you can cause memory unsafety with these APIs regardless of the stdlib; that goes back into my point about "can we declare this outside of what Rust's model attempts to handle", much like /proc shenanigans.)

The answer I'm getting here is that "yes, the stdlib does get to do that", but this was why I brought it up in the UCG repo first.

we want to enable user-defined types to be defined that would be violated by this

It's unclear to me how useful this is when weighed against the necessities of heavy IO-using crates. It seems like the target audience from the ecosystem POV has chosen differently, which is a strong signal to me.

I'm not sure what to call the things file descriptors refer to

"file descriptor equivalence class" is my mental model. But as @the8472 mentions there's a lot of nuance as to what an fd references. From the pov of "FD ownership" treating them as an equivalence class and talking about whose responsibility it is to close it is sufficient IMO.

@digama0

digama0 commented Jul 29, 2023

This hits at the heart of my contention: if the people doing the most IO in the ecosystem don't feel like they can work with this, perhaps we've decided wrong?

It's unclear to me how useful this is when weighed against the necessities of heavy IO-using crates. It seems like the target audience from the ecosystem POV has chosen differently, which is a strong signal to me.

Is this actually the case? The only example that has been brought up so far of such a library is nix, and they seem to be working on it, based on the issue Ralf linked. I didn't see anything in there that suggests that they are rejecting the premise of IO safety, it's just a breaking change for them and takes some time to implement and roll out.

Besides, if you are a "heavy IO-using crate" you always have an out: just declare these functions as unsafe and punt the safety condition to the caller. This whole business is only an issue if you want your API to be safe and also play well with other safe libraries (in particular std but also any other crates adopting IO safety as a concept).

One of the questions being discussed here is does the stdlib get to do that and invent new operations that are UB for everyone (not just people directly interfacing with the stdlib).

I wouldn't really say that "std invented a new operation that is UB for everyone", this was an RFC and not restricted in scope to std - the RFC indicates that it requires rolling out across the whole ecosystem.

@saethlin
Member

I'm very wary of one line of reasoning, which can be recast as this:

Besides, if you are a "heavy memory-using crate" you always have an out: just declare these functions as unsafe and punt the safety condition to the caller. This whole business is only an issue if you want your API to be safe and also play well with other safe libraries (in particular std but also any other crates adopting memory safety as a concept).

The whole point of Rust is to be able to craft safe APIs on top of unsafe components. People can write a lot of programs in Rust using a lot of libraries and never have to write unsafe to get the job done. If IO safety is upending that and requiring that safety be punted up the stack instead of safely encapsulated, that should be quite concerning.

@ChrisDenton

ChrisDenton commented Jul 29, 2023

I/O safety was in part an attempt to actually specify the ownership semantics we already had. E.g. why from_raw_fd is unsafe, what assumptions you can make about File, etc. It doesn't really upend anything, except perhaps by the RFC being more specific and explicit.

I guess you could say our model was always wrong. A File is a global resource which can never truly be "owned" by anyone other than the OS. I think this would result in a worse experience for devs though. As with memory, we need the fiction that we have some control over it.

@digama0

digama0 commented Jul 29, 2023

I'm very wary of one line of reasoning, which can be recast as this:

Besides, if you are a "heavy memory-using crate" you always have an out: just declare these functions as unsafe and punt the safety condition to the caller. This whole business is only an issue if you want your API to be safe and also play well with other safe libraries (in particular std but also any other crates adopting memory safety as a concept).

Even with that rewrite, I think the analogy still holds: a classic example of a "heavy memory-using crate" which uses unsafe to get the job done is an allocator.

Of course, we should strive to minimize the number of cases where unsafe is needed. The IO safety RFC essentially puts forth the hypothesis that it is more common for code to want IO safety than to want to violate IO safety. (Note that this is a falsifiable prediction about the ecosystem: if it turns out not to be true then we might need to reconsider the whole thing.) Either way there will be some unsafe code, but I think it is defensible to say that close(rand()) is a very fishy thing to do and you need to give some justification for why this doesn't just break the application.

@thomcc
Member

thomcc commented Jul 29, 2023

The situation of an FD being passed in via an env var is an interesting one. In an ideal world, where the environment is immutable, this is how I'd think about it: whoever spawns this program has an obligation to actually set up some FD at the number indicated by the env var. (Weak) Ownership of that FD is then passed to the program and made available in a globally shared way -- this means everyone can read or write the FD but nobody is allowed to close it. Effectively its type is BorrowedFd<'static>. This is all it takes to justify calling from_raw_fd, and can explain why cc::Build::new can be safe. (Interestingly we don't seem to have BorrowedFd::from_raw_fd. That is unfortunate. I assume cc::Build::new still takes care not to close that FD.)

Yes, we never close the file descriptor. More generally, I like this formulation, but yeah:

In the real world, set_var exists, and that kind of breaks everything. But that's kind of unsound anyway so I am inclined to blame set_var rather than cc::Build::new.

This is true, and it's hard to imagine set_var, even once we make it unsafe (or more likely, deprecate it in favor of an unsafe version), having a precondition that "also you can't set vars in a way that would cause IO unsafety elsewhere in the program" -- it's far too vague.

One potentially problematic consequence of this entire setup is that if whoever spawns this program sets the env var to an FD that doesn't actually exist, that causes library UB, which can balloon into language UB (namely, some part of the code could rely on an FD being private, but sadly it got exactly the FD that the env var was set to, and some other part of the code calls cc::Build::new and causes some interactions with this FD that break the assumptions made by the first component).

This doesn't concern me because the person spawning the program has an arbitrary number of ways they can cause UB when spawning the program.

@CAD97
Copy link

CAD97 commented Jul 30, 2023

This is true, and it's hard to imagine set_var, even once we make it unsafe (or more likely, deprecate it in favor of an unsafe version), having a precondition that "also you can't set vars in a way that would cause IO unsafety elsewhere in the program" -- it's far to vague.

I don't think it needs to be that specific, though. Instead, being more general gets the point across perfectly well; something along the lines of "the environment acts as an open set of global variables in a flat namespace; setting variables used elsewhere in the code can lead to arbitrary misbehavior of that code, potentially even including unsound behavior if that code makes unsafe assumptions which no longer hold." It's not just about IO safety, but also e.g. var("…")?.parse().unwrap_unchecked(), as unwise as that would be to trust.

In a way, w.r.t. safety, setting environment variables is no different from choosing random bytes in the global-variable section of memory and writing to them. The one way that it's actually different is that there's a greater expectation of trustable isolation between standard globals than there is of the Map<String, String> environment variables.

Any scheme which passes fds via envvar is putting a global proof burden that no code abuses the relevant envvar names, as the compiler doesn't offer any assistance in isolation there. It's equivalent to name mangling and name collisions of #[no_mangle], and that's at least partially outside the Rust model currently.

@RalfJung
Copy link
Member

@Manishearth

Well, yes, it's an ecosystem-wide decision; I more meant: was the decision made in a way that constrains the entire ecosystem in how it can use such operations, or just stdlib implementations?

Yes, it is intended to constrain the ecosystem to not do loop { close(rand()); }.

One of the questions being discussed here is: does the stdlib get to do that and invent new operations that are UB for everyone (not just people directly interfacing with the stdlib)?

Note that the RFC didn't introduce any new language UB. The operational semantics is unaffected. Whether a concrete program has UB or not is unaffected.

It introduced a new reasoning principle, and it changed/clarified which APIs are sound. It introduced new terminology that libraries can use to define whether something is or is not library UB. (That's probably what you meant but I feel precise terminology can help us tease out the core questions here.)

That probably means this decision should have involved T-lang, not just T-libs-api. And yes if T-lang were to do this arbitrarily, that would be a backwards compatibility hazard. Was this particular RFC a breaking change? I don't know. Was loop { close(rand()); } sound before the RFC? The ownership semantics were not clarified, but from_raw_fd was unsafe, so arguably the intent was there to make this loop unsound -- it just wasn't spelled out properly, and the RFC fixed that.

This hits at the heart of my contention: if the people doing the most IO in the ecosystem don't feel like they can work with this, perhaps we've decided wrong?

Do they feel that? I'd like to learn more about that.

nix predates the RFC, and the nix issue nix-rust/nix#1750 so far looks to me like it is totally possible to make nix IO-safe. I don't see nix as having actively decided against IO safety, they just didn't have OwnedFd when the crate was created.

But yes if loop { close(rand()); } must truly be considered sound, then I/O safety needs to be fundamentally revised.

This is true, and it's hard to imagine set_var, even once we make it unsafe (or more likely, deprecate it in favor of an unsafe version), having a precondition that "also you can't set vars in a way that would cause IO unsafety elsewhere in the program" -- it's far too vague.

It's not great, yeah. We could say that, as @CAD97 suggests, changing any variable requires knowing all invariants associated with that variable and proving they are being maintained -- but then code still has to argue why it can know that a given variable doesn't have an invariant associated with it in another part of the program. It's not great.

We could also give up on what I called "strong ownership" of an FD, and say that read/write/etc are fair game on arbitrary FDs, and only close/dup2 are restricted. Basically it would be sound to convert any integer to a BorrowedFd<'static>.

@Manishearth
Copy link
Member Author

That probably means this decision should have involved T-lang, not just T-libs-api

yeah I think that's part of the source of my confusion

nix predates the RFC, and the nix issue nix-rust/nix#1750 so far looks to me like it is totally possible to make nix IO-safe. I don't see nix as having actively decided against IO safety, they just didn't have OwnedFd when the crate was created.

While digging for this I did seem to find discussions amongst the nix maintainers that seemed to indicate otherwise, but I can't find them now; I could just have misread.

I do think it is worth talking to the tokio and other async library maintainers about this though. I don't think I saw much from them on that RFC.

@RalfJung
Copy link
Member

If major I/O stakeholders feel like I/O safety (even the weaker form that is just about not closing other file descriptors) gets in their way, that would absolutely be a good reason to re-evaluate this RFC, yes. I would expect that to take the form of a new RFC that describes the usecases blocked by the I/O safety RFC and suggests to abolish the concept and describes how that impacts standard library APIs.

Meanwhile I don't see a ton for us on the UCG side to do here. I/O safety is clearly an intended reasoning principle right now, and some hints of it go back all the way to Rust 1.0 (with from_raw_fd being unsafe). The questions here are not on the operational side (so this is outside of t-opsem's realm).

Is this a documentation issue? I don't see any mention of I/O safety in the standard library docs, so that seems like something that should definitely be improved.

@RalfJung
Copy link
Member

rust-lang/rust#114780 should fix the documentation issue.

@Manishearth
Copy link
Member Author

Yes, it's mostly a documentation issue.

@sunfishcode
Copy link
Member

I also made a list of things I plan to improve about the documentation, based on the discussion in this thread.

bors added a commit to rust-lang-ci/rust that referenced this issue Sep 22, 2023
add more explicit I/O safety documentation

Fixes rust-lang/unsafe-code-guidelines#434
Cc rust-lang#114167
Cc `@Manishearth` `@sunfishcode` `@joshtriplett`
@RalfJung
Copy link
Member

The most pressing docs issue should hopefully be fixed by rust-lang/rust#114780, though if @sunfishcode gets around to improving them further based on their list of course that would be even better. :) Still, the issue here can be closed now I think.

@RalfJung
Copy link
Member

For the question of how cc::Build::new can be safe, I have opened rust-lang/rust#116059.
