Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe Extern Blocks #3484

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open

Conversation

Lokathor
Copy link
Contributor

@Lokathor Lokathor commented Sep 7, 2023

Rendered


Continuation of #3439, very sorry for the mix up, but when I went to make #3477 I deleted my rfcs repo fork and re-forked. I had forgotten that there was an open PR at the time.

@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Sep 7, 2023
Co-authored-by: Jacob Lifshay <programmerjake@gmail.com>
@kpreid
Copy link

kpreid commented Sep 14, 2023

Perhaps non-unsafe extern blocks should still be specified as potentially allowed — in case, in the future, there is some target ABI where (some) declarations can be statically verified to be sound at compile or link time? (For example, something like WebAssembly but which had its own, Rust-compatible, labeling of function safety, and checked function signature compatibility, so it is not possible to cause UB by mis-calling a function, nor to call an unsafe function by declaring it as safe.) That is, instead of saying “the grammar now requires unsafe”, say “unsafe is (still) syntactically optional, but semantically required in all current cases”.

This would minimize churn in the language definition if any such targets appear in the future. (But maybe that's an insignificant concern since support for previous editions is still required.)

@Lokathor
Copy link
Contributor Author

a future RFC can always relax required keywords in certain situations, it wouldn't even be an edition-break issue at that point.

but let's save that for a future RFC when we actually have a clear and specific fully working case.

@thomcc
Copy link
Member

thomcc commented Sep 15, 2023

One concern about making the functions inside the unsafe extern safe to call by default, is that it may make it easy to accidentally make some functions incorrectly safe when updating to a new edition.

For example, I could imagine people updating their projects, solving issues one by one, and resolving this issue by just changing an extern block to unsafe extern, failing to realize that they've (likely incorrectly) made all the function declarations in the block safe to use.

(Keep in mind that not every project uses cargo fix, especially if there are not a large number of problems)

@programmerjake
Copy link
Member

One concern about making the functions inside the unsafe extern safe to call by default, is that it may make it easy to accidentally make some functions incorrectly safe when updating to a new edition.

maybe have a weak keyword safe:

unsafe extern "C" {
    safe fn sqrtf(v: f32) -> f32;
    unsafe fn printf(s: *const c_char, ...);
}

@nikomatsakis
Copy link
Contributor

I'm frustrated by this RFC because the overall direction is obviously right to me, but the overloading of unsafe feels awfully subtle overall. I feel like we keep doubling down on the original decision to have unsafe play two roles (unsafe and trusted) and I am concerned that it is getting more and more confusing. I would feel much better if we proposed to move to a trusted keyword -- which is obviously a distinct RFC -- in conjunction with this.

@nikomatsakis
Copy link
Contributor

One specific concern I have:

  • What is the proportion of external functions that can correctly be safe?

i.e., we have to weigh the risk of people accidentally writing unsafe extern "C" versus how often they actually want those semantics. It's possible to use proc macros to derive around this.

@Lokathor
Copy link
Contributor Author

Lokathor commented Oct 4, 2023

Hey I'm all for a trusted keyword but gosh I don't think you can sell it at this point.

@pnkfelix
Copy link
Member

pnkfelix commented Oct 11, 2023

The T-lang team met to discuss this RFC today.

We bikeshed'ed a bit and probably covered the same ground that was already covered in this comment thread above (e.g. the suggestion above of a weak safe keyword).

In the end, we have the following proposal for how you might adjust this RFC to meet some broader goals that we identified.

Proposal:

(In this text, let the term "unadorned extern" mean "an extern that is not prefixed by unsafe", i.e. not unsafe extern { ... })

  • On all Rust editions, you can now write unsafe extern { ... }
  • On ≤2021 editions, we lint against unadorned extern { ... }, and emit a suggestion that one write unsafe extern { ... } instead; in ≥2024 editions, unsafe extern { ... } is required.
  • On all Rust editions, unsafe extern { ... } is extended to allow writing safe and unsafe in front of any static or fn item. (This is introducing a new contextual keyword: safe; the T-lang members present at the meeting supported this in part because we think such a keyword may have utility in other contexts in the future, and this is a natural place to employ it.)
    • We also suggest that on all Rust editions, the grammar of unadorned extern { ... } be extended to allow writing safe and unsafe (so that we can unify the parsing code for unsafe extern { ...} and unadorned extern { ... }), but it is nonetheless an error in the static semantics to have a safe or unsafe item within an unadorned extern { ... } block.
  • An item declared safe within an unsafe extern block is allowed to be used from safe code without an unsafe block surrounding the use of that item.
  • An item with declared unsafe within an unsafe extern block is only allowed to be used within an unsafe block (i.e. this re-implements the current Rust semantics, but now allowing for someone to include the redundant declaration so that it looks like other unsafe fn that occur outside of extern blocks).
  • An item without a safe nor unsafe marker within an extern block is implicitly unsafe (i.e. this is the current Rust 2015 semantics).

Here are the various motivations/assumptions that led us to the proposal above:

  • We agreed that unsafe extern { ... } itself is motivated. There is a proof obligation to check that the signature you use for foreign items actually matches the foreign items that end up being pulled in at link time.
  • We were concerned about people accidentally making items in the extern block safe to call by leaving out some kind of opt-in marking: this motivates the contextual safe keyword.
  • Once we had had the safe keyword as the way to opt into safety, the unsafe keyword on each unsafe-to-use item was no longer strictly necessary.
    • We agreed that it was useful to allow it, so that e.g. people can write code where it is apparent at the line of the declaration site that this is an unsafe fn. (But we want to encourage people to migrate to unsafe extern { ... }, even on old editions, so use of the new safe/unsafe declarations requires unsafe extern and is not legal within an unadorned extern.)
  • Allowing the unsafe keyword to be omitted from each item means that the migration path to 2024 edition is now truly trivial, even for developers migrating by hand: just replace extern { ... } with unsafe extern { ... }. Any other change beyond that is a matter of personal style.
  • We believe the proposal above gives us room to be stricter in future editions, after we have more experience with the feature.
    • For example, we might start requiring unsafe fn rather than just fn for an unsafe-to-call function in an unsafe extern { ... } block.

@traviscross
Copy link
Contributor

traviscross commented Mar 25, 2024

@rustbot labels +I-lang-nominated +A-edition-2024

We met back on 2023-10-11 to discuss this RFC, and in that meeting, we hammered out a consensus for how to move forward on this as articulated by pnkfelix above. We've just been waiting for the RFC to be updated according to that consensus, and it now has been (with one caveat that I've noted in review -- that caveat has now been addressed).

Let's nominate to discuss potentially moving forward on this for Rust 2024.

@rustbot rustbot added A-edition-2024 Area: The 2024 edition I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. labels Mar 25, 2024
@scottmcm
Copy link
Member

Updates here look good! I this this is a good path forward and this version addresses the previous worries about changing something to be the opposite over an edition boundary.

@rfcbot merge

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 27, 2024

Team member @scottmcm has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Mar 27, 2024
@RalfJung
Copy link
Member

RalfJung commented May 7, 2024

These two are the alternatives I see can make sense. Requiring unsafe extern { unsafe fn } is not.

This is explained in the newly added sections of the RFC. The safety obligation can be split into two parts:

  1. Ensuring that the signature is right, so that data passed in on one side actually arrives in the intended way on the other side. This also includes attributes such as nonnull noalias noundef that are derived from using types such as &mut i32 in the signature.
  2. Ensuring that the function, when called with the given inputs, is actually safe.

In unsafe extern { unsafe fn }, the first unsafe is discharging obligation (1). However, obligation (2) remains, which is why the function mus still be declared unsafe.

The status quo is that obligation (1) is also attached to the call sites of the function. This doesn't make a ton of sense, since the correctness of the signature should be checked at the place where the signature is declared.

Furthermore, unfortunately the signature has an effect on all calls of this function everywhere: if you make an argument NonZeroI32, then if any call of this function ever passes in a 0, this is UB -- even if that call used a different declaration from a different extern block, even if that call is actually not even in Rust but in C. That's what rust-lang/rust#46188 is about. Here's another example. So this is also made part of obligation (1).

@GoldsteinE
Copy link

@traviscross Thanks for updating the text! I think it now details all the alternatives proposed before the update.

One possible implementation (pointed out by @kennytm above) that plays nicely with extern types and addresses the LLVM bug) would be “only require unsafe before extern if the block contains function or static declarations”. It could probably be revisited if/when the extern types are accepted though, for now it only affects empty blocks.

I think I remembered some prior art: cxx.rs does unsafe extern blocks:
https://cxx.rs/extern-c++.html

Their rules are:

  1. Only unsafe-to-call functions can be declared in an extern {} block (they require explicit unsafe).
  2. safe-to-call functions can be declared in an unsafe extern {} block (they don’t require a keyword).

Unfortunately, LLVM bug prevents these rules from being adopted, despited them being IMHO somewhat neater.

@RalfJung

In unsafe extern { unsafe fn }, the first unsafe is discharging obligation (1). However, obligation (2) remains, which is why the function mus still be declared unsafe.

Sorry, now I’m confused. I think per RFC text, unsafe extern { unsafe fn } creates an unsafe-to-call function, so obligation (2) is not discharged? E.g.

unsafe extern "C" {
    // The obligation isn't discharged here.
    unsafe fn strlen(str: *const c_char) -> usize;
}
// It's discharged here, where we know the inputs:
unsafe { strlen(1 as _) } // and we still can do it incorrectly

Anyway, RFC doesn’t actually require the inner unsafe, so the (more logical in my opinion)

unsafe extern "C" {
    /* no obligation discarded here */ fn strlen(str: *const c_char) -> usize;
}

is also possible.

@RalfJung
Copy link
Member

RalfJung commented May 7, 2024

Sorry, now I’m confused. I think per RFC text, unsafe extern { unsafe fn } creates an unsafe-to-call function, so obligation (2) is not discharged? E.g.

Yes, that's what I said. The unsafe before the fn exists to indicate the existence of obligation 2; it is then discharged at the call site.

In contrast, the unsafe before the extern exists to indicate the discharging of obligation 1.

@nikic
Copy link

nikic commented May 7, 2024

However, there are a class of usages for extern "C" that immediately cause undefined behavior, without ever actually calling the external function.
For example, see rust-lang/rust#46188

Note that this is somewhat unclear. It can certainly cause surprising UB as currently implemented in LLVM. This affects both clang and rustc. However, whether this is actually allowed by the C standard or a bug in LLVM is not clear.

It's my understanding that this is, at least primarily, a rustc bug, not an LLVM bug. rustc is responsible for how it wants to handle attributes when there are multiple declarations/definitions of the same symbol. IIRC clang does handle this correctly in relevant cases by e.g. not emitting attributes like noalias on declarations. (Within one module, LLVM has no concept of multiple declarations/definitions for one symbol, this is fully a frontend concern. LLVM's own interpretation would only come into play during module linking.)

So, there's another possible alternative here: declare that incorrect declarations are fine, and only the attributes of the declaration actually "used" at any given call apply to that call. Then, I think, there's no proof obligation associated with an extern fn declaration. However, I don't know whether this can be done with LLVM -- IIRC, @nikic said it is not currently clear how one would implement this. @bjorn3 said that it can't currently be implemented in cranelift.

Only adding attributes to calls and not declarations is perfectly fine as far as LLVM is concerned, I don't see any implementation problems from that side. The only possible downside that comes to mind is that if a call gets devirtualized, we might have attributes with more information on the declaration rather than the (originally virtual) call. I doubt this is a significant problem though.

(I'm pretty sure that we already discussed this in some details somewhere, but I can't find it.)

@RalfJung
Copy link
Member

RalfJung commented May 7, 2024

(I'm pretty sure that we already discussed this in some details somewhere, but I can't find it.)

Yeah, same here. Thanks for repeating your answer!

That seems like there is a path forward for all our backends that avoids rust-lang/rust#46188: do not add attributes to declarations ever, only at call sites. My inclination would be that we should try that before considering #46188 motivation for a language change, i.e. the RFC should be updated to stand on its own feet, rather than relying on #46188.

It seems like many people think that the separation of the extern fn obligation into two steps as described above is still valuable, which means the RFC would still go ahead. But given how much the justification changed, maybe it's worth re-assuring ourselves that everyone in the lang team had all the context here (not sure how much the stated lang team consensus was relying on #46188).

Even if the C standard allows for what LLVM is doing, we could still
conceivably fix LLVM.  In the text, let's draw this out a bit more
finely.

(Thanks to RalfJ for raising this point.)
When people migrate to the new edition, if they have turned up the
severity of the `unsafe_code` lint and they have `extern` blocks that
need to be marked `unsafe`, they will see this lint as intended.
Let's make a note of that.

(Thanks to kennytm for raising this point.)
@kennytm
Copy link
Member

kennytm commented May 7, 2024

@RalfJung #3484 (comment)

Thanks for the explanation. If Rust had "checked safety" it would be very worthwhile to distinguish between "unsafety by matching function signature" vs "unsafety of the function itself".

I'd argue that the forward-declared items themselves should already be taught as being unsafe in that you have to match the function signature with the actual library. Otherwise simply forcing the unsafe prefix on extern is not going to explain to the user what are being checked. The -sys crates upgrading to 2024 are just going to slap unsafe everywhere "because cargo fix says so".

So I'm still -0 on the double unsafe since it's still cringe to see that when taking the lang design theorist glasses off.

  • unsafe fn/static — +1
  • safe fn/static — +1
  • /*safe*/ fn/static — -1
  • unsafe extern — -0

This RFC had used as one motivation that undefined behavior can
currently result simply from having an incorrect `extern` block even
if the items within are not used from Rust code.

This motivation was not cited in the lang team's 2023-10-11 consensus
for how and why to proceed with this RFC, and it's possible that we
may be able to resolve this issue in other ways, so let's remove that
motivation from this RFC.

(Thanks to RalfJ for raising this point and suggesting this.)
The 2023-10-11 consensus from lang on how to move forward included
that the `safe` and `unsafe` keywords would be optional within an
`unsafe extern` block.

The reference-level section correctly followed this consensus, but the
guide-level section did not, and suggested that annotating items with
these keywords was required.  Let's fix that.
@RalfJung
Copy link
Member

RalfJung commented May 7, 2024

I'd argue that the forward-declared items themselves should already be taught as being unsafe in that you have to match the function signature with the actual library.

That's exactly what this RFC does: making the declaration unsafe. It's the unsafe extern you are complaining about.

The unsafe fn is saying that this is unsafe to call; it doesn't discharge an obligation but creates one. It would be very strange for one unsafe to take on both roles, declaring and discharging an obligation. (We used to do this with the unsafe fn, where it also made the body of the function be an unsafe block, but we're slowly changing that as it makes it much harder to systematically explain unsafety.)

@traviscross
Copy link
Contributor

On this thread, there had been a lot of confusion by reasonable people about whether items within an unsafe extern block were required to be annotated with safe or unsafe or whether that was optional. I was confused why there was such confusion.

I think I understand that now. The reference-level section correctly followed the 2023-10-11 lang team consensus on this point (that these are optional), but the guide-level section had language that suggested that these were required. Somehow we seem to have missed this contradiction.

I've fixed this now (these annotations are optional, and if not present, an item is assumed to be unsafe).

@traviscross
Copy link
Contributor

As @RalfJung suggested, I've now removed the rust-lang/rust#46188 issue as a motivation (and have added a section noting this history).

@traviscross
Copy link
Contributor

@rustbot labels +I-lang-nominated

While it's a bit unusual, let's nominate to confirm:

  1. Our key motivation for this change is to make clear that the obligation for proving the correctness of the signatures within an extern block falls on the author of the extern block, not on the caller (or, in general, the user) of an item within an extern block.
  2. We're happy with this motivation even if Rust could be fixed such that writing an extern block with incorrect signatures would not potentially cause the resulting program to exhibit undefined behavior even if those items were never used by Rust code.

@rustbot rustbot added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label May 7, 2024
@kennytm
Copy link
Member

kennytm commented May 7, 2024

@RalfJung #3484 (comment)

// in the bureaucratic ideal world
extern {
    trusted(extern_signature) unsafe(deref_pointer) fn fopen(pathname: *const c_str, mode: *const c_str) -> *mut FILE;
    trusted(extern_signature) safe fn malloc(size: usize) -> *mut c_void;
}

// but because you can't provide the reason directly (and it's annoying anyway) it becomes
extern {
    trusted unsafe fn fopen(pathname: *const c_str, mode: *const c_str);
    trusted safe fn malloc(size: usize) -> *mut c_void;
}

// but because we conflate trusted and unsafe it becomes
extern {
    unsafe unsafe fn fopen(pathname: *const c_str, mode: *const c_str);
    unsafe safe fn malloc(size: usize) -> *mut c_void;
}

// but because `unsafe unsafe` and `unsafe safe` is even sillier it becomes
unsafe extern {
    unsafe fn fopen(pathname: *const c_str, mode: *const c_str);
    safe fn malloc(size: usize) -> *mut c_void;
}

what I'm saying is that the extern can already absorb that trusted(extern_signature)/unsafe to get

extern { // implies `trusted(extern_signature)` and will emit `unsafe_code` lint if it contains any `fn`/`static` item
    unsafe fn fopen(pathname: *const c_str, mode: *const c_str);
    safe fn malloc(size: usize) -> *mut c_void;
}

@Nemo157
Copy link
Member

Nemo157 commented May 7, 2024

what I'm saying is that the extern can already absorb that trusted(extern_signature)/unsafe to get

There is prior art to adding unsafe to something that is already "known to be unsafe" and linted by the unsafe_code lint: #3325.

Having a single keyword to look for makes it easier for reviewers to know when they need to have something more thoroughly reviewed. On a multi-maintainer project that does not deny(unsafe_code) because they do use it in some locations, it may be that some of the reviewers are not aware that extern { safe fn foo(); } can cause UB, whereas if they see unsafe extern { safe fn foo(); } they'll know they need to learn what it means/find someone else to review it.

@RalfJung
Copy link
Member

RalfJung commented May 7, 2024

but because unsafe unsafe and unsafe safe is even sillier it becomes

I don't agree with that framing.
I would say: because trusted(extern_signature) must appear on every single item, we can optimize the bureaucratic ideal world to reduce redundancy:

trusted(extern_signature) extern {
    unsafe(deref_pointer) fn fopen(pathname: *const c_str, mode: *const c_str) -> *mut FILE;
    safe fn malloc(size: usize) -> *mut c_void;
}

Then we do the "remove the reasons" and "conflate trusted and unsafe" transformations and voila -- we arrived at the RFC proposal.

You proposal removes every trace of trusted(extern_signature), which is not how we treat unsafe anywhere else in the language. For instance, one could argue that get_mut_unchecked already clearly indicates that something unsafe is going on, so surely we can absorb the trusted(index_in_bounds) into the call? But no, that is emphatically not how Rust works.

@tmccombs
Copy link

tmccombs commented May 7, 2024

So in the 2024 edition from what I can tell, you wouldn't be able to use extern without unsafe. So this is efectively changing the extern keyword to a new unsafe extern keyword that, unusually, contains whitespace.

one could argue that get_mut_unchecked already clearly indicates that something unsafe is going on, so surely we can absorb the trusted(index_in_bounds) into the call?

There is a pretty big difference between a language keyword, and a substring of a function name.

@kennytm
Copy link
Member

kennytm commented May 7, 2024

@tmccombs I'm pretty sure extern crate and extern fn (function definitions) are unaffected. unsafe extern are still 2 keywords.

@programmerjake
Copy link
Member

programmerjake commented May 7, 2024

I think unsafe should only be required on extern if it contains any fn/static declarations (or any other declarations that link to symbols in an unchecked manner, if we get any more kinds).
I know I used:

#[doc = include_str!("../README.md")]
extern {} // there's no reason this should be unsafe

and I expect extern type declarations to also not need unsafe.

@kennytm
Copy link
Member

kennytm commented May 7, 2024

@RalfJung #3484 (comment)

You proposal removes every trace of trusted(extern_signature),

I'm saying extern { implied trusted(extern_signature)

which is not how we treat unsafe anywhere else in the language.

#![warn(unsafe_code)]
std::arch::global_asm! { "" } 
// ^ using this macro is unsafe even though it does not need an `unsafe` block

For instance, one could argue that get_mut_unchecked already clearly indicates that something unsafe is going on, so surely we can absorb the trusted(index_in_bounds) into the call? But no, that is emphatically not how Rust works.

Strawman. For get_mut_unchecked I may want to propagate the unsafe effect rather than discharging it. There is no such option for an extern block.

Additionally, existing (≤2021) extern {} block did not require unsafe, unlike get_mut_unchecked here which always required unsafe {}. Migration matters.

@RalfJung
Copy link
Member

RalfJung commented May 8, 2024

I'm saying extern { implied trusted(extern_signature)

I understand. And I'm saying that's bad as one now has to know a list of things in Rust that implicitly introduce unsafety, as opposed to the simple rule that grep unsafe is sufficient.

I don't think my get_mut_unchecked example is a strawman. For me, that feels like the same thing -- implicit vs explicit unsafety. You seem to make a qualitative difference there, but to me the difference is just quantitative.

std::arch::global_asm! { "" }

That should be rectified, like we did with unsafe attributes.

No reason to repeat past mistakes.

@traviscross
Copy link
Contributor

@rustbot labels -I-lang-nominated

We discussed this in the lang triage meeting today, and we confirmed that:

  1. Our key motivation for this change is to make clear that the obligation for proving the correctness of the signatures within an extern block falls on the author of the extern block, not on the caller (or, in general, the user) of an item within an extern block.
  2. We're happy with this motivation even if Rust could be fixed such that writing an extern block with incorrect signatures would not potentially cause the resulting program to exhibit undefined behavior even if those items were never used by Rust code.

@rustbot rustbot removed the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-edition-2024 Area: The 2024 edition disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-lang Relevant to the language team, which will review and decide on the RFC. to-announce
Projects
Status: Needs help: Design
Development

Successfully merging this pull request may close these issues.

None yet