Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe expressions #1346

Closed
wants to merge 3 commits into from
Closed

Unsafe expressions #1346

wants to merge 3 commits into from

Conversation

arcnmx
Copy link

@arcnmx arcnmx commented Oct 29, 2015

Yes the example is terrible, Idunno I just grep'd for unsafe and used the first thing I found ._.

tl;dr

unsafe some_unsafe_func()

instead of

unsafe {
    some_unsafe_func()
}

Rendered

@nagisa
Copy link
Member

nagisa commented Oct 29, 2015

While it is usually used for single calls, I think that’s unidiomatic use of unsafe {}.

Usually you want to insert calculations that, when wrong, could cause unsafe to cause memory unsafety into the block too: e.g.

unsafe {
    let bytes = 1024;
    do_something_with_str(somestr.slice_unchecked(0, bytes));
}
// rather than
let bytes = 1024;
do_something_with_str(unsafe {somestr.slice_unchecked(0, bytes) });

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label Oct 29, 2015
@seanmonstar
Copy link
Contributor

@nagisa that seems like arbitrary personal preference. As in, as long as it doesn't unreasonably tangle my code, i try to restrict the unsafe {} to only encapsulate unsafe things.

Reason: I don't want to accidentally (as in, without proper thought) insert an additional unsafe usage that the compiler ignores since I'm already inside an unsafe block.

@arcnmx
Copy link
Author

arcnmx commented Oct 29, 2015

While I don't necessarily disagree, I dislike doing that because it feels like there's a danger of accidentally slipping in something unsafe when it wasn't intended. I see unsafe as "something that takes potentially unchecked inputs, and does something scary with them". For example, take the implementation of slice::IndexMut:

    fn index_mut(&mut self, index: usize) -> &mut T {
        assert!(index < self.len());
        unsafe { self.get_unchecked_mut(index) }
    }

The assertion, which is very dangerous if wrong, is not included in the unsafe block. And in the example in the RFC, that includes covering the entire match statement in unsafe, where the whole _ => {} block could have entirely unrelated code in it (and then what if someone accidentally slips a pointer dereference in there thinking it's some other type!).

Reason: I don't want to accidentally (as in, without proper thought) insert an additional unsafe usage that the compiler ignores since I'm already inside an unsafe block.

This is why I take slight issue with unsafe functions, which make the entire function body implicitly unsafe.

@steveklabnik
Copy link
Member

I will say @nagisa 's style is the one that I adhere to as well.

@Gankra
Copy link
Contributor

Gankra commented Oct 30, 2015

Yeah putting everything in a big unsafe block is basically the equivalent of wrapping everything in a try catch(Exception), with all of the advantages and disadvantages associated with that kind of thing.

That said, I'm a fan of "the big unsafe" because as soon as I've established "we're doing some unsafe stuff", I don't really care about what's individually the unsafe things. It seems kinda weird to me to be worried about someone coming along and adding a new unsafe thing unwittingly?

@withoutboats
Copy link
Contributor

I use the petite unsafe style myself for the same reason as @arcnmx. And sometimes the invariants that make the unsafe call okay are necessarily distant from the call (e.g. it could be inside a method on a struct). I think this is a personal style thing and not an idiomatic thing.

Most of my unsafe code is ffi calls. I would enjoy being able to prepend unsafe to each ffi call rather than using blocks.

@Ericson2314
Copy link
Contributor

+1 petite unsafe, to the point where I wish we had "safe blocks" to limit unsafety, especially in unsafe functions.

@DanielKeep
Copy link

This is something I've wanted for FFI-heavy code since I wrote my first line of FFI code in Rust.

Yeah, yeah, I know; it's an external function, it could spray random image macros all over my address space. But it's MessageBoxW. It's not going to. Now, the arguments are a potentially different story, passing around pointers. I never feel entirely comfortable because I have to use an unsafe block to call MessageBoxW, but it's not MessageBoxW I'm worried about, it's the arguments, which I want to be in a safe context, if only for peace of mind.

Which then means breaking all the arguments out and binding them to variables outside the call and now the code, which was long to begin with (because that's what happens when you try to do anything with the Windows API in Rust) is now several times longer and I'm seriously starting to consider just writing this in C++ because I'm pretty much throwing away all safety anyway at this point.

A slightly exaggerated example, but you get the idea. That said, I'm not hugely in favour of this; the above problem can be addressed by just writing wrappers around every single FFI function. Then again, I wouldn't complain if this was added.

@withoutboats
Copy link
Contributor

It also makes reasoning about unsafe infectiousness somewhat straightforward: "unsafe language items are infectious, unsafe expressions are not," expressions and language items being the two conceptual levels of Rust's syntax.

@pnkfelix
Copy link
Member

pnkfelix commented Nov 5, 2015

I do not understand this RFC.

In particular, I do not understand the relationship between the precedence of the dot-operator (EXPR.f, EXPR.m()) and the unsafe EXPR being suggested here.

It seems like it is saying that a.m() has higher precedence than unsafe EXPR, i.e. unsafe a.m() is parsed as unsafe { a.m() } under this RFC.

But at the same time, it claims that unsafe a.m().n() is parsed as (unsafe { a.m() }).n().

This does not match my usual understanding of how an infix operator like EXPR.f is parsed.

@nrc
Copy link
Member

nrc commented Nov 5, 2015

not in favour - the gains here seem very minor - we only save typing {}. The cost is more syntactic choices (and thus less uniformity) and unclear scoping - explicitly typing those {} makes scope explicit and much clearer. Plus a little bit more complexity in the language and implementation.

@nikomatsakis
Copy link
Contributor

I'm kind of indifferent here. I mean, from my POV, this is a small convenience, which is nice -- but it's a small one nonetheless. However I will note that others on the @rust-lang/lang team are less positive. I'll let them speak for themselves, but I think it's primarily a matter of not wanting to introduce more choices where they are not needed. (Oh, they already did while I was writing this. Oh well.)

However, we have to figure out the precedence! I initially assumed that this had a very low precedence, like closures. Basically it would extend as far right as possible. But that doesn't seem to be what the RFC says, and I think it's going to be very hard to encode that into the grammar.

@nikomatsakis nikomatsakis self-assigned this Nov 5, 2015
@arcnmx
Copy link
Author

arcnmx commented Nov 5, 2015

@pnkfelix To clarify...

at the same time, it claims that unsafe a.m().n() is parsed as (unsafe { a.m() }).n().

What's meant is that it means providing unsafe a.m().n() makes it wrap more of the code in unsafeness than the absolute minimum, which would be unsafe { a.m() }.n(). Not that they're equivalent. The RFC simply proposes allowing expressions to follow unsafe rather than strictly blocks, it doesn't mess with operator precedence or anything strange like that!

@pnkfelix
Copy link
Member

pnkfelix commented Nov 5, 2015

@arcnmx One might argue the absolute minimum would be (unsafe { a }).m().n() ...

The above admittedly looks absurd, since an identifier lookup is never unsafe, but my point is that in terms of how language grammars are typically defined, it is strange to have an infix operator (the "." in EXPR.m() that is more tightly binding than a prefix operator, but only for one level of application.

(Maybe my concern will be clearer if you consider an example like unsafe a[i].foo() -- is that equivalent to (unsafe { a[i] }).foo(), or is in rather equivalent to unsafe { a[i].foo() } ...?)

@pnkfelix
Copy link
Member

pnkfelix commented Nov 5, 2015

@arcnmx

Or maybe this will be the simplest way for me to state my concern:

You say:

The RFC simply proposes allowing expressions to follow unsafe rather than strictly blocks, it doesn't mess with operator precedence or anything strange like that!

but parsing expressions that lack delimiters (delimiters being the {...} in unsafe { ... }) inherently messes with operator precedence. (The operator in this case being the dot operator, EXPR.f, which is sometimes used in tandem with method calls to form the dot-method-call operator EXPR.m().)

@arcnmx
Copy link
Author

arcnmx commented Nov 5, 2015

@pnkfelix Sorry, I'm clearly very terrible at expressing this. By absolute minimum, I meant "minimum that you could write with {} syntax and only wrap the relevant unsafe code". It's unrelated to how the expression syntax works, I'm simply stating that expression syntax provides less control over which part of the expression is covered as unsafe, because it would have such a low precedence. And therefore as a drawback means it could encourage more code wrapped in "unsafety" than may otherwise be expressed with the {} syntax.

I'll update the wording and examples when I get a chance. I'm not aware of a full operator precedence table, so it's tough to accurately place it, but I'd place it somewhere lower than infix operators but higher than most binary operators. Consider these identical...

unsafe a.m().n() => unsafe { a.m().n() }
unsafe a[i].foo() => unsafe { a[i].foo() }
unsafe a[i].foo() + 1 => unsafe { a[i].foo() } + 1

...and so on. @nikomatsakis mentioned it having closure operator precedence, which would actually make the last example unsafe { a[i].foo() + 1 }. I find that a bit unintuitive, but am not particularly picky either way. Perhaps the best way to describe my own preference is it would be roughly of equivalent precedence to the & prefix operator.

EDIT: hm, actually... That also enables (unsafe a.m()).n() => unsafe { a.m() }.n()

@pnkfelix
Copy link
Member

pnkfelix commented Nov 5, 2015

@arcnmx

Perhaps I have been the victim of an invalid inference.

You wrote:

Consider these identical...

unsafe a.m().n() => unsafe { a.m().n() }

but this appears to contradict the RFC text?


Namely, the primary example of the RFC text says:

unsafe { senders.get_unchecked(0) }.send(value).is_err()

can now be rewritten as:

unsafe senders.get_unchecked(0).send(value).is_err()

and the Drawbacks section strongly implies that the above is not equivalent to:

unsafe { senders.get_unchecked(0).send(value).is_err() }.


But then the very first example in your last comment implies that

unsafe senders.get_unchecked(0).send(value).is_err()

is desugared to unsafe { senders.get_unchecked(0).send(value).is_err() }.


Am I misunderstanding something here?

@arcnmx
Copy link
Author

arcnmx commented Nov 5, 2015

@pnkfelix this is what I meant by expressing the idea poorly :)

I didn't mean to imply that the primary examples were identical in that the latter desugared into the first. I now realise that's confusing and will change it! The drawbacks section meant to imply that it was equivalent to the greedy unsafe (that greediness itself being the drawback!)

@nikomatsakis
Copy link
Contributor

I remember some similar confusion about the precedence of the box operator (rust-lang/rust#21192). My intuition seems to generally be that keywords have lower precedence than symbols -- however, I remember being convinced (in that case) that there were advantages to the higher precedence, primarily because box foo as bar parsed as (box foo) as bar, which isn't especially relevant here. Anyway, I could probably live with an unsafe keyword that had the same precedence as other unary operators. That said, the lack of clarity around the scope of the unsafe could be perceived as a downside of this proposal!

@nikomatsakis
Copy link
Contributor

The point of my comment, which upon re-reading I see that I didn't state explicitly, I think was that it might make sense to tie the precedence of unsafe to the precedence of box.

@arcnmx
Copy link
Author

arcnmx commented Nov 6, 2015

Aha, I forgot we had a precedent set (hah) already with the box keyword! Agreed that they should correspond with each other. Will also be sure to make proper note of this in the RFC.

My comment on keyword 1 + 2 precedence actually applies to box as well - my personal feel is that keyword prefix operators should have the same precedence as any other unary prefix operator.

@retep998
Copy link
Member

As someone who uses unsafe quite heavily, I don't feel that being able to leave out {} is worth the potential ambiguity and confusion. After all, we require that other things like if and while must always have braces, so what is so special about unsafe that braces can be optional?

@Gankra
Copy link
Contributor

Gankra commented Nov 11, 2015

@retep998 well, requiring braces is so stuff like

if foo bar();
    baz();

doesn't happen. (also presumably parsing issues with no parens around the condition?)

But

unsafe unsafe_op1();
   unsafe_op2();

just doesn't compile. There's no way to misuse this construct, as far as I can tell.

@arcnmx
Copy link
Author

arcnmx commented Nov 11, 2015

Updated RFC text for clarification on the operator precedence.

@arcnmx
Copy link
Author

arcnmx commented Nov 11, 2015

@retep998

As someone who uses unsafe quite heavily, I don't feel that being able to leave out {} is worth the potential ambiguity and confusion. After all, we require that other things like if and while must always have braces, so what is so special about unsafe that braces can be optional?

It's a personal preference thing, really. I dislike braces because they seem like strange noise that normally should only indicate control flow or inner scope. I see unsafe as "warning this function call does scary things especially if its parameters are wrong take note" and not "this whole block of code is scary things". Context around unsafe is important, and you'll never encompass it all in a single block. The rightward indent drift is annoying to me when you don't necessarily intend to create a new scope.

I also consider "this function is unsafe to call" and "this code does unsafe things" as two separate ideas, which is why I disagree with the idea that the body of an unsafe functions should be implicitly unsafe.

But offtopic, it's mostly about ambiguity. Any situation where an expression is followed by another expression, such as control flow statements that take a condition, must have some sort of unambiguous parsing of the two. That typically means either if (cond) expr (and then you have to worry about expressions following expr and other painful things) or if cond { ... }, etc.

With unsafe, it's just not applicable. There's no ambiguity with unsafe foo() bar() because foo() bar() is simply not valid Rust. It's essentially an operator, so the only question is how much of the following expression it consumes by way of precedence, which is certainly a concern but IMO a small one (either way, it still conveys the "watch this line of code carefully" idea perfectly fine). It will help to consolidate with box so that we have a standard precedence for keyword prefix operators that should be easy enough to remember and not present a cognitive or parsing burden on the language.

@nikomatsakis
Copy link
Contributor

Hear ye, hear ye. This RFC is entering final comment period.

As of now, the general feeling was that we would most likely not accept the RFC. It is certainly true that unsafe <expr> feels lightweight, but at the same time, everyone seems to have distinct feelings about what the precedence of the unsafe keyword ought to be, and the use of {} has long been a deliberate design choice in Rust to avoid such confusion. But no final decision has been made.

@withoutboats
Copy link
Contributor

I don't feel optimistic about this RFC being accepted, but I would still like it. I see this as useful mainly for calling a single unsafe function in the midst of otherwise safe code. Obviously, since its a stylistic thing, its not an awful loss either.

Precedence ambiguity doesn't seem like too much of a problem, for the same reason the bracketless if scenario is not much of a problem - if precedence surprisingly puts an unsafe operation outside of the unsafe scope, the compiler will inform the user. This is also a fairly uncommon occurrence since most other operations in the precedence list are not unsafe.

@arcnmx
Copy link
Author

arcnmx commented Mar 12, 2016

Mm, I'd like it, I think it's an ergonomic improvement over the current clunky required blocks, which just plain seem unnecessary, but it just doesn't seem like that's enough of a motivation to incite change!

I don't like to think that the precedence is the main reason to not go through with it though... It's a talking point for sure, but it's not our first keyword prefix operator and it won't be particularly confusing whichever way is chosen. Either it will encompass a small part of a line (and result in compile error when misused), or it will encompass a large part of it but never beyond that one line/statement.

A lot of the debate in this thread regarding the precedence was due to misunderstandings and poor wording in the first draft of the RFC. After that, box was mentioned, and tying it to that precedence as a common rule for keyword unary prefix operations makes the most sense to me. It's unfortunate that's still unstable / up in the air...

@aturon
Copy link
Member

aturon commented Mar 14, 2016

FWIW, I'm not strongly opposed to this RFC, but like many others feel its downsides (extra syntactic complexity/axes of choice, general dislike for this style of prefix operators) outweigh its upsides (occasional slight improvement to ergonomics). I've not felt the ergonomic hit here much personally, despite having written a decent amount of code that could use this proposal. And, on the other hand, I somewhat appreciate the visual attention that the braces draw to today's unsafe.

@aturon aturon added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed I-nominated labels Mar 17, 2016
@nikomatsakis
Copy link
Contributor

As we forgot to issue the "final-comment-period" label, we're going to extend FCP by 1 week here.

@bstrie
Copy link
Contributor

bstrie commented Mar 21, 2016

👎 to this. I don't see the utility in making unsafe blocks more visually loose, I'm sour on TIMTOWTDI, and I'd be perpetually unclear on operator precedence. This is the sort of feature that would be explicitly forbidden by a Google Style Guide-like document. And even though the compiler can check that unsafe operations occur only in unsafe blocks, it still feels too much like goto fail for my liking. It's far too much cost just for the ability to sometimes avoid typing two characters.

Anyway, if there's more demand for this in the future then we can always reconsider later.

@mandel59
Copy link

👎.
@arcnmx's “a danger of accidentally slipping in something unsafe” would not be proper comprehension.
unsafe fn and unsafe block might be seen as similar thing, but they are different. While unsafe fn means that the function is unsafe, the unsafe block means that the block is rather asserted as safe manually. When you use unsafe block, you have to prove that the block is safe.
Less unsafe blocks are better only if the proof of the blocks are provided. The petite unsafe style doesn't help or encourage you to prove it and might cause a danger of accidentally asserting something unsafe as safe, then the compiler wouldn't complain anything but the program is wrong.

@llogiq
Copy link
Contributor

llogiq commented Mar 22, 2016

If this gets accepted, I'll personally add a lint against it to clippy. It'd probably be allow, though.

@aturon
Copy link
Member

aturon commented Apr 1, 2016

The lang team met to discuss this RFC and has decided to close. The rationale has already been laid out at various points on the thread, but to summarize:

  • It's unclear whether we want to encourage this particular narrow use of unsafe.
  • The precedence rules around operators like this one (and box) tend to be somewhat confusing and subtle; Rust's design on the whole has preferred requiring braces for clarity.
  • On the whole, the additional complexity and potential confusion introduced by adding this alternative form outweigh any benefits, from the perspective of the lang team.

Thanks, @arcnmx for the RFC!

@aturon aturon closed this Apr 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet