Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_unchecked and friends should do bound checks in debug mode. #36976

Closed
ticki opened this issue Oct 5, 2016 · 20 comments
Closed

get_unchecked and friends should do bound checks in debug mode. #36976

ticki opened this issue Oct 5, 2016 · 20 comments
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@ticki
Copy link
Contributor

ticki commented Oct 5, 2016

This can help catching a lot of safety issues.

@bluss
Copy link
Member

bluss commented Oct 5, 2016

It's a nice thought, but they cannot change to do this. It would be a breaking change since they are often used to go outside the formal length (i.e. of a vector), but still inside the capacity.

@ticki
Copy link
Contributor Author

ticki commented Oct 5, 2016

It's a nice thought, but they cannot change to do this. It would be a breaking change since they are often used to go outside the formal length (i.e. of a vector), but still inside the capacity.

Not true. offset is used for this.

@hanna-kruppe
Copy link
Contributor

Plus, if get_unchecked were used for this purpose, it would probably be UB and should be changed since it returns &T, not *const T.

@ticki
Copy link
Contributor Author

ticki commented Oct 5, 2016

Yeah, &T holds the invariant that the inner value is initialized and valid, as opposed to *mut T or Unique<T>.

@bluss
Copy link
Member

bluss commented Oct 5, 2016

Current use of get_unchecked to go out of bounds of a slice:

It's ubiquitous in the implementation of Vec.

Non-libstd code uses the same pattern. It would for example break arrayvec

Break syntex

If it would be UB to access past the length but inside a capacity, why?

@ticki
Copy link
Contributor Author

ticki commented Oct 5, 2016

If it would be UB to access past the length but inside a capacity, why?

This code is definitely broken. &T asserts that the inner value is initialized, but it isn't past the length of the vector. It's very suprising that it does this instead of using offset.

@jonas-schievink
Copy link
Contributor

Probably related to the fact that slice::get_unchecked has like no documentation, despite being unsafe.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Oct 5, 2016

It's probably UB because it creates a &T to a location that is not initialized. The reference is immediately converted to a raw pointer, but simply existing is bad AFAIK. By analogy, in #36925 @arielb1 said that creating a &[T] (that is immediately converted to *const [T]) with a null pointer is insta-UB.

Granted, references are currently annotated with non-null metadata for LLVM, while they may not be an analogous attribute for "pointee memory is initialized". So it may not get miscompiled today. But it's still super dubious and should get phased out.

@bluss
Copy link
Member

bluss commented Oct 5, 2016

It's a question that the memory model has to answer, whether it is broken or not.

@bluss
Copy link
Member

bluss commented Oct 5, 2016

We can agree that this case is not broken, right, because it points at something that's initialized to a valid value?

    pub fn truncate(&mut self, len: usize) {
        unsafe {
            // drop any extra elements
            while len < self.len {
                // decrement len before the drop_in_place(), so a panic on Drop
                // doesn't re-drop the just-failed value.
                self.len -= 1;
                let len = self.len;
                ptr::drop_in_place(self.get_unchecked_mut(len));
            }
        }
    }

self.len -= 1;
let len = self.len;
ptr::drop_in_place(self.get_unchecked_mut(len));

@hanna-kruppe
Copy link
Contributor

That use case is indeed not UB.

@arielb1
Copy link
Contributor

arielb1 commented Oct 5, 2016

It's probably UB because it creates a &T to a location that is not initialized. The reference is immediately converted to a raw pointer, but simply existing is bad AFAIK. By analogy, in #36925 @arielb1 said that creating a &[T] (that is immediately converted to *const [T]) with a null pointer is insta-UB.

That sounds like a good enough reason not to require references to have valid values behind them, as long as the memory behind them is valid. In any case, ATM we do not have any plans of requiring that.

@leonardo-m
Copy link

leonardo-m commented Oct 5, 2016

(Hello, this is my first comment around here).
I think the debug asserts in the get functions are important to catch some possible bugs. When I define custom matrices and the like I usually put debug_asserts to catch some bugs in debug builds. Debug builds are usually so slow compared to release builds, that adding the bound tests usually gives a quite small performance loss.

If adding bound checks in debug mode to get_unchecked/get_unchecked_mut is an unacceptable breaking change, then I suggest to add two new methods that do it. This makes the Vec/Slice API bigger, but I think it's worth it.

A problem is: how to name those two functions? get_debug_checked/get_debug_checked_mut could be clear, but they seems a bit too much long.

@nagisa
Copy link
Member

nagisa commented Oct 5, 2016

If adding bound checks in debug mode to get_unchecked/get_unchecked_mut is an unacceptable breaking change, then I suggest to add two new methods that do it.

You’re looking for quite a weird compromise between get_unchecked* and plain indexing (which always panics) which doesn’t seem useful in general case. The desired behaviour can be easily achieved by combining get* with a variation of unchecked_unwrap, which would assert in debug builds. Having in mind those points, I do not think it would be a good addition to the stdlib.

@bluss
Copy link
Member

bluss commented Oct 5, 2016

I think it's reasonable to have more checking of unsafe code rather than no checks. I'm a proponent of the idea that ticki is proposing here, I just think it's a breaking change to change the slice methods.

(For example, ndarray's corresponding uget is documented to have a debug assertion. Those numerics people like to use unchecked indexing, but we also want sometimes-checking.)

@leonardo-m
Copy link

leonardo-m commented Oct 5, 2016

That "weird compromise" is essentially how the D language arrays work, they assert bounds in debug builds only.

The point of get_unchecked* is to speed up some arrays accesses you're sure are within bounds. The point of the bound tests for the indexing [] syntax is to have a memory safe language. Having those bound tests in debug builds only is indeed an intermediate point between being sure (after proving?) and being conservatively safe (followed by LLVM optimizations that mostly unpredictably remove some bound tests). So in my code where I can't afford bound tests in release builds, I'll keep using bound tests in debug builds.

Doing this with a get* plus a debug unchecked_unwrap sounds OK, as long unchecked_unwrap becomes a standard method.

@nagisa
Copy link
Member

nagisa commented Oct 5, 2016

The point of get_unchecked* is to speed up some arrays accesses you're sure are within bounds.

As shown by a number of examples elsewhere in the thread, that’s not necessarily true (or that it is not always clear which “bounds” the programmer has in mind).

I think it's reasonable to have more checking of unsafe code rather than no checks.

I do not disagree, but I find dubious an idea that many people would use not_really_unchecked_get()/get().not_really_unchecked_unwrap() over just unchecked_get(), and changing unchecked_get() had its ship depart quite a while ago.

That being said, if we were to add anything here, I’d insist on making it composable somehow (i.e. unchecked_unwrap()). Having a large matrix of _get() methods helps nobody.

@Aatch
Copy link
Contributor

Aatch commented Oct 6, 2016

Note that the cases in Vec are the way they are to deal with unwinding. By setting the length first, it prevents unwinding from attempting to drop the element.

@ticki
Copy link
Contributor Author

ticki commented Oct 6, 2016

@Aatch Yeah, that's natural, but I still believe that it is wrong to use unchecked_get. I was suprised that it wasn't done with offset.

@steveklabnik steveklabnik added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed A-libs labels Mar 24, 2017
@Mark-Simulacrum Mark-Simulacrum added C-enhancement Category: An issue proposing an enhancement or a PR with one. C-feature-request Category: A feature request, i.e: not implemented / a PR. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Jul 26, 2017
@dtolnay
Copy link
Member

dtolnay commented Nov 16, 2017

I agree that debug-only checking seems useful but there is a lot of design work remaining here. I am not confident that we can reach consensus on a design without someone spearheading this with an RFC.

Some points to make in an RFC related to debug-only checking of unsafe slice indexing:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests