Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterators over slices of ZST behave strange #42789

Closed
RalfJung opened this issue Jun 21, 2017 · 12 comments · Fixed by #52206
Closed

Iterators over slices of ZST behave strange #42789

RalfJung opened this issue Jun 21, 2017 · 12 comments · Fixed by #52206
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Jun 21, 2017

The following test passes for slices over non-ZST, but fails for ZST:

fn slice_test<T>(x: &[T]) {
    let mut i = x.iter();
    i.next();
    let x1 = i.next().unwrap();
    let x2 = &x[1];
    let x3 = x.iter().nth(1).unwrap();
    assert_eq!(x1 as *const _, x2 as *const _);
    assert_eq!(x2 as *const _, x3 as *const _);
}

fn main() {
    slice_test(&[1,2,3]);
    slice_test(&[(),(),()]);
}

Playpen

While pointers to ZSTs don't really matter as they will never be dereferenced, I think providing consistent addresses for the same element would still make sense.
Yet, x2 turns out to be 0x55e742ee85dc, no idea where that address is coming from. The LLVM IR contains some undef, but I am not sure if that always indicates a problem.

Generally, the iterator's treatment of ZST's seems a little fishy. Below

impl<T> SliceIndex<[T]> for usize {
, the get methods defer to .offset to compute the address of an element of the slice. offset is only well-defined for in-bounds accesses, yet it is used here on a dangling pointer-to-ZST, with some non-0 offset.

@arielb1
Copy link
Contributor

arielb1 commented Jun 21, 2017

To clarify, the way the test fails is that the slice iterator returns 0x1 instead of an actual pointer to an element.

@arielb1
Copy link
Contributor

arielb1 commented Jun 21, 2017

Generally, the iterator's treatment of ZST's seems a little fishy. Below

impl<T> SliceIndex<[T]> for usize {
, the get methods defer to .offset to compute the address of an element of the slice. offset is only well-defined for in-bounds accesses, yet it is sued here on a dangling pointer-to-ZST, with some non-0 offset.

By "dangling element" you mean 0x8: &mut Zst? By what I was able to read from LLVM's rules, a "valid pointer range" for the purpose of GEPi is just a range that:
A) does not start or contain null or ~0.
B) does not overlap with any "LLVM-known allocation" - aka globals, stack, malloc, etc.

In particular, LLVM does not try to know about all allocas in address space, it only cares about the ones it knows about, so 0x8 is a valid pointer.

@RalfJung
Copy link
Member Author

To clarify, the way the test fails is that the slice iterator returns 0x1 instead of an actual pointer to an element.

IMHO that would be fine, if the get methods did the same. Things just should be consistent.

By "dangling element" you mean 0x8: &mut Zst?

0x1 here, but yeah.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 21, 2017

We could treat any cast to a raw pointer to a zst as a constant producing an integer. Should be trivial to do as a Mir pass and has clear semantics imo.

@RalfJung
Copy link
Member Author

This can't be done on polymorphic code though, can it?

@strega-nil
Copy link
Contributor

@arielb1 Could you show where LLVM talks about treating ~0 specially?

@Mark-Simulacrum Mark-Simulacrum added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Jun 23, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-bug Category: This is a bug. label Jul 28, 2017
@bluss
Copy link
Member

bluss commented Jan 7, 2018

I was going to fix this for the slice iterator in #46223, but I have yet to pick it up again (feel free to do that anyone).

@RalfJung
Copy link
Member Author

Thanks to @rkruppe for pointing out that this is actually a soundness issue. The following module should never panic:

mod unique_token {
    use std::sync::atomic::{AtomicBool, Ordering};

    // With a private field so it can only be constructed inside this module
    pub struct Token(());
    
    // Stores whether we have already created a token
    static GUARD : AtomicBool = AtomicBool::new(false);
    
    impl Token {
        pub fn new() -> Option<Token> {
            if GUARD.compare_and_swap(false, true, Ordering::AcqRel) == false {
                Some(Token(()))
            } else {
                None
            }
        }
        
        pub fn take_two(a: &Token, b: &Token) {
            // There's only ever one token so these have to be the same
            assert_eq!(a as *const _, b as *const _,
                "How can there be two tokes?!?");
        }
    }
}

and yet here is how we can make it panic:

use unique_token::Token;

fn main() {
    let t = Token::new().expect("Couldn't even get a single token");
    // make sure we can't get a second
    assert!(Token::new().is_none());
    
    let s = &[t];
    let t1 = &s[0];
    let t2 = s.iter().next().expect("Slice can't be empty");
    Token::take_two(t1, t2);
}

Playground

@arielb1
Copy link
Contributor

arielb1 commented Jul 7, 2018

@RalfJung

You can also get elements of slices through slice patterns, which don't squash ZST pointers to 0x1. You have a good argument for having slice iterators use the "real" address of the ZST instead of squashing it.

cc @rust-lang/libs - what's the obstacle to not squashing pointers in iterators?

@arielb1 arielb1 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Jul 7, 2018
@arielb1
Copy link
Contributor

arielb1 commented Jul 7, 2018

Not sure whether we should tag this as I-unsound

@RalfJung
Copy link
Member Author

RalfJung commented Jul 9, 2018

The iterators also play fast-and-loose with the alignment rules. For example, Iter::as_slice will call from_raw_parts with an unaligned pointer when used with a type like [u32; 0].

@RalfJung
Copy link
Member Author

I have a proposed fix at #52206.

bors added a commit that referenced this issue Jul 18, 2018
slices: fix ZST slice iterators making up pointers; debug_assert alignment in from_raw_parts

This fixes the problem that we are fabricating pointers out of thin air. I also managed to share more code between the mutable and shared iterators, while reducing the amount of macros.

I am not sure how useful it really is to add a `debug_assert!` in libcore. Everybody gets a release version of that anyway, right? Is there at least a CI job that runs the test suite with a debug version?

Fixes #42789
bors added a commit that referenced this issue Jul 30, 2018
slices: fix ZST slice iterators making up pointers; debug_assert alignment in from_raw_parts

This fixes the problem that we are fabricating pointers out of thin air. I also managed to share more code between the mutable and shared iterators, while reducing the amount of macros.

I am not sure how useful it really is to add a `debug_assert!` in libcore. Everybody gets a release version of that anyway, right? Is there at least a CI job that runs the test suite with a debug version?

Fixes #42789
bors added a commit that referenced this issue Aug 2, 2018
slices: fix ZST slice iterators making up pointers; debug_assert alignment in from_raw_parts

This fixes the problem that we are fabricating pointers out of thin air. I also managed to share more code between the mutable and shared iterators, while reducing the amount of macros.

I am not sure how useful it really is to add a `debug_assert!` in libcore. Everybody gets a release version of that anyway, right? Is there at least a CI job that runs the test suite with a debug version?

Fixes #42789
bors added a commit that referenced this issue Aug 19, 2018
debug_assert to ensure that from_raw_parts is only used properly aligned

This does not help nearly as much as I would hope because everybody uses the distributed libstd which is compiled without debug assertions. For this reason, I am not sure if this is even worth it. OTOH, this would have caught the misalignment fixed by #42789 *if* there had been any tests actually using ZSTs with alignment >1 (we have a CI runner which has debug assertions in libstd enabled), and it seems to currently [fail in the rg testsuite](https://ci.appveyor.com/project/rust-lang/rust/build/1.0.8403/job/v7dfdcgn8ay5j6sb). So maybe it is worth it, after all.

I have seen the attribute `#[rustc_inherit_overflow_checks]` in some places, does that make it so that the *caller's* debug status is relevant? Is there a similar attribute for `debug_assert!`? That could even subsume `rustc_inherit_overflow_checks`: Something like `rustc_inherit_debug_flag` could affect *all* places that change the generated code depending on whether we are in debug or release mode. In fact, given that we have to keep around the MIR for generic functions anyway, is there ever a reason *not* to handle the debug flag that way? I guess currently we apply debug flags like `cfg` so this is dropped early during the MIR pipeline?

EDIT: I learned from @eddyb that because of how `debug_assert!` works, this is not realistic. Well, we could still have it for the rustc CI runs and then maybe, eventually, when libstd gets compiled client-side and there is both a debug and a release build... then this will also benefit users.^^
djrenren pushed a commit to djrenren/compiletest that referenced this issue Aug 26, 2019
debug_assert to ensure that from_raw_parts is only used properly aligned

This does not help nearly as much as I would hope because everybody uses the distributed libstd which is compiled without debug assertions. For this reason, I am not sure if this is even worth it. OTOH, this would have caught the misalignment fixed by rust-lang/rust#42789 *if* there had been any tests actually using ZSTs with alignment >1 (we have a CI runner which has debug assertions in libstd enabled), and it seems to currently [fail in the rg testsuite](https://ci.appveyor.com/project/rust-lang/rust/build/1.0.8403/job/v7dfdcgn8ay5j6sb). So maybe it is worth it, after all.

I have seen the attribute `#[rustc_inherit_overflow_checks]` in some places, does that make it so that the *caller's* debug status is relevant? Is there a similar attribute for `debug_assert!`? That could even subsume `rustc_inherit_overflow_checks`: Something like `rustc_inherit_debug_flag` could affect *all* places that change the generated code depending on whether we are in debug or release mode. In fact, given that we have to keep around the MIR for generic functions anyway, is there ever a reason *not* to handle the debug flag that way? I guess currently we apply debug flags like `cfg` so this is dropped early during the MIR pipeline?

EDIT: I learned from @eddyb that because of how `debug_assert!` works, this is not realistic. Well, we could still have it for the rustc CI runs and then maybe, eventually, when libstd gets compiled client-side and there is both a debug and a release build... then this will also benefit users.^^
bors added a commit to rust-lang-ci/rust that referenced this issue May 11, 2023
…r=the8472

Simplify the implementation of iterators over slices of ZSTs

Currently, slice iterators over ZSTs store `end = start.wrapping_byte_add(len)`.

That's slightly convenient for `is_empty`, but kinda annoying for pretty much everything else -- see bugs like rust-lang#42789, for example.

This PR instead changes it to just `end = ptr::invalid(len)` instead.

That's easier to think about (IMHO, at least) as well as easier to represent.

`next` is still to big to get inlined into the mir-opt/pre-codegen/ tests, but if I bump the inline threshold to force it to show the whole thing, this implementation is also less MIR:
```
> git diff --numstat
241     370     tests/mir-opt/pre-codegen/slice_iter.forward_loop.PreCodegen.after.mir
255     329     tests/mir-opt/pre-codegen/slice_iter.reverse_loop.PreCodegen.after.mir
184     216     tests/mir-opt/pre-codegen/slice_iter.slice_iter_mut_next_back.PreCodegen.after.mir
182     254     tests/mir-opt/pre-codegen/slice_iter.slice_iter_next.PreCodegen.after.mir
```
(That's ≈70 lines less for `Iter::next`, for example.)

r? `@ghost`

~~Built atop rust-lang#111282, so draft until that lands.~~
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
6 participants