.pop_front() causing memory corruption (macOS) #57

Open
aldanor opened this Issue Dec 3, 2018 · 36 comments

5 participants
@aldanor
aldanor commented Dec 3, 2018

This took me a long while to figure out, but I narrowed it down to just a single line:

println!("{:?}", queue);
queue.pop_front();
println!("{:?}", queue);

The elements are simple Clone/Copy structs. The first println! prints the queue in its normal state, whereas in the second all values look like 4294999990 or 123145302343606. This only happens in a quickcheck-like test suite where the queue is used very intensively, with pushes and pops done thousands of times (it consistently fails at the same spot, though).

I could try running it through valgrind if it helps, not sure how else I could help. Is this a known issue perhaps?

@gnzlbg

Owner

gnzlbg commented Dec 3, 2018

Hi @aldanor, thank you for the report and sorry for the slow reply (I'm travelling). Could you share a full snippet that reproduces the issue? It's OK if it depends on quickcheck or one of your libraries; I can use that as a starting point to investigate further.

Also, it would be nice to know whether this happens with the latest version of the library or some older version (could you try with master?). IIUC, this happens on macOS, right? Could you try enabling the unix_sysv cargo feature and see whether you can also reproduce it with that?

@gnzlbg gnzlbg added the bug label Dec 3, 2018

@aldanor

Author

aldanor commented Dec 3, 2018

@gnzlbg I've managed to isolate this. Here's an example that always fails:

    #[test]
    fn test_deque() {
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct Foo {
            a: i64,
            b: Option<(bool, i64)>,
        }

        use slice_deque::SliceDeque;
        // `Rng` must be in scope for `rng.sample(..)`.
        use rand::{rngs::StdRng, Rng, SeedableRng, distributions::Uniform};

        let mut rng: StdRng = SeedableRng::seed_from_u64(0);
        let mut deque = SliceDeque::new();

        // Runs until one of the assertions fails.
        loop {
            let n = rng.sample(Uniform::new_inclusive(0, 1000));
            for _ in 0..n {
                deque.push_front(Foo { a: 42, b: None });
            }
            // Guard: `new_inclusive(1, 0)` would panic on an empty deque.
            if deque.is_empty() {
                continue;
            }
            let n = rng.sample(Uniform::new_inclusive(1, deque.len()));
            for _ in 0..n {
                assert_eq!(deque.pop_front(), Some(Foo { a: 42, b: None }));
                if !deque.is_empty() {
                    // this assertion fails (becomes corrupt after pop_front())
                    assert_eq!(unsafe { *deque.get_unchecked(deque.len() - 1) },
                               Foo { a: 42, b: None });
                }
            }
        }
    }

fails like so:

thread 'test_deque' panicked at 'assertion failed: `(left == right)`
  left: `Foo { a: 1048576, b: Some((true, 2)) }`,
 right: `Foo { a: 42, b: None }`', test.rs:251:21

Note that the same example works just fine if you replace the struct with an integer. Could it have anything to do with alignment?

(Haven't tried master/unix_sysv yet, will do next.)

@aldanor

Author

aldanor commented Dec 3, 2018

@gnzlbg Just checked the master branch and unix_sysv enabled, same outcome.

@gnzlbg

Owner

gnzlbg commented Dec 5, 2018

This is indeed a bug somewhere in SliceDeque, minimal working example:

use slice_deque::SliceDeque;

const C: [i16; 3] = [42; 3];

let mut deque = SliceDeque::new();
for _ in 0..918 {
    deque.push_front(C);
}

for _ in 0..237 {
    assert_eq!(deque.pop_front(), Some(C));
    assert!(!deque.is_empty());
    assert_eq!(*deque.back().unwrap(), C); // fails: back != C
}
@aldanor

Author

aldanor commented Dec 5, 2018

Indeed, your example is even more minimal.

Does this only occur on macOS?

@gnzlbg

Owner

gnzlbg commented Dec 5, 2018

Does this only occur on macOS?

I have only tested this on macOS, working on a fix. I suspect the bug is platform independent, but I can't say for sure yet.

@aldanor

Author

aldanor commented Dec 5, 2018

Just checked on 64-bit Linux:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Some([0, 42, 42])`,
 right: `Some([42, 42, 42])`', src/main.rs:14:9

@gnzlbg gnzlbg referenced this issue Dec 5, 2018

Merged

Fix issue 57 #58

@aldanor

Author

aldanor commented Dec 5, 2018

Another thing to note re: your example, if you replace 3 with any power of two (2/4/8), everything works. If you use an odd number though, like 3/5/7, it fails, deterministically. So something to do with alignment.

@gnzlbg

Owner

gnzlbg commented Dec 5, 2018

It appears that this was caused by an off-by-one error: 70c87a4#diff-b4aea3e418ccdb71239b96952d9cddb6R751

@gnzlbg

Owner

gnzlbg commented Dec 5, 2018

Could you test if that branch solves the problem for you?

@aldanor

Author

aldanor commented Dec 5, 2018

It appears that this was caused by an off-by-one error

Those are usually the nastiest kind of errors :trollface:

Could you test if that branch solves the problem for you?

I've checked the branch on 64-bit Linux, seems to work fine so far, I think that fixed it.

@gnzlbg

Owner

gnzlbg commented Dec 5, 2018

I thought about alignment too at first, but in the end the problem had nothing to do with that.

When the SliceDeque had all of its elements in the second mirrored region except for one element in the first, a pop_front would remove that element, leaving all elements in the second mirrored region.

For simplicity, the whole implementation assumes that if all elements fit in a single mirrored region, they are always in the first one.

The job of the code with the bug is to make sure that this is the case. It worked in many cases, but it missed all cases in which the elements of the second memory region lay exactly on the boundary between both regions.

This introduced a memory error that allowed safe Rust code to read uninitialized memory once the deque was in the state described above.
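
(Editor's sketch) The boundary case described above can be illustrated with a toy model. This is not the crate's actual code: the `Window` type, its fields, and both method names are hypothetical, and the model only tracks indices rather than real mirrored memory. It assumes one region holds `cap` elements and that virtual indices `[cap, 2 * cap)` alias the same storage as `[0, cap)`.

```rust
/// Toy model of a mirrored deque: `cap` physical slots are visible twice,
/// at virtual indices [0, cap) and [cap, 2 * cap).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Window {
    cap: usize,  // size of one mirrored region, in elements
    head: usize, // virtual index of the first element
    len: usize,  // number of live elements
}

impl Window {
    /// Invariant assumed by the rest of the model: the head always
    /// lives in the first region.
    fn invariant_holds(&self) -> bool {
        self.head < self.cap && self.len <= self.cap
    }

    /// Drops the front element and moves the head back into the first
    /// region whenever it reaches or passes the boundary.
    fn pop_front(&mut self) {
        assert!(self.len > 0, "pop_front on empty window");
        self.head += 1;
        self.len -= 1;
        if self.head >= self.cap {
            self.head -= self.cap;
        }
    }

    /// Hypothetical off-by-one variant: `>` instead of `>=` misses the
    /// case where the head lands exactly on the boundary.
    fn pop_front_off_by_one(&mut self) {
        assert!(self.len > 0, "pop_front on empty window");
        self.head += 1;
        self.len -= 1;
        if self.head > self.cap {
            self.head -= self.cap;
        }
    }
}
```

Starting from `Window { cap: 4, head: 3, len: 2 }`, one element sits at the end of the first region and one at the start of the second. `pop_front` leaves `head == 0`, while the off-by-one variant leaves `head == 4`, which breaks the invariant that later index arithmetic relies on.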

@gnzlbg gnzlbg closed this in #58 Dec 5, 2018

@gnzlbg

Owner

gnzlbg commented Dec 5, 2018

Version 0.1.16 has been released with the fix. Sorry that it took so long to get to the bottom of this; I was travelling for the last couple of days.

@aldanor

Author

aldanor commented Dec 5, 2018

@gnzlbg Thanks a lot for the quick fix! Will have to switch from vecdeque back to slicedeque... again :)

@aldanor aldanor referenced this issue Dec 5, 2018

Open

Fuzz the API #59

@gnzlbg

Owner

gnzlbg commented Dec 5, 2018

Will have to switch from vecdeque back to slicedeque... again :)

I've tried to make that as painless as possible by providing VecDeque API compatibility. If switching isn't as simple as writing use slice_deque::SliceDeque as VecDeque; then please open an issue.

SliceDeque isn't always better than VecDeque, and I think that switching between them should be as painless as possible to allow users to measure and choose what's best for them.
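
(Editor's sketch) One way to keep the switch painless is to name the deque type in a single place. The example below compiles against the standard library only; the alias name `Deque` and the `rolling_sum` function are made up for illustration. It is an assumption, based on the API compatibility mentioned above, that replacing the `use` line with `use slice_deque::SliceDeque as Deque;` would work for the methods used here.

```rust
// Single place to choose the deque implementation.
// Assumption: with the `slice_deque` crate as a dependency, this line could
// instead read `use slice_deque::SliceDeque as Deque;`.
use std::collections::VecDeque as Deque;

/// Keeps at most the `window` most recent samples and returns their sum.
fn rolling_sum(samples: &[i64], window: usize) -> i64 {
    let mut recent: Deque<i64> = Deque::new();
    for &s in samples {
        recent.push_back(s);
        // Evict the oldest sample once the window is exceeded.
        if recent.len() > window {
            recent.pop_front();
        }
    }
    recent.iter().sum()
}
```

With the alias in one spot, benchmarking both implementations only requires changing that line, which makes it easier to measure and choose.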

@Shnatsel

Shnatsel commented Dec 6, 2018

Reads from uninitialized memory can be exploited to obtain secret data, bypass exploit mitigations or even execute arbitrary code.

Please add this issue to the Rust security advisory database so that anyone depending on the crate has a way to check whether they depend on a vulnerable version.

@zimond

zimond commented Dec 15, 2018

I still seem to be hitting this problem, even with 0.1.16. I can't say it's the same bug, but the behavior is quite similar: at some point the memory of the items in the deque gets corrupted. If I turn on the unix_sysv feature, it gives a different panic (not out of memory).

@aldanor

Author

aldanor commented Dec 15, 2018

@zimond Could you try isolating a minimal example?

@aldanor

Author

aldanor commented Dec 15, 2018

Seems like #59 would be helpful to have after all.

@gnzlbg

Owner

gnzlbg commented Dec 15, 2018

@zimond Do the tests pass on your system? Which system are you on?

@zimond

zimond commented Dec 16, 2018

I'm on a 17 MacBook Pro. The tests pass. It's hard to isolate the problem because I ran into it in a quite complex system I'm building. I created roughly 1000 SliceDeques and keep calling .pop_front(), .remove(index) and .insert(index) between the deques; at roughly the same point on each run, the memory gets corrupted. The items I use in the deque are plain structs containing some Vec fields.

@zimond

zimond commented Dec 16, 2018

By the way, I can confirm this is related to SliceDeque: I just switched my code back to std VecDeque and the problem disappeared.

@gnzlbg

Owner

gnzlbg commented Dec 19, 2018

@zimond I haven't been able to reproduce this on macOS yet. It would be extremely helpful if you could come up with a minimal working example (or a not-so-minimal one, e.g. point me to a GitHub repo where this fails).

If your code is not available online, maybe you could provide a version where everything that doesn't have much to do with the deques is stripped out, at least as long as it's reasonable to do so.

In the meantime, I'm going to start fuzzing the library and see if that finds it.
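
(Editor's sketch) A differential check is one simple form such fuzzing can take: apply the same random operations to the deque and to a trivially correct model, then compare after every step. The version below is not the setup used for this crate. It runs against `std::collections::VecDeque` so that it needs no external crates, and the `Lcg` generator and the `differential_check` function are hypothetical names. It is an assumption that swapping the deque type for `slice_deque::SliceDeque` would exercise that crate the same way.

```rust
use std::collections::VecDeque;

/// Tiny deterministic generator (an LCG) so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Applies `steps` pseudo-random operations to a deque and to a plain `Vec`
/// model, comparing the full contents after every step.
/// Returns the number of elements left at the end.
fn differential_check(seed: u64, steps: usize) -> usize {
    let mut rng = Lcg(seed);
    // Deque under test (assumption: `SliceDeque` could be swapped in here).
    let mut deque: VecDeque<[i16; 3]> = VecDeque::new();
    let mut model: Vec<[i16; 3]> = Vec::new();

    for step in 0..steps {
        let value = [step as i16; 3];
        match rng.next_u64() % 4 {
            0 => {
                deque.push_front(value);
                model.insert(0, value);
            }
            1 => {
                deque.push_back(value);
                model.push(value);
            }
            2 => {
                let expected = if model.is_empty() {
                    None
                } else {
                    Some(model.remove(0))
                };
                assert_eq!(deque.pop_front(), expected, "pop_front diverged at step {}", step);
            }
            _ => {
                assert_eq!(deque.pop_back(), model.pop(), "pop_back diverged at step {}", step);
            }
        }
        assert!(deque.iter().eq(model.iter()), "contents diverged at step {}", step);
    }
    model.len()
}
```

Because the generator is seeded, a failing seed and step count can be posted as a reproducer even when the original program cannot be shared.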

@zimond

zimond commented Dec 19, 2018

Ok I'll try... I will update in this thread once I get something

@gnzlbg

Owner

gnzlbg commented Dec 19, 2018

Thanks! In the meantime I've set up fuzzing (see #59 (comment)), but it hasn't found any issues yet :/

@zimond

zimond commented Dec 19, 2018

Have you checked calling .as_mut_slice() and then mutating the items? Could that possibly corrupt memory?

@gnzlbg

Owner

gnzlbg commented Dec 19, 2018

So as_mut_slice is pretty trivial for SliceDeque, it cannot really corrupt anything.

Until now, the majority of bugs have been due to failure to update the head and tail of the deque properly. I don't see anything wrong with that code, and there are a lot of debug_assert!s enabled in debug mode to check that these work properly, but that does not mean that these are correct.

I didn't ask about this before, but I suppose that you are able to reproduce the memory corruption in debug builds with debug assertions turned on, and you do not get a panic coming up from any of the asserts, right?

@zimond

zimond commented Dec 19, 2018

Just tried a debug build and you are right, the assertions did not catch this. It's really hard to reproduce. I checked my code once again: I created a custom Iterator based on the slice returned by as_mut_slice (internally maintaining a usize index), and the project always crashes while iterating with it.

@zimond

zimond commented Dec 21, 2018

I managed to narrow the bug down to pop_front(). In LLDB, I caught a crash where a SliceDeque has two items before the pop_front() call and one item after it. But:

  1. The item's id does not match either the remaining item or the removed item.
  2. The vec field in the item is corrupted.

So I think maybe in certain situations, pop_front will move the pointer to some invalid memory?

image

(note: after pop, head = 0, tail = 1)

@gnzlbg

Owner

gnzlbg commented Dec 21, 2018

So I think maybe in certain situations, pop_front will move the pointer to some invalid memory?

This is basically what was happening in this bug. That's very suspicious.

@zimond Is your project on GitHub? Or could you send me a reduced test by email if it isn't?

@zimond

zimond commented Dec 21, 2018

Not on GitHub. It's really hard to create a reduced test for this, so I just posted several updates here. I will try again this weekend. Sorry for the long wait.

@gnzlbg

Owner

gnzlbg commented Dec 21, 2018

Don't be sorry, I am! I want to fix this, but without a program to reproduce it I really can't :/

@gnzlbg

Owner

gnzlbg commented Jan 15, 2019

I haven't forgotten about this, a reproducer would still be appreciated.

@gnzlbg gnzlbg reopened this Jan 15, 2019

@zimond

zimond commented Jan 16, 2019

@gnzlbg Hey I tried several times but it's still hard to reproduce. But today I reviewed the code and decided to replace .pop_front() with .remove(0), and it stopped crashing. Hope this helps, anyway.

@Rafferty97

Rafferty97 commented Jan 22, 2019

I'm also encountering a memory corruption issue when using SliceDeque. It always happens after a call to .push_front(), after which the program immediately crashes due to reading uninitialised memory. Unfortunately, my code is also quite complex, as in @zimond's case, but if I can produce a simplified test case I will post it here.

@gnzlbg

Owner

gnzlbg commented Jan 22, 2019

I've implemented an optimization on master that might break the ".remove(0) workaround", so we really do need to get to the bottom of this. At this point, any kind of working example, no matter how complex, would help.
