
Segfault with simple stress test of global allocator #36

Closed
kevin-vigor opened this issue Mar 6, 2019 · 5 comments

@kevin-vigor (Contributor) commented Mar 6, 2019

The following simple stress test fails rapidly with a segfault:

#include <vector>
#include <cstdlib>

static constexpr size_t kMinAllocSz = 800000;
static constexpr size_t kMaxAllocSz = 900000;
static constexpr unsigned kMaxLiveAlloc = 128;  // keep no more than 128 * kMaxAllocSz memory allocated.

int main(void) {
    std::vector<void *> alloc(kMaxLiveAlloc, nullptr);

    while (1) {
        const size_t ix = rand() % kMaxLiveAlloc;
        const size_t sz = (rand() % (kMaxAllocSz - kMinAllocSz)) + kMinAllocSz;

        free(alloc[ix]);
        alloc[ix] = malloc(sz);
    }
    return 0;
}

Reproduce with:

g++ --std=c++14 -g -Wall -Werror msimple.cpp -o msimple -ldl
LD_PRELOAD=/home/kvigor/mesh/libmesh.so ./msimple

I have attempted to debug this a bit. What I have found is that in GlobalHeap::pageAlignedAlloc(), the pointer returned from mh->mallocAt() equals arenaBegin(), and miniheapForLocked(ptr) on that pointer does not point back to the original miniheap. When I later free this allocation, it fails because the same bogus miniheap pointer is found in ThreadLocalHeap::free(). Unfortunately I have not yet been able to diagnose further.

This error does not occur if I allocate fixed size objects; having some randomness in the allocation size seems to be required to trigger the issue.

(Minor sidebar: what is the "locked" implied in miniheapForLocked()? free() calls it with no obvious locks held. A comment and/or renaming of the function might help.)

@bpowers (Member) commented Mar 7, 2019

@kevin-vigor thanks for the report + reproducer! I can trigger this locally, and am looking now.

@bpowers

This comment has been minimized.

Copy link
Member

commented Mar 7, 2019

Also - the Locked prefix on miniheapFor was vestigial - no lock is needed to look up a miniheap with that function anymore.

bpowers added a commit that referenced this issue Mar 7, 2019

@bpowers (Member) commented Mar 7, 2019

Going to have to continue this tomorrow, but again thanks for such a helpful test case + debugging notes.

I've built mesh locally with ./configure --debug --no-optimize, which enables a lot of default-off debug checks. Additionally, I've checked your test case into the repo, and you can build and run it under GDB with mesh like:

./configure --debug --no-optimize
make -j5
make src/test/global-large-stress
gdb src/test/global-large-stress

There are at least two things going on here. First, we aren't properly coalescing/reusing dirty Spans in MeshableArena in this case (exclusively large object allocations), which leads to continuously allocating new spans and enlarging the arena; a toy sketch of that growth pattern follows below.
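To make the first point concrete, here is a toy model of the growth pattern only, not mesh's actual span logic; the Span struct, page counts, and the reuse flag are all assumptions for the demo:

```C++
// Toy model only -- shows why ignoring freed spans keeps enlarging the arena.
#include <cstdio>
#include <vector>
#include <cstddef>

struct Span { size_t offsetPages; size_t lengthPages; };

int main() {
    const size_t kPagesPerAlloc = 220;   // ~900 KB at 4 KiB pages
    size_t highWaterMark = 0;            // arena size, in pages
    std::vector<Span> freeSpans;
    const bool reuseFreeSpans = false;   // bug analogue: freed spans are never consulted

    for (int i = 0; i < 100000; i++) {
        Span s;
        if (reuseFreeSpans && !freeSpans.empty()) {
            s = freeSpans.back();        // reuse path: arena stays small
            freeSpans.pop_back();
        } else {
            s = {highWaterMark, kPagesPerAlloc};  // always carve a fresh span
            highWaterMark += kPagesPerAlloc;
        }
        freeSpans.push_back(s);          // freed again right away, as in the stress test
    }
    // With reuse disabled, allocation offsets grow without bound and eventually
    // pass 4 GiB from the start of the arena, which is where the second bug bites.
    printf("arena high-water mark: %zu pages\n", highWaterMark);
    return 0;
}
```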

Second, this error happens right around when the new allocation is 4 GiB from the start of the arena -- it sure feels like we have some int32_t math that has snuck in somewhere and we are overflowing int.

bpowers added a commit that referenced this issue Mar 9, 2019

mini_heap: fix 32-bit truncation in ptr calculations
Updates #36

getSpanStart has code like:

```C++
    return beginval + _span.offset * kPageSize;
```

Because kPageSize was an int, and _span.offset is an int, that part of
the expression was being truncated on allocations more than 4 GB from
the start of the arena.
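A minimal standalone sketch of the same truncation (the page size, offset, and base address are made-up values, and unsigned 32-bit types are used so the wraparound is well-defined for the demo; the actual change in mesh is the one described in the commit message above):

```C++
// Standalone illustration of 32-bit truncation in a pointer calculation.
#include <cstdio>
#include <cstdint>

int main() {
    const uint32_t kPageSize = 4096;        // page size held in a 32-bit type
    const uint32_t spanOffset = 1u << 20;   // 1 Mi pages == 4 GiB into the arena
    const uintptr_t arenaBegin = 0x7f0000000000;

    // The 32-bit product wraps to 0, so the computed span start collapses
    // back to arenaBegin -- matching the symptom reported above.
    uintptr_t truncated = arenaBegin + spanOffset * kPageSize;

    // Widening one operand before the multiply keeps the full 64-bit result.
    uintptr_t correct = arenaBegin + uintptr_t(spanOffset) * kPageSize;

    printf("truncated: %#lx\ncorrect:   %#lx\n",
           (unsigned long)truncated, (unsigned long)correct);
    return 0;
}
```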

@bpowers bpowers closed this in 86c7db2 Mar 9, 2019

@bpowers (Member) commented Mar 9, 2019

@kevin-vigor both issues are fixed, and I can now run the stress test for minutes on my machine. Please shout if you turn up anything else :)

@kevin-vigor (Contributor, Author) commented Mar 9, 2019

Thanks for the quick response! I will be banging on the allocator more next week and will let you know.
