Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix duplicate frees in usingJEMalloc() / usingTCMalloc() checks #1949

Closed
wants to merge 1 commit into from

Conversation

mhx
Copy link
Contributor

@mhx mhx commented Mar 7, 2023

I'm using folly & jemalloc in one of my projects. After recently updating folly, I was seeing random segfaults in the test suite.

With a debug build of jemalloc, I got:

<jemalloc>: size mismatch detected (true size 96 vs input size 8), likely caused by application sized deallocation bugs

Further debugging and bisecting pointed at commit 73f67d4.

The problem is that the new code introduced by that commit does not prevent the slow path from being run multiple times, as was the case with the original code. For some reason, the checks for jemalloc/tcmalloc use a static variable for capturing the result of the malloc call. If the slow path is executed multiple times, this leads to multiple free calls on the same address. I don't know why the variable was made static, but it doesn't seem to be necessary and changing the variable to automatic storage duration fixes the issue.

(The check for tcmalloc still looks a little questionable as it only looks at the global number of allocated bytes, which also depends on allocations in other threads.)

I'm using folly & jemalloc in one of my projects. After recently
updating folly, I was seeing random segfaults in the test suite.

With a debug build of jemalloc, I got:

```
<jemalloc>: size mismatch detected (true size 96 vs input size 8), likely caused by application sized deallocation bugs
```

Further debugging and bisecting pointed at commit 73f67d4.

The problem is that the new code introduced by that commit does
not prevent the slow path from being run multiple times, as was
the case with the original code. For some reason, the checks for
jemalloc/tcmalloc use a `static` variable for capturing the result
of the `malloc` call. If the slow path is executed multiple times,
this leads to multiple `free` calls on the same address. I don't
know why the variable was made `static`, but it doesn't seem to be
necessary and changing the variable to automatic storage duration
fixes the issue.

(The check for tcmalloc still looks a little questionable as it
only looks at the global number of allocated bytes, which also
depends on allocations in other threads.)
@facebook-github-bot
Copy link
Contributor

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mhx added a commit to mhx/dwarfs that referenced this pull request Mar 8, 2023
@mhx
Copy link
Contributor Author

mhx commented Mar 11, 2023

FYI, just went through the history of this piece of code out of curiosity and the static was originally introduced (a15fcb1) in order to avoid optimizing out the malloc call when the pointer was not freed. Later (e697d57), the free call was introduced, but the static was kept.

@mhx
Copy link
Contributor Author

mhx commented Apr 26, 2023

Ping?

@ot
Copy link
Contributor

ot commented May 4, 2023

Hey @mhx, long time no see :)

the new code introduced by that commit does not prevent the slow path from being run multiple times

This is unexpected, it definitely should be run only once: note how FastStaticBool::getSlow uses a static initializer to ensure that Initializer is only called once. In fact, I cannot reproduce what you're observing in our builds. Are you observing this in folly tests? Can you give an example? I suspect there is some issue with templates not being merged correctly across compilation units.

About why static is necessary, the compiler is technically allowed to elide malloc/free pairs, see for example https://godbolt.org/z/Kd18edzav . volatile may help here, but I'm not sure it's enough.

The check for tcmalloc still looks a little questionable as it only looks at the global number of allocated bytes, which also depends on allocations in other threads.

Agree on this. There is an implicit assumption that the first feature testing will happen before any thread is started, which is almost certainly the case in practice, but it would be better if we didn't need that.

@mhx
Copy link
Contributor Author

mhx commented May 6, 2023

Hey @mhx, long time no see :)

Hi @ot, hope you're doing fine!

the new code introduced by that commit does not prevent the slow path from being run multiple times

This is unexpected, it definitely should be run only once: note how FastStaticBool::getSlow uses a static initializer to ensure that Initializer is only called once.

You're absolutely right. I think I missed the static in getSlow.

I know what's going on, and I believe I missed this when I originally sent the PR:

folly gets built with -std=gnu++17 (and -std=gnu++1z, because it contradicts itself here and here). However, the rest of my code (including thrift) is built with -std=c++20. So, depending on the flags the compilation unit was built with, different implementations of FastStaticBool are being used, each of which has one shot at calling the initializer. This matches with what I'm seeing in the debugger:

  • I'm setting a breakpoint on the line with the free(ptr); call in usingJEMalloc().
  • The breakpoint gets hit twice, from different threads: once via small_vector::push_back() (header-only code, so C++20), and a second time via folly::IOBuf::create called from Thrift/Frozen but via io/IOBufQueue.cpp, so C++17.

Here's a simple repro:

#include <iostream>
#include <vector>

#include <folly/small_vector.h>
#include <folly/io/IOBuf.h>

int main()
{
  folly::small_vector<int, 1> foo;
  foo.resize(100);

  auto bar = folly::IOBuf::create(42);

  std::cout << foo.size() << "\n";
  std::cout << bar->goodSize(10) << "\n";

  std::vector<void *> ptr;

  for (int i = 0; i < 100000; ++i)
  {
    ptr.push_back(malloc(i % 2000));
  }

  for (auto p : ptr)
  {
    free(p);
  }

  return 0;
}

Building folly with cmake/clang like this:

# mkdir build-clang
# cd build-clang
# CC=clang CXX=clang++ cmake .. -GNinja
# ninja

Then compiling the above code in test.cpp, once using C++17 and once using C++20:

# clang++ -std=c++17 -O2 -g -I .. -I . -o test-c++17 -L. test.cpp -lfolly -lglog -lfmt -ljemalloc
# clang++ -std=c++20 -O2 -g -I .. -I . -o test-c++20 -L. test.cpp -lfolly -lglog -lfmt -ljemalloc

The C++17 one runs just fine, the C++20 one runs into an assertion during the free() loop:

mhx@air build-clang [main] $ ./test-c++17 
100
16
mhx@air build-clang [main] $ ./test-c++20 
100
10
<jemalloc>: /var/tmp/portage/dev-libs/jemalloc-5.3.0-r1/work/jemalloc-5.3.0/include/jemalloc/internal/arena_inlines_b.h:513: Failed assertion: "bitmap_get(slab_data->bitmap, &bin_info->bitmap_info, regind)"
Aborted

They also produce different results for the goodSize(10) call because the second time the initializer is called, the counter doesn't change, making the code think it's not running under jemalloc.

If you debug into test-c++20, you'll find that the static pointer from the initializer is freed twice, once via each FastStaticBool implementation.

So I can definitely fix this by overriding folly's build flags to be C++20 as well.

However, I still think this is an issue as it requires projects using folly to either use the same compile flags, or override folly's.

I believe my fix would still be a viable solution, as it would allow the initializer to safely be called more than once.

About why static is necessary, the compiler is technically allowed to elide malloc/free pairs, see for example https://godbolt.org/z/Kd18edzav . volatile may help here, but I'm not sure it's enough.

I'm not sure either, but I genuinely wonder what could justify elision of the malloc() call when the target variable is volatile?

@ot
Copy link
Contributor

ot commented May 8, 2023

Even if we work around this specific issue (and it would be a very fragile workaround), compiling with different flags so that different feature flags are applied to the same header is a recipe for ODR violations, and likely to cause more pernicious issues. For example, in general folly makes no guarantees about ABI stability, so you could even have structs with different sizes under different versions of C++.

This seems like something that would be best addressed at the build system level. I'm not familiar with our CMake specs, but there should be some way to specify the C++ version from the parent project, and if there isn't we should support that. I'd be very happy to accept a PR in that direction 😃

@mhx
Copy link
Contributor Author

mhx commented May 26, 2023

Definitely agree that the standard should be settable and that it's a more robust solution to the problem. I've made a new PR #2005 and will close this one.

@mhx mhx closed this May 26, 2023
facebook-github-bot pushed a commit that referenced this pull request Jul 13, 2023
)

Summary:
This follows up on a discussion in #1949 and the comment by ot that the C++ standard for folly should be (easily) settable by a parent project.

There are two commits in this PR:

- The first commit removes the redundant `CXX_STD`, which causes `-std=` to be passed to the compiler twice.
- The second commit introduces a check before setting `CMAKE_CXX_STANDARD`, so a value set by a parent project won't be overridden.

Pull Request resolved: #2005

Reviewed By: Gownta

Differential Revision: D47343926

Pulled By: Orvid

fbshipit-source-id: a712d117a6d6e2d125ee71f5f8d47ee5ceb3106b
facebook-github-bot pushed a commit to facebook/hhvm that referenced this pull request Jul 31, 2023
Summary:
This follows up on a discussion in facebook/folly#1949 and the comment by ot that the C++ standard for folly should be (easily) settable by a parent project.

There are two commits in this PR:

- The first commit removes the redundant `CXX_STD`, which causes `-std=` to be passed to the compiler twice.
- The second commit introduces a check before setting `CMAKE_CXX_STANDARD`, so a value set by a parent project won't be overridden.

X-link: facebook/folly#2005

Reviewed By: Gownta

Differential Revision: D47343926

Pulled By: Orvid

fbshipit-source-id: a712d117a6d6e2d125ee71f5f8d47ee5ceb3106b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants