Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

header file <pqxx/pqxx> seems to cause: free() double free issues #681

Closed
dnewtonrichards opened this issue May 3, 2023 · 29 comments · Fixed by #683
Closed

header file <pqxx/pqxx> seems to cause: free() double free issues #681

dnewtonrichards opened this issue May 3, 2023 · 29 comments · Fixed by #683

Comments

@dnewtonrichards
Copy link

dnewtonrichards commented May 3, 2023

System: Debian 10 (Buster) X86-64 bit system
libpqxx built with gcc/g++ version 12.2
Postgres version 15.2
The following code produces this message: 'free(): double free detected in tcache 2'

#include <iostream>
#include <pqxx/pqxx>  // without this header the problem goes away
int main(int argc char *argv[])
{
    return 0;
}
@jtv
Copy link
Owner

jtv commented May 3, 2023

Thank you for reporting this. (I edited your message slightly to stop Github from trying to interpret the code as Markdown.)

Did you also get a nonzero return code, or did the program seem to complete successfully?

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 3, 2023 via email

@jtv
Copy link
Owner

jtv commented May 3, 2023

Ah, I'm not actually sure that I looked closely enough at the output. I was expecting to see an error as well. I'll try again.

My best guess at an explanation for now is that it's something to do with destruction and cleanup of a global constant, after main() returns. Which can be hard to debug because "the code" isn't really running when that happens.

In that murky execution environment it's possible to have subtle bugs, whether in libpqxx or in the infrastructure such as the libc implementation, or things that debugging tools can't quite oversee. I've had false positives of this kind even from very good tools (such as valgrind). So it's possible the message will just go away with a package upgrade — though of course we can't count on that.

I'm kind of clutching at straws here, but it might be helpful to know...

  1. Does this indeed happen after main() completes?
  2. Can we narrow down which libpqxx headers trigger the message?

If I can reproduce the problem on my end now, then it's probably easiest if I check these.

@jtv
Copy link
Owner

jtv commented May 3, 2023

Oh, another question! Did you build using CMake, or using the configure script?

@jtv
Copy link
Owner

jtv commented May 3, 2023

The questions don't stop. :-)

  • In Buster I see only much older versions of libpq and gcc. Even Bullseye doesn't have those versions. Did you mean Sid?
  • What compile options did you set? Did you compile as C++17, or C++20, or something else?
  • Did you link statically or dynamically?

@jtv
Copy link
Owner

jtv commented May 3, 2023

Unfortunately I couldn't bring up a docker container for Debian's Stable or Unstable with the right packages — a package failed to install. So can't try those right now.

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 3, 2023 via email

@jtv
Copy link
Owner

jtv commented May 3, 2023

Thanks. That should help me reproduce the problem once I get a working Debian image with the right versions.

There's one more thing that might be worth checking: whether this message also occurs when you include libpq's header libpq-fe.h, and link to libpq, and make no reference to libpqxx.

@jtv
Copy link
Owner

jtv commented May 3, 2023

About the 134... Are you saying that the program's return code was 134? From what you said earlier I got the impression that the return code was 0.

I think on Linux systems the higher return codes are normally used when the program terminates because of a signal.

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 3, 2023 via email

@jtv
Copy link
Owner

jtv commented May 3, 2023

Ah, this is a terminology problem.

Your program's return code then is 134. It's completely usual for main() to return 0, but that's not always what ends up determining the return code.

What's annoying about this is that when I tried to reproduce the problem before, I would definitely have noticed if the return code was non-zero. So we're back to me being unable to reproduce the problem.

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 3, 2023 via email

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 4, 2023 via email

@tt4g
Copy link
Contributor

tt4g commented May 4, 2023

This appears to be the same problem.
https://stackoverflow.com/questions/31200522/double-free-or-corruption-when-using-old-glibc-and-libstdc-library-versions

Would building pqxx again with libstdc++ work around the problem?

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 5, 2023 via email

@tt4g
Copy link
Contributor

tt4g commented May 5, 2023

It is not important to build with libstdc++.
I wanted to tell you that there is a known problem with core dumps caused by different versions of libstdc++ that the library and the application are linked to.
Please read the Stack Overflow question.

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 5, 2023 via email

@tt4g
Copy link
Contributor

tt4g commented May 5, 2023

I suspect that it is undefined behavior because of the different libstdc++ versions.
I wanted you to try to see if rebuilding libpqxx with the current libstdc++ would work around the problem.

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 5, 2023 via email

@jtv
Copy link
Owner

jtv commented May 5, 2023

So it's possible that there's a bug there. But complacency is death, so I'm still eager to figure this one out.

@dnewtonrichards another debugging question... what happens if you include <pqxx/pqxx> but don't link to either libpqxx or libpq?

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 5, 2023 via email

@jtv
Copy link
Owner

jtv commented May 5, 2023

AAAAAaaaaaighh!

I have reproduced the problem.

It only happens when linking dynamically.

On non-Windows systems I don't recommend using libpqxx as a shared library in any case; C++ ABIs are just too detailed to make backward binary compatibility very useful.

So as a workaround, I would suggest: get rid of any .so versions of libpqxx, do a make clean to get rid of any hidden shared library build inside the build directory, then configure libpqxx with --disable-shared, re-build, and re-install.

(I do have regular automated tests set up on Debian unstable, using libpqxx as a shared library — but at the moment that build uses clang, not gcc! And the problem happens with gcc only. ☹)

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 5, 2023 via email

@jtv
Copy link
Owner

jtv commented May 5, 2023

When I do the broken build (gcc 12 with --enable-shared) but also configure --enable-audit for extra run-time checking, it finds a memory leak:

Direct leak of 31 byte(s) in 1 object(s) allocated from:
    #0 0x7f157c72d4c8 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:95
    #1 0x56090e94b717 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) (/libpqxx/test/.libs/runner+0xb77717)
    #2 0x7f157c134df6 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::allocator<char> >(char const*, std::allocator<char> const&) (/libpqxx/src/.libs/libpqxx-7.8.so+0x2c4df6)
    #3 0x7f157c1315d3 in __static_initialization_and_destruction_0(int, int) (/libpqxx/src/.libs/libpqxx-7.8.so+0x2c15d3)
    #4 0x7f157c13188f in _GLOBAL__sub_I_array.cxx (/libpqxx/src/.libs/libpqxx-7.8.so+0x2c188f)
    #5 0x7f157cd20abd in call_init elf/dl-init.c:70
    #6 0x7f157cd20abd in call_init elf/dl-init.c:26

SUMMARY: AddressSanitizer: 31 byte(s) leaked in 1 allocation(s).

Of course a leak is kind of the opposite of a double-free. But again this problem happens with the shared library, not with the static one, which makes me think it's related.

I think the search is on for suspicious handling of a std::string with static lifetime. I'll try bisecting headers and code.

@jtv
Copy link
Owner

jtv commented May 5, 2023

Found it!

Turns out <pqxx/internal/encodings.hxx> is a good test case: if I #include that (along with some harmless boilerplate that it requires), the test crashes — but the header files that it in turn includes are all fine.

So there's something in internal/encodings.hxx that triggers the problem. And the only line I need to remove to fix it is...

PQXX_DECLARE_ENUM_CONVERSION(pqxx::internal::encoding_group);

The offending part of PQXX_DECLARE_ENUM_CONVERSION() (a macro defined in strconv.hxx) is:

    template<> inline std::string const type_name<ENUM> \
    { \
    #  ENUM \
    } \

(The # ENUM looks a bit weird: that's just #ENUM, to stringize the name ENUM using the preprocessor, but my formatter seems to treat that like a preprocessor directive.)

If I remove the definition's inline keyword, the problem goes away. Even when including all of <pqxx/pqxx>. What a fuss over such a little thing!

jtv added a commit that referenced this issue May 5, 2023
Fixes #681.

WHen building libpqxx as a shared library on Debian with gcc 12, any
program that included libpqxx headers (with a few exceptions) would
crash after `main()` returned.

There was an error message:

_free(): double free detected in tcache 2_

The problem turned out to be triggered by the inlining of any
specialisation of `pqxx::type_name`, as we did in the macro
`PQXX_DECLARE_ENUM_CONVERSION`.  There's a header that uses that macro,
thus triggering the problem.
@jtv
Copy link
Owner

jtv commented May 5, 2023

SUCCESS!!! You did it - many thanks for all of your help!!! Perhaps I should install clang? Anyway, I will go with a static build... --Dave

On Fri, May 5, 2023 at 11:14 AM Jeroen Vermeulen @.> wrote: AAAAAaaaaaighh! I have reproduced the problem. It only happens when linking dynamically. On non-Windows systems I don't recommend using libpqxx as a shared library in any case; C++ ABIs are just too detailed to make backward binary compatibility very useful. So as a workaround, I would suggest: get rid of any .so versions of libpqxx, do a make clean to get rid of any hidden shared library build inside the build directory, then configure libpqxx with --disable-shared, re-build, and re-install. (I do have regular automated tests set up on Debian unstable, using libpqxx as a shared library — but at the moment that build uses clang, not gcc! And the problem happens with gcc only. ☹) — Reply to this email directly, view it on GitHub <#681 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/A7SSVMRBJSKGPV7YEW6I4P3XEUKN5ANCNFSM6AAAAAAXUKQLM4 . You are receiving this because you were mentioned.Message ID: @.>
-- Regards, Dave

No need to change compilers for this... Don't get me wrong, clang is an excellent compiler. But one of those tricks to producing top-quality code is to run it through as many compilers as you can, and pay attention to the errors or warnings they produce.

So... do both. :-)

@jtv
Copy link
Owner

jtv commented May 5, 2023

SUCCESS!!! You did it - many thanks for all of your help!!! Perhaps I should install clang? Anyway, I will go with a static build... --Dave

On Fri, May 5, 2023 at 11:14 AM Jeroen Vermeulen @.> wrote: AAAAAaaaaaighh! I have reproduced the problem. It only happens when linking dynamically. On non-Windows systems I don't recommend using libpqxx as a shared library in any case; C++ ABIs are just too detailed to make backward binary compatibility very useful. So as a workaround, I would suggest: get rid of any .so versions of libpqxx, do a make clean to get rid of any hidden shared library build inside the build directory, then configure libpqxx with --disable-shared, re-build, and re-install. (I do have regular automated tests set up on Debian unstable, using libpqxx as a shared library — but at the moment that build uses clang, not gcc! And the problem happens with gcc only. ☹) — Reply to this email directly, view it on GitHub <#681 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/A7SSVMRBJSKGPV7YEW6I4P3XEUKN5ANCNFSM6AAAAAAXUKQLM4 . You are receiving this because you were mentioned.Message ID: _@**.**_>
-- Regards, Dave

No need to change compilers for this... Don't get me wrong, clang is an excellent compiler. But one of those tricks to producing top-quality code is to run it through as many compilers as you can, and pay attention to the errors or warnings they produce.

So... do both. :-)

Heh. Case in point: clang doesn't like the change I made! Making the variable non-inline seems to mean that every translation unit that contains the definition gets its own definition, and for some reason they clash. Something like that. It's a bit awkward to debug on my phone.

I think I'll have to see about making it a string_view instead of a string or something.

jtv added a commit that referenced this issue May 6, 2023
Fixes #681.

When building libpqxx as a shared library with gcc 12, any program that included libpqxx headers (with a few exceptions) would crash after `main()` returned.

There was an error message:

_free(): double free detected in tcache 2_

The problem turned out to be triggered by the inlining of any specialisation of `pqxx::type_name`, as we did in the macro `PQXX_DECLARE_ENUM_CONVERSION`.  There's a header that uses that macro, thus triggering the problem.

I had to make enum `type_name` a `std::string_view`.  This pains me.  It changes the type of the variable.  But gcc 12 (in a
shared-library build) won't let me get away with an `inline std::string` and clang won't let me get away with a none-`inline` `std::string`.

A `std::string_view` is easier — trivial destructor, no allocations.
@jtv
Copy link
Owner

jtv commented May 6, 2023

A very timely fix — Ubuntu 23.04 also ships with gcc 12 and has the same problem.

Thanks for the report @dnewtonrichards, and your patience in helping me resolve it.

@dnewtonrichards
Copy link
Author

dnewtonrichards commented May 6, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants