
Double free or corruption error when two instances of an OOT block that uses FFTW is called in GRC #6528

Closed
graceyeung68 opened this issue Feb 15, 2023 · 7 comments


@graceyeung68

What happened?

I have an out-of-tree block in C++ that runs fine in GRC, but when I have two instances of it running simultaneously in a flowgraph, I get a "double free or corruption" error. I am not calling free() anywhere in my code. I use std::vectors, which should handle the dynamic memory allocation automatically. The only allocation I can see in my code is in the make function, which calls new and returns an sptr, as shown below. What might be the cause of this error? Thank you.

  probeExtract::sptr
  probeExtract::make(const char* filtName, const char* maskName, int mlsrM, int numChan, int sampPerBit, int Tblock, int edgeBufferSize, int structureType)
  {
      return gnuradio::get_initial_sptr
      (new probeExtract_impl(filtName, maskName, mlsrM, numChan, sampPerBit, Tblock, edgeBufferSize, structureType));
  }

System Information

OS: Ubuntu 20.04
GR Installation Method: apt?

GNU Radio Version

3.8 (maint-3.8)

Specific Version

3.8.1.0

Steps to Reproduce the Problem

Two instances of OOT block running simultaneously.

Relevant log output

double free or corruption (!prev)
@willcode
Member

@graceyeung68 This is probably something that has to be fixed in the OOT block. There is nothing in the GNU Radio core that limits how many of a certain type of block can exist. There are many flowgraphs in use that have more than one of a certain block. You could post a link to your code on chat.gnuradio.org or the mailing list to see whether someone can help debug.

We can leave this issue open for now in case the error is shown to be in GR. Please close otherwise.

@graceyeung68
Author

Thank you. So is the object created by "new" above, which the make function returns as an sptr, automatically taken care of internally by Python, so that I won't need to worry about freeing or deleting any pointers that might be causing the error?

Thank you.

@willcode
Member

Smart pointers automatically delete the managed object when the reference count reaches 0. No explicit free is required. See
https://cplusplus.com/reference/memory/shared_ptr/

Closing, since this is not a GNU Radio bug.

@marcusmueller
Member

@graceyeung68 yes, that should all happen correctly under the hood, although things are a bit more complicated than that. Python handles Python objects, in this case wrappers around these shared pointers, which have been given wrappers for all the methods and properties of the underlying class, probeExtract. I don't know what your Python binding looks like, but if it's like the rest of the code you typically find in python/yourmodulename/bindings/*.cc, then this C++-sptr-to-Python wrapping mechanism just works:

The wrapper holds an instance of the shared pointer: the moment the Python wrapper is instantiated, the make function is called, and the returned shared pointer is stored in the PyObject.

Functionally, a shared pointer is really just two things: a pointer to a control block, and the pointer to the actual object (as allocated by the new in your code snippet). Every time an sptr is copied, the control block is modified: a reference counter is increased (and ownership of the pointed-to object is shared), and every time such a copy is destructed, the counter is decreased. When a destruction reduces the reference count to 0, the owned object is destructed and its memory freed.

So, the pointer chain looks a bit like this:

+--------------------+
|control block       | <--- shared_ptr<yourblock>  <----- Pybind-generated pyobject
|         ⋮          |    (potentially many of these)
| - reference count  |                 ⋮
+--------------------+

The shared pointer holds a pointer to the control block, which holds the reference count. Copying a shared pointer calls its copy constructor (or copy assignment operator), which copies the control block pointer and increases the reference count. That is the mechanism behind "shared" ownership.
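The reference-counting behaviour described above can be observed in a few lines of standalone C++. This is a minimal sketch using std::shared_ptr; GNU Radio 3.8 itself uses boost::shared_ptr for its sptr types, whose counting works the same way for this purpose. `TracedBlock`, `RefcountTrace` and `trace_refcount` are hypothetical names for illustration, not GNU Radio API.

```cpp
#include <memory>

// Hypothetical stand-in for an OOT block such as probeExtract_impl.
// It counts how often its destructor runs so the ownership rules are visible.
struct TracedBlock {
    static int destroyed;
    ~TracedBlock() { ++destroyed; }
};
int TracedBlock::destroyed = 0;

// Snapshot of use_count() and destructor calls at each step.
struct RefcountTrace {
    long after_make;
    long after_copy;
    long after_scope;
    int destroyed_before_reset;
    int destroyed_after_reset;
};

RefcountTrace trace_refcount() {
    RefcountTrace t{};
    TracedBlock::destroyed = 0;

    std::shared_ptr<TracedBlock> a = std::make_shared<TracedBlock>();
    t.after_make = a.use_count();             // one owner
    {
        std::shared_ptr<TracedBlock> b = a;   // copy constructor: same control block, count + 1
        t.after_copy = a.use_count();
    }                                         // b destroyed: count - 1, object survives
    t.after_scope = a.use_count();
    t.destroyed_before_reset = TracedBlock::destroyed;

    a.reset();                                // last owner gone: count hits 0, object deleted once
    t.destroyed_after_reset = TracedBlock::destroyed;
    return t;
}
```

Calling `trace_refcount()` gives counts of 1, 2 and 1, and the destructor runs exactly once, only after the last owner is reset.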

The PyObject actually holds a shared_ptr. That's not without challenges: Python's object lifetime doesn't work like C++'s. In C++ you know exactly when a local object stops existing; in Python, objects that are no longer accessible (for example, because their only name now refers to something else) are garbage-collected at Python's own leisure.

Now, when you tell your Python program to copy that PyObject, the wrapper should also make a copy of the held shared_ptr, which increases the reference count. That way, if the original PyObject is garbage-collected, the reference count can't drop to zero.

Noooow, if something makes a copy of such a PyObject not by calling the shared pointer's copy constructor but by simply copying the shared pointer's memory, you run into a problem: two owners each think they are the one reducing the reference counter below 1, so each calls the object's destructor and delete (i.e. free) on the allocated memory.

That should. Not. happen.
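A related mistake that is easy to reproduce in plain C++ is giving one raw pointer two independent owners. Each shared_ptr then gets its own control block, each reports a reference count of 1, and each would delete the object on destruction. In this sketch the second owner gets a no-op deleter purely so the demo does not actually double-free; `Widget`, `OwnershipCheck` and `check_ownership` are hypothetical names.

```cpp
#include <memory>

struct Widget {};  // hypothetical stand-in for the OOT block object

struct OwnershipCheck {
    long proper_copy_count;    // use_count() after a real copy
    long independent_count_a;  // use_count() seen by the first independent owner
    long independent_count_b;  // use_count() seen by the second independent owner
    bool same_object;          // both independent owners point at the same Widget
};

OwnershipCheck check_ownership() {
    OwnershipCheck r{};

    // Correct: copying a shared_ptr shares one control block, so the count is 2.
    auto good = std::make_shared<Widget>();
    auto good_copy = good;
    r.proper_copy_count = good.use_count();

    // Wrong: two shared_ptrs built from the same raw pointer get two control
    // blocks, and each believes it is the sole owner.
    Widget* raw = new Widget;
    std::shared_ptr<Widget> a(raw);                  // control block #1, deletes raw
    std::shared_ptr<Widget> b(raw, [](Widget*) {});  // control block #2; no-op deleter
                                                     // only so this demo avoids a real double free
    r.independent_count_a = a.use_count();
    r.independent_count_b = b.use_count();
    r.same_object = (a.get() == b.get());
    return r;  // b's no-op deleter runs, then a deletes raw exactly once
}
```

With the default deleter on `b`, both control blocks would reach zero and `raw` would be deleted twice, which glibc reports as "double free or corruption".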

It's not even clear that this is what happens in your case; the double free might come from something else (the same problem applies to everything that calls delete internally, e.g. std::string, but as long as pybind knows about it, there should be no problem). But it has happened to other OOTs in the past. So the only way forward I see is to run Python in a debugger (e.g. gdb --args python3 yourflowgraph.py), add a breakpoint or just logging in your object's destructor and constructors, and hope to catch where it happens.
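For the logging approach suggested above, a small tracer can be added to the block as a base class or member. This is a hypothetical debugging helper, not part of GNU Radio: it records every constructed address and flags any destructor call for an address that is not currently alive, which is what a double delete of the same object would look like. A gdb breakpoint on the "not alive" branch stops right at the offending destruction.

```cpp
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <set>

// Hypothetical lifetime tracer: logs construction/destruction by address and
// counts destructions of objects that are not (or no longer) alive.
class LifetimeTracer {
public:
    LifetimeTracer() { record(true); }
    LifetimeTracer(const LifetimeTracer&) { record(true); }  // a copy is a new object
    ~LifetimeTracer() { record(false); }

    static std::size_t alive() {
        std::lock_guard<std::mutex> lock(mtx());
        return live().size();
    }
    static int bad_destructions() {
        std::lock_guard<std::mutex> lock(mtx());
        return bad();
    }

private:
    void record(bool constructing) {
        void* self = this;
        std::lock_guard<std::mutex> lock(mtx());  // blocks may be destroyed on other threads
        if (constructing) {
            live().insert(self);
            std::fprintf(stderr, "ctor %p\n", self);
        } else if (live().erase(self) == 0) {
            ++bad();  // set a breakpoint here to catch a double delete
            std::fprintf(stderr, "dtor %p: not alive (double delete?)\n", self);
        } else {
            std::fprintf(stderr, "dtor %p\n", self);
        }
    }
    static std::set<const void*>& live() { static std::set<const void*> s; return s; }
    static int& bad() { static int n = 0; return n; }
    static std::mutex& mtx() { static std::mutex m; return m; }
};
```

For example, declaring the block as `class probeExtract_impl : public probeExtract, private LifetimeTracer` (an illustrative declaration) would log every construction and destruction of each instance to stderr.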

@graceyeung68
Author

Thank you Marcus for the detailed explanation. I will give the debugger a try. One thing I do notice is that the flowgraph crashes at a different point in the code on each successive run, each time with a different error message, such as:
corrupted size vs. prev size while consolidating
double free or corruption (!prev)
double free or corruption (top)
double free or corruption (fasttop)
double free or corruption (out)
malloc(): unsorted double linked list corrupted
or just an error code without a message. And nowhere in my code do I use malloc() or free(). That's why I was wondering if it is something internal or under the hood... Thank you.

@graceyeung68
Author

A backtrace shows that the crash happens inside an fftw3f library function called from my code. Is the fftw3f package compatible with GR? Thank you.

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7dd1859 in __GI_abort () at abort.c:79
#2 0x00007ffff7e3c26e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7f66298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007ffff7e442fc in malloc_printerr (str=str@entry=0x7ffff7f68628 "double free or corruption (fasttop)") at malloc.c:5347
#4 0x00007ffff7e45c65 in _int_free (av=0x7fffac000020, p=0x7fffac55a8d0, have_lock=0) at malloc.c:4266
#5 0x00007fffefe830d9 in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#6 0x00007fffefe83c44 in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#7 0x00007fffefe83fc4 in fftwf_mkplan_d () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#8 0x00007fffefe88e5d in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#9 0x00007fffefe8380b in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#10 0x00007fffefe83a03 in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#11 0x00007fffefe83fc4 in fftwf_mkplan_d () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#12 0x00007fffefe8d95c in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#13 0x00007fffefe8380b in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#14 0x00007fffefe83a03 in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#15 0x00007fffefe83fc4 in fftwf_mkplan_d () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#16 0x00007fffefe84043 in fftwf_mkplan_f_d () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#17 0x00007fffefe887a3 in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#18 0x00007fffefe8380b in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#19 0x00007fffefe83a03 in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#20 0x00007fffefe83fc4 in fftwf_mkplan_d () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#21 0x00007fffefe88e5d in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#22 0x00007fffefe8380b in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#23 0x00007fffefe83a03 in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#24 0x00007fffeff4a583 in ?? () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#25 0x00007fffeff4a758 in fftwf_mkapiplan () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#26 0x00007fffeff4ddd7 in fftwf_plan_many_dft () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#27 0x00007fffeff4d09b in fftwf_plan_dft () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#28 0x00007fffeff4cd9a in fftwf_plan_dft_1d () from /lib/x86_64-linux-gnu/libfftw3f.so.3
#29 0x00007fffe99b04d5 in gr::customModule::probeExtract_impl::ApplyFilterBank(int, gr::customModule::probeExtract_impl::FilterBank, std::vector<std::complex, std::allocator<std::complex > >&, int, int, int, std::vector<float, std::allocator<float> >&, int, int, int) ()
from /usr/local/lib/x86_64-linux-gnu/libgnuradio-customModule.so.1.0.0git
#30 0x00007fffe99b1f69 in gr::customModule::probeExtract_impl::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) () from /usr/local/lib/x86_64-linux-gnu/libgnuradio-customModule.so.1.0.0git
#31 0x00007ffff0799c03 in gr::block_executor::run_one_iteration() () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.8.1
#32 0x00007ffff07ed59a in gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, boost::shared_ptr<boost::barrier>, int) ()
from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.8.1
#33 0x00007ffff07dd818 in ?? () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.8.1
#34 0x00007ffff07fc4e8 in ?? () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.8.1
#35 0x00007ffff010343b in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.71.0
#36 0x00007ffff7d94609 in start_thread (arg=) at pthread_create.c:477
#37 0x00007ffff7ece133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@graceyeung68 changed the title from "Double free or corruption error when two instances of an OOT block is called in GRC" to "Double free or corruption error when two instances of an OOT block that uses FFTW is called in GRC" on Apr 19, 2023
@graceyeung68
Author

graceyeung68 commented Apr 19, 2023

@willcode @marcusmueller Hi, I understand this issue has been closed, but as the previous comment shows, GRC breaks during a call to FFTW: the double free error occurs inside fftw. Is there anything special I should watch out for if I am simply trying to use FFTW to perform a forward/backward transform? Thanks!
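The backtrace above shows `fftwf_plan_dft_1d` being called from `ApplyFilterBank` inside `general_work`. GNU Radio's thread-per-block scheduler runs each block's `general_work` on its own thread (note `tpb_thread_body` in the trace), so two instances of the block can enter the FFTW planner at the same time. The FFTW manual states that `fftw_execute` (and its new-array variants) is the only thread-safe FFTW routine and that the planner must only be called from one thread at a time, which makes concurrent planning a plausible cause of this heap corruption. A minimal sketch, assuming all plan creation and destruction is funnelled through one process-wide lock; `fftw_planner_mutex`, `with_planner_lock` and `max_concurrent_planners` are hypothetical helpers, and the demo uses a stand-in instead of a real planner call so it runs without FFTW:

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical process-wide lock for every FFTW planner call
// (plan creation and plan destruction).
inline std::mutex& fftw_planner_mutex() {
    static std::mutex m;  // function-local static: initialized exactly once (C++11)
    return m;
}

// Runs plan_call (e.g. a lambda wrapping fftwf_plan_dft_1d or fftwf_destroy_plan)
// while holding the planner lock, and returns whatever it returns.
template <typename F>
auto with_planner_lock(F&& plan_call) -> decltype(std::forward<F>(plan_call)()) {
    std::lock_guard<std::mutex> lock(fftw_planner_mutex());
    return std::forward<F>(plan_call)();
}

// Demo: several threads (like several block instances, each on its own
// scheduler thread) make stand-in "planner" calls. Returns the highest number
// of threads ever observed inside the locked region at the same time.
int max_concurrent_planners(int num_threads, int calls_per_thread) {
    std::atomic<int> inside{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < calls_per_thread; ++i) {
                with_planner_lock([&] {
                    int now = ++inside;
                    int prev = peak.load();
                    while (now > prev && !peak.compare_exchange_weak(prev, now)) {
                    }
                    std::this_thread::yield();  // widen the window for any overlap
                    --inside;
                });
            }
        });
    }
    for (auto& w : workers) w.join();
    return peak.load();
}
```

In a block this would mean creating plans once, e.g. in the constructor as `d_plan = with_planner_lock([&] { return fftwf_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE); });` (the member and variable names are hypothetical), destroying them under the same lock, and calling only `fftwf_execute(d_plan)` inside `general_work`. GNU Radio's gr-fft also exports a shared planner lock, `gr::fft::planner::mutex()` in `gnuradio/fft/fft.h`, which its own FFT classes take while planning; locking that instead of a private mutex would additionally serialize against GNU Radio's built-in FFT blocks.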
