-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
double free or corruption (fasttop) #92
Comments
Both this issue and #91 happen in the same program. The program creates a few message callbacks (single threaded), a thread to do the heavy computation, and a transform listener. As a result the program contains 3 threads. Also one of the message callbacks broadcasts transforms. I have a bunch of mutexes to protect my data, but I am assuming that the tf buffer is thread safe. |
Can you provide any more info on what to do to reproduce this? It looks like it's crashing inside a push_back on a std::vector, which makes corruption much more likely. |
@tfoote it seems that we both posted at the same time. I am not sure you saw the description about what my program. And I just updated it. This is a very high level description and probably not enough information. I am not sure what else to say though. When you say it makes corruption more likely what do you mean exactly? A stray pointer in my code that corrupts the tf buffer? |
Yeah, I didn't see your comment. The tf buffer is supposed to be thread safe. I was mostly referring to it not being a double free. There are two possibilities, one is that the vector operations are not all properly protected with a mutex and two things are adding/removing iterating at the same time, or yeah it could be a complete smash where a pointer is walking into the wrong area. I'll dig in and find all the instances of operating on that vector and audit them for being properly protected. |
I got another crash:
Here is the backtrace for that thread:
And for each thread:
|
and for the original double free problem:
|
I found a possible "off the end of a vector" issue: In tf2, BufferCore::getFrame reads:
Since frames_ is an std::vector, this has the possibility of referencing beyond the end. I'd say that the > should be >=. |
similarly here https://github.com/ros/geometry_experimental/blob/indigo-devel/tf2_ros/include/tf2_ros/message_filter.h#L367 shouldn't it be |
All values are valid in the vector. Return the empty object if it is out of range. The zero value is initialized to an empty pointer. No need to allocate a new one. And the value is unsigned so only values too high are invalid.
I've patched the getFrame indexing range check in 729a653 The message_filter.h queue size is correct as is. Adding one to the queue size has to overflow the queue to warrent poping values. As an example if the queue is empty and the queue is size 1. If it evaluates equal it will try to pop off from an empty queue which will not work. It could be |
… clear the vector so it's not saving allocations. And that vector was not lock protected, for #92
I've moved the vector from a member variable onto the stack. That member was not previously threadsafe. Please try thread fix#92 with commit fc80e5b |
@bricerebsamen sorry forgot to ping you for the above |
I tested that patch without #101. It resulted in instant crash: |
Shoot, I missed this comment when I merged #100. |
Looking at that backtrace, we're in setTransform and the only lock in there is the frames_mutex_ (https://github.com/ros/geometry_experimental/blob/729a6531929203720f4a518ac9e79321fe3c5537/tf2/src/buffer_core.cpp#L248) which is a member of the class so should definitely not be null.
To me this looks like it's getting clobbered. I think that the reason for the change due to #100 is that it's changing the memory layout of the class. |
I had a look at it and I tend to agree with @tfoote that the tracebacks don't seem to be related to #100. Further more that #100 would not cause any memory corruption issues. The only thing that could tie #100 to the tracebacks here, imo, is if #100 some how exposed an existing bug in an unrelated part of the code. That being said, I think it is extremely unlikely that this is the case. |
@bricerebsamen , does that still happen with master ? |
I don't have the means to test that anymore, sorry
|
move lct_cache into function local memory
I migrated some of my code to tf2 and it crashes from time to time. I am running the latest version from source. Here is the backtrace:
The text was updated successfully, but these errors were encountered: