Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGABRT or SIGSEGV when FileWatcher destroy on Linux #87

Closed
SpartanJ opened this issue Jun 6, 2015 · 8 comments
Closed

SIGABRT or SIGSEGV when FileWatcher destroy on Linux #87

SpartanJ opened this issue Jun 6, 2015 · 8 comments
Labels
bug Something isn't working major

Comments

@SpartanJ
Copy link
Owner

SpartanJ commented Jun 6, 2015

Original report by Mihail Slobodyanuk (Bitbucket: mihail_slobodyanuk, ).


Test code attached (main.cpp).
Only for Linux implemented asynchronous watch thread finalizing. If watcher not waiting in read(), then SIGABRT or SIGSEGV raised.
I made a several experiments with deferred and immediate thread cancellation but have no good results. In finally i rewrite code to synchronous interrupt. I used select() with timeout to check is events present and avoid blocking on read(). See FileWatcherInotify.diff

@SpartanJ
Copy link
Owner Author

SpartanJ commented Jun 6, 2015

Original comment by Mihail Slobodyanuk (Bitbucket: mihail_slobodyanuk, ).


Don't know how to attach second file.
Here is main.cpp

@SpartanJ
Copy link
Owner Author

SpartanJ commented Jun 6, 2015

Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ).


Hi Mihail!

Sorry, but i can't reproduce the bug with your example file, it only crashes if the fd from FILE* fd = fopen(file.c_str(), "w"); returns NULL and you try to write with that fd. FileWatcherInotify terminates the thread with USR1 signal that stops the read, that it's supposed to be an alternative for the replacement of read with select. What i'm missing?

PD: Yesterday i added something you asked some time ago and i wasn't convinced to add the IN_MODIFY flag in the inotify watcher. Just to let you know :)

Thanks for your help!

Regards

@SpartanJ
Copy link
Owner Author

SpartanJ commented Jun 7, 2015

Original comment by Mihail Slobodyanuk (Bitbucket: mihail_slobodyanuk, ).


Hi Martin!

Strange. Here is bundle with static library what i link, executable and core dump. Dump contain following backtrace:

#0  0x00007f0190844890 in std::string::empty() const () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00000000004264aa in efsw::FileWatcherInotify::run (this=0x191b1f0) at ../../src/efsw/FileWatcherInotify.cpp:349

The main.cpp is only draft test example. Do not pay attention to possible NULL in fd.

FileWatcherInotify terminates the thread with USR1 signal that stops the read, that it's supposed to be an alternative for the replacement of read with select. What i'm missing?

pthread_kill(USR1) used only for Android. For vanilla Linux used pthread_cancel(). I did not observed errors when cancellation issued at read(). But my test lead to cancellation at handling events.

PD: Yesterday i added something you asked some time ago and i wasn't convinced to add the IN_MODIFY flag in the inotify watcher. Just to let you know :)
Thank you.

@SpartanJ
Copy link
Owner Author

SpartanJ commented Jun 7, 2015

Original comment by Mihail Slobodyanuk (Bitbucket: mihail_slobodyanuk, ).


It possible something wrong with mutex release on cancel. Looks like mutex which acquired in row 340, released on cancellation then code continue execution. In that case ~FileWatcherInotify() destroy watchers but watcher thread continue reading from it.
It's differ than my expectation from Linux documentation. It's say pthread_cancel do not implicitly release mutexes but invoke C++ destructors. So it should be locked/unlocked in RAII way. In code release called explicitly. How it should be released on cancel in FileWatcherInotify::run?

To avoid reported SIGSEGV i tried to move thread termination before watcher removing in ~FileWatcherInotify() and made pthread_join() immediately after pthread_cancel(). It that case i switched to unhandled exception. GCC issue special exception type abi::__forced_unwind if PTHREAD_CANCEL_ASYNCHRONOUS set. But this exception type implemented in later versions than CGG 4.2 (Last GCC GPLv2, many 3rdparty toolchains based on it). In CGG 4.2 it can be caught only by catch(...) and MUST be re-thrown.

See pthread_exit-c-and-FATAL-exception-not-rethrown

cancelling-a-thread-that-has-a-mutex-locked-does-not-unlock-the-mutex

@SpartanJ
Copy link
Owner Author

SpartanJ commented Jun 7, 2015

Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ).


Fixed issue #47.
Thanks to Mihail Slobodyanuk for the help!

@SpartanJ SpartanJ closed this as completed Jun 7, 2015
@SpartanJ
Copy link
Owner Author

SpartanJ commented Jun 7, 2015

Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ).


Oh, i couldn't reproduce it because i wasn't writing to the test file. Now i saw what you where saying. Everything you said seems right and your patch is the correct solution, so i just push it to the repository ( with two minor change, the wait() for the thread isn't necessary because it's called in the destructor of the thread class, and fixed the indentation also ).

First i thought it was failing only because i was clearing the mWatcher before the other thread ended using it, but changing that ( that was a bug ) didn't fix it. I'm still not sure exactly why fails where it fails, it's probably something with the mutex like you said. But it doesn't matter, your patch works and that's enough for me.

PD: Just for curiosity, are you using efsw in some open source project? I would love to know for what kind of project the library is being used!

Thanks for your help! :)

Regards,

Martín

@SpartanJ
Copy link
Owner Author

Original comment by Mihail Slobodyanuk (Bitbucket: mihail_slobodyanuk, ).


Hi, Martin!

Sorry for the delayed reply - I've been out on vacation for the past couple of weeks and just now catching up. Our team - elepantdrive.com - currently evaluating cross-platform file system watchers for inclusion in software product that is actually not open-source (at least, not at the moment - this could change). We want wide cover of many platform. As of today it's only being used in a development branch and not distributed - we're still checking it out, comparing options, debugging. In general the solution is very well for our requirements. The thinking is that, under the MIT license, it would be ok if we used it for distribution as long as we amended the EULA/terms to include attribution to your good work. If we do move ahead with efsw, we surely will make the appropriate changes.

My apologies about subj issue. Follow-up tests shown issue in patch. select() return results in their arguments. Need move initialization of select() arguments (starting from fd_set rfds;) into the loop.

Thanks,
Mihail

@SpartanJ
Copy link
Owner Author

Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ).


Hi Mihail!

It would be totally fine to use the library for a closed-source software, the only requirement is just to mention that you are using the library. I really don't care much about this things, just a mention would be a good gesture. I'm super exited to know that this could be used in a project like Elephant Drive, let me know if this finally happens.
Thanks for the tip about the select, i moved the initialization after the loop as you said.
Also, regarding the Issue #40, i still couldn't find a proper way to solve it.

Regards,

Martín

@SpartanJ SpartanJ added major bug Something isn't working labels Jan 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working major
Projects
None yet
Development

No branches or pull requests

1 participant