
Segfault on InotifyBackend::handleSubscription #49

Open
paeolo opened this issue Mar 4, 2021 · 4 comments


paeolo commented Mar 4, 2021

🔦 Context

Hi,
I am using @parcel/watcher as part of one of my personal projects, and I noticed that my program sometimes crashes without any error message from Node.

After some investigation, it appears that @parcel/watcher is segfaulting on this line:

bool InotifyBackend::handleSubscription(struct inotify_event *event, std::shared_ptr<InotifySubscription> sub) {

😯 Description

I can't reproduce the issue, and sometimes it doesn't happen for days. So what I did was add extensive logging to my app to get a hint.

One hour ago, I got the crash again, so this time I looked at my logs and found this:

[paul@latitude-5410 ~]$ cat /var/log/onemon/onemon.07e9d5ae84.socket.log
...
{
  message: 'CHILD EVENT',
  event: 'close',
  code: null,
  signal: 'SIGSEGV',
  level: 'info',
  timestamp: '2021-03-04 15:10:51'
}

So now that I knew I had a SEGFAULT at 15:10:51, I looked at my core dumps (I run on Linux):

[paul@latitude-5410 ~]$ coredumpctl list
...
Thu 2021-03-04 15:10:51 CET   36842  1000  1000  11 present   /usr/bin/node

Then I looked at the stack trace:

[paul@latitude-5410 ~]$ coredumpctl gdb 36842
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fd50a5e9e3b in std::_Hashtable<std::string, std::string, std::allocator<std::string>, std::__detail::_Identity, std::equal_to<std::string>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::count(std::string const&) const ()
   from /home/paul/dashboard.server/node_modules/@parcel/watcher/prebuilds/linux-x64/node.napi.glibc.node
[Current thread is 1 (Thread 0x7fd4f2ffd640 (LWP 36860))]
(gdb) bt
#0  0x00007fd50a5e9e3b in std::_Hashtable<std::string, std::string, std::allocator<std::string>, std::__detail::_Identity, std::equal_to<std::string>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::count(std::string const&) const ()
   from /home/paul/dashboard.server/node_modules/@parcel/watcher/prebuilds/linux-x64/node.napi.glibc.node
#1  0x00007fd50a5e777f in InotifyBackend::handleSubscription(inotify_event*, std::shared_ptr<InotifySubscription>) ()
   from /home/paul/dashboard.server/node_modules/@parcel/watcher/prebuilds/linux-x64/node.napi.glibc.node
#2  0x00007fd50a5e91e3 in InotifyBackend::handleEvent(inotify_event*, std::unordered_set<Watcher*, std::hash<Watcher*>, std::equal_to<Watcher*>, std::allocator<Watcher*> >&) ()
   from /home/paul/dashboard.server/node_modules/@parcel/watcher/prebuilds/linux-x64/node.napi.glibc.node
#3  0x00007fd50a5e959c in InotifyBackend::handleEvents() () from /home/paul/dashboard.server/node_modules/@parcel/watcher/prebuilds/linux-x64/node.napi.glibc.node
#4  0x00007fd50a5e98f8 in InotifyBackend::start() () from /home/paul/dashboard.server/node_modules/@parcel/watcher/prebuilds/linux-x64/node.napi.glibc.node
#5  0x00007fd50a5cf9e6 in ?? () from /home/paul/dashboard.server/node_modules/@parcel/watcher/prebuilds/linux-x64/node.napi.glibc.node
#6  0x00007fd513060bc4 in std::execute_native_thread_routine (__p=0x560f06d17210) at /build/gcc/src/gcc/libstdc++-v3/src/c++11/thread.cc:80
#7  0x00007fd512e18299 in start_thread () from /usr/lib/libpthread.so.0
#8  0x00007fd512d41053 in clone () from /usr/lib/libc.so.6

Unfortunately, I am not able to reproduce the issue, and most of the time my program doesn't crash. I am not even able to tell you whether there is a pattern.

One thing I can tell you is that I use multiple calls to subscribe to watch different directories at the same time, so maybe the segfault is triggered by one of the watchers when some file is updated:

exports.subscribe = async (dir, fn, opts) => {

On the other hand, I can provide you with a complete core dump here so you can use gdb or whatever to get more info on this segfault.

🌍 Your Environment

Software            Version(s)
@parcel/watcher     2.0.0-alpha.9
Node                v14.15.4
Operating System    Arch Linux 5.11.2-arch1-1
@DeMoorJasper added the "bug (Something isn't working)" and "confirmed bug" labels on Mar 4, 2021
devongovett (Member) commented

Thanks for the report. One thing you could try is a debug build to get a more accurate backtrace. You can do this by cloning the repo locally and running yarn && yarn rebuild (or npm install && npm run rebuild). Then use yarn link/npm link to link the local version into your project. Then run your program inside gdb or lldb and hopefully get a backtrace. Thanks in advance!
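
Roughly, the steps would look something like this (the paths and the entry file are placeholders):

git clone https://github.com/parcel-bundler/watcher.git
cd watcher
yarn && yarn rebuild          # build the native module for debugging
yarn link                     # register the local build

cd /path/to/your/project
yarn link @parcel/watcher     # use the local build instead of the prebuilt binary

gdb --args node index.js      # run your program under gdb
# after the crash: (gdb) bt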


paeolo commented Mar 10, 2021

Sorry, I provided a core dump thinking you could use it, but now I realise it's not really possible. I'll try to get a core dump with a debug build and extract meaningful information from it, like the full stack state, local variables, and precise line numbers. I'll come back to you then, but it might take some time since I don't know when the next crash will happen :(


paeolo commented Mar 10, 2021

I have a core dump for the debug build now. I poked around in it with VS Code and found the segfault to be here:

if (watcher->mIgnore.count(path) > 0) {

The reason is that the watcher's mIgnore property is corrupted. Even more interesting, mDir is also corrupted: at this point in the code it is equal to m\xa5z:\xf8U.

I guess the watcher data has been corrupted at some point, but I have no clue yet why that happened. If you want more detail, or have any idea where to look, I'm all ears!

[screenshot: crash]


paeolo commented Mar 11, 2021

@devongovett Some more insight.

Just to add some context, the way I use this library is with multiple subscriptions to multiple directories. When I start my application, I start watching these directories, with one subscription for each:

.
├── a
├── b
├── c
├── ...

So basically, what I do is this somewhere in my code:

const subscriptions = await Promise.all(
  directories.map(
    (directory) => watcher.subscribe(directory, (err, events) => processFn(directory, err, events))
  )
);

Then later my code produces some artifacts, and when everything is done I want to save a snapshot of the filesystem. So it looks like this:

const processFn = async (directory, err, events) => {
  // Do some stuff
  await processEventsAsync(events);

  // Save a snapshot
  watcher.writeSnapshot(
    directory,
    someFilePathThatDontInterfereWithWatchingInAnyWay
  )
}

So, first remark: I know that what I do differs from what parcel (v2) does, since parcel makes two assumptions:
(A1) You watch only one directory: projectRoot (if I remember correctly).
(A2) You write a snapshot only once, when parcel is exiting.

I have my reasons for saving a snapshot every time files change: my code relies heavily on caching and I didn't want to pay the price of my process exiting too soon without updating the snapshot.

With this context in mind, what jumps out at me in the screenshot is:

mWatched: false

Looking at the code, this means that either mWatched was never true, or this line has been called:

mWatched = false;

There are several places where this can be called, though not that many. Given the context, the one I find interesting is this:

watcher->unref();

Indeed, recall that the segfault is on this line:

if (watcher->mIgnore.count(path) > 0) {

So it means we are currently processing inotify events as part of watching.

But as the core dump tells us, the watcher instance seems corrupted, i.e. maybe at this point the C++ object has already been destroyed by an unref call, which triggers this line:

removeShared(this);

So maybe this segfault is really a race condition between an active subscription and calls to writeSnapshot, all of them with the same dir and ignore parameters.

If that is the case, the reason my code crashes from time to time is that I am violating assumption (A2), which can lead to this segfault.
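
If my theory is right, keeping a subscription active while hammering writeSnapshot on the same directory might eventually reproduce it. A hypothetical, untested sketch (the paths are placeholders):

const watcher = require('@parcel/watcher');
const fs = require('fs');
const path = require('path');

async function stress(dir, snapshotPath) {
  // Keep a subscription active on `dir`...
  const sub = await watcher.subscribe(dir, () => {});

  // ...while repeatedly touching a file and writing snapshots for the
  // same dir, so the native watcher keeps being looked up and released
  // while the inotify thread may still be handling events for it.
  for (let i = 0; i < 100000; i++) {
    fs.writeFileSync(path.join(dir, 'touch.txt'), String(i));
    await watcher.writeSnapshot(dir, snapshotPath);
  }

  await sub.unsubscribe();
}

stress('/tmp/watched', '/tmp/snapshot.txt').catch(console.error);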

What do you think?

EDIT: another core dump today, same problem, with a corrupted watcher again. I also realized that mWatched being true or false doesn't mean anything, since the watcher is corrupted. But I still think there is something to my reasoning; waiting for your input ;)
