Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault when using cleanup action with AssumeDeleted after a cleanup with RefreshParent #149

Closed
brunetton opened this issue Dec 15, 2020 · 10 comments
Labels

Comments

@brunetton
Copy link

brunetton commented Dec 15, 2020

Hi,

I'm using this great tool for a while now, and I defined some actions that I often use.

Those days I'm facing a segfault in this case:

  • I mount my internal phone memory in ~/Android directory (go-mtpfs ~/Android) - not sure if the fact that it's a FUSE directory have something to do with the problem

  • I defined a custom action that move current file to my local hard drive:

    [Cleanup_02]
    Active=true
    AskForConfirmation=false
    Command=mv %p /media/bruno/data/a_trier/
    Hotkey=Ctrl+Shift+D
    OutputWindowAutoClose=false
    OutputWindowPolicy=ShowIfErrorOutput
    Recurse=false
    RefreshPolicy=AssumeDeleted
    Shell=/bin/sh
    Title=Move to Data/
    WorksForDir=true
    WorksForDotEntry=false
    WorksForFile=true
    
  • I browse ~/Android to sort videos in my phone

    • I open each video in VLC using this action:
      [Cleanup_03]
      Active=true
      AskForConfirmation=false
      Command=vlc %p
      Hotkey=Ctrl+Shift+V
      OutputWindowAutoClose=false
      OutputWindowPolicy=ShowNever
      Recurse=false
      RefreshPolicy=NoRefresh
      Shell=/bin/sh
      Title=Open in VLC
      WorksForDir=false
      WorksForDotEntry=false
      WorksForFile=true
      
    • sometimes I want to delete the file (I use standard ctrl-delete)
    • sometimes I want to keep the file and copy it to my local hard drive (I use my custom action Cleanup_02 using Ctrl+Shift+D shortkey)

When I launch the copy to hard drive action (Cleanup_03) this sometimes leads to a crash.
A log example coming

Qdirstat commit cb2ca90

@brunetton
Copy link
Author

Here's a log example: qdirstat.log

@shundhammer
Copy link
Owner

A wild speculation from my side what could have happened is this:

You start the vlc media player for each of those files and tell QDirStat's output window to never open. Yet, it will probably (I'll have to check that; it's been a while) remain active to capture stdout and stderr of the started cleanup action. It might also store a pointer to the FileInfo object representing the video that vlc is trying to display; probably for as long as the started process is active.

If in the meantime you decide to delete that video from the QDirStat main window, QDirStat's DirTree will refresh the parent directory branch; i.e. it will delete all FileInfo objects in that subtree, invalidating the pointers to those objects.

The OutputWindow may still be active at that time since vlc may still be running; even if you terminated it, it might still take a short while to really shut down. The OutputWindow will be deleted only after the last process that it started exited; then it will execute its destructors (and those of its child objects).

If any one of those child objects still tries to access a pointer to that FileInfo object that is now invalid, there will be a segfault.

Let me check this.

@shundhammer
Copy link
Owner

Please check if the problem appears only when you start those actions in rapid succession, or also when you give vlc a bit of time (some seconds) to really shut down.

Also, does the problem only appear when vlc reports errors? Try tail -F on the log file in a separate terminal window in parallel; it should report any errors.

I just checked: It is entirely possible to have several cleanup actions running in parallel. This might be part of the problem. A cleanup action with sleep 15 lets me start several at them with a separate output window for each.

The problem is that it's not trivial to prevent this without running the risk of locking up all cleanup actions completely when one of them simply never returns (or just runs for a long time).

@brunetton
Copy link
Author

Thanks for your super-complete answer !

Please check if the problem appears only when you start those actions in rapid succession, or also when you give vlc a bit of time (some seconds) to really shut down.

I'll check the vlc shotdown with pgrep and come back to you

Also, does the problem only appear when vlc reports errors? Try tail -F on the log file in a separate terminal window in parallel; it should report any errors

I didn't noticed that vlc reported errors, this is actually a track to follow

@shundhammer
Copy link
Owner

shundhammer commented Dec 16, 2020

Commit 2a484d5 fixes a segfault if a cleanup action has refresh policy assume deleted, and an item and any of its ancestors were selected:

It iterated over all selected items, both ancestor and item, and deleted both from the internal DirTree. If the ancestor came first, the child item was already deleted along with all child items in that subtree, and that invalidated the item pointer, but that pointer was still in the set. When the iteration reached that item, it accessed the (now invalid) pointer, resulting in a segfault.

The fix is to normalize the FileInfoSet of selected items when applying that assume deleted refresh policy, i.e. to remove all items from the set where any ancestor is also in the set.

@shundhammer
Copy link
Owner

Please check if that fixes your problem. It might; but it might also be a completely independent problem.

@shundhammer
Copy link
Owner

Please reopen if the above commit does not fix your problem.

@brunetton
Copy link
Author

I'm really sorry to say that I had the problem again (with commit a7f6d76) :+1

@shundhammer
Copy link
Owner

shundhammer commented Dec 18, 2020

Then please describe in some more detail what you are doing, how you are doing it, and what you are observing.

- Does it only happen when you start that "move to the other disk" cleanup quickly after starting VLC with one of those videos?

- What exactly are you moving to the other disk? Only one video, or the complete directory?

- While you are doing that, is any of those VLC processes still open?

- When does it crash?
- When you start to move a video / a directory?
- When moving a video / a directory is finished?
- When you close VLC started from one of those cleanup actions?

- Does it also crash when you do it very slowly, giving all processes some time to shut down properly?

Edit: Answers no longer needed: I found the problem. See below.

@shundhammer
Copy link
Owner

shundhammer commented Dec 21, 2020

How to Reproduce

I finally managed to reproduce this:

It crashed when you had done a partial tree refresh (like in a cleanup action with the RefreshParent refresh policy) and afterwards deleted a single tree item (like in a cleanup action with the AssumeDeleted refresh policy) in the same directory that had just been refreshed.

It has nothing to do with that VLC custom cleanup action still active, nothing with that Fuse filesystem from your phone being a bit slow.

Cause and Fix

The fix is commit 0fcf1f4: Resetting the _deletingAll flag at the end of the DirInfo::clear() method. That's all.

That flag looks pretty innocuous: Its purpose is to prevent expensive operations for unlinking items of a DirInfo's linked children list when all of them are being removed anyway. Normally, it would search the items list to locate the one that is about to be unlinked and set the next pointer of its predecessor to its own next pointer, thus removing that one item from the linked list.

While that is of course necessary if only one item is unlinked, it is pointless if the whole children list is cleared; in that case, all those iterations are redundant, and the effort to do that is O(n²) which may be quite expensive for larger lists.

The information when to take that shortcut is stored in that _deletingAll flag since DirInfo::clear() does not call DirInfo::unlinkChild() directly, so an additional boolean parameter could not easily be used.

So the flag was set to true at the start of DirInfo::clear(), but it was never set back to false; and that was the problem: If DirInfo::clear() was ever called, the flag was true for the whole life time of that DirInfo object.

Any subsequent call to DirInfo::unlinkChild() (e.g. from DirTree::deleteSubtree() via the AssumeDeleted refresh policy) would then skip removing the item from it's parent's children list, yet delete the associated object, leaving an invalid pointer behind.

The next call to anything that would iterate over those children list was bound to access that invalid pointer, leading to a segfault.

Lesson Learned

Those seemingly harmless flags that control an object's essential behaviour are evil. They tend to sneak up on you from behind and kick you in your backside when you least expect it.

Maybe it's time to get rid of that flag; but the current solution is simple, and that particular flag caused so much trouble that I am sure to not forget it anytime soon. There may be more similar hidden landmines, though.

@shundhammer shundhammer changed the title Random crash (segfault) when using custom action Segfault when using cleanup action with AssumeDeleted after a cleanup with RefreshParent Dec 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants