Option to continue reading at all mount points once #129
Let me think about this.
Okay, so I thought about this. I do acknowledge that this might be a useful feature for a number of users. But there are quite a few caveats and problems that need to be overcome:
"Real" vs. Pseudo / System Filesystems
Even traditional Unix systems always had special filesystems like
Yikes. What a mess.
This is what I want to see: The real filesystems on my real disks. And the
There might be some people who find that useful; I am not among them. Neither is the vast majority of Linux users. And this output was produced without en-vogue technologies like Docker containers and shrink-wrap-all-the-world package formats such as Snap or Flatpak. They all tend to multiply this mess; they get very creative with bind-mounts and/or with mounting filesystems multiple times to multiple mount points.
Bind Mounts
Linux supports the concept of mounting parts of an already mounted filesystem to another path with the … This is very useful in many cases, but it has caveats, since what used to be a directory tree starting with the root directory might become a general graph that contains cycles:
So even traditional file utilities like … Neither of those tools goes into an endless recursion (which was my first suspicion), but it's still an awkward case with awkward handling. Should QDirStat really follow such bind-mounts? I have serious doubts.
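To make the cycle problem concrete: a bind mount that points back up the tree shows up as a directory whose (st_dev, st_ino) pair equals that of one of its own ancestors. The following is a minimal Python sketch, not QDirStat's actual (C++) code, of a traversal that refuses to descend into such a directory. The function name walk_without_cycles is hypothetical, and symlinks are deliberately not followed.

```python
import os

def walk_without_cycles(top):
    """Hypothetical helper: yield every directory below (and including) top,
    but never descend into a directory whose (st_dev, st_ino) matches one of
    its own ancestors. Such a directory is a cycle, e.g. a bind mount of an
    ancestor directory."""

    def visit(path, ancestors):
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key in ancestors:
            return  # cycle detected: this directory is already on the current path
        yield path
        ancestors = ancestors | {key}
        try:
            with os.scandir(path) as it:
                # Symlinks are not followed, only real subdirectories.
                subdirs = [e.path for e in it if e.is_dir(follow_symlinks=False)]
        except OSError:
            return  # unreadable directory: skip its contents
        for sub in subdirs:
            yield from visit(sub, ancestors)

    yield from visit(top, frozenset())
```

Note that an ancestor check only prevents endless recursion; a bind mount in a sibling branch is still read (and counted) a second time.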
Filesystem Mounted Multiple Times
This is very similar to the "bind mounts" scenario. On traditional Unix systems this was strictly forbidden, but Linux allows mounting the same filesystem on multiple different mount points at the same time.
It might be challenging to find out which mount point is the primary one and which ones are just secondary. In this example it's simple because the filesystem is mounted again onto itself; but in other cases, where mutually exclusive mount points are used, is it really possible to tell which one is primary and which others are just secondary? Or are they all created equal? When a filesystem … If there is a concept like a primary mount point, and that would be … In the general case, it would not be easy to tell if a primary mount point is even part of the current directory tree being read. Just imagine one more filesystem between the initial filesystem and this multi-mount filesystem. Yes, it's a pathological case, but it's the pathological cases that create problems. Whichever way is chosen, it will be very confusing to the normal user.
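On Linux, /proc/self/mountinfo (documented in proc(5)) lists every mount with its device number (major:minor) and the "root" of the filesystem that is visible at that mount point. That is one way to spot a filesystem mounted several times and to see which mounts expose only a sub-tree (bind mounts). The helpers below are a hypothetical sketch, not QDirStat code; treating the first mount whose root is "/" as the "primary" one is only a heuristic and, as argued above, may not match what a user expects. Octal escapes in paths are not decoded here.

```python
from collections import defaultdict

def parse_mountinfo(text):
    """Hypothetical parser for /proc/self/mountinfo text (format per proc(5)).
    Returns a list of dicts in mount order; path escapes such as \\040 stay as-is."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        sep = fields.index("-", 6)  # optional fields (index 6+) end with a lone "-"
        mounts.append({
            "mount_id": int(fields[0]),
            "parent_id": int(fields[1]),
            "dev": fields[2],          # "major:minor"
            "root": fields[3],         # directory of the fs that is visible here
            "mount_point": fields[4],
            "fstype": fields[sep + 1],
            "source": fields[sep + 2],
        })
    return mounts

def multi_mounted(mounts):
    """Return {dev: [mounts]} only for devices that are mounted more than once."""
    by_dev = defaultdict(list)
    for m in mounts:
        by_dev[m["dev"]].append(m)
    return {dev: ms for dev, ms in by_dev.items() if len(ms) > 1}

def primary_candidate(mounts_of_one_dev):
    """Heuristic only: the first mount (in mount order) that exposes the
    filesystem's root directory, or None if all are sub-tree bind mounts."""
    for m in mounts_of_one_dev:
        if m["root"] == "/":
            return m
    return None
```

For the pathological cases described above (several mounts that all expose "/"), this heuristic simply picks whichever was mounted first, which illustrates why "primary" is hard to define.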
QDirStat Bug: Wrong Sums for Bind-Mounts and Multi-Mounts
There is no endless recursion (no matter whether or not the "Cross Filesystems" option is set), but the same files and directories are summed up several times, so the result is greatly distorted. This is clearly a bug that should be fixed; I am just not sure if that is realistically possible. Right now, the check whether a directory is a mount point compares the major and minor device numbers of the parent and the newly found child directory; if they differ, this is very likely a mount point. Maybe that check is just too simplistic, and QDirStat needs to check first what mount points are known to the system (checking
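One way to make the mount point check less simplistic, sketched here in Python under the assumption that the mount table is read from /proc/mounts: accept a device-number change as a mount point only if the directory is also listed in the mount table. In /proc/mounts the kernel escapes space, tab, newline and backslash in paths as octal sequences (\040, \011, \012, \134), so those must be decoded before comparing paths. All function names are hypothetical.

```python
import os
import re

_OCTAL = re.compile(r"\\([0-7]{3})")

def unescape_mount_path(field):
    """Decode the octal escapes used in /proc/mounts, e.g. '\\040' -> ' '."""
    return _OCTAL.sub(lambda m: chr(int(m.group(1), 8)), field)

def read_mount_table(text):
    """Hypothetical helper: {mount_point: fstype} from /proc/mounts-style text.
    If a path is mounted over several times, the last (visible) entry wins."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            table[unescape_mount_path(fields[1])] = fields[2]
    return table

def is_mount_point(path, parent, table):
    """Hypothetical stricter check: the device number must change AND the
    path must appear in the mount table."""
    if os.stat(path).st_dev == os.stat(parent).st_dev:
        return False
    return os.path.normpath(path) in table
```

On a live system, the table would be built once per scan from the contents of /proc/mounts and then passed to every is_mount_point() call.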
Always Continue Reading at Bind-Mounts and Multi-Mounts?
The feature request this issue is all about was to allow always continuing to read when a mount point is found. For bind-mounts and filesystems mounted multiple times, however, this would be a bad idea: It would inevitably distort the sums.
Cleanup Actions and Bind-Mounts and Multi-Mounts
Recursive cleanup actions might also be a problem; they may or may not work properly if started from a subtree that includes any mounted filesystem. It may become worse if users bind-mount system directories: In a project some 10 years ago, I saw developers (not noobs! system-level developers!) wreck their system because the scratchbox development system that was used for the project had mounted parts of … However, this is a general problem, independent of whether or not reading always continues at mount points.
So, where does this leave us? I am pretty sympathetic to the feature request in general, but it's a lot less simple than one might imagine. This needs to be restricted to the "reasonable" cases:
I also never liked that initial directory selection box very much. Personally, I always supply the path to be scanned on the command line (even more so with …). It might actually be time to greatly improve that initial selection: Away from the predefined Qt directory selection dialog and towards a more dedicated dialog that gives the user the useful choices. Those would include:
The checkboxes would affect only the current program run. Not sure, but maybe also several tabs at the top to make the other views more accessible:
Btrfs
Btrfs is … Not only does it include the functionality of LVM and RAID; it also has subvolumes and snapshots. And, as the biggest complication, it has shared disk blocks between them, so you never quite know what a size reported by Btrfs actually means.
Subvolumes are in wide use to get different mount options for different subtrees on the same Btrfs volume (in particular, read-only and copy-on-write). That by itself would not be a big problem, but each subvolume also having its own separate device major and minor number is; that has made it quite hard for QDirStat to figure out what is a genuine mounted filesystem and what is just a subvolume.
Snapshots are a built-in backup (of sorts) to get back to a previous state. This is useful for system updates or for manual operations by the admin: If any of them turns out to have bad effects, it is possible to roll back to a previous snapshot.
Snapshots are the typical application for shared disk blocks: Creating a snapshot means adding a reference to each of the disk blocks in the current system to the snapshot and increasing the usage count of that block. Btrfs has built-in CoW (copy on write), so when a block is written, it is duplicated at that moment and detached from the previous version in the snapshot. While this leads to very efficient storage of an entire filesystem in snapshots, it creates a huge administrative mess when it comes to calculating the real disk usage: Outside the Btrfs kernel module, it is impossible to find out which disk blocks are shared between the live filesystem and any snapshots. The values that Btrfs returns to syscalls like …
While it is simple for QDirStat to traverse a directory tree even on Btrfs, the trouble started with determining what is a subvolume and what is a real mount point: While a subvolume should be read, anything else probably should not.
Adding even more mount point magic to the mix will probably break this fragile construct in some way or other. So for Btrfs, this will need to remain simple.
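The rule "a subvolume should be read, anything else probably should not" could be expressed as a small decision function. This is a hedged Python sketch, not QDirStat's implementation: it assumes the caller already knows whether the device number changed, the set of paths in the mount table, and the filesystem type of the parent directory. A subvolume that is also explicitly mounted somewhere is treated as a mount point here; that is a design choice of the sketch.

```python
def classify_dir(path, dev_changed, mount_table, parent_fstype):
    """Hypothetical classifier for a directory found while scanning.

    Returns "same_fs" (keep reading), "subvolume" (keep reading: Btrfs gives
    each subvolume its own device number) or "mount_point" (stop unless the
    user asked to cross filesystems).
    """
    if not dev_changed:
        return "same_fs"
    if path in mount_table:
        return "mount_point"        # explicitly mounted, whatever its type
    if parent_fstype == "btrfs":
        return "subvolume"          # new st_dev, but nothing is mounted here
    return "mount_point"            # unexplained device change: be conservative
```

Keeping this as a single, explicit decision point is one way to keep the Btrfs handling "simple" while other mount point logic evolves around it.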
Network Filesystems
Network filesystems such as NFS and Samba (and maybe more) may become a problem: A user scanning his root filesystem with the "cross filesystems" option might be blissfully unaware that he is killing the performance of the network and of the NFS / Samba servers every time he does that. Such shared filesystems tend to be very large (which is the whole point of having them on central servers rather than distributed across desktop clients), and scanning them completely with a tool like QDirStat puts a huge strain on shared resources such as network bandwidth and file server I/O. If anybody actually wants to do that, he should explicitly request it. The default for scanning network filesystems should be "disabled". Not sure; maybe this should even be a separate setting / checkbox / context menu selection.
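A sketch of how a "network filesystems disabled by default" policy could be expressed, classifying by the filesystem type string as it appears in the mount table. The type lists are illustrative assumptions, not exhaustive and not taken from QDirStat; FUSE-based filesystems appear with a "fuse." prefix, e.g. "fuse.sshfs". The include_network flag stands in for the separate setting / checkbox suggested above and is hypothetical.

```python
# Illustrative, non-exhaustive type lists (assumptions, not QDirStat's lists).
SYSTEM_FS = {"proc", "sysfs", "devtmpfs", "devpts", "tmpfs", "cgroup", "cgroup2",
             "securityfs", "debugfs", "tracefs", "pstore", "bpf", "mqueue",
             "hugetlbfs", "configfs", "fusectl", "autofs", "binfmt_misc"}
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs"}

def classify_fstype(fstype):
    """Map a mount table type string to "system", "network" or "normal"."""
    if fstype in SYSTEM_FS:
        return "system"
    if fstype in NETWORK_FS:
        return "network"
    return "normal"

def should_cross(fstype, cross_filesystems, include_network=False):
    """Whether to continue reading into a mounted filesystem of this type.
    System filesystems are never entered; network filesystems only when the
    user explicitly asked for them (hypothetical include_network flag)."""
    kind = classify_fstype(fstype)
    if kind == "system":
        return False
    if kind == "network":
        return cross_filesystems and include_network
    return cross_filesystems
```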
Rather than relying on checkboxes before scanning directories, a different approach just came to my mind: Maybe when a mounted filesystem is found, open some kind of notification (a non-modal pop-up? A separate area in the main window?) collecting those filesystems so
Maybe something like
This selection would not contain any "weird" filesystems: No system filesystems like
Great discussion; I don't think I have much to add. Adding buttons to the UI is a good idea; I didn't even know pkg: existed, and I also usually invoke qdirstat from the shell. By the way, rmlint may be of some inspiration here: It's also a tool that recursively scans directories as quickly as possible. Before even starting, it scans all mount points in the system to do the following things (see https://github.com/sahib/rmlint/blob/3a7d52db5d3ddf82b00e45ea3ead69e8e413c725/lib/utilities.c#L599):
This information is also used to control the number of threads: For all filesystems that are on the same disk and that are rotational, only one scan thread is started; otherwise, multiple threads are started per underlying disk. They also use FIEMAP to order disk read operations by their actual location on disk for performance. Just some random debug output from it:
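The thread-per-disk idea can be sketched roughly as follows. It reads the real sysfs attribute /sys/block/<disk>/queue/rotational ("1" for spinning disks, "0" otherwise), but the mapping from mount points to disks, the function names and the number of threads for non-rotational disks are assumptions for illustration, not rmlint's code. The FIEMAP-based ordering of reads by physical location is not shown.

```python
import os

def is_rotational(disk, sysfs_block="/sys/block"):
    """Read /sys/block/<disk>/queue/rotational; treat unreadable as non-rotational."""
    try:
        with open(os.path.join(sysfs_block, disk, "queue", "rotational")) as f:
            return f.read().strip() == "1"
    except OSError:
        return False

def plan_scan_threads(mount_to_disk, rotational, parallel_threads=4):
    """Hypothetical planner: map each underlying disk to a number of scan threads.

    mount_to_disk: {mount_point: disk name}, e.g. {"/home": "sda"}
    rotational:    {disk name: bool}
    All filesystems on one rotational disk share a single thread to avoid
    seek thrashing; other disks get parallel_threads threads (assumed value).
    """
    plan = {}
    for disk in mount_to_disk.values():
        plan[disk] = 1 if rotational.get(disk, False) else parallel_threads
    return plan
```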
By the way, your list of dangerous filesystems above is missing FUSE (which can do whatever it wants, e.g. return infinite lists of files; or, at the very least, a Borg FUSE mount contains many copies of a filesystem state, just like Btrfs snapshots). And then, of course, someone might have put a regex in their FAT filesystem; good luck traversing that: https://github.com/8051Enthusiast/regex2fat
Okay, here is the first working version; please check. As mentioned before, this also needed a revamp of the "Open Directory" dialog: I was never very fond of the Qt standard directory selection dialog anyway. This one not only has the new "Cross Filesystems" checkbox, which applies to the current program run (and then stays the same for that whole run); it also lists the mounted filesystems much more prominently: That "Places" list on the left is completely new. It lists only the normal filesystems (including network filesystems such as NFS or CIFS (Samba)), not any system mounts like …
I am not yet completely happy with the sort order; it should be the mount order, which should be the same as in …
The idea of listing the mounted filesystems so prominently is that this is where you typically need to check the disk usage, not on any random directory in the middle of a filesystem. You can still do that, of course; and it even starts with the current directory. That combo box with the path also has auto-completion for valid paths on the filesystem. And there is still the "Browse..." button that opens the normal Qt directory selection dialog for those (three or four people) who like it. 😄
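A "Places" list like the one described could be derived from the mount table roughly as follows: keep the mount order, drop system mounts, and show each source device only once so that bind mounts and multi-mounts do not appear as separate places. This is a sketch with a hypothetical, non-exhaustive list of system filesystem types, not QDirStat's code. Note that de-duplicating by source device also collapses several Btrfs subvolumes mounted from the same device into one entry, which may or may not be what is wanted.

```python
# Hypothetical, non-exhaustive set of pseudo/system filesystem types.
SYSTEM_FS = {"proc", "sysfs", "devtmpfs", "devpts", "tmpfs", "cgroup2",
             "securityfs", "debugfs", "tracefs", "mqueue", "autofs",
             "squashfs"}  # squashfs: e.g. snap package mounts

def places(mount_entries):
    """mount_entries: (source, mount_point, fstype) tuples in /proc/mounts order.
    Returns the mount points to list, in mount order, one per source device.
    Network filesystems (e.g. source "server:/export") are kept."""
    seen = set()
    result = []
    for source, mount_point, fstype in mount_entries:
        if fstype in SYSTEM_FS or source in seen:
            continue  # pseudo filesystem, or a device that is already listed
        seen.add(source)
        result.append(mount_point)
    return result
```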
Looks like the sort order is indeed by mount order, but systemd starts multiple mounts in parallel, so it depends on which filesystem is faster; the result isn't always the same from one system boot to the next. Whatever.
New and better "Open Directory" dialog: The combo box no longer has autocompletion (this turned out to be really confusing), but all three widgets now keep in sync with one another: the combo box, the places on the left, and the directory tree. As you type, the corresponding node in the tree is selected. When you click in the tree, the places on the left always show the filesystem that you are on. When you click on a filesystem in the "places" bar, you go to the mount point of that filesystem. But you can still type (and copy & paste!), and the input is still validated (i.e. the "OK" button only becomes active when there is a valid path in the combo box); and the combo box items show the parent directories of the current path. Also, note the "Up" button, which takes you one directory level up.
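Keeping the "places" bar in sync with the tree comes down to finding the filesystem that a path is on. One simple way, sketched here as a hypothetical helper rather than QDirStat's implementation, is a longest-prefix match against the known mount points, comparing whole path components so that /home2 does not match /home. It works on path strings only (symlinks are not resolved), so it only approximates what stat() would report.

```python
import os

def filesystem_for_path(path, mount_points):
    """Hypothetical helper: return the mount point that is the longest
    whole-component prefix of path, or None if nothing matches."""
    path = os.path.normpath(path)
    best = None
    for mp in mount_points:
        mp = os.path.normpath(mp)
        prefix = mp if mp.endswith("/") else mp + "/"
        if path == mp or path.startswith(prefix):
            if best is None or len(mp) > len(best):
                best = mp
    return best
```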
@shawwn wrote:
DO NOT HIJACK AN EXISTING ISSUE FOR SOMETHING COMPLETELY DIFFERENT. Deleting. Seriously, I am working my ass off to keep everything well-organized, well-documented and easy to understand, and people can't be bothered to do the most basic things (that would require one or two mouse clicks) and just dump their problem-of-the-hour into a completely unrelated issue? No way.
Currently there is an option in the settings to always continue reading at mount points, and one to continue reading at a specific mount point.
It would be nice to also have the option to cross all filesystem boundaries in the current scan (but not by default), for example in the "Open Directory" dialog, or as a menu option below "continue reading at current mount point".