Support excluding directories containing a file #90

Closed
chrahunt opened this issue Jan 26, 2019 · 7 comments

Comments

@chrahunt

Many backup software tools provide a means to exclude directories when they contain a particular file. For example:

We can better support the use case of evaluating files for backup by adding the ability to skip scanning and displaying directories that contain a specific file.

@shundhammer
Owner

Let me think about this.

It will be a bit difficult to implement because when that file is detected, other files in that directory will already have screwed up the accumulated values, so the sums in that subtree will have to be recalculated, or that directory's content will have to be subtracted again.

Also, the configuration for exclude rules will become somewhat more complex because of this.
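The "subtract it again" option mentioned above could look roughly like the following. This is only a sketch in Python under stated assumptions: `DirNode`, `add_file` and `exclude` are hypothetical names for illustration, not the classes used in the actual directory reading code, and each node is assumed to keep a running total that is propagated to its ancestors as files are read.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirNode:
    """Hypothetical directory node with a running subtree total."""
    name: str
    parent: Optional["DirNode"] = field(default=None, repr=False, compare=False)
    own_size: int = 0      # size of the directory entry itself
    total_size: int = 0    # own_size plus everything read below it so far
    children: list = field(default_factory=list)
    excluded: bool = False

    def __post_init__(self):
        self.total_size = self.own_size
        if self.parent is not None:
            self.parent.children.append(self)
            self.parent._propagate(self.own_size)

    def _propagate(self, delta: int) -> None:
        """Add delta to this node and to every ancestor up to the root."""
        node = self
        while node is not None:
            node.total_size += delta
            node = node.parent

    def add_file(self, size: int) -> None:
        """Account for a file that was just read in this directory."""
        self._propagate(size)

    def exclude(self) -> None:
        """Marker file found: undo what this subtree already contributed,
        keeping only the directory's own size."""
        delta = self.own_size - self.total_size   # zero or negative
        self.children.clear()
        self.excluded = True
        self._propagate(delta)


root = DirNode("root", own_size=4096)
cache = DirNode("cache", parent=root, own_size=4096)
cache.add_file(1000)
cache.add_file(2000)   # already counted before the marker file shows up
root.add_file(500)
cache.exclude()        # marker file detected late
print(root.total_size, cache.total_size)
```

The alternative mentioned in the comment, recalculating the sums of the subtree from scratch, avoids the bookkeeping in `exclude` at the cost of walking the subtree again.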

@shundhammer
Owner

shundhammer commented Mar 15, 2019

First prototype in branch huha-exclude-dir-with-file.

Works so far, but since this required some refactoring in a very sensitive part (the directory reading in the LocalDirReadJob class), this will need some extensive testing.

Limitations: "Read Excluded Directory" does not work yet.

[Screenshot: exclude-dir-with-file]

@shundhammer
Owner

shundhammer commented Mar 15, 2019

OK, restarting reading such an excluded directory now works as well.

Please do some testing with this branch!

@shundhammer
Owner

Please test.

@chrahunt
Author

Everything seems to be working as expected. Functionality checked:

  • excludes directories when fixed string, wildcard, and regex "Exclude Any File in that Directory" entries match a file within the directory
  • as above, with multiple matching files in a directory
  • as above, with filesystem mount point within a directory
  • as above, with matching file immediately under filesystem mount point
  • as above, with 😁 as part of the pattern/filename
  • "Exclude Any File in that Directory" does not check against child directory names
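For readers who want to reproduce these checks outside the GUI, the behavior tested above can be approximated with a small Python sketch. Everything here is an assumption for illustration: the function names, the rule representation as `(pattern, syntax)` pairs, and in particular the choice that fixed strings and regexes must match the whole file name. The tool's actual matching rules may differ.

```python
import fnmatch
import os
import re


def matches(name: str, pattern: str, syntax: str) -> bool:
    """Check one file name against one rule (whole-name match assumed)."""
    if syntax == "fixed":
        return name == pattern
    if syntax == "wildcard":
        return fnmatch.fnmatchcase(name, pattern)
    if syntax == "regex":
        return re.fullmatch(pattern, name) is not None
    raise ValueError(f"unknown pattern syntax: {syntax}")


def has_matching_file(file_names, rules) -> bool:
    """True if any plain file name matches any (pattern, syntax) rule."""
    return any(matches(n, p, s) for n in file_names for p, s in rules)


def plain_files_in(path: str):
    """Names of non-directory entries in path; child directory names
    are skipped on purpose, mirroring the last check in the list."""
    with os.scandir(path) as entries:
        return [e.name for e in entries if not e.is_dir(follow_symlinks=False)]


rules = [(".nobackup", "fixed"), ("*.tag", "wildcard"), (r"skip-\d+", "regex")]
print(has_matching_file(["a.txt", "CACHEDIR.tag"], rules))
print(has_matching_file(["a.txt", "b.txt"], rules))
```

A directory would then be treated as excluded when `has_matching_file(plain_files_in(path), rules)` returns `True`.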

Notes:

  1. Currently the size of the directory entry itself is included in the parent size calculation when the directory is excluded (e.g. an empty directory registers as 4.0 kB). This aligns with the existing Exclude Rule behavior. I don't think it makes sense for my original use case, since a completely excluded or empty directory in the backup archive would take no/little space compared to one with many children.

@shundhammer
Owner

OK, thank you for testing.

Not sure about the directory's own size in the total size sums; for one thing, it does exist, and it is listed (mostly to show you that there is something that was excluded, so you have something to select if you wish to continue reading there anyway). Similar to a mount point, that directory might or might not go to a backup medium, depending on the exact setup and use case.

For another, the total size might not correspond exactly to the backup size anyway: the backup medium might easily have a different granularity (block size / cluster size), and fragments of files are handled in a number of different ways in different filesystem types, none of which ever specifies how they are accounted for. If you have a cluster size of 4k and a file of 4k + 1 byte, does it consume 2 clusters, i.e. 8k? Or does the filesystem know a clever trick to be less wasteful with those fragments? And what about symlinks?
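To make the cluster arithmetic concrete, here is a minimal Python sketch of the naive case only, assuming the filesystem always rounds a file up to whole clusters and has no trick for packing fragments. The function name `allocated_size` and the 4096-byte default are illustrative assumptions, not values taken from any particular filesystem.

```python
def allocated_size(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes consumed if every file occupies a whole number of clusters."""
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size > 0")
    # Ceiling division: number of clusters needed to hold file_size bytes.
    clusters = (file_size + cluster_size - 1) // cluster_size
    return clusters * cluster_size


print(allocated_size(4096))      # exactly one cluster: 4096
print(allocated_size(4096 + 1))  # one byte more needs a second cluster: 8192
```

Under that assumption the 4k + 1 byte file does consume 8k; a filesystem that packs small tails elsewhere would report something different, which is exactly the uncertainty described above.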

And, much worse, what about i-nodes? All those things vary wildly between filesystem types, and most of it is unspecified. So over-engineering this by trying to be exact down to a few sectors does not make sense anyway.

So, for the time being, those own sizes of excluded directories will be accounted as they are; it would add quite some complexity and also possibly performance impact to try to be precise down to the byte level where that precision is not achievable anyway.

@shundhammer
Owner

Merged to master.
