Merge pull request #86 from arberg/master
Updated filters appendix
kees-z committed Jan 8, 2022
2 parents ceed504 + 52cedc9 commit c1a0fec
Showing 2 changed files with 56 additions and 1 deletion.
57 changes: 56 additions & 1 deletion docs/appendix-d-filters.md
@@ -9,7 +9,7 @@ Duplicati's filter engine processes folders first then files. The reason for tha

When filter rules have been defined the first folder is taken and the filter rules are processed one by one. The first rule that matches is applied and the following rules are not processed anymore. For instance, if the first rule excludes a folder, then this folder and all files within will be excluded from the backup even if following rules include this folder or its files. Likewise, if the first rule includes a folder, then it will be included even if a following rule would exclude it.

It is recommended to write folder rules first and file rules afterwards. That way rules are written in the same order as they will be effective when Duplicati processes them and Duplicati's filters are easier to understand that way.
It is recommended to write folder rules first and file rules afterwards, and to write the folder rules one directory level at a time. That way the rules appear in the same order in which Duplicati applies them, which makes the filters easier to understand.

Per default, all files and folders will be backed up. That means, if no rule matches, the file or folder will be included. In the special case that all rules are include rules (which does not make sense when all files and folders are included per default) Duplicati assumes that all other files and folders are meant to be excluded (this had to be defined as another rule in Duplicati 1.3 but most people found that confusing so we changed that in Duplicati 2.0).
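The first-match and default behaviour described above can be sketched in a few lines of Python. This is a minimal illustration and not Duplicati's implementation: the `decide` function, the `(include, regex)` rule representation and the sample paths are all hypothetical, and Python's `re.fullmatch` stands in for Duplicati's own regular expression matching. How folders are treated when all rules are include rules is not covered by this sketch.

```python
import re

def decide(path, rules):
    """Return True if `path` would be included under first-match-wins rules.

    `rules` is an ordered list of (include, regex) pairs. This representation
    is an assumption of the sketch, not Duplicati's internal format.
    """
    for include, pattern in rules:
        # Assumption: a rule matches only if the regex covers the whole path.
        if re.fullmatch(pattern, path):
            return include  # first matching rule wins, later rules are ignored
    # No rule matched: include by default, unless every rule is an include rule.
    only_includes = bool(rules) and all(include for include, _ in rules)
    return not only_includes

# The same two rules in opposite order give opposite results for the folder.
exclude_first = [(False, r".*/cache/"), (True, r".*/cache/")]
include_first = [(True, r".*/cache/"), (False, r".*/cache/")]
# A rule set made of include rules only.
only_jpg = [(True, r".*\.jpg")]

print(decide("/data/cache/", exclude_first))  # False: the exclude rule comes first
print(decide("/data/cache/", include_first))  # True: the include rule comes first
print(decide("/data/docs/", exclude_first))   # True: nothing matches, default include
print(decide("/data/a.png", only_jpg))        # False: only include rules are defined
```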

@@ -47,6 +47,12 @@ In the UI, filters can be created using drop down boxes for common rule types. M

Using the command-line there are specific settings to specify include or exclude rules. These are `--include` and `--exclude`. Multiple rules can be specified by using `--include` or `--exclude` repeatedly.

### Creating and validating your filters

The Duplicati UI updates the file and folder include/exclude icons on the fly to reflect the current filters. A green check mark indicates that the folder will be traversed by Duplicati, but its content may still be excluded by other filters.

![Filter example](duplicati-filters-match-example.png "Filter example")

### Settings

Besides filter rules there are settings that can exclude specific files by their attributes. Those settings are `--skip-files-larger-than` and `--exclude-files-attributes`. The latter is able to exclude files that have any of the following attributes: `ReadOnly`, `Hidden`, `System`, `Directory`, `Archive`, `Device`, `Normal`, `Temporary`. Those settings are applied to all files of the backup.
@@ -61,3 +67,52 @@ Besides filter rules there are settings that can exclude specific files by their

**Include some files, exclude others.** Now let's define a filter that does both of the above. First it excludes @eaDir by specifying `-*/@eaDir/`. Then it includes only JPG files by specifying `+*.jpg`. The problem here is that Duplicati includes all files and folders by default. This means that, for example, /photos/movie.avi will also be part of the backup. To make the include rule effective, an additional rule is required that excludes all files that do not match any of the preceding rules. The filter must say "exclude this, exclude that, include this but nothing else". The best rule for "but nothing else" is a regular expression that excludes all files. It is `-[.*[^/]]` on Linux or Mac, and on Windows the rule is `-[.*[^\\]]`. The rule says "exclude everything that is not a folder". The final filter then is `-*/@eaDir/ +*.jpg +*.jpeg -[.*[^/]]`. Duplicati will process all folders but @eaDir/ and it will include JPG and JPEG files but exclude all other files.
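To see why the final "exclude everything that is not a folder" rule is needed, the filter can be replayed in a small Python sketch. Everything here is an approximation made for illustration: `parse_rule` and `decide` are hypothetical helpers, glob rules are matched with Python's `fnmatch.fnmatchcase` (where `*` also matches `/`), and rules in square brackets are matched with `re.fullmatch`. Duplicati's real matching may differ in details such as case sensitivity.

```python
import re
from fnmatch import fnmatchcase

def parse_rule(rule):
    """Split '+pattern' / '-pattern' into (include, kind, pattern)."""
    include = rule[0] == "+"
    body = rule[1:]
    if body.startswith("[") and body.endswith("]"):
        return include, "regex", body[1:-1]  # [..] marks a regular expression
    return include, "glob", body

def decide(path, rules):
    """First matching rule wins; unmatched paths fall back to the default."""
    parsed = [parse_rule(r) for r in rules]
    for include, kind, pattern in parsed:
        if kind == "regex":
            matched = re.fullmatch(pattern, path) is not None
        else:
            matched = fnmatchcase(path, pattern)
        if matched:
            return include
    # Default: include, unless all rules are include rules.
    only_includes = bool(parsed) and all(p[0] for p in parsed)
    return not only_includes

without_catch_all = ["-*/@eaDir/", "+*.jpg", "+*.jpeg"]
final_filter = without_catch_all + ["-[.*[^/]]"]

for path in ["/photos/", "/photos/@eaDir/", "/photos/holiday.jpg", "/photos/movie.avi"]:
    print(path, decide(path, without_catch_all), decide(path, final_filter))
```

In this sketch only `/photos/movie.avi` changes between the two rule sets: it is included by default without the last rule and excluded with it, while the `/photos/` folder itself stays included because its path ends in `/`.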


**Advanced regular expression filter example**

* Suppose we want `/mnt/(user|disk\d+)/media/.*` but not `Movie.*` folders within `media/`
* This includes `/mnt/user/media/X`, `/mnt/disk23/media/Y`, but not `/mnt/user/media/Movie/DieHard.mkv`.
* We don't want `/mnt/user0/.*`, or `/mnt/user/secrets`

Duplicati applies the filters to a folder before its children, searching for the first filter line that matches the folder. If a parent path matches an exclude rule, the whole tree below it is cut off. Conceptually it is easiest to build the expressions by starting at the top of the folder hierarchy and moving down one level at a time, including or excluding the desired folders and files at each level. So let us first match the parent folders we want processed and remove those we don't want. Then we add the subfolder we want and exclude all others.

* Source set: /mnt/
* `+[/mnt/(disk\d*|user)/]`
* `-[/mnt/[^/]*/]`
* `+[/mnt/[^/]*/media/]`
* `-[/mnt/[^/]*/[^/]*/]`
* `-[/mnt/[^/]*/media/Movie[^/]*/]`
* `+[/mnt/[^/]*/media/[^/]*/]`
* `-[/mnt/[^/]*/.*\.log]`

Note that `.*` matches anything, and `[^/]*` matches anything NOT containing a `/` (the Linux path separator).

* The first two lines match the root folders in our `/mnt` source set, including only folders like `disk123` and `user`.
* The next two lines include only the `media` subfolder of the folders already included. Notice how `[^/]*` does not match a path separator `/`.
* Lines 5 and 6 exclude directories like `Movie` and `MovieSeen` but include the rest.
* The last line excludes all files with the `.log` extension (case-sensitive). Here `.*` is used because it also matches path separators, so the rule applies at any depth. A close alternative is `[/mnt/[^/]*/.*/[^/]*\.log]`, where the last path separator is included so `[^/]*\.log` matches a filename; note that it only matches `.log` files at least one folder below the first level under `/mnt/`. Or use just `[.*\.log]` to skip all that verbosity (while staying with regular expressions, which is not required).

Notice also that lines 5 and 6 stay on one directory level: they use `Movie[^/]*/` rather than `Movie.*`. Both would be valid, but staying on one level makes it easier to remember that Duplicati works through directories one level at a time.

The sixth filter (include) can be omitted, as files and folders that do not match any filter are included (unless all filters are include filters).

Here we rely on the fact that in Duplicati all folder paths end in `/` (on Linux), while a file path does not; on Windows the separator is a backslash.
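The level-by-level evaluation of this example can be traced with a short Python sketch. It is only an approximation under stated assumptions: `first_match` and `is_backed_up` are hypothetical helpers, Python's `re.fullmatch` stands in for Duplicati's regular expression matching, and the sample paths are made up. The sketch checks every parent folder from the source set downwards and stops at the first excluded folder, which models the "whole tree is cut off" behaviour.

```python
import re

SOURCE = "/mnt/"

# The seven rules from the example, in order: (include, regex).
RULES = [
    (True,  r"/mnt/(disk\d*|user)/"),
    (False, r"/mnt/[^/]*/"),
    (True,  r"/mnt/[^/]*/media/"),
    (False, r"/mnt/[^/]*/[^/]*/"),
    (False, r"/mnt/[^/]*/media/Movie[^/]*/"),
    (True,  r"/mnt/[^/]*/media/[^/]*/"),
    (False, r"/mnt/[^/]*/.*\.log"),
]

def first_match(path):
    """Apply the first rule whose regex covers the whole path."""
    for include, pattern in RULES:
        if re.fullmatch(pattern, path):
            return include
    # No rule matched. RULES mixes include and exclude rules,
    # so the default is to include the path.
    return True

def is_backed_up(path):
    """Check each parent folder below SOURCE, then the path itself."""
    names = path[len(SOURCE):].split("/")
    prefix = SOURCE
    # Every name except the last is a folder on the way down.
    for name in names[:-1]:
        prefix += name + "/"
        if not first_match(prefix):
            return False  # an excluded folder cuts off its whole tree
    # The last name is "" when `path` is itself a folder (ends in "/").
    return names[-1] == "" or first_match(path)

for sample in [
    "/mnt/user/media/X/a.mkv",
    "/mnt/disk23/media/Y",
    "/mnt/user/media/Movie/DieHard.mkv",
    "/mnt/user0/anything",
    "/mnt/user/secrets/key.txt",
    "/mnt/user/media/X/debug.log",
]:
    print(sample, is_backed_up(sample))
```

Note that `/mnt/user/media/Movie/DieHard.mkv` matches no rule on its own; in this sketch it is left out only because its parent folder `Movie/` is excluded by the fifth rule.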

# Testing filters with the command-line tool

Filters can be tested with the command-line tool; see https://github.com/duplicati/duplicati/wiki/Headless-installation-on-Debian-or-Ubuntu and https://duplicati.readthedocs.io/en/latest/04-using-duplicati-from-the-command-line/

For instance, test the above example in a [linuxserver docker](https://hub.docker.com/r/linuxserver/duplicati) or [duplicati/duplicati](https://hub.docker.com/r/duplicati/duplicati) container with:

```
docker exec duplicati mono /app/duplicati/Duplicati.CommandLine.exe test-filters /mnt/ \
  --include="[/mnt/(disk\d*|user)/]" \
  --exclude="[/mnt/[^/]*/]" \
  --include="[/mnt/[^/]*/media/]" \
  --exclude="[/mnt/[^/]*/[^/]*/]" \
  --exclude="[/mnt/[^/]*/media/Movie[^/]*/]" \
  --include="[/mnt/[^/]*/media/[^/]*/]" \
--exclude="[/mnt/[^/]*/.*\.log]"
```

**Building Regular Expressions:** There are many online services, such as [RegExr](https://regexr.com), that help build correct regular expressions.
Binary file added docs/duplicati-filters-match-example.png
