Restic 0.10.0 always reports all directories "changed", adds duplicate metadata, when run on ZFS snapshots #3041
Sorry, the formatting is a bit weird in my restic command example above. It should read:
Can you identify a file that is experiencing this problem, and …
#2823 only changes the behavior for regular files, not directories.
You can run …
Thank you for the suggestions! The metadata diff just listed every file and directory with a … The output of running stat on a contained directory is more insightful: all the fields remain the same between different ZFS snapshots, except the device ID. Things get weirder when I run stat on files instead of directories. The … With this in mind, perhaps a switch can be added to always ignore the device ID.
Can you please be more elaborate when you describe this? For example, can you post the commands and the output of the stat commands? Would be nice to see what you're talking about here.
Here is a complete annotated session illustrating how the output of stat changes across ZFS snapshots and how Restic reacts.
Step 1: Create a new ZFS snapshot. Stat a file and a directory in that ZFS snapshot. Take note of the value of the first field, which is the device ID.
Step 2: Back up the contents of the ZFS snapshot. I presume Restic sees changed values for …
Step 3: Run the backup again on the same ZFS snapshot. Note that nothing new is added to the repo.
Step 4: Destroy the old ZFS snapshot and create a new ZFS snapshot of the exact same file-system. Note how the device ID changes.
Step 5: Run the backup one more time. Note that Restic reports all directories as changed and stores new tree blobs for them.
Unfortunately, I'm not sure of any easy way to reproduce this behavior unless you have a way to change the underlying device ID. Just to cover bases, is there another possible explanation? Or should …
Thanks a lot for that clarity :) We'll have to look at whether it makes sense to look at the device ID.
Let me point out something that dawned on me in case it isn't obvious to you: the … I'm also having second thoughts about whether it is correct that changes to …
The GNU libc manual and The Linux Programming Interface both do. However, st_dev is better thought of as the connection to the device rather than the actual device. For internal disk drives that doesn't matter, but when I unplug my USB disk, plug in a USB stick and then plug in the disk again, the stick gets the disk's former device number and the disk gets a new one. Restic doesn't look at the st_dev field for files because it's not considered in its change detection heuristic. It does not, and cannot, have such a heuristic for directories: the timestamps on directories don't reflect changes to the files within, so those would get skipped too. In any case, it still records the metadata change, even for a file that is reported as unmodified (#2823 documents this in some more detail than the current manual). If you think of directories as entirely metadata, the fact that directories still change should make more sense. The situation is a bit strange at first glance, but it usually works well. There are a few possibilities for improvement:
I'm running restic 0.10.0 on Android (because I can), and the "all files and dirs have changed" situation happens when backing up the sdcard. I also had an issue similar to this on my desktop, where a file that had a weird …
I'm doing the exact same thing as @stephenedie except with btrfs snapshots and I'm having the same problem. Restic is resaving all the unchanged tree data for every snapshot.
Files use this logic, which doesn't include the device ID: restic/internal/archiver/archiver.go, line 450 (commit b67b7eb).
If I'm reading this right, there is no change function for trees; it just relies on the tree hashes matching: restic/internal/archiver/archiver.go, line 162 (commit 445b845).
For btrfs I bind mount the latest snapshot to a directory so restic sees a stable path. Commenting out this line solves the issue, and unchanged snapshots are properly detected as unchanged by restic: restic/internal/restic/node.go, line 581 (commit 41a45ae).
Looking at @greatroar's PR, it seems like integrating an …
I agree - what's the use case for looking at the device ID? It can change so easily. Shouldn't we just stop doing that (instead of introducing an option for it)?
In the detailed example I gave, I'm storing 17.876 MiB for 341.787 GiB. It's relatively small, but these are all video files with a very high average size. Running this on trees with lots of small files may be very inefficient. I tend to agree that Restic should just always ignore, and not store or hash, the device ID.
I can't think of a good reason. It's already ignored for files. Restic is already keying the tree off the absolute path. If that changes, the tree data is repacked. If the device under an absolute path changes (maybe a pre-failure disk gets replaced) and it's mounted in the same path, that wouldn't be any reason to repack the tree data. The only reason I thought of an ignore option is "backward compat". If you stop tracking the device, everyone would get a one-time tree data repack upon upgrade. I suppose you could automatically include the device if an existing repo already has it, but since restic is pre-1.0, maybe now would be a great time to make the one-time change and not carry this baggage? Only users with a huge number of small files would likely even notice, and even then it would only be a one-time annoyance.
Can I assume that this very same issue will also exist on btrfs and LVM snapshots? Currently I do not use either, but am tempted to use LVM snapshotting to prepare the volumes before backing them up. After my experiences so far, backing up a snapshot of the drive that contains e.g. the MySQL database is less error-prone than doing an unsnapshotted backup, due to how files change over time...
My 2 cents: it's best to make the file & directory logic similar. I can't imagine why it would be okay to treat a file as cached with a different device ID, but not the folder. So, either restic should add device ID checking to files, or remove it from folders. IMO the latter makes more sense here. For restic to produce the wrong behavior: …
Since the device ID isn't stable to begin with, and there's a given use case (ZFS/btrfs snapshots) that is impacted by this, I would just remove the device ID entirely (my 2 cents).
I am not sure what the best solution for this would be, but this is exactly the use case that I want to use restic for: i.e. mount ZFS snapshots, and then use restic to back them up :) And in my case, I don't have video files, but am backing up my code projects, etc. - I have 248644 files and 60264 folders. I agree with #3041 (comment) that either ...
Are there any use cases where it adds value? I would be happy to know of such. Does borgbackup keep track of the device ID as well? Also, thanks for the project, I am excited to get going!
Looked at the source for borgbackup, and they do not use the device ID. A description can be found here: https://github.com/borgbackup/borg/blob/3ad8dc8bd081a32206246f2a008dee161720144f/src/borg/archiver.py#L3320-L3329 I suggest we remove the device ID. Can any contributors comment on this? :) I will make a PR if it sounds OK.
Is borg's …
Sorry, this is one of the issues that is keeping me from using restic, so I didn't know restic already had this option. Thanks for pointing it out! I guess just removing the …
The … My suggestion would be to use pseudo device IDs instead. restic could just map device IDs to pseudo device IDs, starting from 0 and incrementing that counter each time it encounters a new device ID. That should essentially let the subvolume always get the same pseudo device ID, which would then ensure that no new tree blobs are created. [Edit] An alternative could be to add a …
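The pseudo-device-ID idea is simple to sketch. Below is a hypothetical Go illustration, not restic code: real st_dev values are replaced by a counter assigned in order of first appearance, so a snapshot whose real device ID changes between backups keeps a stable pseudo ID as long as devices are encountered in the same order.

```go
package main

import "fmt"

// pseudoDevs maps real device IDs to stable pseudo IDs, assigned in the
// order devices are first encountered during a scan. Names are invented
// for illustration.
type pseudoDevs struct {
	next uint64
	ids  map[uint64]uint64
}

func newPseudoDevs() *pseudoDevs {
	return &pseudoDevs{ids: make(map[uint64]uint64)}
}

// get returns the pseudo device ID for a real st_dev value. A ZFS/btrfs
// snapshot whose real device ID changes between backups still gets the
// same pseudo ID, provided the scan order of devices is stable.
func (p *pseudoDevs) get(realDev uint64) uint64 {
	if id, ok := p.ids[realDev]; ok {
		return id
	}
	id := p.next
	p.next++
	p.ids[realDev] = id
	return id
}

func main() {
	p := newPseudoDevs()
	fmt.Println(p.get(0x2f01)) // first device seen -> 0
	fmt.Println(p.get(0x2f02)) // second device -> 1
	fmt.Println(p.get(0x2f01)) // same device again -> 0
}
```

Note the caveat baked into this approach: the mapping is only stable if the first-encounter order of devices is stable across backup runs.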
Since there is no way heuristically for …
Probably …
I think what we should learn from this is: inode and devid are not stable IDs, and they should not go into the metadata storage of the repository - never ever. After each reboot, or even remount, those IDs can change, especially for filesystems using virtual backing devices (like NFS or btrfs subvolumes, probably more) or non-inode filesystems (like FAT etc.); in the case of NFS, even after a server reboot (which is why we have inode generation numbers, which restic probably totally ignores). Thus, detecting hard links can only be done reliably during one scan of the filesystem, and metadata must store this information in an inode/devid-agnostic way. Simply ignoring devid or inode is not a proper fix (unless you don't care about proper hardlink restore, which may have bad side effects if you restore).

Metadata should probably keep track of detected hard links, keep them in a hash, and assign "virtual" IDs (…). During restore this map can be used to recreate hard links to files already extracted. Partial restores should ignore existing hard links not part of the restore set; otherwise it would change existing files which are not part of the restore set - which could be unexpected or even bad. A file has potentially changed if its ctime has changed. This way, we would no longer detect renamed/moved files, but does this matter with dedup backups anyway? The worst thing that happens is reading the file again even if no content was changed, and in that event, at least the file metadata has changed, which we would store anyway in this case.

With this in place, devids should no longer matter at all, and neither do inodes. It's really not the business of the backup repository whether the newly backed up data source comes from different devices or a single/identical one - we just need to know the hardlinks while backing up the current set of files, and this knowledge needs to be fetched from the repository when creating new snapshots and compared with the new situation.
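The "virtual IDs" idea could be sketched like this in Go. All names are invented for illustration; the (dev, ino) map lives only for the duration of one scan and is never persisted, which is the whole point.

```go
package main

import "fmt"

// fileID identifies an on-disk file during a single scan. As argued above,
// (dev, ino) is only meaningful while the filesystem stays mounted, so this
// key is used at runtime only and never written to the repository.
type fileID struct {
	dev, ino uint64
}

// hardlinkTracker assigns snapshot-local virtual IDs to hard-linked files
// and groups their paths. A sketch of the proposal, not restic code.
type hardlinkTracker struct {
	nextID uint64
	ids    map[fileID]uint64
	groups map[uint64][]string
}

func newHardlinkTracker() *hardlinkTracker {
	return &hardlinkTracker{
		ids:    make(map[fileID]uint64),
		groups: make(map[uint64][]string),
	}
}

// add records a path during the scan. Files with a link count below 2 are
// skipped entirely: for them, no hardlink information needs to be stored.
func (h *hardlinkTracker) add(dev, ino, nlink uint64, path string) {
	if nlink < 2 {
		return
	}
	k := fileID{dev, ino}
	id, ok := h.ids[k]
	if !ok {
		id = h.nextID
		h.nextID++
		h.ids[k] = id
	}
	h.groups[id] = append(h.groups[id], path)
}

func main() {
	h := newHardlinkTracker()
	h.add(1, 100, 2, "/data/a")
	h.add(1, 100, 2, "/data/b") // same inode: second name of the same file
	h.add(1, 101, 1, "/data/c") // not hard-linked, skipped
	fmt.Println(h.groups[0])    // both names of virtual ID 0
}
```

At the end of the scan, only the grouped path lists (the values of `groups`) would be persisted; the raw device and inode numbers are discarded.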
The problem is really in the design of storing devid and inode numbers as part of the repository metadata. It should really go away; instead, keep track of the hardlinks only by grouping paths together, store that in the metadata, and use that to reconstruct known valid hardlinks during the next backup session. Also, a backup program should not try to be smart about what the parent snapshot is. It's either a snapshot of the base path currently stored, or it's something the user explicitly names on the cmdline (e.g. if you want to dynamically create a consistent snapshot of …).
Getting back to my previous comment: For this to work, the inode must not be stored in the repository. Instead, restic should use a cache storing inodes (and maybe other metadata) outside of the repository. If the cache is gone, well, so be it; the file needs to be read again even if ctime and other data in the repository matches. This cache should never be backed up in the repository because upon restore, inodes would be different anyway (which also implies why the inode must not be stored in the repository: there may be collisions after a restore and a later backup). Such a cache still won't cover cases where the filesystem has unstable inode numbers. In this case, the user should be able to adjust via cmdline how file-change detection works (ctime, size, mtime, or any combination). This also means that metadata from the repository is not suitable for file change detection anyway: you never know what happened to the data between backups; the user could have rsync'ed the filesystem to a new computer... But then again, no ctime would match, because you cannot set ctime. Because of this I now revise my previous comment: The inode is not needed to detect file changes; ctime is sufficient. But you can use the inode to detect file moves and hardlinks - but only while the inode generation matches and the filesystem hasn't been unmounted. Unmounting the filesystem invalidates all knowledge collected from inodes.
Detecting hard links using those stored device IDs and inode IDs is reliable, because they are stable within the same snapshot.
Yes, this is true. But this discussion originated from the problem of duplicated metadata in successive backup snapshots, and that's why these IDs should not be stored at all but another way of keeping information about the hardlinks should be found, and reconstructed when taking a new snapshot by looking at the current files to be backed up and loading the current devid/inode from there.
We could give each hard-linked file a unique ID. Upon restore, we can then check whether the actual restore target's device IDs match across the hard-linked files to restore the hard link, or perform some sort of fallback/error if the device IDs do not match. This would eliminate the requirement to keep track of device IDs or to store them in the metadata.
@kakra another way of keeping the information would have the same duplication problem, which needs to be solved.
We already have a unique ID (at least, unique within the same snapshot): i.e. the (Device_ID, Inode) tuple. You are essentially proposing to give this a different name (Unique_ID or so), but that does not solve the problem of how to keep it consistent across snapshots in order to prevent metadata duplication. In #3041 (comment) I explained how to make these IDs consistent across snapshots (at least in a best-effort manner). Whether or not you like to rename it is completely orthogonal to solving the problem of duplication.
I propose that we do not need IDs at all, so there would be no duplication. Instead (but this needs additional conversion or migration) I propose that we keep a separate list of hard links grouped by inode/devid (the latter not stored in the metadata). This way, on restore, hard links can be reconstructed. If needed, we can create virtual inode IDs (and probably call it hardlink ID or something similar), which can reduce the amount of data that needs to be stored - but the classic inode/devid should not be stored in the repository, because you cannot ensure that new snapshots are based off the same IDs, and it is virtually impossible to maintain a proper, collision-free mapping from the original IDs to the current ones. We shouldn't try to fit the stored IDs into a purpose they are not meant to be used for.
This is only half of the solution, because inode numbers are not stable IDs in the same way (although in most filesystems, it seems they are) - technically, inode numbers are subject to change, and for e.g. NFS, they may not even be stable during the same snapshot because we do not record the inode generation number. If we start adding that, too, metadata duplication will only increase. I'm not trying to say that your thoughts are wrong, but I think we should first try to define whether our assumptions are really true. And they are not: (dev_id,inode_id) is not a unique tuple during the time of recording a snapshot - it's just very likely. We also need the inode generation number. And storing this in the metadata will greatly increase the chance of seeing new metadata in the next snapshot, essentially duplicating it. And this happens because the design of storing these IDs in the repository - although they are known to be unstable - is probably wrong to begin with. Your design can still work, we "just" (simplified) need to count new hardlinks and store them in the inode field, leaving any other IDs at 0 (also the inode number for regular files), at least for persisted data; runtime needs all the details. And we also need to add the generation number at runtime to be robust backing up NFS (and similarly behaving) sources. As long as we only encounter a few hardlinks, there's not much more to do: duplicating a few hardlink metadata entries would probably be okay. If not, we should look into re-using those IDs for the new snapshot as well as possible. But this makes things complicated, and complicated things are usually more likely to contain bugs. Just storing a grouped list of hard links would be much easier.
Sorry, but no, I'm not. That's what you want to make of it. I proposed to keep an extra list of hardlinked filenames grouped by inode. I just said (indirectly) that if we want to stick with the current fields (which we should remove), then we can probably re-use them for that purpose. But we should be agnostic about inode IDs and thus should not store them at all, e.g., by replacing this number with a new virtual ID. But I still think this makes things more complicated, and it doesn't keep the repository backwards compatible either. Also, inodes are not as unique as they sound. And even if some man page proposes that, it may not be true. Even man pages contain errors or miss some details. What I proposed, tho, is keeping (dev_id,inode) (or whatever identifies a file uniquely) in a cache instead of storing it in the repository. The repository design must work without this information persisted somewhere; the cache can be used to speed things up.
No, you cannot. With automounts, a stat call won't resolve the mount point and mount it; you'll get the wrong device IDs unless the automount already mounted the destination volume. This behavior is documented somewhere deep in the kernel sources, and also why it does this. You'd have to access an existing path within the to-be-mounted volume; this is usually …
Conclusion, and please, @haslersn, don't feel personally offended, because this is not about your comments but the whole thread in general: I feel like there are a lot of misconceptions here about what is unique and when, which information is correctly available at which point in time, and what things really are; and even some of these behaviors changed between kernel versions, e.g. for the automounters (so we should not rely on a specific behavior)...
I think you meant "inode IDs" here, and then, yes, that is generally my idea but instead of persisting this in the metadata per file, we should store it as extra metadata per archive, agnostic of any IDs.
Yes, given you mean "inode IDs". I think we already found that storing device IDs in the repository is quite pointless and they should be ignored (...mostly, because currently they are needed to detect hardlinks uniquely if the backup spans multiple volumes). inode/devid (and probably inodegen) should be stored only at runtime, and not persisted in the archive. To speed up future scans, this data can be stored in a cache, tho. But the cache would pretty much always be stale for btrfs or zfs because of changing devids. And because we do not persist IDs in the archive, we need some other means of knowing which filenames link to the same on-disk file (aka hardlinks). With this, we can still use the old repository format without modifying the existing logic, we create a new format readable by newer restic versions, and we prevent adding edge cases and bugs to the existing logic - which works well for most cases, except those we are adding edge cases for. I'm pretty sure this is not the last edge case and crutch we are going to add, and this is because we are misusing those IDs for a purpose they do not fulfill: they are not stable across remounts/reconnects - so the archives should not try to work around that; archives are by definition their own sort of "mount". Otherwise, we add a mapping here, an edge case there, a complete exception at another place, then another mapping, more edge cases. This will make the code unmaintainable in the long run, and in a few years, nobody will understand what it does.
I very much like this solution and this is, as far as I'm aware, the first time that somebody proposes this solution in this thread. So scrap my solution from #3041 (comment).
Why grouped by inode/devid? I'd assume the data structure can simply be a set of sets of file paths. Two file paths that are in the same set are hard links to the same file. During restore, if those file paths happen to be on different file systems, they of course cannot be hard linked, so two separate inodes are created instead and a warning is logged. Some more brainstorming: In order that we don't need to read the whole set of sets into RAM, I propose to additionally store a SHA-256 reference into the above-mentioned set of sets as part of every file's metadata (call it e.g. …).
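Naming a set of hard-linked paths by its hash can be sketched as follows. This is a hypothetical Go illustration (the sorted, NUL-separated encoding is an assumption, not anything restic specifies): sorting before hashing makes the resulting ID independent of the order in which paths were discovered.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// setID computes a stable SHA-256 name for a set of hard-linked paths by
// hashing the sorted, NUL-separated path list. The same set of paths always
// yields the same ID, so it can serve as a content-addressed reference.
func setID(paths []string) [32]byte {
	sorted := append([]string(nil), paths...) // copy; don't mutate the caller's slice
	sort.Strings(sorted)
	return sha256.Sum256([]byte(strings.Join(sorted, "\x00")))
}

func main() {
	a := setID([]string{"/srv/a", "/srv/b"})
	b := setID([]string{"/srv/b", "/srv/a"}) // discovery order must not matter
	fmt.Println(a == b)
}
```

The NUL separator avoids ambiguity between path boundaries, since NUL cannot appear inside a POSIX path.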
This idea mainly resulted from the observation that storing runtime-specific IDs (inode, devid) is probably the wrong way to go forward in the first place. We can keep that to read older repositories but should avoid it for newer archives, I don't think that we need some migration. Also, since I originally participated in this thread, I learned a lot about inodes and what you can do with them, and what they cannot do, and where the limits are especially across different filesystems, including how device IDs work with volumes not backed by static physical devices. So whatever ideas I had before, I now think we should avoid persisting such IDs into the archives.
While it would have worked, I think it only fixes an edge case and we are going to find more, and then we get code full of edge case handling. So let's do some more brainstorming for the new idea.
Essentially, yes, a set of sets (or whatever name golang uses). "Grouping" just means the following: …
Later, when persisting the sets of hardlinks, they can be "anonymised" by removing the IDs; they are no longer interesting. You end up storing a set of sets. We don't need the IDs because: …
Makes sense, I agree.
We should have another layer of verification because there can be collisions, so consider having more than one reference per SHA256 entry. If the SHA256 is found, it should verify to find the exact matching entry. Collisions are unlikely but not impossible.
Exactly, that's the whole idea. We could probably do some sort of delta hardlink_list, but is that worth the effort? I think someone had stats showing that in their backup sets, around 3% of metadata accounts for hardlinks, so this is a very low number, probably with very low noise. I wonder if it would be easier to just store the complete list; it would mean we store 3% of what we would currently store with metadata duplication. That's a pretty good trade. OTOH, I've not looked at the current implementation details - I believe you know a lot more about it. If it's easy to store just the differences to the hardlink_list, and it's easy to reconstruct it from the parent snapshots, then go for it. What about sets removed from the hardlink_list because they no longer exist in the current snapshot? Will that be properly recorded? This probably involves a breaking format change. For a new client, this is easy: the old code paths are still intact and can work like before for older archive data. We probably could even avoid a one-time duplication by keeping devid/inodeid from the previous snapshot but ignoring them on reads in the new code path, and simply not writing those ID data for new items in the archive. An old client, tho, will only be able to read the older format. As far as I know, there's a format change planned to happen anyway, and having this ready by then would be a good opportunity to get it in. Getting this in should avoid all the headaches we could have in the future with "random" behavior of filesystems wrt object IDs (inode, dev, ...). It keeps the code easy and doesn't need to touch a lot of the existing code paths to still read old archives in the same way it does now. The only "complicated" things to get right are probably the implicit migration from old-format metadata to new-format metadata on the fly, and storing and reconstructing the hardlink_list. If it helps performance, we should consider storing a cache with discovered IDs.
Keeping it out of the archive means we can simply discard it on format change (or environment change), and the only effect is that it's slower one time. The nice thing about the whole idea is: if we restore a backup, we will get new inodes. When stripping this data from the archive, even taking a new backup from restored data would not create duplicate metadata (except if we store the ctime, which we should probably avoid, too, and put in said cache instead). About ctime and a cache, and without knowledge of whether restic already does this: ctime serves no purpose in the archive, because it cannot be restored. If we store it in a cache instead, we can still get faster backups. And if the cache becomes destroyed, yes, we need to do full file scans to find whether a file has changed - we cannot just skip it. But if the remaining metadata is the same, we also get no duplication. This latter idea is a different idea, although it somewhat fits into the concept, and thus may be appropriate to change while at it. Also noteworthy but not part of this problem: we probably still need options to tell which attribute change counts as a backup indicator. Somebody may want to back up only files which are newer, someone else may consider mtime bogus and just consider file size as a change indicator, and some filesystems may not even provide all the details (e.g. FAT has 2-second resolution for file times, I think). Speaking content-wise: a path identifies a file uniquely. A devid/inode is just an internal representation of the file object and should not be taken as a unique identifier of the file unless specific conditions are met, e.g. it's still the same mount and same boot cycle; for network filesystems even that may not be true (hence there are inode generation numbers), and some filesystems do not even know what an inode is and will just create some on the fly using some algorithm, valid as long as it needs to be. So at best, an inode is a time-limited unique identifier.
It just identifies an index node for an unspecified period of time, not the content. But from a user perspective, the path identifies the file and its contents, and that's what we want to back up. Thanks for considering, this looks promising. :-)
We could probably use …
I was not speaking of the SHA-256 of the file. I was speaking of the SHA-256 of the set of file paths. (SHA-256 is considered to be a cryptographically collision-resistant hash function, so we can assume that collisions are never found. The Restic repository format already depends on this assumption by storing every object content-addressed by its SHA-256 hash.) However, there's a problem: We don't know the set of file paths before the scan, so this would require a 2nd pass. I need to think more about how this can be solved in a single pass.
Neither me... ;-) I thought you would be hashing the attributes of a hardlink to create an index into the hashlist which you could refer to from the metadata. But yes, hashing the path list probably works, too. And with this I better understand how you'd implement deduplication of hardlink lists. OTOH, my idea could solve the problem in a single pass because you don't rely on the discovered paths. You could then, at the end, compute the SHA256 over the path list before storing the list in the archive. Slightly offtopic: I'm not sure if restic really "relies" on this. Surely, it uses content-based addressing to identify duplicate blocks, but I'm pretty sure it still compares the contents to be sure they are really identical, and then uses a generated index to store a reference to the block. Or does it really reference blocks exclusively by the SHA256 of the contents? I mean, yes, "collision resistant", but that doesn't mean "collision free". For Git, which has a similar issue with collisions, collisions have already been discovered to craft strange or impossible repositories, tho they couldn't be used to attack the verification chain of the commits yet. So at least I would expect restic to also have a full SHA256 of the complete file (which is composed of individual SHA256-addressed blocks) to have another layer of verification that extracted content at least matches what was originally stored deduplicated in the archive. Otherwise, collisions go unnoticed on extraction. BTW: Calculating the probability of a collision must use the birthday paradox: it is more likely that two hashes collide with a growing number of hashes than when trying to craft a single colliding hash. This is probably what many calculations get wrong, but it's still very unlikely. I found an issue about this: #1732 IOW, given the low number of hashes that hardlink lists would create, it is more likely to find hash collisions in the block storage than in the hardlink storage.
I know …
At the end of the sentence you probably meant "to find a collision for a fixed hash". Then that is true, but for SHA-256, no hash collision is known at all. If a weakness in SHA-256 is found in the future, then there might be shortcuts to find a collision (e.g. by narrowing the search space). But as long as this is not the case, it is expected that a collision will never be found. git is a different story, because it uses SHA-1, which has only 160 bits. It is much easier to produce 2^80 hashes than to produce 2^128 hashes. Furthermore, SHA-1 has known weaknesses which allow for more efficient collision attacks and ultimately have led to successful collision attacks. Such weaknesses are not known for SHA-256.
I agree that getting rid of the device / inodeID is probably the only way to 100% fix this issue while still handling hardlinks. But that doesn't mean that we can't introduce a simpler temporary solution, especially once the feature flags have landed. Finding a good solution for the metadata duplication issue here appears complex enough that we shouldn't rush it into a particular restic version.
The list of paths referencing the same file instance is only available at the end of a backup. So, the final hash for each file instance would have to be injected in a second pass, which is a complexity disaster. The backup would have to iterate through the whole snapshot and modify the data it just wrote. To make matters worse, all sets of sets would still have to be kept in memory. The alternative of only storing some id (instead of the SHA256) that references some kind of mutable set is even worse, as it would introduce mutable data into the repository format. I only see two ways to store the set of sets:
The "large set" approach also has the problem that it will likely require some additional mechanism to ensure that the set can be (partially) deduplicated across snapshots; otherwise, we'd just partially recreate the current problem. The inline approach on the other hand may require changes in multiple places if the first file of a hardlink set gets removed. But except for that it is far, far simpler to implement. (no need for extra set datastructures in the repository, which also requires garbage collection changes; when only restoring a part of a backup, also only the relevant part of the hardlink sets have to be reconstructed)
Change detection is out of scope for this issue. Let's focus on hardlink detection here.
@MichaelEischer Thanks for considering the ideas.
That's why I suggested to just use an index counter into the sets mapping (devid/inodeid/...) -> (ID, [paths...]). Once you encounter a file with nlinks > 1, look up devid/inode and maybe inodegen in the map; if it is there, record the additional paths and refer to the ID, otherwise add a new free ID and refer to that. After the snapshot is completed, all IDs can be recorded (maybe by sorting the paths and storing SHA256 references so it can be deduplicated, I don't know the inner workings), but do not record the (devid/inodeid/...) part, because that is unstable. When reading a parent snapshot, let's first read this mapping and clean it of files with ncount < 2 (no hardlink, or not found), and recreate the (devid/inodeid/...) information by looking up the paths; remove conflicting files (as in devid/inode/...) from the set of paths, then remove all maps with only one or zero paths left. This can be done before scanning the file system, so the following changed-files scan can then re-use what was already discovered and add new hardlinks, and it would still work in a single pass. The "rebuild" part of the devid/inodeid/... can be omitted if we do not refer to uniquely created IDs (and store the "large set" instead, see below). Essentially, this is your idea of the … To deduplicate the large set, we could store a list of SHA256 hashes belonging to the snapshot, and then store each set of paths individually as such a SHA256 reference. As long as the paths per set are sorted before hashing, the IDs remain stable. To eliminate duplication as well as possible, this list of SHA256 hashes could be hashed into a single value, too, and then referenced by the snapshot. But I think the inline idea probably works better - as you suggested (tho, I think it's a little more complicated to implement correctly). The "large set" idea would still record a full new list of SHA256 references in case of changes.
I just mentioned "change detection" because we are going to remove devid/inode from the backup snapshots, and that will probably touch change detection, which is actually why the whole issue was posted in the first place. So we need a new definition of what change detection means. Currently, a file is probably recorded into the snapshot if the inode/devid changed (among other hints). Is it recorded just because those IDs changed, or will it still be recorded because, e.g., the ctime changed?
That results in unstable IDs that are extremely dependent on whether a parent snapshot is used or not. The effort to maintain such a mapping is also rather significant.
Such a prescan is a second pass.
There's not much housekeeping necessary here. The map is reconstructed at runtime, and if the reference path vanishes, then all other instances of that hardlink will simply store a new reference path. That way there's no need for housekeeping. This variant might be slightly inefficient in some cases, but is really simple as it does not have to pass on data between snapshots.
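A minimal sketch of this runtime-only reference-path map (type and method names are invented for illustration; restic's actual structures differ):

```go
package main

import "fmt"

// key identifies a file on the live filesystem while the snapshot is
// taken; it is never written to the repository.
type key struct {
	dev, ino uint64
}

// tracker remembers, per (device, inode), the first path under which a
// hardlinked file was seen. Later occurrences just refer to that path.
type tracker struct {
	refPath map[key]string
}

// reference returns the path to record for this file and whether the file
// was seen before (i.e. it is another name for an already-stored file).
func (t *tracker) reference(dev, ino uint64, path string) (string, bool) {
	k := key{dev, ino}
	if ref, ok := t.refPath[k]; ok {
		return ref, true
	}
	t.refPath[k] = path
	return path, false
}

func main() {
	t := &tracker{refPath: map[key]string{}}
	fmt.Println(t.reference(1, 42, "/a/file")) // first sight: its own path
	fmt.Println(t.reference(1, 42, "/b/link")) // hardlink: refers to /a/file
}
```

If the reference path vanishes before the next run, the map is simply rebuilt and another instance of the hardlink becomes the new reference, which matches the "no housekeeping" property described above.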
I don't think that is a good idea. The repository format and in particular the repository index are not designed to accommodate large numbers of very small objects. Those can drastically increase the memory consumption of the repository index, which is already rather high.
I think the implementation effort and complexity is rather the other way around. The large set approach will likely add 1k+ lines of code (not counting tests), whereas the inline approach should just require a few hundred lines of code.
Please take a look at `restic/internal/archiver/archiver.go`, line 500 at commit 9284f74.
Is the problem here really that inodes are changing? The data from #3041 (comment) suggests that the primary problem is the deviceID. By replacing devid+inode with some other mechanism to detect hardlinks, we no longer need both for hardlink detection. That allows removing the devid, whereas the inode is also used for change detection. If necessary, we could beef up the
The metadata stored for a file is always regenerated from scratch no matter whether the file exists in the parent snapshot or not. That ensures that restic doesn't accidentally miss file metadata changes. The "change detection" only determines whether the file content is read again or not.
As I am getting lots of notification mails on this topic, I want to add my 2 cents (or rather, some comments which are hopefully useful).

About hard link identification

As a matter of fact, hard links are determined by looking at nlinks, inode and devid.

About the snapshots

We must always take into account that snapshots (when writing "snapshots" I mean all the trees referenced by a snapshot) may be a subset of a device. We may have excluded parts which in fact contain hard links to parts we have included. To make things worse, we must take into account that a snapshot may be modified later.

About restore

We have a similar situation as during backup when restoring: we may just restore a

About change detection

I think the main topic here is about change detection and not actually how to store hard link information in the snapshots. There are people with changing devid/inodes who don't want files to be re-read and metadata to be duplicated if there was no change. Now, the re-reading of files can be adjusted in the parent detection. However, I don't see why - in the case where we have stated that a file didn't change w.r.t. the parent - we would want to take the content from the parent without reading the file, but not take the hard link information from the parent. IMO, whatever we use to save information about hard links should just be copied over from the parent in the case of a match!

About some suggestions:
My alternative proposal:
No, it's not unique in a general sense. This is the whole point here. It can be seen as mostly stable while the snapshot is taken. Actually, when taking snapshots of network file systems, inodegen should also be taken into account. But other than that, it serves no purpose to store it in the archive, because it won't tell you anything about the file identity in the archive - at least not when referencing it later from a child snapshot, or when crossing file-system boundaries while creating the snapshot (in that latter case, the inode won't be unique by definition without the devid). But yes, for snapshot creation we can keep things as-is. The current implementation works for identifying hard links. If it is as easy as ignoring devid/inode for snapshot creation in the context of the parent, then go for it. But I wonder what happens if we later restore such a snapshot? The "missing" files are inherited from the parent, thus we are going to see potentially incompatible devid/inode, and now, how do we reliably (and correctly) restore hardlinks? IMO, we need some way to identify hardlinks by a compatible identifier across snapshot inheritance. Besides that, I really appreciate a KISS approach to the problem; thanks for your insights @aawsome. I think these thoughts of yours are really important:
By "in-place" you mean restoring a file into the contents of an existing file, possibly truncating it, without replace or unlink/rename? I think this is always the wrong way of doing it, so users who want to do it may well be shooting themselves in the foot. This can easily be "fixed" by documenting the pitfalls. Or do you mean just restoring into an existing set of files, possibly correctly replacing existing files, instead of restoring into a completely empty directory?
While this may be a tempting idea, it is almost always a very bad one - unless you have a very specific set of files (e.g., an NNTP spool). Maybe this should be left to external tools which create checksums of existing files and then transform identical contents into hardlinks. restic should probably not try to mimic such behavior.
Is it actually a good idea to have spontaneously forming hardlinks when merging two snapshots? I'd rather not hardlink files than risk accidentally hardlinking the wrong files.
I'm not sure where that claim originated and why it continues to be repeated in this issue, but the change detection never checked the deviceID. (The tree blob deduplication is, by construction, obviously sensitive to deviceID changes.) Based on the description, that option should rather be called
Having the "wrong" deviceID/inode would not be a problem in itself. However, based on how deviceIDs are assigned by Linux (see https://www.kernel.org/doc/Documentation/admin-guide/devices.txt), this will very, very likely result in aliasing between different deviceIDs. As a result, it wouldn't be surprising to see the same inode+deviceID in a snapshot refer to two completely unrelated files. That's just asking for trouble (it's only a little bit better than always omitting the deviceID). As a bonus
Simple: it's impossible to reliably restore hardlinks using that construction. "Snapshot inheritance" is a concept that does not exist in restic. All tree metadata in a snapshot is always generated from scratch, and if it is identical to an existing tree blob in the repository, then it can be deduplicated. The change detection in the
I think I have a reasonable solution to this issue. If people agree with the approach, I'm happy to try to implement it myself. Quick recap: it's possible for device IDs to change between backup runs even when nothing on disk has actually changed.
My idea is: whenever we see a new device ID, don't record the real value; instead, map it to a stable substitute, reusing what the existing backup stored for that path where possible.
Hard-link detection still works, and the new backup can re-use as much of the old backup's trees as possible.
How are you fabricating the new devid such that it will not accidentally conflict with a real devid in a subsequent backup?
The idea is that subsequent backups would see that the devid is already used by the backup and map it to something else. Something like:

```
storedDev, ok = devMap[realDev]
if !ok {
	storedDev, ok = getDevFromExistingBackup(path)
	if !ok {
		storedDev = generateUniqueDev(devMap)
	}
	devMap[realDev] = storedDev
}
```
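Fleshing that snippet out into something runnable: the sketch below keeps the two helper names from the pseudocode, but their bodies (a path-to-devid table standing in for the parent snapshot, and a sequential allocator) are invented stand-ins for illustration:

```go
package main

import "fmt"

// devMapper translates volatile on-disk device IDs into stable stored ones.
type devMapper struct {
	devMap map[uint64]uint64 // real devid -> stored devid
	used   map[uint64]bool   // stored devids already handed out this run
	parent map[string]uint64 // path -> devid recorded in the parent snapshot (stub)
	next   uint64
}

// fromParent stands in for getDevFromExistingBackup: it looks up which
// devid the parent snapshot stored for this path.
func (m *devMapper) fromParent(path string) (uint64, bool) {
	id, ok := m.parent[path]
	return id, ok
}

// generateUnique hands out a stored devid not used by this snapshot yet.
func (m *devMapper) generateUnique() uint64 {
	for m.used[m.next] {
		m.next++
	}
	return m.next
}

// storedDev implements the comment's idea: reuse the parent's devid for
// the path if there is one, otherwise fabricate a fresh, unused one.
func (m *devMapper) storedDev(realDev uint64, path string) uint64 {
	if id, ok := m.devMap[realDev]; ok {
		return id
	}
	id, ok := m.fromParent(path)
	if !ok || m.used[id] {
		id = m.generateUnique()
	}
	m.devMap[realDev] = id
	m.used[id] = true
	return id
}

func main() {
	m := &devMapper{
		devMap: map[uint64]uint64{},
		used:   map[uint64]bool{},
		parent: map[string]uint64{"/home": 1}, // parent stored devid 1 for /home
	}
	// The real devid changed from 1 to 77 (e.g. a new ZFS snapshot), but
	// the stored devid stays stable because the parent mapping is reused.
	fmt.Println(m.storedDev(77, "/home"))     // 1
	fmt.Println(m.storedDev(77, "/home/sub")) // 1 (same real device)
}
```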
Yeah, the idea makes sense to me, I'm just curious about how
It doesn't have to. Notice that the real device ID isn't used any more (except as a map key).
I can't shake the feeling that we've already discussed something similar (dynamically generating a devId map) somewhere above (although not all parts of the id assignment scheme below). But I don't have time to read this novel (aka. this issue) again right now; that will have to wait until restic 0.17.0 is done.
If that method sequentially assigns devIds as in the example above, then it's very easy to cause a new mountpoint to renumber all devIds in a new snapshot: just add a new mountpoint that is backed up before the existing mountpoints. That is, the generated id should yield as few collisions as possible, either by picking completely random IDs or by somehow deriving them from the filepath. The latter variant has the benefit that it would in most cases also result in stable devIds even if no parent snapshot was detected. As pseudocode, the filepath-derived variant would look something like the following:

```
func generateUniqueDev(devMap, realDev, path):
	id := hash(path)
	for {
		if mappedIdInMap(devMap, id) {
			# use randomId in case of collisions
			id = randomId()
		} else {
			devMap[realDev] = id
			return id
		}
	}
```

That has a few nice properties:
Downsides:
The
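For illustration, the filepath-derived variant could look like this in real Go, here using an FNV-1a hash and deterministic collision probing instead of random IDs (all names are mine, not restic's):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pathDerivedDev derives a stored devid from the mountpoint's path.
// Hashing the path keeps stored devids stable across renumbered real
// device IDs, and even when no parent snapshot is available.
func pathDerivedDev(used map[uint64]bool, path string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(path))
	id := h.Sum64()
	// Resolve collisions deterministically; the pseudocode above picks a
	// random id instead, this probing is a simplification.
	for used[id] {
		id++
	}
	used[id] = true
	return id
}

func main() {
	used := map[uint64]bool{}
	a := pathDerivedDev(used, "/srv/data")
	// Same path again within one run simulates a collision and therefore
	// yields a different id.
	b := pathDerivedDev(used, "/srv/data")
	fmt.Println(a != b)
}
```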
For all who are looking for a working solution right now, without the need to apply patches: it still detects parents as changed - not perfect, but much better than metadata changes on anything and everything.
I run my backups from a ZFS snapshot in order to ensure the entire file-system is in a consistent state. After I upgraded to restic 0.10.0 from the previous official release, the backup started adding a duplicate copy of all the directory metadata while claiming that all the directories had been changed. For example (pardon my bash):
The result occurs repeatedly after re-running the backup on a new ZFS snapshot of an otherwise static file-system. I expect it to work like the previous version, in which directories were not seen as "changed".
I tested this on the same file-system but without using a ZFS snapshot, and it does not report directories as "changed" or upload duplicate metadata. Therefore, this problem seems to be particular to using ZFS snapshots. My method for backing up from ZFS snapshots is as follows:
I find it interesting that restic is uploading new/unique directory metadata with every run, suggesting that something about the directory metadata is actually changing between runs. However, earlier versions of restic did not "see" these changes. I'm at a loss as to what's causing this.
In terms of severity, this is merely a nuisance to me (roughly 30 MiB added to the repo each day). However, I could see this being a bigger problem on systems with a lot more small files. Is there any way to find out from the command line which aspect of the directory is being identified as "changed"? Adding verbosity did not appear to do the trick.