Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fclones dedupe results in updates to mtimes on directories #93

Closed
ivan opened this issue Nov 24, 2021 · 2 comments · Fixed by #106
Closed

fclones dedupe results in updates to mtimes on directories #93

ivan opened this issue Nov 24, 2021 · 2 comments · Fixed by #106
Labels
bug Something isn't working

Comments

@ivan
Copy link
Contributor

ivan commented Nov 24, 2021

Running fclones dedupe on some directory tree results in mtimes being updated for directories containing files that were deduplicated.

I don't know whether this should be addressed, because while mtimes may be desirable to preserve, the directories really were updated through file creation. This effect does make it less likely that I would want to use fclones dedupe on old directory trees with potentially informative mtimes, though.

@pkolaczk pkolaczk added the bug Something isn't working label Nov 25, 2021
@th1000s
Copy link
Contributor

th1000s commented Mar 9, 2022

Given the dedupe action is almost a userspace-transparent operation I agree that mtimes timestamps should not be affected.

Where is the right place to collect the timestamps beforehand? If placed inside reflink() it would stat directories multiple times, and returning the result io::Result<..> is probably annoying. Though maybe a concurrent hashmap could be passed along to skip that step if possible.

Where could it be done earlier? The iterator passed to run_script() is lazy so the stat'ing has to be woven somewhere into it - and it must reduce the result to unique directories. Not sure if this "dual flow" of files and directories is really possible.

Then the space-efficient fclones::Path has to be converted to a PathBuf to do the stat - should this be kept or should the conversion be repeated later when restoring the timestamps? Also, all other operations don't need that save-restore step at all, so skipping all this should not be expensive.

pkolaczk added a commit that referenced this issue Mar 20, 2022
Deduplicating a file should not affect the mtime of its parent
directory.

Fixes #93
@pkolaczk
Copy link
Owner

If placed inside reflink() it would stat directories multiple times

I implemented the simplest solution for now - I placed restoring in the reflink.
I believe the performance is not a big problem in this case - stat is way faster than actual deduplicating and stat called on the same file multiple times is cached by the OS.

pkolaczk added a commit that referenced this issue Mar 20, 2022
Deduplicating a file should not affect the mtime of its parent
directory.

Fixes #93
pkolaczk added a commit that referenced this issue Mar 20, 2022
Deduplicating a file should not affect the mtime of its parent
directory.

Fixes #93
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants