Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: don't set dir modtimes if already set #7657

Merged
merged 1 commit into from Mar 7, 2024

Conversation

nielash
Copy link
Collaborator

@nielash nielash commented Mar 2, 2024

What is the purpose of this change?

Before this change, directory modtimes (and metadata) were always synced from src to dst, even if already in sync (i.e. their modtimes already matched.) This potentially required excessive API calls, made logs noisy, and was potentially problematic for backends that create "versions" or otherwise log activity updates when modtime/metadata is updated.

After this change, a new DirsEqual function is added to check whether dirs are equal based on a number of factors such as ModifyWindow and sync flags in use. If the dirs are equal, the modtime/metadata update is skipped.

For backends that require setDirModTimeAfter, the "after" sync is performed only for dirs that could have been changed by the sync (i.e. dirs containing files that were created/updated.)

Note that dir metadata (other than modtime) is not currently considered by DirsEqual, consistent with how object metadata is synced (only when objects are unequal for reasons other than metadata).

To sync dir modtimes and metadata unconditionally (the previous behavior), use --ignore-times.

Was the change discussed in an issue or in the forum before?

Checklist

  • I have read the contribution guidelines.
  • I have added tests for all changes in this PR if appropriate.
  • I have added documentation for the changes if appropriate.
  • All commit messages are in house style.
  • I'm done, this Pull Request is ready for review :-)

@nielash nielash added this to the v1.66 milestone Mar 2, 2024
@nielash nielash force-pushed the dir-sync-equal branch 3 times, most recently from a5c9771 to 4ea5697 Compare March 3, 2024 06:19
@nielash nielash marked this pull request as ready for review March 3, 2024 06:42
@nielash nielash requested a review from ncw March 3, 2024 06:42
Copy link
Member

@ncw ncw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really nice - thank you :-)

I put some specific nits inline.

General comments:

  • Please kill path.ToSlash with fire :-) These should not appear in core sync routines - they are supposed to translated to / at the edges and be / within rclone. If \ are getting in to the core then that needs fixing, but not here!
  • The bisync tests are quite annoying aren't they!
  • I wonder if we could combine the s.modifiedDirs and s.setDirModTimes somehow... We are storing a lot of directory objects we don't need to potentially. Perhaps we don't know until the end of the sync.

fs/operations/operations.go Outdated Show resolved Hide resolved
fs/operations/operations.go Show resolved Hide resolved
fs/operations/operations.go Outdated Show resolved Hide resolved
fs/operations/operations.go Outdated Show resolved Hide resolved
fs/sync/sync.go Outdated
@@ -399,6 +402,11 @@ func (s *syncCopyMove) pairChecker(in *pipe, out *pipe, fraction int, wg *sync.W
fs.Errorf(pair.Dst, "Source and destination exist but do not match: %v", err)
s.processError(err)
} else {
if pair.Dst != nil {
s.markDirModified(path.Dir(filepath.ToSlash(pair.Dst.Remote())))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not need filepath.ToSlash here - all paths should be unix / delimited in rclone internals.

If they aren't then that is a bug somewhere else :-)

Please remove them all from this file! Also see later suggestion for a function markDirModifiedObject...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all the filepath usages here :) I think the error I was seeing the other day may just have been a transient fluke of GitHub Actions (I've been seeing more of those lately)

Here is an example of what I mean though about the path library causing issues on Windows with rclone's unix paths: this test fails only on Windows. (And I confirmed it also fails on a real PC, not just GH Actions)

I think the issue in this particular case is path.Join removing the double // and then rclone adding it back incorrectly. But there may be other issues as well.

I think it would be really handy to have a more comprehensive library of rclone-safe path manipulation functions!

fs/sync/sync.go Show resolved Hide resolved
fs/sync/sync.go Show resolved Hide resolved
fs/sync/sync.go Outdated Show resolved Hide resolved
@nielash
Copy link
Collaborator Author

nielash commented Mar 5, 2024

  • Please kill path.ToSlash with fire :-) These should not appear in core sync routines - they are supposed to translated to / at the edges and be / within rclone. If \ are getting in to the core then that needs fixing, but not here!

Haha, will do! There definitely is some issue, I think specific to Windows -- I've run into it multiple times on this and other projects. But I agree this is not the proper fix for it. I'll take these out and try to see if I can find the true cause.

  • The bisync tests are quite annoying aren't they!

They sure are! I would estimate that my time on this PR was maybe 10% writing code and 90% debugging the bisync tests. But on the other hand, this is a good example of an issue worth knowing about that a more conventional unit test would probably miss.

  • I wonder if we could combine the s.modifiedDirs and s.setDirModTimes somehow... We are storing a lot of directory objects we don't need to potentially. Perhaps we don't know until the end of the sync.

Yeah, that is basically the problem. At the point that s.setDirModTimes is set, we haven't yet recursed into the directory and so we don't know if there are changed files within it. It might be possible to do it after each directory is done (instead of waiting til the end of the sync), or at least release the ones we know we definitely won't need, but I'm not sure how much difference that would make, and would probably add a lot of complexity trying to coordinate that across channels. IMO it doesn't seem worth it.

@nielash nielash force-pushed the dir-sync-equal branch 2 times, most recently from ac733f1 to ca616d7 Compare March 6, 2024 14:40
Copy link
Member

@ncw ncw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking great except for fspath.Dir which isn't correct!

See suggestions inline - thank you :-)

fs/fspath/path.go Outdated Show resolved Hide resolved
fs/fspath/path.go Outdated Show resolved Hide resolved
fs/sync/sync.go Show resolved Hide resolved
Before this change, directory modtimes (and metadata) were always synced from
src to dst, even if already in sync (i.e. their modtimes already matched.) This
potentially required excessive API calls, made logs noisy, and was potentially
problematic for backends that create "versions" or otherwise log activity
updates when modtime/metadata is updated.

After this change, a new DirsEqual function is added to check whether dirs are
equal based on a number of factors such as ModifyWindow and sync flags in use.
If the dirs are equal, the modtime/metadata update is skipped.

For backends that require setDirModTimeAfter, the "after" sync is performed only
for dirs that could have been changed by the sync (i.e. dirs containing files
that were created/updated.)

Note that dir metadata (other than modtime) is not currently considered by
DirsEqual, consistent with how object metadata is synced (only when objects are
unequal for reasons other than metadata).

To sync dir modtimes and metadata unconditionally (the previous behavior), use
--ignore-times.
Copy link
Member

@ncw ncw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect - thank you :-)

I'll merge this now.

@ncw ncw merged commit 8c69455 into rclone:master Mar 7, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants