sync --preserve checks mtime #35

Closed

Contributor

@mdomsch mdomsch commented Mar 2, 2012

This causes an extra HEAD request for each remote file, which greatly
slows down execution and increases monetary cost ($0.01 per 10,000
requests), but guarantees that files whose mtime has changed will get
resynced.

This is necessary for yum repositories, where repodata/* files may be
updated without changing size. It also correctly handles large files,
for which the md5 values returned by S3 are incorrect, when their content
(and thus mtime) has changed, perhaps through RPM signing.
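
As a rough illustration of the monetary side of that trade-off, the arithmetic can be sketched as below. The function name and the default price are assumptions taken from the figure quoted above (one HEAD request per remote file, $0.01 per 10,000 requests); this is not code from the patch, and current S3 pricing may differ.

```python
def head_request_cost(n_files, price_per_10k=0.01):
    """Estimate the request cost (in dollars) of one sync run.

    Assumes exactly one extra HEAD request per remote file and the
    per-10,000-request price quoted in this pull request.
    """
    return n_files / 10_000 * price_per_10k


if __name__ == "__main__":
    # Hypothetical repository sizes, chosen only for illustration.
    for n in (10_000, 250_000, 1_000_000):
        print(f"{n} files: ${head_request_cost(n):.2f} per sync run")
```

Under these assumptions the per-run cost is small; the wall-clock time of one round trip per file is the larger concern.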

Contributor Author

mdomsch commented Mar 2, 2012

We're trading a ton of local disk I/O, needed to calculate the md5 of each file, for a HEAD call to S3 for each file.
We do get the LastModified (upload) time from S3 without the HEAD call.
I wonder if we can simply look at files with mtimes newer than LastModified,
and assume that any file whose mtime is newer than LastModified needs to be updated.

For regularly occurring sync runs, I think that's valid...
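
A minimal sketch of that heuristic, assuming the LastModified value arrives as the ISO 8601 UTC string that S3 bucket listings return. The helper names `parse_last_modified` and `needs_upload` are hypothetical and are not part of s3cmd:

```python
import os
from datetime import datetime, timezone


def parse_last_modified(value):
    """Parse a listing timestamp such as '2012-03-02T10:00:00.000Z' (assumed format)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def needs_upload(local_path, remote_last_modified):
    """Return True if the local file's mtime is newer than the remote upload time.

    No HEAD request and no md5 calculation are involved: only a local stat()
    and the LastModified value already present in the bucket listing.
    """
    local_mtime = datetime.fromtimestamp(os.stat(local_path).st_mtime, tz=timezone.utc)
    return local_mtime > remote_last_modified
```

This only catches files modified locally after the last upload, so it relies on sync runs happening regularly and on reasonably accurate local clocks.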

Contributor Author

mdomsch commented Mar 2, 2012

With this patch, syncing takes 10x longer. Probably the wrong approach then. Maybe LastModified as a proxy for mtime is good enough...

Contributor Author

mdomsch commented Jul 14, 2012

Killing this pull request. What I've done elsewhere in my tree works better w/o the I/O penalty.

@mdomsch mdomsch closed this Jul 14, 2012