Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref: use fs.checksum instead of get_mtime_and_size #6495

Merged
merged 1 commit into from
Aug 27, 2021
Merged

ref: use fs.checksum instead of get_mtime_and_size #6495

merged 1 commit into from
Aug 27, 2021

Conversation

efiop
Copy link
Member

@efiop efiop commented Aug 26, 2021

We've been using get_mtime_and_size(+inode in state) as a checksum, to make sure that file didn't change. We don't really care about any underlying values, only the final checksum that is a combination of fs-specific values (mtime, size, inode, etc).

In case of ref objects, we are only using get_mtime_and_size for files and not for directories, which makes it really easy to migrate to fsspec-compatible fs.checksum(). We also don't ever save refodb anywhere persistent, so we can tweak checksums later if we find a need.

Pre-requisite for making state work with any filesystem.

@efiop efiop requested a review from a team as a code owner August 26, 2021 20:28
@efiop efiop requested review from pared and pmrowla and removed request for pared August 26, 2021 20:28
@efiop efiop added the refactoring Factoring and re-factoring label Aug 26, 2021
@efiop efiop changed the title [WIP] ref: use fs.checksum instead of get_mtime_and_size ref: use fs.checksum instead of get_mtime_and_size Aug 26, 2021
@pmrowla pmrowla merged commit 95ba68d into master Aug 27, 2021
@pmrowla pmrowla deleted the system branch August 27, 2021 03:37
@skshetry
Copy link
Member

I'm thinking of extending Index/State to keep track of sizes too so that we can easily get the total sizes that we are going to fetch/push. So, we should still use (mtime, size, path).

@efiop
Copy link
Member Author

efiop commented Aug 27, 2021

@skshetry What do you need mtime for?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Factoring and re-factoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants