Reduce time stamp precision #12

Closed
yemartin opened this issue Jun 24, 2019 · 4 comments

@yemartin

The really nice thing about cshatag, compared to other tags file solutions like chkbit, is that the tag follows the file along when the file is moved or copied, as long as the destination filesystem supports extended attributes.

But this unfortunately breaks when the time resolution of the target filesystem is lower than that of the original filesystem. This would prevent detecting bit corruption that happened during move or copy operations.

For example, using the Go rewrite, and with:

  • /tmp on my root filesystem (APFS)
  • /Volumes/Organizer from my NAS, mounted through SMB (SMB_3.02)

$ rm /Volumes/Organizer/test.bin \
; touch /tmp/test.bin \
&& cshatag /tmp/test.bin \
&& mv /tmp/test.bin /Volumes/Organizer/ \
&& cshatag /Volumes/Organizer/test.bin

remove /Volumes/Organizer/test.bin? y
<outdated> /tmp/test.bin
 stored: 0000000000000000000000000000000000000000000000000000000000000000 0000000000.000000000
 actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1561415148.563117837
<outdated> /Volumes/Organizer/test.bin
 stored: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1561415148.563117837
 actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1561415148.000000000

The second cshatag call, on the SMB share, considers the tag outdated. If corruption had happened during the move operation, cshatag would have missed it.

Suggestion: if I remember correctly, FAT is probably the lowest common denominator, with 2-second timestamp resolution. So to ensure maximum compatibility, cshatag should consider the file unchanged if the file timestamp is within +/- 2 seconds of the tag timestamp.

So, to sum it up bug-report style:
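
A minimal Go sketch of the comparison suggested here, using the two timestamps from the output above. The names `withinTolerance`, `storedMtime`, `smbMtime` and `fatTolerance` are hypothetical and are not taken from the cshatag source; the 2-second window is the assumption stated in the suggestion, not a verified property of every filesystem.

```go
package main

import (
	"fmt"
	"time"
)

// Timestamps taken from the output above: the tag written on APFS and the
// mtime reported by the SMB share after the move.
var (
	storedMtime = time.Unix(1561415148, 563117837)
	smbMtime    = time.Unix(1561415148, 0)
)

// fatTolerance is the 2-second window suggested above (an assumption about
// the coarsest filesystem, not a value cshatag is known to use).
const fatTolerance = 2 * time.Second

// withinTolerance reports whether the two modification times differ by at
// most tol, in either direction.
func withinTolerance(stored, actual time.Time, tol time.Duration) bool {
	diff := actual.Sub(stored)
	if diff < 0 {
		diff = -diff
	}
	return diff <= tol
}

func main() {
	fmt.Println("exact match:  ", withinTolerance(storedMtime, smbMtime, 0))
	fmt.Println("2 s tolerance:", withinTolerance(storedMtime, smbMtime, fatTolerance))
}
```

With a zero tolerance the two timestamps compare as different, which corresponds to the `<outdated>` result shown above; with the 2-second window they compare as equal.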

Current behavior

<outdated> /Volumes/Organizer/test.bin
 stored: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1561415148.563117837
 actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 1561415148.000000000

Expected behavior

<ok> /Volumes/Organizer/test.bin

Do you think this makes sense, and would it be possible to add this to the Go rewrite?

@es80
Contributor

es80 commented Nov 6, 2019

Hello.
I've run into a similar issue copying files between ext4 and NTFS partitions.

On NTFS file systems the mtime resolution is 100ns. If I copy a file from an ext4 to an NTFS file system using NTFS-3G, the last two digits of the file modification timestamp become 0.

Subsequent use of cshatag against the same file on NTFS will indicate the file is outdated even if the file has not been touched.

For example:

$ cshatag foo.txt
<outdated> foo.txt
 stored: 0000000000000000000000000000000000000000000000000000000000000000 0000000000.000000000
 actual: 181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b 1572947268.840550551

$ cshatag foo.txt
<ok> foo.txt

$ cp -a foo.txt /home/$USER/ntfs_mount/

$ cshatag /home/$USER/ntfs_mount/foo.txt
<outdated> /home/$USER/ntfs_mount/foo.txt
 stored: 181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b 1572947268.840550551
 actual: 181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b 1572947268.840550500

$ cshatag /home/$USER/ntfs_mount/foo.txt
<ok> /home/$USER/ntfs_mount/foo.txt

This doesn't seem to happen between other filesystems with different mtime granularity (I tested ext2, ext3 and ext4; cshatag wouldn't work for me for files on FAT or exFAT).
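
The truncation seen in the listings in this thread can be reproduced with plain integer arithmetic. This is only an illustrative Go sketch: `truncateNsec` is a hypothetical helper, and it assumes the target filesystem rounds the nanosecond field down to a multiple of its granularity, which is what the outputs above suggest.

```go
package main

import "fmt"

// truncateNsec rounds a nanosecond field down to a multiple of granularity
// (also given in nanoseconds). It models what the copies in this thread
// appear to do to the mtime; it is not code from cshatag or NTFS-3G.
func truncateNsec(nsec, granularity int64) int64 {
	return nsec - nsec%granularity
}

func main() {
	// ext4 -> NTFS: 100 ns granularity drops the last two digits.
	fmt.Printf("%09d\n", truncateNsec(840550551, 100))
	// APFS -> SMB share in the first report: whole seconds only.
	fmt.Printf("%09d\n", truncateNsec(563117837, 1000000000))
}
```

The first call yields 840550500 and the second yields 0, matching the "actual" timestamps reported for the NTFS copy and the SMB move respectively.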

For my own use I wrote some code so that cshatag could be used with an -ntfs flag which then ignores any discrepancy under 100ns.

I think it is worth keeping the full width of the time stamp as a default. The simplest solution would be to have a different type of <ok> output to indicate that, although the timestamp is different, the file is identical. For example:

$ cshatag foo.txt
<unchanged> foo.txt

This only needs one extra 'if' condition in the code. A more complex solution would be command-line flags specifying an acceptable time discrepancy such as 2s, 1s, 1ms, 100ns, 1ns (default). For example:

$ cshatag -time=100ns foo.txt
<ok> foo.txt

I would be happy to submit pull requests for either if @rfjakob is interested.
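
For illustration, the proposed `-time=` option could be parsed with the standard `flag` package roughly as follows. This is a sketch of the proposal only: nothing in this thread confirms that cshatag has such a flag, and `parseTolerance` and the flag set name are hypothetical. The default of 1ns follows the list above; if any discrepancy strictly under the tolerance is ignored, 1ns is equivalent to an exact comparison.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

// parseTolerance parses a hypothetical -time flag and returns the tolerance
// together with the remaining (file name) arguments.
func parseTolerance(args []string) (time.Duration, []string, error) {
	fs := flag.NewFlagSet("cshatag-sketch", flag.ContinueOnError)
	// flag.Duration accepts values such as 2s, 1s, 1ms, 100ns and 1ns.
	tol := fs.Duration("time", time.Nanosecond, "acceptable mtime discrepancy")
	if err := fs.Parse(args); err != nil {
		return 0, nil, err
	}
	return *tol, fs.Args(), nil
}

func main() {
	tol, files, err := parseTolerance(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("tolerance=%v files=%v\n", tol, files)
}
```

Using `time.Duration` keeps the accepted units consistent with the examples in the proposal without any custom parsing.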

@yemartin
Author

yemartin commented Nov 14, 2019

@es80 I was also thinking of adding an option at first. I like your -time= idea the best, by the way.

But then I realized: what happens when you do have some bit corruption during your cross-filesystem transfer, and forget to use that -time option? A false negative: bit corruption happened, but it goes undetected, and detecting it is the first job of cshatag.

Take the opposite case: we modify cshatag to ignore time differences of up to 2s. Now what happens when there is bit corruption during a cross-filesystem copy? We catch it, yeah! Now what if there is a legitimate file change within 2s of the original cshatagging? We get a false positive. The user may get a scare, but no harm done.

So for me, while the option was a good idea, it needs to be implemented as the default behavior. What do you think, @es80 and @rfjakob?

@es80
Contributor

es80 commented Nov 15, 2019

Yes, that's a good point. I wrote some code for a -time option which sought to leave the default behaviour intact. But it does start to get a bit complicated since there are quite a few options for how the program might behave in the different cases.

My main preference would be to differentiate those cases, for example with different status outputs, or to have options for how they are processed.

But as for the default behaviour, I really don't mind. I suppose the false positives you describe would be very unlikely.

(I actually tried to cook up a false positive and found even with nanosecond precision you can do

$ ./cshatag log.txt > log.txt

or occasionally

$ ./cshatag log.txt; echo "changed" >> log.txt;

then on the next check log.txt is marked corrupt. The level of precision in timestamps is somewhat illusory - the granularity on my system is around 0.005s.)

@rfjakob
Owner

rfjakob commented Nov 17, 2019

I like the idea of reporting it with a distinct event type. This case is now reported as <timechange>.
