New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add a CRC checksum for gen files #60
base: master
Are you sure you want to change the base?
Conversation
Not as useful as I expected it to be, because loading an incomplete file usually throws an error (libc++abi.dylib: terminating with uncaught exception of type boost::archive::archive_exception: input stream error), whereas just checking the last few bytes and interpreting them as a checksum isn't helpful either, because there is no way to tell if the checksum is correct. I guess the proper way of handling this is to load the file, and if it loads successfully, then use the checksum to check its integrity. If it doesn't load successfully, that in itself is an indication that the file is incomplete. I have to see if I can apply this to practical use-cases. |
Yes, I see. You're right. I will try to improve this. |
Actually, no: I already do 'the right thing':
However, what happens is that I wrote the string and the CRC using boost::serialization. So, if this String + CRC is not a valid archive (regardless of the content), then reading will fail. We could avoid using boost::serialization to write the string + CRC (this is easy), so the CRC will fail instead of the boost archive. But will that change anything for you? I am not sure to understand your use case at 100%. |
Initially, my use case was that, before resuming a job, I wanted to quickly know which gen file to load. Because my gen files would tend to get quite large, I did not want to read the entire file to check that it was complete, so instead I only read the last few bytes, and made sure those were some predetermined string. This method is not fool-proof, but it is fast. In practice, I find at most one incomplete file (the most recent gen file), so now I am wondering whether a "just-try-to-load-it" strategy would really waste that much time. Alternatively, writing a short string after the checksum, or writing the checksum both at the start and the end of the file (so I can compare the start with the end without reading the middle), would also work for me. |
Maybe we could write the size (in byte) of the gen file at the beginning of
the file? So that we can quickly check with the actual size and decide if
the archive is fully written?
On Wed, Mar 8, 2017 at 6:17 PM Joost Huizinga ***@***.***> wrote:
Initially, my use case was that, before resuming a job, I wanted to
quickly know which gen file to load. Because my gen files would tend to get
quite large, I did not want to read the entire file to check that it was
complete, so instead I only read the last few bytes, and made sure those
were some predetermined string. This method is not fool-proof, but it is
fast.
In practice, I find at most one incomplete file (the most recent gen
file), so now I am wondering whether a "just-try-to-load-it" strategy would
really waste that much time. Alternatively, writing a short string after
the checksum, or writing the checksum both at the start and the end of the
file (so I can compare the start with the end without reading the middle),
would also work for me.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#60 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AFu8v16GxolN1eVl8QTos1DE2romOW--ks5rjuIGgaJpZM4LS6hl>
.
--
-- Jean-Baptiste Mouret
ERC ResiBots: http://www.resibots.eu
Web: http://members.loria.fr/JBMouret
|
We would have to make sure that the written size corresponds to the size of the file on disk, but I think that is a good solution. |
OK, so what do you think of this new version?
On Wed, Mar 8, 2017 at 6:40 PM Joost Huizinga ***@***.***> wrote:
We would have to make sure that the written size corresponds to the size
of the file on disk, but I think that is a good solution.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#60 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AFu8vyfsuni-orikl1n0qUzfndL7kvKTks5rjufpgaJpZM4LS6hl>
.
--
-- Jean-Baptiste Mouret
ERC ResiBots: http://www.resibots.eu
Web: http://members.loria.fr/JBMouret
|
Apparently, the file I checked out was missing a: The change itself, however, works great for my purposes. |
This might take some additionnal memory but that will ensure that the file is fully written (see #49).
What do you think @JoostHuizinga ?