Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a CRC checksum for gen files #60

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

add a CRC checksum for gen files #60

wants to merge 2 commits into from

Conversation

jbmouret
Copy link
Member

This might take some additionnal memory but that will ensure that the file is fully written (see #49).

What do you think @JoostHuizinga ?

@JoostHuizinga
Copy link
Contributor

Not as useful as I expected it to be, because loading an incomplete file usually throws an error (libc++abi.dylib: terminating with uncaught exception of type boost::archive::archive_exception: input stream error), whereas just checking the last few bytes and interpreting them as a checksum isn't helpful either, because there is no way to tell if the checksum is correct.

I guess the proper way of handling this is to load the file, and if it loads successfully, then use the checksum to check its integrity. If it doesn't load successfully, that in itself is an indication that the file is incomplete. I have to see if I can apply this to practical use-cases.

@jbmouret
Copy link
Member Author

jbmouret commented Mar 8, 2017

Yes, I see. You're right. I will try to improve this.

@jbmouret
Copy link
Member Author

jbmouret commented Mar 8, 2017

Actually, no: I already do 'the right thing':

  • we load the content as a string
  • we check the CRC of the string
  • then we decode the string

However, what happens is that I wrote the string and the CRC using boost::serialization. So, if this String + CRC is not a valid archive (regardless of the content), then reading will fail.

We could avoid using boost::serialization to write the string + CRC (this is easy), so the CRC will fail instead of the boost archive. But will that change anything for you? I am not sure to understand your use case at 100%.

@JoostHuizinga
Copy link
Contributor

Initially, my use case was that, before resuming a job, I wanted to quickly know which gen file to load. Because my gen files would tend to get quite large, I did not want to read the entire file to check that it was complete, so instead I only read the last few bytes, and made sure those were some predetermined string. This method is not fool-proof, but it is fast.

In practice, I find at most one incomplete file (the most recent gen file), so now I am wondering whether a "just-try-to-load-it" strategy would really waste that much time. Alternatively, writing a short string after the checksum, or writing the checksum both at the start and the end of the file (so I can compare the start with the end without reading the middle), would also work for me.

@jbmouret
Copy link
Member Author

jbmouret commented Mar 8, 2017 via email

@JoostHuizinga
Copy link
Contributor

We would have to make sure that the written size corresponds to the size of the file on disk, but I think that is a good solution.

@jbmouret
Copy link
Member Author

jbmouret commented Mar 8, 2017 via email

@JoostHuizinga
Copy link
Contributor

Apparently, the file I checked out was missing a:
#include <boost/iostreams/stream.hpp>

The change itself, however, works great for my purposes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants