Siegfried seems to skip certain files without error or warning #115
Comments
Hi Maarten, could you advise what OS you're on and what version of siegfried you're running (sf -version)? Getting the files from you likely won't help if they can be identified individually; the problem seems to have more to do with their place in the file system. But if you could narrow down the issue and provide a zipped minimal directory with selected files that triggers it, that would be a great help. Happy for you to send things to richard@itforarchivists.com. Cheers |
Hi Richard, The OS is CentOS Linux 7.4.1708. I checked the permissions too and found no anomalies: all files have the same permissions, regardless of whether they were skipped or analysed. I'll shortly be sending you a package containing 236 files. Four of them were consistently skipped during additional tests. The others are all the files from one directory that was skipped entirely. However (the plot thickens), I redid the same tests on a backup I have of these files (the files are identical; they have the same SHA-256 hash values). There, the previously skipped files were analysed as normal, but different files were skipped. So I doubt it has anything to do with the files themselves, and more to do with the way the list of files is built. Kind regards, Maarten |
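For anyone wanting to repeat the "same SHA-256 on both copies" check described above, here is a minimal Python sketch. The function names `sha256_of` and `differing_files` are made up for illustration and are not part of siegfried or roy; the sketch assumes both copies are mounted locally and share the same relative layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(original_root, backup_root):
    """List relative paths whose backup copy is missing or hashes differently.

    Hypothetical helper: it only walks original_root, so files that exist
    solely in the backup are not reported.
    """
    original_root = Path(original_root)
    backup_root = Path(backup_root)
    mismatches = []
    for path in sorted(original_root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(original_root)
        twin = backup_root / relative
        if not twin.is_file() or sha256_of(path) != sha256_of(twin):
            mismatches.append(relative.as_posix())
    return mismatches
```

An empty list from `differing_files(original, backup)` would support the point that the file contents are not what differs between the two runs.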
Thanks Maarten, I'm downloading the files now. If you're scanning files over a network connection, it might be worth trying the |
Tried it both with -throttle 50ms and 100ms. The same files were skipped. |
The files all scanned correctly on my Windows laptop (i.e. 236 files in the zip, and 236 files in the results file). This does seem to be related to the way sf is walking your file system, rather than relating to the file contents. |
OK, this golang bug seems like a possible cause: golang/go#24015. Unfortunately, if this is the bug, it may be necessary to wait for a RedHat update to fix it. In later versions of the Linux kernel (> 3.10) this problem seems to have been fixed. |
If this is a kernel bug, a workaround pending a fix may be to use another tool like Like:
|
The workaround from the golang bug (enforcing CIFS version 1.0 on mount) didn't work: the same files were skipped. Piping the list in from find, however, did work. No files were skipped then. So for me, that solved it. Thanks for the help. |
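For readers without find to hand, the same idea (build the file list outside sf, then hand it over) can be sketched in Python. This is only an illustration: `write_file_list` is a hypothetical helper, and whether Python's directory walk sidesteps the problem on an affected CIFS mount has not been tested in this thread.

```python
import os


def write_file_list(root, stream):
    """Write one path per line for every non-directory entry under root.

    Hypothetical helper for illustration. Returns the number of paths
    written so the count can be checked against the tool's output.
    """
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is stable
        for name in sorted(filenames):
            stream.write(os.path.join(dirpath, name) + "\n")
            count += 1
    return count
```

Calling `write_file_list("/path/to/data", sys.stdout)` (after `import sys`, and with a placeholder path) prints the list, and the returned count gives a figure to compare against the number of rows in the results file.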
The recent golang 1.11 release has introduced a fix for this issue. I'm hopeful that a siegfried binary built with 1.11 will resolve this. Unfortunately, the v1.7.9 binaries are still built with 1.10, as that is the current release supported by travis/appveyor, so I'll leave this open until the release binaries are built with 1.11. |
Hi Maarten |
Hi,
I'm currently comparing the results from DROID and Siegfried (through Brunnhilde). In a dataset containing 216420 files, there are only 2537 discrepancies between the two (roughly 1%), which imho is not bad. However, in my test at least 50% of these discrepancies are due to Siegfried apparently skipping a file. Comparing the outputs with roy reports these files as "missing" from the siegfried CSV (confirmed by manually checking the Siegfried CSV: they aren't there, so no mistake by roy). I redid the Brunnhilde analysis several times and each time the same files were skipped. I analysed a few of these files (TIFFs in this case) with other programs (JHOVE, DPF Manager) and there seemed to be nothing wrong with them. I also checked whether it might be due to long paths/filenames, non-standard characters in the filename, too many files in a directory or extremely large files, but none of these turned out to be the problem. This was confirmed by analysing each file individually with Siegfried: the files were correctly analysed. But when I tried to analyse the directory directly with Siegfried, the same files were skipped again. I have no idea why, but I can provide you with the files and the different analyses if you need them.
Kind regards,
Maarten
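The "is every file on disk also in the CSV?" check described above can be sketched roughly as follows. This is a minimal sketch, not how roy does its comparison: the helper names are made up, and the column name `filename` is an assumption about the CSV header that should be checked against the actual results file. Paths in the CSV must be written the same way as the walked paths (both absolute, or both relative to the same directory) for the set difference to be meaningful.

```python
import csv
import os


def files_on_disk(root):
    """Collect the normalised path of every file found under root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.normpath(os.path.join(dirpath, name)))
    return found


def files_in_report(csv_path, column="filename"):
    """Collect the paths listed in a CSV report.

    The default column name is an assumption; pass the real header name
    if the results file uses a different one.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {os.path.normpath(row[column]) for row in csv.DictReader(handle)}


def skipped_files(root, csv_path, column="filename"):
    """Return the sorted paths present on disk but absent from the report."""
    return sorted(files_on_disk(root) - files_in_report(csv_path, column))
```

One caveat: if the directory listing itself is unreliable on the affected mount, the on-disk set built here could be incomplete too, so a count taken with a second tool is a useful cross-check.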