Bogus escape \\x has returned with latest containers #139
Comments
Thanks @richardlehane for reporting this, and for your benchmarking work. I'll see what I can do...
I've released PRONOM container signature 20180920 to resolve the whitespace issue and an errant BOM marker. It is available here or via the download service: https://www.nationalarchives.gov.uk/aboutapps/pronom/droid-signature-files.htm. Hopefully it overcomes this issue too.
Hi @ablwr, I'm still getting this error after the 1.3.10 update. Here's a file that reproduces it: fmt_189_Microsoft_Office_Open_XML_testXPS_various.zip. I suspect this is due to line-wrapping whitespace within the latest container file. The very latest container file removed some whitespace where it was within quoted areas of the signature but left it in otherwise, e.g. the line breaks in this bit are all newlines that weren't in previous container signature files: This whitespace seems reasonable to me. It did break siegfried's parser, however, so I made a fix so that it just ignores these newlines. Making a very similar change to fido (line 126 of fido.py) fixes things for me:
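The snippet from the comment above is not shown here. As a rough illustration of the idea only (not fido's or siegfried's actual code), the sketch below collapses line-wrapping whitespace outside quoted areas of a signature while leaving whitespace inside quotes untouched. The helper name `collapse_unquoted_whitespace` is hypothetical, and the sketch assumes quoted areas are delimited by single quotes.

```python
def collapse_unquoted_whitespace(signature):
    """Collapse whitespace runs outside single-quoted areas to one space.

    Hypothetical helper for illustration; assumes single quotes delimit
    literal text in which whitespace is significant.
    """
    out = []
    in_quotes = False
    for ch in signature:
        if ch == "'":
            # Toggle quoted state; keep the quote character itself.
            in_quotes = not in_quotes
            out.append(ch)
        elif ch.isspace() and not in_quotes:
            # Newlines, tabs and repeated spaces become a single space.
            if out and out[-1] != " ":
                out.append(" ")
        else:
            out.append(ch)
    return "".join(out).strip()


# A signature that has been wrapped across lines with indentation.
wrapped = "10 00 00 00\n    'Word.Document.'\n\t00"
print(collapse_unquoted_whitespace(wrapped))
```

Collapsing rather than deleting keeps a single separator between tokens, so a parser that splits on spaces would still see the same sequence as in the unwrapped signature.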
Oh dear... my fault for testing the .mix file and assuming both were based on the same error. I told myself I'd set up a corpus for testing but after this release. ;) OK, let's go one more round...
#143 is now merged into master. I will do some testing and release again!
Hi fido team,
I've been trying to run my benchmarks with the latest fido update, but it is failing with a regular expression error, "bogus escape \x" (full error dump follows). It resembles this (closed) issue: #56
I suggest it could relate to this whitespace issue that I reported on the PRONOM project: digital-preservation/pronom#8
Error dump:
FIDO v1.3.9 (formats-v94.xml, container-signature-20180917.xml, format_extensions.xml)
Traceback (most recent call last):
  File "/usr/local/bin/fido", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/fido/fido.py", line 845, in main
    fido.identify_file(file, extension=not args.noextension)
  File "/usr/local/lib/python2.7/dist-packages/fido/fido.py", line 358, in identify_file
    container_matches = self.match_container("OLE2", OlePackage, filename, container_file)
  File "/usr/local/lib/python2.7/dist-packages/fido/fido.py", line 214, in match_container
    puids = klass(file, self.extract_signatures(signature_file, signature_type=signature_type)).detect_formats()
  File "/usr/local/lib/python2.7/dist-packages/fido/package.py", line 63, in detect_formats
    results.extend(self._process_puid_map(contents, puid_map))
  File "/usr/local/lib/python2.7/dist-packages/fido/package.py", line 16, in _process_puid_map
    results.extend(self._process_matches(data, puid, signatures))
  File "/usr/local/lib/python2.7/dist-packages/fido/package.py", line 23, in _process_matches
    if re.search(signature["signature"], data):
  File "/usr/lib/python2.7/re.py", line 146, in search
    return _compile(pattern, flags).search(string)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bogus escape: '\\x'
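For context, this class of error can be reproduced in isolation. The sketch below assumes (the traceback does not confirm this) that a stray line break ends up between `\x` and its two hex digits in the generated pattern. `try_compile` is a hypothetical helper, and the exact error message varies between Python versions, so it is not asserted here.

```python
import re


def try_compile(pattern):
    """Return None if the pattern compiles, else the error message.

    Hypothetical helper for illustration only.
    """
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)


good = r"\x50\x4B"       # two well-formed hex escapes
broken = "\\x\n50\\x4B"  # a newline separates the first \x from its digits

print(try_compile(good))
print(try_compile(broken))
```

A check like this, run over every signature after it has been converted to a regular expression, would surface malformed patterns when signatures are loaded rather than during identification.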