Bogus escape \\x has returned with latest containers #139

Closed
richardlehane opened this issue Sep 20, 2018 · 6 comments

@richardlehane

Hi fido team,
I've been trying to run my benchmarks with the latest fido update, but it fails with a regular expression error, "bogus escape \x" (full error dump follows). It resembles this (closed) issue: #56

I suspect it relates to the whitespace issue that I reported on the PRONOM project: digital-preservation/pronom#8

Error dump:

FIDO v1.3.9 (formats-v94.xml, container-signature-20180917.xml, format_extensions.xml)
Traceback (most recent call last):
  File "/usr/local/bin/fido", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/fido/fido.py", line 845, in main
    fido.identify_file(file, extension=not args.noextension)
  File "/usr/local/lib/python2.7/dist-packages/fido/fido.py", line 358, in identify_file
    container_matches = self.match_container("OLE2", OlePackage, filename, container_file)
  File "/usr/local/lib/python2.7/dist-packages/fido/fido.py", line 214, in match_container
    puids = klass(file, self.extract_signatures(signature_file, signature_type=signature_type)).detect_formats()
  File "/usr/local/lib/python2.7/dist-packages/fido/package.py", line 63, in detect_formats
    results.extend(self._process_puid_map(contents, puid_map))
  File "/usr/local/lib/python2.7/dist-packages/fido/package.py", line 16, in _process_puid_map
    results.extend(self._process_matches(data, puid, signatures))
  File "/usr/local/lib/python2.7/dist-packages/fido/package.py", line 23, in _process_matches
    if re.search(signature["signature"], data):
  File "/usr/lib/python2.7/re.py", line 146, in search
    return _compile(pattern, flags).search(string)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bogus escape: '\\x'
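
The traceback shows the pattern only failing at match time, inside `re.search(signature["signature"], data)`, which does not say which signature is malformed. A minimal sketch of one way to narrow that down is to compile every pattern up front and collect the failures. The function name `find_bad_signatures` and the sample data are hypothetical and not part of fido; only the `"signature"` key is taken from the traceback. Note that Python 3 reports this as "incomplete escape \x" rather than Python 2.7's "bogus escape".

```python
import re


def find_bad_signatures(signatures):
    """Return (index, error message) for every pattern that fails to compile."""
    failures = []
    for index, signature in enumerate(signatures):
        try:
            re.compile(signature["signature"])
        except re.error as exc:
            failures.append((index, str(exc)))
    return failures


# Hypothetical sample data: the second pattern has a "\x" escape that is
# not followed by two hex digits, as happens when whitespace creeps in.
SAMPLE_SIGNATURES = [
    {"signature": b"\\x50\\x4B\\x03\\x04"},
    {"signature": b"\\x50\\x4B\n\\x\\x03"},
]

BAD = find_bad_signatures(SAMPLE_SIGNATURES)
for index, message in BAD:
    print("signature", index, "does not compile:", message)
```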

@ablwr
Contributor

ablwr commented Sep 20, 2018

Thanks @richardlehane for reporting this, and for your benchmarking work. I'll see what I can do...

@ablwr ablwr self-assigned this Sep 20, 2018
@Dclipsham

I've released PRONOM container signature 20180920 to resolve the whitespace issue and an errant BOM, available here or via the download service: https://www.nationalarchives.gov.uk/aboutapps/pronom/droid-signature-files.htm. Hopefully it overcomes this issue too.

@richardlehane
Author

Hi @ablwr, I'm still getting this error after the 1.3.10 update. Here's a file that reproduces it:

fmt_189_Microsoft_Office_Open_XML_testXPS_various.zip

I suspect this is due to line-wrapping whitespace within the latest container file. The very latest container file removed some whitespace where it fell within quoted areas of a signature but left it in elsewhere. For example, the line breaks in this bit are all newlines that weren't in previous container signature files:

[image: excerpt of the container signature file showing the new line breaks]

This whitespace seems reasonable to me. It did break siegfried's parser, however, so I made a fix so that it simply ignores these newlines.

Making a very similar change to fido (line 126 of fido.py) fixes things for me:

[image: the change made around line 126 of fido.py]
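
For readers without the screenshots, here is a minimal sketch of the failure mode and the kind of fix described above. It is not fido's or siegfried's actual code: `naive_hex_to_regex`, `tolerant_hex_to_regex`, `compiles` and the `WRAPPED` sample are hypothetical names and data, and the converter is deliberately simplified to plain hex bytes. The assumption is that a signature wrapped across lines leaves a newline and indentation between hex bytes, so a converter that splits on a single space emits a `\x` escape with no hex digits after it.

```python
import re


def naive_hex_to_regex(sequence):
    """Split on a single space only; wrapped lines leave empty tokens behind."""
    return b"".join(b"\\x" + token.encode("ascii") for token in sequence.split(" "))


def tolerant_hex_to_regex(sequence):
    """Split on any run of whitespace, so newlines and indentation are ignored."""
    return b"".join(b"\\x" + token.encode("ascii") for token in sequence.split())


def compiles(pattern):
    """Report whether the re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Hypothetical signature fragment, wrapped onto a second, indented line.
WRAPPED = "50 4B\n    03 04"

print(compiles(naive_hex_to_regex(WRAPPED)))     # the bare "\x" is rejected
print(compiles(tolerant_hex_to_regex(WRAPPED)))  # whitespace ignored, pattern accepted
```

Run under Python 3, the rejected pattern is reported as an "incomplete escape \x"; Python 2.7 words the same problem as "bogus escape", as in the error dump above.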

@ablwr
Contributor

ablwr commented Sep 26, 2018

Oh dear... my fault for testing the .mix file and assuming both were based on the same error. I told myself I'd set up a corpus for testing but after this release. ;) OK, let's go one more round...

@ablwr
Contributor

ablwr commented Sep 28, 2018

#143 is now merged into master. I will do some testing and release again!

@richardlehane
Author

Thanks @ablwr, the 1.3.12 release has fixed this bug and fido now completes my benchmarks. I think this means #132 is resolved too.
