Adobe Illustrator 14 file identified as PDF 1.5, not AI #41
Comments
This has to do with the default buffer size of FIDO, which is 128 KB. Your example file seems to have the PS subset header at an offset of ~478 KB, so FIDO never sees this header and skips to the EOF part of the signature. If you increase the buffer size to, say, 512 KB, FIDO will correctly recognise the file. Example: You might also want to increase the default buffer size by changing the default settings in the code.
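To illustrate why the buffer size matters, here is a minimal sketch in Python. It is not FIDO's actual matching code: the function name `marker_visible` and the synthetic sample are made up for illustration, the end-of-file buffer is ignored, and the marker is the `%AI5_FileFormat` comment mentioned elsewhere in this thread.

```python
AI_MARKER = b"%AI5_FileFormat"  # marker taken from this thread, used here for illustration


def marker_visible(data: bytes, marker: bytes, bufsize: int) -> bool:
    """Return True if `marker` lies entirely within the first `bufsize` bytes."""
    return marker in data[:bufsize]


# Synthetic stand-in for the sample file (hypothetical content):
# a PDF header, padding, and the marker placed at an offset of 478 KB.
header = b"%PDF-1.5\n"
offset = 478 * 1024
sample = header + b"\x00" * (offset - len(header)) + AI_MARKER + b"\n%%EOF\n"

# With a 128 KB head buffer the marker is out of reach; with 512 KB it is found.
print(marker_visible(sample, AI_MARKER, 128 * 1024))
print(marker_visible(sample, AI_MARKER, 512 * 1024))
```

Any head buffer smaller than the marker's offset plus its length misses it, so only the PDF part of the signature can match.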
Interesting. This would be the first example that I’ve seen of a file that needs more than the default 128 KB to identify. I wonder if there is a better signature for AI 14? I’ve never looked at the format, but it would be surprising if one actually needed to look at 500 KB before knowing a file really is an AI 14 one. Cheers, Adam.
Interesting indeed. Unfortunately, Adobe has not published specifications for this format (or maybe I just did not find them). After further examination, it looks like the section between the PDF header and the AI subset header consists of
Based on this, we might assume that the binary distance between the PDF header and the AI subset header varies widely and depends heavily on the presence, number and size of the items mentioned earlier.
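One way to check how far into a given file the AI subset header sits is to scan the whole file for a marker and report its offset. The sketch below is an assumption-laden helper, not part of FIDO: `find_marker_offset` is a hypothetical name, and the marker is the `%AI5_FileFormat` comment mentioned elsewhere in this thread.

```python
import io


def find_marker_offset(stream, marker: bytes, chunk_size: int = 64 * 1024) -> int:
    """Return the byte offset of the first occurrence of `marker`, or -1.

    The stream is read in chunks; the last len(marker) - 1 bytes of each
    window are carried over so a marker straddling two chunks is still found.
    """
    overlap = len(marker) - 1
    tail = b""
    base = 0  # file offset of the first byte of the current window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        window = tail + chunk
        pos = window.find(marker)
        if pos != -1:
            return base + pos
        tail = window[-overlap:] if overlap > 0 else b""
        base += len(window) - len(tail)


# Synthetic example (hypothetical content): marker placed after 200000 filler bytes.
demo = io.BytesIO(b"%PDF-1.5\n" + b"x" * 200000 + b"%AI5_FileFormat")
print(find_marker_offset(demo, b"%AI5_FileFormat"))
```

Running this over a set of real AI files would show how much the offset varies and whether any fixed buffer size is a safe default.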
Reopened for discussion
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.3-c011 66.145433, 2012/01/17-15:11:19 "> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
Heh, adventure indeed 👍
Updated the section about read buffers in the FIDO Usage Guide.
Would picking the format out of the XMP payload be more reliable than looking for the "%AI5_FileFormat" comment?
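As a rough sketch of what reading the XMP payload could look like: the code below locates the XMP packet by its `x:xmpmeta` wrapper and pulls out the `xmp:CreatorTool` element. This is only an assumption about where the producing application might be recorded; the thread does not confirm which XMP property (if any) reliably identifies AI 14 files, and both the function name `extract_creator_tool` and the sample packet are hypothetical.

```python
import re

XMP_OPEN = b"<x:xmpmeta"
XMP_CLOSE = b"</x:xmpmeta>"


def extract_creator_tool(data: bytes):
    """Return the text of the first xmp:CreatorTool element in the XMP packet, or None."""
    start = data.find(XMP_OPEN)
    if start == -1:
        return None
    end = data.find(XMP_CLOSE, start)
    if end == -1:
        return None
    packet = data[start:end + len(XMP_CLOSE)]
    # Assumption: the property is serialised as an element, not as an attribute.
    match = re.search(rb"<xmp:CreatorTool>([^<]*)</xmp:CreatorTool>", packet)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace").strip()


# Hypothetical packet for illustration only; real files may differ.
sample = (
    b"%PDF-1.5\n"
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b"<xmp:CreatorTool>Adobe Illustrator CS4</xmp:CreatorTool>"
    b"</x:xmpmeta>"
)
print(extract_creator_tool(sample))
```

Note that the XMP packet may also sit beyond the read buffer, so this approach would need the same offset check as the comment-based signature.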
Possibly, Andy. If the XMP payload proves to be more reliable, the improved signature should be submitted to PRONOM.
Closed due to lack of recent activity.
This Adobe Illustrator sample is being misidentified in fido 1.3.1 using the PRONOM v70 signatures: https://github.com/artefactual/archivematica-sampledata/raw/master/SampleTransfers/Images/BBhelmet.ai
The file is an Illustrator 14 (CS4) file (fmt/563), but is being identified as PDF 1.5 (fmt/19). This isn't actually wrong per se (since AI files are a superset of PDF), but isn't fully accurate. DROID 6.1.2, using the same v70 signature files, correctly identifies the file as fmt/563.