bogus escape: '\\x' on word documents identification (container-signature updated : container-signature-20140923.xml) #56

Closed
florianajir opened this issue Nov 4, 2014 · 7 comments

Comments

@florianajir

I tried updating the conf/container-signature file to container-signature-20140923.xml to see how it changes Word document identification, and the result is bad: .docx files are no longer recognized (they are identified as ZIP). See:

With container-signature-20140923.xml:

FIDO v1.3.1 (formats-v78.xml, container-signature-20140923.xml, format_extensions.xml)
OK,240,fmt/40,"Microsoft Word for Windows Document","Microsoft Word for Windows 97 - 2002",11776,"/home/fajir/docs/AnnexeDoc.doc","application/msword","signature"
bogus escape: '\\x'
OK,10,x-fmt/263,"ZIP Format","ZIP format",25204,"/home/fajir/docs/cours sur les theories de la motivation.docx","application/zip","signature"

With container-signature-20130501.xml:

FIDO v1.3.1 (formats-v78.xml, container-signature-20130501.xml, format_extensions.xml)
OK,230,fmt/40,"Microsoft Word for Windows Document","Microsoft Word for Windows 97 - 2002",11776,"/home/fajir/docs/AnnexeDoc.doc","application/msword","signature"
OK,29,fmt/412,"Microsoft Office Open XML - Word","Microsoft Office Open XML - Word",25204,"/home/fajir/docs/cours sur les theories de la motivation.docx","None","signature"

I think the bug comes from the "bogus escape: '\x'" error.
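For context: as far as we can tell, "bogus escape: '\\x'" is the message Python 2.7's `re` module raised when a pattern contains `\x` that is not followed by exactly two hex digits (Python 3 reports the same problem as an "incomplete escape"). The thread does not show which generated pattern triggered it, so the sketch below only illustrates the rule. `hex_to_regex` is a hypothetical helper that always emits two-digit `\xNN` escapes; it is not FIDO's code or the fix that went into #59.

```python
import re


def hex_to_regex(hex_sequence: str) -> str:
    """Hypothetical helper: turn a hex byte string such as '504B0304' into a
    regex made only of two-digit \\xNN escapes."""
    data = bytes.fromhex(hex_sequence)  # raises ValueError on malformed hex
    return "".join("\\x%02x" % byte for byte in data)


# A bare '\x' (no two hex digits after it) is rejected when the pattern is compiled.
try:
    re.compile(r"\x")
except re.error as exc:
    print("rejected:", exc)

# Emitting exactly two hex digits per byte avoids that class of error.
pattern = hex_to_regex("504B0304")  # ZIP local file header signature, b"PK\x03\x04"
print(pattern)                                                    # \x50\x4b\x03\x04
print(re.match(pattern.encode("ascii"), b"PK\x03\x04...") is not None)  # True
```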

@florianajir florianajir changed the title update container-signature file cause bogus escape: '\\x' on word documents update container-signature file throws bogus escape: '\\x' on word documents identification Nov 4, 2014
@florianajir florianajir changed the title update container-signature file throws bogus escape: '\\x' on word documents identification bogus escape: '\\x' on word documents identification (container-signature updated : container-signature-20140923.xml) Nov 4, 2014
@mistydemeo
Contributor

I've fixed the regex generation in #59; however, this doesn't fix the misidentification (which I am also seeing in some Office files). I can identify the same files correctly using Siegfried, so I believe the problem is in FIDO and not in the signature files.

@mistydemeo
Contributor

DROID also returns a correct match using up-to-date container signatures.

@mistydemeo
Contributor

Deleted my previous comment, which was incorrect.

In my case, I believe the matching signature is this one:

        <ContainerSignature Id="2030" ContainerType="ZIP">
            <Description>Microsoft Excel OOXML</Description>
            <Files>
                <File>
                    <Path>[Content_Types].xml</Path>
                    <BinarySignatures>
                        <InternalSignatureCollection>
                            <InternalSignature ID="317">
                                <ByteSequence Reference="BOFoffset">
                                    <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="32768">
                                        <Sequence>'ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"'</Sequence>
                                    </SubSequence>
                                </ByteSequence>
                            </InternalSignature>
                        </InternalSignatureCollection>
                    </BinarySignatures>
                </File>
            </Files>
        </ContainerSignature> 

FIDO produces the following regex from that signature:

(?s)ContentType="application/vnd\.openxmlformats-officedocument\.spreadsheetml\.sheet\.main\+xml"

This is a valid regex, and it should match the file in question.
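To check that claim outside FIDO, here is a minimal Python sketch. It reads the signature (condensed, same elements) with the standard `xml.etree.ElementTree` module, copies FIDO's regex verbatim, and also rebuilds an equivalent pattern from the quoted `<Sequence>` using a hypothetical helper, `quoted_sequence_to_regex`. The assumption there is that a single-quoted sequence is a plain literal, so stripping the quotes and applying `re.escape` is enough; this is not FIDO's conversion code. The sample line is a made-up fragment of an .xlsx file's decompressed `[Content_Types].xml`.

```python
import re
import xml.etree.ElementTree as ET

# Signature 2030 from the comment above, condensed onto fewer lines.
SIGNATURE_XML = """<ContainerSignature Id="2030" ContainerType="ZIP">
  <Description>Microsoft Excel OOXML</Description>
  <Files><File>
    <Path>[Content_Types].xml</Path>
    <BinarySignatures><InternalSignatureCollection><InternalSignature ID="317">
      <ByteSequence Reference="BOFoffset">
        <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="32768">
          <Sequence>'ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"'</Sequence>
        </SubSequence>
      </ByteSequence>
    </InternalSignature></InternalSignatureCollection></BinarySignatures>
  </File></Files>
</ContainerSignature>"""

# The regex FIDO produced, copied verbatim from the comment above.
FIDO_REGEX = r'(?s)ContentType="application/vnd\.openxmlformats-officedocument\.spreadsheetml\.sheet\.main\+xml"'


def quoted_sequence_to_regex(sequence: str) -> str:
    """Hypothetical helper: strip the surrounding single quotes from a quoted
    sequence and escape the remainder as a regex literal."""
    if len(sequence) < 2 or sequence[0] != "'" or sequence[-1] != "'":
        raise ValueError("not a quoted sequence: %r" % sequence)
    return "(?s)" + re.escape(sequence[1:-1])


root = ET.fromstring(SIGNATURE_XML)
path = root.findtext("Files/File/Path")     # file inside the container to test
sequence = root.findtext(".//Sequence")     # the quoted plaintext sequence
rebuilt = quoted_sequence_to_regex(sequence)

# A line as it would appear in the *decompressed* [Content_Types].xml of an .xlsx.
sample = ('<Override PartName="/xl/workbook.xml" '
          'ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>')

print(path)                                        # [Content_Types].xml
print(re.search(FIDO_REGEX, sample) is not None)   # True
print(re.search(rebuilt, sample) is not None)      # True
```

Both patterns match the plaintext, which is consistent with the regex itself not being the problem.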

@mistydemeo
Contributor

Oh, I think I see what's going on. FIDO doesn't appear to handle ZIP containers specially: it doesn't decompress them before attempting to read their contents, so it can't match the plaintext sequences in the container signatures.
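A minimal sketch of the difference, using Python's standard `zipfile` module. The ZIP is synthetic (a stand-in `[Content_Types].xml` resembling an .xlsx one), and `match_raw` / `match_container` are hypothetical helpers contrasting a scan of the raw file bytes with decompressing the member named in the signature's `<Path>` first. They are not FIDO's code or the implementation in #60. Assumption: the member is deflate-compressed, as it typically is in real Office files.

```python
import io
import re
import zipfile

# Regex FIDO produced for signature 2030 (copied from the thread), compiled as a
# bytes pattern so it can run over raw file contents.
XLSX_MAIN = re.compile(
    rb'(?s)ContentType="application/vnd\.openxmlformats-officedocument'
    rb'\.spreadsheetml\.sheet\.main\+xml"'
)
MEMBER = "[Content_Types].xml"  # the <Path> from the container signature


def build_sample_xlsx() -> bytes:
    """Build a small synthetic ZIP whose [Content_Types].xml resembles an .xlsx one."""
    sheets = "".join(
        f'<Override PartName="/xl/worksheets/sheet{i}.xml" ContentType='
        '"application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>'
        for i in range(1, 21)
    )
    content_types = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Override PartName="/xl/workbook.xml" ContentType='
        '"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>'
        + sheets
        + "</Types>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(MEMBER, content_types)  # stored deflate-compressed
    return buf.getvalue()


def match_raw(data: bytes) -> bool:
    """Scan the file bytes as-is, without looking inside the container."""
    return XLSX_MAIN.search(data) is not None


def match_container(data: bytes) -> bool:
    """Open the ZIP, decompress the member named in the signature, and scan that.
    (Offsets from the signature, e.g. SubSeqMaxOffset, are ignored here.)"""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if MEMBER not in zf.namelist():
            return False
        return XLSX_MAIN.search(zf.read(MEMBER)) is not None


xlsx_bytes = build_sample_xlsx()
print(match_raw(xlsx_bytes))        # False: the XML text is deflate-compressed
print(match_container(xlsx_bytes))  # True: found after decompression
```

Only the container-aware path sees the sequence, which would explain a .docx or .xlsx falling back to the generic ZIP match.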

@mistydemeo
Contributor

I've added support for ZIP containers; there's a pull request open, #60.

@Hwesta
Contributor

Hwesta commented Oct 3, 2016

@florianajir Misty's PR #60 was merged and is part of 1.3.4 - are you still seeing this issue?

@florianajir
Author

All right, I'll close the issue. Thanks!
