
problems matching Windows executables #5

Closed
overlinden opened this issue May 27, 2014 · 10 comments

@overlinden

Hi,

I am writing a file analyser in Java and plan to use your simplemagic application to determine the file type of an input stream. To use your application as comfortably as possible, I compiled it into a jar file that includes your magic.gz file.

Unfortunately, the findMatch method always returns null for input streams, whatever the file type. I tried Windows executables, bitmaps, plain text, ...
Is there an error in the current version of your code, or am I just calling it the wrong way?

This is my test code:

        ContentInfoUtil util = new ContentInfoUtil();
        ContentInfo info = util.findMatch(new FileInputStream(new File("C:\\test\\bmp.bmp")));
        System.out.println(info != null ? info.getMessage() : "Is null");

The attached screenshot shows the folder structure of the simplemagic.jar file. The com folder contains the class files of your code, and the res folder contains the magic.gz dictionary file.
[image: simplemagic]

Thank you for your help.

@j256
Owner

j256 commented May 27, 2014

I suspect that it cannot find your magic.gz file, which should be at the top of the classpath. Can you put a breakpoint in ContentInfoUtil.readEntriesFromResource(...) to see whether it is being found?

It may also be that those file types are not in the magic file. If you attach the files in question, I can take a look at them.

If you need to specify the location of the magic file, I'd use the ContentInfoUtil(String, ErrorCallBack) constructor, which takes the location of the magic file in the resource path. ErrorCallBack can be null. I've changed the code to throw an exception if it can't find the magic file; right now it just leaves it null.
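Besides the breakpoint, a quick way to confirm that the magic file is visible on the classpath is to try loading it yourself. The helper below is hypothetical (not part of simplemagic) and uses only the JDK; it assumes the file is gzipped, as magic.gz is. Note that with Class.getResourceAsStream a leading "/" resolves the path from the classpath root, while a path without it is resolved relative to the calling class's package, so "/res/magic.gz" and "res/magic.gz" are not the same lookup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class MagicResourceCheck {

    /**
     * Hypothetical helper: counts the lines of a gzipped classpath resource,
     * or returns -1 if no resource exists at that path.
     */
    static long countMagicLines(String resourcePath) throws IOException {
        InputStream raw = MagicResourceCheck.class.getResourceAsStream(resourcePath);
        if (raw == null) {
            return -1; // nothing on the classpath at this path
        }
        // Both streams are closed automatically, even if the gzip header is invalid
        try (InputStream in = raw;
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(in), StandardCharsets.ISO_8859_1))) {
            return reader.lines().count();
        }
    }

    public static void main(String[] args) throws IOException {
        // "/magic.gz" (top of the classpath) is just this sketch's default
        String path = args.length > 0 ? args[0] : "/magic.gz";
        long lines = countMagicLines(path);
        System.out.println(lines < 0 ? "not found: " + path : path + ": " + lines + " lines");
    }
}
```

Running it with "/res/magic.gz" as the argument, from the same classpath as the application, shows whether that location resolves at all.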

@overlinden
Author

The magic.gz file is located in the res folder, and I updated INTERNAL_MAGIC_FILE to point to "/res/magic.gz".
I put a breakpoint on the ContentInfoUtil constructor and stepped through its initialisation. It seems fine: the magic.gz file is found and read into your data structures.

If I change the path to the magic.gz file, it complains that it cannot find the file.

Stepping through the findMatch method, I can see that the probe is read from my input stream and the application tries to find a matching pattern in MagicEntries.java:

    public ContentInfo findMatch(byte[] bytes)

As a Windows executable I tried the "putty.exe" file, but I'm still getting a null object.
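The flow described here, reading a probe from the stream and then matching the probe bytes, can be sketched with the JDK alone. readProbe and its maxBytes parameter are hypothetical, and the probe size simplemagic actually uses is not stated in this thread. One detail worth noting: InputStream.read may return fewer bytes than requested, so the loop keeps reading until the buffer is full or the stream ends, and a 0-byte file simply yields an empty probe.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ProbeReader {

    /** Hypothetical helper: reads up to maxBytes from the start of the stream. */
    static byte[] readProbe(InputStream in, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes) {
            // read() may return fewer bytes than asked for, so keep going
            int n = in.read(buffer, total, maxBytes - total);
            if (n < 0) {
                break; // end of stream reached before the buffer filled up
            }
            total += n;
        }
        // Trim to the bytes actually read (empty for a 0-byte file)
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] start = {'M', 'Z', (byte) 0x90, 0x00};
        byte[] probe = readProbe(new ByteArrayInputStream(start), 8);
        System.out.println(probe.length); // 4: the stream ended before 8 bytes
    }
}
```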

@j256
Owner

j256 commented May 27, 2014

Hrm. It may be that my magic file does not have information about Windows executables. Its roots are in Unix-land. I just found this: http://www.delorie.com/djgpp/doc/exe/
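The DOS header described on that page begins with the "MZ" signature. Modern Windows executables such as putty.exe are PE files: in the commonly documented PE layout, the 32-bit little-endian value at offset 0x3C of the DOS header is the file offset of a "PE\0\0" signature. The sketch below checks both. It is a hypothetical helper written for illustration, not an entry from simplemagic's magic file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ExeHeaderCheck {

    /**
     * True if the bytes start with the DOS "MZ" signature and the 32-bit
     * little-endian offset stored at 0x3C points at a "PE\0\0" signature.
     */
    static boolean isPortableExecutable(byte[] b) {
        if (b.length < 0x40 || b[0] != 'M' || b[1] != 'Z') {
            return false; // too short for a DOS header, or no "MZ"
        }
        // Read the offset as unsigned so a huge value cannot turn negative
        long peOffset = Integer.toUnsignedLong(
                ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt(0x3C));
        if (peOffset + 4 > b.length) {
            return false; // PE signature lies beyond the bytes we have
        }
        int p = (int) peOffset;
        return b[p] == 'P' && b[p + 1] == 'E' && b[p + 2] == 0 && b[p + 3] == 0;
    }

    public static void main(String[] args) {
        byte[] header = new byte[0x80];
        header[0] = 'M';
        header[1] = 'Z';
        header[0x3C] = 0x40; // PE signature at offset 0x40
        header[0x40] = 'P';
        header[0x41] = 'E';
        System.out.println(isPortableExecutable(header)); // true
    }
}
```

Because the PE signature can sit anywhere the 0x3C field points, a check like this needs enough leading bytes in the probe to reach it.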

@overlinden
Author

I got it! :)

I tried this on a Unix machine with some files from its filesystem, and it worked. Then I went back to my Windows machine and tested some files I had copied from Unix. The program still works.

After that I tried again with the files I had used at first, and it did not work. So I took a closer look at those files:

  • EXE files (on Windows) are not recognized.
  • BMP files created from Windows Explorer have a size of 0 bytes until you open them for the first time, so they could not be recognized anyway.
  • Plain text files are not recognized.
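These three cases behave differently under signature matching: an empty file gives nothing to compare against, while plain text has no fixed signature at all. (As far as I know, the Unix file command detects text with separate encoding heuristics rather than magic-file entries.) The sketch below is hypothetical and only illustrates the distinction; the "BM" check is a weak two-byte signature and the ASCII test is a crude heuristic, neither taken from simplemagic.

```java
import java.nio.charset.StandardCharsets;

class ProbeChecks {

    /** Returns a rough label for a probe, or null if none of these simple checks apply. */
    static String classify(byte[] probe) {
        if (probe.length == 0) {
            return "empty"; // e.g. a new, still-empty BMP created from Windows Explorer
        }
        if (probe.length >= 2 && probe[0] == 'B' && probe[1] == 'M') {
            return "bitmap (BM signature)";
        }
        if (looksLikeAsciiText(probe)) {
            return "ASCII text (heuristic)";
        }
        return null;
    }

    /** True if every byte is printable ASCII or common whitespace. */
    static boolean looksLikeAsciiText(byte[] probe) {
        for (byte b : probe) {
            int c = b & 0xFF;
            boolean printable = c >= 0x20 && c < 0x7F;
            boolean whitespace = c == '\t' || c == '\n' || c == '\r' || c == '\f';
            if (!printable && !whitespace) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(classify(new byte[0])); // empty
        System.out.println(classify("hello, world\n".getBytes(StandardCharsets.US_ASCII))); // ASCII text (heuristic)
    }
}
```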

I had used exactly these three files. I verified the behaviour with another selection of files, and your application works totally fine.
Maybe I can fix the recognition of EXE files later.

Thank you for your quick help.

@overlinden
Author

On the left you see the relevant section of the magic file for recognizing EXE files.
On the right you see the first bytes of an EXE file.

[image: exe]

The null bytes after the "MZ" string seem to be responsible for the mismatch...
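A pattern that contains NUL (0x00) bytes right after "MZ" is only handled correctly if the matcher compares raw bytes at a given offset; any step that treats the pattern as a NUL-terminated string would silently shorten it. The sketch below shows an offset-based byte comparison. It is not simplemagic's matching code, and the bytes in its example are made up for illustration.

```java
class BytePatternMatch {

    /** True if pattern occurs in data starting exactly at offset; 0x00 bytes compare like any other byte. */
    static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        if (offset < 0 || offset > data.length - pattern.length) {
            return false; // pattern would run past the end of the data
        }
        for (int i = 0; i < pattern.length; i++) {
            if (data[offset + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative bytes only: "MZ" followed by 0x90 0x00
        byte[] data = {'M', 'Z', (byte) 0x90, 0x00, 0x03, 0x00};
        byte[] pattern = {'M', 'Z', (byte) 0x90, 0x00};
        System.out.println(matchesAt(data, 0, pattern)); // true
        System.out.println(matchesAt(data, 1, pattern)); // false
    }
}
```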

@j256
Owner

j256 commented May 27, 2014

Cool! Can you post the executable somewhere, or tell me its path on a "standard" Windows computer? I'll add it to my unit tests.

@j256 j256 changed the title from "findMatch returns null every time" to "problems matching Windows executables" on May 28, 2014
@overlinden
Author

You can find putty.exe here: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
I just tried this version, and the executable is not recognized.

explorer.exe in C:\Windows is not recognized either.
I think this can be reproduced with any *.exe file.

@j256
Owner

j256 commented May 29, 2014

Excellent. Thanks much dude.

@swpalmer

Note that the unit test for .exe files is broken.
ContentInfoUtilTest.java has:

                new FileType("/files/x.exe", ContentType.OTHER, "MythTV", null,
                        "MythTV NuppelVideo v (640x480),progressive,aspect:1.00,fps:29.97"),

which is clearly a copy and paste error from a few lines above.

I noticed this after fixing another error that was causing the test to fail on "/files/x.gz". After fixing that issue, the bogus test for x.exe started failing (as it should have all along).

The problem that caused the x.gz test to fail was in the method findWhitespaceWithoutEscape. It needed the following else clause added to the if statement:

    else {
        lastEscape = false; // don't leave escape active after a non-whitespace escaped character
    }
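To see why that else clause matters, here is a hypothetical, simplified reconstruction of such a method; the real signature and surrounding code in simplemagic may differ. It returns the first whitespace that is not escaped by a backslash. Without the reset, an escape like \x41 leaves lastEscape set after the "x", so the space that really separates fields is wrongly treated as escaped.

```java
class WhitespaceScanner {

    /**
     * Hypothetical reconstruction: index of the first whitespace character at or
     * after start that is not escaped by a backslash, or -1 if there is none.
     */
    static int findWhitespaceWithoutEscape(String line, int start) {
        boolean lastEscape = false;
        for (int i = start; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\') {
                lastEscape = !lastEscape; // a second backslash cancels the first
            } else if (Character.isWhitespace(c)) {
                if (!lastEscape) {
                    return i; // unescaped whitespace ends the field
                }
                lastEscape = false; // escaped whitespace ("\ ") stays in the field
            } else {
                // The fix: any other character, including an escaped one, ends the escape
                lastEscape = false;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findWhitespaceWithoutEscape("\\x41 string", 0)); // 4
        System.out.println(findWhitespaceWithoutEscape("a\\ b c", 0));      // 4
    }
}
```

Removing the final else branch makes the first call return -1, which is the kind of mis-split that broke parsing of some magic-file lines.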

@j256
Owner

j256 commented Sep 20, 2016

This has been fixed in version 1.7. Turned out that I was not handling some specific magic file line patterns correctly.

@j256 j256 closed this as completed Sep 20, 2016

3 participants