PE imphash does not match YARA, VirusTotal, pefile #299

Closed
jshlbrd opened this issue May 31, 2019 · 7 comments
@jshlbrd

jshlbrd commented May 31, 2019

Describe the bug
The imphash calculated by lief.PE.get_imphash() does not match the imphash calculated by other tools. Here's an example:

executable SHA256: ad3722ab9dc9ad41a0e50122423737c241f98cc7374b4ddac999ed6eda4cfe9c
YARA imphash: 06694565e94cd10f48e1e4b90bc04bc2
VirusTotal imphash: 06694565e94cd10f48e1e4b90bc04bc2
pefile imphash: 06694565e94cd10f48e1e4b90bc04bc2
lief imphash: 0ffe645e98030f6b53caa49d22180504

To Reproduce
Compare the output of lief.PE.get_imphash() with the output of the other tools mentioned above.

Expected behavior
The imphash output of lief.PE.get_imphash() matches other tools commonly used in the industry.

Environment (please complete the following information):

  • System and Version : Ubuntu 18.04
  • Target format: PE
  • LIEF commit version: 0.9.0-a448c5e
@jshlbrd
Author

jshlbrd commented May 31, 2019

The code below produces the same imphash as the other tools for the sample above. I suggest that the get_imphash code be reviewed to ensure it follows the standard defined by Mandiant here, as there may be discrepancies between lief and the other tools:

Mandiant's imphash convention requires the following:

  • Resolving ordinals to function names when they appear
  • Converting both DLL names and function names to all lowercase
  • Removing the file extensions from imported module names
  • Building and storing the lowercased string <module name>.<function name> in an ordered list
  • Generating the MD5 hash of the ordered list

import hashlib

import lief

exe = lief.PE.parse('ad3722ab9dc9ad41a0e50122423737c241f98cc7374b4ddac999ed6eda4cfe9c')

imp_list = []
for imp in exe.imports:
    imp = lief.PE.resolve_ordinals(imp)  # Resolve ordinals to function names when they appear
    imp_name = imp.name.lower()  # Convert DLL names to lowercase
    imp_name = imp_name.rsplit('.', 1)[0]  # Remove only the trailing file extension from the module name
    for entry in imp.entries:
        if entry.is_ordinal:
            # Still unresolved: entry.ordinal is an int, so format it (pefile uses 'ord<N>' here)
            ordinal = f'ord{entry.ordinal}'
            imp_list.append(f'{imp_name}.{ordinal}')
        else:
            name = entry.name.lower()  # Convert function names to lowercase
            imp_list.append(f'{imp_name}.{name}')

print(hashlib.md5(','.join(imp_list).encode()).hexdigest())
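
The normalization and hashing steps can also be sketched without LIEF, which makes them easier to check in isolation. This is a minimal sketch, not any tool's actual implementation: normalize_module, imphash, STRIPPED_EXTENSIONS and the sample import list are hypothetical names and data, and stripping only the dll, ocx and sys extensions is an assumption modeled on pefile's behavior.

```python
import hashlib

# Extensions stripped from module names; an assumption modeled on pefile.
STRIPPED_EXTENSIONS = {"dll", "ocx", "sys"}


def normalize_module(dll_name):
    """Lowercase a module name and drop a known trailing extension."""
    name = dll_name.lower()
    stem, dot, ext = name.rpartition(".")
    if dot and ext in STRIPPED_EXTENSIONS:
        return stem
    return name


def imphash(imports):
    """Hash (dll_name, function_name) pairs, preserving import order."""
    parts = [f"{normalize_module(dll)}.{func.lower()}" for dll, func in imports]
    return hashlib.md5(",".join(parts).encode()).hexdigest()


# Hypothetical import list, for illustration only.
sample = [("KERNEL32.dll", "GetProcAddress"), ("USER32.DLL", "MessageBoxA")]
print(imphash(sample))
```

Because the list is joined in import order, two binaries that import the same functions in a different order get different hashes.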

Note that I get inconsistent results when testing with another file:

executable SHA256: 80e9bdfcb3bfb3800c202efcdfbb286a2b89d0bf2b8d94f2727d117b0013c821
YARA imphash: 57e98d9a5a72c8d7ad8fb7a6a58b3daf
VirusTotal imphash: 57e98d9a5a72c8d7ad8fb7a6a58b3daf
pefile imphash: 57e98d9a5a72c8d7ad8fb7a6a58b3daf
lief get_imphash: 65f1dba6c9228f668cbe607a03a3bbfd
lief parsed import code above: e2820ab424c5ee354b90bce9ac57b383

@jshlbrd
Author

jshlbrd commented Jun 18, 2019

Looking further into this, it appears that the discrepancy comes from LIEF having a more up-to-date ordinal table mapping than pefile: https://github.com/lief-project/LIEF/tree/master/src/PE/utils/ordinals_lookup_tables

Here is pefile's table for reference: https://github.com/erocarrera/pefile/tree/master/ordlookup

I'm not sure how you resolve this without getting someone from Mandiant involved to provide guidance.
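
To illustrate why the ordinal table matters, here is a small sketch in which the same import list is hashed against two tables. The tables, resolve and imphash are hypothetical and only stand in for the much larger tables that LIEF and pefile ship; the 'ord<N>' fallback for unmapped ordinals is an assumption modeled on pefile.

```python
import hashlib

# Illustrative tables only; real tools ship far larger ones.
TABLE_WITH_ENTRY = {"ws2_32.dll": {115: "WSAStartup"}}
TABLE_WITHOUT_ENTRY = {}


def resolve(dll, ordinal, table):
    """Return the mapped name, or an 'ord<N>' placeholder when unmapped."""
    return table.get(dll.lower(), {}).get(ordinal, f"ord{ordinal}")


def imphash(imports, table):
    """imports: (dll, name_or_ordinal) pairs; ints are treated as ordinals."""
    parts = []
    for dll, symbol in imports:
        if isinstance(symbol, int):
            symbol = resolve(dll, symbol, table)
        module = dll.lower().rsplit(".", 1)[0]
        parts.append(f"{module}.{symbol.lower()}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()


imports = [("WS2_32.dll", 115), ("KERNEL32.dll", "ExitProcess")]
hash_a = imphash(imports, TABLE_WITH_ENTRY)
hash_b = imphash(imports, TABLE_WITHOUT_ENTRY)
print(hash_a == hash_b)  # False: one table yields "wsastartup", the other "ord115"
```

A single ordinal that one table maps and another does not is enough to change the whole hash, which would be consistent with the mismatches reported above.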

@romainthomas
Member

Hello,
Sorry for the delay, but yes LIEF tries to resolve some imports by ordinal while computing the imphash.
I don't know how to properly address this issue if we use an up-to-date ordinal table.

@jshlbrd
Author

jshlbrd commented Jun 25, 2019

@romainthomas No problem. Based on some private conversations I've had, I believe the best way to move forward with this is to treat LIEF's imphash calculation as its own implementation of the imphash spec. VirusTotal, YARA, and pefile may be using their own variations of the imphash spec and any changes among them will break backward compatibility. I'd suggest that if users really want the other specs, then they can code them in themselves (YARA's ordinals can be found here, pefile's ordinals can be found here, no clue what VirusTotal uses but I think it may be some version of pefile).

I'd also suggest we leave this issue open in case any folks from Mandiant would like to add their thoughts.

@williballenthin

@jshlbrd that seems reasonable. though, i'd recommend that we document clearly that LIEF imphash != pefile imphash != XXX imphash.

chatting with people internally, it sounds like there are no plans to further tweak the algorithm. i think the feeling is that the algorithm works well as-is, and though updates could be made to the ordinal mapping, the algorithm is still deterministic. practically speaking, if this mapping is updated, then everyone that relies on the implementation must re-index their dataset.

regarding what we use and to quote a colleague:

I think everybody takes the definition used by VT / pefile.py as the official version, just by consensus and because VT is used by so many people.

@romainthomas
Member

Ok, sounds good for me.

romainthomas pushed a commit that referenced this issue Jun 26, 2019
@evandrix

#299 (comment)

Describe the bug
The imphash calculated by lief.PE.get_imphash() does not match the imphash calculated by other tools. Here's an example:

executable SHA256: ad3722ab9dc9ad41a0e50122423737c241f98cc7374b4ddac999ed6eda4cfe9c
YARA imphash: 06694565e94cd10f48e1e4b90bc04bc2
VirusTotal imphash: 06694565e94cd10f48e1e4b90bc04bc2
pefile imphash: 06694565e94cd10f48e1e4b90bc04bc2
lief imphash: 0ffe645e98030f6b53caa49d22180504

To Reproduce
Compare the output by lief.PE.get_imphash() to other tools mentioned above.

Expected behavior
The imphash output of lief.PE.get_imphash() matches other tools commonly used in the industry.

Environment (please complete the following information):

  • System and Version : Ubuntu 18.04
  • Target format: PE
  • LIEF commit version: 0.9.0-a448c5e

sample attached
00413413a221123517b4e1e5d173a5310b9a48fc.bin.zip
