Better modularisation #124

Open
0xdevalias opened this Issue Jun 2, 2016 · 0 comments

0xdevalias commented Jun 2, 2016

First off, love the project and the work that's gone into it, SO useful.

I've recently been working on a project where a binary file (a DLL) was read as UTF-8, so every invalid byte sequence was replaced with the UTF-8 replacement character (U+FFFD). As you can imagine, this makes the file rather hard to parse.
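To make the corruption concrete, here's a minimal sketch of what happens when binary data is pushed through a UTF-8 decode with replacement. The byte values are made up for illustration (they are not from the actual DLL), and I'm assuming the file went through the equivalent of `errors="replace"`:

```python
# Illustrative bytes only: "MZ" followed by values that are not valid UTF-8
# (0x90 is a stray continuation byte, 0xFF can never appear in UTF-8).
original = b"MZ\x90\x00\x03\x00\xff\xff"

# Decoding with errors="replace" swaps each invalid byte for U+FFFD.
text = original.decode("utf-8", errors="replace")

# Writing the text back out encodes every U+FFFD as the bytes EF BF BD,
# so the original values are lost and all later offsets shift.
corrupted = text.encode("utf-8")

print(original.hex())
print(corrupted.hex())
```

Since distinct invalid bytes all collapse to the same replacement sequence, the damage can't be undone mechanically, which is why being able to parse around the broken parts matters.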

I've been slowly making progress with pefile and a ton of reference material, but it's highlighted a point to me that I thought I would raise.

As it stands, pefile is REALLY good at 'point and click' parsing of a valid binary file, but nowhere near as simple to use piecemeal as an analysis toolkit. Because it basically errors out at the first problem it finds, it never really gives you a chance to use what it has already parsed, or to manually correct things before trying to parse further.

So I decided maybe I could just call each relevant section directly, and build up my own 'parse'. In doing so, I found that the PE class has a lot of interwoven dependencies that make it quite hard to pick out the individual pieces needed (without copy/pasting and hacking at it) to parse individual sections. Using Structure directly makes this somewhat easier, but still less than ideal.

What I think would be awesome is modularising the relevant parsing steps (including the little checks/fixes along the way) and making them callable without needing an instance of PE. That way the main __parse__ would end up as a series of function calls like parse_dos_header, check_dos_header, parse_nt_header, etc. Each would be a static function that takes all the data it needs as parameters and returns everything relevant. To maintain the current functionality, the __parse__ function could then update the relevant attributes on the PE instance.
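As a rough sketch of the shape I have in mind (this is not pefile's current API; the function signatures, return types, and the `parse_nt_signature` helper are all hypothetical, and it uses plain `struct` rather than pefile's Structure class to stay self-contained):

```python
import struct

DOS_HEADER_SIZE = 64          # size of IMAGE_DOS_HEADER
IMAGE_DOS_SIGNATURE = 0x5A4D  # "MZ" read as a little-endian uint16
E_LFANEW_OFFSET = 0x3C        # where the offset of the NT headers is stored


def parse_dos_header(data, offset=0):
    """Hypothetical standalone parser: takes raw bytes, returns a dict."""
    if len(data) - offset < DOS_HEADER_SIZE:
        raise ValueError("not enough data for a DOS header")
    (e_magic,) = struct.unpack_from("<H", data, offset)
    (e_lfanew,) = struct.unpack_from("<I", data, offset + E_LFANEW_OFFSET)
    return {"e_magic": e_magic, "e_lfanew": e_lfanew}


def check_dos_header(header, data_length):
    """Hypothetical check: returns warnings instead of raising,
    so the caller decides whether to stop, patch the value, or carry on."""
    warnings = []
    if header["e_magic"] != IMAGE_DOS_SIGNATURE:
        warnings.append("bad DOS signature")
    if header["e_lfanew"] + 4 > data_length:
        warnings.append("e_lfanew points past end of data")
    return warnings


def parse_nt_signature(data, e_lfanew):
    """Hypothetical next step: read the 4-byte NT signature."""
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    return signature


# Tiny synthetic input: a 64-byte DOS header followed by "PE\0\0".
sample = b"MZ" + bytes(58) + struct.pack("<I", 64) + b"PE\x00\x00"

# Simulate a damaged signature, then keep going anyway.
damaged = b"XX" + sample[2:]
header = parse_dos_header(damaged)
problems = check_dos_header(header, len(damaged))
nt_signature = parse_nt_signature(damaged, header["e_lfanew"])
```

The point is that a bad signature shows up as a warning, and the caller can still use `e_lfanew` to reach the NT headers. Inside PE, __parse__ would call the same functions and copy the results onto the instance.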

This is getting a little long and is a bit of a brain dump, so I'll leave it here, but I'm more than happy to discuss further how I think this would work, or to help implement it.

@0xdevalias 0xdevalias changed the title from Better modularisation? to Better modularisation Jun 2, 2016
