Instruction bytes change between versions of the executable #1

Open
KulaGGin opened this issue Feb 16, 2022 · 3 comments
Comments

@KulaGGin

KulaGGin commented Feb 16, 2022

Sigmaker generates a correct byte pattern, but the byte sequence for the instruction itself changes between versions. In my case the byte sequence for the instruction mov rdx, rcx changed from 48 89 CA to 48 8B D1:
[Screenshot: IDA, the two versions of the executable side by side]

So when I generate a signature in the version on the left, Sigmaker generates the pattern 48 89 CA C1 E8 04, and it won't be found in the version on the right.

Don't know how often it happens in the wild.

An obvious fix would be to replace all bytes with ?? for instructions that can be represented by different byte sequences. Not sure if it's easy, or even possible, to just ask some assembler "can this instruction be represented by multiple byte sequences?"
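For the register-to-register mov above, the two byte sequences come from the two x86-64 opcodes for MOV (89 /r and 8B /r), which swap the roles of the ModRM reg and r/m fields. A minimal sketch that derives both encodings, limited to the eight low 64-bit registers; the function and table names are made up for illustration:

```python
# Register numbers of the eight low 64-bit GPRs (no REX.R / REX.B bits needed).
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
         "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}

def mov_reg_reg_encodings(dst: str, src: str) -> tuple:
    """Return both encodings of `mov dst, src` for 64-bit registers."""
    d, s = REG64[dst], REG64[src]
    rex_w = 0x48   # REX prefix with W=1 (64-bit operand size)
    mod_rr = 0xC0  # ModRM mod=11: register-direct operand
    # 89 /r = MOV r/m64, r64: reg field holds the source, r/m the destination.
    form_89 = bytes([rex_w, 0x89, mod_rr | (s << 3) | d])
    # 8B /r = MOV r64, r/m64: reg field holds the destination, r/m the source.
    form_8b = bytes([rex_w, 0x8B, mod_rr | (d << 3) | s])
    return form_89, form_8b

a, b = mov_reg_reg_encodings("rdx", "rcx")
print(a.hex(" ").upper(), "|", b.hex(" ").upper())
```

Both results decode to the same instruction, so a byte-exact signature built from one will not match the other.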

@kweatherman
Owner

To be clear, when you say "changes" do you mean between an incremental/updated version of the same target executable?

@KulaGGin
Author

To be clear, when you say "changes" do you mean between an incremental/updated version of the same target executable?

Yes.

@kweatherman
Owner

kweatherman commented Feb 16, 2022

Therein lies the rub.
I know you do research and whatnot into signaturing, binary diffing, etc., so you might have thought about a lot of this already.

To do this procedurally/programmatically I see a few possible solutions (more like research directions) depending on the use case.
AFAIK the two main use cases for game hacking (game exploiting, modding, etc.) are:

  1. To grab new offsets from one executable module to another for updates.
  2. To get offsets to functions and/or data at runtime dynamically.

For the first case, assuming one is still in IDA, we have the luxury of having all the disassembler data.
If one has HexRays there is some kind of intermediate representation (IR) available.
Otherwise one could maybe use mcsema or similar to lift into LLVM.
And/or go as far as using more advanced matching techniques based on call graphs, etc.
Then develop a custom signature format that carries more detail, to enable some sort of fuzzy matching.
IDA does internally have a classification (from decode_insn()) where it basically lumps a bunch of x86 opcodes into a type like NN_call or NN_jmp, etc. Maybe this, combined with saving unique operand values, would be enough for a fuzzy matching system.
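A rough sketch of what matching on instruction class plus operands could look like. It assumes some disassembler has already decoded the code into (class, operands) tuples; that tuple format, the sample listing, and the function names are all made up for illustration and are not IDA's API:

```python
from typing import List, Optional, Tuple

# One decoded instruction: (instruction class, operands). Hypothetical format.
Insn = Tuple[str, Tuple[str, ...]]
# One signature entry: operands=None means "any operands are accepted".
SigInsn = Tuple[str, Optional[Tuple[str, ...]]]

def matches_at(insns: List[Insn], sig: List[SigInsn], start: int) -> bool:
    """True if every signature entry matches the instructions at `start`."""
    for offset, (sig_cls, sig_ops) in enumerate(sig):
        cls, ops = insns[start + offset]
        if cls != sig_cls:
            return False
        if sig_ops is not None and ops != sig_ops:
            return False
    return True

def find_fuzzy(insns: List[Insn], sig: List[SigInsn]) -> List[int]:
    """Return every instruction index where the signature matches."""
    last = len(insns) - len(sig)
    return [i for i in range(last + 1) if matches_at(insns, sig, i)]

# Hand-written stand-in for a decoded function (not real tool output).
decoded = [
    ("nop", ()),
    ("mov", ("rdx", "rcx")),   # 48 89 CA and 48 8B D1 both decode to this
    ("shr", ("eax", "4")),
    ("call", ("sub_1400",)),
]
signature = [("mov", ("rdx", "rcx")), ("shr", ("eax", "4")), ("call", None)]
print(find_fuzzy(decoded, signature))
```

Because the comparison happens after decoding, it does not care which of the equivalent encodings the compiler picked.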

For the 2nd use case, since we're probably shooting for speed and don't want to take the time to do disassembly, dynamic code analysis, etc., we're more restricted in what a solution can be.

The goal for a good tool is to automate as much of these things as possible.
I think this is in the realm of statistical analysis over a corpus of before-and-after binary updates: learning how the compiler(s) of interest vary the binary code they emit in response to code changes, etc., so that this could somehow be automated. Then, while making the signatures, the tool would know where and when to replace some bytes of a signature with ?? wildcards, since it could predict which parts of instructions are likely to change.
With a setup for proper automated feature extraction, this could maybe be modeled as a machine learning problem.
Probably a decent sized research project to see if this can even be done (plus probably learn a lot in the process of doing this).

Practically, and for the time being until research is done on one of these solutions for the use cases:
What I do, and what I've found others do when I ask about their use cases, is just manually tweak the signatures, loosening them up with more wildcards.
They will compare byte by byte what still matches, pray that the signature will still be unique in the end, and just replace more bytes with wildcards.
So in your example the only byte that matches between the two versions of the instruction is 0x48, so that part of the signature would need to be "48 ?? ??" to still match.
If the signature is no longer unique after the change(s), then one can probably, hopefully, extend the signature by more bytes until it is unique again.
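The manual loosening described above can be sketched as a byte-by-byte comparison followed by a uniqueness check. This is only an illustration: the helper names and the short `new_module` buffer are made up, and a real scan would run over the module's code section.

```python
from typing import List, Optional

def loosen(old: bytes, new: bytes) -> str:
    """Keep bytes that are equal in both versions, wildcard the rest."""
    if len(old) != len(new):
        raise ValueError("byte ranges must have the same length")
    return " ".join(f"{a:02X}" if a == b else "??" for a, b in zip(old, new))

def parse_signature(sig: str) -> List[Optional[int]]:
    """Turn '48 ?? C1' into [0x48, None, 0xC1]; None is a wildcard."""
    return [None if tok in ("?", "??") else int(tok, 16) for tok in sig.split()]

def find_all(data: bytes, sig: str) -> List[int]:
    """Return every offset in `data` where the signature matches."""
    pat = parse_signature(sig)
    last = len(data) - len(pat)
    return [i for i in range(last + 1)
            if all(p is None or data[i + j] == p for j, p in enumerate(pat))]

old_bytes = bytes.fromhex("48 89 CA C1 E8 04")  # version on the left
new_bytes = bytes.fromhex("48 8B D1 C1 E8 04")  # version on the right
loosened = loosen(old_bytes, new_bytes)

# Made-up stand-in for the updated module's code bytes.
new_module = bytes.fromhex("90 90 48 8B D1 C1 E8 04 C3")
print(loosened)
print(find_all(new_module, "48 89 CA C1 E8 04"))
print(find_all(new_module, loosened))
```

If `find_all` returns more than one offset for the loosened signature, it is no longer unique and needs more trailing bytes.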
