Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fold conditional flags into original ones #171

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ZhangZhuoSJTU
Copy link
Collaborator

@ZhangZhuoSJTU ZhangZhuoSJTU commented Jun 28, 2018

Right now, RadecoIL uses the flag registers to indicate selectors for conditional branches, which is undesirable in decompilation or binary analysis. This PR tracks the progress of work on this.

The basic idea is to use graph matcher to find the condition patterns, grep and replace these expressions. Following tasks will be finished in this PR.

  • Handle commutativity. Using SSASorter to sort nodes' operands. It is necessary to keep the invariability through whole analysis process. (Associated with Standardize analysis stages' APIs #172 )
  • Make hash_subtree unified. In our code, there is a few hash_subtree functions which have the similar functionality. It need to be set as a basic API for SSA.
  • Rewrite the query expression in matcher.rs to support capture group, which will help use locate the original condition.
  • Add low-level analyzer for x86 to offer replacing rules for conditional flag register expressions.

This PR is associated with #155

@XVilka
Copy link
Contributor

XVilka commented Jun 29, 2018

@HMPerson1 @kriw @chinmaydd @sushant94 please check this out and write if you have any ideas/suggestions.

@HMPerson1
Copy link
Collaborator

@ZhangZhuoSJTU I don't quite understand the reasoning for doing this. Do you have any specific or motivating examples for what you expect the transformation to do?

@ZhangZhuoSJTU
Copy link
Collaborator Author

ZhangZhuoSJTU commented Jun 30, 2018

@HMPerson1 There might be some situations that we need the original conditional predication, instead of flag predication.

  • For the output of decompilation, we might need something like if (X <= Y) based on variables rather than if (ZF | OF ^ SF) based on flag registers, which makes the understanding difficult.

  • For some analysis, i.e. Value Set Analysis, we might need to know the value range of register in conditional branches. Saying following code, knowing the value range of input a for func1 and func2 would benefit future analysis. If we could set the selector to the opcodes using general registers other than flag registers, these work would be simple.

if (a < 0xdeadbeef)
    func1(a);
else
    func2(a);

@sushant94
Copy link
Collaborator

@ZhangZhuoSJTU I'd like to see explicit notes in the wiki/source code for the algorithm of hash_subtree as that is going to be an important piece that's going to be re-used, perhaps even have a separate PR for it as we'd need more testcases and discussion about sub-tree hashing algorithm.

For the capture groups in matcher.rs, it may be worth looking into how LLVM does this. I'll add a link when I find it again.

@XVilka
Copy link
Contributor

XVilka commented Jul 12, 2018

@ZhangZhuoSJTU ping?

@XVilka
Copy link
Contributor

XVilka commented Sep 13, 2018

@ZhangZhuoSJTU ping?

@ZhangZhuoSJTU
Copy link
Collaborator Author

@XVilka I am not sure whether I can finish this PR before the machine of esil-rs is improved. The interval variable calculated in that repo is buggy and my tests always fail. Will there be anyone who would like to make some progress on esil-rs?

It would be very easy to check the interval variables calculated in the way in that repo, even for x86. A simple test is to check interval variable after cmp eax, ebx.

@XVilka XVilka added this to To do in SSA Oct 29, 2018
@Mm7
Copy link
Contributor

Mm7 commented Dec 12, 2018

Hi @ZhangZhuoSJTU I'd like to help on this PR. Please can elaborate a bit more about what esil-rs is missing / why do you think it is broken? Getting this done would be especially useful to fix #216.

Make hash_subtree unified. In our code, there is a few hash_subtree functions which have the similar functionality. It need to be set as a basic API for SSA.<

Does this still apply? I can only find 2 definitions (git grep hash_subtree: analysis/matcher/gmatch.rs and frontend/containers.rs).

Rewrite the query expression in matcher.rs to support capture group, which will help use locate the original condition.<

By 'support group' you mean something like ((#ZF OpOr #OF) OpXor #SF) ?

Add low-level analyzer for x86 to offer replacing rules for conditional flag register expressions.<

Nice, but I'm wondering whether such an analyzer could be made architecture-agnostic? In theory >= should correspond to the same esil, regardless of the assembly code.

Please be patient, I'm new to radeco/radare, so my questions will probably look dumb. Thank you.

@ZhangZhuoSJTU
Copy link
Collaborator Author

I am so sorry. I am quite busy right now that do not have enough time to finish this PR. But it is really nice that you can help. It's my pressure if I can do anything helpful.

Hi @ZhangZhuoSJTU I'd like to help on this PR. Please can elaborate a bit more about what esil-rs is missing / why do you think it is broken? Getting this done would be especially useful to fix #216.

IMO, the main problem of esil-rs right now is that it cannot support all the esil opcodes, especially for some common ones like GOTO. Additionally, esil-rs still have some problems with esil's internal variables. I cannot remember clearly, but for ==, there might be some problems.

Make hash_subtree unified. In our code, there is a few hash_subtree functions which have the similar functionality. It need to be set as a basic API for SSA.<

Does this still apply? I can only find 2 definitions (git grep hash_subtree: analysis/matcher/gmatch.rs and frontend/containers.rs).

I think it would be great to reduce this into one function.

Rewrite the query expression in matcher.rs to support capture group, which will help use locate the original condition.<

By 'support group' you mean something like ((#ZF OpOr #OF) OpXor #SF) ?

I think what capture group means is the similar thing in regular expression

Add low-level analyzer for x86 to offer replacing rules for conditional flag register expressions.<

Nice, but I'm wondering whether such an analyzer could be made architecture-agnostic? In theory >= should correspond to the same esil, regardless of the assembly code.

It would be wonderful to be architecture-agnostic, but I think it is hard. The reason is that native code would express conditional predicate in different ways. For example

cmp ecx, 0x47
ja $+0x13

Above asm code's esil would look like

71,ecx,==,$z,zf,=,$b32,cf,=,$p,pf,=,$s,sf,=,$o,of,=
cf,zf,|,!,?{,4294973063,rip,=,}

You can see that there is no >= opcode, x86 uses !(zf|cf) to indicate "bigger". This is architecture-dependent, as arm would behave differently.

Please be patient, I'm new to radeco/radare, so my questions will probably look dumb. Thank you.

Of course. Thanks you so much that you would like to help.

@Mm7
Copy link
Contributor

Mm7 commented Dec 13, 2018

I am so sorry. I am quite busy right now that do not have enough time to finish this PR. But it is really nice that you can help. It's my pressure if I can do anything helpful.

Hi @ZhangZhuoSJTU I'd like to help on this PR. Please can elaborate a bit more about what esil-rs is missing / why do you think it is broken? Getting this done would be especially useful to fix #216.

IMO, the main problem of esil-rs right now is that it cannot support all the esil opcodes, especially for some common ones like GOTO. Additionally, esil-rs still have some problems with esil's internal variables. I cannot remember clearly, but for ==, there might be some problems.

Make hash_subtree unified. In our code, there is a few hash_subtree functions which have the similar functionality. It need to be set as a basic API for SSA.<

Does this still apply? I can only find 2 definitions (git grep hash_subtree: analysis/matcher/gmatch.rs and frontend/containers.rs).

I think it would be great to reduce this into one function.

Rewrite the query expression in matcher.rs to support capture group, which will help use locate the original condition.<

By 'support group' you mean something like ((#ZF OpOr #OF) OpXor #SF) ?

I think what capture group means is the similar thing in regular expression

OK, I'll give it a try!

Add low-level analyzer for x86 to offer replacing rules for conditional flag register expressions.<

Nice, but I'm wondering whether such an analyzer could be made architecture-agnostic? In theory >= should correspond to the same esil, regardless of the assembly code.

It would be wonderful to be architecture-agnostic, but I think it is hard. The reason is that native code would express conditional predicate in different ways. For example

cmp ecx, 0x47
ja $+0x13

Above asm code's esil would look like

71,ecx,==,$z,zf,=,$b32,cf,=,$p,pf,=,$s,sf,=,$o,of,=
cf,zf,|,!,?{,4294973063,rip,=,}

You can see that there is no >= opcode, x86 uses !(zf|cf) to indicate "bigger". This is architecture-dependent, as arm would behave differently.

Yes,!(zf|cf) is platform-specific, but if you replace the usage of zf and cf with their definitions you can remove all the references to platform-specific flags.
Let me rewrite your example code into a pseudo-assembly:

$z = zf = ecx == 71
cf = $b32
(we don't care about p,s,o flags)

Let's replace $b32 with (ecx - 71) < 0 (I'm not very familiar with esil, correct me if I'm mistaken).

$z = zf = ecx == 71
cf = (ecx - 71) < 0

Let's replace zf and cf in the jump:

jump if !((ecx - 71) < 0 | ecx == 71)

Then we just need to grep !( ( X - Y ) < 0 | X == Y ) and replace it with the more useful X > Y.
We would need a set of substitutions like this to handle all the cases (X<Y, X<=Y,...).

jump if ecx > 71

As you can see from this simple example we don't really need to know all the obscure architecture specific flags. Some platform agnostic rules (e.g. !((X-Y)<0|X==Y) --> X>Y) could be applied to replace arithmetic expressions to the equivalent relational expression.

I'm assuming that $b32 can always be replaced with (X - Y) < 0, regardless of the assembly instruction for which it was emitted and the CPU type.

@ZhangZhuoSJTU
Copy link
Collaborator Author

@Mm7 It sounds interesting.

By the way, I remember another shortage of esil-rs, that is it didn't handle bits in internal variables, e.g. $b63 and $b8 might be same in esil-rs

As for the platform-specific issue. I think it is a good idea. One mistake is that $b32 should be replaced by (ecx & 0x7fffffff) - (71 & 0x7fffffff) < 0. And I agree this would work for x86, because as you will see, esil's internal variables are very similar with x86. I am concern whether it would work on other architectures. But anyway, the first stuff is to make it work.

@ZhangZhuoSJTU
Copy link
Collaborator Author

@Mm7

BTW, maybe we need to do some simplify work after replacing. As the above example, the replacing result would look like

jump if !((rcx & 0xffffffff & 0xffffffff - 71 & 0xffffffff) < 0 | rcx & 0xffffffff == 71 & 0xffffffff)

Another thing is that you may need to think about is how to replace $of and $pf. But $pf is not an urgent task. Because as I know, x86's predicate would not use PF register in conditional jump. $of is more important.

Besides that, It would be very helpful if you can check how radare2 handle internal variables. It would benefit a lot to understand how internal variables work and be a good startup of this PR.

@Mm7 Mm7 mentioned this pull request Feb 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
SSA
  
To do
Development

Successfully merging this pull request may close these issues.

None yet

5 participants