Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parser for textual ir #126

Merged
merged 13 commits into from Mar 16, 2018
Merged

Add parser for textual ir #126

merged 13 commits into from Mar 16, 2018

Conversation

HMPerson1
Copy link
Collaborator

Currently, it can roundtrip all the information that was already saved by ir_writer. There's probably still some information that is lost in the process (register state?) but I'm not sure what.

Will fix #117 when done.

@HMPerson1
Copy link
Collaborator Author

There seems to be a weird bug where occasionally an SSA value is just not emitted. It can be seen in when running minidec on ex-bins/key in the function sym._init:

define-fun sym._init(unknown) -> unknown {
    registers: $r15,$r14,$r13,$r12,$rbp,$rbx,$r11,$r10,$r9,$r8,$rax,$rcx,$rdx,$rsi,$rdi,$rip,$cs,$cf,$pf,$af,$zf,$sf,$tf,$if,$df,$of,$rsp,$ss,$fs_base,$gs_base,$ds,$es,$fs,$gs;
    bb_0x400438.0000():
        [@0x400438.0001] %1: $Unknown64(*?) = {rsp} - #x8;
        [@0x40043C.0003] %2: $Unknown64(*?) = Load({mem}, #x600ff8);
        [@0x400443.0000] %3: $Unknown64(*?) = %2 & %2;
        [@0x400443.0002] %4: $Unknown64 = %3 - #x0;
        [@0x400443.0004] %5: $Unknown64(*?) = %4 & #xffffffffffffffff;
        [@0x400443.0006] %6: $Unknown64(*?) = #x1 ^ %5;
        [@0x400443.0008] %7: $Unknown1(*?) = Narrow1(%6);
        [@0x400443.000A] %8: $Unknown64(*?) = %4 & #xff;
        [@0x400443.000C] %9: $Unknown64 = #x101010101010101 * %8;
        [@0x400443.000E] %10: $Unknown64(*?) = %9 & #x8040201008040201;
        [@0x400443.0010] %11: $Unknown64 = %10 % #x1ff;
        [@0x400443.0012] %12: $Unknown64(*?) = %11 & #x1;
        [@0x400443.0014] %13: $Unknown1(*?) = Narrow1(%12);
        [@0x400443.0018] %14: $Unknown64(*?) = %4 >> #x3f;
        [@0x400443.001A] %15: $Unknown1(*?) = Narrow1(%14);
        JMP IF %7 0x40044D.0000
    bb_0x400448.0000():
        [@0x400448.0024] CALL #x400490($r15={r15}, $r14={r14}, $r13={r13}, $r12={r12}, $rbp={rbp}, $rbx={rbx}, $r11={r11}, $r10={r10}, $r9={r9}, $r8={r8}, $rax=%2, $rcx={rcx}, $rdx={rdx}, $rsi={rsi}, $rdi={rdi}, $rip={rip}, $cs={cs}, $cf=%16, $pf=%13, $af={af}, $zf=%7, $sf=%15, $tf={tf}, $if={if}, $df={df}, $of=%17, $rsp=%1, $ss={ss}, $fs_base={fs_base}, $gs_base={gs_base}, $ds={ds}, $es={es}, $fs={fs}, $gs={gs});
        ...

In the last call operation, $cf is set to the value %16, but %16 was never defined.

After some debugging, I found that ir_writer never emitted that value because SSAStorage.inorder_walk skipped that node because it was never added into a block.

If we want to round-trip IR through text, we have two options to fix this:

  1. Rewrite ir_writer so that it doesn't depend on values being in blocks.
  2. Say that it is invalid to have values that aren't in a block and check this in verifier::verify before it gets to ir_writer.

Option 1 is certainly more robust, but since values not being in blocks kinda breaks inorder_walk in a sense (and also bfs_walk), it might be better to go with option 2.

@sushant94
Copy link
Collaborator

@HMPerson1 This looks great! I'll be trying this out today and merging it 👍

For register state, I'd like an additional line at the beginning of every function to "define" registers, much like declaring variables in source languages. This can be parsed to generate the register state later on.

One thing I'd like to see are a few testcases, nothing too complicated -- just something that takes handwritten IR, generates the ssa structure and then assertions to check the lifting is correct.

@HMPerson1
Copy link
Collaborator Author

I've added some test cases.

For register state, I already changed ir_writer to emit a line to "define" the registers and the register state at the very end of the function is roundtripped, but what I was confused about was the register state for all the other basicblocks. It seems like the register state for every basic block is recorded in SSAStorage (and can be retrieved with registers_in) but only that of the exit node is emited in ir_writer.

Also, could you comment on this problem?

@sushant94
Copy link
Collaborator

@HMPerson1 The question is, why is %16 not a part of the block? Does it belong to some other block? Did you look into this? If not, that needs some investigation.

@HMPerson1
Copy link
Collaborator Author

HMPerson1 commented Mar 10, 2018

It looks like it happens in constant propagation, in emit_ssa. When creating a new node for a OpNarrowed constant, it only copies over the address of the old node but not which block the old node was in:

let new_node = if w == 64 {
    const_node
} else {
    let address = self.g.address(*k).unwrap_or_else(..);
    let opcode = MOpcode::OpNarrow(w as u16);
    let new_node = self.g.insert_op(opcode, ndata.vt, None).unwrap_or_else(..);
    self.g.set_address(new_node, address);
    self.g.op_use(new_node, 0, const_node);
    new_node
};
self.g.replace_value(*k, new_node);

If I replace set_address with insert_into_block then it works again.

@@ -363,86 +373,59 @@ impl IRWriter {
let line = match ssa.g[node] {
NodeData::Op(ref opcode, vt) => {
if let MOpcode::OpConst(_) = *opcode {
radeco_err!("found const");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be any const nodes in inorder_walk since all the const nodes should be in a separate table and not part of any particular block.

Do you think it might be better as radeco_warn instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, then is ok. Sorry, oversaw that.

Also made some changes to `ir_writer` so that the SSA can be round-tripped.
`minidec` also now checks that `ir_reader` can correctly read back whatever
`ir_writer` wrote.
`remove_value` will unconditionally remove all edges adjacent to the given node
and then remove it, when really we just want to remove the selector edge.
@HMPerson1 HMPerson1 force-pushed the master branch 2 times, most recently from b27cdb8 to 64a16e4 Compare March 13, 2018 01:21
@sushant94
Copy link
Collaborator

@HMPerson1 How close is this to be merged? Most of the PR lgtm, if this can be merged, I'll give it another pass and merge it in.

@HMPerson1
Copy link
Collaborator Author

I think it's done now. Would you prefer a merge to fix the conflicts or a rebase?

@sushant94
Copy link
Collaborator

Either one is fine, rebase could be preferable.

@HMPerson1 HMPerson1 changed the title [WIP] Add parser for textual ir Add parser for textual ir Mar 16, 2018
@sushant94 sushant94 merged commit 533a269 into radareorg:master Mar 16, 2018
@sushant94
Copy link
Collaborator

sushant94 commented Mar 16, 2018

Great! SGTM, merging this in :)

@radare
Copy link
Collaborator

radare commented Mar 16, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parsing text based radeco-ir into graph representation
4 participants