In the last article, we explore the basics of radare2. We are going to write a simple program, and then disassemble it, to see what is really doing in the processor.
Not all architectures have the same set of instructions, the most important difference is between Reduced Instruction Set Computing (embedded systems, PMDs ...) and Complex Instruction Set Computing (clusters, desktop computing ...). An example of RISC could be ARM and CISC could be x86.
Radare2 is an open source set of tools for reverse engineering and analysis of binary files (among other things, for example debugging). In this article we will cover two: rasm2 and r2.
It is used to assemble or disassemble files or hexpair strings. You can see the help screen:
$ rasm2 -h
Usage: rasm2 [-ACdDehLBvw] [-a arch] [-b bits] [-o addr] [-s syntax]
[-f file] [-F fil:ter] [-i skip] [-l len] 'code'|hex|-
-a [arch] Set architecture to assemble/disassemble (see -L)
-A Show Analysis information from given hexpairs
-b [bits] Set cpu register size (8, 16, 32, 64) (RASM2_BITS)
-c [cpu] Select specific CPU (depends on arch)
-C Output in C format
-d, -D Disassemble from hexpair bytes (-D show hexpairs)
-e Use big endian instead of little endian
-E Display ESIL expression (same input as in -d)
-f [file] Read data from file
-F [in:out] Specify input and/or output filters (att2intel, x86.pseudo, ...)
-h, -hh Show this help, -hh for long
-i [len] ignore/skip N bytes of the input buffer
-k [kernel] Select operating system (linux, windows, darwin, ..)
-l [len] Input/Output length
-L List Asm plugins: (a=asm, d=disasm, A=analyze, e=ESIL)
-o [offset] Set start address for code (default 0)
-O [file] Output file name (rasm2 -Bf a.asm -O a)
-p Run SPP over input for assembly
-s [syntax] Select syntax (intel, att)
-B Binary input/output (-l is mandatory for binary input)
-v Show version information
-w What's this instruction for? describe opcode
-q quiet mode
If '-l' value is greater than output length, output is padded with nops
If the last argument is '-' reads from stdin
Environment:
RASM2_NOPLUGINS do not load shared plugins (speedup loading)
R_DEBUG if defined, show error messages and crash signal
The option -d
, disassemble from hexpair bytes. For example, 90 corresponds to a nop operation. To disassemble hexadecimal code, type:
$ rasm2 -d <hexadecimal>
$ rasm2 -d 90
If you want to get the hexadecimal code of an instruction:
$ rasm2 "<instruction>"
$ rasm2 "nop"
let's write a simple code that adds two variables.
$ cat test.c
#include <stdio.h>
int main()
{
int a = 10;
int b = 20;
int c = a+b;
return 0;
}
$ gcc -o test test.c
Once we have the binary file, let's disassemble it.
$ r2 -vv
radare2 1.5.0 0 @ darwin-x86-64 git.1.5.0
$ r2 test
At this point, analyze the whole code: aa
(analyze all).
Let's see the main function: pdf @ main
(print disassemble function)
;-- main:
;-- section.0.__TEXT.__text:
;-- func.100000f70:
/ (fcn) entry0 38
| entry0 ();
| ; var int local_10h @ rbp-0x10
| ; var int local_ch @ rbp-0xc
| ; var int local_8h @ rbp-0x8
| ; var int local_4h @ rbp-0x4
| 0x100000f70 55 push rbp ; section 0 va=0x100000f70 pa=0x00000f70 sz=38 vsz=38 rwx=m-r-x 0.__TEXT.__text
| 0x100000f71 4889e5 mov rbp, rsp
| 0x100000f74 31c0 xor eax, eax
| 0x100000f76 c745fc000000. mov dword [local_4h], 0
| 0x100000f7d c745f80a0000. mov dword [local_8h], 0xa
| 0x100000f84 c745f4140000. mov dword [local_ch], 0x14
| 0x100000f8b 8b4df8 mov ecx, dword [local_8h]
| 0x100000f8e 034df4 add ecx, dword [local_ch]
| 0x100000f91 894df0 mov dword [local_10h], ecx
| 0x100000f94 5d pop rbp
\ 0x100000f95 c3 ret
The first two instructions are called prologue:
push rbp
save the old base pointer in the stack to restore it latermov rbp, rsp
copy the stack pointer to the base pointer
Now the base pointer points to the main frame.
mov dword [local_4h], 0
load 0 into rbp-4mov dword [local_8h], 0xa
load 10 into rbp-8mov dword [local_ch], 0x14
load 20 into rpb-12
The size of an integer in C is 4 bytes (32 bits), that's the reason why the pointer decrements in 4 (the stack grows downward). The first instruction simply say: laod value 0 below the base pointer, the second instruction says: load value 10 (0xa) below the previous value, the third instruction says: laod value 0x14(20) below the previous value. We have pushed the variable values into the stack.
mov ecx, dword [local_8h]
load value 10 into ecxadd ecx, dword [local_ch]
add ecx, rbp-12 and store result in ecxmov dword [local_10h], ecx
load the result into rbp-16.
We load the values into general purpose registers, to perform the ALU operation (add). Finally we store the sum result below rbp-12.
The last two instructions are called epilogue. We pop the old base pointer off the stack and store it in rbp, then we jump to the return address (which is also in the stack).
pop rbp
ret
A final note: The assembly code generated is different depending on the compiler and system.