# Binary and Reverse Engineering 
- reverse engineering binary programs is a core skill in malware analysis
- most malware is distributed as compiled binaries, often written in C/C++, so it has to be reverse engineered to understand what it does under the hood

## Executable and Linkable Format (ELF)
- https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
- common format for executable files, object code, shared libraries, and core dumps on Linux and other Unix-like systems
![ELF](./media/ELF.png)
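- an ELF file has two views: the program header table describes the segments the loader maps into memory at run time, whereas the section header table lists the sections of the binary (used by linkers and analysis tools)
- let's compile the hello.cpp, hello.c, and memory_segments.cpp programs provided in the demos folder as 32-bit binaries (`-m32`, which needs 32-bit multilib support, e.g., the gcc-multilib package on Debian-based systems such as Kali) and examine the ELF format using various tools and commands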

In [None]:
! cat ../demos/hello.cpp

In [None]:
! cat ../demos/hello.c

In [5]:
! g++ -m32 -o hello.exe ../demos/hello.cpp

In [6]:
! gcc -m32 -o hello_c.exe ../demos/hello.c

In [None]:
! g++ -m32 -o memory_segments.exe ../demos/memory_segments.cpp

In [None]:
! ./hello.exe

In [None]:
! ./hello_c.exe

In [None]:
! ls -al hello.exe

In [None]:
! ls -al hello_c.exe

In [None]:
! cat hello.exe

In [None]:
! cat hello_c.exe

## file utility
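- identifies the type of a file; for an ELF binary it reports the class (32/64-bit), byte order, target architecture, how it is linked, and whether it is stripped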

In [7]:
! file ../demos/hello.cpp

../demos/hello.cpp: C++ source, ASCII text


In [3]:
! file ../demos/hello.c

../demos/hello.c: C source, ASCII text


In [8]:
! file hello.exe

hello.exe: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=8b33472863baba15aba327716ea002fcdb58fb39, for GNU/Linux 3.2.0, not stripped


In [9]:
! file hello_c.exe

hello_c.exe: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=0ecfcf2ec259106a7a93642771efb3c5fd26913d, for GNU/Linux 3.2.0, not stripped


In [10]:
# display hex and ASCII in two columns
! hexdump -C hello.exe

00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 03 00 01 00 00 00  70 10 00 00 34 00 00 00  |........p...4...|
00000020  9c 37 00 00 00 00 00 00  34 00 20 00 0b 00 28 00  |.7......4. ...(.|
00000030  1e 00 1d 00 06 00 00 00  34 00 00 00 34 00 00 00  |........4...4...|
00000040  34 00 00 00 60 01 00 00  60 01 00 00 04 00 00 00  |4...`...`.......|
00000050  04 00 00 00 03 00 00 00  94 01 00 00 94 01 00 00  |................|
00000060  94 01 00 00 13 00 00 00  13 00 00 00 04 00 00 00  |................|
00000070  01 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 18 05 00 00  18 05 00 00 04 00 00 00  |................|
00000090  00 10 00 00 01 00 00 00  00 10 00 00 00 10 00 00  |................|
000000a0  00 10 00 00 08 02 00 00  08 02 00 00 05 00 00 00  |................|
000000b0  00 10 00 00 01 00 00 00  00 20 00 00 00 20 00 00  |......... ... ..|
000000c0  00 20 00 00 f8 00 00 00  f8 00

In [11]:
! hexdump -C hello_c.exe

00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 03 00 01 00 00 00  60 10 00 00 34 00 00 00  |........`...4...|
00000020  d8 35 00 00 00 00 00 00  34 00 20 00 0b 00 28 00  |.5......4. ...(.|
00000030  1e 00 1d 00 06 00 00 00  34 00 00 00 34 00 00 00  |........4...4...|
00000040  34 00 00 00 60 01 00 00  60 01 00 00 04 00 00 00  |4...`...`.......|
00000050  04 00 00 00 03 00 00 00  94 01 00 00 94 01 00 00  |................|
00000060  94 01 00 00 13 00 00 00  13 00 00 00 04 00 00 00  |................|
00000070  01 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 d4 03 00 00  d4 03 00 00 04 00 00 00  |................|
00000090  00 10 00 00 01 00 00 00  00 10 00 00 00 10 00 00  |................|
000000a0  00 10 00 00 e4 01 00 00  e4 01 00 00 05 00 00 00  |................|
000000b0  00 10 00 00 01 00 00 00  00 20 00 00 00 20 00 00  |......... ... ..|
000000c0  00 20 00 00 14 01 00 00  14 01
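The first 52 (0x34) bytes in the dumps above are the ELF header. As a minimal sketch in Python (the `!` cells suggest this notebook runs on an IPython kernel), the header of a 32-bit little-endian ELF file can be decoded with `struct`; the byte string is copied from the `hello.exe` hexdump, and the field names follow the ELF specification.

```python
import struct

# First 52 (0x34) bytes of hello.exe, copied from the hexdump above.
HELLO_EXE_HEADER = bytes.fromhex(
    "7f454c46010101000000000000000000"
    "03000300010000007010000034000000"
    "9c37000000000000340020000b002800"
    "1e001d00"
)

# ELF32 header fields that follow the 16-byte e_ident array.
FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
          "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
          "e_shnum", "e_shstrndx")
FORMAT = "<HHIIIIIHHHHHH"  # little-endian: 2 half-words, 5 words, 6 half-words


def parse_elf32_header(data: bytes) -> dict:
    """Decode the header of a 32-bit, little-endian ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number 7f 45 4c 46")
    # e_ident[4] is the class (1 = 32-bit), e_ident[5] the byte order (1 = little-endian)
    if data[4] != 1 or data[5] != 1:
        raise ValueError("this sketch only handles 32-bit little-endian ELF files")
    return dict(zip(FIELDS, struct.unpack_from(FORMAT, data, 16)))


header = parse_elf32_header(HELLO_EXE_HEADER)
for name, value in header.items():
    print(f"{name:12} {value:#x}")
```

e_type 0x3 (ET_DYN) matches the "pie executable" reported by `file`, e_machine 0x3 is EM_386 (Intel 80386), and e_entry 0x1070 should be the address of `_start`. To try it on the real file, pass `open("hello.exe", "rb").read(52)`; for hello_c.exe the entry point bytes are `60 10 00 00`, i.e., 0x1060.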

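## ELF file parts

### Symbols
- symbols name functions and global variables, e.g., `main` or `printf`
- `printf` is not built into the program: it lives in the shared C library, and the dynamic linker resolves the symbol when the program is loaded or when the function is first called

### Sections
- the code and data that symbols refer to are organized into **sections**: code lives in one section (.text) and data lives in others (.data, .rodata, .bss)

### Segments
- sections are grouped into **segments**; the loader maps each loadable segment into memory with a single set of permissions (read, write, execute)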

### Examine various sections of ELF
- let's compile demos/hello.cpp file
- use the compiled ELF file to examine various sections

### readelf and objdump
- readelf displays the ELF header, sections, segments, and symbol tables; objdump can dump section contents and disassemble code

### look at all the symbols of a binary
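- important symbols to note:
    - `main`: the program's own main function
    - `_start`: the entry point recorded in the ELF header; it calls `__libc_start_main`, which in turn calls `main`
    - `puts`: imported from the C library; GCC commonly replaces `printf("...\n")` with a cheaper `puts` call, so it may appear even if the source only uses printf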

In [None]:
! readelf --symbols hello.exe

In [None]:
! readelf --symbols hello_c.exe

### display all the sections
- some important sections are: .text, .rodata, .data, .bss

In [None]:
! readelf --sections hello.exe

### look at just one section, e.g., .rodata
- read-only data such as string literals (e.g., "Hello World!") is stored in .rodata
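String literals in .rodata are simply runs of printable bytes separated by non-printable ones. The sketch below is a simplified take on what the `strings` utility does; the sample bytes are made up for illustration and are not copied from hello.exe.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII bytes (0x20 to 0x7e)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical .rodata contents: two 4-byte integers followed by two NUL-terminated strings.
sample_rodata = b"\x03\x00\x00\x00\x01\x00\x02\x00Hello World!\n\x00good bye!\x00"
print(printable_strings(sample_rodata))  # ['Hello World!', 'good bye!']
```

The newline (0x0a) and NUL bytes end each run, which is why the literals come out as separate strings.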

In [None]:
! readelf --sections hello_c.exe

In [None]:
# let's look at just the .rodata section of hello program
! readelf -x .rodata hello.exe

In [None]:
# let's look at the .rodata section of the memory_segments.exe program
! readelf -x .rodata memory_segments.exe

### objdump program 
- objdump can also dump the raw contents of a single section: `-s` displays section contents and `-j` selects the section

In [None]:
! objdump -s -j .rodata hello.exe

In [None]:
! readelf -x .data memory_segments.exe
# Note: global_initialized_var = 5 shows up only in the hex column (05000000, little-endian if it is an int), not as ASCII, because 0x05 is a non-printable byte
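To see why the value appears only in the hex column, the sketch below packs 5 the way little-endian x86 stores a 32-bit int (an assumption: the type of global_initialized_var is not shown here) and renders an ASCII column the way `hexdump -C` does, with a dot for every non-printable byte.

```python
import struct


def ascii_column(data: bytes) -> str:
    """Render bytes like the right-hand column of hexdump -C: printable ASCII as-is, others as '.'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in data)


raw = struct.pack("<i", 5)       # 32-bit signed int, least significant byte first
print(raw.hex(" "))              # 05 00 00 00
print(ascii_column(raw))         # ....
print(ascii_column(b"Hi!\x00"))  # Hi!.
```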

In [None]:
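! readelf -x .bss memory_segments.exe
# .bss holds zero-initialized globals; its type is NOBITS, so it takes no space in the file and readelf has no bytes to dump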

### look at the segments
- GNU_STACK is important to note
    - its flags are RW: Read and Write, but NOT Execute
    - the stack is non-executable, so bytes placed on it (e.g., injected shellcode) are treated as data, not code
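In a 32-bit ELF file each program header entry is 32 bytes, and its p_flags field is a bit mask (PF_X = 1, PF_W = 2, PF_R = 4). The sketch below decodes two entries copied from the `hello.exe` hexdump earlier (file offsets 0x54 and 0x94) and prints the flags in the same R/W/E layout readelf uses; GNU_STACK's RW corresponds to PF_R | PF_W.

```python
import struct

PT_NAMES = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR",
            0x6474E550: "GNU_EH_FRAME", 0x6474E551: "GNU_STACK", 0x6474E552: "GNU_RELRO"}
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def flags_str(p_flags: int) -> str:
    """Format p_flags like readelf's Flg column, e.g. 'R E' or 'RW '."""
    return (("R" if p_flags & PF_R else " ")
            + ("W" if p_flags & PF_W else " ")
            + ("E" if p_flags & PF_X else " "))


def parse_phdr32(entry: bytes) -> dict:
    """Decode one 32-byte Elf32_Phdr entry (little-endian)."""
    (p_type, p_offset, p_vaddr, _p_paddr,
     p_filesz, p_memsz, p_flags, p_align) = struct.unpack("<8I", entry)
    return {"type": PT_NAMES.get(p_type, hex(p_type)), "offset": p_offset,
            "vaddr": p_vaddr, "filesz": p_filesz, "memsz": p_memsz,
            "flags": flags_str(p_flags), "align": p_align}


# Program header entries copied from the hexdump of hello.exe.
INTERP_ENTRY = bytes.fromhex("03000000940100009401000094010000"
                             "13000000130000000400000001000000")
CODE_LOAD_ENTRY = bytes.fromhex("01000000001000000010000000100000"
                                "08020000080200000500000000100000")

for entry in (INTERP_ENTRY, CODE_LOAD_ENTRY):
    print(parse_phdr32(entry))
print(repr(flags_str(PF_R | PF_W)))  # GNU_STACK flags: 'RW '
```

The LOAD segment holding the code is the only one of the two marked executable ("R E"); no entry should be both writable and executable.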

In [None]:
! readelf --segments hello_c.exe

## Disassemble using objdump
- look at the assembly code of the whole binary
- by default, objdump uses AT&T syntax: registers are prefixed with `%` and immediate values with `$`
    - source before the destination
    - e.g., `mov $5, %eax`
- https://en.wikipedia.org/wiki/X86_assembly_language

In [12]:
! objdump -d hello.exe


hello.exe:     file format elf32-i386


Disassembly of section .init:

00001000 <_init>:
    1000:	53                   	push   %ebx
    1001:	83 ec 08             	sub    $0x8,%esp
    1004:	e8 97 00 00 00       	call   10a0 <__x86.get_pc_thunk.bx>
    1009:	81 c3 eb 2f 00 00    	add    $0x2feb,%ebx
    100f:	8b 83 f8 ff ff ff    	mov    -0x8(%ebx),%eax
    1015:	85 c0                	test   %eax,%eax
    1017:	74 02                	je     101b <_init+0x1b>
    1019:	ff d0                	call   *%eax
    101b:	83 c4 08             	add    $0x8,%esp
    101e:	5b                   	pop    %ebx
    101f:	c3                   	ret

Disassembly of section .plt:

00001020 <__libc_start_main@plt-0x10>:
    1020:	ff b3 04 00 00 00    	push   0x4(%ebx)
    1026:	ff a3 08 00 00 00    	jmp    *0x8(%ebx)
    102c:	00 00                	add    %al,(%eax)
	...

00001030 <__libc_start_main@plt>:
    1030:	ff a3 0c 00 00 00    	jmp    *0xc(%ebx)
    1036:	68 00 00 00 00

In [13]:
# display each line matching 'main.:' (the '<main>:' label) plus the 20 lines after it
! objdump -D hello.exe | grep -A20 main.:

0000119d <main>:
    119d:	8d 4c 24 04          	lea    0x4(%esp),%ecx
    11a1:	83 e4 f0             	and    $0xfffffff0,%esp
    11a4:	ff 71 fc             	push   -0x4(%ecx)
    11a7:	55                   	push   %ebp
    11a8:	89 e5                	mov    %esp,%ebp
    11aa:	53                   	push   %ebx
    11ab:	51                   	push   %ecx
    11ac:	e8 ef fe ff ff       	call   10a0 <__x86.get_pc_thunk.bx>
    11b1:	81 c3 43 2e 00 00    	add    $0x2e43,%ebx
    11b7:	83 ec 08             	sub    $0x8,%esp
    11ba:	8d 83 14 e0 ff ff    	lea    -0x1fec(%ebx),%eax
    11c0:	50                   	push   %eax
    11c1:	8b 83 f0 ff ff ff    	mov    -0x10(%ebx),%eax
    11c7:	50                   	push   %eax
    11c8:	e8 73 fe ff ff       	call   1040 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
    11cd:	83 c4 10             	add    $0x10,%esp
    11d0:	83 ec 08             	sub    $0x8,%esp
    11d3:	8b 93 ec ff ff ff    	mov    -0x14(%eb
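Because hello.exe is position-independent, main cannot use absolute addresses. `__x86.get_pc_thunk.bx` copies its return address (0x11b1, the instruction after the call) into %ebx, and adding 0x2e43 yields the address of the global offset table. The arithmetic can be checked in a few lines; the values are the link-time addresses from the listing above (at run time the loader adds a randomized base to all of them), and the role of the GOT slot at -0x10 is a likely reading, not something the listing states.

```python
# Link-time addresses and displacements copied from the disassembly of main above.
after_thunk_call = 0x11B1        # return address that __x86.get_pc_thunk.bx copies into %ebx
got = after_thunk_call + 0x2E43  # add $0x2e43,%ebx
string_arg = got - 0x1FEC        # lea -0x1fec(%ebx),%eax: pointer pushed as operator<<'s 2nd argument
cout_slot = got - 0x10           # mov -0x10(%ebx),%eax: GOT slot, likely holding &std::cout

# _init performs the same computation from a different call site.
got_from_init = 0x1009 + 0x2FEB

print(hex(got), hex(string_arg), hex(cout_slot), hex(got_from_init))
```

Both call sites agree on 0x3ff4, which should match the start of .got.plt in `readelf --sections hello.exe`; 0x2008 should fall inside .rodata, where the "Hello World!" literal is likely stored, which `readelf -x .rodata hello.exe` can confirm.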

### disassemble in Intel syntax
- often easier to read: no `%`/`$` prefixes, and memory operands are written as `[base+offset]`
- destination before source 
    - e.g., `mov eax, 5`
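The operand-order rule can be illustrated with a toy converter. It is a deliberately minimal sketch that only handles two-operand instructions whose operands are registers or immediates; objdump's `-M intel` option also translates memory operands, size suffixes, and much more.

```python
def att_to_intel(instruction: str) -> str:
    """Toy AT&T -> Intel converter for register/immediate operands only."""
    mnemonic, operands = instruction.split(None, 1)
    source, destination = (op.strip() for op in operands.split(","))

    def convert(operand: str) -> str:
        if "(" in operand:
            raise ValueError("memory operands are not handled in this sketch")
        return operand.lstrip("%$")  # drop the register (%) and immediate ($) prefixes

    # AT&T is "source, destination"; Intel is "destination, source".
    return f"{mnemonic} {convert(destination)}, {convert(source)}"


print(att_to_intel("mov $5, %eax"))      # mov eax, 5
print(att_to_intel("sub    $0x8,%esp"))  # sub esp, 0x8
print(att_to_intel("mov    %esp,%ebp"))  # mov ebp, esp
```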

In [15]:
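# -D disassembles every section, including data sections such as .interp, which decode as meaningless instructions
! objdump -M intel -D hello.exe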


hello.exe:     file format elf32-i386


Disassembly of section .interp:

00000194 <.interp>:
 194:	2f                   	das
 195:	6c                   	ins    BYTE PTR es:[edi],dx
 196:	69 62 2f 6c 64 2d 6c 	imul   esp,DWORD PTR [edx+0x2f],0x6c2d646c
 19d:	69 6e 75 78 2e 73 6f 	imul   ebp,DWORD PTR [esi+0x75],0x6f732e78
 1a4:	2e 32 00             	xor    al,BYTE PTR cs:[eax]

Disassembly of section .note.gnu.build-id:

000001a8 <.note.gnu.build-id>:
 1a8:	04 00                	add    al,0x0
 1aa:	00 00                	add    BYTE PTR [eax],al
 1ac:	14 00                	adc    al,0x0
 1ae:	00 00                	add    BYTE PTR [eax],al
 1b0:	03 00                	add    eax,DWORD PTR [eax]
 1b2:	00 00                	add    BYTE PTR [eax],al
 1b4:	47                   	inc    edi
 1b5:	4e                   	dec    esi
 1b6:	55                   	push   ebp
 1b7:	00 8b 33 47 28 63    	add    BYTE PTR [ebx+0x63284733],cl
 1bd:	ba ba 15 ab a3       	mov    edx,
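With -D, objdump also "disassembles" data. The bytes of .interp are really the NUL-terminated path of the dynamic loader, and .note.gnu.build-id is an ELF note: three 4-byte fields (name size, descriptor size, type), the name "GNU", then the descriptor. The sketch below decodes both; the note header and the first descriptor bytes come from the listing above, and the full 20-byte descriptor is the BuildID that `file hello.exe` printed earlier.

```python
import struct

# .interp bytes at 0x194, from the listing above.
INTERP = bytes.fromhex("2f6c69622f6c642d6c696e75782e736f2e3200")

# .note.gnu.build-id at 0x1a8: namesz=4, descsz=0x14, type=3, name "GNU\0", then the 20-byte BuildID.
BUILD_ID_NOTE = bytes.fromhex(
    "04000000" "14000000" "03000000" "474e5500"
    "8b33472863baba15aba327716ea002fcdb58fb39"
)


def parse_note(data: bytes) -> tuple:
    """Decode one little-endian ELF note: (name, type, descriptor as hex)."""
    namesz, descsz, note_type = struct.unpack_from("<III", data, 0)
    name = data[12:12 + namesz].rstrip(b"\x00").decode("ascii")
    desc_start = 12 + ((namesz + 3) & ~3)  # the name is padded to a 4-byte boundary
    return name, note_type, data[desc_start:desc_start + descsz].hex()


print(INTERP.rstrip(b"\x00").decode("ascii"))  # /lib/ld-linux.so.2
print(parse_note(BUILD_ID_NOTE))
```

The path matches the interpreter reported by `file`, it is 19 (0x13) bytes long including the terminating NUL, and note type 3 is NT_GNU_BUILD_ID.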

In [17]:
! objdump -M intel -D hello.exe | grep -A20 main.:

0000119d <main>:
    119d:	8d 4c 24 04          	lea    ecx,[esp+0x4]
    11a1:	83 e4 f0             	and    esp,0xfffffff0
    11a4:	ff 71 fc             	push   DWORD PTR [ecx-0x4]
    11a7:	55                   	push   ebp
    11a8:	89 e5                	mov    ebp,esp
    11aa:	53                   	push   ebx
    11ab:	51                   	push   ecx
    11ac:	e8 ef fe ff ff       	call   10a0 <__x86.get_pc_thunk.bx>
    11b1:	81 c3 43 2e 00 00    	add    ebx,0x2e43
    11b7:	83 ec 08             	sub    esp,0x8
    11ba:	8d 83 14 e0 ff ff    	lea    eax,[ebx-0x1fec]
    11c0:	50                   	push   eax
    11c1:	8b 83 f0 ff ff ff    	mov    eax,DWORD PTR [ebx-0x10]
    11c7:	50                   	push   eax
    11c8:	e8 73 fe ff ff       	call   1040 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
    11cd:	83 c4 10             	add    esp,0x10
    11d0:	83 ec 08             	sub    esp,0x8
    11d3:	8b 93 ec ff ff ff    	mov    edx,DWORD PT

## Hex Editor
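- a hex editor is used to view and modify the raw bytes of a binary
- web-based hex editors can be easier to use than the command-line hex editors that ship with Kali
    - https://hexed.it/ is a pretty good one!
- compile demos/system.cpp, then patch the compiled binary so that it spawns a shell
- search for the string "clear" (63 6c 65 61 72) and replace it with "73 68 00 00 00", i.e., "sh" padded with NUL bytes to the same length, so `system("clear")` becomes `system("sh")`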
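The same edit can be scripted. The helper below is a hypothetical sketch (not part of the demos folder): it refuses any replacement that changes the length, because inserting or deleting bytes would shift every later offset recorded in the ELF headers. system() hands its argument to `/bin/sh -c`, so the patched call runs `sh` and spawns a shell.

```python
def patch_same_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace the first occurrence of old with new, keeping the file size unchanged."""
    if len(old) != len(new):
        raise ValueError("replacement must have the same length as the original bytes")
    index = data.find(old)
    if index < 0:
        raise ValueError(f"{old!r} not found")
    return data[:index] + new + data[index + len(old):]


# Made-up stand-in for the .rodata bytes of program.exe.
sample = b"Hello World!\n\x00clear\x00good bye!\x00"
patched = patch_same_length(sample, b"clear", bytes.fromhex("7368000000"))  # "sh" + 3 NULs
print(patched)
```

To patch the real binary, read program.exe in binary mode ("rb"), write the result to a new file, make it executable with chmod +x, and check with hexdump that the match really was the "clear" literal in .rodata before running it.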

In [18]:
! g++ -o program.exe ../demos/system.cpp

In [20]:
! cat ../demos/system.cpp

#include <iostream>
#include <cstdlib>

using namespace std;

int main() {
    cout << "Hello World!\n";
    system("clear");
    cout << "good bye!" << endl;
    return 0;
}

In [19]:
! ./program.exe
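# run the program from a terminal for a better demo: the notebook cannot interpret the escape sequences that clear prints (the [H[2J in the output below)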

Hello World!
[H[2Jgood bye!
