# Binary and Reverse Engineering 
- reverse engineering of binary programs is a popular skill in malware analysis
- most malware is distributed as compiled binaries, often written in C/C++, so it needs to be reverse engineered to understand its functionality under the hood

## Executable and Linkable Format (ELF)
- https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
- common format for executable files, object code, shared libraries, and core dumps
![ELF](./media/ELF.png)
- an ELF file has two views: the program header shows the segments used at run time, whereas the section header lists the set of sections of the binary.
- let's compile the hello.cpp program provided in the demos folder and examine its ELF format using various tools and commands

In [1]:
! cat demos/hello.cpp

#include <iostream>
#include <cstdio>

using namespace std;

int main() {
    cout << "Hello World!\n";
    printf("Good bye World!\n");
    return 0;
}

In [2]:
! g++ -m32 -o hello demos/hello.cpp

In [16]:
! g++ -m32 -o memory_segments.exe demos/memory_segments.cpp

In [3]:
! ./hello

Hello World!
Good bye World!


In [4]:
! ls -al hello

-rwxr-xr-x 1 kali kali 15972 Feb 21 21:46 hello


In [5]:
! cat hello

        si	   )      �>    �>    �>    �?    $@    �?    �?    �?    �?  	  �?    �                             �  �                                                          �.  �>  �>  H  L           �.  �>  �>  �   �            �  �  �  D   D         P�td(   (   (   \   \         Q�td                          R�td�.  �>  �>              /lib/ld-linux.so.2           GNU �/����U5�p"s/&��dڸ         GNU                                      �K��                �           "   �              �              ^              �              �              �                                            ,               F              �            __gmon_start__ _ITM_deregisterTMCloneTable _ITM_registerTMCloneTable _ZNSt8ios_base4InitD1Ev _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc _ZNSt8ios_base4InitC1Ev _ZSt4cout _IO_stdin_used puts __cxa_atexit __cxa_finalize __libc_start_main libstdc++.so.6 libc

## file utility
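- identifies a file's type; for an ELF file it reports the class (32- or 64-bit), byte order, target architecture, linkage, and whether the symbols have been stripped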

In [6]:
! file demos/hello.cpp

demos/hello.cpp: C++ source, ASCII text


In [8]:
! file hello

hello: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=36e1aec0e84c8706bae12e88d143b95c977f5256, for GNU/Linux 3.2.0, not stripped
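
Several of the fields that `file` reports ("32-bit", "LSB", "version 1 (SYSV)") correspond to the first 16 bytes of the binary, the `e_ident` array. A minimal Python sketch of that lookup follows. It assumes the identification bytes of a typical 32-bit little-endian System V binary; the function name `describe_ident` is made up for illustration, and this is not how `file` itself is implemented.

```python
# Sketch: decode the 16-byte e_ident array at the start of an ELF file.
ELF_MAGIC = b"\x7fELF"
EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "LSB", 2: "MSB"}  # little-endian or big-endian
EI_OSABI = {0: "SYSV"}

def describe_ident(ident: bytes) -> str:
    """Summarize the class, byte order, version and OS ABI bytes of e_ident."""
    if len(ident) < 16 or ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    bits = EI_CLASS.get(ident[4], "unknown class")
    order = EI_DATA.get(ident[5], "unknown byte order")
    version = ident[6]
    osabi = EI_OSABI.get(ident[7], f"OS ABI {ident[7]}")
    return f"ELF {bits} {order}, version {version} ({osabi})"

# Sample bytes (an assumption): e_ident of a 32-bit x86 Linux binary.
sample_ident = bytes.fromhex("7f454c46010101000000000000000000")
print(describe_ident(sample_ident))
```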


In [9]:
# display hex and ASCII in two columns
! hexdump -C hello

00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 03 00 01 00 00 00  80 10 00 00 34 00 00 00  |............4...|
00000020  90 39 00 00 00 00 00 00  34 00 20 00 0b 00 28 00  |.9......4. ...(.|
00000030  1e 00 1d 00 06 00 00 00  34 00 00 00 34 00 00 00  |........4...4...|
00000040  34 00 00 00 60 01 00 00  60 01 00 00 04 00 00 00  |4...`...`.......|
00000050  04 00 00 00 03 00 00 00  94 01 00 00 94 01 00 00  |................|
00000060  94 01 00 00 13 00 00 00  13 00 00 00 04 00 00 00  |................|
00000070  01 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 dc 04 00 00  dc 04 00 00 04 00 00 00  |................|
00000090  00 10 00 00 01 00 00 00  00 10 00 00 00 10 00 00  |................|
000000a0  00 10 00 00 fc 02 00 00  fc 02 00 00 05 00 00 00  |................|
000000b0  00 10 00 00 01 00 00 00  00 20 00 00 00 20 00 00  |......... ... ..|
000000c0  00 20 00 00 f4 01 00 00  f4 01 00 00 04 00
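
The first 52 bytes of the dump above are the ELF32 file header. As a rough sketch, Python's `struct` module can unpack them. The byte string below is copied from the first lines of the `hexdump` output and the field names follow the standard `Elf32_Ehdr` layout; the helper name `parse_elf32_header` is illustrative only.

```python
import struct

# First 52 bytes of `hello`, copied from the hexdump output above.
HEADER_HEX = (
    "7f454c46010101000000000000000000"
    "03000300010000008010000034000000"
    "9039000000000000340020000b002800"
    "1e001d00"
)

# Little-endian Elf32_Ehdr: e_ident[16], 2 halfwords, 5 words, 6 halfwords.
EHDR_FORMAT = "<16sHHIIIIIHHHHHH"
FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry",
          "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
          "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")

def parse_elf32_header(data: bytes) -> dict:
    """Unpack the fixed-size ELF32 header into a field-name -> value dict."""
    size = struct.calcsize(EHDR_FORMAT)  # 52 bytes
    return dict(zip(FIELDS, struct.unpack(EHDR_FORMAT, data[:size])))

header = parse_elf32_header(bytes.fromhex(HEADER_HEX))
for name in ("e_type", "e_machine", "e_entry", "e_phoff", "e_phnum", "e_shnum"):
    print(f"{name:12} = {header[name]:#x}")
```

Here `e_type` is 3 (`ET_DYN`), which is why `file` calls the binary a shared object, and `e_entry` decodes to 0x1080.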

## ELF file parts

## Symbols
- symbols are the names of functions and variables, e.g., if the library function printf is used, the dynamic linker finds it at run time by looking up its symbol in the shared libraries

## Sections
- code and data (and the symbols that name them) are organized into **sections**: code lives in one section (.text) and data lives in others (.data, .rodata)

## Segments
- sections are grouped into **segments**, the units that the loader maps into memory at run time

### Examine various sections of ELF
- let's compile demos/hello.cpp file
- use the compiled ELF file to examine various sections

### readelf and objdump
- these utilities can help us look at various parts

### look at all the symbols of a binary
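- important symbols to note: main, _start, puts (GCC replaces the printf call in hello.cpp with puts because the string has no format specifiers and ends with a newline)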

In [7]:
! readelf --symbols hello


Symbol table '.dynsym' contains 13 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 FUNC    WEAK   DEFAULT  UND [...]@GLIBC_2.1.3 (2)
     2: 00000000     0 FUNC    GLOBAL DEFAULT  UND [...]@GLIBC_2.1.3 (2)
     3: 00000000     0 FUNC    GLOBAL DEFAULT  UND __[...]@GLIBC_2.0 (3)
     4: 00000000     0 FUNC    GLOBAL DEFAULT  UND [...]@GLIBCXX_3.4 (4)
     5: 00000000     0 OBJECT  GLOBAL DEFAULT  UND [...]@GLIBCXX_3.4 (4)
     6: 00000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.0 (3)
     7: 00000000     0 FUNC    GLOBAL DEFAULT  UND [...]@GLIBCXX_3.4 (4)
     8: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterT[...]
     9: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    10: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMC[...]
    11: 00000000     0 FUNC    GLOBAL DEFAULT  UND [...]@GLIBCXX_3.4 (4)
    12: 00002004     4 OBJECT  GLOBAL DEFAULT   16 _IO
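
Each row of this table is a 16-byte `Elf32_Sym` record, and the Type and Bind columns are packed into its single `st_info` byte. A small Python sketch of the decoding follows, assuming the standard ELF32 layout. The sample record is built by hand to resemble the last row above; its `st_name` offset is a placeholder, not the real value from `hello`.

```python
import struct

# Elf32_Sym: st_name, st_value, st_size, st_info, st_other, st_shndx
SYM_FORMAT = "<IIIBBH"
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}

def decode_symbol(record: bytes) -> dict:
    """Decode one 16-byte ELF32 symbol table entry."""
    st_name, st_value, st_size, st_info, st_other, st_shndx = struct.unpack(
        SYM_FORMAT, record)
    return {
        "value": st_value,
        "size": st_size,
        "type": TYPES.get(st_info & 0xF, "?"),    # low 4 bits of st_info
        "bind": BINDINGS.get(st_info >> 4, "?"),  # high 4 bits of st_info
        "ndx": st_shndx,                          # 0 means undefined (UND)
    }

# Hand-built record: value 0x2004, size 4, OBJECT/GLOBAL, section index 16.
# The st_name field (first argument, 0) is a placeholder.
sample = struct.pack(SYM_FORMAT, 0, 0x2004, 4, (1 << 4) | 1, 0, 16)
print(decode_symbol(sample))
```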

### display all the sections
- some important sections are: .text, .rodata, .data, .bss

In [8]:
! readelf --sections hello

There are 30 section headers, starting at offset 0x39b4:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        00000194 000194 000013 00   A  0   0  1
  [ 2] .note.gnu.bu[...] NOTE            000001a8 0001a8 000024 00   A  0   0  4
  [ 3] .note.ABI-tag     NOTE            000001cc 0001cc 000020 00   A  0   0  4
  [ 4] .gnu.hash         GNU_HASH        000001ec 0001ec 000020 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          0000020c 00020c 0000d0 10   A  6   1  4
  [ 6] .dynstr           STRTAB          000002dc 0002dc 000135 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          00000412 000412 00001a 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         0000042c 00042c 000050 00   A  6   2  4
  [ 9] .rel.dyn          REL             0000047c 00047c 000058 08   A  5   0  4
  [10] .rel.plt          REL      

### look at just one section, e.g., .rodata
- read-only data is stored in .rodata, e.g. literal values (Hello World!)

In [9]:
# let's look at just the .rodata section of hello program
! readelf -x .rodata hello


Hex dump of section '.rodata':
  0x00002000 03000000 01000200 0048656c 6c6f2057 .........Hello W
  0x00002010 6f726c64 210a0047 6f6f6420 62796520 orld!..Good bye 
  0x00002020 576f726c 642100                     World!.
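
String literals in `.rodata` are stored as NUL-terminated byte sequences, so they can be recovered by splitting the section on the zero byte. A minimal Python sketch follows, using the bytes copied from the `hello` dump above. The helper `extract_strings` and its `min_len` threshold are illustrative choices, similar in spirit to the `strings` utility.

```python
# Bytes of the .rodata section of `hello`, copied from the hex dump above.
RODATA_HEX = (
    "03000000" "01000200" "0048656c" "6c6f2057"
    "6f726c64" "210a0047" "6f6f6420" "62796520"
    "576f726c" "642100"
)

def extract_strings(blob: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs between NUL bytes, at least min_len long."""
    found = []
    for chunk in blob.split(b"\x00"):
        # Accept printable ASCII plus newline; reject anything else.
        printable = all(32 <= b < 127 or b == 0x0A for b in chunk)
        if len(chunk) >= min_len and printable:
            found.append(chunk.decode("ascii"))
    return found

for s in extract_strings(bytes.fromhex(RODATA_HEX)):
    print(repr(s))
```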



In [13]:
# let's look at the .rodata section of memory_segments.exe program
! readelf -x .rodata memory_segments.exe


Hex dump of section '.rodata':
  0x00002000 03000000 01000200 004f7574 70757420 .........Output 
  0x00002010 696e7369 64652066 756e6374 696f6e3a inside function:
  0x00002020 00737461 7469635f 696e6974 69616c69 .static_initiali
  0x00002030 7a65645f 76617220 3d202564 0a000000 zed_var = %d....
  0x00002040 73746174 69635f69 6e697469 616c697a static_initializ
  0x00002050 65645f76 61722069 73206174 20616464 ed_var is at add
  0x00002060 72657373 2025700a 00737461 636b5f76 ress %p..stack_v
  0x00002070 6172203d 2025640a 00737461 636b5f76 ar = %d..stack_v
  0x00002080 61722069 73206174 20616464 72657373 ar is at address
  0x00002090 2025700a 006f7574 70757420 66726f6d  %p..output from
  0x000020a0 206d6169 6e206675 6e637469 6f6e006d  main function.m
  0x000020b0 61696e20 69732061 74206164 64726573 ain is at addres
  0x000020c0 733a2025 700a0066 756e6374 696f6e20 s: %p..function 
  0x000020d0 69732061 74206164 64726573 733a2025 is at address: %
  0x000020e0 700a0000 676c6f62 616c5f69 6e69

### objdump program 
- objdump can also be used to examine each section of a program

In [11]:
! objdump -s -j .rodata hello


hello:     file format elf32-i386

Contents of section .rodata:
 2000 03000000 01000200 0048656c 6c6f2057  .........Hello W
 2010 6f726c64 210a0047 6f6f6420 62796520  orld!..Good bye 
 2020 576f726c 642100                      World!.         


In [15]:
! readelf -x .data memory_segments.exe
# Note: global_initialized_var = 5 is visible in the hex column (05000000) but not in the ASCII column


Hex dump of section '.data':
  0x00004024 00000000 28400000 05000000 4a6f686e ....(@......John
  0x00004034 20536d69 74682100 05000000 05000000  Smith!.........
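
The value 5 shows up as `05000000` because x86 stores integers in little-endian order, and byte `0x05` is not a printable ASCII character, so the right-hand column shows a dot. A short Python sketch of that decoding follows, using the first row of the `.data` dump above (offsets are relative to the start of the row):

```python
import struct

# First 16 bytes of .data from memory_segments.exe, copied from the dump above.
DATA_ROW = bytes.fromhex("00000000" "28400000" "05000000" "4a6f686e")

# Interpret the third 4-byte word (offset 8) as a little-endian 32-bit integer.
(value,) = struct.unpack_from("<i", DATA_ROW, 8)
print(value)

def ascii_column(row: bytes) -> str:
    """Mimic the ASCII column of a hex dump: dots for non-printable bytes."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in row)

print(ascii_column(DATA_ROW))
```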



In [20]:
! readelf -x .bss memory_segments.exe

Section '.bss' has no data to dump.


### look at the segments
- GNU_STACK is important to note
    - RW: readable and writable, but not executable (no E flag)
    - data on the stack is treated as plain data and can't be executed as code!

In [21]:
! readelf --segments hello


Elf file type is DYN (Shared object file)
Entry point 0x1080
There are 11 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00000034 0x00000034 0x00160 0x00160 R   0x4
  INTERP         0x000194 0x00000194 0x00000194 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x00000000 0x00000000 0x004dc 0x004dc R   0x1000
  LOAD           0x001000 0x00001000 0x00001000 0x002fc 0x002fc R E 0x1000
  LOAD           0x002000 0x00002000 0x00002000 0x001f4 0x001f4 R   0x1000
  LOAD           0x002ee0 0x00003ee0 0x00003ee0 0x00144 0x00148 RW  0x1000
  DYNAMIC        0x002eec 0x00003eec 0x00003eec 0x000f8 0x000f8 RW  0x4
  NOTE           0x0001a8 0x000001a8 0x000001a8 0x00044 0x00044 R   0x4
  GNU_EH_FRAME   0x002018 0x00002018 0x00002018 0x0005c 0x0005c R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
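
The Flg column is a rendering of the `p_flags` bit field in each program header. A hedged Python sketch of that mapping follows, assuming the standard ELF flag bits (`PF_X` = 1, `PF_W` = 2, `PF_R` = 4); the function names are made up for illustration.

```python
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # execute, write, read

def flags_to_str(p_flags: int) -> str:
    """Render p_flags in readelf's three-column style, e.g. 'R E' or 'RW '."""
    return "".join(
        letter if p_flags & bit else " "
        for bit, letter in ((PF_R, "R"), (PF_W, "W"), (PF_X, "E"))
    )

def stack_is_executable(gnu_stack_flags: int) -> bool:
    """True if the GNU_STACK segment asks for an executable stack."""
    return bool(gnu_stack_flags & PF_X)

print(repr(flags_to_str(PF_R | PF_X)))   # the LOAD segment that holds code
print(repr(flags_to_str(PF_R | PF_W)))   # GNU_STACK in the listing above
print(stack_is_executable(PF_R | PF_W))
```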
  

## Disassemble using objdump
- look at the assembly code of the whole binary
- by default, objdump shows AT&T assembly syntax, where registers are prefixed with % and immediate values with \$
    - source comes before the destination
    - e.g., `mov $5, %eax`
- https://en.wikipedia.org/wiki/X86_assembly_language

In [22]:
! objdump -d hello


hello:     file format elf32-i386


Disassembly of section .init:

00001000 <_init>:
    1000:	53                   	push   %ebx
    1001:	83 ec 08             	sub    $0x8,%esp
    1004:	e8 b7 00 00 00       	call   10c0 <__x86.get_pc_thunk.bx>
    1009:	81 c3 f7 2f 00 00    	add    $0x2ff7,%ebx
    100f:	8b 83 f4 ff ff ff    	mov    -0xc(%ebx),%eax
    1015:	85 c0                	test   %eax,%eax
    1017:	74 02                	je     101b <_init+0x1b>
    1019:	ff d0                	call   *%eax
    101b:	83 c4 08             	add    $0x8,%esp
    101e:	5b                   	pop    %ebx
    101f:	c3                   	ret    

Disassembly of section .plt:

00001020 <.plt>:
    1020:	ff b3 04 00 00 00    	pushl  0x4(%ebx)
    1026:	ff a3 08 00 00 00    	jmp    *0x8(%ebx)
    102c:	00 00                	add    %al,(%eax)
	...

00001030 <__cxa_atexit@plt>:
    1030:	ff a3 0c 00 00 00    	jmp    *0xc(%ebx)
    1036:	68 00 00 00 00       	push   $0x0
    103b:	e9 e0 ff ff ff       	jmp 

In [23]:
# display 20 lines after each line matching "main.:" (e.g., <main>:) in the disassembly of hello
! objdump -D hello | grep -A20 main.:

000011b9 <main>:
    11b9:	8d 4c 24 04          	lea    0x4(%esp),%ecx
    11bd:	83 e4 f0             	and    $0xfffffff0,%esp
    11c0:	ff 71 fc             	pushl  -0x4(%ecx)
    11c3:	55                   	push   %ebp
    11c4:	89 e5                	mov    %esp,%ebp
    11c6:	53                   	push   %ebx
    11c7:	51                   	push   %ecx
    11c8:	e8 ac 00 00 00       	call   1279 <__x86.get_pc_thunk.ax>
    11cd:	05 33 2e 00 00       	add    $0x2e33,%eax
    11d2:	83 ec 08             	sub    $0x8,%esp
    11d5:	8d 90 09 e0 ff ff    	lea    -0x1ff7(%eax),%edx
    11db:	52                   	push   %edx
    11dc:	8b 90 ec ff ff ff    	mov    -0x14(%eax),%edx
    11e2:	52                   	push   %edx
    11e3:	89 c3                	mov    %eax,%ebx
    11e5:	e8 66 fe ff ff       	call   1050 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
    11ea:	83 c4 10             	add    $0x10,%esp
    11ed:	b8 00 00 00 00       	mov    $0x0,%eax
    11f2:	8d 65 f

### disassemble in Intel syntax
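- generally considered easier to read: no % or \$ prefixes
- destination before source 
    - e.g., `mov eax, 5`
- note that -D disassembles every section, including non-code ones such as .interp, so the "instructions" shown for those sections are just data bytes decoded as opcodes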

In [24]:
! objdump -M intel -D hello


hello:     file format elf32-i386


Disassembly of section .interp:

00000194 <.interp>:
 194:	2f                   	das    
 195:	6c                   	ins    BYTE PTR es:[edi],dx
 196:	69 62 2f 6c 64 2d 6c 	imul   esp,DWORD PTR [edx+0x2f],0x6c2d646c
 19d:	69 6e 75 78 2e 73 6f 	imul   ebp,DWORD PTR [esi+0x75],0x6f732e78
 1a4:	2e 32 00             	xor    al,BYTE PTR cs:[eax]

Disassembly of section .note.gnu.build-id:

000001a8 <.note.gnu.build-id>:
 1a8:	04 00                	add    al,0x0
 1aa:	00 00                	add    BYTE PTR [eax],al
 1ac:	14 00                	adc    al,0x0
 1ae:	00 00                	add    BYTE PTR [eax],al
 1b0:	03 00                	add    eax,DWORD PTR [eax]
 1b2:	00 00                	add    BYTE PTR [eax],al
 1b4:	47                   	inc    edi
 1b5:	4e                   	dec    esi
 1b6:	55                   	push   ebp
 1b7:	00 36                	add    BYTE PTR [esi],dh
 1b9:	e1 ae                	loope  169 <_init-0xe97>
 1bb:	c0 e8 4c         

In [25]:
! objdump -M intel -D hello | grep -A20 main.:

000011b9 <main>:
    11b9:	8d 4c 24 04          	lea    ecx,[esp+0x4]
    11bd:	83 e4 f0             	and    esp,0xfffffff0
    11c0:	ff 71 fc             	push   DWORD PTR [ecx-0x4]
    11c3:	55                   	push   ebp
    11c4:	89 e5                	mov    ebp,esp
    11c6:	53                   	push   ebx
    11c7:	51                   	push   ecx
    11c8:	e8 ac 00 00 00       	call   1279 <__x86.get_pc_thunk.ax>
    11cd:	05 33 2e 00 00       	add    eax,0x2e33
    11d2:	83 ec 08             	sub    esp,0x8
    11d5:	8d 90 09 e0 ff ff    	lea    edx,[eax-0x1ff7]
    11db:	52                   	push   edx
    11dc:	8b 90 ec ff ff ff    	mov    edx,DWORD PTR [eax-0x14]
    11e2:	52                   	push   edx
    11e3:	89 c3                	mov    ebx,eax
    11e5:	e8 66 fe ff ff       	call   1050 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
    11ea:	83 c4 10             	add    esp,0x10
    11ed:	b8 00 00 00 00       	mov    eax,0x0
    11f2:	8d 65 f8    
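
Because `hello` is position-independent, `main` computes addresses relative to its own location instead of using absolute ones. The arithmetic can be replayed in Python as a sketch. It assumes that `__x86.get_pc_thunk.ax` returns the address of the instruction following the call (its usual behavior) and uses the constants from the listing above.

```python
def to_signed32(value: int) -> int:
    """Interpret a 32-bit unsigned value as a signed displacement."""
    return value - (1 << 32) if value & 0x80000000 else value

# The call to __x86.get_pc_thunk.ax at 0x11c8 is 5 bytes long,
# so the thunk leaves the next instruction's address (0x11cd) in eax.
eax = 0x11C8 + 5
# add eax, 0x2e33
eax += 0x2E33
# lea edx, [eax-0x1ff7]  -> address of the string pushed for operator<<
string_addr = eax - 0x1FF7

# The call at 0x11e5 is encoded as e8 66 fe ff ff; the 4-byte operand is a
# little-endian displacement relative to the next instruction.
rel32 = to_signed32(int.from_bytes(bytes.fromhex("66feffff"), "little"))
call_target = (0x11E5 + 5) + rel32

print(hex(eax), hex(string_addr), hex(call_target))
```

The string address lands at 0x2009, nine bytes into `.rodata`, which is where the "Hello World!" bytes begin in the hex dump, and the computed call target 0x1050 matches the one printed by objdump.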

## Hex Editor
- a hex editor is used to view and modify the contents of a binary
- an online hex editor is easier to use than the CLI hex editors provided by Kali
    - https://hexed.it/ is a pretty good one!
- compile and edit the demos/system.cpp program to spawn a shell
- search for "clear" (63 6c 65 61 72) and replace it with "73 68 00 00 00" ("sh" followed by NUL padding, so the file size stays the same)
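
The same patch can be scripted instead of done by hand. A minimal Python sketch follows, assuming the string `clear` appears in the binary as a NUL-terminated literal; the helper `patch_bytes`, the sample fragment, and the output file name are hypothetical. The replacement is padded to the original length so that no offsets in the file shift.

```python
def patch_bytes(blob: bytes, old: bytes, new: bytes) -> bytes:
    """Replace the first occurrence of old with new, NUL-padded to equal length."""
    if len(new) > len(old):
        raise ValueError("replacement must not be longer than the original")
    index = blob.find(old)
    if index < 0:
        raise ValueError("pattern not found")
    padded = new.ljust(len(old), b"\x00")
    return blob[:index] + padded + blob[index + len(old):]

# In-memory demonstration on a made-up fragment of read-only data.
fragment = b"count down...\x00clear\x00Blast Off!\x00"
patched = patch_bytes(fragment, b"clear", b"sh")
print(patched)

# Applying it to a real file (the output path is a placeholder):
# with open("program.exe", "rb") as f:
#     data = f.read()
# with open("program_patched.exe", "wb") as f:
#     f.write(patch_bytes(data, b"clear", b"sh"))
```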

In [17]:
! g++ -o program.exe demos/system.cpp

In [18]:
! ./program.exe
# run the program from terminal for better demo

Launching of Perseverence Rover count down...
[H[2J10
[H[2J9
[H[2J8
[H[2J7
[H[2J6
[H[2J5
[H[2J4
[H[2J3
[H[2J2
[H[2J1
[H[2JBlast Off!
[H[2J