# Binary and Reverse Engineering 
- reverse engineering binary programs is a core skill in malware analysis
- most malware is distributed as compiled binaries, often written in C/C++, so it needs to be reverse engineered to understand what the malware does under the hood

## Executable and Linkable Format (ELF)
- https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
- common file format for executable files, object code, shared libraries, and core dumps
![ELF](./media/ELF.png)
- an ELF file has two views: the program header table describes the segments used at run time, whereas the section header table lists the sections of the binary
- let's compile the hello.cpp and hello.c programs provided in the demos folder and examine the ELF format using various tools and commands

In [2]:
! cat ../demos/hello.cpp

#include <iostream>

using namespace std;

int main()
{
    cout << "Hello World!" << endl;
    return 0;
}

In [2]:
! cat ../demos/hello.c

#include <stdio.h>

int main()
{
    printf("Hello, World!\n");
    return 0;
}


In [5]:
! g++ -m32 -o hello.exe ../demos/hello.cpp

In [3]:
! gcc -m32 -o hello_c.exe ../demos/hello.c

In [None]:
! g++ -m32 -o memory_segments.exe ../demos/memory_segments.cpp

In [7]:
! ./hello.exe

Hello World!


In [4]:
! ./hello_c.exe

Hello, World!


In [None]:
! ls -al hello.exe

-rwxrwxrwx 1 codespace codespace 16132 Feb 24 20:29 hello


In [6]:
! ls -al hello_c.exe

-rwxrwxrwx 1 codespace codespace 15588 Feb 24 22:21 hello_c.exe


In [7]:
! cat hello.exe

   f     si	   p      �>    �>    �>    �?    @    �?    �?    �?    �?  	  �?         �              	             �                                            ;               �                         libstdc++.so.6 __gmon_start__ _ITM_deregisterTMCloneTable _ITM_registerTMCloneTable _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ _ZNSt8ios_base4InitD1Ev _ZNSolsEPFRSoS_E _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc _ZNSt8ios_base4InitC1Ev _ZSt4cout libc.so.6 _IO_stdin_used __cxa_atexit __cxa_finalize __libc_start_main GLIBCXX_3.4 GLIBC_2.0 GLIBC_2.1.3                               t)   Z                 ii
  �?    �?    �?    �?    �?    �?    �?                                                                                                                                                                                                                                                                   

In [8]:
! cat hello_c.exe

   @      si	   J       �>    �>    �?    @    �?    �?    �?    �?    �?    �?                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        

## file utility
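- identifies the type of a file; for ELF files it reports the class (32-bit or 64-bit), byte order, target architecture, how the binary is linked, and whether it is stripped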

In [10]:
! file ../demos/hello.cpp

../demos/hello.cpp: C++ source, ASCII text


In [9]:
! file ../demos/hello.c

../demos/hello.c: C source, ASCII text


In [11]:
! file hello.exe

hello.exe: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=c9e50248aeef37c647e5ccf1e2b37935cbd29d2b, for GNU/Linux 3.2.0, not stripped


In [10]:
! file hello_c.exe

hello_c.exe: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=26714f636c7fbc5e12c674fdcf83bbf1baa4e0a7, for GNU/Linux 3.2.0, not stripped


In [None]:
# display hex and ASCII in two columns
! hexdump -C hello.exe

00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 03 00 01 00 00 00  f0 10 00 00 34 00 00 00  |............4...|
00000020  2c 3a 00 00 00 00 00 00  34 00 20 00 0c 00 28 00  |,:......4. ...(.|
00000030  1f 00 1e 00 06 00 00 00  34 00 00 00 34 00 00 00  |........4...4...|
00000040  34 00 00 00 80 01 00 00  80 01 00 00 04 00 00 00  |4...............|
00000050  04 00 00 00 03 00 00 00  b4 01 00 00 b4 01 00 00  |................|
00000060  b4 01 00 00 13 00 00 00  13 00 00 00 04 00 00 00  |................|
00000070  01 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 98 05 00 00  98 05 00 00 04 00 00 00  |................|
00000090  00 10 00 00 01 00 00 00  00 10 00 00 00 10 00 00  |................|
000000a0  00 10 00 00 a4 03 00 00  a4 03 00 00 05 00 00 00  |................|
000000b0  00 10 00 00 01 00 00 00  00 20 00 00 00 20 00 00  |......... ... ..|
000000c0  00 20 00 00 f8 01 00 00  f8 01 00 00 04 00

In [11]:
! hexdump -C hello_c.exe

00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 03 00 01 00 00 00  90 10 00 00 34 00 00 00  |............4...|
00000020  0c 38 00 00 00 00 00 00  34 00 20 00 0c 00 28 00  |.8......4. ...(.|
00000030  1f 00 1e 00 06 00 00 00  34 00 00 00 34 00 00 00  |........4...4...|
00000040  34 00 00 00 80 01 00 00  80 01 00 00 04 00 00 00  |4...............|
00000050  04 00 00 00 03 00 00 00  b4 01 00 00 b4 01 00 00  |................|
00000060  b4 01 00 00 13 00 00 00  13 00 00 00 04 00 00 00  |................|
00000070  01 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 f4 03 00 00  f4 03 00 00 04 00 00 00  |................|
00000090  00 10 00 00 01 00 00 00  00 10 00 00 00 10 00 00  |................|
000000a0  00 10 00 00 b4 02 00 00  b4 02 00 00 05 00 00 00  |................|
000000b0  00 10 00 00 01 00 00 00  00 20 00 00 00 20 00 00  |......... ... ..|
000000c0  00 20 00 00 a4 01 00 00  a4 01 00 00 04 00
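
The first 52 bytes of each dump are the ELF32 file header. As a minimal sketch, the header bytes of hello.exe shown above can be unpacked with Python's `struct` module. The function name `parse_elf32_header` and the dictionary keys are illustrative choices, and the sketch assumes a little-endian 32-bit ELF file like the ones compiled here.

```python
import struct

# First 52 bytes of hello.exe, copied from the hexdump output above.
HEADER = bytes.fromhex(
    "7f454c46010101000000000000000000"
    "0300030001000000f010000034000000"
    "2c3a000000000000340020000c002800"
    "1f001e00"
)

# Elf32_Ehdr layout, little-endian: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")


def parse_elf32_header(data):
    """Decode a little-endian ELF32 file header into a dict (illustrative helper)."""
    (ident, e_type, e_machine, _version, e_entry, e_phoff, e_shoff,
     _flags, _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     _shstrndx) = ELF32_EHDR.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class": {1: "ELF32", 2: "ELF64"}.get(ident[4], "unknown"),
        "data": {1: "little-endian", 2: "big-endian"}.get(ident[5], "unknown"),
        "type": e_type,        # 3 = ET_DYN (shared object / PIE)
        "machine": e_machine,  # 3 = EM_386 (Intel 80386)
        "entry": e_entry,
        "phoff": e_phoff,
        "phnum": e_phnum,
        "shoff": e_shoff,
        "shnum": e_shnum,
    }


header = parse_elf32_header(HEADER)
print(f"{header['class']}, {header['data']}, entry point {header['entry']:#x}")
print(f"{header['phnum']} program headers at offset {header['phoff']}")
print(f"{header['shnum']} section headers at offset {header['shoff']:#x}")
```

The decoded section header count and offset can be cross-checked against what `readelf --sections hello.exe` reports.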

## ELF file parts

## Symbols
- names of functions and variables; e.g., if a library function such as printf is used, symbols are how the program finds it at link and load time

## Sections
- symbols are organized into **sections** - code lives in one section (.text) and data lives in another (.data, .rodata)

## Segments
- sections are organized into **segments**, which are the units used at run time

### Examine various sections of ELF
- let's compile demos/hello.cpp file
- use the compiled ELF file to examine various sections

### readelf and objdump
- these utilities can help us look at various parts

### look at all the symbols of a binary
- important symbols to note: main, _start, puts (gcc replaces a printf call that prints a constant string ending in a newline with puts)

In [None]:
! readelf --symbols hello.exe


Symbol table '.dynsym' contains 14 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.1.3 (2)
     2: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZSt4endlIcSt11char_trait@GLIBCXX_3.4 (3)
     3: 00000000     0 FUNC    GLOBAL DEFAULT  UND __cxa_atexit@GLIBC_2.1.3 (2)
     4: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (4)
     5: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZStlsISt11char_traitsIcE@GLIBCXX_3.4 (3)
     6: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSolsEPFRSoS_E@GLIBCXX_3.4 (3)
     7: 00000000     0 OBJECT  GLOBAL DEFAULT  UND _ZSt4cout@GLIBCXX_3.4 (3)
     8: 00000000     0 FUNC    GLOBAL DEFAULT  UND _ZNSt8ios_base4InitC1Ev@GLIBCXX_3.4 (3)
     9: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
    10: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    11: 00000000     

In [12]:
! readelf --symbols hello_c.exe


Symbol table '.dynsym' contains 8 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
     2: 00000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.1.3 (2)
     3: 00000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.0 (3)
     4: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     5: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (3)
     6: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
     7: 00002004     4 OBJECT  GLOBAL DEFAULT   18 _IO_stdin_used

Symbol table '.symtab' contains 70 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 000001b4     0 SECTION LOCAL  DEFAULT    1 
     2: 000001c8     0 SECTION LOCAL  DEFAULT    2 
     3: 000001ec     0 SECTION LOCAL  DEFAULT    3 
     4: 00000208  

### display all the sections
- some important sections are: .text, .rodata, .data, .bss

In [13]:
! readelf --sections hello.exe

There are 31 section headers, starting at offset 0x3a2c:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        000001b4 0001b4 000013 00   A  0   0  1
  [ 2] .note.gnu.build-i NOTE            000001c8 0001c8 000024 00   A  0   0  4
  [ 3] .note.gnu.propert NOTE            000001ec 0001ec 00001c 00   A  0   0  4
  [ 4] .note.ABI-tag     NOTE            00000208 000208 000020 00   A  0   0  4
  [ 5] .gnu.hash         GNU_HASH        00000228 000228 000020 04   A  6   0  4
  [ 6] .dynsym           DYNSYM          00000248 000248 0000e0 10   A  7   1  4
  [ 7] .dynstr           STRTAB          00000328 000328 00017c 00   A  0   0  1
  [ 8] .gnu.version      VERSYM          000004a4 0004a4 00001c 02   A  6   0  2
  [ 9] .gnu.version_r    VERNEED         000004c0 0004c0 000050 00   A  7   2  4
  [10] .rel.dyn          REL      

In [14]:
! readelf --sections hello_c.exe

There are 31 section headers, starting at offset 0x380c:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        000001b4 0001b4 000013 00   A  0   0  1
  [ 2] .note.gnu.build-i NOTE            000001c8 0001c8 000024 00   A  0   0  4
  [ 3] .note.gnu.propert NOTE            000001ec 0001ec 00001c 00   A  0   0  4
  [ 4] .note.ABI-tag     NOTE            00000208 000208 000020 00   A  0   0  4
  [ 5] .gnu.hash         GNU_HASH        00000228 000228 000020 04   A  6   0  4
  [ 6] .dynsym           DYNSYM          00000248 000248 000080 10   A  7   1  4
  [ 7] .dynstr           STRTAB          000002c8 0002c8 00009b 00   A  0   0  1
  [ 8] .gnu.version      VERSYM          00000364 000364 000010 02   A  6   0  2
  [ 9] .gnu.version_r    VERNEED         00000374 000374 000030 00   A  7   1  4
  [10] .rel.dyn          REL      

### look at just one section, e.g., .rodata
- read-only data such as string literals (e.g., "Hello World!") is stored in .rodata

In [15]:
# let's look at just the .rodata section of hello program
! readelf -x .rodata hello.exe


Hex dump of section '.rodata':
  0x00002000 03000000 01000200 0048656c 6c6f2057 .........Hello W
  0x00002010 6f726c64 2100                       orld!.
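
The hex dump lists the section's bytes in file order, grouped into 4-byte words. As a rough sketch, similar in spirit to the `strings` utility, the printable text can be recovered from those bytes in Python. The helper name `printable_strings` and the minimum length of 4 are illustrative assumptions.

```python
import re

# Bytes of the .rodata section of hello.exe, copied from the readelf -x output above.
RODATA = bytes.fromhex("03000000 01000200 0048656c 6c6f2057 6f726c64 2100")


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes (illustrative helper)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


print(printable_strings(RODATA))
```

The leading bytes (03, 01, 02 and the null bytes) are not printable, so only the string literal survives the filter.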



In [16]:
# let's look at the .rodata section of memory_segments.exe program
! readelf -x .rodata memory_segments.exe


Hex dump of section '.rodata':
  0x00002000 03000000 01000200 004f7574 70757420 .........Output 
  0x00002010 696e7369 64652066 756e6374 696f6e3a inside function:
  0x00002020 00737461 7469635f 696e6974 69616c69 .static_initiali
  0x00002030 7a65645f 76617220 3d202564 0a000000 zed_var = %d....
  0x00002040 73746174 69635f69 6e697469 616c697a static_initializ
  0x00002050 65645f76 61722069 73206174 20616464 ed_var is at add
  0x00002060 72657373 2025700a 00737461 636b5f76 ress %p..stack_v
  0x00002070 6172203d 2025640a 00737461 636b5f76 ar = %d..stack_v
  0x00002080 61722069 73206174 20616464 72657373 ar is at address
  0x00002090 2025700a 006f7574 70757420 66726f6d  %p..output from
  0x000020a0 206d6169 6e206675 6e637469 6f6e006d  main function.m
  0x000020b0 61696e20 69732061 74206164 64726573 ain is at addres
  0x000020c0 733a2025 700a0066 756e6374 696f6e20 s: %p..function 
  0x000020d0 69732061 74206164 64726573 733a2025 is at address: %
  0x000020e0 700a0000 676c6f62 616c5f69 6e69

### objdump program 
- objdump can also be used to examine individual sections of a program

In [17]:
! objdump -s -j .rodata hello.exe


hello.exe:     file format elf32-i386

Contents of section .rodata:
 2000 03000000 01000200 0048656c 6c6f2057  .........Hello W
 2010 6f726c64 2100                        orld!.          


In [18]:
! readelf -x .data memory_segments.exe
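# Note: global_initialized_var = 5 is stored as the little-endian bytes 05 00 00 00; it is visible in the hex columns but shows up only as dots in the ASCII column because those bytes are not printable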


Hex dump of section '.data':
  0x00004000 00000000 04400000 05000000 4a6f686e .....@......John
  0x00004010 20536d69 74682100 05000000 05000000  Smith!.........
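
To make the byte order concrete, here is a small sketch that decodes part of the dump above in Python. It reads the word at offset 8 (one of the three `05000000` words in the dump; which variable each word belongs to is not shown here) and the C string that follows it.

```python
import struct

# Bytes of the .data section of memory_segments.exe, copied from the readelf -x output above.
DATA = bytes.fromhex(
    "00000000 04400000 05000000 4a6f686e"
    "20536d69 74682100 05000000 05000000"
)

# Interpret the 4 bytes at offset 8 as a little-endian unsigned 32-bit integer.
(value,) = struct.unpack_from("<I", DATA, 8)
print(value)

# The same bytes read as big-endian give a very different number.
print(int.from_bytes(DATA[8:12], "big"))

# The C string starting at offset 12 runs up to the first null byte.
end = DATA.index(b"\x00", 12)
print(DATA[12:end].decode("ascii"))
```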



In [19]:
! readelf -x .bss memory_segments.exe

Section '.bss' has no data to dump.


### look at the segments
- GNU_STACK is important to note
    - RW - Read and Write; NO Execute
    - data on the stack is treated as plain data, never as executable code (a non-executable stack)!
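
The flags readelf prints come from a bit field in each program header. As a small sketch, the permission bits defined by the ELF specification can be decoded like this; the helper names `flags_to_str` and `is_executable` are illustrative.

```python
# Segment permission bits as defined by the ELF specification.
PF_X = 0x1  # execute
PF_W = 0x2  # write
PF_R = 0x4  # read


def flags_to_str(p_flags):
    """Render p_flags similarly to readelf's Flg column, e.g. 'RW ' or 'R E'."""
    return (
        ("R" if p_flags & PF_R else " ")
        + ("W" if p_flags & PF_W else " ")
        + ("E" if p_flags & PF_X else " ")
    )


def is_executable(p_flags):
    """True when the execute bit is set for the segment."""
    return bool(p_flags & PF_X)


for flags in (PF_R | PF_W, PF_R | PF_X):
    print(repr(flags_to_str(flags)), "executable:", is_executable(flags))
```

A GNU_STACK entry with flags RW corresponds to the value 6 (read plus write) with the execute bit cleared.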

In [27]:
! readelf --segments hello_c.exe


Elf file type is DYN (Shared object file)
Entry point 0x1090
There are 12 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00000034 0x00000034 0x00180 0x00180 R   0x4
  INTERP         0x0001b4 0x000001b4 0x000001b4 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x00000000 0x00000000 0x003f4 0x003f4 R   0x1000
  LOAD           0x001000 0x00001000 0x00001000 0x002b4 0x002b4 R E 0x1000
  LOAD           0x002000 0x00002000 0x00002000 0x001a4 0x001a4 R   0x1000
  LOAD           0x002ed8 0x00003ed8 0x00003ed8 0x00130 0x00134 RW  0x1000
  DYNAMIC        0x002ee0 0x00003ee0 0x00003ee0 0x000f8 0x000f8 RW  0x4
  NOTE           0x0001c8 0x000001c8 0x000001c8 0x00060 0x00060 R   0x4
  GNU_PROPERTY   0x0001ec 0x000001ec 0x000001ec 0x0001c 0x0001c R   0x4
  GNU_EH_FRAME   0x002018 0x00002018 0x00002018 0x00054 0x00054 R   0x4
  G
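
Since sections are organized into segments, any address in the binary can be matched to the LOAD segment that contains it. The following sketch does that lookup using the LOAD entries printed above for hello_c.exe; `find_segment` is an illustrative helper, and the three addresses are taken from the readelf outputs shown earlier for the same file.

```python
# (offset, vaddr, memsiz, flags) of the four LOAD segments of hello_c.exe,
# copied from the readelf --segments output above.
LOAD_SEGMENTS = [
    (0x000000, 0x00000000, 0x003f4, "R"),
    (0x001000, 0x00001000, 0x002b4, "R E"),
    (0x002000, 0x00002000, 0x001a4, "R"),
    (0x002ed8, 0x00003ed8, 0x00134, "RW"),
]


def find_segment(vaddr):
    """Return the LOAD segment whose [vaddr, vaddr + memsiz) range contains vaddr."""
    for segment in LOAD_SEGMENTS:
        _offset, start, memsiz, _flags = segment
        if start <= vaddr < start + memsiz:
            return segment
    return None


for name, addr in [(".interp", 0x1b4), ("entry point", 0x1090), ("_IO_stdin_used", 0x2004)]:
    _offset, start, _memsiz, flags = find_segment(addr)
    print(f"{name} at {addr:#x} is in the LOAD segment at {start:#x} with flags {flags}")
```

The entry point falls in the only segment marked executable, while the read-only object `_IO_stdin_used` falls in a segment that is readable but neither writable nor executable.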

## Disassemble using objdump
- look at the assembly code of the whole binary
- by default, objdump shows AT&T assembly syntax, where registers are prefixed with % and immediate values with \$
    - source before the destination
    - e.g., `mov $5, %eax`
- https://en.wikipedia.org/wiki/X86_assembly_language

In [28]:
! objdump -d hello.exe


hello.exe:     file format elf32-i386


Disassembly of section .init:

00001000 <_init>:
    1000:	f3 0f 1e fb          	endbr32 
    1004:	53                   	push   %ebx
    1005:	83 ec 08             	sub    $0x8,%esp
    1008:	e8 23 01 00 00       	call   1130 <__x86.get_pc_thunk.bx>
    100d:	81 c3 b3 2f 00 00    	add    $0x2fb3,%ebx
    1013:	8b 83 34 00 00 00    	mov    0x34(%ebx),%eax
    1019:	85 c0                	test   %eax,%eax
    101b:	74 02                	je     101f <_init+0x1f>
    101d:	ff d0                	call   *%eax
    101f:	83 c4 08             	add    $0x8,%esp
    1022:	5b                   	pop    %ebx
    1023:	c3                   	ret    

Disassembly of section .plt:

00001030 <.plt>:
    1030:	ff b3 04 00 00 00    	pushl  0x4(%ebx)
    1036:	ff a3 08 00 00 00    	jmp    *0x8(%ebx)
    103c:	0f 1f 40 00          	nopl   0x0(%eax)
    1040:	f3 0f 1e fb          	endbr32 
    1044:	68 00 00 00 00       	push   $0x0
    1049:	e9 e2 ff ff ff       	jmp 

In [29]:
# display each line matching main.: (e.g., <main>:) of the hello program and the 20 lines after it
! objdump -D hello.exe | grep -A20 main.:

0000122d <main>:
    122d:	f3 0f 1e fb          	endbr32 
    1231:	8d 4c 24 04          	lea    0x4(%esp),%ecx
    1235:	83 e4 f0             	and    $0xfffffff0,%esp
    1238:	ff 71 fc             	pushl  -0x4(%ecx)
    123b:	55                   	push   %ebp
    123c:	89 e5                	mov    %esp,%ebp
    123e:	53                   	push   %ebx
    123f:	51                   	push   %ecx
    1240:	e8 eb fe ff ff       	call   1130 <__x86.get_pc_thunk.bx>
    1245:	81 c3 7b 2d 00 00    	add    $0x2d7b,%ebx
    124b:	83 ec 08             	sub    $0x8,%esp
    124e:	8d 83 49 e0 ff ff    	lea    -0x1fb7(%ebx),%eax
    1254:	50                   	push   %eax
    1255:	8b 83 2c 00 00 00    	mov    0x2c(%ebx),%eax
    125b:	50                   	push   %eax
    125c:	e8 5f fe ff ff       	call   10c0 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
    1261:	83 c4 10             	add    $0x10,%esp
    1264:	83 ec 08             	sub    $0x8,%esp
    1267:	8b 93 28 00 00 0
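
The raw bytes in the middle column can be checked against the decoded instruction. As a sketch, the destination of a 5-byte `call` (opcode `e8`) is the address of the next instruction plus a signed little-endian 32-bit displacement. The helper name `call_target` is illustrative, and the inputs are copied from the objdump listings above.

```python
def call_target(address, encoding_hex):
    """Compute the destination of a 5-byte x86 `call rel32` instruction (opcode e8)."""
    raw = bytes.fromhex(encoding_hex)
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("expected a 5-byte e8 call instruction")
    # The operand is a signed little-endian displacement relative to the
    # address of the next instruction.
    rel32 = int.from_bytes(raw[1:], "little", signed=True)
    return (address + len(raw) + rel32) & 0xFFFFFFFF


# Addresses and bytes copied from the objdump output above.
print(hex(call_target(0x1008, "e8 23 01 00 00")))
print(hex(call_target(0x1240, "e8 eb fe ff ff")))
print(hex(call_target(0x125C, "e8 5f fe ff ff")))
```

The first two calls resolve to the address objdump labels `__x86.get_pc_thunk.bx`, and the third to the PLT entry for the `operator<<` call.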

### disassemble in Intel syntax
- much cleaner
- destination before source 
    - e.g., `mov eax, 5`

In [None]:
! objdump -M intel -D hello.exe

In [None]:
! objdump -M intel -D hello.exe | grep -A20 main.:

## Hex Editor
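- a hex editor is used to modify a binary and its contents
- an online hex editor found via Google can be more convenient than the CLI hex editor provided by Kali
    - https://hexed.it/ is a pretty good one!
- compile demos/system.cpp and edit the resulting binary so that it spawns a shell
- search for "clear" (63 6c 65 61 72) and replace it with "73 68 00 00 00" ("sh" padded with null bytes so the length stays the same)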
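
The same search-and-replace can be sketched in Python. This is only an illustration of the idea: `patch_bytes` is a hypothetical helper and `sample` is a made-up stand-in for the part of the binary that holds the string, not bytes taken from the real program. The replacement must keep the original length so that no other offset in the file shifts.

```python
def patch_bytes(data, old, new):
    """Replace the first occurrence of old with new without changing the size."""
    if len(old) != len(new):
        raise ValueError("replacement must have the same length as the original")
    index = data.find(old)
    if index < 0:
        raise ValueError("pattern not found")
    return data[:index] + new + data[index + len(old):]


# Hypothetical stand-in for the region of the binary that holds the string.
sample = b"\x01\x00\x02\x00clear\x00\x00"
patched = patch_bytes(sample, b"clear", bytes.fromhex("73 68 00 00 00"))
print(patched)
```

To apply this to an actual file, one would read its bytes with `open(path, "rb")`, patch them, and write the result to a new file rather than overwriting the original.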

In [None]:
! g++ -o program.exe ../demos/system.cpp

In [None]:
! ./program.exe
# run the program from terminal for better demo