# Executable and Linkable Format (ELF)
- https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
- common format for executable files, object code, shared libraries, and core dumps
<img src="./resources/ELF.png">
- an ELF file has two views: the program header table describes the segments used at run time, whereas the section header table lists the sections of the binary, which are used for linking and debugging.
- let's compile the following program and examine its ELF structure using various tools

```c
// demo-programs/hello.c program
#include <stdio.h>
int main() {
    puts("Hello World!");
    return 0;
}
```

In [11]:
%%bash
in=./demo-programs/hello.c
out=hello

gcc -g -o $out $in
./$out

Hello World!


In [3]:
! cat hello

ELF              `  4   �?      4    ( # "    4   4   4   `  `           �  �  �                             �  �                    X  X                       �  �           �.  �>  �>  (  ,           �.  �>  �>  �   �            �  �  �  D   D         P�td         D   D         Q�td                          R�td�.  �>  �>            /lib/ld-linux.so.2           GNU                        GNU ��V��>�M�t:��M�                       �K��                V                          "                 r               .              �                           libc.so.6 _IO_stdin_used puts __cxa_finalize __libc_start_main GLIBC_2.0 GLIBC_2.1.3 _ITM_deregisterTMCloneTable __gmon_start__ _ITM_registerTMCloneTable                          ii   @      si	   J       �>     �>     �?     @     �?    �?    �?    �?    @    @                                                     

## file utility
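- reports a file's type; for an ELF binary it shows the class (32- or 64-bit), byte order, target architecture, and whether the binary is dynamically linked and stripped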

In [9]:
! file hello

hello: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=d2f756ddeea13e0d0bee4d0e82743a07a6ab4dc8, with debug_info, not stripped


In [10]:
! hexdump -C hello

00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 03 00 01 00 00 00  60 10 00 00 34 00 00 00  |........`...4...|
00000020  cc 3f 00 00 00 00 00 00  34 00 20 00 0b 00 28 00  |.?......4. ...(.|
00000030  23 00 22 00 06 00 00 00  34 00 00 00 34 00 00 00  |#.".....4...4...|
00000040  34 00 00 00 60 01 00 00  60 01 00 00 04 00 00 00  |4...`...`.......|
00000050  04 00 00 00 03 00 00 00  94 01 00 00 94 01 00 00  |................|
00000060  94 01 00 00 13 00 00 00  13 00 00 00 04 00 00 00  |................|
00000070  01 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 b8 03 00 00  b8 03 00 00 04 00 00 00  |................|
00000090  00 10 00 00 01 00 00 00  00 10 00 00 00 10 00 00  |................|
000000a0  00 10 00 00 58 02 00 00  58 02 00 00 05 00 00 00  |....X...X.......|
000000b0  00 10 00 00 01 00 00 00  00 20 00 00 00 20 00 00  |......... ... ..|
000000c0  00 20 00 00 84 01 00 00  84 01
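
The first 52 bytes of the hexdump above are the ELF header. As a minimal sketch, assuming the standard 32-bit little-endian `Elf32_Ehdr` layout, the bytes can be decoded with Python's `struct` module. The byte string below is copied from the dump; the names `raw`, `FIELDS`, and `hdr` are illustrative only.

```python
import struct

# First 52 bytes of `hello`, copied from the hexdump above (offsets 0x00-0x33).
raw = bytes.fromhex(
    "7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 "
    "03 00 03 00 01 00 00 00 60 10 00 00 34 00 00 00 "
    "cc 3f 00 00 00 00 00 00 34 00 20 00 0b 00 28 00 "
    "23 00 22 00"
)

# Field order of Elf32_Ehdr; "<" selects little-endian, matching EI_DATA == 1.
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)
hdr = dict(zip(FIELDS, struct.unpack("<16sHHIIIIIHHHHHH", raw)))

ident = hdr["e_ident"]
print("magic       :", ident[:4])            # 0x7f followed by "ELF"
print("class       :", ident[4])             # 1 = 32-bit
print("data        :", ident[5])             # 1 = little-endian (LSB)
print("type        :", hdr["e_type"])        # 3 = ET_DYN (PIE or shared object)
print("machine     :", hdr["e_machine"])     # 3 = Intel 80386
print("entry point :", hex(hdr["e_entry"]))
print("phdr table  :", hdr["e_phnum"], "entries at offset", hdr["e_phoff"])
print("shdr table  :", hdr["e_shnum"], "entries at offset", hex(hdr["e_shoff"]))
```

The decoded values should agree with what `file` reports above and with the entry point, program header count, and section header count that `readelf` prints later in this notebook.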

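## ELF file parts

### Symbols
- names of functions and variables, e.g., if a library function such as printf is used, how does the program find it? The symbol tables record the name so that the linker or dynamic loader can resolve it to an address

### Sections
- symbols are organized into **sections**: code lives in one section (.text) and data lives in others (.data, .rodata)

### Segments
- sections are grouped into **segments**, which the loader maps into memory at run time with the permissions listed in the program headers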

## readelf and objdump
- these utilities can help us look at various parts

### look at all the symbols of a binary
- important symbols: main, _start, puts
- .dynsym holds only the symbols needed for dynamic linking; .symtab is the full symbol table

In [7]:
! readelf --symbols hello


Symbol table '.dynsym' contains 8 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
     2: 00000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.1.3 (2)
     3: 00000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.0 (3)
     4: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     5: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (3)
     6: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
     7: 00002004     4 OBJECT  GLOBAL DEFAULT   16 _IO_stdin_used

Symbol table '.symtab' contains 73 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000194     0 SECTION LOCAL  DEFAULT    1 
     2: 000001a8     0 SECTION LOCAL  DEFAULT    2 
     3: 000001c8     0 SECTION LOCAL  DEFAULT    3 


### display all the sections
- some important sections: .text (code), .rodata (read-only data), .data (initialized data), .bss (uninitialized data)

In [17]:
! readelf --sections hello

There are 35 section headers, starting at offset 0x3fcc:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        00000194 000194 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            000001a8 0001a8 000020 00   A  0   0  4
  [ 3] .note.gnu.build-i NOTE            000001c8 0001c8 000024 00   A  0   0  4
  [ 4] .gnu.hash         GNU_HASH        000001ec 0001ec 000020 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          0000020c 00020c 000080 10   A  6   1  4
  [ 6] .dynstr           STRTAB          0000028c 00028c 00009b 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          00000328 000328 000010 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         00000338 000338 000030 00   A  6   1  4
  [ 9] .rel.dyn          REL             00000368 000368 000040 08   A  5   0  4
  [10] .rel.plt     
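
The `Size` and `ES` (entry size) columns above can be cross-checked against each other: for table-like sections, dividing one by the other gives the number of entries. A small sketch, using only values copied from the listing above (the variable names are illustrative):

```python
# (name, Size, ES) copied from the `readelf --sections` output above.
sections = [
    (".dynsym", 0x80, 0x10),       # 16-byte Elf32_Sym records
    (".gnu.version", 0x10, 0x02),  # one 2-byte version index per dynamic symbol
    (".rel.dyn", 0x40, 0x08),      # 8-byte Elf32_Rel records
]

# Number of entries = section size / size of one entry.
entry_counts = {name: size // entsize for name, size, entsize in sections}

for name, count in entry_counts.items():
    print(f"{name:14s} {count} entries")
```

The `.dynsym` result matches the "contains 8 entries" line printed by `readelf --symbols`, and `.gnu.version` has the same count because it carries one version index per dynamic symbol.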

### look at just one section, e.g., .rodata
- read-only data such as string literals is stored in .rodata, e.g., Hello World!

In [14]:
! readelf -x .rodata hello


Hex dump of section '.rodata':
  0x00002000 03000000 01000200 48656c6c 6f20576f ........Hello Wo
  0x00002010 726c6421 00                         rld!.



In [21]:
! objdump -s -j .rodata hello


hello:     file format elf32-i386

Contents of section .rodata:
 2000 03000000 01000200 48656c6c 6f20576f  ........Hello Wo
 2010 726c6421 00                          rld!.           


In [16]:
! readelf -x .data hello


Hex dump of section '.data':
  0x00004014 00000000 18400000                   .....@..
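
To connect the hex columns in these dumps to the ASCII column, the bytes can be decoded by hand. This is a sketch using the `.rodata` and `.data` bytes copied from the dumps above; `base`, `text_off`, and the other names are illustrative, and the 8-byte offset of the string is read off the dump rather than taken from any symbol table.

```python
import struct

# Bytes of .rodata, copied from the dump above; the section starts at 0x2000.
rodata = bytes.fromhex("03000000 01000200 48656c6c 6f20576f 726c6421 00")
base = 0x2000

# The first two 32-bit words, decoded as little-endian integers.
words = struct.unpack_from("<2I", rodata, 0)

# The string starts 8 bytes into the section and runs to the NUL terminator.
text_off = 8
end = rodata.index(b"\x00", text_off)
message = rodata[text_off:end].decode("ascii")
message_addr = base + text_off

print([hex(w) for w in words])
print(repr(message), "at", hex(message_addr))

# The same little-endian decoding applied to the .data dump.
data_words = struct.unpack("<2I", bytes.fromhex("00000000 18400000"))
print([hex(w) for w in data_words])
```

Note that the dump shows bytes in file order, so the word `18400000` in `.data` is the little-endian encoding of 0x4018.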



### look at the segments
- GNU_STACK is important to note
    - its flags are RW: Read and Write, but NO Execute
    - bytes on the stack are treated purely as data; the CPU will not execute them as code!

In [19]:
! readelf --segments hello


Elf file type is DYN (Shared object file)
Entry point 0x1060
There are 11 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00000034 0x00000034 0x00160 0x00160 R   0x4
  INTERP         0x000194 0x00000194 0x00000194 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x00000000 0x00000000 0x003b8 0x003b8 R   0x1000
  LOAD           0x001000 0x00001000 0x00001000 0x00258 0x00258 R E 0x1000
  LOAD           0x002000 0x00002000 0x00002000 0x00184 0x00184 R   0x1000
  LOAD           0x002ef4 0x00003ef4 0x00003ef4 0x00128 0x0012c RW  0x1000
  DYNAMIC        0x002efc 0x00003efc 0x00003efc 0x000f0 0x000f0 RW  0x4
  NOTE           0x0001a8 0x000001a8 0x000001a8 0x00044 0x00044 R   0x4
  GNU_EH_FRAME   0x002018 0x00002018 0x00002018 0x00044 0x00044 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x
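
The `Flg` column is a decoded bit mask. As a sketch, assuming the standard `Elf32_Phdr` layout and the usual flag bits (`PF_X` = 1, `PF_W` = 2, `PF_R` = 4), the first four program headers can be decoded from the hexdump bytes at offsets 0x34 to 0xb3 shown earlier. The helper `flags_to_str` is hypothetical, and the value 6 used for GNU_STACK is inferred from the "RW" note above, since the output line itself is truncated.

```python
import struct

PF_X, PF_W, PF_R = 0x1, 0x2, 0x4
PT_NAMES = {1: "LOAD", 3: "INTERP", 6: "PHDR"}  # only the types used below

def flags_to_str(flags):
    """Render p_flags the way readelf does, e.g. 5 -> 'R E'."""
    return (
        ("R" if flags & PF_R else " ")
        + ("W" if flags & PF_W else " ")
        + ("E" if flags & PF_X else " ")
    )

# Four 32-byte Elf32_Phdr entries, copied from the hexdump (0x34-0xb3).
phdr_bytes = bytes.fromhex(
    "06000000 34000000 34000000 34000000 60010000 60010000 04000000 04000000 "
    "03000000 94010000 94010000 94010000 13000000 13000000 04000000 01000000 "
    "01000000 00000000 00000000 00000000 b8030000 b8030000 04000000 00100000 "
    "01000000 00100000 00100000 00100000 58020000 58020000 05000000 00100000"
)

segments = []
for entry in struct.iter_unpack("<8I", phdr_bytes):
    p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align = entry
    segments.append((PT_NAMES[p_type], p_vaddr, flags_to_str(p_flags)))
    print(f"{PT_NAMES[p_type]:7s} vaddr=0x{p_vaddr:08x} flags={flags_to_str(p_flags)}")

# Assumed GNU_STACK flags (RW = 6): the execute bit is not set.
stack_flags = PF_R | PF_W
print("stack executable:", bool(stack_flags & PF_X))
```

Only the second LOAD segment, which holds the code sections, has the execute bit set.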

## Disassemble using objdump
- look at the assembly of the executable sections of the binary
- by default, objdump shows AT&T assembly syntax, where registers are prefixed with % and immediate values with $
    - source before the destination
    - `mov $5, %eax`
- https://en.wikipedia.org/wiki/X86_assembly_language

In [18]:
! objdump -d hello


hello:     file format elf32-i386


Disassembly of section .init:

00001000 <_init>:
    1000:	53                   	push   %ebx
    1001:	83 ec 08             	sub    $0x8,%esp
    1004:	e8 97 00 00 00       	call   10a0 <__x86.get_pc_thunk.bx>
    1009:	81 c3 f7 2f 00 00    	add    $0x2ff7,%ebx
    100f:	8b 83 f4 ff ff ff    	mov    -0xc(%ebx),%eax
    1015:	85 c0                	test   %eax,%eax
    1017:	74 02                	je     101b <_init+0x1b>
    1019:	ff d0                	call   *%eax
    101b:	83 c4 08             	add    $0x8,%esp
    101e:	5b                   	pop    %ebx
    101f:	c3                   	ret    

Disassembly of section .plt:

00001020 <.plt>:
    1020:	ff b3 04 00 00 00    	pushl  0x4(%ebx)
    1026:	ff a3 08 00 00 00    	jmp    *0x8(%ebx)
    102c:	00 00                	add    %al,(%eax)
	...

00001030 <puts@plt>:
    1030:	ff a3 0c 00 00 00    	jmp    *0xc(%ebx)
    1036:	68 00 00 00 00       	push   $0x0
    103b:	e9 e

### disassemble in Intel syntax
- often considered easier to read: no % or $ prefixes
- destination before source
    - `mov eax, 5` (the same instruction as `mov $5, %eax` in AT&T syntax)

In [20]:
! objdump -M intel -D hello | grep -A20 main.:

00001199 <main>:
    1199:	8d 4c 24 04          	lea    ecx,[esp+0x4]
    119d:	83 e4 f0             	and    esp,0xfffffff0
    11a0:	ff 71 fc             	push   DWORD PTR [ecx-0x4]
    11a3:	55                   	push   ebp
    11a4:	89 e5                	mov    ebp,esp
    11a6:	53                   	push   ebx
    11a7:	51                   	push   ecx
    11a8:	e8 28 00 00 00       	call   11d5 <__x86.get_pc_thunk.ax>
    11ad:	05 53 2e 00 00       	add    eax,0x2e53
    11b2:	83 ec 0c             	sub    esp,0xc
    11b5:	8d 90 08 e0 ff ff    	lea    edx,[eax-0x1ff8]
    11bb:	52                   	push   edx
    11bc:	89 c3                	mov    ebx,eax
    11be:	e8 6d fe ff ff       	call   1030 <puts@plt>
    11c3:	83 c4 10             	add    esp,0x10
    11c6:	b8 00 00 00 00       	mov    eax,0x0
    11cb:	8d 65 f8             	lea    esp,[ebp-0x8]
    11ce:	59                   	pop    ecx
    11cf:	5b                   	pop    ebx
    11d0:	5d         
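
The addresses in the listing of `main` can be checked with a little arithmetic. This is a sketch under two assumptions: that `__x86.get_pc_thunk.ax` returns the address of the instruction following the call (its usual role in position-independent 32-bit code), and that a `call` with opcode `e8` encodes a signed 32-bit displacement relative to the next instruction. The helper `rel32_target` and the variable names are illustrative.

```python
def rel32_target(next_ip, rel_bytes):
    """Target of an `e8` call: next instruction address + signed displacement."""
    disp = int.from_bytes(rel_bytes, "little", signed=True)
    return (next_ip + disp) & 0xFFFFFFFF

# call at 0x11a8 (e8 28 00 00 00); the next instruction is at 0x11ad.
thunk_addr = rel32_target(0x11AD, bytes.fromhex("28000000"))

# After the thunk returns, eax holds 0x11ad; `add eax,0x2e53` forms the PIC base.
pic_base = 0x11AD + 0x2E53

# `lea edx,[eax-0x1ff8]` computes the argument pushed for puts.
string_addr = pic_base - 0x1FF8

# call at 0x11be (e8 6d fe ff ff); the next instruction is at 0x11c3.
puts_plt = rel32_target(0x11C3, bytes.fromhex("6dfeffff"))

print(hex(thunk_addr), hex(pic_base), hex(string_addr), hex(puts_plt))
```

The computed string address falls 8 bytes into `.rodata`, which is where the bytes of "Hello World!" begin in the section dump, and the two call targets match the `<__x86.get_pc_thunk.ax>` and `<puts@plt>` labels that objdump prints.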