# Exercise 5: Memory Protection Mechanisms (Bounds checking, CFI, SFI)

### Assignment Deadline: May 5 (Monday) before 11:59PM

**This assignment is worth 15% of the semester grade.**

In lecture, we learned about several compiler-based, application-level mitigations, including bounds checking, control-flow integrity (CFI), and Software Fault Isolation (SFI). In this exercise, you will implement the checking logic of these mitigations **without** dealing with the hassle of static analysis and binary instrumentation. We will start by building an **offline** reference monitor that records all memory accesses and control transfers, and then implement the checking logic for bounds checking, CFI, and SFI.

## Step 0: Program Tracing

In this step, we will collect traces of memory accesses (reads and writes) and control transfers (calls, jumps, and returns) in a program. Normally, a reference monitor collects and checks the traces online, during program execution. This matters because any operation that violates the safety policy has to be interrupted immediately. However, performing online checks requires instrumentation and many optimizations to avoid significant overhead. Instead, in this exercise, we will design an **offline** reference monitor, which checks the traces **after** the program has finished executing.

We will use **Pin Tool**, a binary instrumentation and analysis framework, to collect the traces from a program. Pin lets you build your own binary analyzer by hooking custom logic onto operations in the program. Here, we provide the code for tracing memory accesses and control transfers as a tool called **memtrace**, so you do not have to develop it yourself.

Now, start with downloading the Pin tool:

In [None]:
!wget -O pintool-3.31.tar.gz https://github.com/chiache/csce713-assignments/raw/refs/heads/master/lab5/pintool-3.31.tar.gz
!tar -xzf pintool-3.31.tar.gz

Next, examine `memtrace.cpp`, a custom-made Pin tool for tracing memory access and control transfer.

In [None]:
%%writefile pintool-3.31/source/tools/MyPinTool/memtrace.cpp
#include "pin.H"
#include <iostream>
#include <fstream>
#include <thread>

std::ofstream TraceFile;

VOID RecordMemRead(VOID *ip, VOID *addr, THREADID threadID) {
    TraceFile << threadID << "," << ip << ",READ," << addr << "\n";
}

VOID RecordMemWrite(VOID *ip, VOID *addr, THREADID threadID) {
    TraceFile << threadID << "," <<  ip << ",WRITE," << addr << "\n";
}

VOID RecordDirectControlFlow(VOID* ip, VOID* target, const char* type, THREADID threadID) {
    TraceFile << threadID << "," <<  ip << "," << type << "," << target << "\n";
}

VOID RecordIndirectControlFlow(VOID* ip, VOID* target, const char* type, THREADID threadID) {
    TraceFile << threadID << "," <<  ip << "," << type << "," << target << "\n";
}

VOID Instruction(INS ins, VOID *v) {
    if (INS_IsMemoryRead(ins)) {
        INS_InsertPredicatedCall(
            ins, IPOINT_BEFORE, (AFUNPTR)RecordMemRead,
            IARG_INST_PTR,
            IARG_MEMORYREAD_EA,
            IARG_THREAD_ID,
            IARG_END);
    }

    if (INS_IsMemoryWrite(ins)) {
        INS_InsertPredicatedCall(
            ins, IPOINT_BEFORE, (AFUNPTR)RecordMemWrite,
            IARG_INST_PTR,
            IARG_MEMORYWRITE_EA,
            IARG_THREAD_ID,
            IARG_END);
    }

    if (INS_IsRet(ins)) {
        INS_InsertPredicatedCall(
            ins, IPOINT_BEFORE, (AFUNPTR)RecordIndirectControlFlow,
            IARG_INST_PTR,
            IARG_BRANCH_TARGET_ADDR,
            IARG_PTR, "RET",
            IARG_THREAD_ID,
            IARG_END);
    } else if (INS_IsControlFlow(ins)) {
        const char* type = INS_IsCall(ins) ? "CALL" : "JMP";

        if (INS_IsDirectControlFlow(ins)) {
            ADDRINT target = INS_DirectControlFlowTargetAddress(ins);
            INS_InsertPredicatedCall(
                ins, IPOINT_BEFORE, (AFUNPTR)RecordDirectControlFlow,
                IARG_INST_PTR,
                IARG_ADDRINT, target,
                IARG_PTR, type,
                IARG_THREAD_ID,
                IARG_END);
        } else if (INS_IsIndirectControlFlow(ins)) {
            INS_InsertPredicatedCall(
                ins, IPOINT_BEFORE, (AFUNPTR)RecordIndirectControlFlow,
                IARG_INST_PTR,
                IARG_BRANCH_TARGET_ADDR,
                IARG_PTR, type,
                IARG_THREAD_ID,
                IARG_END);
        }
    }
}

// Called when the application exits
VOID Fini(INT32 code, VOID *v) {
    TraceFile.close();
}

// Initialization
int main(int argc, char *argv[]) {
    PIN_Init(argc, argv);
    TraceFile.open("memtrace.out");

    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);

    PIN_StartProgram(); // Never returns
    return 0;
}

Once you have saved the source file, build the custom Pin tool for x86-64:

In [None]:
!cd pintool-3.31/source/tools/MyPinTool && mkdir -p obj-intel64 && make PIN_ROOT=/content/pintool-3.31 obj-intel64/memtrace.so

Now you should be able to run any x86-64 program under this custom Pin tool. For example, to run `/bin/ls`:


In [None]:
!pintool-3.31/pin -t pintool-3.31/source/tools/MyPinTool/obj-intel64/memtrace.so -- /bin/ls

The traces are stored in `memtrace.out`. You may open the file to examine the traces.
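
As a quick sanity check, each trace line has the form `thread,ip,type,address`, as written by `memtrace.cpp`. The sketch below tallies operations by type. The helper name `count_ops` and the sample lines are made up for illustration; on a real trace you would pass the lines of `memtrace.out` instead.

```python
from collections import Counter

def count_ops(lines):
    """Tally trace lines by operation type (the third comma-separated field)."""
    counts = Counter()
    for line in lines:
        parts = line.strip().split(',')
        if len(parts) == 4:  # ignore malformed lines
            counts[parts[2]] += 1
    return counts

# Made-up sample lines in the memtrace.out format (not real trace output)
sample = [
    "0,0x401136,READ,0x404040",
    "0,0x40113a,WRITE,0x404041",
    "0,0x401150,CALL,0x401030",
    "0,0x401160,RET,0x401155",
    "0,0x401136,READ,0x404042",
]
print(count_ops(sample))
```

A trace with no `CALL` or `RET` entries at all would suggest that the tool was not built or loaded correctly.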

## Step 1: Bounds Checking

**Bounds checking** is a classic technique for detecting and preventing out-of-bounds pointer references at their source. For each pointer or memory access in a program, we define a lower bound and an upper bound on the virtual address, so that the pointer dereference or memory access never goes outside its intended range.
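
To make the idea concrete, here is a minimal sketch of a single bounds check. It assumes a hypothetical 10-byte object at base address `0x404040` and treats the bounds as a half-open range `[lower, upper)`. Both the address and the helper name `in_bounds` are illustrative assumptions, not part of the assignment.

```python
def in_bounds(addr, lower, upper):
    """Return True if addr lies in the half-open range [lower, upper)."""
    return lower <= addr < upper

# Hypothetical 10-byte buffer; the base address is made up for illustration
base, size = 0x404040, 10
lower, upper = base, base + size

print(in_bounds(base + 9, lower, upper))   # last valid byte: True
print(in_bounds(base + 10, lower, upper))  # one past the end: False
```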



Let's start with a simple example of buffer overflow:

In [None]:
%%writefile buffer-overflow.c
#include <stdio.h>
#include <string.h>

char buffer[10];

void vuln_func() {
    char *b = buffer, c;
    while(c = getchar(), c != '\n')
      *(b++) = c;
    printf("You entered: %s\n", buffer);
}

int main() {
    vuln_func();
    return 0;
}

Now, let's compile the program and examine the program binary:

In [None]:
!gcc -o buffer-overflow buffer-overflow.c -no-pie
!objdump -S buffer-overflow

You can start by running the program directly under Pin with the memory trace module we provided. The tool writes a file called `memtrace.out`, which contains all the traces of memory accesses and control transfers.

In [None]:
!echo "aaaaaaaaaa" | pintool-3.31/pin -t pintool-3.31/source/tools/MyPinTool/obj-intel64/memtrace.so -- ./buffer-overflow

Now, please design a Python module to perform bounds checking on the memory traces to capture the instruction address(es) where the buffer overflow happens.

First, let's start with defining the bounds checking rules:

In [None]:
bounds_checking_rules = {
    # Each rule has the format "IP address: (lower bound, upper bound)"
}

Next, we will read the memory trace output and check the recorded memory traces against the bounds checking rules.

In [None]:
def read_memtrace():
    records = []

    with open('memtrace.out', 'r') as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line or line.startswith('#'):
                continue  # skip empty lines or comments

            parts = line.split(',')
            if len(parts) != 4:
                print(f"Skipping malformed line {line_number}: {line}")
                continue

            try:
                thread_id = int(parts[0])
                ip_address = int(parts[1], 16)
                op_type = parts[2]
                target_address = int(parts[3], 16)

                record = {
                    'thread_id': thread_id,
                    'ip_address': ip_address,
                    'op_type': op_type,
                    'target_address': target_address
                }
                records.append(record)

            except ValueError as e:
                print(f"Error parsing line {line_number}: {e}")
                continue

    return records

Now, please use the memtrace and the defined bounds checking rules to print out the IP address, operation type (read or write), and target address of each memory safety violation.

**For simplicity, you only need to look at IP addresses and read/write addresses between `0x400000--0x800000` (the range of the target executable). We do not have to check against addresses within the libraries.**
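
One possible shape for such a checker is sketched below. It runs on made-up records and a made-up rule rather than on a real trace. The function name `find_bounds_violations`, the half-open interpretation of both the rule bounds and the `0x400000--0x800000` range, and all addresses are assumptions for illustration. The record layout matches the dictionaries returned by `read_memtrace()`.

```python
LOW, HIGH = 0x400000, 0x800000  # range of the target executable

def find_bounds_violations(records, rules):
    """Return READ/WRITE records whose target is outside the rule for their IP."""
    violations = []
    for r in records:
        if r['op_type'] not in ('READ', 'WRITE'):
            continue
        ip, target = r['ip_address'], r['target_address']
        # Only consider instructions and data inside the executable
        if not (LOW <= ip < HIGH and LOW <= target < HIGH):
            continue
        if ip not in rules:
            continue  # no rule defined for this instruction
        lower, upper = rules[ip]
        if not (lower <= target < upper):  # assumed half-open bounds
            violations.append(r)
    return violations

# Made-up rule and records, for illustration only
toy_rules = {0x401150: (0x404040, 0x40404a)}
toy_records = [
    {'thread_id': 0, 'ip_address': 0x401150, 'op_type': 'WRITE', 'target_address': 0x404049},
    {'thread_id': 0, 'ip_address': 0x401150, 'op_type': 'WRITE', 'target_address': 0x40404a},
    {'thread_id': 0, 'ip_address': 0x7f0000001000, 'op_type': 'WRITE', 'target_address': 0x404050},
]
for v in find_bounds_violations(toy_records, toy_rules):
    print(hex(v['ip_address']), v['op_type'], hex(v['target_address']))
```

In this toy data only the second record is reported: the first is in bounds, and the third comes from an instruction outside the executable's range.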

## Step 2: Control-Flow Integrity

In this step, we will implement a simple version of **control-flow integrity**, a security policy to check the target of calls (both indirect and direct) and returns against a list of valid targets. To start, we will use the same memory trace Pin tool to collect the targets of control flow transfer and check against our rules.

Let's start with a simple example of stack smashing:



In [None]:
%%writefile stack-smashing.c
#include <stdio.h>
#include <string.h>

void attack_func() {
    printf("Stack smashed! You've gained unauthorized access!\n");
}

void vuln_func() {
    char buffer[10];

    printf("Enter some input: ");
    scanf("%s", buffer); // this  is unsafe and allows stack smashing

    printf("You entered: %s\n", buffer);
}

int main() {
    vuln_func();
    printf("Normal execution continues...\n");
    return 0;
}

In [None]:
!gcc -o stack-smashing stack-smashing.c -no-pie -fno-stack-protector
!objdump -S stack-smashing

Now, a small exercise to recap what we learned in Assignment 2. Please come up with an input that forces the program `stack-smashing` to print the following:

```
Stack smashed! You've gained unauthorized access!
```

In [None]:
!echo "aaaaaaaaaa" | ./stack-smashing

Next, run `stack-smashing` with the same input and collect memory traces.

In [None]:
!echo "aaaaaaaaaa" | pintool-3.31/pin -t pintool-3.31/source/tools/MyPinTool/obj-intel64/memtrace.so -- ./stack-smashing

Now, please design a Python module to check control flow integrity on the control transfer traces to capture the instruction address(es) where the control flow hijacking happens.

First, let's start by defining the control transfer rules, including the rules for returns, calls, and jumps.

In [None]:
return_rules = {
    # Each rule has the format "IP address: [ allowed return targets ]"
}

call_rules = {
    # Each rule has the format "IP address: [ allowed call targets ]"
}

jump_rules = {
    # Each rule has the format "IP address: [ allowed jump targets ]"
}

Now, please use the memtrace and the defined control flow integrity rules to print out the IP address, operation type (return, call, or jump), and target address of each control flow integrity violation.

**Again, for simplicity, you only need to look at IP addresses and memory access addresses between `0x400000--0x800000` (the range of the target executable). We do not have to check against addresses within the libraries.**
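
A checker for these rules could look roughly like the sketch below, shown on made-up records. The function name `find_cfi_violations` and all addresses are hypothetical. The sketch also assumes that only the instruction address is filtered against the executable's range, since a hijacked transfer may target any address.

```python
LOW, HIGH = 0x400000, 0x800000  # range of the target executable

def find_cfi_violations(records, return_rules, call_rules, jump_rules):
    """Return control-transfer records whose target is not in the allowed list."""
    rules_by_type = {'RET': return_rules, 'CALL': call_rules, 'JMP': jump_rules}
    violations = []
    for r in records:
        rules = rules_by_type.get(r['op_type'])
        if rules is None:
            continue  # READ/WRITE records are not control transfers
        ip = r['ip_address']
        if not (LOW <= ip < HIGH):
            continue  # instruction lives in a library
        allowed = rules.get(ip)
        if allowed is None:
            continue  # no rule defined for this instruction
        if r['target_address'] not in allowed:
            violations.append(r)
    return violations

# Made-up rule and records, for illustration only
toy_return_rules = {0x4011a0: [0x4011c5]}
toy_records = [
    {'thread_id': 0, 'ip_address': 0x4011a0, 'op_type': 'RET', 'target_address': 0x4011c5},
    {'thread_id': 0, 'ip_address': 0x4011a0, 'op_type': 'RET', 'target_address': 0x401136},
    {'thread_id': 0, 'ip_address': 0x401180, 'op_type': 'READ', 'target_address': 0x404040},
]
for v in find_cfi_violations(toy_records, toy_return_rules, {}, {}):
    print(hex(v['ip_address']), v['op_type'], hex(v['target_address']))
```

Keeping one dictionary per operation type mirrors the three rule tables above and avoids confusing a return target with a call target at the same address.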

## Step 3: Software Fault Isolation (SFI)

**Software Fault Isolation (SFI)** is a technique to prevent faults in one part of a program from corrupting or interfering with other parts. Typically, it's implemented with memory boundaries and control-flow restrictions. Violating SFI usually means that one "compartment" (or thread/module) can corrupt memory it shouldn't be able to touch.

The SFI rules are thread-based: for each thread (besides thread 0), you need to specify two rules, one for code and one for data:

*   For code, any jump, call, and return needs to fall within a code area `(CodeStart, CodeEnd)`.
*   For data, any memory access, either read or write, needs to fall within a data area `(DataStart, DataEnd)`.
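
The two rules above can be sketched as a single predicate over one trace record. The helper name `sfi_allows`, the half-open treatment of each area, and the addresses in the example rule are assumptions for illustration. The tuple layout follows the `(CodeStart, CodeEnd, DataStart, DataEnd)` format used for `sfi_rules` in this step.

```python
CONTROL_OPS = {'CALL', 'JMP', 'RET'}
DATA_OPS = {'READ', 'WRITE'}

def sfi_allows(rule, op_type, target):
    """Check one operation against a (CodeStart, CodeEnd, DataStart, DataEnd) rule."""
    code_start, code_end, data_start, data_end = rule
    if op_type in CONTROL_OPS:
        return code_start <= target < code_end  # assumed half-open code area
    if op_type in DATA_OPS:
        return data_start <= target < data_end  # assumed half-open data area
    return True  # unknown operation types are not restricted in this sketch

# Hypothetical rule for one thread; all addresses are made up
toy_rule = (0x401000, 0x402000, 0x404000, 0x404040)

print(sfi_allows(toy_rule, 'WRITE', 0x404010))  # inside the data area: True
print(sfi_allows(toy_rule, 'WRITE', 0x404040))  # just past the data area: False
print(sfi_allows(toy_rule, 'CALL', 0x401136))   # inside the code area: True
```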


Let's start with a simple example of violating SFI:

In [None]:
%%writefile sfi-violation.c
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define BUFFER_SIZE 64

typedef struct {
    char buffer[BUFFER_SIZE];
} ThreadData;

void* thread_func_1(void* arg) {
    ThreadData* data = (ThreadData*)arg;
    printf("Thread 1 writing to its own buffer...\n");
    strcpy(data->buffer, "Thread 1 was here!");

    // Malicious overwrite beyond its own buffer
    printf("Thread 1 now corrupting neighbor's buffer...\n");
    memset((char*)data->buffer + BUFFER_SIZE, 'X', 128);

    return NULL;
}

void* thread_func_2(void* arg) {
    ThreadData* data = (ThreadData*)arg;
    printf("Thread 2 sleeping...\n");
    sleep(2); // Give thread 1 time to corrupt
    printf("Thread 2 buffer content: %s\n", data->buffer);
    return NULL;
}

ThreadData t1_data, t2_data;

int main() {
    pthread_t t1, t2;

    memset(&t1_data, 0, sizeof(ThreadData));
    memset(&t2_data, 0, sizeof(ThreadData));
    strcpy(t2_data.buffer, "Thread 2's secret data.");

    pthread_create(&t1, NULL, thread_func_1, (void*)&t1_data);
    pthread_create(&t2, NULL, thread_func_2, (void*)&t2_data);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return 0;
}

In [None]:
!gcc -o sfi-violation sfi-violation.c -no-pie
!objdump -S sfi-violation

Next, run the example to collect the memory trace:

In [None]:
!pintool-3.31/pin -t pintool-3.31/source/tools/MyPinTool/obj-intel64/memtrace.so -- ./sfi-violation

To detect any SFI violation, we need to define the SFI rules (in Python), which should look like this:

In [None]:
sfi_rules = {
    # Each rule has the format "Thread ID: (CodeStart, CodeEnd, DataStart, DataEnd)"
}

Now, please use the memtrace and the defined SFI rules to print out the IP address, operation type (read, write, return, call, or jump), and target address of each SFI violation.

**Again, for simplicity, you only need to look at code addresses and data addresses between `0x400000--0x800000` (the range of the target executable). We do not have to check against addresses within the libraries.**

## Submission

Once you have finished this notebook, click "File > Download > Download as .ipynb" and upload the file to **Assignment 5** on MS Teams.

## Reference

Please cite all sources you used, if any.