Fossil is a tool for OS-agnostic data structure recovery. It is a proof of concept of the technique described in the research paper "An OS-agnostic Approach to Memory Forensics" by Andrea Olivieri, Matteo dell'Amico and Davide Balzarotti, presented at NDSS 2023.
This tool is composed of the following modules:
- `qemu_elf_dumper.py`: dumps the physical memory of a QEMU virtual machine into an ELF core file, along with additional information about the hardware of the machine (stored in JSON format inside an ELF NOTE segment)
- `extract_features.py`: scans the VM core dump for strings and kernel pointers, and performs static analysis using Ghidra
- `doubly_linked_lists.py`: matches extracted pointers to find doubly linked lists
- `trees.py`: extracts binary trees from the VM dump
- `extract_structs.py`: reorganizes the structures extracted by the other scripts and extracts arrays, linked lists, derived structures and children structures
- `fossil.py`: an interactive shell to explore the results
This tool is tested on Debian 12. Please install all the packages listed in `debian_12_packages`.
It's also possible to run a Docker container containing Fossil and its dependencies (check the Docker section below).
- Clone this repository, create a python3 virtual environment (`python3 -m venv venv`) and activate it (`source venv/bin/activate`)
- From the virtual environment, install the python3 dependencies (`python3 -m pip install -r requirements.txt`)
- Build the Cython module with `python3 setup.py build_ext --inplace`
- Allow your virtual environment to access the system-wide installed python library `graph-tools` (``echo "/usr/lib/python3/dist-packages" > `find venv -name site-packages`/dist-packages.pth``)
- Download Ghidra. This tool is tested with versions `10.1.2` and `10.3.2`. Once installed, it is recommended to set the environment variable `GHIDRA_PATH=/path/to/ghidra`
- Increase the memory available to Ghidra with `sed -i s/MAXMEM=2G/MAXMEM=8G/g $GHIDRA_PATH/support/analyzeHeadless`
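As a quick sanity check after the last two steps, the following minimal sketch reads `GHIDRA_PATH` from the environment and reports the `MAXMEM` value found in `support/analyzeHeadless`. The helper name `ghidra_maxmem` is hypothetical and not part of Fossil; the sketch only assumes the script contains a line of the form `MAXMEM=...`, as implied by the `sed` command above.

```python
import os
import re
from pathlib import Path


def ghidra_maxmem(ghidra_path):
    """Return the MAXMEM value set in support/analyzeHeadless, or None.

    Hypothetical helper: assumes the launcher script contains a line
    such as MAXMEM=8G, as suggested by the sed command in this README.
    """
    script = Path(ghidra_path) / "support" / "analyzeHeadless"
    if not script.is_file():
        return None
    # Match an uncommented assignment at the start of a line
    match = re.search(r"^\s*MAXMEM=(\S+)", script.read_text(), re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    path = os.environ.get("GHIDRA_PATH")
    if path is None:
        print("GHIDRA_PATH is not set")
    else:
        print("MAXMEM:", ghidra_maxmem(path))
```

If the reported value is still `2G`, the `sed` substitution probably did not match and the file may need to be edited by hand.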
- Create a QEMU virtual machine with your favorite OS running on an `i386`, `x86_64` or `aarch64` CPU architecture (Intel `PAE` is not tested)
- Start the VM without enabling KVM, exposing the `QMP` and `GDB` services (`qemu [...] -qmp tcp::xxxx,server -s`). These two options open a `QMP` server on `localhost:xxxx` and a `GDB` server on `localhost:1234`
- In a different terminal session, run `qemu_elf_dumper.py 127.0.0.1:xxxx 127.0.0.1:1234 /path/to/existing_output_folder`. When you want to dump the VM, press CTRL-C: the script will stop the machine, dump the physical memory and save the information to an ELF core file.
  - If you are running an Intel machine, the command to use is `qemu_elf_dumper.py 127.0.0.1:xxxx 127.0.0.1:1234 -c CPUSpecifics:MAXPHYADDR:YYY`, where `YYY` is the maximum physical address width (`MAXPHYADDR`) of the emulated CPU. To find it, run the VM as above and, once inside, run `cat /proc/cpuinfo | grep "address sizes"` in a terminal. The `MAXPHYADDR` value is equal to the `bits physical` value.
- Once the dump is finished, you can move on to data extraction. You can either follow the instructions below or use the `dry_run.py` script, which simply performs the following steps in order, using the default values provided by each script.
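The `MAXPHYADDR` lookup described above for Intel machines can be sketched in a few lines. The helper names below are hypothetical and not part of Fossil, and the sample line only illustrates the usual layout of the `address sizes` entry on a Linux guest; the actual numbers depend on the emulated CPU.

```python
import re


def max_phy_addr(cpuinfo_text):
    """Return the 'bits physical' value found in /proc/cpuinfo text, or None."""
    match = re.search(r"address sizes\s*:\s*(\d+) bits physical", cpuinfo_text)
    return int(match.group(1)) if match else None


def dumper_option(cpuinfo_text):
    """Build the value passed to -c from the guest's cpuinfo (hypothetical helper)."""
    bits = max_phy_addr(cpuinfo_text)
    if bits is None:
        raise ValueError("no 'address sizes' line found")
    return f"CPUSpecifics:MAXPHYADDR:{bits}"


# Illustrative line, as printed by: cat /proc/cpuinfo | grep "address sizes"
sample = "address sizes\t: 39 bits physical, 48 bits virtual\n"
print(dumper_option(sample))  # CPUSpecifics:MAXPHYADDR:39
```

The resulting string is what follows `-c` on the `qemu_elf_dumper.py` command line.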
- Extract pointers, strings and other metadata, and perform static analysis on the ELF core dump, using `extract_features.py /path/to/dump.elf /path/to/existing_path_to_results/`. This script produces various `extracted_xxx.lzma` compressed pickle files. This step can take a long time due to the Ghidra static analysis phase.
- Extract the doubly linked lists using `doubly_linked_lists.py --min-offset N_OFF --max-offset P_OFF --offset-step ALPHA --min-size SIZE /path/to/results/extracted_ptrs.lzma /path/to/results/dll.lzma`. This step can take a long time and consume a large amount of RAM.
  - `N_OFF`: the minimum offset (negative)
  - `P_OFF`: the maximum offset (positive)
    - These values represent the minimum and maximum offsets used when looking for next-prev relations between pointers. The higher the values, the longer the execution time. In the tests, we used `8192` for 64-bit OSs and `4096` for 32-bit ones, but for normal use it may be possible to reduce them. These values must be:
      - One the opposite of the other (i.e. `P_OFF = -N_OFF`)
      - A power of 2 (i.e. `P_OFF = 2^x`)
      - A multiple of `ALPHA` (i.e. `P_OFF mod ALPHA == 0`)
  - `ALPHA`: the pointer size used by the OS (8 for a 64-bit CPU, 4 for a 32-bit CPU)
  - `SIZE`: the minimum length of a doubly linked list. The lower the value, the greater the execution time, the RAM required and the number of results (including false positives). Check out the paper for more information.
- Extract all the binary trees with `trees.py --min-offset N_OFF --max-offset P_OFF --offset-step ALPHA /path/to/results/extracted_ptrs.lzma /path/to/results/trees.lzma`. The number of results increases dramatically as the absolute value of `P_OFF/N_OFF` grows. In the tests, we used `64` for 64-bit OSs and `32` for 32-bit OSs. The higher the value, the greater the execution time, the RAM required and the number of results (including false positives). Check out the paper for more information.
- Reorganize and filter structures using `extract_structs.py /path/to/data`. The path should contain the output of the previous scripts. This script produces the `/path/to/data/results.lzma` compressed pickle file, which can be explored using `fossil.py`.
- Explore the results with the interactive fossil shell `fossil.py /path/to/data`. Examples of fossil shell commands:
  - Look for a string in a circular doubly linked list: `find_string -cdl bash`
  - Show all the strings in the same data structure at a fixed offset: `expand_struct -cdl 103 720`
  - Perform a zero-knowledge search in circular doubly linked lists: `zero -cdl`
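Since the list extraction step is long and memory-hungry, it can be worth checking the offset values before launching it. The sketch below encodes the three constraints listed for `N_OFF`, `P_OFF` and `ALPHA`; the function `check_offsets` is hypothetical and not part of Fossil, and restricting `ALPHA` to 4 or 8 is an assumption based on the pointer sizes mentioned above.

```python
def check_offsets(n_off, p_off, alpha):
    """Return a list of violated constraints (empty if the values are valid).

    Hypothetical helper mirroring the rules stated in this README.
    """
    problems = []
    if n_off >= 0 or p_off <= 0:
        problems.append("N_OFF must be negative and P_OFF positive")
    if p_off != -n_off:
        problems.append("P_OFF must equal -N_OFF")
    # A positive integer is a power of 2 iff exactly one bit is set
    if p_off <= 0 or (p_off & (p_off - 1)) != 0:
        problems.append("P_OFF must be a power of 2")
    # Assumption: only 32-bit (4) and 64-bit (8) pointer sizes are used
    if alpha not in (4, 8):
        problems.append("ALPHA must be 4 (32-bit) or 8 (64-bit)")
    elif p_off % alpha != 0:
        problems.append("P_OFF must be a multiple of ALPHA")
    return problems


# Values used in the tests for a 64-bit OS
print(check_offsets(-8192, 8192, 8))  # []
```

An empty list means the values satisfy all the stated rules; the same check applies to the smaller offsets suggested for `trees.py`.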
It is possible to build and use a Docker/Podman container including the entire fossil suite. The following commands refer to `podman`, but should be exactly the same with `docker`.
- Clone this repository, enter the repository directory and run `podman build -t fossil .` to build the container.
- Inside the container, fossil is located at `/fossil` and requires a volume bound to `/data`, which will contain the input and output files. The container installs and uses Ghidra 10.3.2. If you want to use a different Ghidra version, put it in the `/data` volume on the host and add the option `-e GHIDRA_PATH=/data/GHIDRA_INSTALL_DIR`.
- To call a fossil script, run `podman run --network="host" --rm -it --volume HOST_PATH_TO_DATA:/data:Z localhost/fossil:latest /fossil/COMMAND [options]`
  - Example: to extract doubly linked lists with the data located in `/path/to/dumps` on the host, execute `podman run --network="host" --rm -it --volume /path/to/dumps:/data:Z localhost/fossil:latest /fossil/doubly_linked_lists.py --min-offset -8192 --max-offset 8192 --offset-step 8 --min-size 3 /data/extracted_ptrs.lzma /data/dll.lzma`
- To run `qemu_elf_dumper.py`, run `qemu` on the host machine and call `qemu_elf_dumper.py` inside the container with an extra option: if the bound host path is `HOST_PATH_TO_DATA`, add the `-d HOST_PATH_TO_DATA` option to the command line
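When several scripts have to be run through the container, the invocation above can be assembled programmatically. The following minimal sketch builds the argument list for a given script; `fossil_podman_command` is a hypothetical helper, not part of Fossil, and it uses `-it`, the short form of `--interactive --tty`.

```python
import shlex


def fossil_podman_command(host_data, script, script_args, runtime="podman"):
    """Return the argv list for running a Fossil script inside the container.

    Hypothetical helper: host_data is the host directory bound to /data,
    script is a file name under /fossil, runtime is "podman" or "docker".
    """
    return [
        runtime, "run", "--network=host", "--rm", "-it",
        "--volume", f"{host_data}:/data:Z",
        "localhost/fossil:latest",
        f"/fossil/{script}",
        *script_args,
    ]


cmd = fossil_podman_command(
    "/path/to/dumps",
    "doubly_linked_lists.py",
    ["--min-offset", "-8192", "--max-offset", "8192",
     "--offset-step", "8", "--min-size", "3",
     "/data/extracted_ptrs.lzma", "/data/dll.lzma"],
)
# Print a shell-quoted version that can be pasted into a terminal
print(shlex.join(cmd))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting issues when the host path contains spaces.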