TypeForge: Synthesizing and Selecting Best-Fit Composite Data Types for Stripped Binaries (S&P 2025)

TypeForge: Synthesizing and Selecting Best-Fit Composite Data Types for Stripped Binaries

This is the implementation of the paper titled "TypeForge: Synthesizing and Selecting Best-Fit Composite Data Types for Stripped Binaries". For more details about TypeForge, please refer to our S&P 2025 paper.

We are continuously maintaining and updating this project, aiming to provide more user-friendly features and higher efficiency.

What is TypeForge?

TypeForge aims to recover composite data types (such as structures and unions) in stripped binaries. As evaluated in our S&P 2025 paper, TypeForge achieves higher efficiency and accuracy than existing methods.

  • TypeForge works in two phases: a Program Analysis phase (Phase 1) and an LLM-assisted Refinement phase (Phase 2). Phase 1 alone is sufficient for common reverse engineering tasks, while Phase 2 further improves the accuracy of the Phase 1 results.
  • TypeForge is currently implemented as a Ghidra Extension. We welcome other developers to port it to platforms such as IDA Pro, Binary Ninja, and angr.
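To make "recovering a composite data type" concrete, here is a toy sketch. It is not TypeForge's algorithm, whose data flow analysis, type-synthesis passes, and LLM-assisted selection are far more involved. The sketch only shows the end product: if a function accesses `*(base+0)` as 8 bytes, `*(base+8)` as 4 bytes and `*(base+16)` as 8 bytes, a plausible struct layout can be written down from those offsets. `FieldAccess`, `synthesize_struct` and the size-to-type table are hypothetical names, and an LP64 target is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldAccess:
    """A memory access *(base + offset) of `size` bytes through one base pointer."""
    offset: int
    size: int


# Hypothetical size -> C type table, assuming an LP64 target (long is 8 bytes).
C_TYPES = {1: "char", 2: "short", 4: "int", 8: "long"}


def synthesize_struct(name: str, accesses: list[FieldAccess]) -> str:
    """Emit a C struct declaration whose fields cover the observed accesses.

    Gaps between fields become explicit padding arrays. Overlapping accesses
    (a hint that a union or nested struct may be involved) are rejected to
    keep this toy example simple.
    """
    cursor = 0  # first byte offset not yet covered by a field
    lines = [f"struct {name} {{"]
    for acc in sorted(set(accesses), key=lambda a: (a.offset, a.size)):
        if acc.offset < cursor:
            raise ValueError(f"overlapping access at offset {acc.offset:#x}")
        if acc.offset > cursor:
            lines.append(f"    char pad_{cursor:#x}[{acc.offset - cursor}];")
        if acc.size in C_TYPES:
            lines.append(f"    {C_TYPES[acc.size]} field_{acc.offset:#x};")
        else:
            # Unusual sizes become raw byte arrays.
            lines.append(f"    char field_{acc.offset:#x}[{acc.size}];")
        cursor = acc.offset + acc.size
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    observed = [FieldAccess(0, 8), FieldAccess(8, 4), FieldAccess(16, 8)]
    print(synthesize_struct("node", observed))
```

Running it prints a struct with three fields and a 4-byte padding array at offset 0xc. Deciding between alternatives such as a union, a nested struct, or an array is exactly where real type recovery gets hard, and this sketch deliberately skips that.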

Recent Updates

📚 Enhanced Documentation - [07/10/2025]:

  • DeepWiki Integration: We have indexed this repository on DeepWiki 🚀 for better code navigation and understanding.

🚀 Latest Version - [06/21/2025]:

  • Bug Fixes: Fixed a bug in determining whether the TypeConstraint is empty.
  • Accuracy Improvements: Enhanced type inference precision, particularly for composite data type identification.

Project Structure

typeforge/                             # Project root
├── ...
├── build.gradle                       # Gradle build configuration
├── extension.properties               # Extension properties
├── src/                               # Main source code of TypeForge
│   ├── main/java/typeforge            
│   │   ├── analyzer/                  # Entry functions for various program analyses
│   │   ├── base/                      # Underlying components for program analysis algorithms
│   │   │   ├── dataflow/              # Data flow analysis (including data flow abstractions, intra/inter-procedural Solvers)
│   │   │   ├── graph/                 # CallGraph
│   │   │   ├── node/                  # Binary functions and CallSites
│   │   │   ├── parallel/              # Parallel processing Callbacks
│   │   │   └── passes/                # Passes used for synthesizing possible type declarations
│   │   └── utils/                     # Other useful functions for binary analysis
│   └── test/
├── ghidra_scripts/                    # Ghidra scripts
│   ├── TypeForge.java                 # Main TypeForge script
│   └── GroundTruth.java               # Ground truth extractor (from binaries with debug symbols)
├── scripts/                           # Useful Python Scripts
│   ├── judge/                         # LLM-assisted double elimination process
│   ├── GraphExplorer.py               # (For debugging) Explore the dumped Type Flow Graph
│   ├── GroundTruthExtractor.py        # Ground truth extractor (wrapper that calls GroundTruth.java)
│   └── TypeInference.py               # Type Inference (wrapper that calls TypeForge.java)
├── lib/
└── ...

Building and Installing

Building as a Ghidra Extension

  1. Clone this repo:

    git clone https://github.com/noobone123/TypeForge.git
  2. Install a JDK and Ghidra (tested with Ghidra 11.0.3).
    Download Ghidra from its official release page and follow the Ghidra installation instructions.

  3. In build.gradle, set ghidraInstallDir to YOUR Ghidra installation directory.

  4. Build the Ghidra extension:

    cd TypeForge
    gradle buildExtension
    # After building, the extension zip file is placed in ./dist:
    ls -alh ./dist/ghidra_11.0.3_PUBLIC_[your-build-time]_TypeForge.zip
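Because the [your-build-time] part of the file name changes with every build, an install script may need to locate the zip first. Here is a minimal sketch that assumes the file name pattern used by the `ls` command above. `find_latest_extension_zip` is a hypothetical helper and is not part of the repository.

```python
import glob
import os


def find_latest_extension_zip(dist_dir: str = "dist") -> str:
    """Return the most recently modified TypeForge extension zip in dist_dir.

    Assumes the naming pattern ghidra_<version>_PUBLIC_<build-time>_TypeForge.zip.
    The build-time part varies, so we glob for it and keep the newest file.
    """
    pattern = os.path.join(dist_dir, "ghidra_*_PUBLIC_*_TypeForge.zip")
    candidates = glob.glob(pattern)
    if not candidates:
        raise FileNotFoundError(f"no TypeForge extension zip matches {pattern}")
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    print(find_latest_extension_zip())
```

Picking the newest file by modification time avoids having to parse the build-time string.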

Installing

Use the following commands to copy the compiled Ghidra Extension into your Ghidra installation and unzip it:

cp ./dist/ghidra_11.0.3_PUBLIC_[your-build-time]_TypeForge.zip \
    [YOUR-Ghidra-Installation-Directory]/Ghidra/Extensions
cd [YOUR-Ghidra-Installation-Directory]/Ghidra/Extensions
unzip ghidra_11.0.3_PUBLIC_[your-build-time]_TypeForge.zip
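The copy-and-unzip steps can also be done with Python's standard library, for example on a system without `unzip`. Here is a minimal sketch. `install_extension` is a hypothetical helper, and it assumes the `Ghidra/Extensions` directory already exists. If that directory is missing, the helper raises an error instead of creating it, which catches a mistyped installation path.

```python
from __future__ import annotations

import os
import shutil
import zipfile


def install_extension(zip_path: str, ghidra_install_dir: str) -> list[str]:
    """Copy the extension zip into <ghidra>/Ghidra/Extensions and unzip it there.

    Mirrors the cp + unzip commands above and returns the top-level
    entries extracted from the archive.
    """
    ext_dir = os.path.join(ghidra_install_dir, "Ghidra", "Extensions")
    if not os.path.isdir(ext_dir):
        raise FileNotFoundError(f"not a Ghidra installation: {ext_dir} is missing")
    dest_zip = shutil.copy(zip_path, ext_dir)  # returns the path of the copy
    with zipfile.ZipFile(dest_zip) as zf:
        zf.extractall(ext_dir)
        # Zip member names always use "/" as the separator.
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})
```

`extractall` trusts the member paths stored in the archive. That is acceptable here only because the zip is one you built yourself.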

Getting Started

Type Inference (Headless Mode)

After installing TypeForge, run the following command to analyze a single stripped binary:

[YOUR-Ghidra-Installation-Directory]/support/analyzeHeadless \
    [YOUR-Ghidra-Project-Directory] [YOUR-Project-Name] \
    -deleteProject -import [YOUR-Stripped-Binary] \
    -postScript TypeForge.java output=[Your-output-dir]
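The same command can be assembled programmatically, which is useful when scripting runs. Here is a minimal sketch that assumes a POSIX Ghidra install; on Windows the launcher is `analyzeHeadless.bat`. `headless_cmd` and `run_type_inference` are hypothetical helper names, not functions from TypeForge's own scripts.

```python
from __future__ import annotations

import os
import subprocess


def headless_cmd(ghidra_dir: str, project_dir: str, project_name: str,
                 binary: str, output_dir: str) -> list[str]:
    """Return the analyzeHeadless argv equivalent to the shell command above."""
    return [
        os.path.join(ghidra_dir, "support", "analyzeHeadless"),
        project_dir,
        project_name,
        "-deleteProject",                 # delete the Ghidra project when processing completes
        "-import", binary,                # the stripped binary to analyze
        "-postScript", "TypeForge.java",  # run TypeForge after auto-analysis
        f"output={output_dir}",           # script argument: where the JSON results go
    ]


def run_type_inference(ghidra_dir: str, project_dir: str, project_name: str,
                       binary: str, output_dir: str) -> None:
    """Run TypeForge headlessly on one binary; raises CalledProcessError on failure."""
    # Create directories up front (harmless if they already exist).
    os.makedirs(project_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)
    cmd = headless_cmd(ghidra_dir, project_dir, project_name, binary, output_dir)
    subprocess.run(cmd, check=True)
```

Passing an argument list instead of a shell string avoids quoting problems when paths contain spaces.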

When the analysis finishes, the Type Inference results (JSON files) are saved in [Your-output-dir]. For details about these JSON files, please refer to the demo. These JSON files are then fed into Phase 2 for refinement; for more information, see the LLM-assisted judge scripts in scripts/judge/.
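To inspect the results programmatically before Phase 2, you can load the JSON files generically. This sketch makes no assumption about their schema, which is described in the demo. It only assumes the files are somewhere under [Your-output-dir]. `load_inference_results` is a hypothetical helper.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path


def load_inference_results(output_dir: str) -> dict[str, object]:
    """Parse every *.json file under output_dir, keyed by its relative POSIX path."""
    base = Path(output_dir)
    results: dict[str, object] = {}
    for path in sorted(base.rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            results[path.relative_to(base).as_posix()] = json.load(f)
    return results


if __name__ == "__main__":
    # Usage: python load_results.py [Your-output-dir]
    for name, data in load_inference_results(sys.argv[1]).items():
        print(f"{name}: top-level {type(data).__name__}")
```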

For batch processing, please refer to the Python scripts in scripts/.

We are currently developing additional features to directly import Type Inference results into Ghidra projects.

For more information about Ghidra's Headless Mode, please refer to Ghidra's headless analyzer documentation.

Extract the Ground Truth

You can also extract the ground truth of composite data types from a binary compiled with debug information. Note that Ghidra currently does NOT support DWARF 5 debug information, so you need to pass -gdwarf-4 when compiling. For more details, please refer to scripts/ (GroundTruthExtractor.py).
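A typical workflow builds one debug binary for ground-truth extraction and a stripped copy of the same binary for type inference. Here is a minimal sketch that only assembles the commands. It assumes GCC (or a compatible compiler) and GNU binutils `strip`. `ground_truth_build_cmds` is a hypothetical helper, and any optimization flags you want to evaluate are left for you to add.

```python
from __future__ import annotations

import shlex


def ground_truth_build_cmds(source: str, debug_bin: str, stripped_bin: str,
                            cc: str = "gcc") -> list[list[str]]:
    """Commands for a DWARF-4 debug build plus a stripped copy of the same binary.

    The debug build feeds ground-truth extraction; the stripped copy is the
    input for type inference.
    """
    return [
        # -gdwarf-4 emits DWARF version 4, per the note above about DWARF 5.
        [cc, "-g", "-gdwarf-4", "-o", debug_bin, source],
        # GNU strip: write a fully stripped copy instead of modifying debug_bin in place.
        ["strip", "--strip-all", "-o", stripped_bin, debug_bin],
    ]


if __name__ == "__main__":
    for cmd in ground_truth_build_cmds("demo.c", "demo.debug", "demo.stripped"):
        print(shlex.join(cmd))  # copy-pasteable shell form
```

Building both files from the same compilation keeps code addresses identical, so the ground-truth types can be matched against the inference results.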

Run in Ghidra GUI Mode

In development ...

Developing and Debugging

TypeForge is developed using IntelliJ IDEA (version 2024.1.7) and the intellij-ghidra plugin. For detailed development guidelines, please refer to How To Develop.

Contributors

TypeForge is written and maintained by:

Cite

If you use TypeForge for your academic work, please cite the following paper:

@inproceedings{typeforge,
  title      = {TypeForge: Synthesizing and Selecting Best-Fit Composite Data Types for Stripped Binaries},
  author     = {Wang, Yanzhong and Liang, Ruigang and Li, Yilin and Hu, Peiwei and Chen, Kai and Zhang, Bolun},
  booktitle  = {2025 IEEE Symposium on Security and Privacy (SP)},
  pages      = {2847--2864},
  year       = {2025},
  publisher  = {IEEE Computer Society},
  doi        = {10.1109/SP61157.2025.00193},
}
