GitHub - feicong/tiny-dec: A tiny educational decompiler that helps people understand how decompilation works.

Learn decompilation by building one from scratch.
_{19 stages. Every step inspectable. From raw RISC-V bytes to readable C.

RV32I: the simplest real ISA, so the focus stays on decompilation, not instruction encoding.}

Note

This is a teaching tool, not a production decompiler. The pipeline is deliberately factored into many small, isolated stages so that each concept can be studied independently. A real-world decompiler (Ghidra, IDA, angr, Binary Ninja) would fuse, reorder, or iterate these passes for performance and precision. We trade that efficiency for clarity: if you can follow each stage here, you will understand what the production tools are doing under the hood.

See It In Action

Raw RISC-V instructions go in. Readable C comes out:

Assembly (RV32I)

Decompiled Output

; parse_record
addi x2, x2, -32
sw   x1, 28(x2)
sw   x8, 24(x2)
addi x8, x2, 32
sw   x10, -12(x8)
sw   x11, -16(x8)
addi x10, x0, 0
sw   x10, -20(x8)
sw   x10, -24(x8)
jal  x0, 0x11154
lw   x10, -24(x8)
lw   x11, -16(x8)
bge  x10, x11, 0x111b8
lw   x10, -12(x8)
lw   x11, -24(x8)
slli x11, x11, 3
add  x10, x10, x11
lw   x11, 0(x10)
lw   x10, -20(x8)
add  x10, x10, x11
sw   x10, -20(x8)
lw   x10, -12(x8)
lw   x11, -24(x8)
slli x11, x11, 3
add  x10, x10, x11
lw   x11, 4(x10)
lw   x10, -20(x8)
add  x10, x10, x11
sw   x10, -20(x8)
lw   x10, -24(x8)
addi x10, x10, 1
sw   x10, -24(x8)
jal  x0, 0x11154
lw   x10, -20(x8)
lw   x1, 28(x2)
lw   x8, 24(x2)
addi x2, x2, 32
jalr x0, 0(x1)

#include <stdint.h>

typedef struct agg_8 {
  int32_t field_0;
  int32_t field_4;
} agg_8;

static uint32_t main(void) {
  int32_t local_24_4;
  int32_t local_20_4;
  int32_t local_16_4;
  int32_t local_12_4;
  uint32_t call_ret;

  local_12_4 = 20;
  local_16_4 = 2;
  local_20_4 = 10;
  local_24_4 = 1;
  call_ret = parse_record(&local_24_4, 2);
  return call_ret;
}

static uint32_t parse_record(
    agg_8* arg_x10_4,
    int32_t arg_x11_4) {
  int32_t local_24_4;
  int32_t local_20_4;

  local_20_4 = 0;
  local_24_4 = 0;
  while (local_24_4 <s arg_x11_4) {
    local_20_4 = local_20_4
        + arg_x10_4[local_24_4].field_0;
    local_20_4 = local_20_4
        + arg_x10_4[local_24_4].field_4;
    local_24_4 = local_24_4 + 1;
  }
  return local_20_4;
}

Recovered: struct layout, pointer parameter, while loop, array indexing with field access, and local variables.

It handles more than loops and structs:

// Switch recovery from equality-ladder CFG patterns
static int32_t dispatch(uint32_t arg_x10_4, uint32_t arg_x11_4) {
  uint32_t local_24_4;
  uint32_t local_12_4;

  local_24_4 = arg_x10_4;
  switch (arg_x10_4) {
  case 0:
    local_12_4 = arg_x11_4 + 1;
    break;
  case 1:
    local_12_4 = arg_x11_4 + 4;
    break;
  case 2:
    local_12_4 = arg_x11_4 << 1;
    break;
  case 3:
    local_12_4 = arg_x11_4 - 3;
    break;
  default:
    local_12_4 = -1;
    break;
  }
  return local_12_4;
}

// Interprocedural analysis with prototype inference
static int32_t main(void) {
  int32_t local_16_4;
  int32_t local_12_4;

  local_12_4 = 7;
  local_16_4 = helper(local_12_4);
  return local_16_4 - 2;
}

static int32_t helper(int32_t arg_x10_4) {
  return (arg_x10_4 << 1) + arg_x10_4 + 1;
}

Quick Start

Install

# System dependencies (Ubuntu/Debian)
sudo apt-get install -y clang lld llvm binutils build-essential python3 python3-pip

# Python dependencies
pip install poetry
poetry install

# Build RV32I fixture binaries for tests
./scripts/build_fixtures.sh

Decompile

# Full decompilation to C (default)
tiny-dec decompile ./binary.elf

# Target a specific function
tiny-dec decompile ./binary.elf --func parse_record

# Stop at any intermediate stage and look inside
tiny-dec decompile ./binary.elf --stage ssa --func main

# Inspect binary metadata
tiny-dec info ./binary.elf

Every stage is a window into the pipeline:

loader → decode → pcode → disasm → ir → simplify → dataflow → ssa
→ calls → stack → memory → scalar_types → aggregate_types → variables
→ range → interproc → structuring → c_lowering → c

Test

poetry run pytest            # all 500 tests
poetry run ruff check .      # lint
poetry run mypy tiny_dec     # type check

The Pipeline

19 stages, three phases. Each stage produces a typed, deterministic artifact you can inspect with --stage. The stages are split for learning purposes; a production decompiler would combine several of these into fewer, more tightly coupled passes.

Phase 1: Frontend (Bytes to IR)

#	Stage	What it does
00	Loader	Parse ELF structure, resolve symbols, find `main` (via pwntools, since binary parsing is not our focus)
01	Decode	Decode RV32I instruction words into structured objects
02	P-code Lift	Lift each instruction into semantic p-code operations
03	Disassembly	Recursive traversal to build basic blocks and CFG
04	IR Containers	Wrap functions and programs into durable typed containers

Phase 2: Analysis

#	Stage	What it does
05	Simplify	Constant folding, identity elimination, temporary forwarding
06	Dataflow	Forward intraprocedural constant propagation
07	SSA	Dominance-frontier SSA with memory versioning
08	Calls	Call classification, ABI argument/return snapshots
09	Stack	Frame layout recovery, stack slot identification
10	Memory	Partition recovery for stack, global, and pointer accesses
11	Scalar Types	Type inference: `bool`, `int`, `pointer`, `word`
12	Aggregate Types	Struct layout recovery from pointer arithmetic patterns
13	Variables	Group SSA values and memory slots into named variables
14	Range	Interval analysis and branch predicate refinement
15	Interproc	Cross-function prototype inference and summaries

Phase 3: Backend (Structure and Emit)

#	Stage	What it does
16	Structuring	Recover `if` / `while` / `switch` from CFG topology
17	C Lowering	Lower to C-like IR with typed declarations and expressions
18	C Render	Emit readable C with precedence-aware formatting

Project Layout

tiny_dec/
├── loader/            ELF loading and symbol resolution
├── decode/            RV32I instruction decoder
├── ir/                P-code semantics and IR containers
├── disasm/            Recursive disassembler and CFG builder
├── analysis/
│   ├── simplify/      Canonical IR cleanup
│   ├── dataflow/      Constant propagation
│   ├── ssa/           SSA construction
│   ├── calls/         Call modeling and ABI facts
│   ├── stack/         Stack frame recovery
│   ├── memory/        Memory partitioning
│   ├── types/         Scalar and aggregate type recovery
│   ├── highvars/      Variable recovery
│   ├── range/         Range and predicate analysis
│   └── interproc/     Interprocedural summaries
├── structuring/       Control-flow structuring
├── c_emit/            C lowering and rendering
├── pipeline/          End-to-end decompilation driver
└── cli.py             Command-line interface

tests/
├── fixtures/
│   ├── src/           13 C programs (loops, structs, calls, switches, ...)
│   └── bin/           39 cross-compiled RV32I ELF binaries
└── posts/
    └── post_00/ ... post_18/    One test suite per pipeline stage

Why Is It Built This Way?

Explicit over clever. Every intermediate representation is a concrete, typed Python dataclass. No hidden state, no magic.

Deterministic output. Every stage has a text renderer that produces identical output given identical input. This is both the debugging surface and the test oracle.

One stage, one job. Each stage solves exactly one problem and passes a well-defined artifact to the next. You can stop the pipeline at any point and inspect what it produced.

Conservative analysis. The pipeline never invents information. When evidence is ambiguous, it preserves the raw form rather than guessing. Residual raw<...> expressions in the output are honest about what the analysis couldn't resolve.

This is a teaching tool, not a production decompiler. The pipeline is deliberately factored into many small, isolated stages so that each concept can be studied independently. A real-world decompiler (Ghidra, IDA, angr, Binary Ninja) would fuse, reorder, or iterate these passes for performance and precision. We trade that efficiency for clarity: if you can follow each stage here, you will understand what the production tools are doing under the hood.

Fixture Binaries

13 C programs compiled to RV32I ELF at three optimization levels (39 binaries total, with strict no-compressed-instruction verification):

Variant	Flags	Purpose
`*_O0_nopie`	`-O0 -fno-pie`	Unoptimized, predictable stack layout
`*_O2_nopie`	`-O2 -fno-pie`	Optimized, tests register allocation recovery
`*_O2_pie`	`-O2 -fpie`	Position-independent, tests GOT/PLT handling

Rebuild with ./scripts/build_fixtures.sh (requires clang, lld, llvm with RISC-V support).

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.devcontainer		.devcontainer
.github/workflows		.github/workflows
assets		assets
docker		docker
scripts		scripts
stubs/pwn		stubs/pwn
tests		tests
tiny_dec		tiny_dec
.gitignore		.gitignore
DEVELOPMENT_NOTES.md		DEVELOPMENT_NOTES.md
LICENSE		LICENSE
README.md		README.md
development.md		development.md
implementation.md		implementation.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

See It In Action

Quick Start

Install

Decompile

Test

The Pipeline

Phase 1: Frontend (Bytes to IR)

Phase 2: Analysis

Phase 3: Backend (Structure and Emit)

Project Layout

Why Is It Built This Way?

Fixture Binaries

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

See It In Action

Quick Start

Install

Decompile

Test

The Pipeline

Phase 1: Frontend (Bytes to IR)

Phase 2: Analysis

Phase 3: Backend (Structure and Emit)

Project Layout

Why Is It Built This Way?

Fixture Binaries

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages