hoverboardhavoc/blobmill

blobmill

blobmill turns a permissively-licensed binary-blob upstream into a GitHub repository of per-function decompiled C source.

Given an upstream such as espressif/esp-phy-lib, the production (actions) driver:

  1. Forks the upstream on GitHub under the name <basename>-decompiled (e.g. esp-phy-lib-decompiled). The fork's "forked from …" banner signals provenance and non-affiliation, and the -decompiled suffix keeps the names distinct.
  2. Preserves the upstream's default branch (master or main) on the fork as an untouched mirror — blobmill never writes to it.
  3. Seeds an orphan decompiled branch with a single parentless commit carrying only the scaffold (no .c), and makes it the fork's default branch.
  4. Enables Actions and dispatches the bundled process.yml workflow on the decompiled branch. GitHub's runner then decompiles the blobs and commits one .c file per function onto decompiled — one commit per upstream blob change, authored on the upstream commit's date and committed by blobmaster.

The committed source is generated on GitHub Actions, not pushed from the machine that runs blobmill. The result is not a git mirror of the upstream binaries; it is a decompiled view, on a parentless branch, that tracks the blob's evolution at the function level.
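
The GitHub-side steps above can be sketched as a short list of gh invocations. This is a minimal sketch, not blobmill's actual code: the helper name plan_actions_deploy is hypothetical, it assumes the fork lands under the authenticated user's account (passed in as owner), and it uses "gh workflow enable" as a stand-in for however blobmill enables Actions. Seeding the orphan branch is ordinary git work and is left out.

```python
def plan_actions_deploy(upstream, owner, output_name=None, branch="decompiled"):
    """Return the gh commands (as argv lists) for an actions-driver deploy.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    basename = upstream.split("/", 1)[1]
    fork_name = output_name or basename + "-decompiled"
    fork = owner + "/" + fork_name
    return [
        # Step 1: fork under a distinct name, without cloning locally.
        ["gh", "repo", "fork", upstream, "--fork-name", fork_name, "--clone=false"],
        # Step 3, second half: once the orphan branch is pushed, make it the default.
        ["gh", "repo", "edit", fork, "--default-branch", branch],
        # Step 4: enable the workflow (assumed stand-in), then dispatch it on the branch.
        ["gh", "workflow", "enable", "process.yml", "--repo", fork],
        ["gh", "workflow", "run", "process.yml", "--repo", fork, "--ref", branch],
    ]
```

Returning argv lists instead of running them keeps the plan inspectable, which is also roughly what a dry run would want to print.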

Requirements

  • Python 3 (standard library only — no third-party packages).
  • The GitHub CLI (gh), authenticated, for forking, pushing the orphan branch, and dispatching the workflow.

The actions driver needs nothing else locally — Ghidra runs on GitHub's runner. The local driver additionally needs, on the machine running blobmill:

  • Ghidra 11.3.2 or later, with GHIDRA_HOME pointing at the install root. The bundled Jython extension must be installed — the decompile postScript (DecompileDump.py) is a Jython 2.7 script, and headless Ghidra aborts it otherwise. Install once from the Ghidra GUI (File → Install Extensions) or by unzipping $GHIDRA_HOME/Extensions/Ghidra/*_Jython.zip into your per-user Ghidra Extensions/ directory.
  • A JDK (Ghidra 11.3.2/12.x: JDK 21) reachable via JAVA_HOME or on PATH.
  • binutils (ar, readelf) on PATH.
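
A preflight for the local driver could check these requirements before any work starts. The following is a sketch under stated assumptions: the function name missing_local_prereqs is hypothetical, it only looks for Ghidra's headless launcher at support/analyzeHeadless under GHIDRA_HOME, and it does not verify the Ghidra version, the JDK version, or the Jython extension.

```python
import os
import shutil


def missing_local_prereqs(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the basic tools were found."""
    env = os.environ if env is None else env
    problems = []
    # gh is needed by both drivers; ar and readelf come from binutils.
    for tool in ("gh", "ar", "readelf"):
        if which(tool) is None:
            problems.append(tool + " not found on PATH")
    ghidra_home = env.get("GHIDRA_HOME")
    if not ghidra_home:
        problems.append("GHIDRA_HOME is not set")
    elif not os.path.isfile(os.path.join(ghidra_home, "support", "analyzeHeadless")):
        problems.append("no support/analyzeHeadless under GHIDRA_HOME")
    # A JDK is acceptable via JAVA_HOME or via java on PATH.
    if not env.get("JAVA_HOME") and which("java") is None:
        problems.append("no JDK: JAVA_HOME unset and java not on PATH")
    return problems
```

The env and which parameters exist only so the check can be exercised without touching the real environment.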

Usage

python3 blobmill.py --upstream-repo OWNER/REPO [options]

For example, a local pre-check followed by a production deploy:

python3 blobmill.py --upstream-repo espressif/esp-phy-lib                 # local pre-check (default)
python3 blobmill.py --upstream-repo espressif/esp-phy-lib --driver actions # fork + dispatch on GitHub

Common options:

option                     meaning
--driver local (default)   throwaway pre-check: decompile locally with Ghidra and run the no-churn check, then dispose. No fork, no push, no GitHub writes.
--driver actions           production: fork, seed the orphan decompiled branch, set it default, enable Actions, and dispatch process.yml so GitHub generates the .c.
--dry-run                  preflight and render templates only; no Ghidra, no fork, no push.
--reset                    re-seed an existing fork's decompiled branch from the scaffold. Never deletes the repository and never touches the mirror branch.
--output-name NAME         fork / repo name (default <basename>-decompiled).
--branch NAME              orphan branch carrying the decompiled source (default decompiled); becomes the fork's default branch.
--blob-glob GLOB           glob selecting the blob archives under the upstream root.
--description TEXT         optional repository description set on the fork.
--ghidra-processor SPEC    force a Ghidra processor instead of auto-detecting per blob.
--license-override         proceed past the license gate (see below).
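
For readers who want to see how these options fit together, here is an argparse sketch of the command line. It is reconstructed from the table alone and is not blobmill.py's parser: build_parser and resolve_output_name are hypothetical names, and treating --dry-run, --reset and --license-override as valueless flags is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="blobmill.py")
    parser.add_argument("--upstream-repo", required=True, metavar="OWNER/REPO")
    parser.add_argument("--driver", choices=("local", "actions"), default="local")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--reset", action="store_true")
    parser.add_argument("--output-name", metavar="NAME")
    parser.add_argument("--branch", default="decompiled", metavar="NAME")
    parser.add_argument("--blob-glob", metavar="GLOB")
    parser.add_argument("--description", metavar="TEXT")
    parser.add_argument("--ghidra-processor", metavar="SPEC")
    parser.add_argument("--license-override", action="store_true")
    return parser


def resolve_output_name(args):
    """Apply the documented default: <basename>-decompiled."""
    if args.output_name:
        return args.output_name
    return args.upstream_repo.split("/", 1)[1] + "-decompiled"
```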

License gate

Before forking, blobmill checks the upstream's SPDX license via the GitHub API and refuses to proceed unless it is on the permissive allowlist: Apache-2.0, MIT, BSD-2-Clause, BSD-3-Clause, ISC, 0BSD, Unlicense, CC0-1.0. Anything else exits non-zero unless --license-override is given.
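
The gate itself reduces to a set-membership test. This is a minimal sketch, assuming the SPDX identifier has already been read from the license.spdx_id field of GitHub's repository API response; the names PERMISSIVE_SPDX and license_gate are hypothetical, and the exact-match comparison is an assumption.

```python
PERMISSIVE_SPDX = frozenset({
    "Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause",
    "ISC", "0BSD", "Unlicense", "CC0-1.0",
})


def license_gate(spdx_id, override=False):
    """Return True when the deploy may proceed.

    spdx_id may be None (no license detected) or "NOASSERTION"
    (license file present but unrecognised); both are rejected
    unless the override is set.
    """
    return bool(override) or spdx_id in PERMISSIVE_SPDX
```

A caller would exit non-zero when license_gate returns False, matching the behaviour described above.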

The no-churn check

The local pre-check decompiles the same upstream commit twice into a transient working repo and a sibling local bare repo (nothing leaves the machine). A correct, deterministic pipeline reproduces byte-identical output, so the second pass must rewrite zero .c files. If it rewrites any, Ghidra is behaving non-deterministically or a header is leaking into a function body — the pipeline is unstable and the target should not be deployed. Everything the pre-check creates is deleted before it returns.
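
One way to picture the check: hash every .c file after the first pass, hash again after the second, and require that nothing differs. The sketch below uses content hashes as a stand-in for whatever blobmill actually inspects (it may well use git's own view of the working repo); snapshot and churned are hypothetical names.

```python
import hashlib
import os


def snapshot(root):
    """Map each .c file's path (relative to root) to the SHA-256 of its bytes."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".c"):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as fh:
                    digests[os.path.relpath(path, root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def churned(before, after):
    """Return the sorted paths that were added, removed, or changed between passes."""
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))
```

A stable pipeline gives churned(first_pass, second_pass) == []; any other result names the files worth diffing.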

How decompilation works

On GitHub's runner (or locally for the pre-check), for each upstream commit that touches a matched blob (oldest first):

  • the blob's processor is detected from its ELF header (readelf -h); RISC-V blobs are decompiled, while architectures with no bundled Ghidra processor are skipped and logged;
  • each .a archive is split into .o members with ar x;
  • Ghidra headless runs scripts/DecompileDump.py on each member, emitting one .c per function;
  • a function's file is rewritten only when its decompiled body changes, so its header keeps pointing at the upstream commit where it last actually changed.
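
The last rule, rewriting a file only when its body changes, can be sketched as follows. The header format is an assumption made for illustration: the sketch treats the leading run of "//" comment lines as the header and everything after it as the body, so a body that itself begins with a "//" line would be misread. split_header and write_if_body_changed are hypothetical names.

```python
import os


def split_header(text):
    """Split generated source into (header, body); the header is the leading '//' lines."""
    lines = text.splitlines(keepends=True)
    i = 0
    while i < len(lines) and lines[i].startswith("//"):
        i += 1
    return "".join(lines[:i]), "".join(lines[i:])


def write_if_body_changed(path, header, body):
    """Write header + body only when the body differs from what is on disk.

    Returns True if the file was written, False if it was left untouched
    (in which case the old header, and its upstream commit, is preserved).
    """
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            _old_header, old_body = split_header(fh.read())
        if old_body == body:
            return False
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(header + body)
    return True
```

Comparing bodies rather than whole files is what keeps a new upstream commit from touching every function's header.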

The workflow carries a daily cron trigger (schedule:), but GitHub heavily throttles scheduled runs on forks and auto-disables them after 60 days without repository activity, so cron is only a best-effort bonus. Dependable updates are run via workflow_dispatch; the initial generation is dispatched the same way.

Layout

blobmill.py            the orchestrator
templates/             material rendered into each fork
  process.py           the decompile pipeline (per-blob arch, commit walk)
  DecompileDump.py     Ghidra Jython postScript (verbatim)
  process.yml          the GitHub Actions workflow (actions driver)
  README.md NOTICE     generated-repo prose
  LICENSE gitignore    generated-repo license + ignores

Disclaimer

Decompiler output is mechanical and may be incomplete or semantically inaccurate. Generated repositories are not affiliated with or endorsed by the upstream copyright holders.

License

Apache License 2.0 — see LICENSE.
