blobmill turns a permissively licensed binary-blob upstream into a GitHub
repository of per-function decompiled C source.
Given an upstream such as `espressif/esp-phy-lib`, the production (`actions`)
driver:

- Forks the upstream on GitHub, renamed `<basename>-decompiled` (e.g. `esp-phy-lib-decompiled`). The fork's "forked from …" banner is the provenance and non-affiliation signal, and the `-decompiled` suffix keeps the names distinct.
- Preserves the upstream's default branch (`master` or `main`) on the fork as an untouched mirror; `blobmill` never writes to it.
- Seeds an orphan `decompiled` branch with a single parentless commit carrying only the scaffold (no `.c`), and makes it the fork's default branch.
- Enables Actions and dispatches the bundled `process.yml` workflow on the `decompiled` branch. GitHub's runner then decompiles the blobs and commits one `.c` file per function onto `decompiled`: one commit per upstream blob change, authored on the upstream commit's date and committed by `blobmaster`.
The committed source is generated on GitHub Actions, not pushed from the
machine that runs blobmill. The result is not a git mirror of the upstream
binaries; it is a decompiled view, on a parentless branch, that tracks the
blob's evolution at the function level.
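To make the moving parts concrete, the driver's steps can be approximated by hand with `gh` and `git`. The sketch below only builds the argv lists and runs nothing. It is not blobmill's actual code. Several pieces are hypothetical: the `owner` argument (the account that receives the fork), the local `scaffold` directory, and the commit message. Rendering the templates into `scaffold` is omitted. The sketch uses a freshly initialised repository because its first commit is parentless by construction; `git init -b` needs Git 2.28 or later.

```python
import shlex


def actions_plan(upstream, owner, output_name=None, branch="decompiled", scaffold="scaffold"):
    """Return the gh/git commands (argv lists) for the actions driver's steps; nothing runs.

    `owner` and `scaffold` are hypothetical: the fork's owner and a local directory
    into which the templates have already been rendered.
    """
    basename = upstream.split("/", 1)[1]
    name = output_name or f"{basename}-decompiled"
    fork = f"{owner}/{name}"
    return [
        # Fork under the distinct -decompiled name, without cloning it locally.
        ["gh", "repo", "fork", upstream, "--clone=false", "--fork-name", name],
        # The upstream default branch needs no command: it is simply never written to.
        # A brand-new repository's first commit has no parent, which gives the orphan branch.
        ["git", "init", "-b", branch, scaffold],
        ["git", "-C", scaffold, "add", "-A"],
        ["git", "-C", scaffold, "commit", "-m", "Seed decompiled scaffold"],
        ["git", "-C", scaffold, "push", f"https://github.com/{fork}.git", branch],
        ["gh", "repo", "edit", fork, "--default-branch", branch],
        # Enable Actions on the fork, then dispatch the bundled workflow on that branch.
        ["gh", "api", "--method", "PUT", f"repos/{fork}/actions/permissions",
         "-F", "enabled=true"],
        ["gh", "workflow", "run", "process.yml", "--ref", branch, "--repo", fork],
    ]


if __name__ == "__main__":
    for cmd in actions_plan("espressif/esp-phy-lib", "example-user"):
        print(shlex.join(cmd))
```

Building the plan as data keeps a dry run trivial: print it instead of executing it.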
Requirements:

- Python 3 (standard library only; no third-party packages).
- The GitHub CLI (`gh`), authenticated, for forking, pushing the orphan branch, and dispatching the workflow.
The actions driver needs nothing else locally — Ghidra runs on GitHub's
runner. The local driver additionally needs, on the machine running
blobmill:
- Ghidra 11.3.2 or later, with `GHIDRA_HOME` pointing at the install root. The bundled Jython extension must be installed: the decompile postScript (`DecompileDump.py`) is a Jython 2.7 script, and headless Ghidra aborts it otherwise. Install it once from the Ghidra GUI (File → Install Extensions) or by unzipping `$GHIDRA_HOME/Extensions/Ghidra/*_Jython.zip` into your per-user Ghidra `Extensions/` directory.
- A JDK (Ghidra 11.3.2/12.x: JDK 21) reachable via `JAVA_HOME` or on `PATH`.
- binutils (`ar`, `readelf`) on `PATH`.
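A preflight along these lines could catch missing prerequisites before any work starts. This is a hypothetical helper, not blobmill's own check. It looks only for executables on `PATH` and for the two environment variables. It does not verify the Ghidra version or whether the Jython extension is installed. `which` and `env` are injectable, so the check can be exercised without touching the real system.

```python
import os
import shutil


def missing_prerequisites(driver, env=None, which=shutil.which):
    """Return the names of missing prerequisites for driver 'local' or 'actions'.

    Hypothetical helper: checks executables on PATH and environment variables only.
    (Python 3 itself is already present if this runs.)
    """
    env = os.environ if env is None else env
    missing = [] if which("gh") else ["gh"]
    if driver == "local":
        # Ghidra and binutils are only needed when decompiling on this machine.
        missing += [tool for tool in ("ar", "readelf") if which(tool) is None]
        if not env.get("GHIDRA_HOME"):
            missing.append("GHIDRA_HOME")
        # A JDK is found through JAVA_HOME or a `java` executable on PATH.
        if not env.get("JAVA_HOME") and which("java") is None:
            missing.append("JDK (JAVA_HOME or java on PATH)")
    return missing


if __name__ == "__main__":
    print(missing_prerequisites("local") or "all prerequisites found")
```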
Usage:

```sh
python3 blobmill.py --upstream-repo OWNER/REPO [options]
```
For example, a local pre-check followed by a production deploy:
```sh
python3 blobmill.py --upstream-repo espressif/esp-phy-lib                   # local pre-check (default)
python3 blobmill.py --upstream-repo espressif/esp-phy-lib --driver actions  # fork + dispatch on GitHub
```
Common options:
| option | meaning |
|---|---|
| `--driver local` | (default) throwaway pre-check: decompile locally with Ghidra and run the no-churn check, then dispose. No fork, no push, no GitHub writes. |
| `--driver actions` | production: fork, seed the orphan `decompiled` branch, set it as default, enable Actions, and dispatch `process.yml` so GitHub generates the `.c` files. |
| `--dry-run` | preflight and render templates only; no Ghidra, no fork, no push. |
| `--reset` | re-seed an existing fork's `decompiled` branch from the scaffold. Never deletes the repository and never touches the mirror branch. |
| `--output-name NAME` | fork / repo name (default `<basename>-decompiled`). |
| `--branch NAME` | orphan branch carrying the decompiled source (default `decompiled`); becomes the fork's default branch. |
| `--blob-glob GLOB` | glob selecting the blob archives under the upstream root. |
| `--description TEXT` | optional repository description set on the fork. |
| `--ghidra-processor SPEC` | force a Ghidra processor instead of auto-detecting per blob. |
| `--license-override` | proceed past the license gate (see below). |
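The flags in the table could map onto an `argparse` parser roughly as below. This is a hypothetical reconstruction from the table, not blobmill's parser. The table gives no default for `--blob-glob`, so the sketch leaves it as `None`. The `<basename>-decompiled` default is filled in after parsing because it depends on `--upstream-repo`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options table."""
    p = argparse.ArgumentParser(prog="blobmill.py")
    p.add_argument("--upstream-repo", required=True, metavar="OWNER/REPO")
    p.add_argument("--driver", choices=("local", "actions"), default="local")
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--reset", action="store_true")
    p.add_argument("--output-name", metavar="NAME")
    p.add_argument("--branch", default="decompiled", metavar="NAME")
    p.add_argument("--blob-glob", metavar="GLOB")  # default not stated in the table
    p.add_argument("--description", metavar="TEXT")
    p.add_argument("--ghidra-processor", metavar="SPEC")
    p.add_argument("--license-override", action="store_true")
    return p


def parse_cli(argv=None):
    args = build_parser().parse_args(argv)
    if args.output_name is None:
        # Default fork name: <basename>-decompiled.
        args.output_name = args.upstream_repo.split("/")[-1] + "-decompiled"
    return args
```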
Before forking, blobmill checks the upstream's SPDX license via the GitHub
API and refuses to proceed unless it is on the permissive allowlist:
Apache-2.0, MIT, BSD-2-Clause, BSD-3-Clause, ISC, 0BSD, Unlicense, CC0-1.0.
Anything else exits non-zero unless --license-override is given.
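Here is a minimal sketch of such a gate. It assumes the SPDX identifier is read from the `license.spdx_id` field of GitHub's `GET /repos/{owner}/{repo}` response, fetched through `gh api`. The function names are hypothetical. GitHub reports `NOASSERTION` for a license it cannot identify and `null` when it detects none. Both fall outside the allowlist, so both are refused.

```python
import json
import subprocess
from typing import Optional

# The permissive allowlist, as SPDX identifiers.
PERMISSIVE = frozenset({
    "Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause",
    "ISC", "0BSD", "Unlicense", "CC0-1.0",
})


def spdx_id(repo_json: str) -> Optional[str]:
    """Return license.spdx_id from a GET /repos/{owner}/{repo} body, or None if absent."""
    license_info = json.loads(repo_json).get("license") or {}
    return license_info.get("spdx_id")


def fetch_repo_json(upstream: str) -> str:
    """Fetch the repository metadata with the authenticated GitHub CLI."""
    result = subprocess.run(["gh", "api", f"repos/{upstream}"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def license_exit_status(spdx: Optional[str], override: bool = False) -> int:
    """0 to proceed; 1 (a non-zero exit) unless the override flag was given."""
    return 0 if override or spdx in PERMISSIVE else 1
```

A live check would chain the three: `license_exit_status(spdx_id(fetch_repo_json("espressif/esp-phy-lib")))`.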
The local pre-check decompiles the same upstream commit twice into a
transient working repo and a sibling local bare repo (nothing leaves the
machine). A correct, deterministic pipeline reproduces byte-identical output,
so the second pass must rewrite zero `.c` files. If it rewrites any, either
Ghidra is behaving non-deterministically or a header is leaking into a function
body; either way the pipeline is unstable and the target should not be
deployed. Everything the pre-check creates is deleted before it returns.
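The check can be pictured with a small model. The sketch below assumes each per-function file starts with a one-line header naming the upstream commit; that header format is hypothetical. Only the body after that line is compared, which is why a header leaking into the body would show up as churn. `decompile` stands in for a Ghidra pass and returns a mapping from file name to function body.

```python
import itertools
import pathlib
import tempfile

HEADER_PREFIX = "/* upstream commit: "  # hypothetical header format


def render(commit, body):
    """A function file: one header line naming the upstream commit, then the body."""
    return f"{HEADER_PREFIX}{commit} */\n{body}"


def body_of(text):
    """Drop the header line so that only the decompiled body is compared."""
    first, _, rest = text.partition("\n")
    return rest if first.startswith(HEADER_PREFIX) else text


def write_function(path, commit, body):
    """Rewrite path only when the body changed; return True if the file was written."""
    path = pathlib.Path(path)
    if path.exists() and body_of(path.read_text()) == body:
        return False  # unchanged body: keep the header of the last real change
    path.write_text(render(commit, body))
    return True


def no_churn(decompile, out_dir, commit):
    """Decompile the same commit twice; return how many files pass two rewrote."""
    out_dir = pathlib.Path(out_dir)
    for name, body in decompile().items():
        write_function(out_dir / name, commit, body)
    return sum(write_function(out_dir / name, commit, body)
               for name, body in decompile().items())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        stable = lambda: {"f.c": "int f(void) { return 1; }\n"}
        print(no_churn(stable, d, "abc123"))  # deterministic pipeline: prints 0
    with tempfile.TemporaryDirectory() as d:
        counter = itertools.count()
        flaky = lambda: {"g.c": f"int g(void) {{ return {next(counter)}; }}\n"}
        print(no_churn(flaky, d, "abc123"))  # non-deterministic output: prints 1
```

The same `write_function` rule is what keeps a file's header pointing at the commit where its body last changed.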
On GitHub's runner (or locally for the pre-check), for each upstream commit that touches a matched blob (oldest first):
- the blob's processor is detected from its ELF header (`readelf -h`); RISC-V blobs are decompiled, and architectures with no bundled Ghidra processor are skipped and logged;
- each `.a` archive is split into `.o` members with `ar x`;
- Ghidra headless runs `scripts/DecompileDump.py` on each member, emitting one `.c` per function;
- a function's file is rewritten only when its decompiled body changes, so its header keeps pointing at the upstream commit where it last actually changed.
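The detection and the Ghidra invocation can be sketched as follows. The pipeline uses `readelf -h`; this sketch swaps that for reading the same `e_machine` field straight from the ELF header with `struct`. It only builds the `analyzeHeadless` argv and does not run it. The `riscv32`/`riscv64` labels and the `blobmill` project name are hypothetical. `-import`, `-scriptPath`, `-postScript`, `-processor` and `-deleteProject` are standard `analyzeHeadless` options.

```python
import struct
from pathlib import Path

EM_RISCV = 243  # ELF e_machine value assigned to RISC-V


def elf_machine(header: bytes):
    """Return (bits, e_machine) parsed from the first 20 or more bytes of an ELF object."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF object")
    bits = {1: 32, 2: 64}[header[4]]        # EI_CLASS: 32- or 64-bit
    order = {1: "<", 2: ">"}[header[5]]     # EI_DATA: little- or big-endian
    (machine,) = struct.unpack_from(order + "H", header, 18)  # e_machine at offset 18
    return bits, machine


def processor_label(header: bytes):
    """Hypothetical mapping: a label for RISC-V, None for anything to skip and log."""
    bits, machine = elf_machine(header)
    return f"riscv{bits}" if machine == EM_RISCV else None


def ghidra_command(ghidra_home, project_dir, member, script_dir, processor=None):
    """Build a headless Ghidra run of DecompileDump.py over one extracted .o member."""
    cmd = [
        str(Path(ghidra_home) / "support" / "analyzeHeadless"),
        str(project_dir), "blobmill",        # throwaway project location and name
        "-import", str(member),
        "-scriptPath", str(script_dir),
        "-postScript", "DecompileDump.py",
        "-deleteProject",                    # discard the Ghidra project afterwards
    ]
    if processor:  # e.g. a --ghidra-processor override (a Ghidra language ID)
        cmd += ["-processor", processor]
    return cmd
```

In use, a member's header would come from `open(path, "rb").read(20)`. The members themselves would come from `subprocess.run(["ar", "x", archive], cwd=workdir, check=True)`.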
The workflow carries a daily `schedule:` (cron) trigger, but GitHub heavily
throttles scheduled runs on forks (and auto-disables them after 60 days idle),
so cron is only a best-effort bonus. Dependable updates are run via
`workflow_dispatch`; the initial generation is dispatched the same way.
Layout:

```
blobmill.py              the orchestrator
templates/               material rendered into each fork
  process.py             the decompile pipeline (per-blob arch, commit walk)
  DecompileDump.py       Ghidra Jython postScript (verbatim)
  process.yml            the GitHub Actions workflow (actions driver)
  README.md NOTICE       generated-repo prose
  LICENSE gitignore      generated-repo license + ignores
```
Decompiler output is mechanical and may be incomplete or semantically inaccurate. Generated repositories are not affiliated with or endorsed by the upstream copyright holders.
Apache License 2.0 — see LICENSE.