An agentic data harness for generative visual and world models. This repo holds the code. The paper data (training pools, verifier alignment labels, video pilot clips) lives on Hugging Face: lhpku20010120/DeltaSynth.
Δ-Harness turns synthetic image-edit data generation into a closed-loop control problem:
CampaignSpec → generate → verify → diagnose → rewrite/resample → allocate next batch
Instead of producing a static dataset, the harness watches verifier scores and failure tags, decides which slices to revisit, and rewrites failed samples back into new attempts. The §E1 pilot shows R3 (agentic) reaches a +25% GoodCase rate over fixed-matrix and verifier-only baselines under matched API budget.
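The loop above can be illustrated with a toy sketch. This is not the harness's actual code: `Sample`, `generate_and_verify`, `run_campaign`, the 0.7 threshold, and the `weak_edit` tag are all hypothetical stand-ins, and the generator and verifier are replaced by a random score so the example runs without any API.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    slice_id: str
    recipe: str
    attempt: int = 0
    score: float = 0.0
    failure_tag: Optional[str] = None


def generate_and_verify(sample, rng):
    """Stand-in for generate + verify + diagnose: assigns a fake score and tag."""
    # Toy assumption: each rewrite makes success somewhat more likely.
    sample.score = min(1.0, rng.random() + 0.2 * sample.attempt)
    sample.failure_tag = None if sample.score >= 0.7 else "weak_edit"
    return sample


def run_campaign(slices, budget, batch_size=8, max_attempts=2, seed=0):
    """Spend `budget` generations; return (GoodCases per slice, generations spent)."""
    rng = random.Random(seed)
    good = defaultdict(int)    # accepted samples per slice
    tried = defaultdict(int)   # fresh allocations per slice
    retry_queue = []           # rewritten recipes waiting for another attempt
    spent = 0
    while spent < budget:
        # Allocate: rewritten failures first, then the least-covered slices.
        batch = retry_queue[:batch_size]
        retry_queue = retry_queue[batch_size:]
        while len(batch) < batch_size:
            target = min(slices, key=lambda s: (good[s], tried[s]))
            tried[target] += 1
            batch.append(Sample(target, f"recipe for {target}"))
        batch = batch[: budget - spent]  # never exceed the budget
        for sample in batch:
            generate_and_verify(sample, rng)
            spent += 1
            if sample.failure_tag is None:
                good[sample.slice_id] += 1
            elif sample.attempt + 1 < max_attempts:
                # Rewrite: re-queue the failed recipe as a new attempt.
                retry_queue.append(
                    Sample(sample.slice_id, sample.recipe + " (rewritten)", sample.attempt + 1)
                )
    return dict(good), spent
```

Calling `run_campaign(["a", "b", "c"], budget=40)` spends exactly 40 generations and returns the per-slice GoodCase counts. The point of the sketch is the control flow: failures feed back into the next batch instead of being discarded.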
We compare four data-generation policies under the same generator + verifier + seed pool + budget:
| Setting | Strategy |
|---|---|
| R0 Random | uniformly samples seeds and edit codes |
| R1 Fixed Matrix | expands a static taxonomy matrix once |
| R2 Filter-only | uses the R1 matrix, verifies, keeps GoodCases — but does not adapt |
| R3 Agentic | uses coverage gaps, cost, and failure tags to plan the next batch and rewrite failed recipes |
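To make the difference between a feedback-free planner and a feedback-driven one concrete, here is a minimal sketch. It is an illustration under assumptions, not the repo's planners: `plan_r0` and `plan_gap_weighted` are hypothetical names, the cell labels in the usage note are invented, and the weighting (coverage gap divided by a smoothed pass rate) is one plausible choice that ignores the cost and failure-tag signals the real R3 policy is described as using.

```python
import random


def plan_r0(cells, batch_size, rng):
    """R0-style: draw cells uniformly at random, ignoring all feedback."""
    return [rng.choice(cells) for _ in range(batch_size)]


def plan_gap_weighted(cells, batch_size, good, attempts, target):
    """R3-flavoured: spend more where coverage is short and the pass rate is low."""
    weights = {}
    for c in cells:
        gap = max(0, target - good.get(c, 0))
        # Laplace-smoothed pass rate, so unseen cells start at 0.5.
        rate = (good.get(c, 0) + 1) / (attempts.get(c, 0) + 2)
        weights[c] = gap / rate  # rough expected attempts to close the gap
    total = sum(weights.values())
    if total == 0:
        return []  # every cell already meets its target
    # Largest-remainder rounding so the plan has exactly batch_size entries.
    raw = {c: batch_size * w / total for c, w in weights.items()}
    alloc = {c: int(r) for c, r in raw.items()}
    leftover = batch_size - sum(alloc.values())
    by_remainder = sorted(cells, key=lambda c: raw[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return [c for c in cells for _ in range(alloc[c])]
```

With cells `["color", "pose", "text"]`, a target of 10 GoodCases per cell, and counts of 10, 4, and 0 GoodCases from 12 attempts each, the gap-weighted plan assigns nothing to `color` and most of the batch to `text`, whereas `plan_r0` would keep sampling all three equally.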
DeltaSynth/
├── deltasynth/
│ ├── core/ schema, storage, context, operator/pipeline base classes
│ ├── harness/ campaign spec, ledger, coverage, diagnoser, policy, allocator,
│ │ planners (R0/R1/R2/R3), agentic loop runner
│ ├── operators/ EditRecipeDeriver, EditRecipeRewriter, GenBefore, EditApply,
│ │ DeltaVerifier, ThresholdFilter, RandomSampler
│ ├── serving/ OpenAI-compatible LLM/VLM/image-gen + Kling video adapters
│ └── tools/ 16 CLI tools for E1/E2/E3/E4 + DiagBench gen/split + HF upload
├── examples/
│ ├── campaigns/ declarative YAML configs for §E1 pilot, §E1b ablation,
│ │ §E1c alpha scan, §E3 main, etc.
│ └── seeds/ Δ-DiagBench v0 (476) and v0.5 (952) seed JSONs (text-only)
├── docs/
│ ├── E3_GPU_RUNBOOK.md 7-phase runbook for collaborators training LoRAs
│ ├── VIDEO_API_PROBE.md notes on Veo / Sora / Kling proxy availability
│ └── E1_HYPERPARAMETERS_TODO.md
├── tests/ 8 pytest files covering schema, harness, planners, diagnoser
└── OpenDCAI_Template/main.tex the paper itself
We split the deliverables based on what makes sense to version:
| Artifact | GitHub | Hugging Face |
|---|---|---|
| Code (Python source) | ✅ | — |
| Paper LaTeX source | ✅ | — |
| Δ-DiagBench seed JSON | ✅ (small) | ✅ (mirror) |
| §E1 pilot storage (1200 samples + images) | — | ✅ e1_pilot_storage/ |
| §E1b ablation storage (600 samples + images) | — | ✅ e1b_ablation_storage/ |
| §E1c α-scan storage (899 samples) | — | ✅ e1c_alpha_scan_storage/ |
| §E2 verifier alignment labels (n=29 + n=97) | — | ✅ e2_alignment/ |
| §E2 cross-VLM scores | — | ✅ e2_alignment/cross_vlm_scores.jsonl |
| §E3 main data pools (4×~1000 GC) | — | ✅ e3_main/{R0,R1,R2,R3}/ |
| §E3 LoRA-ready JSONL | — | ✅ e3_lora_jsonl/ |
| §E4 video pilot clips | — | ✅ e4_video_pilot/ |
| Aggregate reports | — | ✅ flat in repo root |
GitHub clone is small (~10 MB). Hugging Face download is ~28 GB across 11 zip archives.
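Since the full download is large, it may be worth fetching only the artifacts you need. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns`. Two things are assumptions here: that the Hugging Face repo is a dataset repo (`repo_type="dataset"`), and that archive names start with the directory names listed in the table. The helper names `patterns_for` and `fetch` are hypothetical.

```python
def patterns_for(artifacts):
    """Turn artifact names from the table into glob patterns (assumed naming)."""
    return [f"{name}*" for name in artifacts]


def fetch(artifacts, local_dir="hf_data"):
    """Download only the matching files; returns the local snapshot path."""
    # Imported lazily so the helper above works without the dependency installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(
        repo_id="lhpku20010120/DeltaSynth",
        repo_type="dataset",  # assumption: adjust if the repo is a model repo
        allow_patterns=patterns_for(artifacts),
        local_dir=local_dir,
    )
```

For example, `fetch(["e2_alignment"])` would pull just the verifier alignment files if the naming assumption holds. Check the file listing on the Hugging Face page first and adjust the patterns accordingly.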
git clone https://github.com/haolpku/DeltaSynth.git
cd DeltaSynth/OpenDCAI_Template
pdflatex main.tex
open main.pdf

git clone https://github.com/haolpku/DeltaSynth.git
cd DeltaSynth
pip install -e .
# Need an OpenAI-compatible proxy for image-gen + VLM scoring
cp .env.example .env
$EDITOR .env # set DELTASYNTH_API_KEY, DELTASYNTH_API_BASE
# Re-run the §E1 pilot (4 settings × 3 seeds × 100 samples; ~¥800 + 6 h wall)
for s in R0 R1 R2 R3; do
  for seed in 42 43 44; do
    python -m deltasynth.tools.ds_cli harness \
      --campaign examples/campaigns/edit_pilot.yml \
      --setting-id "$s" --random-seed "$seed" \
      --out "out/pilot_${s}_seed${seed}" --workers 25 &
  done
done
wait
# Aggregate into a markdown table
python -m deltasynth.tools.e1_table \
  --roots "out/pilot_*" --out reports/pilot_replicated.md

For collaborators working on §E3 Step 2 / Step 3, start with docs/E3_GPU_RUNBOOK.md, which covers downloading the data from HF, rebasing paths, training 5 LoRAs at matched optimisation budget, running the held-out eval set, and producing the §E3 result table. Expect roughly 16 GPU-hours on 1× H100 and about ¥160 in API costs.
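The §E1 pilot sweep shown earlier launches all twelve runs in the background at once, each with 25 workers. If your proxy rate-limits, a bounded launcher may be gentler. The sketch below builds the same commands (flags copied from the shell loop) and runs a few at a time. `build_commands`, `run_all`, and `max_parallel` are hypothetical names, not part of the repo.

```python
import itertools
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

SETTINGS = ["R0", "R1", "R2", "R3"]
SEEDS = [42, 43, 44]


def build_commands(campaign="examples/campaigns/edit_pilot.yml", workers=25):
    """One harness invocation per (setting, seed) pair, as in the shell loop."""
    commands = []
    for setting, seed in itertools.product(SETTINGS, SEEDS):
        commands.append([
            sys.executable, "-m", "deltasynth.tools.ds_cli", "harness",
            "--campaign", campaign,
            "--setting-id", setting,
            "--random-seed", str(seed),
            "--out", f"out/pilot_{setting}_seed{seed}",
            "--workers", str(workers),
        ])
    return commands


def run_all(commands, max_parallel=4):
    """Run the commands with at most max_parallel alive; return exit codes in order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

Calling `run_all(build_commands())` from the repo root would execute the sweep four runs at a time. Any nonzero entry in the returned list marks a run worth re-launching before aggregating.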
The seed pool JSONs are in examples/seeds/:
import json

# Load the v0.5 seed pool; the context manager closes the file after reading.
with open("examples/seeds/diag_bench_v05.json", encoding="utf-8") as f:
    seeds = json.load(f)
print(f"v0.5 has {len(seeds['seeds'])} stress prompts × 23 edit_codes × 47 (code, slice) cells")

For the eval split that LoRA / image-edit models should be tested on:
with open("examples/seeds/diag_bench_v05_eval.json", encoding="utf-8") as f:
    seeds_eval = json.load(f)
# 191 prompts, deterministic split at random_seed=20260529

@techreport{liang2026deltaharness,
title={$\Delta$-Harness: An Agentic Data Harness for Generative Visual and World Models},
author={Liang, Hao},
year={2026},
institution={OpenDCAI},
url={https://github.com/haolpku/DeltaSynth}
}

Code: MIT. Δ-DiagBench prompts and labels: CC-BY-4.0. Generated images are subject to the OpenAI proxy's terms of service.
- §E1 pilot + §E1b ablation + §E1c α-scan: ✅ complete (pilot-full n=1200, ablation n=600, α-scan n=899)
- §E2 verifier alignment: ✅ cross-VLM panel (5 VLMs × 98 samples) + author labels (n=97 with 35 score adjustments)
- §E3 Step 1 (data generation): ✅ 4 pools × ~1000 GoodCases each, exported to HF
- §E3 Step 2 (LoRA training): ⏳ awaiting GPU (see docs/E3_GPU_RUNBOOK.md)
- §E3 Step 3 (held-out eval): ⏳ harness ready (build_eval_set.py + run_lora_eval.py + score_eval.py)
- §E4 video pilot (n=10 Kling): ✅ done; n=50 scale-up blocked on Kling proxy upstream