Δ-Harness — Code

An agentic data harness for generative visual and world models. This repo holds the code. The paper data (training pools, verifier alignment labels, video pilot clips) lives on Hugging Face: lhpku20010120/DeltaSynth.

What this is

Δ-Harness turns synthetic image-edit data generation into a closed-loop control problem:

CampaignSpec → generate → verify → diagnose → rewrite/resample → allocate next batch

Instead of producing a static dataset, the harness watches verifier scores and failure tags, decides which slices to revisit, and rewrites failed samples into new attempts. In the §E1 pilot, R3 (agentic) improves the GoodCase rate by +25% over the fixed-matrix and verifier-only baselines under a matched API budget.
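The loop above can be sketched in a few dozen lines. This is a toy illustration, not the code in deltasynth/harness/: the generator and verifier are replaced by a random score, and `Sample`, `THRESHOLD`, `MAX_ATTEMPTS`, `generate_and_verify`, `rewrite`, and `run_campaign` are hypothetical names chosen for this example.

```python
import random
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 0.6    # hypothetical GoodCase cutoff on the verifier score
MAX_ATTEMPTS = 2   # hypothetical cap: one original try plus one rewrite


@dataclass
class Sample:
    slice_id: str
    recipe: str
    attempts: int = 0
    score: float = 0.0
    failure_tag: Optional[str] = None


def generate_and_verify(sample, rng):
    """Stand-in for generate + verify: a random score, slightly higher after a rewrite."""
    sample.attempts += 1
    bonus = 0.2 if "rewritten" in sample.recipe else 0.0
    sample.score = min(1.0, rng.random() + bonus)
    sample.failure_tag = None if sample.score >= THRESHOLD else "low_score"
    return sample


def rewrite(sample):
    """Stand-in for diagnose + rewrite: turn a failed sample into a new attempt."""
    return Sample(sample.slice_id, sample.recipe + " [rewritten]", sample.attempts)


def run_campaign(slices, budget, batch_size, seed=0):
    rng = random.Random(seed)
    good, retry_queue, spent = [], [], 0
    while spent < budget:
        n = min(batch_size, budget - spent)
        # Allocate: rewritten failures first, then fresh samples for the
        # slices that currently have the fewest GoodCases.
        batch, retry_queue = retry_queue[:n], retry_queue[n:]
        planned = {s: sum(g.slice_id == s for g in good) for s in slices}
        while len(batch) < n:
            target = min(slices, key=lambda s: planned[s])
            batch.append(Sample(target, f"edit for {target}"))
            planned[target] += 1  # count planned samples so the batch spreads out
        for sample in batch:
            generate_and_verify(sample, rng)
            spent += 1
            if sample.failure_tag is None:
                good.append(sample)
            elif sample.attempts < MAX_ATTEMPTS:
                retry_queue.append(rewrite(sample))
    return good, spent


good, spent = run_campaign(["indoor", "outdoor"], budget=40, batch_size=10)
print(f"{len(good)} GoodCases from {spent} generator calls")
```

The point of the sketch is the feedback path: what the verifier reports in one batch changes what the next batch contains, while the total number of generator calls stays fixed by the budget.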

We compare four data-generation policies under the same generator + verifier + seed pool + budget:

Setting	Strategy
R0 Random	uniformly samples seeds and edit codes
R1 Fixed Matrix	expands a static taxonomy matrix once
R2 Filter-only	uses the R1 matrix, verifies, keeps GoodCases — but does not adapt
R3 Agentic	uses coverage gaps, cost, and failure tags to plan the next batch and rewrite failed recipes
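To make the difference between R0 and R3 concrete, here is a minimal sketch of two allocation rules. It is an illustration under assumptions, not the repository's allocator: the cell names, `allocate_uniform`, and `allocate_by_gap` are hypothetical, and failure tags are left out to keep the example short.

```python
import random


def allocate_uniform(cells, batch_size, rng):
    """R0-style: every cell is equally likely, regardless of history."""
    return [rng.choice(cells) for _ in range(batch_size)]


def allocate_by_gap(cells, good_counts, target_per_cell, cost_per_cell, batch_size):
    """R3-style sketch: spend the batch where coverage gap per unit cost is largest."""
    gaps = {c: max(0, target_per_cell - good_counts.get(c, 0)) for c in cells}
    weights = {c: gaps[c] / cost_per_cell.get(c, 1.0) for c in cells}
    total = sum(weights.values())
    if total == 0:
        return {c: 0 for c in cells}  # every cell already meets its target
    # Largest-remainder rounding so the integer allocation sums to batch_size.
    raw = {c: batch_size * weights[c] / total for c in cells}
    alloc = {c: int(raw[c]) for c in cells}
    leftover = batch_size - sum(alloc.values())
    by_remainder = sorted(cells, key=lambda c: raw[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


# Hypothetical (edit_code, slice) cells and GoodCase counts so far.
cells = ["add_object/indoor", "recolor/outdoor", "remove_object/indoor"]
good_counts = {"add_object/indoor": 10, "recolor/outdoor": 2, "remove_object/indoor": 6}
costs = {c: 1.0 for c in cells}

alloc = allocate_by_gap(cells, good_counts, target_per_cell=10,
                        cost_per_cell=costs, batch_size=12)
print(alloc)
# {'add_object/indoor': 0, 'recolor/outdoor': 8, 'remove_object/indoor': 4}
```

With equal costs, the cell that already met its target gets nothing and the remaining budget is divided in proportion to the gaps (8 and 4). A uniform sampler would instead keep spending roughly a third of each batch on the saturated cell.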

Repository layout

DeltaSynth/
├── deltasynth/
│   ├── core/         schema, storage, context, operator/pipeline base classes
│   ├── harness/      campaign spec, ledger, coverage, diagnoser, policy, allocator,
│   │                 planners (R0/R1/R2/R3), agentic loop runner
│   ├── operators/    EditRecipeDeriver, EditRecipeRewriter, GenBefore, EditApply,
│   │                 DeltaVerifier, ThresholdFilter, RandomSampler
│   ├── serving/      OpenAI-compatible LLM/VLM/image-gen + Kling video adapters
│   └── tools/        16 CLI tools for E1/E2/E3/E4 + DiagBench gen/split + HF upload
├── examples/
│   ├── campaigns/    declarative YAML configs for §E1 pilot, §E1b ablation,
│   │                 §E1c alpha scan, §E3 main, etc.
│   └── seeds/        Δ-DiagBench v0 (476) and v0.5 (952) seed JSONs (text-only)
├── docs/
│   ├── E3_GPU_RUNBOOK.md     7-phase runbook for collaborators training LoRAs
│   ├── VIDEO_API_PROBE.md    notes on Veo / Sora / Kling proxy availability
│   └── E1_HYPERPARAMETERS_TODO.md
├── tests/            8 pytest files covering schema, harness, planners, diagnoser
└── OpenDCAI_Template/main.tex   the paper itself

What's on GitHub vs Hugging Face

We split the deliverables based on what makes sense to version:

	GitHub	Hugging Face
Code (Python source)	✅	—
Paper LaTeX source	✅	—
Δ-DiagBench seed JSON	✅ (small)	✅ (mirror)
§E1 pilot storage (1200 samples + images)	—	✅ `e1_pilot_storage/`
§E1b ablation storage (600 samples + images)	—	✅ `e1b_ablation_storage/`
§E1c α-scan storage (899 samples)	—	✅ `e1c_alpha_scan_storage/`
§E2 verifier alignment labels (n=29 + n=97)	—	✅ `e2_alignment/`
§E2 cross-VLM scores	—	✅ `e2_alignment/cross_vlm_scores.jsonl`
§E3 main data pools (4×~1000 GC)	—	✅ `e3_main/{R0,R1,R2,R3}/`
§E3 LoRA-ready JSONL	—	✅ `e3_lora_jsonl/`
§E4 video pilot clips	—	✅ `e4_video_pilot/`
Aggregate reports	—	✅ flat in repo root

The GitHub clone is small (~10 MB). The full Hugging Face download is ~28 GB across 11 zip archives.
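Since the full download is large, it may be worth fetching only the parts you need. The sketch below uses `snapshot_download` from the `huggingface_hub` package with glob-style `allow_patterns`. Treat the details as assumptions: `repo_type="dataset"` is a guess about how the repo is hosted, the prefixes are copied from the table above without checking whether they are folders or archive names on the Hub, and `SECTIONS`, `patterns_for`, and `download` are hypothetical helper names.

```python
from fnmatch import fnmatch

REPO_ID = "lhpku20010120/DeltaSynth"  # repo id given at the top of this README

# Path prefixes copied from the GitHub vs Hugging Face table.
SECTIONS = {
    "E1": "e1_pilot_storage",
    "E1b": "e1b_ablation_storage",
    "E1c": "e1c_alpha_scan_storage",
    "E2": "e2_alignment",
    "E3": "e3_main",
    "E3-lora": "e3_lora_jsonl",
    "E4": "e4_video_pilot",
}


def patterns_for(sections):
    """Build glob patterns matching either a folder or an archive with that prefix."""
    return [f"{SECTIONS[name]}*" for name in sections]


def download(sections, local_dir="data"):
    """Fetch only the selected sections (requires: pip install huggingface_hub)."""
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # assumption: the data is hosted as a dataset repo
        allow_patterns=patterns_for(sections),
        local_dir=local_dir,
    )


# Dry check of the pattern logic; no network access needed.
pattern = patterns_for(["E2"])[0]
print(pattern, fnmatch("e2_alignment/cross_vlm_scores.jsonl", pattern))
```

Calling `download(["E2"])` would then pull only the verifier alignment files instead of all 28 GB.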

Quick start

Just want to read the paper

git clone https://github.com/haolpku/DeltaSynth.git
cd DeltaSynth/OpenDCAI_Template
pdflatex main.tex
open main.pdf  # macOS; on Linux use: xdg-open main.pdf

Reproduce the §E1 / §E2 / §E1c numbers (no GPU)

git clone https://github.com/haolpku/DeltaSynth.git
cd DeltaSynth
pip install -e .

# Need an OpenAI-compatible proxy for image-gen + VLM scoring
cp .env.example .env
$EDITOR .env  # set DELTASYNTH_API_KEY, DELTASYNTH_API_BASE

# Re-run the §E1 pilot (4 settings × 3 seeds × 100 samples = 1200 samples; about ¥800 in API cost and ~6 h wall-clock)
for s in R0 R1 R2 R3; do
  for seed in 42 43 44; do
    python -m deltasynth.tools.ds_cli harness \
      --campaign examples/campaigns/edit_pilot.yml \
      --setting-id $s --random-seed $seed \
      --out out/pilot_${s}_seed${seed} --workers 25 &
  done
done
wait

# Aggregate into a markdown table
python -m deltasynth.tools.e1_table \
  --roots "out/pilot_*" --out reports/pilot_replicated.md
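
The shell loop above starts all 12 runs at once, each with `--workers 25`. If your proxy is rate-limited, a bounded launcher may be gentler. The following is a sketch, not a tool shipped in this repo: it reuses only the CLI flags shown above, and `build_commands`, `run_all`, and `max_parallel` are hypothetical names.

```python
import itertools
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

SETTINGS = ["R0", "R1", "R2", "R3"]
SEEDS = [42, 43, 44]


def build_commands(campaign="examples/campaigns/edit_pilot.yml", workers=25):
    """One harness invocation per (setting, seed) pair, mirroring the shell loop."""
    commands = []
    for setting, seed in itertools.product(SETTINGS, SEEDS):
        commands.append([
            sys.executable, "-m", "deltasynth.tools.ds_cli", "harness",
            "--campaign", campaign,
            "--setting-id", setting,
            "--random-seed", str(seed),
            "--out", f"out/pilot_{setting}_seed{seed}",
            "--workers", str(workers),
        ])
    return commands


def run_all(commands, max_parallel=4):
    """Run the commands with at most max_parallel alive at once; return exit codes."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


commands = build_commands()
print(len(commands))  # 12 = 4 settings x 3 seeds
```

Calling `run_all(commands, max_parallel=4)` keeps four harness processes running at a time and returns their exit codes in order, so a failed run is easy to spot before aggregating.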

Train LoRAs on the §E3 data (needs GPU)

Collaborators working on §E3 Step 2 / Step 3 should start with docs/E3_GPU_RUNBOOK.md. It covers downloading the data from HF, rebasing paths, training 5 LoRAs at a matched optimisation budget, running the held-out eval set, and producing the §E3 result table. Expect roughly 16 GPU-hours on 1× H100 and about ¥160 in API costs.

Use Δ-DiagBench as a benchmark

The seed pool JSONs are in examples/seeds/:

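import json

# Load the v0.5 seed pool; the prompts sit under the top-level "seeds" key.
with open("examples/seeds/diag_bench_v05.json", encoding="utf-8") as f:
    seeds = json.load(f)
print(f"v0.5 has {len(seeds['seeds'])} stress prompts covering 23 edit_codes and 47 (code, slice) cells")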

For the eval split that LoRA / image-edit models should be tested on:

with open("examples/seeds/diag_bench_v05_eval.json", encoding="utf-8") as f:
    seeds_eval = json.load(f)
# 191 prompts, deterministic split at random_seed=20260529
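
If you need a comparable split of your own prompt list, a seeded shuffle is one simple way to make it deterministic. This is a sketch only: the 20% eval fraction is an assumption, `deterministic_split` is a hypothetical helper, and it will not reproduce the membership of the shipped eval file, which is produced by the repo's own DiagBench split tool.

```python
import math
import random


def deterministic_split(items, eval_fraction, seed):
    """Split items into (train, eval); the same seed always gives the same split."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)  # private RNG, global random state untouched
    n_eval = math.ceil(len(items) * eval_fraction)
    eval_idx = set(order[:n_eval])
    train = [x for i, x in enumerate(items) if i not in eval_idx]
    evals = [x for i, x in enumerate(items) if i in eval_idx]
    return train, evals


# Placeholder prompts standing in for the 952 v0.5 seeds.
prompts = [f"prompt-{i}" for i in range(952)]
train, evals = deterministic_split(prompts, eval_fraction=0.2, seed=20260529)
print(len(train), len(evals))  # 761 191
```

Using a dedicated `random.Random(seed)` instance rather than `random.seed` keeps the split stable even if other code draws random numbers first.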

Citation

@techreport{liang2026deltaharness,
  title={\Delta-Harness: An Agentic Data Harness for Generative Visual and World Models},
  author={Liang, Hao},
  year={2026},
  institution={OpenDCAI},
  url={https://github.com/haolpku/DeltaSynth}
}

License

Code: MIT. Δ-DiagBench prompts and labels: CC-BY-4.0. Generated images are subject to the OpenAI proxy's terms of service.

Status

§E1 pilot + §E1b ablation + §E1c α-scan: ✅ complete (pilot-full n=1200, ablation n=600, α-scan n=899)
§E2 verifier alignment: ✅ cross-VLM panel (5 VLMs × 98 samples) + author labels (n=97 with 35 score adjustments)
§E3 Step 1 (data generation): ✅ 4 pools × ~1000 GoodCases each, exported to HF
§E3 Step 2 (LoRA training): ⏳ awaiting GPU (see docs/E3_GPU_RUNBOOK.md)
§E3 Step 3 (held-out eval): ⏳ harness ready (build_eval_set.py + run_lora_eval.py + score_eval.py)
§E4 video pilot (n=10 Kling): ✅ done; n=50 scale-up blocked on Kling proxy upstream
