haolpku/DeltaSynth

Δ-Harness — Code


An agentic data harness for generative visual and world models. This repo holds the code. The paper data (training pools, verifier alignment labels, video pilot clips) lives on Hugging Face: lhpku20010120/DeltaSynth.


What this is

Δ-Harness turns synthetic image-edit data generation into a closed-loop control problem:

CampaignSpec → generate → verify → diagnose → rewrite/resample → allocate next batch

Instead of producing a static dataset, the harness watches verifier scores and failure tags, decides which slices to revisit, and rewrites failed samples into new attempts. In the §E1 pilot, R3 (agentic) reaches a +25% GoodCase rate over the fixed-matrix and verifier-only baselines under a matched API budget.
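
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the harness's actual API: `Sample`, `run_campaign`, and the `generate`, `verify`, and `rewrite` callables are hypothetical names, and the 0.5 threshold and 3-round cap are arbitrary placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    # Hypothetical record: one generated attempt for one edit recipe.
    recipe: str
    score: float = 0.0
    failure_tags: List[str] = field(default_factory=list)


def run_campaign(
    recipes: List[str],
    generate: Callable[[str], Sample],
    verify: Callable[[Sample], Sample],
    rewrite: Callable[[Sample], str],
    threshold: float = 0.5,
    max_rounds: int = 3,
) -> List[Sample]:
    """Generate, verify, and re-queue failed recipes for a bounded number of rounds."""
    good: List[Sample] = []
    queue = list(recipes)
    for _ in range(max_rounds):
        if not queue:
            break
        next_queue: List[str] = []
        for recipe in queue:
            sample = verify(generate(recipe))
            if sample.score >= threshold:
                good.append(sample)
            else:
                # Failed samples are rewritten into a new recipe for the next round.
                next_queue.append(rewrite(sample))
        queue = next_queue
    return good
```

The bounded `max_rounds` stands in for a budget check; a real run would presumably stop on API spend or coverage targets instead.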

We compare four data-generation policies under the same generator + verifier + seed pool + budget:

| Setting | Strategy |
|---|---|
| R0 Random | uniformly samples seeds and edit codes |
| R1 Fixed Matrix | expands a static taxonomy matrix once |
| R2 Filter-only | uses the R1 matrix, verifies, keeps GoodCases, but does not adapt |
| R3 Agentic | uses coverage gaps, cost, and failure tags to plan the next batch and rewrite failed recipes |
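
To make the R3 planning idea concrete, here is a rough sketch of allocating one batch across taxonomy cells in proportion to coverage gap divided by unit cost. It is an assumption-laden illustration, not the repository's allocator: `allocate_batch` and its arguments are hypothetical, costs are assumed positive, and failure tags are left out for brevity.

```python
from typing import Dict


def allocate_batch(
    target: Dict[str, int],
    accepted: Dict[str, int],
    cost: Dict[str, float],
    batch_size: int,
) -> Dict[str, int]:
    """Split batch_size across cells in proportion to (coverage gap / unit cost)."""
    # Gap = how many accepted samples a cell still needs (never negative).
    gaps = {c: max(target[c] - accepted.get(c, 0), 0) for c in target}
    weights = {c: gaps[c] / cost.get(c, 1.0) for c in target}
    total = sum(weights.values())
    if total == 0:
        return {c: 0 for c in target}
    # Floor each proportional share, then give leftover slots
    # to the cells with the largest fractional remainders.
    raw = {c: batch_size * w / total for c, w in weights.items()}
    alloc = {c: int(raw[c]) for c in target}
    leftover = batch_size - sum(alloc.values())
    by_remainder = sorted(target, key=lambda c: raw[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc
```

A cell that has already met its target receives no new slots, which is the behaviour that separates an adaptive policy from the one-shot R1 matrix.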

Repository layout

DeltaSynth/
├── deltasynth/
│   ├── core/         schema, storage, context, operator/pipeline base classes
│   ├── harness/      campaign spec, ledger, coverage, diagnoser, policy, allocator,
│   │                 planners (R0/R1/R2/R3), agentic loop runner
│   ├── operators/    EditRecipeDeriver, EditRecipeRewriter, GenBefore, EditApply,
│   │                 DeltaVerifier, ThresholdFilter, RandomSampler
│   ├── serving/      OpenAI-compatible LLM/VLM/image-gen + Kling video adapters
│   └── tools/        16 CLI tools for E1/E2/E3/E4 + DiagBench gen/split + HF upload
├── examples/
│   ├── campaigns/    declarative YAML configs for §E1 pilot, §E1b ablation,
│   │                 §E1c alpha scan, §E3 main, etc.
│   └── seeds/        Δ-DiagBench v0 (476) and v0.5 (952) seed JSONs (text-only)
├── docs/
│   ├── E3_GPU_RUNBOOK.md     7-phase runbook for collaborators training LoRAs
│   ├── VIDEO_API_PROBE.md    notes on Veo / Sora / Kling proxy availability
│   └── E1_HYPERPARAMETERS_TODO.md
├── tests/            8 pytest files covering schema, harness, planners, diagnoser
└── OpenDCAI_Template/main.tex   the paper itself

What's on GitHub vs Hugging Face

We split the deliverables based on what makes sense to version:

| | GitHub | Hugging Face |
|---|---|---|
| Code (Python source) | ✅ | |
| Paper LaTeX source | ✅ | |
| Δ-DiagBench seed JSON | ✅ (small) | ✅ (mirror) |
| §E1 pilot storage (1200 samples + images) | | e1_pilot_storage/ |
| §E1b ablation storage (600 samples + images) | | e1b_ablation_storage/ |
| §E1c α-scan storage (899 samples) | | e1c_alpha_scan_storage/ |
| §E2 verifier alignment labels (n=29 + n=97) | | e2_alignment/ |
| §E2 cross-VLM scores | | e2_alignment/cross_vlm_scores.jsonl |
| §E3 main data pools (4×~1000 GC) | | e3_main/{R0,R1,R2,R3}/ |
| §E3 LoRA-ready JSONL | | e3_lora_jsonl/ |
| §E4 video pilot clips | | e4_video_pilot/ |
| Aggregate reports | | ✅ flat in repo root |

The GitHub clone is small (~10 MB). The Hugging Face download is ~28 GB across 11 zip archives.
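
Once the zip archives are on disk, unpacking them can be scripted with the standard library. The sketch below assumes the archives were already downloaded into a single local directory; `unpack_archives` is a hypothetical helper, and the archive names and their internal layout are not specified here.

```python
import zipfile
from pathlib import Path
from typing import List


def unpack_archives(download_dir: str, out_dir: str) -> List[str]:
    """Extract every *.zip in download_dir into out_dir/<archive stem>/."""
    out = Path(out_dir)
    extracted: List[str] = []
    for archive in sorted(Path(download_dir).glob("*.zip")):
        target = out / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.name)
    return extracted
```

Extracting each archive into its own subdirectory keeps the pools separate, which helps when only part of the ~28 GB is needed.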


Quick start

Just want to read the paper

git clone https://github.com/haolpku/DeltaSynth.git
cd DeltaSynth/OpenDCAI_Template
pdflatex main.tex
open main.pdf  # macOS; on Linux use: xdg-open main.pdf

Reproduce the §E1 / §E2 / §E1c numbers (no GPU)

git clone https://github.com/haolpku/DeltaSynth.git
cd DeltaSynth
pip install -e .

# Need an OpenAI-compatible proxy for image-gen + VLM scoring
cp .env.example .env
$EDITOR .env  # set DELTASYNTH_API_KEY, DELTASYNTH_API_BASE

# Re-run the §E1 pilot (4 settings × 3 seeds × 100 samples; ~¥800 + 6 h wall)
for s in R0 R1 R2 R3; do
  for seed in 42 43 44; do
    python -m deltasynth.tools.ds_cli harness \
      --campaign examples/campaigns/edit_pilot.yml \
      --setting-id $s --random-seed $seed \
      --out out/pilot_${s}_seed${seed} --workers 25 &
  done
done
wait

# Aggregate into a markdown table
python -m deltasynth.tools.e1_table \
  --roots "out/pilot_*" --out reports/pilot_replicated.md
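
For readers who want to sanity-check the aggregation by hand, the per-setting GoodCase rate across seeds can be computed as below. This is only a sketch: the real `e1_table` output format is not shown here, `goodcase_rates` and `to_markdown` are hypothetical helpers, and the `(setting, seed, n_good, n_total)` tuple layout is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple

Run = Tuple[str, int, int, int]  # (setting, seed, n_good, n_total)


def goodcase_rates(runs: Iterable[Run]) -> Dict[str, Tuple[float, float]]:
    """Map setting -> (mean, sample std) of the per-seed GoodCase rate."""
    per_setting: Dict[str, List[float]] = defaultdict(list)
    for setting, _seed, n_good, n_total in runs:
        per_setting[setting].append(n_good / n_total)
    return {
        s: (mean(r), stdev(r) if len(r) > 1 else 0.0)
        for s, r in per_setting.items()
    }


def to_markdown(rates: Dict[str, Tuple[float, float]]) -> str:
    """Render the rates as a small markdown table, one row per setting."""
    lines = ["| Setting | GoodCase rate (mean) | Std |", "|---|---|---|"]
    for s in sorted(rates):
        m, sd = rates[s]
        lines.append(f"| {s} | {m:.3f} | {sd:.3f} |")
    return "\n".join(lines)
```

Reporting the spread across the three seeds alongside the mean makes it easier to judge whether a gap between settings exceeds seed-to-seed noise.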

Train LoRAs on the §E3 data (needs GPU)

For collaborators working on §E3 Step 2 / Step 3, start with docs/E3_GPU_RUNBOOK.md. It covers downloading the data from HF, rebasing paths, training 5 LoRAs at matched optimisation budget, running the held-out eval set, and producing the §E3 result table. Expect roughly 16 GPU-hours on 1× H100 and about ¥160 in API costs.

Use Δ-DiagBench as a benchmark

The seed pool JSONs are in examples/seeds/:

import json

# Load the v0.5 seed pool; the context manager closes the file handle.
with open("examples/seeds/diag_bench_v05.json") as f:
    seeds = json.load(f)
print(f"v0.5 has {len(seeds['seeds'])} stress prompts × 23 edit_codes × 47 (code, slice) cells")

For the eval split that LoRA / image-edit models should be tested on:

import json

with open("examples/seeds/diag_bench_v05_eval.json") as f:
    seeds_eval = json.load(f)
# 191 prompts, deterministic split at random_seed=20260529
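
A deterministic split of this kind is usually produced with a seeded shuffle. The sketch below shows one common way to do it; the procedure actually used to build diag_bench_v05_eval.json is not described here, so `deterministic_split` and its arguments are assumptions for illustration only, and it will not necessarily reproduce the shipped split.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def deterministic_split(
    items: Sequence[T], n_eval: int, seed: int
) -> Tuple[List[T], List[T]]:
    """Return (train, eval), choosing eval items via a seeded shuffle of indices."""
    if not 0 <= n_eval <= len(items):
        raise ValueError("n_eval must be between 0 and len(items)")
    order = list(range(len(items)))
    # A private Random instance keeps the split independent of global RNG state.
    random.Random(seed).shuffle(order)
    eval_idx = set(order[:n_eval])
    train = [x for i, x in enumerate(items) if i not in eval_idx]
    held_out = [x for i, x in enumerate(items) if i in eval_idx]
    return train, held_out
```

Calling it twice with the same seed yields identical partitions, so anyone evaluating a model can check they are testing on the same held-out prompts.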

Citation

@techreport{liang2026deltaharness,
  title={\Delta-Harness: An Agentic Data Harness for Generative Visual and World Models},
  author={Liang, Hao},
  year={2026},
  institution={OpenDCAI},
  url={https://github.com/haolpku/DeltaSynth}
}

License

Code: MIT. Δ-DiagBench prompts and labels: CC-BY-4.0. Generated images are subject to the OpenAI proxy's terms of service.


Status

  • §E1 pilot + §E1b ablation + §E1c α-scan: ✅ complete (pilot-full n=1200, ablation n=600, α-scan n=899)
  • §E2 verifier alignment: ✅ cross-VLM panel (5 VLMs × 98 samples) + author labels (n=97 with 35 score adjustments)
  • §E3 Step 1 (data generation): ✅ 4 pools × ~1000 GoodCases each, exported to HF
  • §E3 Step 2 (LoRA training): ⏳ awaiting GPU (see docs/E3_GPU_RUNBOOK.md)
  • §E3 Step 3 (held-out eval): ⏳ harness ready (build_eval_set.py + run_lora_eval.py + score_eval.py)
  • §E4 video pilot (n=10 Kling): ✅ done; n=50 scale-up blocked on Kling proxy upstream
