Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination
Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the remarkable coding abilities of Large Language Models (LLMs). However, the scalability of RLVR is severely constrained by the scarcity of sufficiently challenging verifiable code tasks that target the edge of the model's competence. Prior studies often rely on heuristic seed expansion for data synthesis, which limits both novelty and difficulty. Consequently, the training value of such data fails to scale proportionally with the volume of data synthesized. To address this, we propose Atomic Decomposition and Recombination (ADR), a framework that decomposes seed problems into atomic elements and recombines them in a controlled way, thereby enabling the generation of genuinely novel and challenging verifiable code tasks. Experiments and analysis demonstrate that ADR achieves superior originality, difficulty, diversity, and test quality over existing baselines, and consistently delivers greater RLVR improvements in code ability across diverse downstream domains, including algorithmic programming, tool usage, and data science. Our work sheds light on a new paradigm for novel code task synthesis and scalable RLVR training.
The pipeline decomposes seed problems into atomic elements, recombines them to form novel frameworks, then synthesizes verifiable tasks with solutions and test cases.
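
As a rough illustration of the decompose-then-recombine idea, the toy sketch below keeps one anchor element from each seed and draws the remaining elements from other seeds. It is a simplification under stated assumptions, not the repository's implementation: the function `recombine`, the dictionary layout, and the sample values are hypothetical, and the real steps rely on LLM calls rather than random sampling.

```python
import random


def recombine(element_sets, core_element, num_samples=1, seed=0):
    """Toy recombination: keep each seed's core element, resample the rest.

    element_sets: list of dicts mapping element name -> extracted value
    (hypothetical layout). Returns a list of new element dicts.
    """
    rng = random.Random(seed)
    names = list(element_sets[0].keys())
    results = []
    for idx, base in enumerate(element_sets):
        # Candidate donors are all seeds except the one providing the anchor.
        others = [e for j, e in enumerate(element_sets) if j != idx]
        for _ in range(num_samples):
            new_set = {core_element: base[core_element]}
            for name in names:
                if name == core_element:
                    continue
                # Draw every non-core element from a different seed problem.
                donor = rng.choice(others) if others else base
                new_set[name] = donor[name]
            results.append(new_set)
    return results


# Hypothetical extracted elements for three seed problems.
seeds = [
    {"Core Algorithm Idea": "binary search", "Story Background": "library", "Difficulty Level": "easy"},
    {"Core Algorithm Idea": "union-find", "Story Background": "road network", "Difficulty Level": "hard"},
    {"Core Algorithm Idea": "two pointers", "Story Background": "playlist", "Difficulty Level": "medium"},
]
for combo in recombine(seeds, "Core Algorithm Idea", num_samples=1):
    print(combo)
```

Each printed dictionary pairs a seed's core idea with background and difficulty values borrowed from other seeds, which is the kind of element set a later task-design stage would start from.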
```bash
pip install -r requirements.txt
```

| Setting | Environment variable | CLI override | Default |
|---|---|---|---|
| LLM API key | `OPENAI_API_KEY` | `--api_key` | none (required) |
| LLM API base URL | `OPENAI_BASE_URL` | `--base_url` | `https://api.deepseek.com/v1` |
| Sandbox API URL | `SANDBOX_URL` | `--sandbox_url` | none (required for steps 0, 6, 7) |

```bash
export OPENAI_API_KEY=your_api_key_here
export OPENAI_BASE_URL=https://api.deepseek.com/v1  # optional, this is the default
export SANDBOX_URL=http://your-sandbox-api/run      # needed for validation steps
```

Every script's `--help` groups its arguments into Model & API, Sandbox (where relevant), I/O paths, Schema (step 0 only), Processing, and Filtering.
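
The precedence described above (CLI flag, then environment variable, then built-in default) can be expressed with `argparse` roughly as follows. This is a minimal sketch, not the scripts' actual parser: the helper names `build_parser` and `resolve_config` are hypothetical, and only the flag names, environment variable names, and default URL come from the table.

```python
import argparse
import os

DEFAULT_BASE_URL = "https://api.deepseek.com/v1"


def build_parser(env=None):
    """Build a parser whose defaults fall back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="ADR configuration sketch")

    api = parser.add_argument_group("Model & API")
    api.add_argument("--api_key", default=env.get("OPENAI_API_KEY"))
    api.add_argument("--base_url", default=env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL))

    sandbox = parser.add_argument_group("Sandbox")
    sandbox.add_argument("--sandbox_url", default=env.get("SANDBOX_URL"))
    return parser


def resolve_config(argv=None, env=None):
    """Parse argv; an explicit flag wins over the environment, which wins over defaults."""
    args = build_parser(env).parse_args(argv)
    if not args.api_key:
        raise SystemExit("Set OPENAI_API_KEY or pass --api_key")
    return args


# Example: the key comes from the CLI, the base URL from the built-in default.
cfg = resolve_config(["--api_key", "demo-key"], env={})
print(cfg.base_url)
```

Passing the environment as a plain dictionary keeps the sketch easy to test; a real script would simply rely on `os.environ`.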
`0_pipeline.py` runs the full synthesis end-to-end (extract → recombine → design → solve → validate):

```bash
python 0_pipeline.py \
  --seed_data_path /path/to/seed_data.json \
  --output_path ./iter_1/output.json \
  --iter_num iter_1 \
  --sandbox_url http://your-sandbox-api/run
```

| Argument | Group | Required | Default | Description |
|---|---|---|---|---|
| `--model` | Model & API | No | `deepseek-chat` | LLM model identifier. A filename-safe nickname is derived from it (whitespace and illegal characters replaced with `_`) for output naming |
| `--api_key` | Model & API | No | `$OPENAI_API_KEY` | LLM API key |
| `--base_url` | Model & API | No | `$OPENAI_BASE_URL` or `https://api.deepseek.com/v1` | LLM API base URL |
| `--sandbox_url` | Sandbox | Yes\* | `$SANDBOX_URL` | Code execution sandbox API URL |
| `--seed_data_path` | I/O | Yes | none | Seed data JSON (list of problem objects) |
| `--output_path` | I/O | Yes | none | Final output JSON path |
| `--iter_num` | I/O | Yes | none | Subdirectory name for intermediate outputs |
| `--schema_path` | Schema | No | `schemas/algorithm.json` | Element schema definition file |
| `--core_element` | Schema | No | first element in schema | Anchor element for Step 2 recombination |
| `--num_samples` | Processing | No | `1` | Samples generated per seed item in Step 2 |
| `--max_workers` | Processing | No | `8` | Parallel worker count |

\* Required, but satisfied by the `SANDBOX_URL` env var if not passed explicitly.

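
For the `--model` nickname mentioned in the table, a sanitizer along the following lines would produce a filename-safe string. This is only a sketch: the function name `model_nickname` is hypothetical, and the exact set of characters treated as illegal is an assumption (here, whitespace plus characters commonly rejected in file names).

```python
import re

# Assumed "illegal" set: whitespace plus < > : " / \ | ? *
_UNSAFE = re.compile(r'[\s<>:"/\\|?*]')


def model_nickname(model: str) -> str:
    """Replace whitespace and filename-unsafe characters with underscores."""
    return _UNSAFE.sub("_", model.strip())


print(model_nickname("deepseek-chat"))
print(model_nickname("org/model name:v1"))
```

A model identifier that is already safe, such as `deepseek-chat`, passes through unchanged.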
The numbered scripts let you run (or re-run) each stage separately. They share the same Model & API / Sandbox configuration described above.
| Step | Script | Purpose |
|---|---|---|
| 1 | `1_element_schema_optimization.py` | Analyze element entropy & conditional mutual information, then propose schema refinements |
| 2 | `2_extract_elements.py` | Extract atomic elements from seed coding problems |
| 3 | `3_generated_new_elements.py` | Recombine extracted elements into novel element sets |
| 4 | `4_design_task.py` | Design new coding tasks from element sets |
| 5 | `5_generate_solution_test.py` | Generate reference solutions and test-case generators |
| 6 | `6_valid_problem.py` | Validate problems by running solutions against generated tests (sandbox) |
| 7 | `7_adversarial_refinement.py` | Strengthen test-case generators via near-miss adversarial refinement (sandbox) |

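
To make the quantities in step 1 concrete, the sketch below computes plug-in (count-based) estimates of Shannon entropy and conditional mutual information over extracted element values. It is a generic textbook estimator, not code from `1_element_schema_optimization.py`; the function names, the row layout, and the sample data are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of hashable samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_mutual_information(rows, x, y, z):
    """Plug-in estimate of I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    h_xz = entropy([(r[x], r[z]) for r in rows])
    h_yz = entropy([(r[y], r[z]) for r in rows])
    h_xyz = entropy([(r[x], r[y], r[z]) for r in rows])
    h_z = entropy([r[z] for r in rows])
    return h_xz + h_yz - h_xyz - h_z


# Hypothetical extracted elements: one dict per seed problem.
rows = [
    {"algo": "dp", "story": "shop", "level": "hard"},
    {"algo": "dp", "story": "maze", "level": "hard"},
    {"algo": "greedy", "story": "shop", "level": "hard"},
    {"algo": "greedy", "story": "maze", "level": "hard"},
]
print(entropy([r["algo"] for r in rows]))
print(conditional_mutual_information(rows, "algo", "story", "level"))
```

Under this reading, an element with near-zero entropy carries little information across the seed set, and a high conditional mutual information between two elements suggests they are redundant given a third; both are plausible signals for refining a schema.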
Examples:

```bash
# Step 2: extract elements
python 2_extract_elements.py \
  --input_path seeds.json --output_path extracted.json

# Step 6: validate solutions (requires SANDBOX_URL or --sandbox_url)
python 6_valid_problem.py \
  --input_path solutions.json \
  --output_path valid.json \
  --debug_output_path valid_debug.json

# Step 7: adversarial test refinement (requires SANDBOX_URL or --sandbox_url)
python 7_adversarial_refinement.py \
  --input_path valid.json --output_path refined.json
```

Run `python <script>.py --help` for the full, grouped argument list of any step.

The default schema (`schemas/algorithm.json`) defines four elements for algorithmic tasks: Core Algorithm Idea, Story Background, Strategy Diversity, and Difficulty Level. Additional domain schemas are available at `schemas/data_science.json` and `schemas/tool_usage.json`. To adapt ADR to a different task domain, create a new schema file:

```json
[
  {"name": "Element A", "definition": "..."},
  {"name": "Element B", "definition": "..."}
]
```

Then pass it via `--schema_path your_schema.json`.
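
Before running the pipeline with a custom schema, it can help to sanity-check the file. The sketch below loads a schema in the format shown above and picks the core element, defaulting to the first entry as the `--core_element` default suggests. The function `load_schema` and its validation rules are assumptions for illustration, not the helpers in `pipeline_utils.py`.

```python
import json
import os
import tempfile


def load_schema(path, core_element=None):
    """Load an element schema (a list of name/definition objects) and choose the core element."""
    with open(path, encoding="utf-8") as f:
        schema = json.load(f)
    if not isinstance(schema, list) or not schema:
        raise ValueError("schema must be a non-empty JSON list")
    names = []
    for entry in schema:
        if not isinstance(entry, dict) or not {"name", "definition"}.issubset(entry):
            raise ValueError(f"each element needs 'name' and 'definition': {entry!r}")
        names.append(entry["name"])
    # Default to the first element, mirroring the documented --core_element default.
    core = core_element if core_element is not None else names[0]
    if core not in names:
        raise ValueError(f"core element {core!r} is not defined in the schema")
    return schema, core


# Usage with a throwaway schema file.
demo = [
    {"name": "Element A", "definition": "..."},
    {"name": "Element B", "definition": "..."},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(demo, tmp)
    demo_path = tmp.name

schema, core = load_schema(demo_path)
print(core)
os.remove(demo_path)
```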
```
ADR/
├── 0_pipeline.py                       # Full end-to-end pipeline (entry point)
├── 1_element_schema_optimization.py    # Step 1: schema entropy/CMI analysis
├── 2_extract_elements.py               # Step 2: extract atomic elements
├── 3_generated_new_elements.py         # Step 3: recombine elements
├── 4_design_task.py                    # Step 4: design tasks
├── 5_generate_solution_test.py         # Step 5: generate solutions + tests
├── 6_valid_problem.py                  # Step 6: validate via sandbox
├── 7_adversarial_refinement.py         # Step 7: adversarial test refinement
├── requirements.txt                    # Pinned dependencies
├── schemas/
│   ├── algorithm.json                  # Default algorithmic element schema
│   ├── data_science.json               # Data science element schema
│   └── tool_usage.json                 # Tool usage element schema
├── utils/                              # Non-synthesis utility clients
│   ├── openai_utils.py                 # LLM call helpers (incl. safe_llm_call)
│   └── call_sandbox_api.py             # Sandbox execution client
├── utils_synthetic/                    # Data-synthesis prompts and helpers
│   ├── prompts_synthetic_algorithm.py  # Algorithmic prompts + schema/adversarial prompts
│   ├── prompts_synthetic_data_science.py
│   ├── prompts_synthetic_tool_usage.py
│   ├── exec_utils.py                   # Generated-code timeout helper
│   ├── pipeline_utils.py               # Element schema + parsing/validation/IO helpers
│   └── exp_*.py                        # Per-step parsing/prompt-building helpers
└── assets/
```

