A progressive training pipeline for scientific reasoning, modeled after OpenAI's FrontierScience benchmark. It trains language models through three stages: scientific discovery, rubric-based verification, and knowledge-graph self-improvement.
```
Stage 1: Discovery         Stage 2: Verification        Stage 3: Graph Improvement
┌─────────────────┐        ┌─────────────────────┐       ┌──────────────────────┐
│ Propose novel   │        │ Rubric-based eval   │       │ Agent co-evolves     │
│ hypotheses,     │───────>│ with discovery +    │──────>│ with its knowledge   │
│ mechanisms,     │        │ verification        │       │ graph via 7 action   │
│ experiments     │        │ curriculum          │       │ types + 5-stage gate │
└─────────────────┘        └─────────────────────┘       └──────────────────────┘
5 tasks                    11 task types                 7 action types
RL + SFT                   Alternating curriculum        Three-layer arch
```
Key innovation: Each stage builds on the previous checkpoint. The model progressively learns to propose ideas (Stage 1), evaluate them rigorously (Stage 2), and then improve the knowledge graph it was trained on (Stage 3).
```shell
# Clone and install
git clone https://github.com/YOUR_USERNAME/frontierscience-training.git
cd frontierscience-training
pip install -e ".[dev]"

# Set up credentials
cp .env.example .env
# Edit .env with your API keys

# Run tests (no external dependencies needed)
pytest tests/ -v

# Start training (requires Tinker SDK)
python -m src.discovery.train --config configs/discovery.yaml --mode sft
```

The model learns to propose novel scientific ideas grounded in a knowledge graph.
5 Discovery Tasks: hypothesis proposal, mechanism proposal, experiment design, open question elaboration, rival hypothesis generation
Training: SFT on 2,690 samples, then GRPO-style RL with rewards for structure, coherence, grounding, novelty, and informativeness.
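The five RL reward components could be combined as a weighted sum per sample; the sketch below is illustrative only — the weights, scorer names, and aggregation are assumptions, not the repo's actual reward code.

```python
# Hypothetical composite reward for Stage 1 RL. The five axes mirror the
# reward components listed above; the weights are made-up placeholders.
WEIGHTS = {
    "structure": 0.2,
    "coherence": 0.2,
    "grounding": 0.25,
    "novelty": 0.2,
    "informativeness": 0.15,
}

def composite_reward(scores: dict[str, float]) -> float:
    """Combine per-axis scores in [0, 1] into a single scalar reward."""
    assert set(scores) == set(WEIGHTS), "expect one score per reward axis"
    return sum(WEIGHTS[axis] * scores[axis] for axis in WEIGHTS)

reward = composite_reward({
    "structure": 1.0,        # e.g. output parsed into the expected sections
    "coherence": 0.8,
    "grounding": 0.9,        # e.g. claims supported by knowledge-graph nodes
    "novelty": 0.6,
    "informativeness": 0.7,
})  # ≈ 0.81
```

A weighted sum keeps the reward bounded in [0, 1] whenever each axis score is, which plays nicely with GRPO-style advantage normalization.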
```shell
python -m src.discovery.train --config configs/discovery.yaml --mode rl --batches 25
```

Alternating curriculum between discovery and verification modes, inspired by the FrontierScience 10-point rubric system.
6 Discovery Rubric Axes: testability, coherence, novelty value, assumption clarity, pressure point relevance, comparative advantage
11 Verification Tasks: consistency check, evidence sufficiency, contradiction audit, confidence calibration, and more.
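A rubric verdict over the six discovery axes might look like the sketch below. The 0–10 scale echoes the FrontierScience 10-point rubric, but the mean-based aggregation and the pass threshold are assumptions for illustration.

```python
# Illustrative rubric scorer -- axis names come from the six discovery
# rubric axes above; aggregation and threshold are hypothetical.
DISCOVERY_AXES = (
    "testability", "coherence", "novelty_value",
    "assumption_clarity", "pressure_point_relevance", "comparative_advantage",
)

def rubric_verdict(axis_scores: dict[str, int], pass_threshold: float = 7.0):
    """Average 0-10 axis scores; accept if the mean clears the threshold."""
    assert set(axis_scores) == set(DISCOVERY_AXES)
    assert all(0 <= s <= 10 for s in axis_scores.values())
    mean = sum(axis_scores.values()) / len(axis_scores)
    return mean, mean >= pass_threshold

mean, accepted = rubric_verdict({
    "testability": 8, "coherence": 9, "novelty_value": 6,
    "assumption_clarity": 7, "pressure_point_relevance": 8,
    "comparative_advantage": 7,
})  # mean 7.5, accepted
```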
```shell
python -m src.verification.train --config configs/verification.yaml --batches 150
```

The model proposes modifications to its own knowledge graph, evaluated through a 5-stage gating pipeline.
7 Action Types: propose_edge, add_node, update_node, split_node, merge_nodes, retype_node, deprecate_node
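A graph-edit proposal for one of these action types might be represented as a small structured record; the field names below (`source`, `target`, `relationship`, `evidence`) are hypothetical, chosen only to make the action concrete.

```python
# Hypothetical shape of a single graph-edit action. The seven action
# types come from the list above; all field names are illustrative.
ACTION_TYPES = {
    "propose_edge", "add_node", "update_node", "split_node",
    "merge_nodes", "retype_node", "deprecate_node",
}

action = {
    "type": "propose_edge",
    "source": "node:crispr_cas9",       # hypothetical node IDs
    "target": "node:dna_repair",
    "relationship": "MODULATES",        # hypothetical relationship type
    "evidence": "supporting passage id the validators can check",
}

assert action["type"] in ACTION_TYPES
```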
5-Stage Validation:
- Schema Compliance (hard gate)
- Semantic Plausibility (embedding similarity >= 0.60)
- Graded Hallucination Detection (fabricated/misattributed/speculative)
- Structural Validation (orphan, cycle, duplicate checks)
- Ensemble Validator with frozen critic
Three-Layer Architecture: Speculative (in-memory) -> Probationary (Neo4j, isolated) -> Main Graph
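The five validation stages above behave like a short-circuiting pipeline: the first failing gate rejects the proposal with a reason. The sketch below is a simplification — each check here is a stand-in predicate (the real semantic stage compares embeddings against the 0.60 threshold, the real ensemble stage queries a frozen critic).

```python
# Simplified 5-stage gating pipeline. Stage names match the validation
# stages above; the checks themselves are illustrative stand-ins.
def schema_compliance(p):      return "type" in p and "payload" in p   # hard gate
def semantic_plausibility(p):  return p.get("similarity", 0.0) >= 0.60
def hallucination_check(p):    return p.get("grade") not in {"fabricated", "misattributed"}
def structural_validation(p):  return not p.get("creates_cycle", False)
def ensemble_validator(p):     return p.get("critic_score", 0.0) >= 0.5  # hypothetical cutoff

GATES = [
    ("schema_compliance", schema_compliance),
    ("semantic_plausibility", semantic_plausibility),
    ("hallucination_check", hallucination_check),
    ("structural_validation", structural_validation),
    ("ensemble_validator", ensemble_validator),
]

def validate(proposal: dict):
    """Return (accepted, failed_stage); short-circuits on first failure."""
    for name, gate in GATES:
        if not gate(proposal):
            return False, name
    return True, None

ok, failed = validate({
    "type": "propose_edge", "payload": {},
    "similarity": 0.72, "grade": "supported",
    "creates_cycle": False, "critic_score": 0.8,
})  # accepted
```

Ordering the gates from cheapest (schema) to most expensive (ensemble critic) means most bad proposals are rejected before any model call is needed.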
```shell
python -m src.graph_improvement.train \
    --config configs/graph_improvement.yaml \
    --mode rl --batches 100
```

| Stage | Key Metric | Achieved |
|---|---|---|
| Discovery | Validity Rate | 98.9% |
| Verification (Discovery) | Acceptance Rate | 98% |
| Verification (Verification) | Pass Rate | 60% |
| Graph Improvement | Final Reward | 0.814 |
| Graph Improvement | Parse Rate | 98.6% |
| Graph Improvement | Validation Rate | 90.8% |
- Python >= 3.10
- Tinker SDK for model training
- Neo4j >= 5.0 (optional, for graph-aware features)
- OpenAI API key (for embedding generation)
- Anthropic API key (optional, for target regeneration)
See docs/SETUP.md for detailed setup instructions.
- Setup Guide - Environment, API keys, Neo4j
- Architecture - Pipeline design, reward systems, rubric philosophy
- Training Guide - Per-stage commands and expected metrics
- FrontierScience Alignment - How this maps to the FrontierScience benchmark
- Data Validation - Anti-patterns and validation rules
- Graph Schema - Full knowledge graph schema (28 node types, 35 relationships)
```
frontierscience-training/
├── src/
│   ├── common/             # Shared utilities (embeddings, guards, validation)
│   ├── discovery/          # Stage 1: Scientific Discovery
│   ├── verification/       # Stage 2: Rubric-Based Verification
│   └── graph_improvement/  # Stage 3: Graph Self-Improvement
├── configs/                # YAML configs per stage
├── data/                   # Training data (100MB total)
├── tests/                  # Test suite (82 tests)
└── docs/                   # Documentation
```
Apache 2.0. See LICENSE.