shuruheel/frontierscience-training

FrontierScience Training

A progressive training pipeline for scientific reasoning, modeled after OpenAI's FrontierScience benchmark. Trains language models through three stages: scientific discovery, rubric-based verification, and knowledge graph self-improvement.

Pipeline Overview

Stage 1: Discovery          Stage 2: Verification        Stage 3: Graph Improvement
┌──────────────────┐        ┌─────────────────────┐       ┌──────────────────────┐
│ Propose novel    │        │ Rubric-based eval   │       │ Agent co-evolves     │
│ hypotheses,      │───────>│ with discovery +    │──────>│ with its knowledge   │
│ mechanisms,      │        │ verification        │       │ graph via 7 action   │
│ experiments      │        │ curriculum          │       │ types + 5-stage gate │
└──────────────────┘        └─────────────────────┘       └──────────────────────┘
     5 tasks                  11 task types                  7 action types
     RL + SFT                 Alternating curriculum         Three-layer arch

Key innovation: Each stage builds on the previous checkpoint. The model progressively learns to propose ideas (Stage 1), evaluate them rigorously (Stage 2), and then improve the knowledge graph it was trained on (Stage 3).
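The checkpoint chaining can be pictured as a simple loop where each stage is initialised from the previous stage's output. This is a minimal sketch: `Stage`, `run_pipeline` and the string "checkpoints" are hypothetical stand-ins, not the repository's actual entry points.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical: a stage takes an optional init checkpoint and returns the checkpoint it produces.
StageFn = Callable[[Optional[str]], str]


@dataclass
class Stage:
    name: str
    train: StageFn


def run_pipeline(stages: list[Stage], base_checkpoint: Optional[str] = None) -> list[str]:
    """Run stages in order, initialising each one from the previous stage's checkpoint."""
    checkpoint = base_checkpoint
    produced = []
    for stage in stages:
        checkpoint = stage.train(checkpoint)
        produced.append(checkpoint)
    return produced


def fake_stage(name: str) -> StageFn:
    # Toy trainer that only records which checkpoint it started from.
    return lambda init: f"{name}(from={init})"


stages = [Stage(n, fake_stage(n)) for n in ("discovery", "verification", "graph_improvement")]
print(run_pipeline(stages, base_checkpoint="base"))
```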

Quick Start

# Clone and install
git clone https://github.com/shuruheel/frontierscience-training.git
cd frontierscience-training
pip install -e ".[dev]"

# Set up credentials
cp .env.example .env
# Edit .env with your API keys

# Run tests (no external dependencies needed)
pytest tests/ -v

# Start training (requires Tinker SDK)
python -m src.discovery.train --config configs/discovery.yaml --mode sft

Stages

Stage 1: Scientific Discovery

The model learns to propose novel scientific ideas grounded in a knowledge graph.

5 Discovery Tasks: hypothesis proposal, mechanism proposal, experiment design, open question elaboration, rival hypothesis generation

Training: SFT on 2,690 samples, then GRPO-style RL with rewards for structure, coherence, grounding, novelty, and informativeness.

python -m src.discovery.train --config configs/discovery.yaml --mode rl --batches 25
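GRPO-style RL scores a group of completions sampled for the same prompt. It then normalises each reward against that group's mean and standard deviation, so no separate value model is needed. The sketch below shows that idea with the five reward components listed above.

The equal weights, the [0, 1] score range and the function names are assumptions for illustration, not the repository's reward code.

```python
import statistics

# Reward components named in the README; the equal weights are an assumption.
WEIGHTS = {
    "structure": 0.2,
    "coherence": 0.2,
    "grounding": 0.2,
    "novelty": 0.2,
    "informativeness": 0.2,
}


def composite_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-component scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions sampled for the same prompt (made-up scores).
group = [
    {"structure": 1.0, "coherence": 0.8, "grounding": 0.9, "novelty": 0.5, "informativeness": 0.7},
    {"structure": 1.0, "coherence": 0.6, "grounding": 0.4, "novelty": 0.7, "informativeness": 0.5},
    {"structure": 0.0, "coherence": 0.3, "grounding": 0.2, "novelty": 0.9, "informativeness": 0.3},
    {"structure": 1.0, "coherence": 0.9, "grounding": 0.8, "novelty": 0.6, "informativeness": 0.8},
]
rewards = [composite_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
```

Completions that beat their own group get positive advantages and are reinforced. Below-average ones are pushed down, whatever the absolute reward scale.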

Stage 2: Rubric-Based Verification

Alternating curriculum between discovery and verification modes, inspired by the FrontierScience 10-point rubric system.

6 Discovery Rubric Axes: testability, coherence, novelty value, assumption clarity, pressure point relevance, comparative advantage
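One way to map six axes onto a 10-point rubric is to give each axis a share of the 10 points and award a fraction of that share. The point allocation and the acceptance threshold below are hypothetical; the README does not specify them.

```python
# Hypothetical point allocation across the six discovery axes (sums to 10).
RUBRIC = {
    "testability": 2.0,
    "coherence": 2.0,
    "novelty_value": 2.0,
    "assumption_clarity": 1.5,
    "pressure_point_relevance": 1.5,
    "comparative_advantage": 1.0,
}
assert sum(RUBRIC.values()) == 10.0

ACCEPT_THRESHOLD = 7.0  # hypothetical acceptance cut-off out of 10


def rubric_score(fractions: dict[str, float]) -> float:
    """Sum awarded points; `fractions` maps each axis to the share of its points earned."""
    total = 0.0
    for axis, max_points in RUBRIC.items():
        frac = min(max(fractions.get(axis, 0.0), 0.0), 1.0)  # clamp to [0, 1]; missing axis scores 0
        total += frac * max_points
    return total


def accepted(fractions: dict[str, float]) -> bool:
    return rubric_score(fractions) >= ACCEPT_THRESHOLD


example = {
    "testability": 1.0,
    "coherence": 1.0,
    "novelty_value": 0.5,
    "assumption_clarity": 1.0,
    "pressure_point_relevance": 0.0,
    "comparative_advantage": 1.0,
}
```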

11 Verification Tasks: consistency check, evidence sufficiency, contradiction audit, confidence calibration, and more.

python -m src.verification.train --config configs/verification.yaml --batches 150
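The alternating curriculum can be thought of as a schedule that assigns a mode to each batch. This sketch assumes the mode switches every `period` batches; `alternating_schedule` and `period` are hypothetical and may not match how the repository interleaves modes.

```python
from typing import Iterator


def alternating_schedule(num_batches: int, period: int = 1) -> Iterator[str]:
    """Yield the training mode for each batch, switching every `period` batches."""
    modes = ("discovery", "verification")
    for i in range(num_batches):
        yield modes[(i // period) % 2]


schedule = list(alternating_schedule(6, period=2))
```

With `period=1` over 150 batches, each mode gets exactly 75 batches. A larger `period` trades smoother per-mode optimisation for less frequent switching.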

Stage 3: Graph Self-Improvement

The model proposes modifications to its own knowledge graph, evaluated through a 5-stage gating pipeline.

7 Action Types: propose_edge, add_node, update_node, split_node, merge_nodes, retype_node, deprecate_node
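The seven action types map naturally onto an enum, and the first gate (schema compliance) can reject unknown actions or missing arguments before any expensive check runs. The action names come from the README. The per-action required fields and the `{"action": ..., "args": ...}` payload shape are assumptions.

```python
from enum import Enum


class ActionType(str, Enum):
    PROPOSE_EDGE = "propose_edge"
    ADD_NODE = "add_node"
    UPDATE_NODE = "update_node"
    SPLIT_NODE = "split_node"
    MERGE_NODES = "merge_nodes"
    RETYPE_NODE = "retype_node"
    DEPRECATE_NODE = "deprecate_node"


# Hypothetical required fields per action; the real schema may differ.
REQUIRED_FIELDS = {
    ActionType.PROPOSE_EDGE: {"source", "target", "relation"},
    ActionType.ADD_NODE: {"name", "node_type"},
    ActionType.UPDATE_NODE: {"node_id", "changes"},
    ActionType.SPLIT_NODE: {"node_id", "into"},
    ActionType.MERGE_NODES: {"node_ids", "merged_name"},
    ActionType.RETYPE_NODE: {"node_id", "new_type"},
    ActionType.DEPRECATE_NODE: {"node_id", "reason"},
}


def parse_action(payload: dict) -> tuple[ActionType, dict]:
    """Hard schema gate: raise ValueError on an unknown action type or missing fields."""
    try:
        action = ActionType(payload["action"])
    except (KeyError, ValueError) as exc:
        raise ValueError(f"unknown or missing action type: {payload.get('action')!r}") from exc
    args = payload.get("args", {})
    missing = REQUIRED_FIELDS[action] - set(args)
    if missing:
        raise ValueError(f"{action.value} missing fields: {sorted(missing)}")
    return action, args
```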

5-Stage Validation:

  1. Schema Compliance (hard gate)
  2. Semantic Plausibility (embedding similarity >= 0.60)
  3. Graded Hallucination Detection (fabricated/misattributed/speculative)
  4. Structural Validation (orphan, cycle, duplicate checks)
  5. Ensemble Validator with frozen critic
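The five gates run in order, and a proposal is rejected at the first gate it fails. Below is a minimal sketch for an edge proposal. Several parts are assumptions, not the repository's validators:

- Semantic plausibility is the cosine similarity between the source and target node embeddings.
- "Fabricated" and "misattributed" grades are rejected, while "speculative" passes.
- The structural check requires both endpoints to exist, the edge to be new, and no cycle to form.
- The hallucination grader and the ensemble critic are injected as stub callables.

The ensemble threshold of 0.5 is also hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class EdgeProposal:
    source: str
    target: str
    relation: str
    source_vec: list[float]
    target_vec: list[float]


@dataclass
class Graph:
    nodes: set[str]
    edges: set[tuple[str, str, str]] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def reaches(graph: Graph, start: str, goal: str) -> bool:
    """Iterative DFS over existing edges (relation type ignored)."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(t for s, t, _ in graph.edges if s == node)
    return False


def gate(
    p: EdgeProposal,
    graph: Graph,
    hallucination_grade: Callable[[EdgeProposal], str],
    ensemble_score: Callable[[EdgeProposal], float],
    sim_threshold: float = 0.60,
    ensemble_threshold: float = 0.5,
) -> tuple[bool, Optional[str]]:
    """Return (passed, name of first failing gate or None)."""
    checks = [
        ("schema", lambda: bool(p.source and p.target and p.relation)),
        ("semantic", lambda: cosine(p.source_vec, p.target_vec) >= sim_threshold),
        ("hallucination", lambda: hallucination_grade(p) not in {"fabricated", "misattributed"}),
        ("structural", lambda: p.source in graph.nodes and p.target in graph.nodes  # no orphans
            and (p.source, p.target, p.relation) not in graph.edges               # no duplicates
            and not reaches(graph, p.target, p.source)),                          # no cycles
        ("ensemble", lambda: ensemble_score(p) >= ensemble_threshold),
    ]
    for name, check in checks:
        if not check():
            return False, name
    return True, None
```

Ordering the cheap deterministic gates before the model-based critic keeps most rejections inexpensive.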

Three-Layer Architecture: Speculative (in-memory) -> Probationary (Neo4j, isolated) -> Main Graph
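The three layers can be modelled as a small state machine:

- A new proposal starts as speculative.
- Passing the gate moves it to probationary.
- Failing the gate discards it.
- Enough later confirmations promote it into the main graph.

In this sketch a dict stands in for both the in-memory layer and the isolated Neo4j store. The "N confirmations" promotion rule and all names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    SPECULATIVE = "speculative"    # in-memory scratch space
    PROBATIONARY = "probationary"  # isolated store (Neo4j in the README; a dict here)
    MAIN = "main"                  # the graph the model trains against


@dataclass
class LayeredGraph:
    promote_after: int = 3  # hypothetical: confirmations needed to reach MAIN
    layer_of: dict[str, Layer] = field(default_factory=dict)
    confirmations: dict[str, int] = field(default_factory=dict)

    def propose(self, item_id: str) -> None:
        self.layer_of[item_id] = Layer.SPECULATIVE

    def record_gate_result(self, item_id: str, passed: bool) -> None:
        """Passing the gate moves SPECULATIVE -> PROBATIONARY; failing discards the item."""
        if self.layer_of.get(item_id) is not Layer.SPECULATIVE:
            return
        if passed:
            self.layer_of[item_id] = Layer.PROBATIONARY
            self.confirmations[item_id] = 0
        else:
            del self.layer_of[item_id]

    def confirm(self, item_id: str) -> None:
        """Count a confirmation; promote to MAIN once the threshold is reached."""
        if self.layer_of.get(item_id) is not Layer.PROBATIONARY:
            return
        self.confirmations[item_id] += 1
        if self.confirmations[item_id] >= self.promote_after:
            self.layer_of[item_id] = Layer.MAIN

    def main_items(self) -> set[str]:
        return {k for k, v in self.layer_of.items() if v is Layer.MAIN}
```

Keeping probationary changes out of the main graph means a bad self-edit cannot contaminate the training signal until it has survived further checks.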

python -m src.graph_improvement.train \
    --config configs/graph_improvement.yaml \
    --mode rl --batches 100

Training Results

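| Stage                            | Key Metric      | Achieved |
|----------------------------------|-----------------|----------|
| Discovery                        | Validity rate   | 98.9%    |
| Verification (discovery mode)    | Acceptance rate | 98%      |
| Verification (verification mode) | Pass rate       | 60%      |
| Graph Improvement                | Final reward    | 0.814    |
| Graph Improvement                | Parse rate      | 98.6%    |
| Graph Improvement                | Validation rate | 90.8%    |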

Requirements

  • Python >= 3.10
  • Tinker SDK for model training
  • Neo4j >= 5.0 (optional, for graph-aware features)
  • OpenAI API key (for embedding generation)
  • Anthropic API key (optional, for target regeneration)

See docs/SETUP.md for detailed setup instructions.

Project Structure

frontierscience-training/
├── src/
│   ├── common/              # Shared utilities (embeddings, guards, validation)
│   ├── discovery/           # Stage 1: Scientific Discovery
│   ├── verification/        # Stage 2: Rubric-Based Verification
│   └── graph_improvement/   # Stage 3: Graph Self-Improvement
├── configs/                 # YAML configs per stage
├── data/                    # Training data (100MB total)
├── tests/                   # Test suite (82 tests)
└── docs/                    # Documentation

License

Apache 2.0. See LICENSE.
