SimScout (lonexreb/sim-scout)

Automated Literature-to-Simulation Pipeline for Automotive CFD

From published paper to benchmarked simulation in hours, not weeks.

The Problem

A new turbulence model paper drops on arXiv. Today, evaluating it against your vehicle design means:

Read & understand the paper (1-2 days)
Implement the model in OpenFOAM C++ (3-5 days)
Set up benchmark cases and mesh (1-2 days)
Run simulations and post-process (1-2 days)
Compare against existing models (1 day)

Total: 7-12 working days, roughly 1-2 weeks, per paper. Most papers never get evaluated.

The Solution

SimScout automates the entire pipeline:

Paper published on arXiv  ──►  Structured extraction  ──►  C++ code generation
                                                                    │
   Benchmark report  ◄──  Karpathy Loop optimization  ◄──  OpenFOAM simulation

Target: <24 hours from publication to benchmark report.

Architecture

graph TB
    subgraph Literature["📚 Literature Intelligence"]
        A[arXiv / SAE / ASME Feeds] --> B[Paper Scanner]
        B --> C[Paper Analyzer]
        C --> D[Knowledge Cards]
        D --> E[(PostgreSQL + pgvector)]
    end

    subgraph CodeGen["⚙️ Code Generation"]
        D --> F[Base Model Selector]
        F --> G[C++ Turbulence Model Gen]
        G --> H[Case File Generator]
        H --> I[wmake Compilation]
    end

    subgraph Benchmark["📊 Benchmark Engine"]
        I --> J[DrivAerML Runner]
        J --> K[Karpathy Loop]
        K --> L[Metric Calculator]
        L --> M[Report Generator]
    end

    subgraph Infra["🔧 Infrastructure"]
        N[Celery + Redis] -.-> J
        O[OpenFOAM v2512 Container] -.-> I
        O -.-> J
    end

    M --> P[Streamlit Dashboard]
    M --> Q[REST API]
    M --> R[Weekly Digest]

    style Literature fill:#1a1a2e,stroke:#16213e,color:#e94560
    style CodeGen fill:#1a1a2e,stroke:#16213e,color:#0f3460
    style Benchmark fill:#1a1a2e,stroke:#16213e,color:#533483
    style Infra fill:#0f0f0f,stroke:#333,color:#888

Tech Stack

Layer	Technology	Version	Why
Language	Python	3.13+	42% faster than 3.10, best library compat
CFD Solver	OpenFOAM	v2512	Latest ESI release, new two-layer wall treatment
LLM — Analysis	GPT-4o	Latest	Structured Outputs for guaranteed JSON extraction
LLM — Code Gen	Claude Opus 4	Latest	Superior long-form C++ generation
GNN Surrogates	PyTorch Geometric	2.7+	NVIDIA-recommended, 30% faster than DGL
Benchmark Data	DrivAerML	500 variants	160M cells/variant, CC-BY-SA, HuggingFace
Vector DB	PostgreSQL + pgvector	0.8+	HNSW indexing, filtered vector search
Job Queue	Celery + Redis	5.6+ / 8.6+	Long-running HPC job orchestration
Dashboard	Streamlit	1.55+	Rapid scientific visualization
Literature APIs	Semantic Scholar + arXiv	v0.11 / v2.4	1 RPS authenticated, category filtering
ML-CFD Bridge	SmartSim	0.8+	In-memory Redis coupling (post-MVP)
Tooling	uv + ruff + mypy	Latest	100x faster installs, unified linting

Key Features

Literature Intelligence

Automated scanning of arXiv (physics.flu-dyn, cs.CE), SAE, and ASME feeds
Structured extraction into CFDKnowledgeCard objects: equations, constants, BCs, validation cases
4-layer citation verification (DOI, author, venue, cross-reference) to assess paper credibility
Semantic similarity search via pgvector to find related prior work
Relevance filtering tuned for automotive aerodynamics
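The structured extraction described above can be pictured as a small data class. This is a minimal sketch, not the project's actual schema: the field names, the `missing_fields` helper, and the placeholder arXiv ID are all assumptions made for illustration. Only the four content categories (equations, constants, BCs, validation cases) come from the list above.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class CFDKnowledgeCard:
    """Hypothetical shape of an extracted card; field names are assumptions."""
    arxiv_id: str
    title: str
    equations: list[str] = field(default_factory=list)            # LaTeX strings
    constants: dict[str, float] = field(default_factory=dict)     # model constants by name
    boundary_conditions: list[str] = field(default_factory=list)
    validation_cases: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of content fields that came back empty."""
        required = ("equations", "constants", "boundary_conditions", "validation_cases")
        return [name for name in required if not getattr(self, name)]

    def to_json(self) -> str:
        """Serialize the card, e.g. for storage next to its embedding."""
        return json.dumps(asdict(self), sort_keys=True)


card = CFDKnowledgeCard(
    arxiv_id="0000.00000",  # placeholder, not a real paper
    title="Example k-omega variant",
    equations=[r"\nu_t = k / \omega"],
    constants={"betaStar": 0.09},
)
print(card.missing_fields())
print(card.to_json())
```

A check like `missing_fields` is one way an automated QA step could flag incomplete extractions before a card reaches code generation.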

Code Generation

Base model matching — identifies the closest existing OpenFOAM model (kOmegaSST, kEpsilon, etc.)
C++ turbulence model generation with OpenFOAM coding conventions
Automated wmake compilation with error-driven retry (up to 3 iterations)
Runtime-selectable models: generated code is built as a shared library and loaded through the libs entry in controlDict, without recompiling OpenFOAM
Case file generation from parameterized templates
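The error-driven retry around compilation could look roughly like the sketch below. It assumes "up to 3 iterations" means three compile attempts in total, and it takes the compiler and the fixer as plain callables so the control flow can be shown without OpenFOAM installed. In a real pipeline `compile_fn` would presumably run wmake inside the container and `fix_fn` would send the compiler log back to the LLM. Both names are hypothetical.

```python
from typing import Callable, Optional, Tuple

CompileFn = Callable[[str], Tuple[bool, str]]  # source -> (succeeded, compiler log)
FixFn = Callable[[str, str], str]              # (source, compiler log) -> revised source


def compile_with_retry(
    source: str,
    compile_fn: CompileFn,
    fix_fn: FixFn,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first source version that compiles, or None if all attempts fail."""
    for attempt in range(1, max_attempts + 1):
        ok, log = compile_fn(source)
        if ok:
            return source
        # Only ask for a fix if another compile attempt will follow.
        if attempt < max_attempts:
            source = fix_fn(source, log)
    return None


# Stand-ins for wmake and the LLM, purely for demonstration.
def fake_compile(src: str) -> Tuple[bool, str]:
    return ("fixed" in src, "error: expected ';' before '}' token")


def fake_fix(src: str, log: str) -> str:
    return src + " fixed"


result = compile_with_retry("model", fake_compile, fake_fix)
print(result)
```

Returning `None` after the last failure leaves it to the caller to decide whether to flag the paper for manual review.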

Benchmark Engine

DrivAerML integration — 500 parametric car variants, ~30TB of reference data
Karpathy Loop optimization — tunes model constants to minimize Cd prediction error
Automated convergence checking with residual and force coefficient monitoring
Comprehensive reports — Cd/Cl comparison, surface Cp plots, convergence curves
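The README describes the Karpathy Loop only as "Tune → Run → Measure → Keep". One plausible reading is a greedy search over model constants that keeps a change only when the Cd error drops. The sketch below follows that reading. The perturbation scheme, the function names, and the toy `evaluate_cd` stand-in for a full OpenFOAM run are all assumptions.

```python
import random
from typing import Callable, Dict, Tuple


def karpathy_loop(
    constants: Dict[str, float],
    evaluate_cd: Callable[[Dict[str, float]], float],
    reference_cd: float,
    iterations: int = 50,
    step: float = 0.05,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    """Greedy tuning: perturb one constant, re-evaluate, keep only improvements."""
    rng = random.Random(seed)
    best = dict(constants)
    best_err = abs(evaluate_cd(best) - reference_cd)
    for _ in range(iterations):
        # Tune: scale one randomly chosen constant by up to +/- step.
        name = rng.choice(sorted(best))
        trial = dict(best)
        trial[name] *= 1.0 + rng.uniform(-step, step)
        # Run + Measure: in practice this would be a CFD run, not a formula.
        err = abs(evaluate_cd(trial) - reference_cd)
        # Keep: accept the trial only if the Cd error decreased.
        if err < best_err:
            best, best_err = trial, err
    return best, best_err


def toy_cd(c: Dict[str, float]) -> float:
    """Made-up linear response, used only so the example runs instantly."""
    return 0.25 + 0.5 * c["betaStar"]


start = {"betaStar": 0.09}
tuned, error = karpathy_loop(start, toy_cd, reference_cd=0.28)
print(tuned, error)
```

Because each evaluation is a full simulation in the real setting, the iteration budget, not the search logic, would dominate the cost of a loop like this.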

GNN Surrogates (Post-MVP)

MeshGraphNet and X-MeshGraphNet architectures for fast aerodynamic prediction
Trained on DrivAerML/DrivAerNet++ datasets (up to 8K car variants)
Targeted speedup of roughly 1000x over full CFD for preliminary screening

Quickstart

Prerequisites

Python 3.13+
Docker & Docker Compose
uv (recommended) or pip
OpenAI API key (for GPT-4o)
Anthropic API key (for Claude)

Installation

# Clone the repository
git clone https://github.com/lonexreb/sim-scout.git
cd sim-scout

# Install dependencies with uv (recommended)
uv sync

# Or with pip
pip install -e ".[dev]"

# Copy environment template
cp .env.example .env
# Edit .env with your API keys

# Start infrastructure (PostgreSQL + Redis + OpenFOAM)
docker compose up -d

# Initialize the database
uv run python -m scripts.init_db

# Verify installation
uv run pytest

Run Your First Scan

# Scan arXiv for recent automotive CFD papers
uv run python -m scripts.scan_papers --days 7 --category "physics.flu-dyn"

# Analyze a specific paper
uv run python -m scripts.scan_papers --arxiv-id 2408.11969

# Run benchmark on a generated model
uv run python -m scripts.run_benchmark --model-id <model_id> --variants 5

# Launch the dashboard
uv run streamlit run src/dashboard/app.py
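
For orientation, a scan such as the `--days 7 --category "physics.flu-dyn"` command above could map onto the public arXiv API roughly as follows. This is a sketch of the query construction only. The helper names are hypothetical, and how `scan_papers` actually builds its requests is not documented here. The query parameters (`search_query`, `sortBy`, `sortOrder`, `start`, `max_results`) are those of the arXiv API.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(category: str, max_results: int = 50) -> str:
    """Build a URL listing the newest submissions in one arXiv category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def is_recent(published: str, days: int, now: datetime) -> bool:
    """True if an Atom <published> timestamp lies within the last `days` days."""
    stamp = datetime.strptime(published, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return now - stamp <= timedelta(days=days)


url = build_arxiv_query("physics.flu-dyn")
print(url)
```

The day window is applied client-side in this sketch, by filtering the returned entries with `is_recent`.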

Project Structure

sim-scout/
├── CLAUDE.md                     # AI assistant instructions
├── README.md
├── pyproject.toml                # Project config (uv, ruff, mypy, hatch)
├── docker-compose.yml            # PostgreSQL + Redis + OpenFOAM
├── .env.example                  # Environment template
│
├── src/
│   ├── main.py                   # Application entrypoint
│   ├── config.py                 # Pydantic settings
│   │
│   ├── literature/               # Paper discovery & analysis
│   │   ├── scanner.py            # arXiv/SAE/ASME feed monitoring
│   │   ├── paper_analyzer.py     # LLM-powered structured extraction
│   │   ├── knowledge_cards.py    # CFDKnowledgeCard dataclass + DB ops
│   │   ├── citation_verifier.py  # 4-layer credibility verification
│   │   └── relevance_filter.py   # Automotive aero relevance scoring
│   │
│   ├── codegen/                  # OpenFOAM code generation
│   │   ├── openfoam_generator.py # Orchestrates full case generation
│   │   ├── turbulence_model.py   # C++ turbulence model source gen
│   │   ├── boundary_conditions.py# BC implementation generation
│   │   ├── template_library.py   # Parameterized OF case templates
│   │   └── compiler.py           # wmake compilation + validation
│   │
│   ├── benchmark/                # Simulation execution & analysis
│   │   ├── drivaerml_runner.py   # DrivAerML dataset integration
│   │   ├── metric_calculator.py  # Cd, Cl, surface Cp comparison
│   │   ├── convergence_checker.py# Residual & force convergence
│   │   └── report_generator.py   # Automated benchmark reports
│   │
│   ├── optimizer/                # Karpathy Loop
│   │   ├── param_tuner.py        # Model constant optimization
│   │   └── loop.py               # Tune → Run → Measure → Keep
│   │
│   ├── data/
│   │   ├── openfoam_templates/   # Base case files (simpleCar, etc.)
│   │   └── reference_results/    # Validated baseline results
│   │
│   ├── api/
│   │   └── routes.py             # FastAPI REST endpoints
│   │
│   └── dashboard/
│       └── app.py                # Streamlit visualization
│
├── openfoam/
│   ├── Dockerfile.openfoam       # OpenFOAM v2512 dev container
│   └── custom_models/            # Generated C++ turbulence models
│
├── tests/
│   ├── test_paper_analyzer.py
│   ├── test_codegen.py
│   └── test_benchmark.py
│
├── scripts/
│   ├── scan_papers.py            # CLI: scan literature feeds
│   ├── run_benchmark.py          # CLI: execute benchmark suite
│   ├── download_drivaerml.py     # CLI: fetch DrivAerML from HuggingFace
│   └── init_db.py                # CLI: initialize PostgreSQL + pgvector
│
└── docs/
    └── assets/
        ├── banner-dark.svg       # Dark mode banner
        └── banner-light.svg      # Light mode banner

DrivAerML Dataset

SimScout benchmarks against the DrivAerML dataset, a large open, high-fidelity automotive CFD dataset:

Property	Value
Car variants	500 parametric DrivAer configurations
Mesh resolution	~160M cells per variant
Total size	~30 TB
Data included	Cd/Cl/Cm coefficients, surface Cp (VTP), volume fields (VTU), STL geometries
License	CC-BY-SA 4.0
Paper	arXiv:2408.11969

Related datasets for GNN training:

DrivAerNet — 4,000 car designs with full 3D flow fields
DrivAerNet++ — 8,000 car designs with parametric geometry

# Download a subset for benchmarking (first 10 variants)
uv run python -m scripts.download_drivaerml --variants 10

Development Roadmap

Phase 1: Literature Pipeline (Weeks 1-4)

arXiv + Semantic Scholar scanner with rate-limit handling
GPT-4o paper analyzer with structured output schemas
CFDKnowledgeCard extraction and PostgreSQL storage
pgvector semantic search for related papers
Citation verifier (4-layer: DOI, author, venue, cross-ref)
Automotive aerodynamics relevance filter
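
The four citation-verification layers named in this phase (DOI, author, venue, cross-ref) could be composed as independent checks over paper metadata. The sketch below is illustrative only: the metadata keys, the venue allowlist, and the pass criteria for each layer are assumptions, not the project's actual rules.

```python
import re
from typing import Callable, Dict, List, Tuple

# A common shape for modern DOIs: "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
KNOWN_VENUES = {"arXiv", "SAE Technical Paper", "Journal of Fluids Engineering"}  # illustrative


def check_doi(paper: dict) -> bool:
    return bool(DOI_PATTERN.match(paper.get("doi", "")))


def check_author(paper: dict) -> bool:
    return len(paper.get("authors", [])) > 0


def check_venue(paper: dict) -> bool:
    return paper.get("venue") in KNOWN_VENUES


def check_cross_ref(paper: dict) -> bool:
    # Assumed criterion: the paper's reference list could be resolved at all.
    return paper.get("reference_count", 0) > 0


LAYERS: List[Tuple[str, Callable[[dict], bool]]] = [
    ("doi", check_doi),
    ("author", check_author),
    ("venue", check_venue),
    ("cross_ref", check_cross_ref),
]


def verify(paper: dict) -> Dict[str, bool]:
    """Run every layer and report which ones passed."""
    return {name: check(paper) for name, check in LAYERS}


report = verify({
    "doi": "10.1000/xyz123",
    "authors": ["A. Author"],
    "venue": "arXiv",
    "reference_count": 0,
})
print(report)
```

Reporting per-layer results rather than a single boolean keeps it visible why a paper was down-ranked.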

Phase 2: OpenFOAM Integration (Weeks 5-8)

OpenFOAM v2512 Docker dev container with wmake toolchain
Base model matching algorithm (kOmegaSST, kEpsilon, Spalart-Allmaras)
Claude-powered C++ turbulence model generation
Parameterized case file generator
Automated compilation with 3-retry error correction loop

Phase 3: Benchmark + Optimization (Weeks 9-12)

DrivAerML HuggingFace integration + subset downloader
Celery-based simulation job queue
Karpathy Loop: constant tuning to minimize Cd error
Convergence checker (residuals + force coefficients)
Benchmark report generator with Plotly visualizations
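
A convergence check that combines residuals with force coefficients, as listed in this phase, might look like the following. The thresholds, the window length, and the choice of Cd spread as the force criterion are assumptions for illustration. The README does not state the actual criteria.

```python
from typing import Sequence


def is_converged(
    residuals: Sequence[float],
    cd_history: Sequence[float],
    residual_tol: float = 1e-4,
    window: int = 100,
    cd_rel_tol: float = 1e-3,
) -> bool:
    """Converged if the latest residual is small and Cd has stopped moving."""
    if not residuals or len(cd_history) < window:
        return False
    residual_ok = residuals[-1] < residual_tol
    # Force criterion: relative peak-to-peak spread of Cd over the last window.
    tail = cd_history[-window:]
    mean_cd = sum(tail) / window
    if mean_cd == 0:
        return False
    spread = (max(tail) - min(tail)) / abs(mean_cd)
    return residual_ok and spread < cd_rel_tol


steady = is_converged([1e-2, 1e-5], [0.30] * 100)
oscillating = is_converged([1e-2, 1e-5], [0.30, 0.31] * 50)
print(steady, oscillating)
```

Requiring both criteria guards against a case where residuals have dropped but the drag coefficient is still oscillating.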

Phase 4: Dashboard + API (Weeks 13-16)

Streamlit dashboard: model comparison, Cp surface plots
FastAPI REST endpoints for programmatic access
Weekly digest: new papers + preliminary benchmark results
Webhook notifications (Slack, email)

Post-MVP

GNN surrogate models (MeshGraphNet on DrivAerNet++)
SmartSim in-memory ML-CFD coupling
Multi-solver support (SU2, STAR-CCM+ export)
Temporal workflow orchestration (replace Celery for complex DAGs)

Success Metrics

Metric	Target	How Measured
Paper extraction rate	>80% of relevant CFD papers yield valid KnowledgeCards	Automated QA on extracted fields
First-compile success	>70% of generated C++ compiles without manual intervention	CI compilation pipeline
Cd accuracy	Within 5% of published results for validated models	DrivAerML reference comparison
End-to-end latency	<24 hours from paper publication to benchmark report	Pipeline timestamp tracking
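
The Cd accuracy target in the table above reduces to a relative-error check against a reference value. The sketch below shows that arithmetic. The function names and the sample numbers are made up for illustration and are not benchmark results.

```python
from typing import Iterable, Tuple


def cd_relative_error(predicted: float, reference: float) -> float:
    """Relative drag-coefficient error, |predicted - reference| / |reference|."""
    return abs(predicted - reference) / abs(reference)


def fraction_within(pairs: Iterable[Tuple[float, float]], tol: float = 0.05) -> float:
    """Share of (predicted, reference) pairs whose error is within `tol`."""
    pairs = list(pairs)
    hits = sum(1 for pred, ref in pairs if cd_relative_error(pred, ref) <= tol)
    return hits / len(pairs)


# Illustrative values only.
samples = [(0.29, 0.28), (0.31, 0.28)]
print(fraction_within(samples))
```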

Competitive Landscape

Project	What It Does	SimScout's Differentiator
ChatCFD	Conversational OF case setup (82% success)	We go further: paper → new model → benchmark
OpenFOAMGPT 2.0	End-to-end conversational CFD	We automate the research pipeline, not just Q&A
FoamGPT	Natural language → OF commands	We generate novel turbulence models, not just run existing ones

SimScout's unique value: No existing tool goes from "a paper was published" to "here's how it performs on your vehicle design" automatically.

Configuration

SimScout uses Pydantic Settings for configuration. All settings can be overridden via environment variables:

# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
POSTGRES_URL=postgresql://simscout:pass@localhost:5432/simscout
REDIS_URL=redis://localhost:6379/0
OPENFOAM_DOCKER_IMAGE=opencfd/openfoam-dev:2512
DRIVAERML_CACHE_DIR=./data/drivaerml
SCAN_INTERVAL_HOURS=24
MAX_CFD_RUNTIME_MINUTES=30
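
To illustrate the override behaviour, here is a standard-library stand-in for a settings class. It is not the project's Pydantic Settings class: it only mimics the idea of typed fields with defaults that environment variables can replace. The class name, the subset of fields, and the defaults are assumptions taken from the sample .env above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical subset of the configuration, with typed fields."""
    postgres_url: str
    redis_url: str
    scan_interval_hours: int = 24
    max_cfd_runtime_minutes: int = 30

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read settings from a mapping; missing optional keys fall back to defaults."""
        return cls(
            postgres_url=env["POSTGRES_URL"],  # required, raises KeyError if absent
            redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
            scan_interval_hours=int(env.get("SCAN_INTERVAL_HOURS", "24")),
            max_cfd_runtime_minutes=int(env.get("MAX_CFD_RUNTIME_MINUTES", "30")),
        )


settings = Settings.from_env({
    "POSTGRES_URL": "postgresql://simscout:pass@localhost:5432/simscout",
    "SCAN_INTERVAL_HOURS": "12",
})
print(settings)
```

Passing the mapping explicitly, as in the usage above, keeps the example independent of the real process environment and makes the override easy to test.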

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Development setup
uv sync --group dev

# Run linters
uv run ruff check src/ tests/
uv run mypy src/

# Run tests
uv run pytest -v

# Run tests with coverage
uv run pytest --cov=src --cov-report=html

References

DrivAerML: Ashton, N. et al. (2024). DrivAerML: High-Fidelity Computational Fluid Dynamics Dataset for Road-Car External Aerodynamics. arXiv:2408.11969
OpenFOAM v2512: Release Notes
SmartSim: CrayLabs/SmartSim
PyTorch Geometric: pyg-team/pytorch_geometric
MeshGraphNets: Pfaff, T. et al. (2021). Learning Mesh-Based Simulation with Graph Networks. ICLR.
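DrivAerNet++: Elrefaie, M. et al. (2024). DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks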

License

This project is licensed under the Apache License 2.0 — see LICENSE for details.

Built with precision for CFD engineers who'd rather simulate than search.
