Automated Literature-to-Simulation Pipeline for Automotive CFD
From published paper to benchmarked simulation in hours, not weeks.
A new turbulence model paper drops on arXiv. Today, evaluating it against your vehicle design means:
- Read & understand the paper (1-2 days)
- Implement the model in OpenFOAM C++ (3-5 days)
- Set up benchmark cases and mesh (1-2 days)
- Run simulations and post-process (1-2 days)
- Compare against existing models (1 day)
Total: 1-2 weeks per paper. Most papers never get evaluated.
SimScout automates the entire pipeline:
Paper published on arXiv ──► Structured extraction ──► C++ code generation
│
Benchmark report ◄── Karpathy Loop optimization ◄── OpenFOAM simulation
Target: <24 hours from publication to benchmark report.
graph TB
subgraph Literature["📚 Literature Intelligence"]
A[arXiv / SAE / ASME Feeds] --> B[Paper Scanner]
B --> C[Paper Analyzer]
C --> D[Knowledge Cards]
D --> E[(PostgreSQL + pgvector)]
end
subgraph CodeGen["⚙️ Code Generation"]
D --> F[Base Model Selector]
F --> G[C++ Turbulence Model Gen]
G --> H[Case File Generator]
H --> I[wmake Compilation]
end
subgraph Benchmark["📊 Benchmark Engine"]
I --> J[DrivAerML Runner]
J --> K[Karpathy Loop]
K --> L[Metric Calculator]
L --> M[Report Generator]
end
subgraph Infra["🔧 Infrastructure"]
N[Celery + Redis] -.-> J
O[OpenFOAM v2512 Container] -.-> I
O -.-> J
end
M --> P[Streamlit Dashboard]
M --> Q[REST API]
M --> R[Weekly Digest]
style Literature fill:#1a1a2e,stroke:#16213e,color:#e94560
style CodeGen fill:#1a1a2e,stroke:#16213e,color:#0f3460
style Benchmark fill:#1a1a2e,stroke:#16213e,color:#533483
style Infra fill:#0f0f0f,stroke:#333,color:#888
| Layer | Technology | Version | Why |
|---|---|---|---|
| Language | Python | 3.13+ | 42% faster than 3.10, best library compat |
| CFD Solver | OpenFOAM | v2512 | Latest ESI release, new two-layer wall treatment |
| LLM — Analysis | GPT-4o | Latest | Structured Outputs for guaranteed JSON extraction |
| LLM — Code Gen | Claude Opus 4 | Latest | Superior long-form C++ generation |
| GNN Surrogates | PyTorch Geometric | 2.7+ | NVIDIA-recommended, 30% faster than DGL |
| Benchmark Data | DrivAerML | 500 variants | 160M cells/variant, CC-BY-SA, HuggingFace |
| Vector DB | PostgreSQL + pgvector | 0.8+ | HNSW indexing, filtered vector search |
| Job Queue | Celery + Redis | 5.6+ / 8.6+ | Long-running HPC job orchestration |
| Dashboard | Streamlit | 1.55+ | Rapid scientific visualization |
| Literature APIs | Semantic Scholar + arXiv | v0.11 / v2.4 | 1 RPS authenticated, category filtering |
| ML-CFD Bridge | SmartSim | 0.8+ | In-memory Redis coupling (post-MVP) |
| Tooling | uv + ruff + mypy | Latest | 100x faster installs, unified linting |
- Automated scanning of arXiv (`physics.flu-dyn`, `cs.CE`), SAE, and ASME feeds
- Structured extraction into `CFDKnowledgeCard` objects: equations, constants, BCs, validation cases
- 4-layer citation verification ensuring paper credibility
- Semantic similarity search via pgvector to find related prior work
- Relevance filtering tuned for automotive aerodynamics
- Base model matching — identifies the closest existing OpenFOAM model (kOmegaSST, kEpsilon, etc.)
- C++ turbulence model generation with OpenFOAM coding conventions
- Automated `wmake` compilation with error-driven retry (up to 3 iterations)
- Runtime-selectable models: generated code loads via `controlDict` without recompiling OpenFOAM
- Case file generation from parameterized templates
- DrivAerML integration — 500 parametric car variants, ~30TB of reference data
- Karpathy Loop optimization — tunes model constants to minimize Cd prediction error
- Automated convergence checking with residual and force coefficient monitoring
- Comprehensive reports — Cd/Cl comparison, surface Cp plots, convergence curves
- MeshGraphNet and X-MeshGraphNet architectures for fast aerodynamic prediction
- Trained on DrivAerML/DrivAerNet++ datasets (up to 8K car variants)
- 1000x speedup over full CFD for preliminary screening
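
The extraction feature above lists what a `CFDKnowledgeCard` holds: equations, constants, BCs, and validation cases. As a rough illustration only, such a card could be modelled as a small dataclass. The field names, the `is_complete` helper, and the sample values below are assumptions made for this sketch, not the project's actual schema in `src/literature/knowledge_cards.py`.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CFDKnowledgeCard:
    """Hypothetical shape of an extracted paper summary (field names are assumptions)."""

    arxiv_id: str
    title: str
    equations: list[str] = field(default_factory=list)         # LaTeX strings
    constants: dict[str, float] = field(default_factory=dict)  # model constants by name
    boundary_conditions: list[str] = field(default_factory=list)
    validation_cases: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """Treat a card as usable for code generation only if it has equations and constants."""
        return bool(self.equations) and bool(self.constants)


# Illustrative card: the ID is a placeholder, not a real paper.
card = CFDKnowledgeCard(
    arxiv_id="0000.00000",
    title="Example turbulence model paper",
    equations=[r"\nu_t = a_1 k / \max(a_1 \omega, S F_2)"],
    constants={"a1": 0.31, "betaStar": 0.09},
)

# Serialize to JSON, e.g. before storing the card in a database column.
card_json = json.dumps(asdict(card), indent=2)
```

Keeping the card a plain dataclass makes it easy to serialize and to reject incomplete extractions before any C++ is generated.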
- Python 3.13+
- Docker & Docker Compose
- uv (recommended) or pip
- OpenAI API key (for GPT-4o)
- Anthropic API key (for Claude)
# Clone the repository
git clone https://github.com/your-org/sim-scout.git
cd sim-scout
# Install dependencies with uv (recommended)
uv sync
# Or with pip
pip install -e ".[dev]"
# Copy environment template
cp .env.example .env
# Edit .env with your API keys
# Start infrastructure (PostgreSQL + Redis + OpenFOAM)
docker compose up -d
# Initialize the database
uv run python -m scripts.init_db
# Verify installation
uv run pytest

# Scan arXiv for recent automotive CFD papers
uv run python -m scripts.scan_papers --days 7 --category "physics.flu-dyn"
# Analyze a specific paper
uv run python -m scripts.scan_papers --arxiv-id 2408.11969
# Run benchmark on a generated model
uv run python -m scripts.run_benchmark --model-id <model_id> --variants 5
# Launch the dashboard
uv run streamlit run src/dashboard/app.py

sim-scout/
├── CLAUDE.md # AI assistant instructions
├── README.md
├── pyproject.toml # Project config (uv, ruff, mypy, hatch)
├── docker-compose.yml # PostgreSQL + Redis + OpenFOAM
├── .env.example # Environment template
│
├── src/
│ ├── main.py # Application entrypoint
│ ├── config.py # Pydantic settings
│ │
│ ├── literature/ # Paper discovery & analysis
│ │ ├── scanner.py # arXiv/SAE/ASME feed monitoring
│ │ ├── paper_analyzer.py # LLM-powered structured extraction
│ │ ├── knowledge_cards.py # CFDKnowledgeCard dataclass + DB ops
│ │ ├── citation_verifier.py # 4-layer credibility verification
│ │ └── relevance_filter.py # Automotive aero relevance scoring
│ │
│ ├── codegen/ # OpenFOAM code generation
│ │ ├── openfoam_generator.py # Orchestrates full case generation
│ │ ├── turbulence_model.py # C++ turbulence model source gen
│ │ ├── boundary_conditions.py# BC implementation generation
│ │ ├── template_library.py # Parameterized OF case templates
│ │ └── compiler.py # wmake compilation + validation
│ │
│ ├── benchmark/ # Simulation execution & analysis
│ │ ├── drivaerml_runner.py # DrivAerML dataset integration
│ │ ├── metric_calculator.py # Cd, Cl, surface Cp comparison
│ │ ├── convergence_checker.py# Residual & force convergence
│ │ └── report_generator.py # Automated benchmark reports
│ │
│ ├── optimizer/ # Karpathy Loop
│ │ ├── param_tuner.py # Model constant optimization
│ │ └── loop.py # Tune → Run → Measure → Keep
│ │
│ ├── data/
│ │ ├── openfoam_templates/ # Base case files (simpleCar, etc.)
│ │ └── reference_results/ # Validated baseline results
│ │
│ ├── api/
│ │ └── routes.py # FastAPI REST endpoints
│ │
│ └── dashboard/
│ └── app.py # Streamlit visualization
│
├── openfoam/
│ ├── Dockerfile.openfoam # OpenFOAM v2512 dev container
│ └── custom_models/ # Generated C++ turbulence models
│
├── tests/
│ ├── test_paper_analyzer.py
│ ├── test_codegen.py
│ └── test_benchmark.py
│
├── scripts/
│ ├── scan_papers.py # CLI: scan literature feeds
│ ├── run_benchmark.py # CLI: execute benchmark suite
│ ├── download_drivaerml.py # CLI: fetch DrivAerML from HuggingFace
│ └── init_db.py # CLI: initialize PostgreSQL + pgvector
│
└── docs/
└── assets/
├── banner-dark.svg # Dark mode banner
└── banner-light.svg # Light mode banner
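
The `optimizer/loop.py` entry in the tree above summarises the Karpathy Loop as Tune → Run → Measure → Keep. A minimal sketch of that cycle follows, assuming a random relative perturbation of one constant per iteration and a caller-supplied `run_case` function that returns a predicted Cd. The function names, the perturbation strategy, and the toy stand-in for the solver are all assumptions for illustration; the project's actual tuner may work differently.

```python
import random
from typing import Callable


def karpathy_loop(
    constants: dict[str, float],
    run_case: Callable[[dict[str, float]], float],  # hypothetical: returns predicted Cd
    cd_reference: float,
    iterations: int = 50,
    step: float = 0.05,
    seed: int = 0,
) -> tuple[dict[str, float], float]:
    """Tune -> Run -> Measure -> Keep: accept a trial only if the Cd error drops."""
    rng = random.Random(seed)
    best = dict(constants)  # copy so the caller's dict is not mutated
    best_err = abs(run_case(best) - cd_reference)
    for _ in range(iterations):
        # Tune: perturb one randomly chosen constant by up to +/- step (relative).
        trial = dict(best)
        name = rng.choice(sorted(trial))
        trial[name] *= 1.0 + rng.uniform(-step, step)
        # Run + Measure: absolute Cd error for the trial constants.
        err = abs(run_case(trial) - cd_reference)
        # Keep: retain the trial only when it improves on the best so far.
        if err < best_err:
            best, best_err = trial, err
    return best, best_err


# Toy stand-in for a solver run (not a physical model): Cd varies linearly with one constant.
def fake_run(c: dict[str, float]) -> float:
    return 0.25 + 0.5 * c["a1"]


start = {"a1": 0.31}
tuned, err = karpathy_loop(start, fake_run, cd_reference=0.40)
```

Because a trial is kept only when it lowers the error, the returned error can never be worse than the starting point, which makes the loop safe to run unattended.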
SimScout benchmarks against the DrivAerML dataset, one of the largest open high-fidelity automotive CFD datasets:
| Property | Value |
|---|---|
| Car variants | 500 parametric DrivAer configurations |
| Mesh resolution | ~160M cells per variant |
| Total size | ~30 TB |
| Data included | Cd/Cl/Cm coefficients, surface Cp (VTP), volume fields (VTU), STL geometries |
| License | CC-BY-SA 4.0 |
| Paper | arXiv:2408.11969 |
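
For planning disk space, the figures in the table above give a rough per-variant size. The sketch below assumes every variant is the same size and that all file types are downloaded, so it is an upper-bound estimate rather than a measured value; `estimated_subset_size_gb` is a hypothetical helper, not part of the project.

```python
TOTAL_SIZE_TB = 30    # approximate total from the table above
TOTAL_VARIANTS = 500  # variant count from the table above


def estimated_subset_size_gb(n_variants: int) -> float:
    """Upper-bound size estimate, assuming uniform variant size and 1 TB = 1000 GB."""
    per_variant_gb = TOTAL_SIZE_TB * 1000 / TOTAL_VARIANTS
    return n_variants * per_variant_gb


size_10 = estimated_subset_size_gb(10)  # 600.0 GB under these assumptions
```

Under these assumptions a 10-variant subset is about 600 GB. Fetching only coefficients and surface data would need far less.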
Related datasets for GNN training:
- DrivAerNet — 4,000 car designs with full 3D flow fields
- DrivAerNet++ — 8,000 car designs with parametric geometry
# Download a subset for benchmarking (first 10 variants)
uv run python -m scripts.download_drivaerml --variants 10

- arXiv + Semantic Scholar scanner with rate-limit handling
- GPT-4o paper analyzer with structured output schemas
- `CFDKnowledgeCard` extraction and PostgreSQL storage
- pgvector semantic search for related papers
- Citation verifier (4-layer: DOI, author, venue, cross-ref)
- Automotive aerodynamics relevance filter
- OpenFOAM v2512 Docker dev container with wmake toolchain
- Base model matching algorithm (kOmegaSST, kEpsilon, Spalart-Allmaras)
- Claude-powered C++ turbulence model generation
- Parameterized case file generator
- Automated compilation with 3-retry error correction loop
- DrivAerML HuggingFace integration + subset downloader
- Celery-based simulation job queue
- Karpathy Loop: constant tuning to minimize Cd error
- Convergence checker (residuals + force coefficients)
- Benchmark report generator with Plotly visualizations
- Streamlit dashboard: model comparison, Cp surface plots
- FastAPI REST endpoints for programmatic access
- Weekly digest: new papers + preliminary benchmark results
- Webhook notifications (Slack, email)
- GNN surrogate models (MeshGraphNet on DrivAerNet++)
- SmartSim in-memory ML-CFD coupling
- Multi-solver support (SU2, STAR-CCM+ export)
- Temporal workflow orchestration (replace Celery for complex DAGs)
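
The roadmap item on automated compilation mentions a 3-retry error correction loop. One way such a loop could be structured is sketched below, with the compiler and the fixer passed in as plain callables so the control flow stays visible. `compile_with_retry`, `CompileResult`, and both callables are hypothetical names. The sketch also assumes "3-retry" means one initial attempt plus up to three repairs. A real implementation would presumably invoke `wmake` inside the OpenFOAM container and send the error log to the code-generation model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CompileResult:
    ok: bool
    log: str  # compiler output, fed back to the fixer on failure


def compile_with_retry(
    source: str,
    compile_fn: Callable[[str], CompileResult],  # hypothetical wrapper around wmake
    fix_fn: Callable[[str, str], str],           # hypothetical: (source, error log) -> new source
    max_retries: int = 3,
) -> tuple[bool, str, int]:
    """Compile once, then repair and recompile up to max_retries times."""
    result = compile_fn(source)
    retries = 0
    while not result.ok and retries < max_retries:
        source = fix_fn(source, result.log)  # repair using the latest error log
        result = compile_fn(source)
        retries += 1
    return result.ok, source, retries


# Toy demo: the "compiler" accepts only sources ending in ';' and the "fixer" appends one.
def fake_compile(src: str) -> CompileResult:
    ok = src.endswith(";")
    return CompileResult(ok=ok, log="" if ok else "error: expected ';'")


def fake_fix(src: str, log: str) -> str:
    return src + ";"


ok, fixed, retries = compile_with_retry("return nut", fake_compile, fake_fix)
```

Returning the retry count alongside the final source makes it straightforward to track a first-compile success metric: a run counts as first-compile success when `retries` is 0.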
| Metric | Target | How Measured |
|---|---|---|
| Paper extraction rate | >80% of relevant CFD papers yield valid KnowledgeCards | Automated QA on extracted fields |
| First-compile success | >70% of generated C++ compiles without manual intervention | CI compilation pipeline |
| Cd accuracy | Within 5% of published results for validated models | DrivAerML reference comparison |
| End-to-end latency | <24 hours from paper publication to benchmark report | Pipeline timestamp tracking |
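
The Cd accuracy target above can be read as a relative-error check against a reference coefficient. A minimal sketch follows, assuming accuracy is measured as the absolute difference divided by the reference value. The function names are hypothetical and the coefficient values are made up for illustration, not DrivAerML results.

```python
def cd_relative_error(cd_predicted: float, cd_reference: float) -> float:
    """Relative error of a predicted drag coefficient against a reference value."""
    if cd_reference == 0:
        raise ValueError("reference Cd must be non-zero")
    return abs(cd_predicted - cd_reference) / abs(cd_reference)


def meets_cd_target(cd_predicted: float, cd_reference: float, tolerance: float = 0.05) -> bool:
    """True when the prediction is within the tolerance (default 5%) of the reference."""
    return cd_relative_error(cd_predicted, cd_reference) <= tolerance


# Illustrative numbers only.
within = meets_cd_target(0.29, 0.30)   # about 3.3% off
outside = meets_cd_target(0.33, 0.30)  # about 10% off
```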
| Project | What It Does | SimScout's Differentiator |
|---|---|---|
| ChatCFD | Conversational OF case setup (82% success) | We go further: paper → new model → benchmark |
| OpenFOAMGPT 2.0 | End-to-end conversational CFD | We automate the research pipeline, not just Q&A |
| FoamGPT | Natural language → OF commands | We generate novel turbulence models, not just run existing ones |
SimScout's unique value: No existing tool goes from "a paper was published" to "here's how it performs on your vehicle design" automatically.
SimScout uses Pydantic Settings for configuration. All settings can be overridden via environment variables:
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
POSTGRES_URL=postgresql://simscout:pass@localhost:5432/simscout
REDIS_URL=redis://localhost:6379/0
OPENFOAM_DOCKER_IMAGE=opencfd/openfoam-dev:2512
DRIVAERML_CACHE_DIR=./data/drivaerml
SCAN_INTERVAL_HOURS=24
MAX_CFD_RUNTIME_MINUTES=30

We welcome contributions! See CONTRIBUTING.md for guidelines.
# Development setup
uv sync --group dev
# Run linters
uv run ruff check src/ tests/
uv run mypy src/
# Run tests
uv run pytest -v
# Run tests with coverage
uv run pytest --cov=src --cov-report=html

- DrivAerML: Ashton, N. et al. (2024). DrivAerML: High-Fidelity CFD Dataset for Automotive Aerodynamics
- OpenFOAM v2512: Release Notes
- SmartSim: CrayLabs/SmartSim
- PyTorch Geometric: pyg-team/pytorch_geometric
- MeshGraphNets: Pfaff, T. et al. (2021). Learning Mesh-Based Simulation with Graph Networks. ICLR.
- DrivAerNet++: Elrefaie, M. et al. (2024). DrivAerNet++
This project is licensed under the Apache License 2.0 — see LICENSE for details.
Built with precision for CFD engineers who'd rather simulate than search.