lonexreb/sim-scout


Automated Literature-to-Simulation Pipeline for Automotive CFD

Python 3.13+ | OpenFOAM v2512 | License: Apache 2.0 | DrivAerML | LLM Powered

From published paper to benchmarked simulation in hours, not weeks.


The Problem

A new turbulence model paper drops on arXiv. Today, evaluating it against your vehicle design means:

  1. Read & understand the paper (1-2 days)
  2. Implement the model in OpenFOAM C++ (3-5 days)
  3. Set up benchmark cases and mesh (1-2 days)
  4. Run simulations and post-process (1-2 days)
  5. Compare against existing models (1 day)

Total: 7-12 days (roughly 1-2 weeks) per paper. Most papers never get evaluated.

The Solution

SimScout automates the entire pipeline:

Paper published on arXiv  ──►  Structured extraction  ──►  C++ code generation
                                                                    │
   Benchmark report  ◄──  Karpathy Loop optimization  ◄──  OpenFOAM simulation

Target: <24 hours from publication to benchmark report.


Architecture

graph TB
    subgraph Literature["📚 Literature Intelligence"]
        A[arXiv / SAE / ASME Feeds] --> B[Paper Scanner]
        B --> C[Paper Analyzer]
        C --> D[Knowledge Cards]
        D --> E[(PostgreSQL + pgvector)]
    end

    subgraph CodeGen["⚙️ Code Generation"]
        D --> F[Base Model Selector]
        F --> G[C++ Turbulence Model Gen]
        G --> H[Case File Generator]
        H --> I[wmake Compilation]
    end

    subgraph Benchmark["📊 Benchmark Engine"]
        I --> J[DrivAerML Runner]
        J --> K[Karpathy Loop]
        K --> L[Metric Calculator]
        L --> M[Report Generator]
    end

    subgraph Infra["🔧 Infrastructure"]
        N[Celery + Redis] -.-> J
        O[OpenFOAM v2512 Container] -.-> I
        O -.-> J
    end

    M --> P[Streamlit Dashboard]
    M --> Q[REST API]
    M --> R[Weekly Digest]

    style Literature fill:#1a1a2e,stroke:#16213e,color:#e94560
    style CodeGen fill:#1a1a2e,stroke:#16213e,color:#0f3460
    style Benchmark fill:#1a1a2e,stroke:#16213e,color:#533483
    style Infra fill:#0f0f0f,stroke:#333,color:#888

Tech Stack

| Layer | Technology | Version | Why |
| --- | --- | --- | --- |
| Language | Python | 3.13+ | 42% faster than 3.10, best library compatibility |
| CFD solver | OpenFOAM | v2512 | Latest ESI release, new two-layer wall treatment |
| LLM (analysis) | GPT-4o | Latest | Structured Outputs for schema-guaranteed JSON extraction |
| LLM (code gen) | Claude Opus 4 | Latest | Superior long-form C++ generation |
| GNN surrogates | PyTorch Geometric | 2.7+ | NVIDIA-recommended, 30% faster than DGL |
| Benchmark data | DrivAerML | 500 variants | 160M cells/variant, CC-BY-SA, HuggingFace |
| Vector DB | PostgreSQL + pgvector | 0.8+ | HNSW indexing, filtered vector search |
| Job queue | Celery + Redis | 5.6+ / 8.6+ | Long-running HPC job orchestration |
| Dashboard | Streamlit | 1.55+ | Rapid scientific visualization |
| Literature APIs | Semantic Scholar + arXiv | v0.11 / v2.4 | 1 RPS (authenticated), category filtering |
| ML-CFD bridge | SmartSim | 0.8+ | In-memory Redis coupling (post-MVP) |
| Tooling | uv + ruff + mypy | Latest | 10-100x faster installs than pip, unified linting |

Key Features

Literature Intelligence

  • Automated scanning of arXiv (physics.flu-dyn, cs.CE), SAE, and ASME feeds
  • Structured extraction into CFDKnowledgeCard objects: equations, constants, BCs, validation cases
  • 4-layer citation verification ensuring paper credibility
  • Semantic similarity search via pgvector to find related prior work
  • Relevance filtering tuned for automotive aerodynamics
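A minimal sketch of what a `CFDKnowledgeCard` might look like, assuming a plain Python dataclass. The field names (`arxiv_id`, `equations`, `constants`, `boundary_conditions`, `validation_cases`) and the `is_complete` rule are hypothetical; they mirror the categories listed above rather than the actual schema in `knowledge_cards.py`.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CFDKnowledgeCard:
    """Hypothetical container for the CFD content extracted from one paper."""

    arxiv_id: str
    title: str
    # Governing/transport equations, stored as LaTeX strings.
    equations: list[str] = field(default_factory=list)
    # Model constants by name, e.g. {"betaStar": 0.09}.
    constants: dict[str, float] = field(default_factory=dict)
    # Boundary conditions keyed by field or patch description.
    boundary_conditions: dict[str, str] = field(default_factory=dict)
    # Validation cases the paper reports (e.g. "Ahmed body, 25 deg slant").
    validation_cases: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """Hypothetical rule: code generation needs at least equations and constants."""
        return bool(self.equations) and bool(self.constants)

    def to_json(self) -> str:
        """Serialize for storage, e.g. in a PostgreSQL JSONB column."""
        return json.dumps(asdict(self), sort_keys=True)


card = CFDKnowledgeCard(
    arxiv_id="0000.00000",  # placeholder ID, not a real paper
    title="Example two-equation model variant",
    equations=[r"\frac{Dk}{Dt} = P_k - \beta^* k \omega + \nabla \cdot [(\nu + \sigma_k \nu_t) \nabla k]"],
    constants={"betaStar": 0.09, "a1": 0.31},
)
print(card.is_complete())  # True
```

Serializing to JSON keeps the card easy to store next to its pgvector embedding; computing the embedding itself is out of scope here.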

Code Generation

  • Base model matching — identifies the closest existing OpenFOAM model (kOmegaSST, kEpsilon, etc.)
  • C++ turbulence model generation with OpenFOAM coding conventions
  • Automated wmake compilation with error-driven retry (up to 3 iterations)
  • Runtime-selectable models — generated code loads via controlDict without recompiling OpenFOAM
  • Case file generation from parameterized templates
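One simple way to do base model matching, assumed here for illustration, is to score each candidate by the overlap (Jaccard similarity) between the constant names a paper uses and the coefficient names of an existing OpenFOAM RAS model. The sets below are a subset of the default coefficient names of OpenFOAM's `kEpsilon`, `kOmegaSST`, and `SpalartAllmaras` models; `select_base_model` is a hypothetical helper, not SimScout's matcher.

```python
# Subset of default coefficient names per OpenFOAM RAS model (illustrative).
BASE_MODEL_COEFFS: dict[str, set[str]] = {
    "kEpsilon": {"Cmu", "C1", "C2", "C3", "sigmak", "sigmaEps"},
    "kOmegaSST": {
        "alphaK1", "alphaK2", "alphaOmega1", "alphaOmega2",
        "beta1", "beta2", "betaStar", "gamma1", "gamma2", "a1", "b1", "c1",
    },
    "SpalartAllmaras": {"sigmaNut", "kappa", "Cb1", "Cb2", "Cw2", "Cw3", "Cv1", "Cs"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_base_model(paper_constants: set[str]) -> tuple[str, float]:
    """Return the candidate whose coefficient names best overlap the paper's constants."""
    scores = {name: jaccard(paper_constants, coeffs) for name, coeffs in BASE_MODEL_COEFFS.items()}
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]


# A hypothetical paper that retunes SST constants and adds one new constant.
print(select_base_model({"betaStar", "a1", "beta1", "beta2", "Cnew"}))
# best match: kOmegaSST (4 of 13 names shared)
```

The comparison is an exact string match, so extracted names would first need normalizing (for example, mapping `β*` or `C_mu` to OpenFOAM's `betaStar` or `Cmu`).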

Benchmark Engine

  • DrivAerML integration — 500 parametric car variants, ~30TB of reference data
  • Karpathy Loop optimization — tunes model constants to minimize Cd prediction error
  • Automated convergence checking with residual and force coefficient monitoring
  • Comprehensive reports — Cd/Cl comparison, surface Cp plots, convergence curves
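The Tune → Run → Measure → Keep cycle amounts to greedy hill climbing over model constants. The sketch below assumes a random multiplicative perturbation of one constant per iteration and acceptance only on strict improvement; `run_case` is a hypothetical callable returning a predicted Cd, which in the real pipeline would wrap a full OpenFOAM run and here is replaced by a toy linear surrogate.

```python
import random
from collections.abc import Callable

Constants = dict[str, float]


def cd_error(cd_pred: float, cd_ref: float) -> float:
    """Relative drag coefficient error |Cd_pred - Cd_ref| / |Cd_ref|."""
    return abs(cd_pred - cd_ref) / abs(cd_ref)


def karpathy_loop(
    constants: Constants,
    run_case: Callable[[Constants], float],
    cd_ref: float,
    iterations: int = 20,
    step: float = 0.05,
    seed: int = 0,
) -> tuple[Constants, float]:
    """Greedy tuning: perturb one constant, re-run, keep the change only if Cd error drops."""
    rng = random.Random(seed)
    best = dict(constants)
    best_err = cd_error(run_case(best), cd_ref)  # Measure the untuned baseline first.
    for _ in range(iterations):
        candidate = dict(best)
        name = rng.choice(sorted(candidate))  # Tune: pick one constant...
        candidate[name] *= 1.0 + rng.uniform(-step, step)  # ...and scale it by up to ±step.
        err = cd_error(run_case(candidate), cd_ref)  # Run + Measure.
        if err < best_err:  # Keep: accept strict improvements only.
            best, best_err = candidate, err
    return best, best_err


def fake_run(c: Constants) -> float:
    """Hypothetical surrogate standing in for a full OpenFOAM run."""
    return 0.25 + 0.5 * (c["a1"] - 0.31) + 0.2 * (c["betaStar"] - 0.09)


tuned, err = karpathy_loop({"a1": 0.31, "betaStar": 0.09}, fake_run, cd_ref=0.27)
print(tuned, err)
```

Because every accepted step must beat the current best, the returned error never exceeds the untuned baseline; the trade-off is that greedy acceptance can stall in a local minimum, and each iteration costs a full CFD run.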

GNN Surrogates (Post-MVP)

  • MeshGraphNet and X-MeshGraphNet architectures for fast aerodynamic prediction
  • Trained on DrivAerML/DrivAerNet++ datasets (up to 8K car variants)
  • 1000x speedup over full CFD for preliminary screening

Quickstart

Prerequisites

  • Python 3.13+
  • Docker & Docker Compose
  • uv (recommended) or pip
  • OpenAI API key (for GPT-4o)
  • Anthropic API key (for Claude)

Installation

# Clone the repository
git clone https://github.com/lonexreb/sim-scout.git
cd sim-scout

# Install dependencies with uv (recommended)
uv sync

# Or with pip
pip install -e ".[dev]"

# Copy environment template
cp .env.example .env
# Edit .env with your API keys

# Start infrastructure (PostgreSQL + Redis + OpenFOAM)
docker compose up -d

# Initialize the database
uv run python -m scripts.init_db

# Verify installation
uv run pytest

Run Your First Scan

# Scan arXiv for recent automotive CFD papers
uv run python -m scripts.scan_papers --days 7 --category "physics.flu-dyn"

# Analyze a specific paper
uv run python -m scripts.scan_papers --arxiv-id 2408.11969

# Run benchmark on a generated model
uv run python -m scripts.run_benchmark --model-id <model_id> --variants 5

# Launch the dashboard
uv run streamlit run src/dashboard/app.py

Project Structure

sim-scout/
├── CLAUDE.md                     # AI assistant instructions
├── README.md
├── pyproject.toml                # Project config (uv, ruff, mypy, hatch)
├── docker-compose.yml            # PostgreSQL + Redis + OpenFOAM
├── .env.example                  # Environment template
│
├── src/
│   ├── main.py                   # Application entrypoint
│   ├── config.py                 # Pydantic settings
│   │
│   ├── literature/               # Paper discovery & analysis
│   │   ├── scanner.py            # arXiv/SAE/ASME feed monitoring
│   │   ├── paper_analyzer.py     # LLM-powered structured extraction
│   │   ├── knowledge_cards.py    # CFDKnowledgeCard dataclass + DB ops
│   │   ├── citation_verifier.py  # 4-layer credibility verification
│   │   └── relevance_filter.py   # Automotive aero relevance scoring
│   │
│   ├── codegen/                  # OpenFOAM code generation
│   │   ├── openfoam_generator.py # Orchestrates full case generation
│   │   ├── turbulence_model.py   # C++ turbulence model source gen
│   │   ├── boundary_conditions.py # BC implementation generation
│   │   ├── template_library.py   # Parameterized OF case templates
│   │   └── compiler.py           # wmake compilation + validation
│   │
│   ├── benchmark/                # Simulation execution & analysis
│   │   ├── drivaerml_runner.py   # DrivAerML dataset integration
│   │   ├── metric_calculator.py  # Cd, Cl, surface Cp comparison
│   │   ├── convergence_checker.py # Residual & force convergence
│   │   └── report_generator.py   # Automated benchmark reports
│   │
│   ├── optimizer/                # Karpathy Loop
│   │   ├── param_tuner.py        # Model constant optimization
│   │   └── loop.py               # Tune → Run → Measure → Keep
│   │
│   ├── data/
│   │   ├── openfoam_templates/   # Base case files (simpleCar, etc.)
│   │   └── reference_results/    # Validated baseline results
│   │
│   ├── api/
│   │   └── routes.py             # FastAPI REST endpoints
│   │
│   └── dashboard/
│       └── app.py                # Streamlit visualization
│
├── openfoam/
│   ├── Dockerfile.openfoam       # OpenFOAM v2512 dev container
│   └── custom_models/            # Generated C++ turbulence models
│
├── tests/
│   ├── test_paper_analyzer.py
│   ├── test_codegen.py
│   └── test_benchmark.py
│
├── scripts/
│   ├── scan_papers.py            # CLI: scan literature feeds
│   ├── run_benchmark.py          # CLI: execute benchmark suite
│   ├── download_drivaerml.py     # CLI: fetch DrivAerML from HuggingFace
│   └── init_db.py                # CLI: initialize PostgreSQL + pgvector
│
└── docs/
    └── assets/
        ├── banner-dark.svg       # Dark mode banner
        └── banner-light.svg      # Light mode banner

DrivAerML Dataset

SimScout benchmarks against the DrivAerML dataset, one of the largest open, high-fidelity automotive CFD datasets:

| Property | Value |
| --- | --- |
| Car variants | 500 parametric DrivAer configurations |
| Mesh resolution | ~160M cells per variant |
| Total size | ~30 TB |
| Data included | Cd/Cl/Cm coefficients, surface Cp (VTP), volume fields (VTU), STL geometries |
| License | CC-BY-SA 4.0 |
| Paper | arXiv:2408.11969 |

Related datasets for GNN training:

  • DrivAerNet — 4,000 car designs with full 3D flow fields
  • DrivAerNet++ — 8,000 car designs with parametric geometry

# Download a subset for benchmarking (first 10 variants)
uv run python -m scripts.download_drivaerml --variants 10
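Before choosing `--variants`, it helps to estimate local disk needs. This back-of-the-envelope helper assumes the ~30 TB is spread evenly over the 500 variants (about 60 GB each); real per-variant sizes vary, and if the download script fetches only surface fields or force coefficients the footprint will be far smaller. The function name is hypothetical.

```python
TOTAL_TB = 30.0  # approximate size of the full dataset
N_VARIANTS = 500


def estimated_download_gb(variants: int, total_tb: float = TOTAL_TB, n_total: int = N_VARIANTS) -> float:
    """Even-split estimate of disk space for `variants` cases, in GB (1 TB = 1000 GB)."""
    if not 0 < variants <= n_total:
        raise ValueError(f"variants must be between 1 and {n_total}")
    return total_tb * 1000 / n_total * variants


print(estimated_download_gb(10))  # 600.0
```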

Development Roadmap

Phase 1: Literature Pipeline (Weeks 1-4)

  • arXiv + Semantic Scholar scanner with rate-limit handling
  • GPT-4o paper analyzer with structured output schemas
  • CFDKnowledgeCard extraction and PostgreSQL storage
  • pgvector semantic search for related papers
  • Citation verifier (4-layer: DOI, author, venue, cross-ref)
  • Automotive aerodynamics relevance filter
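A standard-library sketch of the scanner's two concerns: building a category query and spacing out requests. It assumes the public arXiv export API (`http://export.arxiv.org/api/query` with its `search_query`, `sortBy`, `sortOrder`, and `max_results` parameters) and a simple minimum-interval limiter. The 3-second arXiv spacing follows the arXiv API's usage guidance, and the 1-second Semantic Scholar spacing follows the Tech Stack table. `arxiv_query_url` and `MinIntervalLimiter` are hypothetical names, and nothing here makes a network request.

```python
import time
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(categories: list[str], max_results: int = 100) -> str:
    """Build an arXiv API URL for the newest submissions in any of `categories`."""
    search = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


class MinIntervalLimiter:
    """Blocks so that successive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._last: float | None = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            delay = self._last + self.interval - now
            if delay > 0:
                time.sleep(delay)
        self._last = time.monotonic()


arxiv_limiter = MinIntervalLimiter(3.0)  # arXiv: about one request every 3 s
s2_limiter = MinIntervalLimiter(1.0)  # Semantic Scholar: 1 request/s with an API key

print(arxiv_query_url(["physics.flu-dyn", "cs.CE"], max_results=50))
```

A minimum-interval limiter is simpler than a token bucket and suffices for a single sequential scanner; concurrent workers would need a limiter shared across processes.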

Phase 2: OpenFOAM Integration (Weeks 5-8)

  • OpenFOAM v2512 Docker dev container with wmake toolchain
  • Base model matching algorithm (kOmegaSST, kEpsilon, Spalart-Allmaras)
  • Claude-powered C++ turbulence model generation
  • Parameterized case file generator
  • Automated compilation with an error-driven correction loop (up to 3 iterations)
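The compile step can be pictured as a bounded loop: run `wmake` in the generated model's source directory and, on failure, hand the compiler log back to the code generator. This sketch reads "up to 3 iterations" as three compile attempts; `fix_source` is a hypothetical stand-in for the LLM repair call, and `compile_fn` is injectable so the loop can be exercised without an OpenFOAM environment.

```python
import subprocess
from collections.abc import Callable
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CompileResult:
    ok: bool
    log: str


def run_wmake(src_dir: Path) -> CompileResult:
    """Build the generated model as a shared library; assumes an OpenFOAM shell environment."""
    proc = subprocess.run(["wmake", "libso"], cwd=src_dir, capture_output=True, text=True)
    return CompileResult(ok=proc.returncode == 0, log=proc.stdout + proc.stderr)


def compile_with_retries(
    src_dir: Path,
    fix_source: Callable[[Path, str], None],
    compile_fn: Callable[[Path], CompileResult] = run_wmake,
    max_attempts: int = 3,
) -> tuple[bool, int]:
    """Compile, feeding each failure log to `fix_source`, for at most `max_attempts` tries.

    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        result = compile_fn(src_dir)
        if result.ok:
            return True, attempt
        if attempt < max_attempts:
            # Hypothetical repair step, e.g. an LLM call that edits the .C/.H sources.
            fix_source(src_dir, result.log)
    return False, max_attempts
```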

Phase 3: Benchmark + Optimization (Weeks 9-12)

  • DrivAerML HuggingFace integration + subset downloader
  • Celery-based simulation job queue
  • Karpathy Loop: constant tuning to minimize Cd error
  • Convergence checker (residuals + force coefficients)
  • Benchmark report generator with Plotly visualizations
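A sketch of a convergence test that combines both signals, assuming the final initial residuals and the Cd history have already been parsed from the solver log and the `forceCoeffs` function object output. The tolerances (residuals below 1e-4, Cd within ±0.5% of its mean over the last 200 iterations) are illustrative assumptions, not SimScout's settings.

```python
from collections.abc import Mapping, Sequence
from statistics import fmean


def residuals_converged(final_residuals: Mapping[str, float], tol: float = 1e-4) -> bool:
    """True if at least one field is present and every latest initial residual is below `tol`."""
    return bool(final_residuals) and all(r < tol for r in final_residuals.values())


def force_converged(cd_history: Sequence[float], window: int = 200, rel_band: float = 0.005) -> bool:
    """True if the last `window` Cd values all stay within ±rel_band of their mean."""
    if len(cd_history) < window:
        return False
    tail = cd_history[-window:]
    mean = fmean(tail)
    return max(abs(cd - mean) for cd in tail) <= rel_band * abs(mean)


def is_converged(final_residuals: Mapping[str, float], cd_history: Sequence[float]) -> bool:
    """Both criteria must hold before a run's coefficients are trusted."""
    return residuals_converged(final_residuals) and force_converged(cd_history)
```

Checking forces as well as residuals matters because residuals can plateau below tolerance while the integrated Cd is still drifting.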

Phase 4: Dashboard + API (Weeks 13-16)

  • Streamlit dashboard: model comparison, Cp surface plots
  • FastAPI REST endpoints for programmatic access
  • Weekly digest: new papers + preliminary benchmark results
  • Webhook notifications (Slack, email)

Post-MVP

  • GNN surrogate models (MeshGraphNet on DrivAerNet++)
  • SmartSim in-memory ML-CFD coupling
  • Multi-solver support (SU2, STAR-CCM+ export)
  • Temporal workflow orchestration (replace Celery for complex DAGs)

Success Metrics

| Metric | Target | How measured |
| --- | --- | --- |
| Paper extraction rate | >80% of relevant CFD papers yield valid KnowledgeCards | Automated QA on extracted fields |
| First-compile success | >70% of generated C++ compiles without manual intervention | CI compilation pipeline |
| Cd accuracy | Within 5% of published results for validated models | DrivAerML reference comparison |
| End-to-end latency | <24 hours from paper publication to benchmark report | Pipeline timestamp tracking |
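Two of these targets reduce to simple arithmetic once per-paper records exist. A sketch, assuming timezone-aware UTC timestamps for publication and report completion and (predicted, reference) Cd pairs, with relative error measured against the reference value; all function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def end_to_end_latency(published: datetime, report_done: datetime) -> timedelta:
    """Time from paper publication to the finished benchmark report."""
    return report_done - published


def meets_latency_target(
    published: datetime, report_done: datetime, target: timedelta = timedelta(hours=24)
) -> bool:
    return end_to_end_latency(published, report_done) < target


def cd_within_tolerance(cd_pred: float, cd_ref: float, tol: float = 0.05) -> bool:
    """True if the relative Cd error |Cd_pred - Cd_ref| / |Cd_ref| is at most `tol`."""
    return abs(cd_pred - cd_ref) / abs(cd_ref) <= tol


def cd_pass_rate(pairs: list[tuple[float, float]], tol: float = 0.05) -> float:
    """Fraction of (predicted, reference) Cd pairs that fall within `tol`."""
    if not pairs:
        raise ValueError("need at least one (predicted, reference) pair")
    return sum(cd_within_tolerance(p, r, tol) for p, r in pairs) / len(pairs)


pub = datetime(2025, 1, 6, 14, 0, tzinfo=timezone.utc)
done = datetime(2025, 1, 7, 9, 30, tzinfo=timezone.utc)
print(meets_latency_target(pub, done))  # True (19.5 h)
print(cd_pass_rate([(0.262, 0.250), (0.270, 0.250)]))  # 0.5
```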

Competitive Landscape

| Project | What it does | SimScout's differentiator |
| --- | --- | --- |
| ChatCFD | Conversational OpenFOAM case setup (82% success) | We go further: paper → new model → benchmark |
| OpenFOAMGPT 2.0 | End-to-end conversational CFD | We automate the research pipeline, not just Q&A |
| FoamGPT | Natural language → OpenFOAM commands | We generate novel turbulence models, not just run existing ones |

SimScout's unique value: No existing tool goes from "a paper was published" to "here's how it performs on your vehicle design" automatically.


Configuration

SimScout uses Pydantic Settings for configuration. All settings can be overridden via environment variables:

# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
POSTGRES_URL=postgresql://simscout:pass@localhost:5432/simscout
REDIS_URL=redis://localhost:6379/0
OPENFOAM_DOCKER_IMAGE=opencfd/openfoam-dev:2512
DRIVAERML_CACHE_DIR=./data/drivaerml
SCAN_INTERVAL_HOURS=24
MAX_CFD_RUNTIME_MINUTES=30

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Development setup
uv sync --group dev

# Run linters
uv run ruff check src/ tests/
uv run mypy src/

# Run tests
uv run pytest -v

# Run tests with coverage
uv run pytest --cov=src --cov-report=html


License

This project is licensed under the Apache License 2.0 — see LICENSE for details.


Built with precision for CFD engineers who'd rather simulate than search.
