eugenmik/alloyforge

AlloyForge — Heat-Resistant Alloy Design Assistant

License: AGPL v3

Version 0.9.0 — auth, security hardening, CALPHAD analysis, MatCalc TDB option. See UPDATE_v0.9.md for the upgrade guide and the full list of v0.9 changes.

AlloyForge is a research tool for metallurgical engineers designing heat-resistant nickel-, cobalt-, and iron-based superalloys. It combines a fast PHACOMP / d-electrons screening engine, a curated reference database of 40 commercial alloys, a cost calculator, and a RAG-powered chat over a knowledge base assembled from any metallurgy literature you load (textbooks, handbooks, papers — anything you can OCR to markdown).

Architecture

┌─────────────┐         ┌─────────────┐         ┌──────────────┐
│   NiceGUI   │ ──HTTP─→│   FastAPI   │ ──SQL──→│  PostgreSQL  │
│  (port 8080)│         │  (port 8000)│         │  + pgvector  │
└─────────────┘         └─────────────┘         └──────────────┘
                              │
                              ├──→ DeepSeek API (LLM)
                              └──→ bge-m3 (local embeddings)
  • Backend (FastAPI) — JWT-protected REST API for PHACOMP, alloy DB, cost, recommendation, CALPHAD analysis, RAG
  • UI (NiceGUI) — interactive interface with login: composition checker, Bo-Md diagram, alloy DB browser, design recommender, CALPHAD analysis, RAG chat, plus help page
  • PostgreSQL + pgvector — relational DB for alloys/elements + vector store for chunks
  • bge-m3 — multilingual (RU/EN) embeddings (CPU, ~2GB)
  • DeepSeek API — LLM for RAG answers and trade-off analysis

Prerequisites

  • Python 3.12 (3.11 also works)
  • Docker + Docker Compose (for production-style local run and VPS deploy)
  • DeepSeek API key (https://platform.deepseek.com/api_keys)
  • ~5GB disk space (PostgreSQL data + bge-m3 model + Docker images)

Quick start (local, in venv — no Docker)

For development and testing PHACOMP without the full stack:

cd alloyforge
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install backend deps
pip install -r backend/requirements.txt

# Run PHACOMP validation (no DB needed)
python scripts/validate_phacomp.py
python scripts/example_usage.py

For the full stack including DB and UI, use the Docker recipe below.


Full stack with Docker Compose

This is the recommended way to run AlloyForge, both for local development and for VPS deployment.

Step 1 — configure environment

cd alloyforge
cp .env.example .env
nano .env   # fill in DEEPSEEK_API_KEY and change passwords

Important settings to change in .env:

  • DB_PASSWORD — set a strong password
  • DEEPSEEK_API_KEY — your DeepSeek key (without it, RAG/LLM endpoints return placeholder messages)
  • UI_SECRET — random string for session encryption
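One way to produce values for DB_PASSWORD and UI_SECRET is Python's standard `secrets` module. This is a minimal sketch; the helper name `make_secret` and the 32-byte length are assumptions, not project requirements.

```python
import secrets


def make_secret(n_bytes: int = 32) -> str:
    """Return a URL-safe random string drawn from the OS CSPRNG."""
    return secrets.token_urlsafe(n_bytes)


if __name__ == "__main__":
    # Paste the printed lines into .env; never commit that file.
    print(f"DB_PASSWORD={make_secret()}")
    print(f"UI_SECRET={make_secret()}")
```

URL-safe output avoids characters that would need quoting in a .env file or a database connection string.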

Step 2 — drop your processed markdown sources into data/markdown/

The RAG pipeline accepts any metallurgy literature you can OCR to markdown — textbooks, monographs, handbooks, conference proceedings, internal notes. Each source goes into its own directory; expected structure (compatible with MinerU output, but any tool producing markdown will work):

data/markdown/
  <source_stem_1>/
    auto/
      <source_stem_1>.md
      <source_stem_1>_content_list.json
      images/...
  <source_stem_2>/
    auto/...
  ...

The BOOK_METADATA dict in backend/scripts/ingest_rag.py maps directory names to titles and authors. Add an entry for each source you load.
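Before ingesting, it can help to confirm that every source directory follows the layout above. The sketch below is a hypothetical helper (`find_sources` is not part of the repository) that only checks for `<stem>/auto/<stem>.md`; it does not inspect the content list or images.

```python
from pathlib import Path


def find_sources(root: Path) -> dict[str, bool]:
    """Map each source directory under `root` to whether its main markdown file exists."""
    result = {}
    for source_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        stem = source_dir.name
        result[stem] = (source_dir / "auto" / f"{stem}.md").is_file()
    return result


if __name__ == "__main__":
    root = Path("data/markdown")
    if root.is_dir():
        for stem, ok in find_sources(root).items():
            status = "ok     " if ok else "MISSING"
            print(f"{status}  {stem}/auto/{stem}.md")
```

Run it from the repository root; any line marked MISSING points at a directory whose name does not match its markdown file.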

Step 3 — build and start

cd docker
docker compose up -d --build

The first build takes 5-10 minutes (it downloads the PyTorch CPU wheel, NiceGUI, etc.). Postgres should report healthy in about 10 seconds; the backend then starts and runs the Alembic migrations automatically.

Check that all services are healthy:

docker compose ps
docker compose logs backend | tail -30

You should see:

INFO  [alembic.runtime.migration] Running upgrade  -> 0001_initial
INFO:     Started server process [...]
INFO:     Uvicorn running on http://0.0.0.0:8000
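If you script the setup, you may want to wait for the backend instead of reading logs by hand. The following is a rough sketch that polls the /health endpoint used later in this README; the helper names, retry count, and delay are assumptions, and it presumes /health answers without a token.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until(check: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Call `check` until it returns True or the attempts run out."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False


# Usage (requires the stack to be running):
#   ready = wait_until(lambda: http_ok("http://localhost:8000/health"))
#   print("backend is up" if ready else "backend did not become healthy")
```

Keeping the polling loop separate from the HTTP check makes the same loop reusable for the UI port or for Postgres readiness.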

Step 4 — seed the database

docker exec -it alloyforge_backend python -m scripts.seed_db

This loads:

  • 30 elements with PHACOMP parameters (Md, Bo, Nv) and snapshot prices
  • 40 reference alloys (CMSX-4, ЖС6У, IN718, MAR-M247, ASM Vol 3 alloys, etc.) with pre-computed PHACOMP

Step 5 — ingest the RAG knowledge base

docker exec -it alloyforge_backend python -m scripts.ingest_rag

This will:

  • Chunk each markdown file (~800 chars/chunk with 100 char overlap)
  • Compute bge-m3 embeddings on CPU (~2-5 minutes per source)
  • Insert into pgvector with IVFFlat index

Total ingestion time scales roughly linearly with corpus size.

⚠ The first run downloads the bge-m3 model (~2GB). It is cached in a Docker volume for subsequent runs.
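To illustrate the chunking step, here is a minimal sliding-window sketch using the sizes quoted above (800 characters, 100 overlap). It is an assumption about the general technique, not the code in ingest_rag.py, which may also split on headings or sentence boundaries.

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split `text` into windows of `size` chars, each sharing `overlap` chars with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "x" * 2000
    print([len(c) for c in chunk_text(sample)])
```

With these defaults a 2,000-character document yields three chunks of 800, 800, and 600 characters. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.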

Step 6 — open the UI

Browse to http://localhost:8080

The left sidebar lists the pages below; per the Architecture section, the UI also includes CALPHAD analysis and a help page:

  • 🧪 Composition checker — live PHACOMP as you adjust composition
  • 📊 Bo-Md diagram — interactive scatter plot of all 40 alloys
  • 📚 Alloy database — searchable table with filters and detail views
  • 🎯 Design recommender — input requirements → ranked candidates with LLM trade-off analysis
  • 💬 Knowledge chat — RAG Q&A over your books

API docs at http://localhost:8000/docs.


Verifying the system works

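After setup, test the endpoints below. The API is JWT-protected as of v0.9, so protected routes may additionally need an Authorization header (see UPDATE_v0.9.md for the auth changes):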

# Health
curl http://localhost:8000/health

# Stats
curl http://localhost:8000/api/stats

# PHACOMP for CMSX-4
curl -X POST http://localhost:8000/api/phacomp/check \
  -H "Content-Type: application/json" \
  -d '{"composition_wt": {"Cr":6.5,"Co":9,"Mo":0.6,"W":6,"Re":3,"Ta":6.5,"Al":5.6,"Ti":1,"Hf":0.1}, "base":"Ni"}'

# RAG query (requires ingestion done and DEEPSEEK_API_KEY set)
curl -X POST http://localhost:8000/api/rag/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Why is Re added to nickel superalloys?"}'

Deploying to a VPS

The same docker-compose.yml works on any Docker-capable VPS. Steps:

  1. Push your alloyforge repo to GitHub
  2. SSH into VPS, install Docker if not present
  3. Clone and configure:
git clone https://github.com/YOUR_USER/alloyforge.git
cd alloyforge
cp .env.example .env
nano .env       # set strong passwords, DEEPSEEK_API_KEY, etc.

# Copy your markdown directory (from local machine)
# scp -r data/markdown/ vps:~/alloyforge/data/

cd docker
docker compose up -d --build

docker exec alloyforge_backend python -m scripts.seed_db
docker exec alloyforge_backend python -m scripts.ingest_rag
  4. Reverse proxy with nginx (optional, for HTTPS):
server {
    listen 443 ssl;
    server_name alloyforge.yourdomain.com;
    # ... ssl config ...

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

⚠ Memory budget: bge-m3 occupies ~2.5 GB resident, plus PostgreSQL and the Python runtimes. A 16 GB host has ~8–10 GB of headroom for typical use; 8 GB hosts work but are tight under concurrent load.


Project layout

alloyforge/
├── README.md                 # this file
├── README_PHACOMP.md         # PHACOMP-specific notes
├── .env.example              # environment template
├── .gitignore
├── backend/                  # FastAPI backend
│   ├── alembic/              # DB migrations
│   ├── alembic.ini
│   ├── requirements.txt
│   ├── app/
│   │   ├── main.py           # FastAPI app + endpoints
│   │   ├── api/schemas.py    # Pydantic models
│   │   ├── core/config.py    # Settings (reads .env)
│   │   ├── db/               # SQLAlchemy + sessions
│   │   ├── phacomp/          # PHACOMP calculator (validated)
│   │   ├── services/         # Recommender + element bridge
│   │   ├── cost/             # Cost calculator
│   │   └── rag/              # Embeddings, LLM, RAG service
│   └── scripts/
│       ├── seed_db.py        # Load elements + alloys CSVs
│       └── ingest_rag.py     # Chunk + embed markdown files
├── ui/                       # NiceGUI frontend
│   ├── main.py
│   ├── requirements.txt
│   └── ui_pages/             # 5 pages
├── data/
│   ├── elements/elements_phacomp.csv
│   ├── seeds/reference_alloys.csv
│   └── markdown/             # Place MinerU output here
├── docker/
│   ├── Dockerfile.backend
│   ├── Dockerfile.ui
│   └── docker-compose.yml
└── scripts/                  # Standalone helpers
    ├── validate_phacomp.py
    └── example_usage.py

Maintenance commands

Re-build after code changes:

cd docker
docker compose up -d --build backend ui

View logs:

docker compose logs -f backend       # follow backend
docker compose logs -f ui
docker compose logs postgres

Reset the database (⚠ destroys all data):

docker compose down -v
docker compose up -d
docker exec alloyforge_backend python -m scripts.seed_db
docker exec alloyforge_backend python -m scripts.ingest_rag

Add/update element price (admin):

curl -X POST http://localhost:8000/api/elements/price \
  -H "Content-Type: application/json" \
  -d '{"symbol": "Re", "price_usd_kg": 2100, "source": "LME 2026-04"}'
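To see why one price update matters, consider raw-material cost as a mass-fraction-weighted sum of element prices. The sketch below shows only that arithmetic; it is an assumption about the general approach, not the project's cost calculator, and every price except the Re figure from the request above is a placeholder.

```python
def raw_material_cost(
    composition_wt: dict[str, float],
    prices_usd_kg: dict[str, float],
    base: str = "Ni",
) -> float:
    """Raw-material cost in USD per kg of alloy; the base element fills the balance to 100 wt%."""
    balance = 100.0 - sum(composition_wt.values())
    if balance < 0:
        raise ValueError("alloying additions exceed 100 wt%")
    full = {**composition_wt, base: composition_wt.get(base, 0.0) + balance}
    missing = sorted(set(full) - set(prices_usd_kg))
    if missing:
        raise KeyError(f"no price for: {', '.join(missing)}")
    return sum(wt / 100.0 * prices_usd_kg[el] for el, wt in full.items())


if __name__ == "__main__":
    # Re price from the example request; the Ni price is a placeholder.
    prices = {"Re": 2100.0, "Ni": 20.0}
    print(round(raw_material_cost({"Re": 3.0}, prices), 2))
```

With these inputs, 3 wt% Re contributes 63 of the roughly 82.4 USD/kg total, which is why stale prices for Re can dominate the cost estimate.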

Add a new source to the RAG:

  1. Place the markdown at data/markdown/<source_stem>/auto/<source_stem>.md
  2. Add an entry to BOOK_METADATA dict in backend/scripts/ingest_rag.py (title/author/year)
  3. Re-run docker exec alloyforge_backend python -m scripts.ingest_rag (skips already-indexed sources)

Known limitations (MVP)

  1. PHACOMP estimates Md/Bo via simple partitioning, so values are systematically ~0.05 eV lower than Morinaga's published values. Internal thresholds are calibrated to compensate, and relative ranking is preserved. CALPHAD integration (Phase 3) will fix this.

  2. No CALPHAD verification stage yet. The recommender uses pre-computed PHACOMP only. Integrating pycalphad with mc_ni.tdb would add high-fidelity phase analysis on top candidates (Phase 3).

  3. Co- and Fe-Ni-base alloys use Ni-base parameters as proxy. Acceptable for MVP screening; refine later.

  4. Authentication is new in v0.9 (JWT-protected API, UI login); earlier versions had none. Versions before 0.9 are OK for personal use behind a VPN/firewall, NOT for public exposure.

  5. Snapshot prices. Element prices in seed are April 2026 estimates. Update via POST /api/elements/price for accurate cost calculations.

  6. Embedding model on CPU. bge-m3 inference takes ~50 ms/query on a typical CPU, which is acceptable but noticeable. Set EMBEDDING_DEVICE=cuda if the deployment host has a GPU.
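For context on limitation 1: in the d-electrons approach, the alloy-average Md and Bo are averages of elemental values weighted by atomic fraction. The sketch below shows the wt% to atomic-fraction conversion and a generic weighted mean. It works on the bulk composition and skips the partitioning step AlloyForge applies, so it is not the project's calculation. The function names are hypothetical, and the elemental Md/Bo values should come from data/elements/elements_phacomp.csv rather than from this example.

```python
# Standard atomic masses (g/mol) for the elements in the CMSX-4 example.
ATOMIC_MASS = {
    "Ni": 58.693, "Cr": 51.996, "Co": 58.933, "Mo": 95.95, "W": 183.84,
    "Re": 186.207, "Ta": 180.948, "Al": 26.982, "Ti": 47.867, "Hf": 178.49,
}


def wt_to_atomic_fraction(composition_wt: dict[str, float], base: str = "Ni") -> dict[str, float]:
    """Convert wt% to atomic fractions; the base element fills the balance to 100 wt%."""
    full = dict(composition_wt)
    full[base] = full.get(base, 0.0) + 100.0 - sum(composition_wt.values())
    moles = {el: wt / ATOMIC_MASS[el] for el, wt in full.items()}
    total = sum(moles.values())
    return {el: n / total for el, n in moles.items()}


def weighted_mean(atomic_fraction: dict[str, float], parameter: dict[str, float]) -> float:
    """Average an elemental parameter (e.g. Md or Bo) over atomic fractions."""
    return sum(x * parameter[el] for el, x in atomic_fraction.items())


if __name__ == "__main__":
    cmsx4 = {"Cr": 6.5, "Co": 9, "Mo": 0.6, "W": 6, "Re": 3, "Ta": 6.5, "Al": 5.6, "Ti": 1, "Hf": 0.1}
    fractions = wt_to_atomic_fraction(cmsx4)
    for el, x in sorted(fractions.items(), key=lambda kv: -kv[1]):
        print(f"{el:>2}  {100 * x:5.2f} at%")
```

Light elements weigh more in atomic terms than in mass terms: 5.6 wt% Al becomes roughly 12.6 at% here, which is why Al strongly influences the averaged parameters.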


What's next (Phase 2 / Phase 3)

  • Phase 2 — grow the RAG corpus: OCR additional sources to markdown, drop them into data/markdown/, add a BOOK_METADATA entry, and re-run the ingest script (it skips already-indexed sources). UI doesn't need changes.
  • Phase 3 — on a CUDA-capable host:
    • Replace partitioning estimate with CALPHAD γ-composition (using pycalphad + mc_ni.tdb)
    • Swap DEEPSEEK_* for a local LLM (e.g. via Ollama)
    • Switch EMBEDDING_DEVICE=cuda
    • Add multi-objective optimization (NSGA-II / pymoo) for design space exploration
    • Add user auth + multi-user support

For research use only. Not a substitute for experimental validation.
