DISCLAIMER: This software is provided for research and educational purposes only. Not intended for clinical or veterinary use. No warranty of fitness for any particular purpose.
Design your own mRNA cancer vaccine.
mutavax is an open studio for designing personalized mRNA cancer vaccines — for dogs, cats, and humans. Sequence a tumor and a healthy sample, run the studio on your own machine, hand the design to a manufacturer.
Site: https://mutavax.straehhuber.com
Screenshots (images omitted): Pick a species, Stage the samples, Run alignment, Find the mutations, Read what they mean, Score the neoantigens, Curate the cassette, Design the construct, Hand off the vaccine.
Sample. Sequence a tumor and a matched healthy sample at any standard lab. Two sequencing files — that's the whole input.
Compute. Run mutavax on your machine. Eight guided stages compare tumor vs. healthy, find the cancer-specific mutations, and design the vaccine. ≈12 hours on a workstation.
Design. Send the finished design to a GMP manufacturer. A vial arrives roughly ten days later.
| # | Stage | State | Tools |
|---|---|---|---|
| 1 | Ingestion | Live | samtools, pigz, fastp |
| 2 | Alignment | Live — chunked stop-and-resume on commodity hardware | strobealign, samtools |
| 3 | Variant Calling | Live — karyogram + plain-English filter buckets, Broad 1000G panel-of-normals on human runs | GATK Mutect2 (GPU via NVIDIA Parabricks when available) |
| 4 | Annotation | Live — cancer-gene cards + lollipop plot | Ensembl VEP 111 |
| 5 | Neoantigen Prediction | Live — binding buckets + peptide × allele heatmap + antigen funnel | pVACseq 5.4.0, MHCflurry 2.0 (default, license-free) or NetMHCpan 4.2, NetMHCIIpan 4.3 |
| 6 | Epitope Selection | Live — 8-slot cassette curation UI | pVACview + custom scoring |
| 7 | mRNA Construct Design | Live — molecule hero + λ slider trading CAI vs. MFE + codon swap preview + 7/7 manufacturability checks | LinearDesign, DNAchisel, ViennaRNA |
| 8 | Construct Output | Live — color-coded FASTA with FASTA/GenBank/JSON downloads, CMO release flow, vet dosing, audit trail | pVACvector, Biopython |
Every live stage is pause-and-resumable. Progress is surfaced honestly; tool names live in the expert drawer.
Tumor + matched-normal sequencing for one patient. FASTQ, BAM, or CRAM. ≥30× coverage for confident somatic variant calling.
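As a rough check of whether a sequencing run is likely to reach the ≥30× target, mean coverage can be estimated from the number of read pairs and the read length. The sketch below assumes paired-end reads of uniform length and a human genome size of about 3.1 Gb; the function name and the example numbers are illustrative and not part of mutavax.

```bash
# Rough mean-coverage estimate: (read pairs x 2 x read length) / genome size.
# Assumptions: paired-end reads of uniform length; integer division rounds down.
estimate_coverage() {
  # $1 = read pairs, $2 = read length in bases, $3 = genome size in bases
  echo $(( $1 * 2 * $2 / $3 ))
}

# Example: 320 million 150 bp read pairs against a ~3.1 Gb human genome.
estimate_coverage 320000000 150 3100000000   # prints 30
```

The estimate ignores duplicates and unmapped reads, so real usable coverage will be somewhat lower.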
| Recommended | |
|---|---|
| RAM | 64 GB (strobealign indexing needs around 31 GB free at its peak) |
| CPU | 16 cores |
| Disk | 1 TB SSD — a 30× human WGS costs ~400 GB (deduped BAMs + FASTQs); multiple cases share the ~55 GB reference + VEP cache + PON footprint |
| GPU | NVIDIA Ampere+ (RTX 3090 / 4090 / A-series / H-series) — Parabricks accelerates stage 3 Mutect2 ~10× (opt-in) |
| OS | Linux |
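The figures in the table can be turned into a quick host check before the first run. This is a minimal sketch for Linux only, not something mutavax ships: the function names are hypothetical, and the thresholds (31 GB of available RAM, 16 cores, 400 GB of free disk for one 30× human case) are assumptions read off the table rather than limits the app enforces.

```bash
# Compare one measured value against a minimum. Prints a line and
# returns non-zero when the value is too low.
check_min() {
  # $1 = label, $2 = measured value, $3 = required minimum (integers)
  if [ "$2" -ge "$3" ]; then
    echo "ok   $1: $2 (need >= $3)"
  else
    echo "low  $1: $2 (need >= $3)"
    return 1
  fi
}

# How many 30x human cases fit on a disk, using the table's figures:
# ~55 GB of shared references plus ~400 GB per case.
cases_that_fit() {
  # $1 = disk size in GB
  echo $(( ($1 - 55) / 400 ))
}

# Linux-only host check (reads /proc/meminfo, nproc, and df).
host_preflight() {
  # $1 = directory on the disk that will hold ./data (default: current dir)
  hp_dir="${1:-.}"
  hp_ram=$(awk '/^MemAvailable:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
  hp_cpu=$(nproc)
  hp_disk=$(df -Pk "$hp_dir" | awk 'NR==2 {print int($4 / 1024 / 1024)}')
  hp_rc=0
  check_min "available RAM (GB)" "$hp_ram" 31 || hp_rc=1
  check_min "CPU cores" "$hp_cpu" 16 || hp_rc=1
  check_min "free disk (GB)" "$hp_disk" 400 || hp_rc=1
  return "$hp_rc"
}
```

Running `host_preflight ~/mutavax` prints one line per resource. `cases_that_fit 1000` prints 2, which is why a 1 TB SSD is a comfortable fit for one or two human cases at a time.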
Everything runs in a single Docker container: FastAPI backend + Next.js frontend + samtools + strobealign + GATK + VEP + pVACtools + MHCflurry + Parabricks base in one ~10 GB image. No cloud, no object storage.
You don't need to clone this repo. Paste the compose file below, run docker compose up -d, open the browser.
Ubuntu / Debian / Linux Mint:
curl -fsSL https://get.docker.com | sudo bash
sudo usermod -aG docker "$USER"
macOS / Windows: install Docker Desktop. For GPU-accelerated stage 3 variant calling on Linux, also install the NVIDIA Container Toolkit.
mkdir ~/mutavax && cd ~/mutavax
curl -fsSL https://raw.githubusercontent.com/niach/mutavax/main/docker-compose.yml -o docker-compose.yml
The file pulls the pre-built image from GHCR (ghcr.io/niach/mutavax), so there is no build step on your machine.
Most users don't need a .env file. Add one if you want to customize anything:
cat > .env <<'EOF'
# Where workspace artifacts, references, and the SQLite DB live. Default: ./data
# MUTAVAX_DATA_ROOT=./data
# Stage 9 AI review — only needed if you want the LLM review feature.
# ANTHROPIC_API_KEY=
# Switch the class-I predictor back to DTU NetMHCpan (default is MHCflurry).
# MUTAVAX_CLASS_I_PREDICTOR=NetMHCpan
EOF
See .env.example for the full list of overrides.
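Every setting in the template above is commented out, so the file changes nothing until a line is uncommented. A small helper can list which overrides are actually active. This is a sketch under that assumption; `active_overrides` is a hypothetical name, not a mutavax command.

```bash
# Print the active (uncommented) KEY=VALUE lines of an env file.
active_overrides() {
  # $1 = path to the env file (defaults to ./.env)
  ao_file="${1:-.env}"
  [ -f "$ao_file" ] || return 0   # no file means no overrides
  # Keep only lines that start with a variable name followed by "=".
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$ao_file" || true
}
```

Calling `active_overrides` in `~/mutavax` right after creating the template prints nothing; after uncommenting the predictor line it prints that one line.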
The compose file ships with MHCflurry as the default class-I predictor, a license-free alternative to NetMHCpan, validated to match NetMHCpan AUC = 1.000 on the canonical tumor-antigen benchmark. Human users who run class-I prediction only (stages 1–5) need nothing else.
Opt in to the DTU NetMHC stack if you want:
- non-human species (dog DLA / cat FLA — MHCflurry has no canine or feline training data), or
- class-II neoantigen scoring (NetMHCIIpan has no license-free equivalent).
Both are free for academic use; commercial usage needs a separate DTU license. Fill the forms, download the Linux tarballs:
- NetMHCpan 4.2 — https://services.healthtech.dtu.dk/services/NetMHCpan-4.2/
- NetMHCIIpan 4.3 — https://services.healthtech.dtu.dk/services/NetMHCIIpan-4.3/
Extract them so the layout is:
./data/netmhc/
├── netMHCpan-4.2/
└── netMHCIIpan-4.3/
That dir is mounted at /tools/src:ro inside the backend container, which matches the stock DTU wrapper scripts' hardcoded NMHOME — no script edits.
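The layout above can be verified before starting the container. The helper below is a minimal sketch with a hypothetical name; it only checks that both tools exist as extracted directories under the NetMHC root, and says nothing about whether the contents are complete.

```bash
# Verify that both NetMHC tools are present as extracted directories.
check_netmhc_layout() {
  # $1 = NetMHC root (default ./data/netmhc)
  nl_root="${1:-./data/netmhc}"
  nl_rc=0
  for nl_tool in netMHCpan-4.2 netMHCIIpan-4.3; do
    if [ -d "$nl_root/$nl_tool" ]; then
      echo "ok       $nl_tool/"
    else
      echo "missing  $nl_tool/ (extract the tarball into $nl_root)"
      nl_rc=1
    fi
  done
  return "$nl_rc"
}
```

A tarball left in place of a directory is reported as missing, because the test is for a directory and not for any file with a similar name.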
The backend auto-creates ./data/inbox/, ./data/workspaces/, ./data/references/, and ./data/vep-cache/ on first start. Drop your tumor + normal FASTQ / BAM / CRAM pair into ./data/inbox/ and the app registers them into a workspace.
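Staging the pair can be scripted. The sketch below copies two files into the inbox after basic sanity checks; the function name, the example file names, and the list of accepted suffixes are assumptions based on the formats named above, not rules taken from the mutavax backend.

```bash
# Copy a tumor/normal pair into the inbox after basic sanity checks.
stage_pair() {
  # $1 = tumor file, $2 = normal file, $3 = inbox dir (default ./data/inbox)
  sp_inbox="${3:-./data/inbox}"
  for sp_f in "$1" "$2"; do
    [ -f "$sp_f" ] || { echo "not a file: $sp_f" >&2; return 1; }
    case "$sp_f" in
      *.fastq.gz|*.fq.gz|*.fastq|*.fq|*.bam|*.cram) ;;  # assumed suffixes
      *) echo "unexpected format: $sp_f" >&2; return 1 ;;
    esac
  done
  mkdir -p "$sp_inbox"
  cp -- "$1" "$2" "$sp_inbox"/
}
```

For example, `stage_pair tumor.bam normal.bam` copies both files into `./data/inbox/`, and the call fails without copying anything if either path is missing or has an unexpected suffix.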
docker compose up -d
Open http://localhost:3000. Create a workspace, pick a species, follow the stages.
LAN access: The web UI binds to 0.0.0.0:3000, so any other machine on your network can hit http://<server-ip>:3000. Put it behind Caddy/Traefik if you want TLS.
GPU-accelerated stage 3 (opt-in):
curl -fsSL https://raw.githubusercontent.com/niach/mutavax/main/docker-compose.gpu.yml -o docker-compose.gpu.yml
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
Requires NVIDIA drivers + the NVIDIA Container Toolkit.
Species reference — GRCh38 (human), UU_Cfam_GSD_1.0 (dog), or Felis_catus_9.0 (cat) — is auto-downloaded on the first alignment run. Cached under ./data/references/, shared across workspaces.
Human workspaces apply the Broad's 1000 Genomes panel-of-normals to Mutect2 to filter recurrent artefacts and low-frequency germline variants. The VCF is auto-downloaded, renamed from UCSC to Ensembl contigs, and indexed on first variant-calling run. Lives under ./data/references/pon/grch38/. Set MUTAVAX_PON_GRCH38_VCF="" in .env to disable.
Dog and cat workspaces skip the PON (no curated canine / feline panel exists yet).
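The UCSC-to-Ensembl contig rename described above for the human panel-of-normals happens automatically, but the idea is easy to show. The sketch below builds a rename map for the primary chromosomes only (alternate and decoy contigs are ignored). The function name and file names are placeholders, and the commented bcftools commands assume bcftools is installed; they are one way to do this by hand, not necessarily how mutavax does it.

```bash
# Print a two-column "old new" map for the primary human contigs:
# chr1..chr22, chrX and chrY drop the "chr" prefix; chrM becomes MT.
ucsc_to_ensembl_map() {
  um_i=1
  while [ "$um_i" -le 22 ]; do
    echo "chr$um_i $um_i"
    um_i=$(( um_i + 1 ))
  done
  echo "chrX X"
  echo "chrY Y"
  echo "chrM MT"
}

# Manual rename-and-index with placeholder file names (assumes bcftools):
#   ucsc_to_ensembl_map > rename.txt
#   bcftools annotate --rename-chrs rename.txt -Oz -o pon.ensembl.vcf.gz pon.ucsc.vcf.gz
#   bcftools index -t pon.ensembl.vcf.gz
```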
Alignment refuses to start with "insufficient memory." Indexing the human reference peaks around 31 GB of RAM. Either free some up, or drop a prebuilt index into ./data/references/ (see the contributors section below).
Stage 5 preflight says a NetMHC binary is missing. You set MUTAVAX_CLASS_I_PREDICTOR=NetMHCpan but didn't drop the tarballs in. Check ls ./data/netmhc/ — should contain netMHCpan-4.2/ and netMHCIIpan-4.3/ as directories, not tarballs.
Stage 5 finishes with zero peptides. Your patient alleles weren't recognized by pvacseq. The Patient MHC panel marks these with a strikethrough + SKIPPED pill. For dog, pvacseq only recognizes a handful of DLA-88 alleles and zero class II alleles.
Annotation complains about missing TSL fields. Rerun stage 4 on the workspace — older annotations predate the --tsl flag and need refreshing.
Clone the repo for source-level work:
git clone https://github.com/niach/mutavax.git
cd mutavax
npm install
Frontend: Next.js 15, React 19, TypeScript, Tailwind. Backend: FastAPI + SQLAlchemy, all bioinformatics tools in one Docker image, SQLite under ./data/.
Dev workflow — hot-reload the backend from the cloned source, run the Next.js dev server on the host:
docker compose -f docker-compose.yml -f docker-compose.dev.yml up # backend with --reload on :8000
npm run dev # next dev on :3000
Set NEXT_PUBLIC_API_URL=http://localhost:8000 in your .env for this workflow so the browser hits the native uvicorn instead of the same-origin /backend proxy.
Fast tests (lint + TS + backend non-integration):
npm run test:fast
Browser and live real-data paths:
npx playwright install chromium
npm run test:integration
npm run sample-data:smoke
npm run test:backend:real-data
npm run test:browser:real-data
Sample datasets for smoke and full validation runs:
npm run sample-data:smoke # COLO829 smoke (~50k read pairs per lane)
npm run sample-data:full # COLO829 full 100x WGS (~174 GB)
npm run sample-data:alignment # BAM/CRAM normalization fixture
python3 scripts/fetch_canine_dlbcl_sample_data.py # canine DLBCL smoke
python3 scripts/fetch_canine_dlbcl_sample_data.py --mode full # full DLBCL1 pair (~45 GB)
Regenerate the screenshots in this README (frontend + backend must be running):
# Stages 1–5 need a real completed pipeline run; point the script at that workspace.
node scripts/take-screenshots.mjs <workspace-id>
# Stages 6–8 can be captured from a synthetic demo workspace that skips the heavy
# bioinformatics (inserts minimum DB stubs only — not suitable for any real run).
docker cp scripts/seed_demo_workspace.py mutavax:/tmp/seed.py
WORKSPACE_ID=$(docker exec mutavax python /tmp/seed.py)
node scripts/take-screenshots.mjs --stages=6,7,8 "$WORKSPACE_ID"
Alignment compute knobs (chunk size, per-chunk aligner threads, samtools sort memory, parallel chunks) are tunable from the UI's Compute Settings drawer on the alignment stage, so no env file edit is needed. They persist to ./data/settings.json.
Full list of env overrides lives in .env.example.
mutavax is inspired by Paul Conyngham's 2025 personalized mRNA vaccine for his dog Rosie (mast cell cancer, 75% tumor shrinkage). His pipeline — BWA-MEM2 → Mutect2 → VEP → pVACseq with NetMHCpan — proved the approach works on a single-patient, single-desktop scale. mutavax is an attempt to make that pipeline accessible as a guided workspace, species-flexible by default.
Built on the shoulders of:
- pVACtools (Griffith Lab)
- MHCflurry (openvax) — license-free class-I binding predictor
- NetMHCpan / NetMHCIIpan (DTU Health Tech)
- Ensembl VEP + its pVACseq-ready plugins (Frameshift, Wildtype, Downstream)
- GATK Mutect2 and NVIDIA Parabricks
- strobealign, samtools, pigz








