Open, reproducible hypothesis generation for MCAS / MCAD compounds — ranked transparently for rescue, maintenance, and remission.
⚠️ Not medical advice. Computational hypotheses + in silico predictions only. Not a substitute for clinical care. Do not self-treat. See docs/disclaimers.md.
A laptop-runnable, MIT-licensed pipeline that takes pharma drugs, herbs, supplements, and AI-generated novel analogs, scores them against MCAS-relevant targets (KEAP1 / MRGPRX2 / KIT / FcεRI / HRH1–4 / CYSLTR1 / BTK / GLP1R), filters by covalent-warhead chemistry + predicted ADMET safety, and produces a transparent composite ranking across rescue / maintenance / remission categories.
The ranking has been audited in four independent ways:
- It finds what it should. 21 held-out clinical mast-cell drugs blind-scored → 100% recovery@20 (EXP-006).
- It rejects what it should. 20 unrelated drugs (statins / antihypertensives / anticonvulsants / etc.) blind-scored → 100% precision@10: all ranked outside every top-10, as intended (EXP-007).
- It doesn't depend on weight-cherry-picking. ±50% sweep of all six composite weights → min Spearman ρ = 0.93 vs. baseline (EXP-008).
- Real physics agrees with the chemistry. AutoDock Vina docking against the KEAP1 Kelch domain (PDB 4L7B) for the top-50 remission candidates → all of the top 15 by ligand efficiency carry the isothiocyanate warhead (EXP-009).
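The first two audits reduce to simple rank checks. The sketch below assumes a ranking is an ordered list of compound names and that "recovery@k" means the fraction of held-out actives found in the top k. The function names and the toy ranking are illustrative and are not taken from scripts/benchmark_known_actives.py; the negative-control names are placeholders.

```python
from typing import Iterable, Sequence


def recovery_at_k(ranking: Sequence[str], actives: Iterable[str], k: int) -> float:
    """Fraction of held-out actives that appear in the top-k of a ranking."""
    actives = set(actives)
    if not actives:
        raise ValueError("need at least one held-out active")
    top_k = set(ranking[:k])
    return len(actives & top_k) / len(actives)


def rejection_at_k(ranking: Sequence[str], negatives: Iterable[str], k: int) -> float:
    """Fraction of negative controls that stay outside the top-k."""
    negatives = set(negatives)
    if not negatives:
        raise ValueError("need at least one negative control")
    top_k = set(ranking[:k])
    return len(negatives - top_k) / len(negatives)


# Toy ranking (hypothetical order, for illustration only).
ranking = ["Fexofenadine", "Cetirizine", "Loratadine", "statin_A", "anticonvulsant_B"]
print(recovery_at_k(ranking, ["Cetirizine", "Loratadine"], k=3))
print(rejection_at_k(ranking, ["statin_A", "anticonvulsant_B"], k=3))
```

In this toy case both checks come out at 1.0: every active is inside the top 3 and every negative control is outside it, which is the pattern EXP-006 and EXP-007 report at k = 20 and k = 10.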
🌐 Live viewer: huggingface.co/spaces/MRDula/openmcas-browser — public, MIT-licensed, browse all ranked candidates, filter by mechanism / evidence / warhead, inspect ADMET predictions per compound. (Self-host: docs/deploying-the-viewer.md.)
| You are… | Go here |
|---|---|
| 🧍 A patient or caregiver | audiences/for-patients.md |
| 🩺 A clinician | audiences/for-clinicians.md |
| 🔬 A researcher | audiences/for-researchers.md |
| 🎓 An academic lab | audiences/for-academia.md |
| 🤝 A nonprofit / foundation | audiences/for-nonprofits.md |
| 🏭 Industry / pharma | audiences/for-industry.md |
| 💻 A developer | audiences/for-developers.md |
| 📰 Press / media | audiences/for-press.md |
A reproducible MIT-licensed pipeline that takes pharma drugs, herbs, supplements, and AI-generated novel analogs and ranks them by their plausibility as MCAS / MCAD candidates across three categories:
- Rescue — acute mediator blockade.
- Maintenance — daily stabilization.
- Remission — upstream / root-cause reversal.
Every prediction is published openly so any researcher can audit, falsify, or extend it. No paywalls, no IP capture, no pharma gatekeeping.
🤖 Auto-generated artifacts. The tables below, and the Top-10 tables in hypotheses/{rescue,maintenance,remission}.md, are produced by scripts/rank_hypotheses.py from the current library + generated analogs + target scores + warhead scores + ADMET QSAR. Each Top-10 carries a provenance line with timestamp + commit hash. Re-running the script overwrites them. Composite formula: EXP-005. Audit: EXP-006.
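A minimal sketch of how a ranked table with a provenance line might be rendered. The function name, the row layout, and the wording of the provenance line are assumptions for illustration; the actual format is whatever scripts/rank_hypotheses.py emits.

```python
from datetime import datetime, timezone


def render_top_table(rows, commit, generated_at=None, top_n=10):
    """Render a ranked markdown table followed by a provenance line.

    rows: iterable of (compound, compound_class, composite) tuples, any order.
    commit: commit hash string supplied by the caller (hypothetical input).
    """
    when = (generated_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)[:top_n]
    lines = ["| # | Compound | Class | Composite |", "|---|---|---|---|"]
    for i, (name, cls, score) in enumerate(ranked, start=1):
        lines.append(f"| {i} | {name} | {cls} | {score:.3f} |")
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    lines.append("")
    lines.append(f"Generated {stamp} from commit {commit}")
    return "\n".join(lines)


# Two rows copied from the rescue table; the commit hash is a placeholder.
rows = [
    ("Cetirizine", "H1 antagonist (2nd-gen)", 0.539),
    ("Fexofenadine", "H1 antagonist (2nd-gen)", 0.540),
]
print(render_top_table(rows, commit="abc1234"))
```

Passing the commit hash in, rather than shelling out to git inside the function, keeps the renderer easy to test.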
Rescue (top 5):

| # | Compound | Class | Composite |
|---|---|---|---|
| 1 | Fexofenadine | H1 antagonist (2nd-gen) | 0.540 |
| 2 | Cetirizine | H1 antagonist (2nd-gen) | 0.539 |
| 3 | Diphenhydramine | H1 antagonist (1st-gen) | 0.534 |
| 4 | Hydroxyzine | H1 antagonist (1st-gen) | 0.532 |
| 5 | Loratadine | H1 antagonist (2nd-gen) | 0.523 |

Maintenance (top 5):

| # | Compound | Class | Composite |
|---|---|---|---|
| 1 | Curcumin | Polyphenol / Michael acceptor / Nrf2 | 0.628 |
| 2 | Rosmarinic acid | Polyphenol | 0.560 |
| 3 | Thymoquinone | Quinone (Nigella) | 0.559 |
| 4 | Resveratrol | Stilbene / Nrf2 / MRGPRX2 | 0.487 |
| 5 | Luteolin | Flavonoid (BBB-crossing) | 0.479 |

Remission (top 5):

| # | Compound | Class | Composite | Vina kcal/mol |
|---|---|---|---|---|
| 1 | Erucin | Sulfide ITC (arugula) — longer plasma t½ vs SFN | 0.673 | -3.70 |
| 2 | Sulforaphane | Natural ITC / KEAP1 covalent / Nrf2 | 0.669 | -4.04 |
| 3 | Phenethyl-ITC | Natural ITC (watercress) / KEAP1 / HDAC | 0.636 | -5.20 |
| 4 | Iberin | Sulfoxide ITC (cabbage / broccoli) | 0.557 | -3.81 |
| 5 | Benzyl-ITC | Natural ITC (papaya / cress) | 0.533 | -5.13 |
🔄 Ranking reshuffled in EXP-009 after fixing three wrong PubChem CIDs in seeds.json (Iberin, Erucin, and Sulforaphene were silently pointing at unrelated compounds). Erucin narrowly takes #1 on the corrected data; see EXP-009 §0 for the disclosure.
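Ligand efficiency, the metric used to order the docking results, is commonly defined as the negated binding score divided by the number of heavy (non-hydrogen) atoms. The sketch below applies that common definition to the Vina scores in the remission table; whether EXP-009 uses exactly this formula is an assumption, and the heavy-atom counts are derived here from each compound's molecular formula rather than taken from pipeline output.

```python
# Vina scores (kcal/mol) copied from the remission table.
# Heavy-atom counts come from the molecular formulas in the comments;
# they are an addition for this sketch, not pipeline output.
CANDIDATES = {
    "Erucin":        (-3.70, 9),   # C6H11NS2
    "Sulforaphane":  (-4.04, 10),  # C6H11NOS2
    "Phenethyl-ITC": (-5.20, 11),  # C9H9NS
    "Iberin":        (-3.81, 9),   # C5H9NOS2
    "Benzyl-ITC":    (-5.13, 10),  # C8H7NS
}


def ligand_efficiency(vina_score: float, heavy_atoms: int) -> float:
    """Binding score per heavy atom, sign-flipped so larger is better."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return -vina_score / heavy_atoms


by_le = sorted(CANDIDATES, key=lambda n: ligand_efficiency(*CANDIDATES[n]), reverse=True)
for name in by_le:
    print(f"{name:14s} LE = {ligand_efficiency(*CANDIDATES[name]):.3f}")
```

Because the score is divided by molecule size, this ordering can differ from both the raw Vina ordering and the composite ranking.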
flowchart LR
S[data/compounds/seeds.json] --> B[build_compound_library.py]
B --> L[MCAS_Compound_Library_v1.csv]
L --> G[generate_sfn_analogs.py]
G --> A[outputs/reinvent_generated.csv]
L --> W[score_warheads.py]
A --> W
L --> T[score_against_targets.py]
A --> T
L --> Q[run_qsar.py]
A --> Q
W --> R[rank_hypotheses.py]
T --> R
Q --> R
R --> RR[outputs/ranked_*.csv]
R --> H[hypotheses/*.md auto-updated]
Each script is documented as a standardized experiment report:
| ID | Experiment | Method |
|---|---|---|
| EXP-001 | SFN-class analog generation | RDKit BRICS + bioisostere + warhead-graft, 7 ITC seeds |
| EXP-002 | Ligand-based virtual screening | Tanimoto vs curated references, 8 MCAS targets |
| EXP-003 | Covalent-warhead SMARTS + KEAP1 pharmacophore | 13 reactive-group patterns |
| EXP-004 | ADMET QSAR | RandomForest on PyTDC tasks (hERG / AMES / BBB) — AUC 0.89–0.91 |
| EXP-005 | Multi-objective ranking | Composite of evidence + target + warhead + safety + drug-likeness |
| EXP-006 | Known Actives Recovery benchmark | Blind scoring of 21 held-out clinical drugs — 100% recovery@20 |
| EXP-007 | Negative-control benchmark | 20 unrelated drugs — 100% precision@10, all correctly rejected |
| EXP-008 | Sensitivity analysis | ±50% per-weight sweep — min Spearman ρ = 0.93, SFN #1 stable in 100% of perturbations |
| EXP-009 | KEAP1 Vina docking + data-bug fix | Real AutoDock Vina docking on 4L7B; every top-15 by ligand efficiency carries the ITC warhead. Disclosed + fixed three wrong PubChem CIDs |
| EXP-010 | Joint-perturbation Latin-hypercube weight sweep | 200-sample LHS — Erucin holds remission #1 in 91.5% of samples; ITC top-5 in remission top-10 in ≥99% of samples |
| EXP-011 | ChEMBL bioassay pull + per-target activity predictors | 67,372 records across 11 MCAS targets, CV R² 0.52–0.80 (median 0.69); integrated as +0.10 ChEMBL-validated potency bonus |
| EXP-015 | Audit retread on post-ChEMBL composite | 3 of 4 audits held or tightened (precision@10 = 100%, min ρ tightened 0.933→0.946); remission recovery regression diagnosed as benchmark-label issue, not composite failure |
| EXP-016 | Mast-cell-specific bioassay predictor (β-hex / LAD2 / HMC-1 / histamine release) | CV AUC 0.916 ± 0.019 — strongest single model in the repo. Luteolin 0.728, Midostaurin 0.840. +0.05 universal bonus across all categories |
| EXP-012 | Covalent KEAP1-C151 dithiocarbamate adduct proxy | MMFF94 reaction-energy proxy for the actual SFN mechanism; every ITC produces favorable adduct (ΔE −32 to −76 kcal/mol) |
| EXP-013 | Iterative REINVENT-style generation | 4-iter generate-and-select; 265 candidates; drug-like aromatic sulfonyl-ITCs emerge in iter 3 (QED 0.59-0.60) |
| EXP-017 | Procurement check for top generated SFN-class analogs | 20 / 25 (80%) novel analogs pass the Enamine REAL Space envelope; vendor lookup URLs published. Wet-lab bridge ready. |
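The per-weight sweep in EXP-008 can be pictured as follows: scale one weight at a time by ±50%, recompute the composite, and compare the new ordering with the baseline using Spearman's ρ. This is a sketch under stated assumptions. The five component names follow the EXP-005 row, but the weight values, the compound names, and the component scores are invented placeholders, and the real composite may include further terms.

```python
from statistics import mean

# Hypothetical weights and component scores (placeholders, not EXP-005 values).
BASE_WEIGHTS = {"evidence": 0.30, "target": 0.25, "warhead": 0.20,
                "safety": 0.15, "druglike": 0.10}
COMPOUNDS = {
    "cmpd_a": {"evidence": 0.9, "target": 0.6, "warhead": 0.8, "safety": 0.7, "druglike": 0.5},
    "cmpd_b": {"evidence": 0.4, "target": 0.8, "warhead": 0.2, "safety": 0.9, "druglike": 0.7},
    "cmpd_c": {"evidence": 0.7, "target": 0.5, "warhead": 0.9, "safety": 0.4, "druglike": 0.6},
    "cmpd_d": {"evidence": 0.2, "target": 0.3, "warhead": 0.1, "safety": 0.8, "druglike": 0.9},
}


def composite(scores, weights):
    """Weighted mean of component scores (weights renormalized to sum to 1)."""
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total


def ranks(values):
    """Average ranks (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            out[idx] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def min_rho_under_sweep(compounds, weights, delta=0.5):
    """Scale each weight by (1 - delta) and (1 + delta); return the worst rho."""
    names = list(compounds)
    baseline = [composite(compounds[n], weights) for n in names]
    rhos = []
    for key in weights:
        for factor in (1 - delta, 1 + delta):
            perturbed = dict(weights, **{key: weights[key] * factor})
            scores = [composite(compounds[n], perturbed) for n in names]
            rhos.append(spearman(baseline, scores))
    return min(rhos)


print(f"min Spearman rho = {min_rho_under_sweep(COMPOUNDS, BASE_WEIGHTS):.3f}")
```

A minimum ρ close to 1 means no single-weight perturbation reorders the list much. The joint perturbation in EXP-010 would instead draw all weights at once, which this one-at-a-time sketch does not cover.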
🌐 Live Hugging Face Space: huggingface.co/spaces/MRDula/openmcas-browser — read-only, public, refreshes when the pipeline reruns. Self-host recipe: docs/deploying-the-viewer.md.
Or locally:
pip install -r requirements-app.txt
streamlit run app.py

To rerun the full pipeline from a fresh clone:

git clone https://github.com/mrdulasolutions/MCAS.Opensource.git
cd MCAS.Opensource
python -m venv .venv && source .venv/bin/activate
pip install -e . PyTDC scikit-learn 'setuptools<81'
python scripts/build_compound_library.py
python scripts/validate_smiles.py
python scripts/generate_sfn_analogs.py
python scripts/score_warheads.py
python scripts/score_against_targets.py
python scripts/run_qsar.py
python scripts/rank_hypotheses.py
python scripts/benchmark_known_actives.py # optional: held-out recovery audit

The hypotheses/*.md files will be re-populated with the latest top-10 tables. Diff against the existing ones to see what changed.
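The script sequence above can also be driven from Python, which makes it easy to stop at the first failing step. This is a convenience sketch, not part of the repository: the `run_pipeline` helper and its `runner` parameter are hypothetical, and only the script paths are taken from the command list.

```python
import subprocess
import sys

# Script order copied from the command list; each step consumes outputs
# of the steps before it, so the order matters.
PIPELINE = [
    "scripts/build_compound_library.py",
    "scripts/validate_smiles.py",
    "scripts/generate_sfn_analogs.py",
    "scripts/score_warheads.py",
    "scripts/score_against_targets.py",
    "scripts/run_qsar.py",
    "scripts/rank_hypotheses.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order with the current interpreter.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so later steps never run on stale or missing inputs.
    """
    for script in scripts:
        print(f"running {script}")
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` from the repository root, inside the activated virtual environment, would execute the seven steps in sequence. The `runner` argument exists only so the order can be checked without launching anything.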
audiences/ Audience-segmented onramps (patients / clinicians / researchers / academia / nonprofits / industry / developers / press)
data/ Curated compound library, injury mechanisms, triggers, targets
scripts/ The 7-script pipeline (build → generate → score → rank)
notebooks/ Same pipeline in Jupyter form (01–05)
experiments/ Standardized experiment reports (EXP-001 … EXP-017)
hypotheses/ Rescue / maintenance / remission / injury / trigger hypothesis docs
outputs/ Pipeline outputs (rankings, predictions, generated analogs)
docs/ Methods, disclaimers, wet-lab protocols, contributing, FAQ, glossary
.claude/skills/ Claude Code skill plugin for guided contribution
.github/ Issue templates (8 routes) + PR template + CI
Add a compound · Report a trigger · Propose a hypothesis · Academic collaboration · Nonprofit partnership · Wet-lab pre-registration
Or in Claude Code, use one of the bundled skills:
/openmcas-add-compound, /openmcas-report-trigger,
/openmcas-run-experiment, /openmcas-new-experiment-report. See .claude/README.md.
CODE_OF_CONDUCT.md · SECURITY.md · ROADMAP.md · CONTACT.md · AGENT_CARD.md
Pharmacology primers, per-compound deep dives, and patient-friendly explainers live in the OpenMCAS wiki:
- Route of Administration — why the same compound behaves very differently buccal vs. swallowed
- Buccal Rescue Pharmacology — what makes a compound fast through the oral mucosa
- 1st vs 2nd Gen Antihistamines — the BBB / drowsiness / rescue-onset tradeoff
- Diphenhydramine Deep Dive
Anonymous reports of what compound + route + dose pattern has worked or hasn't worked are tracked via the response observation issue template. No PHI; pattern-level data only. See an example issue.
This project publishes an A2A agent card describing its
10 skills (library search, compound add, trigger report, ligand-based
screening, warhead scoring, ADMET QSAR, SFN-class analog generation,
multi-objective ranking, experiment-report scaffolding, hypothesis
proposal). The canonical machine-readable manifest is at
.well-known/agent-card.json (also at
/.well-known/agent.json for legacy clients
and /a2a.json at root).
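- Author-chosen weights. The composite formula in EXP-005 was set by hand, not learned. The sensitivity analyses (EXP-008, EXP-010) show the ranking is stable under weight perturbation, but the weights themselves are still not fit to data.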
- Ligand-based screening, not docking. The score_* files contain Tanimoto similarities to curated reference ligands per target: a defensible early-triage signal, but not a substitute for physics-based pose prediction. Vina docking against KEAP1 Kelch (PDB 4L7B) now covers the top-50 remission candidates (EXP-009); DiffDock is still queued.
- QSAR is RandomForest on Morgan FPs. Strong baseline (validation AUC 0.89–0.91 on PyTDC), but graph neural nets like ChemProp typically add 1–3 AUC points. PR welcome.
- No metabolism / interaction modeling. CYP / GST / UGT effects (the actual major liability for sulforaphane) are an open feature gap.
- Reference-set self-similarity. Known anchors get Tanimoto = 1.0 against themselves; the ranking script accounts for this but it still caps recovery@5/@10 in some categories (EXP-006 §7).
- 21-compound recovery benchmark is small. Expansion to 50+ via ChEMBL bioassay pull is in the roadmap.
- No human validation. Every headline result is a hypothesis. Wet-lab validation campaigns are how this becomes evidence — see audiences/for-academia.md.
- Negative-control set is small. EXP-007 shows that 20 unrelated drugs all rank outside every top-10, but that is a narrow sample of compounds with no plausible MCAS mechanism. A larger negative set is a natural next benchmark.
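The reference-set self-similarity limitation in the list above is easy to see with a toy Tanimoto calculation. The sketch below treats fingerprints as plain sets of on-bit indices and uses a leave-one-out rule to stop a known anchor from matching itself. The leave-one-out rule is an assumption about one way to handle the problem, not a description of what the ranking script does, and the fingerprints are invented.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_similarity(query_name, fingerprints, reference_names, leave_one_out=True):
    """Best similarity of a query compound to a reference set.

    With leave_one_out=True the query is removed from its own reference set,
    so a known anchor cannot score 1.0 against itself.
    """
    refs = [r for r in reference_names if not (leave_one_out and r == query_name)]
    if not refs:
        return 0.0
    q = fingerprints[query_name]
    return max(tanimoto(q, fingerprints[r]) for r in refs)


# Toy fingerprints as sets of bit indices (hypothetical, not real Morgan bits).
FPS = {
    "anchor_1": frozenset({1, 2, 3, 4}),
    "anchor_2": frozenset({3, 4, 5, 6}),
    "novel":    frozenset({1, 2, 3, 7}),
}
REFS = ["anchor_1", "anchor_2"]

print(max_similarity("anchor_1", FPS, REFS, leave_one_out=False))  # 1.0
print(max_similarity("anchor_1", FPS, REFS))                       # 2/6, about 0.333
print(max_similarity("novel", FPS, REFS))                          # 0.6
```

Without the leave-one-out rule the anchor wins on self-identity alone; with it, a held-out anchor can score below a novel compound, which is one way a recovery metric at small k ends up capped.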
MCAS / MCAD patients deserve better than symptom-by-symptom management. We're publishing every hypothesis and prediction openly so that no finding can be locked behind a patent. If a wet lab validates a compound here, the world gets it. If a wet lab refutes one, the world gets that too.
MR Dula Medical (a DBA of MR Dula Enterprise, LLC), Raleigh, NC, USA. Independent open-research project. Not affiliated with pharma, not VC-backed, not currently a 501(c)(3). See CONTACT.md for all contact routes.
See CITATION.cff. Cite the repo + the commit hash you used. Quarterly Zenodo DOI snapshots are on the roadmap.
MIT. Fork it, remix it, publish on bioRxiv, run wet-lab assays, build better. Attribution appreciated, not required.