mrdulasolutions/MCAS.Opensource

OpenMCAS

Open, reproducible hypothesis generation for MCAS / MCAD compounds — ranked transparently for rescue, maintenance, and remission.

⚠️ Not medical advice. Computational hypotheses + in silico predictions only. Not a substitute for clinical care. Do not self-treat. See docs/disclaimers.md.

What this is, in one paragraph

A laptop-runnable, MIT-licensed pipeline that takes pharma drugs, herbs, supplements, and AI-generated novel analogs, scores them against MCAS-relevant targets (KEAP1 / MRGPRX2 / KIT / FcεRI / HRH1–4 / CYSLTR1 / BTK / GLP1R), filters by covalent-warhead chemistry + predicted ADMET safety, and produces a transparent composite ranking across rescue / maintenance / remission categories.

The ranking has been audited in four independent ways:

  1. It finds what it should. 21 held-out clinical mast-cell drugs blind-scored → 100% recovery@20 (EXP-006).
  2. It rejects what it should. 20 unrelated drugs (statins / antihypertensives / anticonvulsants / etc.) blind-scored → 100% precision@10: all ranked outside every top-10 (EXP-007).
  3. It doesn't depend on weight-cherry-picking. ±50% sweep of all six composite weights → min Spearman ρ = 0.93 vs. baseline (EXP-008).
  4. Real physics agrees with the chemistry. AutoDock Vina docking against KEAP1 Kelch domain (PDB 4L7B) for the top-50 remission candidates → every top-15 by ligand efficiency carries the isothiocyanate warhead (EXP-009).
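The first two audits can be sketched in a few lines of Python. This is an illustration only, not the code in scripts/benchmark_known_actives.py: it assumes recovery@k means "fraction of held-out actives that land in the top k of at least one category" and that negative precision@k means "fraction of negative controls that land outside every top k". The function names and toy compound lists are hypothetical.

```python
from typing import Dict, Iterable, List


def _in_any_top_k(name: str, rankings: Dict[str, List[str]], k: int) -> bool:
    """True if `name` is among the first k entries of at least one category."""
    return any(name in ranked[:k] for ranked in rankings.values())


def recovery_at_k(rankings: Dict[str, List[str]], held_out: Iterable[str], k: int) -> float:
    """Fraction of held-out actives found in any category's top k (assumed definition)."""
    held_out = list(held_out)
    hits = sum(_in_any_top_k(name, rankings, k) for name in held_out)
    return hits / len(held_out)


def negative_precision_at_k(rankings: Dict[str, List[str]], negatives: Iterable[str], k: int) -> float:
    """Fraction of negative controls absent from every category's top k (assumed definition)."""
    negatives = list(negatives)
    rejected = sum(not _in_any_top_k(name, rankings, k) for name in negatives)
    return rejected / len(negatives)


# Toy rankings (best first); these are NOT the project's real outputs.
toy_rankings = {
    "rescue": ["cetirizine", "famotidine", "decoy_a", "atorvastatin"],
    "maintenance": ["ketotifen", "decoy_b", "cromolyn", "lisinopril"],
}
toy_actives = ["cetirizine", "ketotifen", "cromolyn"]
toy_negatives = ["atorvastatin", "lisinopril"]

print("recovery@3:", recovery_at_k(toy_rankings, toy_actives, k=3))
print("negative precision@3:", negative_precision_at_k(toy_rankings, toy_negatives, k=3))
```

Under these definitions, widening k can only raise recovery and lower negative precision, which is why the two numbers are reported at fixed cutoffs.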

Try it in your browser (no clone, no install)

🌐 Live viewer: huggingface.co/spaces/MRDula/openmcas-browser — public, MIT-licensed, browse all ranked candidates, filter by mechanism / evidence / warhead, inspect ADMET predictions per compound. (Self-host: docs/deploying-the-viewer.md.)


🧭 Start here — pick your door

| You are… | Go here |
|---|---|
| 🧍 A patient or caregiver | audiences/for-patients.md |
| 🩺 A clinician | audiences/for-clinicians.md |
| 🔬 A researcher | audiences/for-researchers.md |
| 🎓 An academic lab | audiences/for-academia.md |
| 🤝 A nonprofit / foundation | audiences/for-nonprofits.md |
| 🏭 Industry / pharma | audiences/for-industry.md |
| 💻 A developer | audiences/for-developers.md |
| 📰 Press / media | audiences/for-press.md |

Or read the FAQ and glossary.


What this is

A reproducible MIT-licensed pipeline that takes pharma drugs, herbs, supplements, and AI-generated novel analogs and ranks them by their plausibility as MCAS / MCAD candidates across three categories:

  • Rescue — acute mediator blockade.
  • Maintenance — daily stabilization.
  • Remission — upstream / root-cause reversal.

Every prediction is published openly so any researcher can audit, falsify, or extend it. No paywalls, no IP capture, no pharma gatekeeping.


Live results

🤖 Auto-generated artifacts. The tables below — and the Top-10 tables in hypotheses/{rescue,maintenance,remission}.md — are produced by scripts/rank_hypotheses.py from the current library + generated analogs + target scores + warhead scores + ADMET QSAR. Each Top-10 carries a provenance line with timestamp + commit hash. Re-running the script overwrites them. Composite formula: EXP-005. Audit: EXP-006.

🔴 Rescue top 5

| # | Compound | Class | Composite |
|---|---|---|---|
| 1 | Fexofenadine | H1 antagonist (2nd-gen) | 0.540 |
| 2 | Cetirizine | H1 antagonist (2nd-gen) | 0.539 |
| 3 | Diphenhydramine | H1 antagonist (1st-gen) | 0.534 |
| 4 | Hydroxyzine | H1 antagonist (1st-gen) | 0.532 |
| 5 | Loratadine | H1 antagonist (2nd-gen) | 0.523 |

Full ranking →

🟡 Maintenance top 5

| # | Compound | Class | Composite |
|---|---|---|---|
| 1 | Curcumin | Polyphenol / Michael acceptor / Nrf2 | 0.628 |
| 2 | Rosmarinic acid | Polyphenol | 0.560 |
| 3 | Thymoquinone | Quinone (Nigella) | 0.559 |
| 4 | Resveratrol | Stilbene / Nrf2 / MRGPRX2 | 0.487 |
| 5 | Luteolin | Flavonoid (BBB-crossing) | 0.479 |

Full ranking →

🟢 Remission top 5 (post-EXP-009)

| # | Compound | Class | Composite | Vina (kcal/mol) |
|---|---|---|---|---|
| 1 | Erucin | Sulfide ITC (arugula); longer plasma t½ vs SFN | 0.673 | -3.70 |
| 2 | Sulforaphane | Natural ITC / KEAP1 covalent / Nrf2 | 0.669 | -4.04 |
| 3 | Phenethyl-ITC | Natural ITC (watercress) / KEAP1 / HDAC | 0.636 | -5.20 |
| 4 | Iberin | Sulfoxide ITC (cabbage / broccoli) | 0.557 | -3.81 |
| 5 | Benzyl-ITC | Natural ITC (papaya / cress) | 0.533 | -5.13 |

🔄 Ranking reshuffled in EXP-009 after fixing three wrong PubChem CIDs in seeds.json (Iberin, Erucin, Sulforaphene were silently pointing at unrelated compounds). Erucin narrowly takes #1 on the corrected data — see EXP-009 §0 for the disclosure.

Full ranking →
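The Vina column is a raw docking score, while EXP-009 is summarized in terms of ligand efficiency. A minimal sketch of that conversion follows, assuming the common definition LE = −(docking score) / (number of heavy atoms); the README does not state which variant EXP-009 uses. Heavy-atom counts are taken from each compound's molecular formula and are not part of the pipeline outputs shown here.

```python
# name -> (Vina score in kcal/mol from the remission table, heavy-atom count)
candidates = {
    "Erucin": (-3.70, 9),          # C6H11NS2
    "Sulforaphane": (-4.04, 10),   # C6H11NOS2
    "Phenethyl-ITC": (-5.20, 11),  # C9H9NS
    "Iberin": (-3.81, 9),          # C5H9NOS2
    "Benzyl-ITC": (-5.13, 10),     # C8H7NS
}


def ligand_efficiency(vina_kcal_per_mol: float, heavy_atoms: int) -> float:
    """Binding score per heavy atom; higher means more efficient (assumed definition)."""
    return -vina_kcal_per_mol / heavy_atoms


# Sort compounds from most to least efficient binder per heavy atom.
ranked = sorted(
    candidates,
    key=lambda name: ligand_efficiency(*candidates[name]),
    reverse=True,
)

for name in ranked:
    print(f"{name:14s} LE = {ligand_efficiency(*candidates[name]):.3f}")
```

Normalizing by size is why small isothiocyanates can score well on ligand efficiency even when their raw Vina scores are modest.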


How it works

flowchart LR
  S[data/compounds/seeds.json] --> B[build_compound_library.py]
  B --> L[MCAS_Compound_Library_v1.csv]
  L --> G[generate_sfn_analogs.py]
  G --> A[outputs/reinvent_generated.csv]
  L --> W[score_warheads.py]
  A --> W
  L --> T[score_against_targets.py]
  A --> T
  L --> Q[run_qsar.py]
  A --> Q
  W --> R[rank_hypotheses.py]
  T --> R
  Q --> R
  R --> RR[outputs/ranked_*.csv]
  R --> H[hypotheses/*.md auto-updated]

Each script is documented as a standardized experiment report:

| ID | Experiment | Method and headline result |
|---|---|---|
| EXP-001 | SFN-class analog generation | RDKit BRICS + bioisostere + warhead-graft, 7 ITC seeds |
| EXP-002 | Ligand-based virtual screening | Tanimoto vs curated references, 8 MCAS targets |
| EXP-003 | Covalent-warhead SMARTS + KEAP1 pharmacophore | 13 reactive-group patterns |
| EXP-004 | ADMET QSAR | RandomForest on PyTDC tasks (hERG / AMES / BBB); AUC 0.89–0.91 |
| EXP-005 | Multi-objective ranking | Composite of evidence + target + warhead + safety + drug-likeness |
| EXP-006 | Known Actives Recovery benchmark | Blind scoring of 21 held-out clinical drugs; 100% recovery@20 |
| EXP-007 | Negative-control benchmark | 20 unrelated drugs; 100% precision@10, all correctly rejected |
| EXP-008 | Sensitivity analysis | ±50% per-weight sweep; min Spearman ρ = 0.93, SFN #1 stable in 100% of perturbations |
| EXP-009 | KEAP1 Vina docking + data-bug fix | Real AutoDock Vina docking on 4L7B; every top-15 by ligand efficiency carries the ITC warhead. Disclosed + fixed three wrong PubChem CIDs |
| EXP-010 | Joint-perturbation Latin-hypercube weight sweep | 200-sample LHS; Erucin holds remission #1 in 91.5% of samples; ITC top-5 in remission top-10 in ≥99% of samples |
| EXP-011 | ChEMBL bioassay pull + per-target activity predictors | 67,372 records across 11 MCAS targets, CV R² 0.52–0.80 (median 0.69); integrated as +0.10 ChEMBL-validated potency bonus |
| EXP-012 | Covalent KEAP1-C151 dithiocarbamate adduct proxy | MMFF94 reaction-energy proxy for the actual SFN mechanism; every ITC produces favorable adduct (ΔE −32 to −76 kcal/mol) |
| EXP-013 | Iterative REINVENT-style generation | 4-iter generate-and-select; 265 candidates; drug-like aromatic sulfonyl-ITCs emerge in iter 3 (QED 0.59–0.60) |
| EXP-015 | Audit retread on post-ChEMBL composite | 3 of 4 audits held or tightened (precision@10 = 100%, min ρ tightened 0.933→0.946); remission recovery regression diagnosed as benchmark-label issue, not composite failure |
| EXP-016 | Mast-cell-specific bioassay predictor (β-hex / LAD2 / HMC-1 / histamine release) | CV AUC 0.916 ± 0.019; strongest single model in the repo. Luteolin 0.728, Midostaurin 0.840. +0.05 universal bonus across all categories |
| EXP-017 | Procurement check for top generated SFN-class analogs | 20 / 25 (80%) novel analogs pass the Enamine REAL Space envelope; vendor lookup URLs published. Wet-lab bridge ready. |
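To make the EXP-005 composite and the EXP-008 weight sweep concrete, here is a minimal sketch under stated assumptions. The weights, component scores, and compound names below are hypothetical; the real formula lives in EXP-005 and uses six weights, whereas this toy uses only the five components named in the table. Spearman ρ is computed with the no-ties formula, which is adequate for this toy data.

```python
from typing import Dict

# Hypothetical weights (illustration only; see EXP-005 for the real ones).
BASE_WEIGHTS = {"evidence": 0.30, "target": 0.25, "warhead": 0.20,
                "safety": 0.15, "druglike": 0.10}

# Hypothetical per-compound component scores in [0, 1].
COMPOUNDS = {
    "cmpd_a": {"evidence": 0.9, "target": 0.6, "warhead": 0.8, "safety": 0.7, "druglike": 0.5},
    "cmpd_b": {"evidence": 0.4, "target": 0.9, "warhead": 0.2, "safety": 0.9, "druglike": 0.8},
    "cmpd_c": {"evidence": 0.7, "target": 0.5, "warhead": 0.9, "safety": 0.4, "druglike": 0.6},
    "cmpd_d": {"evidence": 0.2, "target": 0.3, "warhead": 0.1, "safety": 0.8, "druglike": 0.9},
}


def composite(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of component scores (weights renormalized to sum to 1)."""
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total


def rank_order(weights: Dict[str, float]) -> Dict[str, int]:
    """Map each compound to its rank (1 = best) under the given weights."""
    ordered = sorted(COMPOUNDS, key=lambda n: composite(COMPOUNDS[n], weights), reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def spearman_rho(r1: Dict[str, int], r2: Dict[str, int]) -> float:
    """Spearman rank correlation, no-ties formula: 1 - 6*sum(d^2) / (n*(n^2-1))."""
    n = len(r1)
    d2 = sum((r1[k] - r2[k]) ** 2 for k in r1)
    return 1 - 6 * d2 / (n * (n * n - 1))


def min_rho_under_sweep(factors=(0.5, 1.5)) -> float:
    """Scale one weight at a time by each factor; return the worst rho vs. baseline."""
    base = rank_order(BASE_WEIGHTS)
    rhos = []
    for name in BASE_WEIGHTS:
        for f in factors:
            perturbed = dict(BASE_WEIGHTS)
            perturbed[name] *= f
            rhos.append(spearman_rho(base, rank_order(perturbed)))
    return min(rhos)


print("baseline ranks:", rank_order(BASE_WEIGHTS))
print("min rho under ±50% sweep:", min_rho_under_sweep())
```

A one-at-a-time sweep like this misses interactions between weights, which is the gap a joint Latin-hypercube sweep such as EXP-010 is meant to cover.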

Just browse the results (no install)

🌐 Live Hugging Face Space: huggingface.co/spaces/MRDula/openmcas-browser — read-only, public, refreshes when the pipeline reruns. Self-host recipe: docs/deploying-the-viewer.md.

Or locally:

pip install -r requirements-app.txt
streamlit run app.py

Reproduce the whole thing in 3 minutes

git clone https://github.com/mrdulasolutions/MCAS.Opensource.git
cd MCAS.Opensource
python -m venv .venv && source .venv/bin/activate
pip install -e . PyTDC scikit-learn 'setuptools<81'

python scripts/build_compound_library.py
python scripts/validate_smiles.py
python scripts/generate_sfn_analogs.py
python scripts/score_warheads.py
python scripts/score_against_targets.py
python scripts/run_qsar.py
python scripts/rank_hypotheses.py
python scripts/benchmark_known_actives.py    # optional — held-out recovery audit

The hypotheses/*.md files will be re-populated with the latest top-10 tables. Diff against the existing ones to see what changed.
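If you prefer a single entry point over seven separate commands, the sequence above can be wrapped in a small Python driver. This is a sketch, not a script shipped in the repo: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes you run it from the repository root with the virtual environment active.

```python
import subprocess
import sys
from pathlib import Path

# Script order copied from the shell commands above (optional benchmark omitted).
PIPELINE = [
    "build_compound_library.py",
    "validate_smiles.py",
    "generate_sfn_analogs.py",
    "score_warheads.py",
    "score_against_targets.py",
    "run_qsar.py",
    "rank_hypotheses.py",
]


def build_commands(scripts_dir: str = "scripts", python: str = sys.executable):
    """Return one argv list per pipeline step, in execution order."""
    return [[python, str(Path(scripts_dir) / name)] for name in PIPELINE]


def run_pipeline(scripts_dir: str = "scripts") -> None:
    """Run each step in order; stop at the first non-zero exit code."""
    for cmd in build_commands(scripts_dir):
        print("running:", " ".join(cmd))
        # check=True raises subprocess.CalledProcessError if the step fails.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` executes the steps sequentially; because `check=True` aborts on the first failure, a broken early step cannot silently feed stale files into `rank_hypotheses.py`.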


Repo map

audiences/              Audience-segmented onramps (patients / clinicians / researchers / academia / nonprofits / industry / developers / press)
data/                   Curated compound library, injury mechanisms, triggers, targets
scripts/                The 7-script pipeline (build → generate → score → rank)
notebooks/              Same pipeline in Jupyter form (01–05)
experiments/            Standardized experiment reports (EXP-001 … EXP-017)
hypotheses/             Rescue / maintenance / remission / injury / trigger hypothesis docs
outputs/                Pipeline outputs (rankings, predictions, generated analogs)
docs/                   Methods, disclaimers, wet-lab protocols, contributing, FAQ, glossary
.claude/skills/         Claude Code skill plugin for guided contribution
.github/                Issue templates (8 routes) + PR template + CI

Contribute

Add a compound · Report a trigger · Propose a hypothesis · Academic collaboration · Nonprofit partnership · Wet-lab pre-registration

Or in Claude Code, use one of the bundled skills: /openmcas-add-compound, /openmcas-report-trigger, /openmcas-run-experiment, /openmcas-new-experiment-report. See .claude/README.md.

Code of Conduct + governance

CODE_OF_CONDUCT.md · SECURITY.md · ROADMAP.md · CONTACT.md · AGENT_CARD.md

📖 Wiki

Pharmacology primers, per-compound deep dives, and patient-friendly explainers live in the OpenMCAS wiki.

📣 Patient-reported response observations

Anonymous reports of what compound + route + dose pattern has worked or hasn't worked are tracked via the response observation issue template. No PHI; pattern-level data only. See an example issue.

Agent2Agent (A2A) protocol

This project publishes an A2A agent card describing its 10 skills (library search, compound add, trigger report, ligand-based screening, warhead scoring, ADMET QSAR, SFN-class analog generation, multi-objective ranking, experiment-report scaffolding, hypothesis proposal). The canonical machine-readable manifest is at .well-known/agent-card.json (also at /.well-known/agent.json for legacy clients and /a2a.json at root).


Limitations (read before citing)

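  • Author-chosen weights. The composite formula in EXP-005 was set by hand, not learned. EXP-008 and EXP-010 test how sensitive the ranking is to those weights, but the weights themselves remain a judgment call.
  • Ligand-based screening, not docking, for most of the pipeline. The score_* files contain Tanimoto similarities to curated reference ligands per target: a defensible early-triage signal, but not a substitute for physics-based pose prediction. So far, AutoDock Vina docking covers only the top-50 remission candidates against KEAP1 Kelch (PDB 4L7B, EXP-009); no DiffDock run is reported in the experiment list.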
  • QSAR is RandomForest on Morgan FPs. Strong baseline (validation AUC 0.89–0.91 on PyTDC) but graph neural nets like ChemProp typically add 1–3 AUC points. PR welcome.
  • No metabolism / interaction modeling. CYP / GST / UGT effects (the actual major liability for sulforaphane) are an open feature gap.
  • Reference-set self-similarity. Known anchors get Tanimoto = 1.0 against themselves; the ranking script accounts for this but it still caps recovery@5/@10 in some categories (EXP-006 §7).
  • 21-compound recovery benchmark is small. Expansion to 50+ via ChEMBL bioassay pull is in the roadmap.
  • No human validation. Every headline result is a hypothesis. Wet-lab validation campaigns are how this becomes evidence — see audiences/for-academia.md.
  • Negative-control set is small. EXP-007 shows 20 unrelated drugs ranking outside every top-10, but 20 compounds cover only a narrow slice of the chemistry that has no plausible MCAS mechanism.
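The ligand-based screening and self-similarity limitations above are easier to see with a small example. The sketch below computes Tanimoto similarity on fingerprints represented as sets of on-bit indices and skips a reference when it is the query itself. This is an assumption about one way to handle self-matches, not a description of what the ranking script does, and the bit sets are made up; a real run would derive them from molecular fingerprints (for example with RDKit).

```python
from typing import Dict, FrozenSet


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity: shared on-bits over total distinct on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def best_reference_similarity(
    query_name: str,
    query_bits: FrozenSet[int],
    references: Dict[str, FrozenSet[int]],
    exclude_self: bool = True,
) -> float:
    """Highest similarity to any reference ligand, optionally ignoring a self-match."""
    best = 0.0
    for ref_name, ref_bits in references.items():
        if exclude_self and ref_name == query_name:
            continue  # a compound trivially scores 1.0 against itself
        best = max(best, tanimoto(query_bits, ref_bits))
    return best


# Hypothetical fingerprints for two reference ligands of one target.
refs = {
    "ref_a": frozenset({1, 2, 3, 4}),
    "ref_b": frozenset({3, 4, 5, 6}),
}

print(best_reference_similarity("ref_a", refs["ref_a"], refs, exclude_self=False))
print(best_reference_similarity("ref_a", refs["ref_a"], refs, exclude_self=True))
```

Without the exclusion, every known anchor would receive a perfect target score simply for being in the reference set, which inflates its rank relative to novel compounds.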

Why this exists

MCAS / MCAD patients deserve better than symptom-by-symptom management. We're publishing every hypothesis and prediction openly so that no finding can be locked behind a patent. If a wet-lab validates a compound here, the world gets it. If a wet-lab refutes one, the world gets that too.

Who's behind it

MR Dula Medical (a DBA of MR Dula Enterprise, LLC), Raleigh, NC, USA. Independent open-research project. Not affiliated with pharma, not VC-backed, not currently a 501(c)(3). See CONTACT.md for all contact routes.

Cite

See CITATION.cff. Cite the repo + the commit hash you used. Quarterly Zenodo DOI snapshots are on the roadmap.

License

MIT. Fork it, remix it, publish on bioRxiv, run wet-lab assays, build better. Attribution appreciated, not required.
