Skip to content

t-0hmura/mlmm_toolkit

Repository files navigation

ML/MM toolkit — Towards Accelerated Mechanistic Investigation of Enzymatic Reactions

Overview

Overview of ML/MM toolkit

mlmm-toolkit is an open-source CLI for ML/MM ONIOM analyses of enzymatic reactions. It replaces the QM region of conventional QM/MM with a machine-learning interatomic potential (MLIP, default: UMA) while keeping the surrounding protein under an analytical Amber force field (hessian_ff), and chains MM parametrization → ML-region selection → MEP search → TS optimization → IRC → thermochemical correction → DFT single-point in one command. A link-atom boundary handles amino-acid residues straddling the ML/MM cut, and a microiteration scheme makes TS optimization and Hessian-based methods tractable on ~10 000-atom systems.

Test a reaction mechanism in a single command:

# Multi-structure MEP (R + P endpoints → MEP, with TS optimization + thermo)
mlmm all -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' --tsopt --thermo

For scan-mode on a single structure and the bundled methyltransferase walk-through, see examples/. Each stage is also exposed as an individual subcommand.

Prerequisites: input PDBs must already contain hydrogens; multiple PDBs must share the same atoms in the same order (only coordinates differ). mlmm all runs mm-parm automatically. Match -l RES:CHARGE to the H count actually present (e.g. SAM with 23 H = SAM:1 cation, 22 H = SAM:0 neutral) — full input-prep checklist in docs/getting-started.md.

Related tools

Tool Use case
pdb2reaction Pure-MLIP reaction paths for cluster models and small molecules from PDB / XYZ / GJF.
uma_pysis Lightweight YAML-driven UMA–pysisyphus interface for quick/exploratory reaction-mechanism studies (GS / TS / IRC / ΔG).

mlmm-toolkit bundles a GPU-optimized pysisyphus fork that is not compatible with upstream pysisyphus — do not install it into an environment that already has upstream pysisyphus.

Documentation

System requirements

Component Requirement
OS / Python Linux recommended; native Windows unsupported (AmberTools/tleap unavailable). Python >= 3.11.
GPU / CUDA / VRAM NVIDIA GPU, CUDA >= 12.6 (12.8+ recommended; required for RTX 50-series). 8 GB+ VRAM recommended.
RAM / Disk 32 GB+ RAM recommended; 20 GB free disk for the conda env, AmberTools, UMA cache, and artifacts.

Required external tools: AmberTools (tleap) and pdbfixerconda install -c conda-forge ambertools pdbfixer -y. CPU-only execution works for setup commands but is 10–100× slower for any ML/MM dynamics or Hessian step. Full requirement and tuning details: docs/getting-started.md#installation.

Installation

# 1. New env + AmberTools + CUDA-enabled PyTorch
conda create -n mlmm-toolkit python=3.11 -y && conda activate mlmm-toolkit
conda install -c conda-forge ambertools pdbfixer -y
pip install torch --index-url https://download.pytorch.org/whl/cu129

# 2. mlmm-toolkit (editable from a local clone, or `pip install mlmm-toolkit`)
pip install -e .

# 3. Authenticate Hugging Face once (only required for the default UMA backend)
#    Accept the FAIR Chemistry License v1 at https://huggingface.co/facebook/UMA, then:
hf auth login                               # interactive
# OR: export HF_TOKEN=hf_xxx && hf auth login --token "$HF_TOKEN"   # CI / HPC

Avoid AmberTools conflicts: on clusters with a system AmberTools module loaded, run module unload amber before installing to prevent a ParmEd conflict with the conda-installed AmberTools.

Optional extras (install only what you need):

Extra Adds
[orb] / [aimnet] Orb / AIMNet2 MLIP backend — not HF-gated
[dft] PySCF + GPU4PySCF single-point DFT (--dft / mlmm dft) — practical to ~500 ML-region atoms
[mcp] Model Context Protocol server (mlmm-mcp) for agent clients
[pdbfixer] PDBFixer extra (alternative to the conda install above)

The MACE backend (-b mace) is not a pip extra: mace-torch pins e3nn==0.4.4, which conflicts with fairchem-core's e3nn>=0.5 (UMA), so it needs a dedicated environment — pip uninstall -y fairchem-core && pip install mace-torch (see docs/getting-started.md#installation).

CUDA module-load recipes, alternative-backend installs, DMF / cyipopt, Plotly Chromium, and HPC job-script templates: docs/getting-started.md and docs/device-hpc.md.

Preparing an Enzyme-Substrate System

For most systems the only hard requirement is a PDB with explicit hydrogens (at the intended protonation state). mlmm all then builds the MM topology, selects the ML region, and runs the whole pipeline in one command — see Quick Examples for the three input modes (multi-structure R → P, single-structure scan, TS-only). The preparation steps below are optional.

  1. Build a structural model of the complex. Download coordinates from the Protein Data Bank. If an experimental structure is not available, use structure-prediction programs such as AlphaFold3, Boltz2, or Chai; docking programs; or GUI software such as PyMOL. Add hydrogens at the intended protonation state (or let mm-parm --add-h --ph 7 add them). For multi-structure (R → P) runs, every PDB must share the same atoms in the same order.

  2. (Optional) Build the MM topology yourself — it is automatic by default. mlmm all (via mlmm mm-parm) generates the Amber .parm7 / .rst7 from the PDB automatically; unknown residues (ligands, cofactors) are parameterized with GAFF2 / AM1-BCC — pass formal charges with -l 'RES:CHARGE'. Build the topology by hand when it helps — a custom force field, special solvation, or a system the automatic route cannot handle — then pass it with --parm. To mimic aqueous conditions, solvate the complex and remove water molecules beyond ~6 Å (see the OpenMM cookbook / tleap).

    Note: elemental information (columns 77–78) is omitted in PDB files generated by tleap. Use mlmm add-elem-info to fix this.

  3. (Optional) Define the ML region yourself. mlmm all extracts the ML region from -c/--center and -r/--radius automatically. To define it yourself instead, build an ML-region PDB — with mlmm extract or any molecular viewer — and feed it to mlmm all (or the per-stage subcommands) with --model-pdb; this skips the automatic extraction:

    mlmm extract -i complex.pdb -c 'SAM,GPP' -r 6.0 -l 'SAM:1,GPP:-3' -o ml_region.pdb

    Important: atom order, residue names, and residue numbers must match between the full PDB and the ML-region PDB. (In PyMOL, tick "Original atom order" when exporting.)

Quick Examples

# Multi-structure MEP (R + P → MEP, with TS + thermo + DFT)
mlmm all -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' --tsopt --thermo --dft

# Scan mode (single structure → staged bond scans → MEP)
mlmm all -i R.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
    --scan-lists "[('SAM 359 CS1','GPP 360 C8',1.3)]"

# TS-only validation (existing TS candidate)
mlmm tsopt -i TS_candidate_layered.pdb --parm complex.parm7 -q 1 --opt-mode grad

For Gaussian-ONIOM / ORCA-QM/MM round-trips use oniom-export / oniom-import. Per-stage walkthrough (mm-parmextractdefine-layeroptpath-searchtsoptfreqircdft): docs/getting-started.md and docs/quickstart-all.md. Working scripts (methyltransferase + toy_system): examples/.

Output

A run writes its deliverables to --out-dir (default ./result_all/):

  • segments/seg_NN/{reactant,ts,product}.pdb — the canonical R / TS / P structures to cite
  • mep.pdb / mep_trj.xyz — the merged reaction path; energy_diagram_MEP.png — barrier diagram
  • summary.log (human-readable) / summary.json (machine-readable)
  • Reusable inputs for follow-up runs: ml_region.pdb (--model-pdb), mm_parm/*.parm7 (--parm), layered/ (B-factor-annotated full-system PDBs)

Pipeline scratch lives under _work/ (safe to delete). Full layout and filename conventions: docs/output-layout.md.

CLI Subcommands

Subcommand Role Doc
all (default) End-to-end: mm-parm → extract → MEP → TS → IRC → freq → DFT all
mm-parm Generate parm7/rst7 via AmberTools mm-parm
extract Extract active-site pocket extract
define-layer Assign 3-layer ML/MM B-factor encoding define-layer
add-elem-info / fix-altloc Repair PDB element columns / resolve altlocs add-elem-info · fix-altloc
opt / tsopt Geometry / TS optimization opt · tsopt
path-opt / path-search MEP via GSM/DMF; recursive refinement path-opt · path-search
scan / scan2d / scan3d 1D / 2D / 3D bond-distance scans scan · scan2d · scan3d
freq / irc Vibrational analysis + thermo / IRC (EulerPC) freq · irc
dft / sp Single-point DFT / single-point ML/MM ONIOM dft · sp
bond-summary Compare structures, report bond changes bond-summary
trj2fig / energy-diagram Energy plot / R→TS→P diagram trj2fig · energy-diagram
oniom-export / oniom-import Gaussian ONIOM / ORCA QM/MM round-trip oniom-export · oniom-import

3-layer system (ML / Movable-MM / Frozen-MM, B-factor encoded), link-atom treatment, units (eV·Å in core / Ha·Bohr in pysisyphus CLI): docs/concepts.md. Python API (MLMMCore, MLMMASECalculator, pysisyphus mlmm calculator): docs/python-api.md.

Getting Help

mlmm --help                       # top-level
mlmm <subcmd> --help              # core options
mlmm <subcmd> --help-advanced     # full option set

Issues: https://github.com/t-0hmura/mlmm_toolkit/issues.

Citation

@article{ohmura2025mlmm,
  author = {Ohmura, Takuto and Inoue, Sei and Terada, Tohru},
  title  = {ML/MM Toolkit -- Towards Accelerated Mechanistic Investigation of Enzymatic Reactions},
  year   = {2025}, journal = {ChemRxiv}, doi = {10.26434/chemrxiv-2025-jft1k}
}

Agent Skills

Agent Skills for Claude Code / Codex / Cursor etc. in skills/ — copy into your project's skill location (e.g. .claude/skills/) to let an agent drive mlmm-toolkit workflows and subcommands.

Known limitations

  • MACE + UMA cannot coexist (e3nn version conflict). Use separate conda envs.
  • DFT single-point is practical to ~500 ML-region atoms; larger regions incur high computational cost.
  • ORB backend sometimes converges TS with extra soft imaginary modes — prefer UMA / MACE for clean single-saddle spectra.
  • CPU-only execution is 10–100× slower than GPU; AmberTools (tleap) is required for mm-parm.

Contributing

Issues and pull requests are welcome — see CONTRIBUTING.md.

License

GNU General Public License v3 (GPL-3.0).

About

Machine‑Learning / Molecular‑Mechanics (ML/MM) hybrid calculator and CLI toolset for Mechanistic Investigation of Enzymatic Reactions.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Contributors

Languages