
LMDB Projects for Materials Property Prediction


A curated handoff repository for LMDB-based materials-property prediction pipelines developed during an AI-for-chemistry internship.

This repository contains three production-oriented training stacks that share one high-level philosophy:

preprocess expensive structure features once, store them in LMDB, then train reproducibly on local machines or HPC clusters.
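
To make the shared pattern concrete, here is a minimal sketch of reading one preprocessed record back out of such a store with the Python lmdb bindings. It is illustrative only: the path, single-file layout, key scheme, and pickle serialization below are assumptions, and each pipeline defines its own (see the per-project docs):

import lmdb
import pickle

# Open a preprocessed feature store read-only.
# NOTE: the path, subdir=False (single-file store), key layout, and pickle
# encoding are hypothetical here, not this repo's actual scheme.
env = lmdb.open("path/to/features.lmdb", subdir=False, readonly=True, lock=False)

with env.begin() as txn:
    raw = txn.get(b"0")  # keys are pipeline-specific; stringified indices are common
    if raw is not None:
        sample = pickle.loads(raw)  # decode the stored feature record
        print(type(sample))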

Why this repository exists

This repo is designed to help future developers quickly understand, reuse, and extend the project without reverse-engineering multiple codebases from scratch.

It includes:

  • CGCNN_LMDB for CGCNN-based training on QMOF and ODAC-style data;
  • MGT_LMDB for a Molecular Graph Transformer pipeline with tri-graph inputs;
  • PMT_LMDB for a PMTransformer / MOFTransformer-style multimodal pipeline;
  • results_hub for local checkpoint evaluation, CIF prediction, workflow analytics, and in-browser documentation;
  • full PDF + TeX technical documentation for each pipeline;
  • supporting Markdown runbooks for preprocessing, training, scripts, and containers;
  • research / due-diligence notebooks kept in the root as auxiliary analysis artifacts.

Project map

| Project | Core model family | Current data focus | Main entrypoint | Full developer documentation |
| --- | --- | --- | --- | --- |
| CGCNN_LMDB/ | Crystal Graph Convolutional Neural Network | QMOF, ODAC23/OpenDAC-style shard LMDBs, plus additional ASR/HMOF job scripts | CGCNN_LMDB/code/main.py | PDF · TeX |
| MGT_LMDB/ | Molecular Graph Transformer + ALIGNN-style message passing | QMOF and ODAC-style LMDB workflows | MGT_LMDB/code/training.py | PDF · TeX |
| PMT_LMDB/ | PMTransformer / MOFTransformer wrapper | QMOF-style CIF-tree datasets packed into LMDB | PMT_LMDB/code/trainer.py | PDF · TeX |
| results_hub/ | Local Results Hub app | Checkpoint registry, CIF prediction, workflow metric comparison, docs browser | results_hub/server.py | README · in-app docs |

Documentation guide

Best place to start

If you are new to the repository, read in this order:

  1. this root README;
  2. the project-level README inside the target subdirectory;
  3. the corresponding PDF handoff document (or TeX source in docs/tex/);
  4. the operational Markdown guides for preprocessing / training / scripts / containers;
  5. optional: launch Results Hub when you want a UI for checkpoint inference, metrics comparison, or documentation browsing.

Full technical handoff docs

Each pipeline ships a complete handoff document as a PDF in docs/ (CGCNN.pdf, MGT.pdf, PMT.pdf), with the matching TeX sources in docs/tex/.

Repository layout

LMDB_Projects/
├── CGCNN_LMDB/
├── MGT_LMDB/
├── PMT_LMDB/
├── results_hub/
│   ├── docs/
│   ├── static/
│   ├── server.py
│   └── README.md
├── docs/
│   ├── CGCNN.pdf
│   ├── MGT.pdf
│   ├── PMT.pdf
│   └── tex/
│       ├── CGCNN_LMDB_FULL_DOCUMENTATION.tex
│       ├── MGT_LMDB_FULL_DOCUMENTATION.tex
│       └── PMT_LMDB_FULL_DOCUMENTATION.tex
├── Training_Results/
└── *.ipynb

Quickstart by project

Results Hub

Results Hub is the local browser application for the repository. It lets you:

  • register or select CGCNN, MGT, and PMT checkpoints;
  • upload a single CIF file or a folder of CIF files and predict the checkpoint's target property without requiring ground-truth labels;
  • compare unified training_metrics.csv files across named workflows (see the pandas sketch at the end of this section);
  • browse the Results Hub docs and the model runbooks from one UI.

Start it from the repository root:

python -m results_hub.server

Then open http://127.0.0.1:8877.

Runtime files created by the app are stored under results_hub/data/ (models/, uploads/, evaluate_runs/, and workflows/). These are local working artifacts, not source files.
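
For comparisons outside the UI, the same unified metrics files can be loaded directly with pandas. A sketch under stated assumptions: it presumes each workflow directory under results_hub/data/workflows/ holds its training_metrics.csv, and the val_mae column is hypothetical; substitute whatever columns your CSVs actually contain:

import glob
import os

import pandas as pd

# Collect the unified metrics CSV from every named workflow directory.
# NOTE: this file layout and the 'val_mae' column are assumptions.
frames = []
for path in glob.glob("results_hub/data/workflows/*/training_metrics.csv"):
    df = pd.read_csv(path)
    df["workflow"] = os.path.basename(os.path.dirname(path))  # tag rows by workflow
    frames.append(df)

metrics = pd.concat(frames, ignore_index=True)
print(metrics.groupby("workflow")["val_mae"].min())  # best validation MAE per workflow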

CGCNN_LMDB
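
Entrypoint: CGCNN_LMDB/code/main.py. Build or locate the QMOF / ODAC-style shard LMDBs first (see the preprocessing runbook), then launch training. The bare call below is only a starting point; the required flags are documented in the project-level README and the CGCNN handoff doc:

python CGCNN_LMDB/code/main.py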

MGT_LMDB
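
Entrypoint: MGT_LMDB/code/training.py. Prepare the tri-graph LMDB inputs per the preprocessing guide, then run training. Shown here without flags as a starting point; consult the MGT handoff doc for the full argument list:

python MGT_LMDB/code/training.py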

PMT_LMDB
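
Entrypoint: PMT_LMDB/code/trainer.py. Pack the QMOF-style CIF tree into LMDB per the preprocessing guide, then run the trainer. As above, the required arguments live in the project README and the PMT handoff doc:

python PMT_LMDB/code/trainer.py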

Notebooks

The notebooks are intentionally kept in the repository as supporting analysis / due-diligence artifacts.

They are not part of the maintained production pipeline, but they are still useful context for future developers:

  • CoreMOF_CSD_Modified_Due_Diligence.ipynb
  • MOSAEC_DB_Full_Due_Diligence.ipynb
  • odac23_is2r_analysis.ipynb
  • odac25_ONLY_VALIDATION_SET_is2re_dataset_analysis.ipynb
  • qmof_analysis.ipynb

Per the final handoff scope, these notebooks were not re-audited here; the maintained focus is the code + Markdown/TeX documentation layers.

Production handoff status

The repository was finalized with attention to:

  • code/documentation consistency for Markdown + TeX handoff docs;
  • a local Results Hub UI for evaluation, metrics review, and docs navigation;
  • developer discoverability from the root of the repo;
  • low-risk fixes for obvious path / default-value drift.

Recommended workflow for future maintainers

  1. pick one pipeline;
  2. read its TeX document end-to-end once;
  3. follow the Markdown runbooks for preprocessing and training;
  4. use results_hub when you need quick local inference on CIF files or a visual comparison of training metrics;
  5. treat notebooks as supplemental research context, not as the canonical production interface.
