A curated handoff repository for LMDB-based materials-property prediction pipelines developed during an AI-for-chemistry internship.
This directory contains three production-oriented training stacks that share the same high-level philosophy:
preprocess expensive structure features once, store them in LMDB, then train reproducibly on local machines or HPC clusters.
This repo is designed to help future developers quickly understand, reuse, and extend the project without reverse-engineering multiple codebases from scratch.
It includes:
- CGCNN_LMDB for CGCNN-based training on QMOF and ODAC-style data;
- MGT_LMDB for a Molecular Graph Transformer pipeline with tri-graph inputs;
- PMT_LMDB for a PMTransformer / MOFTransformer-style multimodal pipeline;
- results_hub for local checkpoint evaluation, CIF prediction, workflow analytics, and in-browser documentation;
- full PDF + TeX technical documentation for each pipeline;
- supporting Markdown runbooks for preprocessing, training, scripts, and containers;
- research / due-diligence notebooks kept in the root as auxiliary analysis artifacts.
| Project | Core model family | Current data focus | Main entrypoint | Full developer documentation |
|---|---|---|---|---|
| `CGCNN_LMDB/` | Crystal Graph Convolutional Neural Network | QMOF, ODAC23/OpenDAC-style shard LMDBs, plus additional ASR/HMOF job scripts | `CGCNN_LMDB/code/main.py` | PDF · TeX |
| `MGT_LMDB/` | Molecular Graph Transformer + ALIGNN-style message passing | QMOF and ODAC-style LMDB workflows | `MGT_LMDB/code/training.py` | PDF · TeX |
| `PMT_LMDB/` | PMTransformer / MOFTransformer wrapper | QMOF-style CIF-tree datasets packed into LMDB | `PMT_LMDB/code/trainer.py` | PDF · TeX |
| `results_hub/` | Local Results Hub app | Checkpoint registry, CIF prediction, workflow metric comparison, docs browser | `results_hub/server.py` | README · in-app docs |
If you are new to the repository, read in this order:
- this root README;
- the project-level README inside the target subdirectory;
- the corresponding PDF handoff document (or TeX source in `docs/tex/`);
- the operational Markdown guides for preprocessing / training / scripts / containers;
- optional: launch Results Hub when you want a UI for checkpoint inference, metrics comparison, or documentation browsing.
- PDF docs: `docs/CGCNN.pdf`, `docs/MGT.pdf`, `docs/PMT.pdf`
- TeX sources: `docs/tex/`
```
LMDB_Projects/
├── CGCNN_LMDB/
├── MGT_LMDB/
├── PMT_LMDB/
├── results_hub/
│   ├── docs/
│   ├── static/
│   ├── server.py
│   └── README.md
├── docs/
│   ├── CGCNN.pdf
│   ├── MGT.pdf
│   ├── PMT.pdf
│   └── tex/
│       ├── CGCNN_LMDB_FULL_DOCUMENTATION.tex
│       ├── MGT_LMDB_FULL_DOCUMENTATION.tex
│       └── PMT_LMDB_FULL_DOCUMENTATION.tex
├── Training_Results/
└── *.ipynb
```
Results Hub is the local browser application for the repository. It lets you:
- register or select CGCNN, MGT, and PMT checkpoints;
- upload one CIF file or a folder of CIF files and predict the checkpoint target without requiring ground-truth labels;
- compare unified `training_metrics.csv` files across named workflows;
- browse the Results Hub docs and the model runbooks from one UI.
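To illustrate the kind of workflow comparison the hub performs, here is a minimal stdlib sketch that picks the best validation MAE per named workflow from `training_metrics.csv`-style data. The column names (`epoch`, `val_mae`) and the inline sample data are assumptions for illustration, not the hub's actual schema.

```python
import csv
import io

# Hypothetical unified metrics files, one per named workflow.
workflows = {
    "cgcnn_qmof": "epoch,val_mae\n1,0.45\n2,0.31\n3,0.28\n",
    "mgt_qmof": "epoch,val_mae\n1,0.50\n2,0.33\n3,0.30\n",
}

# Best (lowest) validation MAE per workflow, for a side-by-side comparison.
best = {}
for name, text in workflows.items():
    rows = csv.DictReader(io.StringIO(text))
    best[name] = min(float(row["val_mae"]) for row in rows)

for name, mae in sorted(best.items(), key=lambda kv: kv[1]):
    print(f"{name}: best val MAE = {mae:.2f}")
```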
Start it from the repository root:
```
python -m results_hub.server
```

Then open http://127.0.0.1:8877.
Runtime files created by the app are stored under `results_hub/data/` (`models/`, `uploads/`, `evaluate_runs/`, and `workflows/`). These are local working artifacts, not source files.
- Project overview: `CGCNN_LMDB/README.md`
- Preprocessing: `CGCNN_LMDB/PREPROCESS_README.md`
- Training: `CGCNN_LMDB/TRAINING_README.md`
- Scripts: `CGCNN_LMDB/scripts/README.md`
- Container: `CGCNN_LMDB/container/README.md`
- Project overview: `MGT_LMDB/README.md`
- Preprocessing: `MGT_LMDB/PREPROCESS_README.md`
- Training: `MGT_LMDB/TRAINING_README.md`
- Scripts: `MGT_LMDB/scripts/README.md`
- Container: `MGT_LMDB/container/README.md`
- Project overview: `PMT_LMDB/README.md`
- Preprocessing: `PMT_LMDB/PREPROCESS_README.md`
- Training: `PMT_LMDB/TRAINING_README.md`
- Scripts: `PMT_LMDB/scripts/README.md`
- Container: `PMT_LMDB/container/README.md`
The notebooks are intentionally kept in the repository as supporting analysis / due-diligence artifacts.
They are not part of the maintained production pipeline, but they are still useful context for future developers:
- `CoreMOF_CSD_Modified_Due_Diligence.ipynb`
- `MOSAEC_DB_Full_Due_Diligence.ipynb`
- `odac23_is2r_analysis.ipynb`
- `odac25_ONLY_VALIDATION_SET_is2re_dataset_analysis.ipynb`
- `qmof_analysis.ipynb`
Per the final handoff scope, these notebooks were not re-audited here; the maintained focus is the code + Markdown/TeX documentation layers.
The repository was finalized with attention to:
- code/documentation consistency for Markdown + TeX handoff docs;
- a local Results Hub UI for evaluation, metrics review, and docs navigation;
- developer discoverability from the root of the repo;
- low-risk fixes for obvious path / default-value drift.
- pick one pipeline;
- read its TeX document end-to-end once;
- follow the Markdown runbooks for preprocessing and training;
- use `results_hub` when you need quick local inference on CIF files or a visual comparison of training metrics;
- treat notebooks as supplemental research context, not as the canonical production interface.