This project provides a framework for organizing electronic structure (and other) calculations on organic molecules into a rational file structure and relational database.
MESS.DB is designed to facilitate large-scale screening. Molecules are represented in the database and file structure by their InChIKeys. MESS.DB is intended to be simple, portable, and human-friendly.
Suppose you import the molecule morphine. Morphine's InChIKey will be calculated (BQJCRHHNABKAKU-KBQPJGBKSA-N) and a directory will be created for it:
molecules/B/QJ/CRHHNABKAKU-KBQPJGBKSA-N/
BQJCRHHNABKAKU-KBQPJGBKSA-N.inchi <- the molecule in InChI format
BQJCRHHNABKAKU-KBQPJGBKSA-N.log <- a log tracking what has been done to the molecule
BQJCRHHNABKAKU-KBQPJGBKSA-N.notes <- a blank space for notes
BQJCRHHNABKAKU-KBQPJGBKSA-N.png <- a 2D representation of the molecule
sources.tsv <- a table of sources for the molecule, including where to buy if the source is commercial
In addition, morphine, along with its SMILES, InChI, IUPAC name, synonyms, and basic properties (like MW, charge, etc.) will be imported to MESS.DB, an SQLite relational database. For the curious, the schema is in db/schema.sql.
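The directory layout shown above appears to split the InChIKey into its first character, the next two characters, and the remainder. The following is a minimal sketch of that mapping, inferred from the morphine example only; the function name `inchikey_to_dir` is hypothetical and is not taken from the MESS.DB source.

```python
import posixpath


def inchikey_to_dir(inchikey, root="molecules"):
    """Map an InChIKey to a nested directory: root/1 char/2 chars/rest.

    Hypothetical helper; the 1/2/rest split is inferred from the
    morphine example, not from the MESS.DB code itself.
    """
    # Standard InChIKeys are 27 characters (14-10-1 blocks plus two hyphens).
    if len(inchikey) != 27:
        raise ValueError("expected a 27-character InChIKey")
    return posixpath.join(root, inchikey[0], inchikey[1:3], inchikey[3:])


morphine_dir = inchikey_to_dir("BQJCRHHNABKAKU-KBQPJGBKSA-N")
# morphine_dir == "molecules/B/QJ/CRHHNABKAKU-KBQPJGBKSA-N"
```

Sharding on the leading characters keeps any single directory from accumulating too many entries, which is presumably why the key is split this way.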
Methods (which, as far as MESS is concerned, are plugins that describe how to run a particular calculation) can be run against the database (or a subset of it). If you apply the balloon141 method, which generates 3D structures from SMILES strings, a new folder appears in the molecule's directory:
molecules/B/QJ/CRHHNABKAKU-KBQPJGBKSA-N/
balloon141_FROM_import_PATH_2/ <- contains logs and output from running balloon
BQJCRHHNABKAKU-KBQPJGBKSA-N.inchi
BQJCRHHNABKAKU-KBQPJGBKSA-N.log
BQJCRHHNABKAKU-KBQPJGBKSA-N.notes
BQJCRHHNABKAKU-KBQPJGBKSA-N.png
sources.tsv
If balloon generates any new properties that are not in the database, they are added. Now we can use the balloon 3D coordinates to run another calculation, and get:
molecules/B/QJ/CRHHNABKAKU-KBQPJGBKSA-N/
balloon141_FROM_import_PATH_2/ <- contains logs and output from running balloon
pm7_mopac2012_FROM_balloon141_PATH_3/ <- contains logs and output from running mopac
BQJCRHHNABKAKU-KBQPJGBKSA-N.inchi
BQJCRHHNABKAKU-KBQPJGBKSA-N.log
BQJCRHHNABKAKU-KBQPJGBKSA-N.notes
BQJCRHHNABKAKU-KBQPJGBKSA-N.png
sources.tsv
Even though most relevant properties are imported into the database after a run, all output files are retained for your reading and copying pleasure.
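The method folder names above seem to follow the pattern `<method>_FROM_<parent>_PATH_<id>`, recording which calculation was run, what it was run on, and a path number. If you want to walk a molecule directory yourself, a sketch like the one below could recover those three pieces. The pattern is inferred from the two example names only, and `parse_method_dir` is a hypothetical helper, not part of MESS.DB.

```python
import re

# Pattern inferred from the example folder names; an assumption, not
# something taken from the MESS.DB source.
METHOD_DIR = re.compile(
    r"^(?P<method>.+)_FROM_(?P<parent>.+)_PATH_(?P<path_id>\d+)$"
)


def parse_method_dir(name):
    """Return (method, parent, path_id) for a method folder, else None."""
    match = METHOD_DIR.match(name.rstrip("/"))
    if match is None:
        return None
    return (
        match.group("method"),
        match.group("parent"),
        int(match.group("path_id")),
    )


step = parse_method_dir("pm7_mopac2012_FROM_balloon141_PATH_3/")
# step == ("pm7_mopac2012", "balloon141", 3)
```

Files such as `sources.tsv` do not match the pattern and return `None`, so the same function can be used to filter a directory listing down to method folders.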
MESS.DB scales happily to thousands, if not millions, of molecules.
First, clone the repository and set up an empty database:
git clone git@github.com:vamin/MESS.DB.git messdb
cd messdb
python mess/scripts/setup_db.py
MESS.DB can be run from the messdb directory without installation:
python mess
or
./bin/mess
If you would like to install MESS for all users on a system (experimental):
python setup.py install
MESS.DB works best with Python 2.7+, though it will work with lower versions of Python so long as they have Python 2.7's default modules installed. Open Babel and its Python module, pybel, are also required for most operations.
Modules also have their own dependencies, which you can learn about by running them.
mess import sources/fda
This imports the "FDA-approved drugs" data set into mess.db and the molecules dir.
mess select 'select * from molecule' | mess calculate -m balloon141
Balloon generates 3D structures from SMILES.
mess select 'select * from molecule' | mess calculate -m pm7_mopac2012 -pp 2
Run a semiempirical calculation using the output from path 2 (the balloon 3D structures in this case, if you've been following along).
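Because the database is plain SQLite, a selection can be previewed outside of `mess` with Python's standard `sqlite3` module. The sketch below builds a throwaway in-memory table to show the idea. The table name `molecule` comes from the queries above, but the `inchikey` and `mw` columns are assumptions made for illustration; check db/schema.sql for the real column names before querying an actual mess.db.

```python
import sqlite3

# In-memory stand-in for mess.db; the column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE molecule (inchikey TEXT PRIMARY KEY, mw REAL)")
conn.executemany(
    "INSERT INTO molecule VALUES (?, ?)",
    [
        ("BQJCRHHNABKAKU-KBQPJGBKSA-N", 285.34),  # morphine
        ("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", 180.16),  # aspirin
    ],
)

# Select only molecules heavier than 200 g/mol, using a bound parameter.
query = "SELECT inchikey FROM molecule WHERE mw > ? ORDER BY inchikey"
heavy = [row[0] for row in conn.execute(query, (200,))]
conn.close()
# heavy == ["BQJCRHHNABKAKU-KBQPJGBKSA-N"]
```

A narrower query of this kind, adapted to the real schema, is what would replace 'select * from molecule' when a method should run on only part of the database.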
-import from most common molecule formats (smi, inchi, xyz, sdf, etc.)
-rational file structure with graceful duplicate handling
-relational database of all molecules, sources, methods, and properties
-source tracking
-select molecules based on sql queries of their properties
-apply calculations (methods) to any selection
-report generation
-self-integrity checking
-handling of multiple molecular states (e.g., cation, anion, triplet, conformers)
-database backup/restore
-database pruning
Victor Amin, 2013-