CNN-Based Compound Fingerprint Prediction for Metabolite Annotation
Before running the program, install PyFingerprint, PubChemPy 1.0.4, Open Babel 3.1.1, tabulate 0.8.9, and TensorFlow.
Then download the latest CNN model, MetFID_CNN_40088_5618.h5, and place it in the root directory of MetFID.
The program requires Python 3.5 or later.
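As a quick pre-flight check, the following sketch reports which of the required packages cannot be imported. The helper name `missing_modules` is hypothetical, and the import names in `REQUIRED` are assumptions about how each package is imported; adjust them if your installation differs.

```python
import importlib.util

# Assumed top-level import names for the dependencies listed above.
REQUIRED = ["PyFingerprint", "pubchempy", "openbabel", "tabulate", "tensorflow"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that the modules can be located; it does not verify the version numbers.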
The folder _files contains two required files. The first, testing_compound.txt, is a list of spectrum data; the second, inchikey_list.txt, contains lists of InChIKeys.
Examples of testing_compound.txt and inchikey_list.txt follow.
testing_compound.txt
Each spectrum starts with a #digit header. The first row after the header gives the precursor mass and ionization mode; each following row gives one m/z and intensity pair.
#1
389.1626 positive
58.0653 0.1393717711334875
70.0652 0.5051454118468733
72.0808 7.5707678873637185
84.0682 0.25425091544424155
129.0701 0.21914480352209237
130.0732 0.1374693989440672
145.0647 0.13117470906587816
165.0698 9.97754058994659
166.0777 29.57116152422922
183.0805 1.83382519459832
187.1077 1.3620057326501762
193.0761 1.1618155468334308
199.031 0.2417147860532003
201.0465 100
389.1626 17.431109736689535
#2
255.0299 negative
171.0449 1.405280299496928
183.0453 1.869843966679493
227.035 20.56615845658887
255.03 100
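The layout above can be read with a few lines of Python. This is a minimal sketch, not MetFID's own parser: the function name `parse_spectra` and the dictionary keys are hypothetical, and it assumes whitespace-separated fields exactly as in the example.

```python
def parse_spectra(text):
    """Parse '#<n>' blocks into {n: {"precursor_mass", "mode", "peaks"}}."""
    spectra = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("#"):
            # A header such as "#1" opens a new spectrum block.
            current = int(line[1:])
            spectra[current] = {"precursor_mass": None, "mode": None, "peaks": []}
        elif current is None:
            raise ValueError("data found before the first '#<n>' header")
        elif spectra[current]["mode"] is None:
            # First row of a block: precursor mass and ionization mode.
            mass, mode = line.split()
            spectra[current]["precursor_mass"] = float(mass)
            spectra[current]["mode"] = mode
        else:
            # Remaining rows: one m/z and intensity pair per line.
            mz, intensity = line.split()
            spectra[current]["peaks"].append((float(mz), float(intensity)))
    return spectra


example = """#2
255.0299 negative
171.0449 1.405280299496928
183.0453 1.869843966679493
227.035 20.56615845658887
255.03 100
"""

if __name__ == "__main__":
    for number, spectrum in parse_spectra(example).items():
        print(number, spectrum["mode"], len(spectrum["peaks"]), "peaks")
```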
inchikey_list.txt
This file contains the InChIKey lists. Each block that follows a #digit header lists the InChIKeys that the user suspects may be the true compound.
#1
CECDPVOEINSAQG-UHFFFAOYSA-N
SGUAFYQXFOLMHL-UHFFFAOYSA-N
XSJLXZULZZXNJI-UHFFFAOYSA-N
ZKLPARSLTMPFCP-UHFFFAOYSA-N
HEWDOWUUTBCVJP-UHFFFAOYSA-N
#2
VILFVXYKHXVYAB-UHFFFAOYSA-N
XXOYUNQCBYNWNL-UHFFFAOYSA-N
JLZANJDGGAPGOF-UHFFFAOYSA-N
BBNQQADTFFCFGB-UHFFFAOYSA-N
RFHAOTPXVQNOHP-UHFFFAOYSA-N
Note that each list is introduced by a #digit header. The program raises an error if the counters in the two files do not match.
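To catch a counter mismatch before running the program, the headers of both files can be compared first. The sketch below is only an illustration of such a check, not the validation MetFID performs internally; `block_ids` and `counters_match` are hypothetical helper names.

```python
def block_ids(text):
    """Return the numbers from every '#<n>' header line, in file order."""
    ids = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            ids.append(int(line[1:]))
    return ids


def counters_match(spectra_text, inchikey_text):
    """True if both files contain the same '#<n>' headers in the same order."""
    return block_ids(spectra_text) == block_ids(inchikey_text)


if __name__ == "__main__":
    # Paths follow the folder layout described above.
    with open("_files/testing_compound.txt") as f:
        spectra_text = f.read()
    with open("_files/inchikey_list.txt") as f:
        inchikey_text = f.read()
    if not counters_match(spectra_text, inchikey_text):
        print("Header counters differ between the two input files.")
```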
- For Ressom Lab users
Before running the program, uncomment line 7 and comment out line 10 in main.py, then run:
$ python3 main.py
- For public users
After downloading the code, navigate to the MetFID folder, then run:
$ python3 main.py
An output file, testing_compound_prediction.txt, will be created in the folder _files. For the example above, the output is:
#1
Inchikey Score
--------------------------- --------
ZKLPARSLTMPFCP-UHFFFAOYSA-N 0.742547
SGUAFYQXFOLMHL-UHFFFAOYSA-N 0.381818
CECDPVOEINSAQG-UHFFFAOYSA-N 0.290323
XSJLXZULZZXNJI-UHFFFAOYSA-N 0.268844
HEWDOWUUTBCVJP-UHFFFAOYSA-N 0.201946
#2
Inchikey Score
--------------------------- --------
BBNQQADTFFCFGB-UHFFFAOYSA-N 0.586826
JLZANJDGGAPGOF-UHFFFAOYSA-N 0.373913
RFHAOTPXVQNOHP-UHFFFAOYSA-N 0.296629
VILFVXYKHXVYAB-UHFFFAOYSA-N 0.290419
XXOYUNQCBYNWNL-UHFFFAOYSA-N 0.167984
The first column gives the InChIKeys, and the second column gives the Tanimoto similarity score. Each table is ranked in descending order of score.
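For readers unfamiliar with the score, the sketch below shows the standard Tanimoto similarity for binary fingerprints (bits set in both, divided by bits set in either) and a descending ranking of candidates. It is an illustration under stated assumptions, not MetFID's implementation: the function names, the toy 4-bit fingerprints, and the candidate labels are hypothetical, and the README does not say how the predicted fingerprint is binarized before scoring.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two equal-length binary fingerprints."""
    on_a = {i for i, bit in enumerate(a) if bit}
    on_b = {i for i, bit in enumerate(b) if bit}
    union = on_a | on_b
    if not union:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return len(on_a & on_b) / len(union)


def rank_candidates(predicted, candidates):
    """Score each candidate against the prediction, highest score first."""
    scored = [(key, tanimoto(predicted, fp)) for key, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]  # toy predicted fingerprint
    candidates = {            # toy candidate fingerprints
        "CANDIDATE-A": [1, 0, 1, 0],
        "CANDIDATE-B": [1, 1, 0, 0],
        "CANDIDATE-C": [0, 0, 1, 1],
    }
    for key, score in rank_candidates(predicted, candidates):
        print(key, round(score, 6))
```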