Repository to help develop SMART
To link to SMART from your software and prepopulate a query, encode the query string in the URL hash, for example:
The database is a JSON file containing a list of records. Each record includes the following fields:
Compound_name - Compound Name
Embeddings - 180 dimension embedding
SMILES - SMILES Structure
MW - exact mass
From - indicates the database
ID - unique identifier to give the database a pseudo accession. These can be integers or simply UUIDs, but they must be unique per entry and must not be NULL.
[
  {
    "Compound_name": "micrococcin P1",
    "Embeddings": [0.1537381113, 0.3115234971, -1.3087806702, ..., -0.2351712883],
    "SMILES": "CC=C(NC(=O)c1csc(-c2csc(-c3ccc4c(n3)-c3csc(n3)C(C(C)O)NC(=O)c3csc(n3)C(C(C)C)NC(=O)c3csc(n3)C(=CC)NC(=O)C(C(C)O)NC(=O)c3csc-4n3)n2)n1)C(=O)NCC(C)O",
    "MW": 1143.2,
    "From": "Jeol",
    "ID": "v2.1_0"
  },
  {
    "Compound_name": "chelerythrine",
    "Embeddings": [0.1537381113, 0.3115234971, -1.3087806702, ..., -0.2351712883],
    "SMILES": "COc1ccc2c(c[n+](C)c3c4cc5c(cc4ccc23)OCO5)c1OC",
    "MW": 348.1,
    "From": "Jeol",
    "ID": "v2.1_1"
  },
  ...
]
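The field rules above can be checked before a database file is loaded. The following is a minimal sketch, not part of the SMART code base: the function names `validate_records` and `validate_file` are hypothetical, and it assumes every field listed above is required and that IDs are hashable values such as integers or strings.

```python
import json

# Field names and embedding size are taken from the field list above.
REQUIRED_FIELDS = ("Compound_name", "Embeddings", "SMILES", "MW", "From", "ID")
EMBEDDING_DIM = 180


def validate_records(records):
    """Return a list of problem descriptions; an empty list means none were found.

    Hypothetical helper for illustration only.
    """
    if not isinstance(records, list):
        return ["top-level JSON value must be a list of records"]

    problems = []
    seen_ids = set()
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {index}: not a JSON object")
            continue

        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            problems.append(f"record {index}: missing fields {missing}")

        # Only check the embedding when the key is present; absence is reported above.
        if "Embeddings" in record:
            embedding = record["Embeddings"]
            if not isinstance(embedding, list) or len(embedding) != EMBEDDING_DIM:
                problems.append(
                    f"record {index}: Embeddings must be a list of {EMBEDDING_DIM} numbers"
                )

        # IDs must be present, non-null, and unique per entry.
        if "ID" in record:
            entry_id = record["ID"]
            if entry_id is None:
                problems.append(f"record {index}: ID must not be null")
            elif entry_id in seen_ids:
                problems.append(f"record {index}: duplicate ID {entry_id!r}")
            else:
                seen_ids.add(entry_id)
    return problems


def validate_file(path):
    """Load a JSON database file and validate its records."""
    with open(path, encoding="utf-8") as handle:
        return validate_records(json.load(handle))
```

Calling `validate_file` on a database path returns a list of messages, so an empty list means the file passed these particular checks. It does not verify that the SMILES strings or masses are chemically valid.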
Unit tests run automatically with GitHub Actions. To run them yourself, go to the test folder and run:
nose2 -v
All software is licensed under the MIT License.
We acknowledge Advanced Chemistry Development, Inc. for the use of their HSQC predictor to calculate data in the SMART tool. The machine learning model here is for academic use only.
The search database is based on open data and is licensed permissively under CC0.