Fingerprint similarity in MongoDB
README.md
chemblcmpds.csv.zip
fp.txt.zip
fpload.py
profile.py

This repository accompanies a post in which I describe fingerprint-based similarity searches in MongoDB. The post and code were inspired by a Datablend post that described this approach, and my code is pretty much what they described, just packaged in a Python wrapper.
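
For readers unfamiliar with the idea, here is a minimal in-memory sketch of a fingerprint similarity search, assuming fingerprints are stored as lists of on-bit positions and compared with the Tanimoto coefficient. It is not the code in fpload.py or profile.py: the field names (`id`, `bits`, `count`) and the helper functions are hypothetical, and a plain Python list stands in for the MongoDB collection. The bit-count pre-filter uses the standard bound that a target can only reach Tanimoto threshold t against a query with n on bits if its own bit count lies between n*t and n/t; in a database, that filter would presumably become a range condition on a stored count field.

```python
import math


def tanimoto(a, b):
    """Tanimoto coefficient between two collections of on-bit positions."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def count_bounds(n_query_bits, threshold):
    """Range of on-bit counts a target needs to possibly reach the threshold."""
    low = math.ceil(n_query_bits * threshold)
    high = math.floor(n_query_bits / threshold)
    return low, high


def search(query_bits, collection, threshold=0.8):
    """Return (id, score) pairs scoring at or above the threshold, best first.

    `collection` is any iterable of dicts with hypothetical keys
    'id', 'bits' (on-bit positions) and 'count' (number of on bits).
    """
    low, high = count_bounds(len(query_bits), threshold)
    hits = []
    for doc in collection:
        # Cheap pre-filter: skip documents whose bit count rules them out.
        if not (low <= doc["count"] <= high):
            continue
        score = tanimoto(query_bits, doc["bits"])
        if score >= threshold:
            hits.append((doc["id"], score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy data standing in for stored fingerprints (made up for illustration).
docs = [
    {"id": "a", "bits": [1, 2, 3, 4], "count": 4},
    {"id": "b", "bits": [1, 2, 3, 5], "count": 4},
    {"id": "c", "bits": [1], "count": 1},
    {"id": "d", "bits": [7, 8, 9], "count": 3},
]
hits = search([1, 2, 3, 4], docs, threshold=0.5)
```

The pre-filter is what makes this style of search attractive in a database: most documents can be discarded on an indexed integer comparison before any bit-level work is done.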

To load the fingerprints and run the benchmark, unzip fp.txt.zip and then run the two scripts:

unzip fp.txt.zip
python fpload.py
python profile.py

On completion, times.txt will contain the time taken by each query along with the bit length of the query structure. This assumes a MongoDB instance is running on the local machine at the default port (27017).

The fingerprints were generated using the CDK and are Signature fingerprints. If you prefer another type, the original SMILES from ChEMBL are available in chemblcmpds.csv.zip and can be used to generate a different set of fingerprints.