The tool chain to create colored maps for chemical spaces using incremental PCA and binning.
.vscode
pca_service
.gitignore
README.md
autoencoder.py
check_file.py
checkascii.sh
create_bins.py
create_fp_bins.py
create_map.py
dates2coords.py
doitall.sh
incremental_pca.py
index_file.py
info_from_model.py
limits.sh
logo.png
test.py


PCPBIM

If you use this code or application, please cite the original paper published in Bioinformatics: 10.1093/bioinformatics/btx760

The PCPBIM (Preparation - Cleanup - PCA - Binning - Info - Mapping) Toolchain

This collection of utility scripts creates files that can be served by the Underdark Go web service, which is part of the FUn Framework.

For a complete overview and detailed installation instructions for this project, please visit the project website.

Dependencies

  • Python 3
    • numpy
    • pandas
    • scipy
    • sklearn

Getting Started

All input files must be in the correct format: a plain-text file containing information on one molecule per line, structured as follows:

c1ccccc1 Benzene 1;0;0;1;0;1;1;1;1;1;0 1.25;6
C1CCCC1 Cyclopentan 1;0;0;1;1;1;1;1;1;0;1 0.75;5

Each line contains the SMILES, an arbitrary label, a fingerprint vector, and any number of numerical properties for which the colour maps will be generated. The default delimiters are whitespace (between fields) and ; (between values within a field). Both can be changed by modifying the script doitall.sh.
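The line format described above can be read with a few lines of Python. The following is only a minimal sketch and not code taken from the toolchain: the names Molecule and parse_line are hypothetical, and it assumes that labels contain no whitespace and that fingerprint values are integers.

```python
from typing import List, NamedTuple, Optional


class Molecule(NamedTuple):
    """One input line: SMILES, label, fingerprint, numerical properties."""
    smiles: str
    label: str
    fingerprint: List[int]
    properties: List[float]


def parse_line(line: str,
               field_sep: Optional[str] = None,
               value_sep: str = ";") -> Molecule:
    """Parse one molecule line (hypothetical helper, not part of the repository).

    field_sep=None splits on any run of whitespace, matching the default
    delimiter described above; value_sep separates the values inside the
    fingerprint and property fields.
    """
    fields = line.strip().split(field_sep)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    smiles, label, fingerprint, properties = fields
    return Molecule(
        smiles=smiles,
        label=label,
        # Assumption: fingerprint entries are integers (bits or counts).
        fingerprint=[int(v) for v in fingerprint.split(value_sep)],
        properties=[float(v) for v in properties.split(value_sep)],
    )


if __name__ == "__main__":
    print(parse_line("c1ccccc1 Benzene 1;0;0;1;0;1;1;1;1;1;0 1.25;6"))
```

A check like this can be run over an input file before starting doitall.sh to catch malformed lines early.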

To generate the files for Underdark Go, make sure the input files are in the correct format and all dependencies are met, then clone this repository

git clone https://github.com/reymond-group/pca.git

Next, make sure the bash script is executable

chmod +x doitall.sh

Finally, run the bash script which will in turn run the necessary python scripts

./doitall.sh inputFile databaseName fingerprintName n

where inputFile is a plain-text file formatted according to the information provided above, and databaseName and fingerprintName are arbitrary names chosen for the database and the fingerprint, respectively. n is an integer setting the resolution of the final cubic map. It is good practice to provide low (n <= 250) and high (n >= 500) resolution versions of each map. While most maps are probably sparse and do not approach the maximum number of rendered bins, n³, these numbers might have to be lowered for densely populated maps.
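To make the role of n more concrete, the sketch below places 3D coordinates into an n x n x n grid and counts the occupied bins. It is an illustration under stated assumptions, not the logic of create_bins.py: the function names bin_index and bin_points are hypothetical, and the grid is assumed to span the minimum and maximum of the data along each axis.

```python
from typing import Dict, List, Sequence, Tuple


def bin_index(value: float, lower: float, upper: float, n: int) -> int:
    """Map a value in [lower, upper] to an integer bin in 0..n-1."""
    if upper == lower:
        return 0  # degenerate axis: every point falls into the first bin
    scaled = (value - lower) / (upper - lower)
    # The upper bound itself would land in bin n, so clamp it to n-1.
    return min(n - 1, max(0, int(scaled * n)))


def bin_points(points: Sequence[Sequence[float]],
               n: int) -> Dict[Tuple[int, int, int], List[int]]:
    """Group point indices by their (x, y, z) bin in an n*n*n grid.

    Assumption: the grid limits are the per-axis minimum and maximum.
    """
    lows = [min(p[axis] for p in points) for axis in range(3)]
    highs = [max(p[axis] for p in points) for axis in range(3)]
    bins: Dict[Tuple[int, int, int], List[int]] = {}
    for idx, point in enumerate(points):
        key = tuple(bin_index(point[axis], lows[axis], highs[axis], n)
                    for axis in range(3))
        bins.setdefault(key, []).append(idx)
    return bins


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.1, 0.1, 0.1)]
    occupied = bin_points(coords, n=2)
    print(f"{len(occupied)} of {2 ** 3} possible bins are occupied")
```

Only occupied bins need to be rendered, which is why sparse maps stay well below n³ bins even at high resolution.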

Example

./doitall.sh my-awesome-data.txt ACMEbase Xfp 250

PCA Service

PCA models generated by the above process can be exposed via a web service that processes additional fingerprints and projects them onto the database space. Faerun can make use of this service to project data points directly onto the currently loaded database. To enable this functionality, set the pcaUrl option in the Faerun configuration file to the address of this service. See here for how to change Faerun's configuration.

The easiest way to run the PCA service is to use its Docker container

docker run -d -p 80:80 -v /your/host/dir:/models --name pcaservice daenuprobst/planes

where /your/host/dir contains the models you wish to provide via the service. Model files are generated by the above-mentioned script doitall.sh and are named databaseName.fingerprintName.3.pkl.

Usage

Once the service is up and running, you can POST JSON data to it. The message body should have the following format

{
    "database": "databaseName",
    "fingerprint": "fingerprintName",
    "dimensions": 3,
    "binning": true,
    "resolution": 250,
    "data": [
        [2,2,4,5,3,0,0,0,2,0,0,0,0,19,14,0,0,0,0,0,0,1,1,0,3,5,2,1,1,6,4,0,2,0,0,6,5,0,8,1,0,1],
        [0,0,3,4,11,0,0,0,1,0,0,0,0,22,16,1,0,0,0,0,0,0,1,0,4,5,7,4,0,5,1,0,0,0,0,3,3,0,15,1,0,0],
        ...
    ]
}

On success, the service returns a message containing the x, y, z coordinates of the submitted fingerprints

{
    "success": true,
    "database": "surechembl",
    "fingerprint": "mqn",
    "dimensions": 3,
    "data": [
        [153.12, -23.35653, 27.12],
        [282.162, 35.47863, -2.64],
        ...
    ]
}

On error, the following message is returned

{
    "success": false,
    "error": "Oops! Something went wrong."
}

An example can be found here.
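The request and response messages described above could be handled from Python roughly as follows. This is a sketch using only the standard library, not an official client: the function names build_payload, parse_response and project are hypothetical, and the service URL is an assumption, since the exact endpoint path is not documented here.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence


def build_payload(database: str, fingerprint: str,
                  fingerprints: Sequence[Sequence[int]],
                  resolution: int = 250,
                  binning: bool = True) -> Dict[str, Any]:
    """Assemble a request message in the format shown above."""
    return {
        "database": database,
        "fingerprint": fingerprint,
        "dimensions": 3,
        "binning": binning,
        "resolution": resolution,
        "data": [list(fp) for fp in fingerprints],
    }


def parse_response(body: str) -> List[List[float]]:
    """Return the projected coordinates, or raise if success is false."""
    message = json.loads(body)
    if not message.get("success"):
        raise RuntimeError(message.get("error", "unknown error"))
    return message["data"]


def project(url: str, payload: Dict[str, Any],
            timeout: float = 30.0) -> List[List[float]]:
    """POST the payload to the service and return the coordinates.

    Assumption: `url` is the full address of the PCA service; the
    endpoint path must be taken from your own deployment.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))
```

With a running container, a call might look like project("http://localhost/", build_payload("databaseName", "fingerprintName", [[2, 2, 4], [0, 0, 3]])), where the URL is a placeholder and the fingerprints must have the same length as those used to build the model.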