Pharmahacks_Chemical_Reconstruction

Winning Project of Pharmahacks (2019) sponsored by Boehringer Ingelheim, Novartis, Student Pharma, Invivo AI, and MILA.

The Challenge

To develop an algorithm to determine the minimum number of descriptors required to reverse engineer a chemical structure.

Our Approach

We identifed 17 core chemical structural mordred descriptors that would allow a researcher to effectively deduce the structure of a molecule. Without any of these core structural descriptors, a researcher would not be able to accurately sythesize the structure. The remaining descriptors were identified as non-core morded descriptors.

We used a multiple output linear regression model. The model trains using all of the non-core descriptors to predict the value of the core descriptions. The model is versatile and is able to integrate other core structural descriptors as predictors, should the user choose to do so.

Methods

We converted the data from .sdf files to. csv using the code provided with the data (sample data file and sdf to csv conversion provided in the sdf_to_csv file)
We cleaned the data. We removed any irregularities from it, and extracted only nummberical values.
We extracted the 17 core structural descriptors.
We trained a Multiple Output Regressor to predict the 17 core structural descriptors. Using all non-core Mordred descriptors, the model was able to predict of all these descriptors with 100% 5-fold cross-validation accuracy.
We iteratively thresholded the data, removing the descriptors with the least variance, to find the minimum number of descriptors for the model to accurately predict the 17 core structural descriptors.

Results

Using less than around 50 Mordred desciptors, the model poorly predicts all 17 core mordred descriptors.

Try it out!

To run the model:

Download the mordred compound sets (1 through 3) and place them in a data/ folder
Open the src/config.py folder and change the data_path and results_path.
Run the main function in src/thresholding.py.
To visualize, run results_visualization.ipynb on jupyter notebook.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
img		img
sdf_to_csv		sdf_to_csv
src		src
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

img

img

sdf_to_csv

sdf_to_csv

src

src

README.md

README.md

Repository files navigation

Pharmahacks_Chemical_Reconstruction

The Challenge

Our Approach

Methods

Results

Try it out!

About

Releases

Packages

Languages

sibamoussa/Pharmahacks_Chemical_Reconstruction

Folders and files

Latest commit

History

Repository files navigation

Pharmahacks_Chemical_Reconstruction

The Challenge

Our Approach

Methods

Results

Try it out!

About

Resources

Stars

Watchers

Forks

Languages