sawsimeon/MLSF-PDL1

Structure-based virtual screening for PD-L1 dimerizers is boosted by inactive-enriched machine-learning models exploiting patent data

Based on Patent Data

Powered by DeepCoy Generator »


Table of Contents
  1. About The Project
  2. Getting Started

About The Project

We hypothesise that applying recent advances from studies on other targets will lead to highly accurate target-specific machine-learning scoring functions (MLSFs) for PD-L1. For instance, including a large number of decoys (assumed inactives) in the training set boosts the structure-based virtual screening (SBVS) performance of MLSFs, but this has never been investigated for PD-L1. It is therefore not known whether training should use actives only, or supplement them with experimentally validated inactives, property-matched decoys or random property-unmatched decoys.

Likewise, regression-based MLSFs have yet to be applied to PD-L1, even though the dependent variable to predict, pIC50, is real-valued. This is probably because the most popular SBVS benchmarks do not use real-valued potency to evaluate performance, only sets of actives and decoys with binary classification metrics. As a real-valued variable contains more information than any dichotomised version of it, it stands to reason that regression models should perform better than classification models, other things being equal. We thus evaluate regression models that also exploit information about the chemical diversity of inactives, which we call inactive-enriched regression-based MLSFs. Another novel aspect of our study is investigating which combinations of featurisation schemes and supervised learning algorithms are most predictive for SBVS on PD-L1.
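
The difference between regression and classification labels for an inactive-enriched training set can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the repository: the `Molecule` class, the placeholder potency `ASSUMED_INACTIVE_PIC50` given to decoys without a measurement, and the cut-off `ACTIVITY_THRESHOLD` used to dichotomise.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative assumptions only; the values used in the study may differ.
ASSUMED_INACTIVE_PIC50 = 4.0  # placeholder potency for unmeasured decoys
ACTIVITY_THRESHOLD = 6.0      # placeholder pIC50 cut-off for active/inactive


@dataclass
class Molecule:
    name: str
    pic50: Optional[float]  # None when no potency was measured (decoys)


def regression_labels(molecules: List[Molecule]) -> List[float]:
    """Keep measured pIC50 values; give unmeasured decoys the placeholder."""
    return [
        m.pic50 if m.pic50 is not None else ASSUMED_INACTIVE_PIC50
        for m in molecules
    ]


def classification_labels(molecules: List[Molecule]) -> List[int]:
    """Dichotomise the same labels: 1 = active, 0 = inactive."""
    return [int(y >= ACTIVITY_THRESHOLD) for y in regression_labels(molecules)]


# Hypothetical training set: two actives, one true inactive, two decoys.
training_set = [
    Molecule("active_1", 8.2),
    Molecule("active_2", 6.5),
    Molecule("true_inactive_1", 4.8),
    Molecule("decoy_1", None),
    Molecule("decoy_2", None),
]

y_reg = regression_labels(training_set)
y_cls = classification_labels(training_set)
print(y_reg)
print(y_cls)
```

In this toy set the two actives differ by 1.7 pIC50 units in the regression labels but collapse to the same class label, which is the information loss the paragraph above refers to.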

Getting Started

To get a local copy up and running, make sure that Anaconda is installed on your machine. If it is not, follow the installation instructions: https://docs.anaconda.com/anaconda/install/index.html

OS requirements

The scripts are supported on Linux and have been tested on the following system:

  • Linux: Ubuntu 20.04

1. Create a conda environment

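  • Create an environment with Python 3.6 and install all the dependencies listed in requirement.yml. Note that `conda env create` takes package versions from the environment file and not from command-line specs such as python=3.6, so the Python version should be pinned inside requirement.yml.
conda env create -f requirement.yml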

When the installation is done, activate the environment:

conda activate pdl1_sbvs

2. Clone the repository

git clone https://github.com/sawsimeon/MLSF-PDL1.git

cd MLSF-PDL1

3. Run the Test Set

The two test scripts take about 157 and 163 seconds to run, respectively:
python script/DeepCoys.py
python script/True_Inactives.py

Saved Models

Selected SFs, including the GRID SVM SFs built from training actives + RandomDecoys and from training actives + TrueInactives, are saved as pickle files. The notebook folder contains Jupyter notebooks for computing PR-AUC and EF1% on the two test sets, TrueInactives and DeepCoys. We have also added versions of these SFs trained on all actives and the same inactives, together with a script that generates features for other docked complexes, so that the SFs can be applied to other docked molecules. Please see the data folder.
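
For readers unfamiliar with the two metrics, the following is a minimal, dependency-free sketch of how EF1% and PR-AUC can be computed from predicted scores and binary activity labels. It is not the code used in the notebooks. The function names are hypothetical, PR-AUC is estimated here by average precision (one common convention), the size of the top 1% is rounded to the nearest whole molecule, and ties in score are not treated specially.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.01) -> float:
    """Hit rate among the top-scoring fraction divided by the overall hit rate."""
    n = len(scores)
    n_actives = sum(labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")
    # Assumed convention: round the cut-off to the nearest whole molecule.
    n_top = max(1, int(round(fraction * n)))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])
    return (hits / n_top) / (n_actives / n)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values at the rank of each active."""
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("need at least one active")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / n_actives


# Toy ranking of 200 molecules with actives at ranks 1, 3, 100 and 200.
n = 200
active_ranks = {1, 3, 100, 200}
scores = [float(n - i) for i in range(n)]  # strictly decreasing scores
labels = [1 if (i + 1) in active_ranks else 0 for i in range(n)]

ef1 = enrichment_factor(scores, labels)
ap = average_precision(scores, labels)
print(f"EF1% = {ef1:.2f}, average precision = {ap:.3f}")
```

In the toy ranking, the top 1% is two molecules and contains one of the four actives, so the hit rate there (0.5) is 25 times the overall hit rate (0.02).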

Calculated Descriptors

We have pre-calculated the features that were used to build the target-specific machine-learning scoring functions. Due to the limited data size allowed on GitHub, we have uploaded our dataset to Zenodo for public access.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

saw.simeon@inserm.fr or p.ballester@imperial.ac.uk
