# Tutorial on QSAR modeling
February 1, 2023

### Mgr. Alina Kutlushina
Ph.D. candidate at IMTM, UPOL, Czech Republic

Contact email:
alina.meddwl@gmail.com 

### The necessary prerequisites for the tutorial

**Tools:** chembl_webresource_client, rdkit, scikit-learn, scipy, pmapper, matplotlib, seaborn

**Target protein:** Cytochrome P450 3A4

**Dataset:** small molecules with documented activity towards the Cytochrome P450 3A4 enzyme, sourced from the ChEMBL database

If some of the required Python packages are missing from your environment, you can install them from a Jupyter notebook as shown below (uncomment the lines first).

```python
# import sys
# !{sys.executable} -m pip install chembl_webresource_client
# !{sys.executable} -m pip install rdkit scikit-learn scipy pmapper matplotlib seaborn
```

## Collecting a dataset

In step 1 of the tutorial, we will

- gather a dataset of compounds from the ChEMBL database using the official ChEMBL web resource client, **chembl_webresource_client**

- analyze and clean the dataset
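A minimal sketch of both steps, under several assumptions: the target ID `CHEMBL340` is taken to be human CYP3A4 (confirm it with a target search before relying on it), only IC50 records reported in nM with an exact `=` relation are kept, and cleaning applies only a subset of the standardization steps discussed in the Garrett Morris post listed in the references (cleanup, parent fragment, neutralization). The function names are hypothetical and not taken from the tutorial notebook.

```python
import math
from statistics import median

from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def fetch_cyp3a4_ic50(target_chembl_id="CHEMBL340"):
    """Download exact IC50 values (nM) for one target; requires network access.

    Assumption: CHEMBL340 is human CYP3A4; confirm with a target search first.
    """
    # Imported here so the offline helpers below work without the client installed
    from chembl_webresource_client.new_client import new_client

    records = new_client.activity.filter(
        target_chembl_id=target_chembl_id,
        standard_type="IC50",
        standard_relation="=",
        standard_units="nM",
    ).only(["molecule_chembl_id", "canonical_smiles", "standard_value"])
    return list(records)


def standardize_smiles(smiles):
    """Canonical SMILES of the cleaned, neutralized parent fragment, or None."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = rdMolStandardize.Cleanup(mol)  # remove Hs, disconnect metals, normalize, reionize
    mol = rdMolStandardize.FragmentParent(mol)  # keep the largest fragment, dropping counter-ions
    mol = rdMolStandardize.Uncharger().uncharge(mol)  # neutralize charges where possible
    return Chem.MolToSmiles(mol)


def ic50_nm_to_pic50(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)


def clean_records(records):
    """Standardize structures, convert IC50 to pIC50, merge duplicates by median."""
    by_structure = {}
    for rec in records:
        smiles, value = rec.get("canonical_smiles"), rec.get("standard_value")
        if not smiles or value is None or float(value) <= 0:
            continue  # skip incomplete or non-physical entries
        std = standardize_smiles(smiles)
        if std is None:
            continue  # skip structures RDKit cannot parse
        # ChEMBL returns standard_value as a string, hence float()
        by_structure.setdefault(std, []).append(ic50_nm_to_pic50(float(value)))
    return {smi: median(vals) for smi, vals in by_structure.items()}
```

`fetch_cyp3a4_ic50()` needs network access; `clean_records(fetch_cyp3a4_ic50())` then returns a dictionary mapping standardized SMILES to a median pIC50. The median is just one simple way to merge repeated measurements; inspecting the spread of values per compound before merging is usually worthwhile.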

[Click here to start](https://github.com/meddwl/skillbox/blob/main/QSAR/Dataset_Selection.ipynb)

### References:

- ChEMBL webresource client https://github.com/chembl/chembl_webresource_client
- Molecular Standardization by Prof. Garrett M. Morris https://www.blopig.com/blog/2022/05/molecular-standardization/

## Similarity algorithms

In this part, you will learn about:

- molecular fingerprints
- similarity metrics
- how to find the Maximum Common Substructure (MCS) in Python using RDKit
- how to perform Murcko scaffold decomposition in Python using RDKit
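As a small illustration of the first two topics, the sketch below computes Morgan fingerprints with RDKit and compares two molecules with the Tanimoto coefficient, `T = c / (a + b - c)`, and the Dice coefficient, `D = 2c / (a + b)`, where `a` and `b` are the numbers of bits set in each fingerprint and `c` is the number of bits set in both. Radius 2 and 2048 bits are common but arbitrary choices; the hand-written `tanimoto` and `dice` helpers are hypothetical and only mirror `DataStructs.TanimotoSimilarity` and `DataStructs.DiceSimilarity` to make the formulas explicit.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Morgan fingerprints, radius 2 and 2048 bits (roughly ECFP4); both are assumptions
MORGAN_GEN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def morgan_fp(smiles):
    """Bit-vector Morgan fingerprint of a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Cannot parse SMILES: {smiles}")
    return MORGAN_GEN.GetFingerprint(mol)


def _counts(fp1, fp2):
    """a, b = bits set in each fingerprint; c = bits set in both (assumes some bits are set)."""
    on1, on2 = set(fp1.GetOnBits()), set(fp2.GetOnBits())
    return len(on1), len(on2), len(on1 & on2)


def tanimoto(fp1, fp2):
    a, b, c = _counts(fp1, fp2)
    return c / (a + b - c)


def dice(fp1, fp2):
    a, b, c = _counts(fp1, fp2)
    return 2 * c / (a + b)


if __name__ == "__main__":
    aspirin = morgan_fp("CC(=O)Oc1ccccc1C(=O)O")
    salicylic_acid = morgan_fp("OC(=O)c1ccccc1O")
    print("Tanimoto:", round(tanimoto(aspirin, salicylic_acid), 3),
          "RDKit:", round(DataStructs.TanimotoSimilarity(aspirin, salicylic_acid), 3))
    print("Dice:", round(dice(aspirin, salicylic_acid), 3),
          "RDKit:", round(DataStructs.DiceSimilarity(aspirin, salicylic_acid), 3))
```

For any pair, Dice is never lower than Tanimoto, because `D = 2T / (1 + T)`; thresholds tuned for one coefficient therefore do not transfer directly to the other.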

[Click here to start](https://github.com/meddwl/skillbox/blob/main/QSAR/Similarity_Agorithms.ipynb)
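For the last two topics in the list above, a minimal sketch using RDKit's `rdFMCS.FindMCS` and `MurckoScaffold.GetScaffoldForMol`. The helper names `find_mcs` and `murcko_scaffold` and the 10-second timeout are illustrative assumptions, not the notebook's code.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS
from rdkit.Chem.Scaffolds import MurckoScaffold


def find_mcs(smiles_list, timeout=10):
    """Return (SMARTS, number of atoms) of the MCS shared by all molecules."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    if any(m is None for m in mols):
        raise ValueError("At least one SMILES could not be parsed")
    result = rdFMCS.FindMCS(mols, timeout=timeout)  # stops searching after `timeout` seconds
    return result.smartsString, result.numAtoms


def murcko_scaffold(smiles, generic=False):
    """Bemis-Murcko scaffold (ring systems plus linkers) as canonical SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Cannot parse SMILES: {smiles}")
    scaffold = MurckoScaffold.GetScaffoldForMol(mol)  # strips side chains
    if generic:
        # Generic framework: every atom becomes carbon and every bond single
        scaffold = MurckoScaffold.MakeScaffoldGeneric(scaffold)
    return Chem.MolToSmiles(scaffold)


if __name__ == "__main__":
    print(find_mcs(["Cc1ccccc1", "Oc1ccccc1"]))  # toluene vs phenol
    print(murcko_scaffold("CC(C)Cc1ccc(cc1)C(C)C(=O)O"))  # ibuprofen
```

With default settings, `FindMCS` compares atoms by element, so toluene and phenol share only the six ring atoms; passing `atomCompare=rdFMCS.AtomCompare.CompareAny` relaxes this.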

### References:

- Holliday JD, Hu CY, Willett P., Grouping of Coefficients for the Calculation of Inter-Molecular Similarity and Dissimilarity using 2D Fragment Bit-Strings (2002), DOI: https://doi.org/10.2174/1386207024607338
- Visualizing Chemical Space by Pat Walters http://practicalcheminformatics.blogspot.com/2019/11/visualizing-chemical-space.html
- Precision and recall, Wikipedia https://en.wikipedia.org/wiki/Precision_and_recall
- Daylight, Fingerprints - Screening and Similarity https://www.daylight.com/dayhtml/doc/theory/theory.finger.html

![Presentation1.jpg](attachment:Presentation1.jpg)

## Unsupervised Learning

In this part, you will learn about:

- Dimensionality Reduction
- Clustering
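A minimal sketch combining both ideas: Morgan fingerprints are converted into a NumPy matrix, projected onto two principal components (for example, for a scatter plot), and grouped with k-means. K-means with Euclidean distance on bit vectors is only one option; Tanimoto-based methods such as Butina clustering are also common in cheminformatics. The helper names, the number of clusters and the fingerprint settings are assumptions.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def fingerprint_matrix(smiles_list, radius=2, n_bits=2048):
    """Stack Morgan bit vectors into an (n_molecules, n_bits) array."""
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            raise ValueError(f"Cannot parse SMILES: {smi}")
        arr = np.zeros((0,), dtype=np.int8)  # resized by RDKit to n_bits
        DataStructs.ConvertToNumpyArray(gen.GetFingerprint(mol), arr)
        rows.append(arr)
    return np.vstack(rows)


def project_and_cluster(X, n_clusters=2, seed=0):
    """2D PCA coordinates for plotting and k-means labels computed on the full vectors."""
    coords = PCA(n_components=2, random_state=seed).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    return coords, labels


if __name__ == "__main__":
    smiles = ["Cc1ccccc1", "CCc1ccccc1", "CCCCO", "CCCCCO"]
    coords, labels = project_and_cluster(fingerprint_matrix(smiles))
    for smi, (pc1, pc2), lab in zip(smiles, coords, labels):
        print(f"{smi:12s} PC1={pc1:6.2f} PC2={pc2:6.2f} cluster={lab}")
```

Clustering is done on the full fingerprints rather than on the two PCA components, because two components usually retain only a small part of the variance of a 2048-bit matrix.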

[Click here to start](https://github.com/meddwl/skillbox/blob/main/QSAR/Unsupervised_Learning.ipynb)


### References:

- Scikit-learn, clustering https://scikit-learn.org/stable/modules/clustering.html
- Scikit-learn, PCA https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
- Scikit-learn, t-SNE https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html

## Supervised Learning

In this part, you will learn about:

- validation methods for machine learning models
- how to build classification models using scikit-learn
- how to build regression models using scikit-learn
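A minimal sketch of cross-validated classification and regression with scikit-learn. Random forests, 5 folds, ROC AUC and R² scoring are illustrative choices, the helper names are hypothetical, and the synthetic data from `make_classification` and `make_regression` only stand in for a fingerprint matrix with activity labels or pIC50 values.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score, train_test_split


def cv_classifier(X, y, n_splits=5, seed=0):
    """ROC AUC of a random forest in stratified k-fold cross-validation."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Stratification keeps the class ratio (e.g. active/inactive) similar in every fold
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc")


def validate_regressor(X, y, n_splits=5, seed=0):
    """Cross-validated R2 on a training split plus R2 on a held-out test split."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    cv_r2 = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")
    # Refit on the whole training split; the test split is used only once, here
    model.fit(X_train, y_train)
    test_r2 = model.score(X_test, y_test)
    return cv_r2, test_r2


if __name__ == "__main__":
    # Synthetic stand-ins for fingerprints with activity labels / pIC50 values
    X_cls, y_cls = make_classification(n_samples=200, n_features=50, random_state=0)
    print("ROC AUC per fold:", np.round(cv_classifier(X_cls, y_cls), 3))

    X_reg, y_reg = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)
    cv_r2, test_r2 = validate_regressor(X_reg, y_reg)
    print("CV R2 per fold:", np.round(cv_r2, 3), "| test R2:", round(test_r2, 3))
```

Keeping a held-out test split that plays no part in model selection gives a less optimistic estimate than the cross-validation scores alone, especially once hyperparameters are tuned on those scores.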

[Click here to start](https://github.com/meddwl/skillbox/blob/main/QSAR/Supervised_Learning.ipynb)


### References:

- Supervised learning https://scikit-learn.org/stable/supervised_learning.html#supervised-learning
- Cross Validation https://scikit-learn.org/stable/modules/cross_validation.html#
- Automated Machine Learning (AutoML) https://machinelearningmastery.com/automl-libraries-for-python/

## The place for your personal growth

[Click here to start](https://github.com/meddwl/skillbox/blob/main/QSAR/Challenge.ipynb)

### Literature:
- RDKit Tutorials by Greg Landrum: https://github.com/rdkit/rdkit-tutorials
- Scikit-learn API: https://scikit-learn.org/stable/modules/classes.html
- Book "Python Data Science Handbook" by Jake VanderPlas