# Comparing methods for identifying drug-like molecules


In this notebook, we compare different methods for identifying drug-like molecules. We use the following datasets:

1. ZINC: A database of over 100 million compounds with known chemical structures and properties.
2. PubChem BioAssay: A database of results from bioassays performed on PubChem compounds.
3. ChEMBL: A database of over 1.5 million drug-like molecules with known chemical structures and properties.

We compare the following methods:

1. Morgan fingerprints: A fixed-length vector representation of a molecule, built from the circular neighborhood around each atom.
2. Random forest: An ensemble of decision trees that classifies molecules from features derived from their chemical structure.
3. Convolutional neural network: A deep learning model that classifies molecules from a representation of their chemical structure.

We use the following metrics to evaluate each method:

1. Matthews correlation coefficient (MCC): A score between -1 and 1 that summarizes all four cells of the confusion matrix for a binary classification, so it remains informative when the classes are imbalanced.
2. Area under the receiver-operating characteristic curve (AUC-ROC): The probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative one.
3. Precision-recall curve: Precision plotted against recall as the decision threshold varies.
4. ROC curve: True-positive rate plotted against false-positive rate as the decision threshold varies; AUC-ROC is the area under this curve.

All methods are evaluated on the same datasets so that their results are directly comparable.
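
To make the fingerprint representation listed among the methods concrete, here is a minimal sketch that encodes two molecules as Morgan fingerprints with RDKit and compares them. The radius of 2, the 2048-bit length, the helper name `morgan_fp`, and the two example SMILES strings are assumptions chosen for illustration; the notebook itself does not specify them.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Assumed settings: radius 2 and 2048 bits are common defaults, not values from the notebook
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def morgan_fp(smiles):
    """Hypothetical helper: parse a SMILES string and return its Morgan bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return morgan_gen.GetFingerprint(mol)


# Example molecules (illustrative only): aspirin and salicylic acid
aspirin = morgan_fp("CC(=O)OC1=CC=CC=C1C(=O)O")
salicylic_acid = morgan_fp("OC(=O)C1=CC=CC=C1O")

# Every molecule maps to a vector of the same length, whatever its size
print(aspirin.GetNumBits())

# Tanimoto similarity compares the sets of bits that are switched on
similarity = DataStructs.TanimotoSimilarity(aspirin, salicylic_acid)
print(round(similarity, 2))
```

Because every molecule maps to a vector of the same length, fingerprints like these can serve as input features for a classifier such as a random forest.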

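The evaluation metrics listed above can be worked through by hand on a toy example. The sketch below uses only the Python standard library; the function names, the six labels and scores, and the 0.5 decision threshold are all assumptions made for illustration and are not results from this notebook. In practice a library such as scikit-learn would normally be used instead.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def precision_recall(y_true, y_pred):
    """Precision and recall at a single decision threshold."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC-ROC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("Need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = drug-like, 0 = not drug-like
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(f"MCC: {mcc(y_true, y_pred):.3f}")          # MCC: 0.333
print(f"AUC-ROC: {roc_auc(y_true, scores):.3f}")  # AUC-ROC: 0.889
print(precision_recall(y_true, y_pred))
```

MCC and precision/recall depend on the chosen threshold, while AUC-ROC depends only on how the scores rank the molecules. Sweeping the threshold and recording precision and recall at each value traces out the precision-recall curve.
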
In [1]:
from pathlib import Path

# Use the current working directory as the notebook root
HERE = Path.cwd()
DATA = HERE / 'data'
# Create the data directory if it does not already exist
DATA.mkdir(parents=True, exist_ok=True)
print(DATA)

/Users/wangyang/Desktop/AI-drug-design/list/05_workshop/05_Comparing_methods_for_identifying_drug-like-molecules/data


In [2]:
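import os
import sys

import pandas as pd
import seaborn as sns
from rdkit import Chem
from rdkit.Chem import RDConfig
from rdkit.Chem.QED import qed
from tqdm.auto import tqdm

# sascorer ships in RDKit's Contrib directory, so add that directory to the import path
sys.path.append(os.path.join(RDConfig.RDContribDir, 'SA_Score'))
import sascorer

# Third-party helper package; it must be installed first (for example, `pip install useful_rdkit_utils`)
import useful_rdkit_utils as uru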

ModuleNotFoundError: No module named 'useful_rdkit_utils'