<a href="https://colab.research.google.com/github/jyryu3161/DrugDiscovery/blob/main/lec8_code.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setting up the environment

In [None]:
!pip install -q condacolab # install the condacolab package
import condacolab # Import and initialize condacolab
condacolab.install()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/download/24.11.2-1_colab/Miniforge3-colab-24.11.2-1_colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:12
🔁 Restarting kernel...


In [None]:
import condacolab
condacolab.check() # verify the installation

✨🍰✨ Everything looks OK!


In [None]:
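# Create a new environment (optional; the notebook kernel keeps using the base environment)
!conda create -n myenv python=3.9 -y

# Install packages into the base environment (no -n flag), which the Colab kernel uses
!conda install -c conda-forge numpy pandas matplotlib rdkit -y

# List installed packages
!conda list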

Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done


    current version: 24.11.2
    latest version: 25.3.1

Please update conda by running

    $ conda update -n base -c conda-forge conda



## Package Plan ##

  environment location: /usr/local/envs/myenv

  added / updated specs:
    - python=3.9


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2025.4.26  |       hbd8a1cb_0         149 KB  conda-forge
    ld_impl_linux-64-2.43      |       h712a8e2_4         656 KB  conda-forge
    libexpat-2.7.0             |       h5888daf_0          73 KB  conda-forge
    libffi-3.4.6               |       h2dba641_1

In [None]:
!pip install biopython  # Biopython provides the `Bio` module
!pip install openai


Collecting rdkit-pypi
  Downloading rdkit_pypi-2022.9.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.9 kB)
Downloading rdkit_pypi-2022.9.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 29.4/29.4 MB 57.7 MB/s eta 0:00:00
Installing collected packages: rdkit-pypi
Successfully installed rdkit-pypi-2022.9.5


In [None]:
!pip install oddt

Collecting oddt
  Downloading oddt-0.7.tar.gz (2.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 30.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting numpydoc (from oddt)
  Downloading numpydoc-1.8.0-py3-none-any.whl.metadata (4.3 kB)
Collecting scipy>=0.17 (from oddt)
  Downloading scipy-1.15.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting scikit-learn>=0.18 (from oddt)
  Downloading scikit_learn-1.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting joblib>=0.9.4 (from oddt)
  Downloading joblib-1.5.0-py3-none-any.whl.metadata (5.6 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=0.18->oddt)
  Downloading threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Collecting sphinx>=6 (from numpydoc->oddt)
  Downloading sphinx-8.2.3-py3-none-any.whl.metadata (7.0 kB)
Collecting tabulate>=0.8.10 (from numpydoc->od
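
After the installs above, it can help to confirm that the packages are importable from the running kernel before moving on. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is illustrative, and the import names listed (`rdkit`, `Bio`, `openai`, `oddt`) are assumptions about what each install exposes.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for the packages installed above
required = ["numpy", "pandas", "rdkit", "Bio", "openai", "oddt"]
missing = missing_modules(required)
print("Missing modules:", missing if missing else "none")
```

If anything is reported as missing, rerunning the matching install cell (or restarting the kernel) is usually the first thing to try.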

In [None]:
import pandas as pd
from rdkit import Chem
from rdkit import DataStructs
from rdkit.Chem import AllChem
from rdkit import RDLogger
import warnings
import tqdm
import numpy as np

# Set the RDKit logger level to hide warning messages
RDLogger.DisableLog('rdApp.*')  # disable all RDKit log messages

# Filter out Python warnings as well
warnings.filterwarnings('ignore')

# Load the screening library and the reference ligands
lib_df = pd.read_csv('Enamine_Discovery_Diversity_Set_50_plated_50240cmpds_20250504.smiles', sep='\t')
lib_smiles_list = lib_df['SMILES']

ref_df = pd.read_csv('./Reference.txt', sep='\t')
ref_smiles_list = ref_df['Ligand SMILES']

max_sim_list = []
for each_smiles in tqdm.tqdm(lib_smiles_list):
    lib_mol = Chem.MolFromSmiles(each_smiles)  # parse the library compound

    if lib_mol:
        lib_fp = AllChem.GetMorganFingerprintAsBitVect(lib_mol, 2, nBits=2048)

        # Tanimoto similarity between this compound and each reference ligand
        tanimoto_scores = []
        for smiles in ref_smiles_list:
            mol = Chem.MolFromSmiles(smiles)
            if mol:
                fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
                similarity = DataStructs.TanimotoSimilarity(lib_fp, fp)
                tanimoto_scores.append(similarity)
            # Invalid reference SMILES are skipped; appending None would make np.max fail

        # Keep the highest similarity to any reference ligand (0.0 if none could be parsed)
        max_sim = np.max(tanimoto_scores) if tanimoto_scores else 0.0
        max_sim_list.append(max_sim)

    else:
        max_sim_list.append(0.0)
        print("Could not convert the library SMILES to a molecule object.")

# Add the similarity scores to the DataFrame and inspect the result
lib_df['Tanimoto_Similarity'] = max_sim_list
lib_df

100%|██████████| 50240/50240 [04:40<00:00, 179.19it/s]


Unnamed: 0,SMILES,Catalog ID,MW,MW (desalted),ClogP,logS,HBD,HBA,TPSA,RotBonds,AnalogsFromREAL,Tanimoto_Similarity
0,COCCN1C(=NN=C1N2CCCN(CC(F)(F)F)CC2)C3=CC=C(Cl)...,Z2183228266,435.847,435.847,3.187,-4.954,0,5,46.42,7,https://real.enamine.net/public-enum-files/Z21...,0.153846
1,CC(C)C(C)(CO)NC(=O)NCC1CCN(CC(O)C=2C=CC=CC2)CC1,Z2183030760,377.522,377.522,2.174,-2.602,4,4,84.83,8,https://real.enamine.net/public-enum-files/Z21...,0.181818
2,CC=1C(=O)C=2C=CC=C(C(=O)NC=3N=CC=CC3N(C)C)C2OC...,Z1688208702,399.443,399.443,4.326,-5.904,1,5,71.53,4,https://real.enamine.net/public-enum-files/Z16...,0.202532
3,COC=1C=CC=2OC(C)CN(C(=O)C=3C=C(C4CC4)N(N3)C(C)...,Z1265493266,369.458,369.458,3.297,-4.099,0,4,56.59,4,https://real.enamine.net/public-enum-files/Z12...,0.161290
4,CC(C)(C)OC(=O)N1CCC2(CC(=NO2)C(=O)NC3=CC=CC4=C...,Z1688503325,409.479,409.479,4.302,-5.963,1,4,80.23,4,https://real.enamine.net/public-enum-files/Z16...,0.202532
...,...,...,...,...,...,...,...,...,...,...,...,...
50235,CC1CCN(CC1C=2C=CC=CC2)C(=O)CN3N=NC=4C=CC=CC43,Z5177343348,334.416,334.416,3.729,-3.974,0,3,51.02,3,https://real.enamine.net/public-enum-files/Z51...,0.159574
50236,CC1=CC(C)=C(Br)C(C)=C1CC(=O)NC=2NN=C3CCOCC23,Z5177344292,378.264,378.264,3.083,-4.670,2,3,67.01,3,https://real.enamine.net/public-enum-files/Z51...,0.170455
50237,CN1C(=O)C=2C=C(SC2N(C)C1=O)C(=O)NC=3C=NNC3,Z5177344371,305.314,305.314,0.681,-2.833,2,4,98.40,2,https://real.enamine.net/public-enum-files/Z51...,0.195122
50238,CN1N=CC(NC(=O)C=2C=CC3=C(Br)C=NN3C2)=C1C=4C=CC...,Z5177344416,396.241,396.241,2.589,-4.379,1,3,64.22,3,https://real.enamine.net/public-enum-files/Z51...,0.200000
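
The `Tanimoto_Similarity` column holds, for each library compound, the largest Tanimoto coefficient against any reference ligand. For two fingerprints with on-bit sets A and B, the coefficient is |A ∩ B| / |A ∪ B|. The sketch below reproduces that arithmetic in plain Python on hypothetical toy bit sets (not real Morgan fingerprints); `tanimoto` and `max_similarity` are illustrative helper names, not RDKit functions.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two sets of on-bit indices: |A & B| / |A | B|."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


def max_similarity(query_bits, reference_bit_sets):
    """Highest Tanimoto coefficient of the query against any reference (0.0 if no references)."""
    return max((tanimoto(query_bits, ref) for ref in reference_bit_sets), default=0.0)


# Hypothetical toy fingerprints given as sets of on-bit positions
query = {1, 5, 9, 12}
references = [{1, 5, 20}, {5, 9, 12, 30}, {40, 41}]
print(max_similarity(query, references))  # 3 shared bits out of 5 in the union -> 0.6
```

Taking the maximum over references means a compound scores highly if it resembles any single known ligand, regardless of how dissimilar it is to the rest.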


In [None]:
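# Keep compounds whose maximum similarity lies strictly between 0.2 and 0.7
lib_df2 = lib_df[lib_df['Tanimoto_Similarity']>0.2]
lib_df2 = lib_df2[lib_df2['Tanimoto_Similarity']<0.7]  # filter lib_df2, not lib_df, so the lower bound is kept
# Of those, save the 1000 most similar compounds
lib_df2 = lib_df2.nlargest(1000, 'Tanimoto_Similarity')
lib_df2.to_csv('./Enamine_lib_sim_selected.csv', index=False)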
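
The cell above is meant to keep compounds inside a similarity window and then take the top N by score. The same logic can be checked on a handful of hypothetical scores in plain Python. `select_in_band` and its parameter names are illustrative, and both bounds are treated as exclusive, matching the `>` and `<` comparisons in the cell.

```python
import heapq


def select_in_band(scores, low=0.2, high=0.7, top_n=1000):
    """Return (id, score) pairs with low < score < high, highest scores first, at most top_n."""
    in_band = [(cid, s) for cid, s in scores.items() if low < s < high]
    return heapq.nlargest(top_n, in_band, key=lambda pair: pair[1])


# Hypothetical compound IDs and maximum similarities
toy_scores = {"cmpd_a": 0.15, "cmpd_b": 0.35, "cmpd_c": 0.70, "cmpd_d": 0.55, "cmpd_e": 0.91}
print(select_in_band(toy_scores, top_n=2))  # [('cmpd_d', 0.55), ('cmpd_b', 0.35)]
```

Note that a score of exactly 0.70 falls outside the window because the upper bound is exclusive.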


# Binding Site Prediction

In [None]:
!apt-get update -y
!apt-get install -y openjdk-11-jre-headless
!java -version
!wget https://github.com/rdk/p2rank/releases/download/2.4/p2rank_2.4.tar.gz
!tar -xzf p2rank_2.4.tar.gz


Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:5 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,665 kB]
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:11 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:12 https://r2u.stat.illinois.edu/ubuntu jammy/main all Packages [8,920 kB]
Get:13 https://r2u.stat.illinois.edu/ubuntu jammy/

In [None]:
!./p2rank_2.4/prank predict -f swiss_model_02.pdb -o output_folder2


----------------------------------------------------------------------------------------------
 P2Rank 2.4
----------------------------------------------------------------------------------------------

predicting pockets for proteins from dataset [swiss_model_02.pdb]
processing [swiss_model_02.pdb] (1/1)
predicting pockets finished in 0 hours 0 minutes 38.355 seconds
results saved to directory [/content/output_folder2]

----------------------------------------------------------------------------------------------
 finished successfully in 0 hours 0 minutes 41.007 seconds
----------------------------------------------------------------------------------------------
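
P2Rank typically writes a per-structure predictions table into the output directory. The file name `output_folder2/swiss_model_02.pdb_predictions.csv` and the column names used below (`name`, `rank`, `score`, `center_x`, `center_y`, `center_z`) are assumptions based on P2Rank's usual output layout and should be checked against the files actually produced. Under those assumptions, a minimal sketch for reading the predicted pocket centers could look like this; `read_pockets` is an illustrative helper, and the sample values are made up.

```python
import csv
import io


def read_pockets(handle):
    """Parse a P2Rank-style predictions CSV into a list of dicts, best-ranked pocket first.

    Header names and values are stripped because the file pads its columns with spaces.
    """
    reader = csv.reader(handle)
    header = [col.strip() for col in next(reader)]
    pockets = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        rec = dict(zip(header, (value.strip() for value in row)))
        pockets.append({
            "name": rec["name"],
            "rank": int(rec["rank"]),
            "score": float(rec["score"]),
            "center": tuple(float(rec[k]) for k in ("center_x", "center_y", "center_z")),
        })
    return sorted(pockets, key=lambda p: p["rank"])


# Hypothetical two-pocket sample in the assumed layout (values are made up for illustration)
sample = io.StringIO(
    "name     ,  rank,   score, probability,   center_x,   center_y,   center_z\n"
    "pocket2  ,     2,    3.10,       0.120,    -4.5000,    12.0000,     7.2500\n"
    "pocket1  ,     1,   12.40,       0.650,    10.5000,    -3.2500,    21.0000\n"
)
pockets = read_pockets(sample)
print(pockets[0]["name"], pockets[0]["center"])  # pocket1 (10.5, -3.25, 21.0)

# With a real run, the assumed path would be read like this:
# with open("output_folder2/swiss_model_02.pdb_predictions.csv", newline="") as fh:
#     pockets = read_pockets(fh)
```

The center of the top-ranked pocket is a common starting point for placing a docking box around the predicted binding site.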
