# **Computational Drug Discovery**

Om Rabadia

Using data science to build a machine learning model from bioactivity data.

Data are retrieved from the [*ChEMBL bioactivity*](https://www.ebi.ac.uk/chembl/) database (version 33, data as of September 30, 2023).



In [1]:
%pip install chembl_webresource_client
# Installs the ChEMBL web resource client, used to pull data from the database,
# into the environment of the running kernel (here, a Conda environment)



In [3]:
import pandas as pd  # library for working with tabular datasets
from chembl_webresource_client.new_client import new_client  # entry point to the ChEMBL API

We will search for target proteins associated with Alzheimer's disease.

In [4]:
target_search = new_client.target.search('Alzheimers')
targets = pd.DataFrame.from_dict(target_search)
targets


| | cross_references | organism | pref_name | score | species_group_flag | target_chembl_id | target_components | target_type | tax_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [] | Homo sapiens | Nucleosome-remodeling factor subunit BPTF | 6.0 | False | CHEMBL3085621 | [{'accession': 'Q12830', 'component_descriptio... | SINGLE PROTEIN | 9606 |
| 1 | [{'xref_id': 'Q92542', 'xref_name': None, 'xre... | Homo sapiens | Nicastrin | 5.0 | False | CHEMBL3418 | [{'accession': 'Q92542', 'component_descriptio... | SINGLE PROTEIN | 9606 |
| 2 | [] | Homo sapiens | Gamma-secretase | 5.0 | False | CHEMBL2094135 | [{'accession': 'Q96BI3', 'component_descriptio... | PROTEIN COMPLEX | 9606 |
| 3 | [] | Rattus norvegicus | Amyloid beta A4 protein | 4.0 | False | CHEMBL3638365 | [{'accession': 'P08592', 'component_descriptio... | SINGLE PROTEIN | 10116 |
| 4 | [] | Mus musculus | Amyloid-beta A4 protein | 4.0 | False | CHEMBL4523942 | [{'accession': 'P12023', 'component_descriptio... | SINGLE PROTEIN | 10090 |
| 5 | [{'xref_id': 'P05067', 'xref_name': None, 'xre... | Homo sapiens | Beta amyloid A4 protein | 3.0 | False | CHEMBL2487 | [{'accession': 'P05067', 'component_descriptio... | SINGLE PROTEIN | 9606 |


We select entry 5 (Beta amyloid A4 protein, *Homo sapiens*), because this protein (the amyloid precursor protein) contributes to Alzheimer's disease pathogenesis.

In [5]:
BetaA4Amyloid = targets.target_chembl_id[5]  # row 5 of the search results above
BetaA4Amyloid

'CHEMBL2487'
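Indexing by position (`[5]`) assumes ChEMBL returns the search hits in the same order every time, which may not hold across database releases. A minimal sketch of a more robust selection, matching on preferred name, organism and target type instead; `pick_target_id` is a hypothetical helper, and it assumes each search hit is a dict with the keys shown in the table above:

```python
from typing import Iterable, Mapping, Optional


def pick_target_id(
    records: Iterable[Mapping[str, object]],
    pref_name: str,
    organism: str = "Homo sapiens",
    target_type: str = "SINGLE PROTEIN",
) -> Optional[str]:
    """Return the target_chembl_id of the first record matching all criteria, or None."""
    for rec in records:
        # Match on descriptive fields rather than on the record's position in the results
        if (
            rec.get("pref_name") == pref_name
            and rec.get("organism") == organism
            and rec.get("target_type") == target_type
        ):
            return str(rec["target_chembl_id"])
    return None  # no matching target in the search results
```

For the results shown above, `pick_target_id(target_search, "Beta amyloid A4 protein")` should return `'CHEMBL2487'`. The same selection in pandas is `targets.loc[(targets.pref_name == "Beta amyloid A4 protein") & (targets.organism == "Homo sapiens"), "target_chembl_id"].iloc[0]`.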

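Next, we retrieve the bioactivity data for the Beta amyloid A4 protein (CHEMBL2487), keeping only records reported as IC50 values. IC50, the half-maximal inhibitory concentration, measures a compound's potency: the concentration at which it inhibits the measured biological process by 50%. Lower values indicate more potent compounds.

ChEMBL reports standardized IC50 values in nM in the `standard_value` and `standard_units` columns. The `value` and `units` columns keep the units of the original publication, which is why the rows below show `uM`.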

In [7]:
# All bioactivity records for the selected target...
resulting_list = new_client.activity.filter(target_chembl_id=BetaA4Amyloid)
# ...restricted to IC50 measurements
resulting_listIC50filter = resulting_list.filter(standard_type="IC50")
dataframe = pd.DataFrame.from_dict(resulting_listIC50filter)
dataframe.head(4)  # show the first 4 rows

| | action_type | activity_comment | activity_id | activity_properties | assay_chembl_id | assay_description | assay_type | assay_variant_accession | assay_variant_mutation | bao_endpoint | ... | target_organism | target_pref_name | target_tax_id | text_value | toid | type | units | uo_units | upper_value | value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | 357577 | [] | CHEMBL678443 | Inhibition of A-beta-42 production by inhibiti... | B | | | BAO_0000190 | ... | Homo sapiens | Beta amyloid A4 protein | 9606 | | | IC50 | uM | UO_0000065 | | 5.0 |
| 1 | | | 357580 | [] | CHEMBL678443 | Inhibition of A-beta-42 production by inhibiti... | B | | | BAO_0000190 | ... | Homo sapiens | Beta amyloid A4 protein | 9606 | | | IC50 | uM | UO_0000065 | | 2.7 |
| 2 | | | 358965 | [] | CHEMBL678443 | Inhibition of A-beta-42 production by inhibiti... | B | | | BAO_0000190 | ... | Homo sapiens | Beta amyloid A4 protein | 9606 | | | IC50 | uM | UO_0000065 | | 1.8 |
| 3 | | | 368887 | [] | CHEMBL678443 | Inhibition of A-beta-42 production by inhibiti... | B | | | BAO_0000190 | ... | Homo sapiens | Beta amyloid A4 protein | 9606 | | | IC50 | uM | UO_0000065 | | 11.0 |
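Because the `value` and `units` columns keep each publication's original units, comparing IC50 values across records requires converting them to a common unit. ChEMBL's `standard_value` column already does this, so the sketch below only illustrates the arithmetic (1 µM = 1000 nM). `ic50_to_nm` and its unit table are hypothetical, and only plain molar-concentration units are handled:

```python
# Multiplicative factors from each molar-concentration unit to nanomolar (nM).
# Assumption: unit strings follow the ASCII style seen above ("uM", "nM", ...).
_TO_NM = {
    "M": 1e9,
    "mM": 1e6,
    "uM": 1e3,
    "nM": 1.0,
    "pM": 1e-3,
}


def ic50_to_nm(value: float, units: str) -> float:
    """Convert an IC50 value reported in `units` to nanomolar."""
    try:
        factor = _TO_NM[units]
    except KeyError:
        # Non-molar units (e.g. mass concentrations) cannot be converted without a molecular weight
        raise ValueError(f"unsupported concentration unit: {units!r}") from None
    return value * factor
```

For example, the first row above (5.0 uM) corresponds to `ic50_to_nm(5.0, "uM")`, i.e. 5000 nM. Rows with missing or non-molar units raise `ValueError`, so they would need to be filtered out before applying the function across the whole DataFrame.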
