## **Installing libraries**

In [33]:
! pip install chembl_webresource_client



## **Importing libraries**

In [34]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client

## **Target search for Tryptase**

In [35]:
# Target search for Tryptase protein
target = new_client.target
target_query = target.search('tryptase')
targets = pd.DataFrame.from_dict(target_query)
targets

Unnamed: 0,cross_references,organism,pref_name,score,species_group_flag,target_chembl_id,target_components,target_type,tax_id
0,[],Homo sapiens,Tryptase,20.0,False,CHEMBL2095193,"[{'accession': 'Q9BZJ3', 'component_descriptio...",PROTEIN FAMILY,9606
1,"[{'xref_id': 'Q15661', 'xref_name': None, 'xre...",Homo sapiens,Tryptase beta-1,19.0,False,CHEMBL2617,"[{'accession': 'Q15661', 'component_descriptio...",SINGLE PROTEIN,9606
2,"[{'xref_id': 'TPSG1', 'xref_name': None, 'xref...",Homo sapiens,Tryptase gamma,19.0,False,CHEMBL4955,"[{'accession': 'Q9NRR2', 'component_descriptio...",SINGLE PROTEIN,9606
3,[],Homo sapiens,Tryptase beta-2,19.0,False,CHEMBL4523196,"[{'accession': 'P20231', 'component_descriptio...",SINGLE PROTEIN,9606
4,"[{'xref_id': 'P27435', 'xref_name': None, 'xre...",Rattus norvegicus,Tryptase alpha/beta-1,18.0,False,CHEMBL3320,"[{'accession': 'P27435', 'component_descriptio...",SINGLE PROTEIN,10116
5,"[{'xref_id': 'P15944', 'xref_name': None, 'xre...",Canis lupus familiaris,Tryptase,17.0,False,CHEMBL4700,"[{'accession': 'P15944', 'component_descriptio...",SINGLE PROTEIN,9615
6,"[{'xref_id': 'Q02844', 'xref_name': None, 'xre...",Mus musculus,Tryptase alpha/beta-1,17.0,False,CHEMBL4749,"[{'accession': 'Q02844', 'component_descriptio...",SINGLE PROTEIN,10090
7,[],Mus musculus,Tryptase beta-2,16.0,False,CHEMBL4523201,"[{'accession': 'P21845', 'component_descriptio...",SINGLE PROTEIN,10090
8,"[{'xref_id': 'P49864', 'xref_name': None, 'xre...",Rattus norvegicus,Granzyme K,14.0,False,CHEMBL4557,"[{'accession': 'P49864', 'component_descriptio...",SINGLE PROTEIN,10116
9,"[{'xref_id': 'P49863', 'xref_name': None, 'xre...",Homo sapiens,Granzyme K,12.0,False,CHEMBL4930,"[{'accession': 'P49863', 'component_descriptio...",SINGLE PROTEIN,9606


## **Select and retrieve bioactivity data for Tryptase (first entry)**

We assign the first entry (the Tryptase protein family in Homo sapiens, `CHEMBL2095193`) to the `selected_target` variable.

In [36]:
selected_target = targets.target_chembl_id[0]
selected_target

'CHEMBL2095193'

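Here, we retrieve only the bioactivity data for Tryptase (CHEMBL2095193) that are reported as $IC_{50}$ values. The filter below selects on `standard_type` only; in the records shown here, the standardized values are expressed in nM (nanomolar). $IC_{50}$ is the concentration of a compound that inhibits the target's activity by 50%, and it is a common measure of inhibitor potency in pharmacological research.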

In [37]:
activity = new_client.activity
res = activity.filter(target_chembl_id=selected_target).filter(standard_type="IC50")

In [38]:
df = pd.DataFrame.from_dict(res)

In [39]:
df.head(3)

Unnamed: 0,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,266666,[],CHEMBL817298,Inhibitory concentration against human tryptase,B,,,BAO_0000190,BAO_0000224,protein format,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCN(C(...,,,CHEMBL1134958,Bioorg. Med. Chem. Lett.,2002,"{'bei': '16.27', 'le': '0.31', 'lle': '6.99', ...",CHEMBL432835,,CHEMBL432835,8.05,False,http://www.openphacts.org/units/Nanomolar,207080,=,1,True,=,,IC50,nM,,9.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,9.0
1,,269175,[],CHEMBL817298,Inhibitory concentration against human tryptase,B,,,BAO_0000190,BAO_0000224,protein format,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCCN(C...,,,CHEMBL1134958,Bioorg. Med. Chem. Lett.,2002,"{'bei': '14.06', 'le': '0.27', 'lle': '5.70', ...",CHEMBL324621,,CHEMBL324621,7.15,False,http://www.openphacts.org/units/Nanomolar,207082,=,1,True,=,,IC50,nM,,71.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,71.0
2,,270490,[],CHEMBL817298,Inhibitory concentration against human tryptase,B,,,BAO_0000190,BAO_0000224,protein format,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCCN(C...,,,CHEMBL1134958,Bioorg. Med. Chem. Lett.,2002,"{'bei': '13.15', 'le': '0.25', 'lle': '5.24', ...",CHEMBL324621,,CHEMBL324621,6.69,False,http://www.openphacts.org/units/Nanomolar,207083,=,1,True,=,,IC50,nM,,204.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,204.0


A lower 'standard_value' means a more potent compound: less of it is needed to inhibit Tryptase by half. We therefore look for low values to find the compounds or pharmacological drugs that inhibit Tryptase most strongly. Tryptase is a major biomarker for mast cells, and elevated Tryptase levels can indicate allergic reactions or mastocytosis (an abnormal accumulation of mast cells).
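Because potency spans several orders of magnitude, it is often easier to compare compounds on a logarithmic scale. The sketch below converts an $IC_{50}$ in nM to $pIC_{50} = -\log_{10}(IC_{50}\ \text{in mol/L})$, where higher means more potent. The function name `pic50_from_nm` is hypothetical and not part of the notebook; the input values are the `standard_value` entries of the first three rows above, and the results agree with the `pchembl_value` column of those rows.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 (-log10 of the molar value)."""
    value = float(ic50_nm)
    if value <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 nM = 1e-9 mol/L, so -log10(value * 1e-9) = 9 - log10(value)
    return 9.0 - math.log10(value)


# standard_value (nM) taken from the first three rows of df.head(3)
for ic50 in (9.0, 71.0, 204.0):
    print(ic50, round(pic50_from_nm(ic50), 2))
# 9.0 8.05
# 71.0 7.15
# 204.0 6.69
```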

Using the `unique` method, we can confirm that $IC_{50}$ is the only `standard_type` present in the DataFrame.

In [40]:
df.standard_type.unique()

array(['IC50'], dtype=object)

Finally, we will save the resulting bioactivity data to a CSV file **tryptase_bioactivity_data.csv**.

In [41]:
df.to_csv('tryptase_bioactivity_data.csv', index=False)

## **Copying files to Google Drive**

In [42]:
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)


Mounted at /content/gdrive/


In [43]:
! mkdir -p "/content/gdrive/My Drive/Colab Notebooks/tryptase_data"


In [44]:
! cp tryptase_bioactivity_data.csv "/content/gdrive/My Drive/Colab Notebooks/tryptase_data"
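The directory creation and copy steps can also be done from Python, which avoids an error when the folder already exists. This is a minimal sketch using only the standard library; the helper name `copy_to_folder` is hypothetical and not part of the notebook.

```python
import os
import shutil


def copy_to_folder(src_file: str, dest_dir: str) -> str:
    """Copy src_file into dest_dir, creating dest_dir first if it is missing."""
    # exist_ok=True makes this safe to re-run when the folder is already there
    os.makedirs(dest_dir, exist_ok=True)
    # shutil.copy returns the path of the newly written file
    return shutil.copy(src_file, dest_dir)


# Example call with the paths used in this notebook (run inside Colab):
# copy_to_folder("tryptase_bioactivity_data.csv",
#                "/content/gdrive/My Drive/Colab Notebooks/tryptase_data")
```

Re-running the cell simply overwrites the copy in the destination folder.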

In [45]:
! ls -l "/content/gdrive/My Drive/Colab Notebooks/tryptase_data"

total 150
-rw------- 1 root root 77656 Aug 22 09:28 tryptase_bioactivity_data.csv
-rw------- 1 root root 15509 Aug 22 09:26 tryptase_bioactivity_preprocessed_data.csv
-rw------- 1 root root 59136 Aug 22 09:26 Tryptase_data.ipynb


In [46]:
! head tryptase_bioactivity_data.csv

activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
,266666,[],CHEMBL817298,Inhibitory concentration against human tryptase,B,,,BAO_0000190,BAO_0000224,protein format,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCN(C(=N)N)C3)[C@H]2C(=O)O)CC1)C(C)C,,,CHEMBL1134958,Bioorg. Med. Chem. Lett.,2002,"{'bei': '16.27', 'le': '0.31', 'lle': '6.99', 'sei': '5.01'}",CHEMBL432835,,CHEMBL4

In [47]:
# Keep only the rows that have a standard_value (drop missing measurements)
df2 = df[df.standard_value.notna()]
df2

Unnamed: 0,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,266666,[],CHEMBL817298,Inhibitory concentration against human tryptase,B,,,BAO_0000190,BAO_0000224,protein format,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCN(C(...,,,CHEMBL1134958,Bioorg. Med. Chem. Lett.,2002,"{'bei': '16.27', 'le': '0.31', 'lle': '6.99', ...",CHEMBL432835,,CHEMBL432835,8.05,False,http://www.openphacts.org/units/Nanomolar,207080,=,1,True,=,,IC50,nM,,9.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,9.0
1,,269175,[],CHEMBL817298,Inhibitory concentration against human tryptase,B,,,BAO_0000190,BAO_0000224,protein format,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCCN(C...,,,CHEMBL1134958,Bioorg. Med. Chem. Lett.,2002,"{'bei': '14.06', 'le': '0.27', 'lle': '5.70', ...",CHEMBL324621,,CHEMBL324621,7.15,False,http://www.openphacts.org/units/Nanomolar,207082,=,1,True,=,,IC50,nM,,71.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,71.0
2,,270490,[],CHEMBL817298,Inhibitory concentration against human tryptase,B,,,BAO_0000190,BAO_0000224,protein format,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCCN(C...,,,CHEMBL1134958,Bioorg. Med. Chem. Lett.,2002,"{'bei': '13.15', 'le': '0.25', 'lle': '5.24', ...",CHEMBL324621,,CHEMBL324621,6.69,False,http://www.openphacts.org/units/Nanomolar,207083,=,1,True,=,,IC50,nM,,204.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,204.0
3,,271838,[],CHEMBL817298,Inhibitory concentration against human tryptase,B,,,BAO_0000190,BAO_0000224,protein format,N=C(N)N1CCCC(C[C@H]2C(=O)N(C(=O)N3CCN(C(=O)CCC...,,,CHEMBL1134958,Bioorg. Med. Chem. Lett.,2002,"{'bei': '15.95', 'le': '0.31', 'lle': '6.03', ...",CHEMBL109504,,CHEMBL109504,8.72,False,http://www.openphacts.org/units/Nanomolar,207085,=,1,True,=,,IC50,nM,,1.9,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,1.9
4,,275325,[],CHEMBL817298,Inhibitory concentration against human tryptase,B,,,BAO_0000190,BAO_0000224,protein format,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCCN(C...,,,CHEMBL1134958,Bioorg. Med. Chem. Lett.,2002,"{'bei': '14.16', 'le': '0.27', 'lle': '5.75', ...",CHEMBL324621,,CHEMBL324621,7.20,False,http://www.openphacts.org/units/Nanomolar,207094,=,1,True,=,,IC50,nM,,63.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,63.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
162,,1217854,[],CHEMBL817287,Inhibition of human tryptase.,B,,,BAO_0000190,BAO_0000224,protein format,NCCCCCCNC(=O)CN1CCCCC(NC(=O)c2ccc(-c3ccccc3)cc...,,,CHEMBL1139445,Bioorg. Med. Chem. Lett.,2004,,CHEMBL107226,,CHEMBL107226,,False,http://www.openphacts.org/units/Nanomolar,198099,>,1,True,>,,IC50,nM,,33000.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,33000.0
163,,1220510,[],CHEMBL817287,Inhibition of human tryptase.,B,,,BAO_0000190,BAO_0000224,protein format,NCCc1ccc(NC(=O)CN2CCCCC(NC(=O)c3ccc(-c4ccccc4)...,,,CHEMBL1139445,Bioorg. Med. Chem. Lett.,2004,"{'bei': '10.23', 'le': '0.19', 'lle': '1.36', ...",CHEMBL326694,,CHEMBL326694,4.96,False,http://www.openphacts.org/units/Nanomolar,198073,=,1,True,=,,IC50,nM,,11000.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,11000.0
164,,1222962,[],CHEMBL817287,Inhibition of human tryptase.,B,,,BAO_0000190,BAO_0000224,protein format,NCc1cccc(NC(=O)CN2CCCCC(NC(=O)c3ccc(Cc4ccccc4)...,,,CHEMBL1139445,Bioorg. Med. Chem. Lett.,2004,"{'bei': '12.40', 'le': '0.23', 'lle': '2.52', ...",CHEMBL320473,,CHEMBL320473,6.01,False,http://www.openphacts.org/units/Nanomolar,198084,=,1,True,=,,IC50,nM,,980.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,nM,UO_0000065,,980.0
165,,15679184,[],CHEMBL3607633,Inhibition of tryptase (unknown origin) using ...,B,,,BAO_0000190,BAO_0000224,protein format,CN1C(=O)c2ccc(OC(=O)CCc3ccc(N)nc3)cc2C1=O.Cl,,,CHEMBL3603840,Bioorg. Med. Chem. Lett.,2015,"{'bei': '20.66', 'le': '0.38', 'lle': '5.29', ...",CHEMBL3604626,,CHEMBL1171655,6.72,False,http://www.openphacts.org/units/Nanomolar,2494589,=,1,True,=,,IC50,nM,,190.0,CHEMBL2095193,Homo sapiens,Tryptase,9606,,,IC50,uM,UO_0000065,,0.19


# **Data pre-processing of bioactivity data**

### **Labeling compounds as either active, inactive, or intermediate**

The bioactivity data are $IC_{50}$ values in nM. Compounds with values of 1,000 nM or less are labeled **active**, those with values of 10,000 nM or more are labeled **inactive**, and those strictly between 1,000 and 10,000 nM are labeled **intermediate**.

In [48]:
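# Assign one class label per row, in the same order as df2
bioactivity_class = []
for i in df2.standard_value:
  # standard_value may arrive as a string, so convert before comparing
  if float(i) >= 10000:
    bioactivity_class.append("inactive")
  elif float(i) <= 1000:
    bioactivity_class.append("active")
  else:
    bioactivity_class.append("intermediate")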
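The same thresholds can be wrapped in a small function, which makes the boundary behavior easy to check. This is a sketch only: the name `classify_ic50` is hypothetical, and the cutoffs mirror the loop above (1,000 nM and 10,000 nM, both inclusive).

```python
def classify_ic50(ic50_nm) -> str:
    """Label an IC50 (in nM) as active, intermediate, or inactive."""
    value = float(ic50_nm)  # accepts numbers or numeric strings
    if value >= 10000:
        return "inactive"
    if value <= 1000:
        return "active"
    return "intermediate"


# A few illustrative values, including one passed as a string
labels = [classify_ic50(v) for v in ("9.0", 980.0, 5000, 11000.0)]
print(labels)  # ['active', 'active', 'intermediate', 'inactive']
```

With the notebook's data, an equivalent list could be built as `[classify_ic50(v) for v in df2.standard_value]`.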

### **Copy *molecule_chembl_id* into a list**

In [49]:
mol_cid = []
for i in df2.molecule_chembl_id:
  mol_cid.append(i)

### **Copy *canonical_smiles* into a list**

In [50]:
canonical_smiles = []
for i in df2.canonical_smiles:
  canonical_smiles.append(i)

### **Copy *standard_value* into a list**

In [51]:
standard_value = []
for i in df2.standard_value:
  standard_value.append(i)

### **Combine the 4 lists into 1 dataframe**

In [52]:
data_tuples = list(zip(mol_cid, canonical_smiles, bioactivity_class, standard_value))
df3 = pd.DataFrame( data_tuples,  columns=['molecule_chembl_id', 'canonical_smiles', 'bioactivity_class', 'standard_value'])

In [53]:
df3

Unnamed: 0,molecule_chembl_id,canonical_smiles,bioactivity_class,standard_value
0,CHEMBL432835,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCN(C(...,active,9.0
1,CHEMBL324621,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCCN(C...,active,71.0
2,CHEMBL324621,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCCN(C...,active,204.0
3,CHEMBL109504,N=C(N)N1CCCC(C[C@H]2C(=O)N(C(=O)N3CCN(C(=O)CCC...,active,1.9
4,CHEMBL324621,CC(C)C(OC(=O)N1CCN(C(=O)N2C(=O)[C@H](CC3CCCN(C...,active,63.0
...,...,...,...,...
162,CHEMBL107226,NCCCCCCNC(=O)CN1CCCCC(NC(=O)c2ccc(-c3ccccc3)cc...,inactive,33000.0
163,CHEMBL326694,NCCc1ccc(NC(=O)CN2CCCCC(NC(=O)c3ccc(-c4ccccc4)...,inactive,11000.0
164,CHEMBL320473,NCc1cccc(NC(=O)CN2CCCCC(NC(=O)c3ccc(Cc4ccccc4)...,active,980.0
165,CHEMBL3604626,CN1C(=O)c2ccc(OC(=O)CCc3ccc(N)nc3)cc2C1=O.Cl,active,190.0
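Note that `df3` still contains repeated molecule IDs (for example, CHEMBL324621 appears in rows 1, 2, and 4), since copying columns into lists does not remove duplicates. If one row per molecule is wanted, pandas' `drop_duplicates` is one option. The sketch below uses a small toy DataFrame built from the first four rows above; keeping the first occurrence is an assumption, not a choice made in the notebook, and other policies (such as keeping the lowest $IC_{50}$) may suit the analysis better.

```python
import pandas as pd

# Toy stand-in for df3, using values from the first four rows shown above
toy = pd.DataFrame({
    "molecule_chembl_id": ["CHEMBL432835", "CHEMBL324621",
                           "CHEMBL324621", "CHEMBL109504"],
    "standard_value": [9.0, 71.0, 204.0, 1.9],
})

# Keep the first row seen for each molecule and renumber the index
unique_molecules = (
    toy.drop_duplicates(subset="molecule_chembl_id", keep="first")
       .reset_index(drop=True)
)
print(unique_molecules)
```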


Save the new dataframe to a pre-processed CSV file

In [54]:
df3.to_csv('tryptase_bioactivity_preprocessed_data.csv', index=False)

In [55]:
! ls -l

total 100
drwx------ 5 root root  4096 Aug 22 09:28 gdrive
drwxr-xr-x 1 root root  4096 Aug 13 13:35 sample_data
-rw-r--r-- 1 root root 77656 Aug 22 09:28 tryptase_bioactivity_data.csv
-rw-r--r-- 1 root root 15509 Aug 22 09:28 tryptase_bioactivity_preprocessed_data.csv


In [56]:
! cp tryptase_bioactivity_preprocessed_data.csv "/content/gdrive/My Drive/Colab Notebooks/tryptase_data"

In [57]:
! ls "/content/gdrive/My Drive/Colab Notebooks/tryptase_data"

tryptase_bioactivity_data.csv		    Tryptase_data.ipynb
tryptase_bioactivity_preprocessed_data.csv
