Install dependencies first

In [1]:
! pip install chembl_webresource_client

Collecting chembl_webresource_client
  Downloading chembl_webresource_client-0.10.9-py3-none-any.whl.metadata (1.4 kB)
Collecting requests-cache~=1.2 (from chembl_webresource_client)
  Downloading requests_cache-1.2.1-py3-none-any.whl.metadata (9.9 kB)
Collecting easydict (from chembl_webresource_client)
  Downloading easydict-1.13-py3-none-any.whl.metadata (4.2 kB)
Collecting cattrs>=22.2 (from requests-cache~=1.2->chembl_webresource_client)
  Downloading cattrs-24.1.2-py3-none-any.whl.metadata (8.4 kB)
Collecting url-normalize>=1.4 (from requests-cache~=1.2->chembl_webresource_client)
  Downloading url_normalize-1.4.3-py2.py3-none-any.whl.metadata (3.1 kB)
Downloading chembl_webresource_client-0.10.9-py3-none-any.whl (55 kB)
Downloading requests_cache-1.2.1-py3-none-any.whl (61 kB)
Downloading easydict-1.13-py3-none-any.whl (6.8 kB)
Downloading cattrs-24.1.2-py3-none-any.whl (66 kB)
Downloading url_normalize-1.4.3-py2.py3-none-any.whl (6.8 kB)
Installing collected packages: easydict,

In [2]:
import pandas as pd
from chembl_webresource_client.new_client import new_client

In [5]:
target = new_client.target
target_query = target.search('coronavirus')
targets = pd.DataFrame.from_dict(target_query)
targets

Unnamed: 0,cross_references,organism,pref_name,score,species_group_flag,target_chembl_id,target_components,target_type,tax_id
0,[],Coronavirus,Coronavirus,17.0,False,CHEMBL613732,[],ORGANISM,11119
1,[],Feline coronavirus,Feline coronavirus,14.0,False,CHEMBL612744,[],ORGANISM,12663
2,[],Murine coronavirus,Murine coronavirus,14.0,False,CHEMBL5209664,[],ORGANISM,694005
3,[],Canine coronavirus,Canine coronavirus,14.0,False,CHEMBL5291668,[],ORGANISM,11153
4,[],Human coronavirus 229E,Human coronavirus 229E,13.0,False,CHEMBL613837,[],ORGANISM,11137
5,[],Human coronavirus OC43,Human coronavirus OC43,13.0,False,CHEMBL5209665,[],ORGANISM,31631
6,[],Severe acute respiratory syndrome-related coro...,SARS coronavirus 3C-like proteinase,10.0,False,CHEMBL3927,"[{'accession': 'P0C6U8', 'component_descriptio...",SINGLE PROTEIN,694009
7,[],Middle East respiratory syndrome-related coron...,Middle East respiratory syndrome-related coron...,9.0,False,CHEMBL4296578,[],ORGANISM,1335626
8,[],Severe acute respiratory syndrome-related coro...,Replicase polyprotein 1ab,4.0,False,CHEMBL5118,"[{'accession': 'P0C6X7', 'component_descriptio...",SINGLE PROTEIN,694009
9,[],Severe acute respiratory syndrome coronavirus 2,Replicase polyprotein 1ab,4.0,False,CHEMBL4523582,"[{'accession': 'P0DTD1', 'component_descriptio...",SINGLE PROTEIN,2697049


We select the first "SINGLE PROTEIN" target, SARS coronavirus 3C-like proteinase, at index 6.

In [20]:
selected_target = targets["target_chembl_id"][6]
selected_target

'CHEMBL3927'
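Picking the target by position works for this result set, but the row order depends on the search ranking. Below is a minimal sketch of selecting by `target_type` instead. It runs on a small hand-made stand-in for `targets` with three rows copied from the table above; `toy_targets`, `single_proteins` and `selected_toy_target` are illustrative names that are not part of the notebook.

```python
import pandas as pd

# Stand-in for the `targets` DataFrame: three rows copied from the search output above.
toy_targets = pd.DataFrame(
    {
        "pref_name": [
            "Human coronavirus OC43",
            "SARS coronavirus 3C-like proteinase",
            "Replicase polyprotein 1ab",
        ],
        "target_chembl_id": ["CHEMBL5209665", "CHEMBL3927", "CHEMBL5118"],
        "target_type": ["ORGANISM", "SINGLE PROTEIN", "SINGLE PROTEIN"],
    },
    index=[5, 6, 8],
)

# Keep only single-protein targets, then take the first one in ranking order.
single_proteins = toy_targets[toy_targets["target_type"] == "SINGLE PROTEIN"]
selected_toy_target = single_proteins["target_chembl_id"].iloc[0]
print(selected_toy_target)
```

On the real `targets` DataFrame the same filter should give the same ID as index 6, as long as the 3C-like proteinase stays the highest-ranked single protein.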

### TARGET ACTIVITY

Target activity refers to the measured biological activity of chemical compounds against the selected target.

### WHAT TYPES OF ACTIVITY EXIST?

- ***decreasing some biological effect*** - INHIBITION!
- ***increasing some biological effect*** - STIMULATION!
- ***modifying some biological effect*** - MODULATION!

### TYPES OF TARGET ACTIVITIES

- **Potency** - A measure of how much drug is needed to achieve a desired effect, typically expressed as a concentration.
- **Activity** - A general term describing the biological effect of a compound, typically reported as % activity relative to a no-drug control.

**A drug can show a good % of activity/effectiveness but only at a high drug concentration, which means low potency.**



### EXAMPLE

Drug A:
- 1 µM gives 80% inhibition (High potency, High activity)
- Very effective at low dose!

Drug B:
- 1 µM gives 10% inhibition
- 100 µM gives 90% inhibition (Low potency, High activity)
- Works well but needs higher dose
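The Drug A / Drug B example can be reproduced with a simple dose-response calculation. This is a sketch under stated assumptions, not something taken from the ChEMBL data: it assumes a one-site model with Hill slope 1 and a 100% maximum effect, and the two IC50 values are chosen only so that the model matches the numbers above. The function name `percent_inhibition` is hypothetical.

```python
def percent_inhibition(conc_um, ic50_um):
    """Percent inhibition under a one-site model (Hill slope 1, 100% maximum)."""
    return 100.0 * conc_um / (conc_um + ic50_um)


# Assumed IC50 values (in uM), picked so the model reproduces the example numbers.
ic50_drug_a = 0.25  # low IC50 -> high potency
ic50_drug_b = 9.0   # higher IC50 -> lower potency

for name, ic50 in [("Drug A", ic50_drug_a), ("Drug B", ic50_drug_b)]:
    for conc in (1.0, 100.0):
        inhibition = percent_inhibition(conc, ic50)
        print(f"{name}: {conc:g} uM -> {inhibition:.1f}% inhibition")
```

Under these assumptions both drugs approach full inhibition at a high enough dose (similar activity), but Drug B needs a much higher concentration to get there (lower potency).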


In [23]:
activity = new_client.activity
res = activity.filter(target_chembl_id=selected_target)
df = pd.DataFrame.from_dict(res)
print(df.columns)

Index(['action_type', 'activity_comment', 'activity_id', 'activity_properties',
       'assay_chembl_id', 'assay_description', 'assay_type',
       'assay_variant_accession', 'assay_variant_mutation', 'bao_endpoint',
       'bao_format', 'bao_label', 'canonical_smiles', 'data_validity_comment',
       'data_validity_description', 'document_chembl_id', 'document_journal',
       'document_year', 'ligand_efficiency', 'molecule_chembl_id',
       'molecule_pref_name', 'parent_molecule_chembl_id', 'pchembl_value',
       'potential_duplicate', 'qudt_units', 'record_id', 'relation', 'src_id',
       'standard_flag', 'standard_relation', 'standard_text_value',
       'standard_type', 'standard_units', 'standard_upper_value',
       'standard_value', 'target_chembl_id', 'target_organism',
       'target_pref_name', 'target_tax_id', 'text_value', 'toid', 'type',
       'units', 'uo_units', 'upper_value', 'value'],
      dtype='object')


The idea is that many compounds have been tested against this target protein, each with a different measured activity. The `standard_type` column shows which kinds of measurement were recorded.

In [27]:
standard_type = df["standard_type"].unique()
print("Standard types: ", standard_type)

Standard types:  ['Inhibition' 'IC50' 'kinact' 'T1/2' 'Activity' 'Ki' 'Km' 'Kcat' 'Kcat/Km'
 'EC50' 'Ratio' 'Kd']
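To see how many records each measurement type has, rather than only which types exist, `value_counts()` is a natural follow-up. Below is a minimal sketch on a made-up stand-in column; the values in `toy_df` are invented for illustration and are not the real ChEMBL counts.

```python
import pandas as pd

# Invented stand-in for df["standard_type"]; not real ChEMBL data.
toy_df = pd.DataFrame(
    {"standard_type": ["IC50", "Inhibition", "IC50", "Ki", "IC50", "Inhibition"]}
)

# Count the records per measurement type, most frequent first.
type_counts = toy_df["standard_type"].value_counts()
print(type_counts)
```

Running the same call on the real `df` would show which measurement type has enough records to be worth modelling.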


## There are two types of drugs: agonists and antagonists

**Agonist** drugs activate or enhance the target function -> increase the effect -> for activity measurement we use **EC50**


**Antagonist** drugs block or reduce the target function -> for activity measurement we use **IC50**

IC50 (half-maximal inhibitory concentration) is commonly used as a measure of antagonist drug potency in pharmacological research. It is comparable to other measures of potency, such as EC50 for agonist drugs. EC50 represents the dose or plasma concentration required to obtain 50% of the maximum effect in vivo.[1]

In [31]:
activity = new_client.activity
# Filter by standard_type == "IC50" to keep only IC50 measurements,
# i.e. compounds tested for how strongly they block the target function
res = activity.filter(target_chembl_id=selected_target).filter(standard_type="IC50")
print(type(res))

<class 'chembl_webresource_client.query_set.QuerySet'>


In [32]:
df = pd.DataFrame.from_dict(res)

In [34]:
df.to_csv('bioactivity_coronavirus_protein_raw.csv', index=False)

## Handling missing data

We simply drop the rows that have no `standard_value`.

In [37]:
clean_df = df[df.standard_value.notna()]
clean_df.to_csv('bioactivity_coronavirus_protein_clean.csv', index=False)
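The same filter can also be written with `dropna(subset=...)`. The sketch below uses an invented three-row stand-in (`toy_df` and its placeholder molecule IDs are not real ChEMBL data). It also converts `standard_value` to a numeric dtype, on the assumption that the values arrive as strings, which is worth checking with `df.dtypes` on the real data.

```python
import pandas as pd

# Invented stand-in for the IC50 DataFrame; one row has no standard_value.
toy_df = pd.DataFrame(
    {
        "molecule_chembl_id": ["MOL_A", "MOL_B", "MOL_C"],  # placeholder IDs
        "standard_value": ["7200.0", None, "9400.0"],
    }
)

# Equivalent to toy_df[toy_df.standard_value.notna()]
toy_clean = toy_df.dropna(subset=["standard_value"]).copy()

# Convert to float so the column can be used in calculations.
toy_clean["standard_value"] = pd.to_numeric(toy_clean["standard_value"])
print(toy_clean)
```

The `.copy()` avoids pandas' chained-assignment warning when the filtered frame is modified afterwards.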