In [2]:
! pip install chembl_webresource_client

Collecting chembl_webresource_client
  Downloading chembl_webresource_client-0.10.7-py3-none-any.whl (55 kB)
[?25l[K     |██████                          | 10 kB 24.5 MB/s eta 0:00:01[K     |███████████▉                    | 20 kB 28.2 MB/s eta 0:00:01[K     |█████████████████▊              | 30 kB 30.2 MB/s eta 0:00:01[K     |███████████████████████▋        | 40 kB 32.7 MB/s eta 0:00:01[K     |█████████████████████████████▌  | 51 kB 34.1 MB/s eta 0:00:01[K     |████████████████████████████████| 55 kB 3.9 MB/s 
Collecting requests-cache~=0.7.0
  Downloading requests_cache-0.7.4-py3-none-any.whl (38 kB)
Collecting url-normalize<2.0,>=1.4
  Downloading url_normalize-1.4.3-py2.py3-none-any.whl (6.8 kB)
Collecting pyyaml>=5.4
  Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)
Collecting itsdangerous>=2.0.1
  Downloading itsdangerous-2.0.1-py3-none-any.whl (18 kB)
Installing collected packages:

In [3]:
import csv
from chembl_webresource_client.new_client import new_client

In [4]:
# Dictionary mapping each compound ChEMBL ID to a set of targets
# (target ChEMBL IDs at first, gene symbols after the lookup further down)
compounds2targets = dict()

In [6]:
# Parse the CSV file to extract the compound ChEMBL IDs (taken from the first column):
with open('pdl1_bioactivity_preprocessed_data.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        compounds2targets[row[0]] = set()

In [7]:
# Process in chunks since we have 192 rows of ChEMBL IDs
chunk_size = 50
keys = list(compounds2targets.keys())

for i in range(0, len(keys), chunk_size):
    # Jumping from compounds to targets through activities
    activities = new_client.activity.filter(molecule_chembl_id__in=keys[i:i + chunk_size]).only(
        ['molecule_chembl_id', 'target_chembl_id'])
    for act in activities:
        compounds2targets[act['molecule_chembl_id']].add(act['target_chembl_id'])
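
The slicing pattern used above can be isolated into a small helper, which makes the batch sizes easy to check before sending any request. This is only a sketch: `chunked` and `example_ids` are hypothetical names that do not appear in the notebook, and the IDs are placeholders rather than real ChEMBL records.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Placeholder IDs, used only to show how 192 entries split with chunk_size = 50.
example_ids = [f"CHEMBL{n}" for n in range(1, 193)]
batch_sizes = [len(batch) for batch in chunked(example_ids, 50)]
print(batch_sizes)  # [50, 50, 50, 42]
```

With 192 IDs and a chunk size of 50, this gives four requests, the last one shorter than the others.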

In [8]:
# Our dictionary now maps compound ChEMBL IDs to target ChEMBL IDs.
# Next, we replace the target ChEMBL IDs with gene symbols.

for key, val in compounds2targets.items():
    # Process in chunks
    lval = list(val)
    genes = set()
    for i in range(0, len(lval), chunk_size):
        targets = new_client.target.filter(target_chembl_id__in=lval[i:i + chunk_size]).only(
            ['target_components'])
        for target in targets:
            for component in target['target_components']:
                for synonym in component['target_component_synonyms']:
                    if synonym['syn_type'] == 'GENE_SYMBOL':
                        genes.add(synonym['component_synonym'])

    # Reassigning the value of an existing key is safe while iterating over items()
    compounds2targets[key] = genes
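
The nested loops above can be tried offline on a hand-written record with the same field names (`target_components`, `target_component_synonyms`, `syn_type`, `component_synonym`). The sketch below assumes that structure. `gene_symbols` and `mock_target` are hypothetical names, the record is made up for illustration and was not fetched from ChEMBL, and the `or []` guards are an added precaution for records where a list is missing.

```python
from typing import Any, Dict, Set


def gene_symbols(target: Dict[str, Any]) -> Set[str]:
    """Collect the GENE_SYMBOL synonyms found in one target record."""
    symbols: Set[str] = set()
    for component in target.get("target_components") or []:
        for synonym in component.get("target_component_synonyms") or []:
            if synonym.get("syn_type") == "GENE_SYMBOL":
                symbols.add(synonym["component_synonym"])
    return symbols


# Hand-written record for illustration only; not real ChEMBL output.
mock_target = {
    "target_components": [
        {
            "target_component_synonyms": [
                {"component_synonym": "CD274", "syn_type": "GENE_SYMBOL"},
                {"component_synonym": "Programmed cell death 1 ligand 1", "syn_type": "UNIPROT"},
            ]
        }
    ]
}
print(gene_symbols(mock_target))  # {'CD274'}
```

Only synonyms tagged `GENE_SYMBOL` are kept, so the protein name in the second entry is ignored.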

In [10]:
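# Write the compound-to-gene mapping to the output CSV file
with open('pdl1_compounds_2_genes.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for key, val in compounds2targets.items():
        # Sort the gene symbols so the output is the same on every run
        writer.writerow([key] + sorted(val))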
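
Rather than inspecting the output file by eye, the genes it contains can be tallied with a few lines. This is a minimal sketch, assuming each row holds a compound ID followed by zero or more gene symbols. `count_genes` is a hypothetical helper, and the sample rows are made-up placeholders in the same layout, not data from the actual file.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def count_genes(lines: Iterable[str]) -> Counter:
    """Count how many rows mention each gene symbol (column 0 is the compound ID)."""
    counts: Counter = Counter()
    for row in csv.reader(lines):
        counts.update(set(row[1:]))
    return counts


# Made-up rows for illustration; the compound IDs are placeholders.
sample = io.StringIO("CHEMBL_A,CD274,PDCD1\nCHEMBL_B,CD274\nCHEMBL_C\n")
gene_counts = count_genes(sample)
print(sorted(gene_counts.items()))  # [('CD274', 2), ('PDCD1', 1)]
```

To run it on the real output, pass an open file handle instead, for example `count_genes(open('pdl1_compounds_2_genes.csv', newline=''))`.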

After looking at *'pdl1_compounds_2_genes.csv'*, we can see that only 2 genes are associated with these compounds: **CD274**, which encodes PD-L1 (also known as B7-H1), and **PDCD1**, which encodes its receptor PD-1. A quick Google search shows that **MEDI4736** (Durvalumab), **MPDL3280A** (Atezolizumab), **BMS-936559**, and **Avelumab** are all PD-L1/B7-H1/CD274 inhibitors that are used or have been studied against a number of cancers related to PD-L1 (CD274) expression. For **PDCD1**, the link is indirect: **Atezolizumab**, **Avelumab**, and **Durvalumab** bind PD-L1 and thereby block its interaction with PD-1, rather than binding PD-1 itself.

**Avelumab** therefore acts on the PD-1/PD-L1 axis through PD-L1/B7-H1/CD274. Note that PD-1 and PDCD1 refer to the same target.

However, we found no drug in common that inhibits both Tryptase and PD-L1.