# A.3.2: Predicting Off-Targets

### Retrieve Imatinib-Binding Proteins from the PDB

In [3]:
# imports
import biotite.database.rcsb as rcsb
import biotite.structure.io.pdbx as pdbx
import os
from biotite.structure import superimpose_homologs, rmsd, superimpose_structural_homologs
import numpy as np

In [4]:
# Define queries
xrayChrist = rcsb.FieldQuery("exptl.method", exact_match="X-RAY DIFFRACTION")
resolution = rcsb.FieldQuery("rcsb_entry_info.resolution_combined", less_or_equal=3)
ImatinibLigand = rcsb.FieldQuery("rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id", exact_match="STI")
#ImatinibLigand = rcsb.FieldQuery("rcsb_nonpolymer_entity_annotation.comp_id", exact_match="STI")
ranking = rcsb.FieldQuery("rcsb_nonpolymer_instance_validation_score.ranking_model_fit", greater_or_equal=0.5) 

# Combine queries
composite_query = xrayChrist & resolution & ImatinibLigand & ranking

In [5]:
# Run Query
pdb_ids = rcsb.search(composite_query)
print(pdb_ids)
print(len(pdb_ids))

['1IEP', '1OPJ', '1T46', '2HYY', '2OIQ', '2PL0', '3FW1', '3GVU', '3K5V', '3MS9', '3MSS', '3OEZ', '3PYY', '4BKJ', '4CSV', '4R7I', '6HD4', '6HD6', '6JOL', '6NPE', '6NPU', '6NPV', '7N9G']
23


### Read & Filter Structures

In [6]:
# Create directory for structure downloads
os.makedirs("pdb_files", exist_ok=True)

filtered_structures = []

for pdb_id in pdb_ids:
    try:
        file_path = rcsb.fetch(pdb_id, "cif", "pdb_files") # fetch entry
        pdbx_file = pdbx.CIFFile.read(file_path)
        structure = pdbx.get_structure(pdbx_file, model=1)
        
        # Only keep first chain
        first_chain = structure[structure.chain_id == "A"]
        filtered_structures.append((pdb_id, first_chain))
    except Exception as e:
        print(f"Error processing {pdb_id}: {e}")

print(f"\nSuccessfully filtered {len(filtered_structures)} structures.")


Successfully filtered 23 structures.


### Structural Comparison

In [12]:
n = len(filtered_structures)

# initialize matrix to store rmsd values
rmsd_matrix = np.zeros((n, n))

for i in range(len(filtered_structures)):
    for j in range(i + 1, len(filtered_structures)):
        try:
            fixed = filtered_structures[i][1]
            mobile = filtered_structures[j][1]

            # superimpose 
            fitted, transform, fixed_anchor_indices, mobile_anchor_indices = superimpose_homologs(fixed, mobile)
            rmsd_val = rmsd(fixed[fixed_anchor_indices], mobile[mobile_anchor_indices])
            rmsd_matrix[i][j] = rmsd_val
            rmsd_matrix[j][i] = rmsd_val
        except Exception as e:
            print(f"Error processing protein {i} & {j}: {filtered_structures[i][0]} & {filtered_structures[j][0]}: {e}")

Error processing protein 5 & 6: 2PL0 & 3FW1: Tried fallback due to low anchor number, but number of backbone atoms does not match
Error processing protein 6 & 13: 3FW1 & 4BKJ: Tried fallback due to low anchor number, but number of backbone atoms does not match
Error processing protein 6 & 14: 3FW1 & 4CSV: Tried fallback due to low anchor number, but number of backbone atoms does not match


In [15]:
missing_i = 6
missing_j = (5, 13, 14)

for j in missing_j:
    try:
        fixed = filtered_structures[missing_i][1]
        mobile = filtered_structures[j][1]

        # superimpose (allow less sequence similarity)
        fitted, transform, fixed_anchor_indices, mobile_anchor_indices = superimpose_structural_homologs(fixed, mobile)
        rmsd_val = rmsd(fixed[fixed_anchor_indices], mobile[mobile_anchor_indices])
        rmsd_matrix[missing_i][j] = rmsd_val
        rmsd_matrix[j][missing_i] = rmsd_val
    except Exception as e:
        print(f"Error processing protein {missing_i} & {j}: {filtered_structures[missing_i][0]} & {filtered_structures[j][0]}: {e}")

# calculate average RMSD for every protein
rmsd_matrix[rmsd_matrix == 0] = 'nan' # Set diagonal elements to nan to exclude them in the mean calculation
average_rmsds = np.nanmean(rmsd_matrix, axis=1)

print("\nAverage RMSD for each protein structure:")
for i, (pdb_id, _) in enumerate(filtered_structures):
    print(f"{pdb_id}: {average_rmsds[i]:.3f} Å")


Average RMSD for each protein structure:
1IEP: 61.188 Å
1OPJ: 58.745 Å
1T46: 56.156 Å
2HYY: 50.810 Å
2OIQ: 70.555 Å
2PL0: 54.992 Å
3FW1: 52.445 Å
3GVU: 53.317 Å
3K5V: 59.616 Å
3MS9: 61.539 Å
3MSS: 62.822 Å
3OEZ: 54.824 Å
3PYY: 61.605 Å
4BKJ: 58.097 Å
4CSV: 66.677 Å
4R7I: 88.569 Å
6HD4: 59.237 Å
6HD6: 60.094 Å
6JOL: 155.163 Å
6NPE: 62.155 Å
6NPU: 61.955 Å
6NPV: 61.822 Å
7N9G: 74.579 Å


### Identify Oﬀ-Target Candidates

In [16]:
mean_rmsd = np.mean(average_rmsds)
std_rmsd = np.std(average_rmsds)

# identify outliers
outlier_indices = np.where(average_rmsds > mean_rmsd + 2 * std_rmsd)[0]

print("\nPotential outlier structures/off-target candidates:")
for idx in outlier_indices:
    pdb_id = filtered_structures[idx][0]
    print(f"{pdb_id}: avg RMSD = {average_rmsds[idx]:.3f} Å")


Potential outlier structures/off-target candidates:
6JOL: avg RMSD = 155.163 Å


Since Imatinib is a kinase inhibitor which selectively binds to the structurally conserved ATP-binding site of tyrosine kinases, its on-target structures comprise kinases. These structures resemble each other since they have the same function. Off-target structures still bind Imatinib unintendedly, probably because they share similar binding sites or structural motifs that allow ligand-protein interactions. Because these off-target structures have a different function, also their structure is different. Thus, the average RMSD of pairwise superimposed structures is higher. 

### Oﬀ-Target Validation

6J0L is the PDB ID for intracellular B30.2 domain of butyrophilin 3A3 mutant [1]. Upon binding of phosphoantigens to the domain, γδ T cells were shown to be activated [2]. The crystal structure of PDGFRA (6JOL) shows, that there is a ATP-binding site, which is occupied by Imatinib in the PDGFRA-imatinib complex. Certain gastrointestinal stromal tumors (GISTs) have a gain-of-function mutation of the platelet-derived growth factor receptor alpha (PDGFRα) kinases. An off-target activity with Imatinib could have inhibitory effects on this abnormal activation and could therefore have threapeutic importance [3]. This is an interesting finding, since Imatinib is usually used in the context of Chronic myelogenous leukemia (CML).

As 3FW1 was involved in all cases of failed homolog superposition, we also looked deeper into this. Probably, the second approach using superimpose_structural_homologs() was not the right thing to do, since this led to shadowing of an off-target. 3FW1 is the PDB ID for Quinone Reductase 2 [4]. It is classified as oxidoreductase (NQO2) and therefore an off-target. Imatinib competitively inhibits NQO2, making treatment of chronic myeloid leukemia and other cancers with Imatinib more challenging [5].

[1] https://www.rcsb.org/structure/6J0L \
[2] Yang Y. et al. A Structural Change in Butyrophilin upon Phosphoantigen Binding Underlies Phosphoantigen-Mediated Vγ9Vδ2 T Cell Activation. Immunity. 2019 Apr 16;50(4):1043-1053.e5. doi: 10.1016/j.immuni.2019.02.016. Epub 2019 Mar 19. PMID: 30902636. \
[3] Keretsu, S. et al. Molecular Modeling Study of c-KIT/PDGFRα Dual Inhibitors for the Treatment of Gastrointestinal Stromal Tumors. International Journal of Molecular Sciences. 2020 Nov 3. doi: 10.3390/ijms21218232. \
[4] https://www.rcsb.org/structure/3FW1 \
[5] Winger J. A. et al. The structure of the leukemia drug imatinib bound to human quinone reductase 2 (NQO2). BMC Struct Biol. 2009 Feb 24;9:7. doi: 10.1186/1472-6807-9-7. PMID: 19236722; PMCID: PMC2655291.