# A.3.2: Predicting Off-Targets

### Retrieve Imatinib-Binding Proteins from the PDB

In [16]:
# imports
import biotite.database.rcsb as rcsb
import biotite.structure.io.pdbx as pdbx
import os
from biotite.structure import superimpose, rmsd
import numpy as np

In [17]:
# Define queries: X-ray structures at <= 3 Å resolution that contain imatinib
# (PDB chemical component ID "STI") with a ligand model-fit ranking >= 0.5
xray_method = rcsb.FieldQuery("exptl.method", exact_match="X-RAY DIFFRACTION")
resolution = rcsb.FieldQuery("rcsb_entry_info.resolution_combined", less_or_equal=3)
imatinib_ligand = rcsb.FieldQuery("rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id", exact_match="STI")
# Alternative field for the ligand query (unused):
# imatinib_ligand = rcsb.FieldQuery("rcsb_nonpolymer_entity_annotation.comp_id", exact_match="STI")
ranking = rcsb.FieldQuery("rcsb_nonpolymer_instance_validation_score.ranking_model_fit", greater_or_equal=0.5)

# Combine queries with logical AND
composite_query = xray_method & resolution & imatinib_ligand & ranking

In [18]:
# Run Query
pdb_ids = rcsb.search(composite_query)
print(pdb_ids)
print(len(pdb_ids))

['1IEP', '1OPJ', '1T46', '2HYY', '2OIQ', '2PL0', '3FW1', '3GVU', '3K5V', '3MS9', '3MSS', '3OEZ', '3PYY', '4BKJ', '4CSV', '4R7I', '6HD4', '6HD6', '6JOL', '6NPE', '6NPU', '6NPV', '7N9G']
23


### Read & Filter Structures

In [19]:
# Create directory for structure downloads
os.makedirs("pdb_files", exist_ok=True)

filtered_structures = []

for pdb_id in pdb_ids:
    try:
        file_path = rcsb.fetch(pdb_id, "cif", "pdb_files") # fetch entry
        pdbx_file = pdbx.CIFFile.read(file_path)
        structure = pdbx.get_structure(pdbx_file, model=1)
        
        # Keep only atoms with chain ID "A" (assumed here to be the first chain)
        first_chain = structure[structure.chain_id == "A"]
        filtered_structures.append((pdb_id, first_chain))
    except Exception as e:
        print(f"Error processing {pdb_id}: {e}")

print(f"\nSuccessfully filtered {len(filtered_structures)} structures.")


Successfully filtered 23 structures.
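The filter above hard-codes chain ID "A". Not every entry necessarily labels its first chain "A", so the sketch below shows one way to select whichever chain appears first in the atom list. The helper name `first_chain_mask` and the toy chain annotation are hypothetical and only serve as illustration. They are not part of the analysis above.

```python
import numpy as np

def first_chain_mask(chain_ids):
    """Return a boolean mask selecting the chain ID that appears first."""
    chain_ids = np.asarray(chain_ids)
    if chain_ids.size == 0:
        # Nothing to select from
        return np.zeros(0, dtype=bool)
    return chain_ids == chain_ids[0]

# Toy per-atom chain annotation (hypothetical, not taken from a PDB entry)
toy_chain_ids = np.array(["B", "B", "B", "C", "C"])

mask = first_chain_mask(toy_chain_ids)
print(mask)

# A hard-coded "A" filter would select no atoms for this toy annotation
n_chain_a = int((toy_chain_ids == "A").sum())
print(n_chain_a)
```

With a biotite `AtomArray`, the same mask could be applied as `structure[first_chain_mask(structure.chain_id)]`, assuming the chain of interest is the one listed first.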


### Structural Comparison

In [20]:
n = len(filtered_structures)

# initialize matrix to store rmsd values
rmsd_matrix = np.zeros((n, n))

for i in range(len(filtered_structures)):
    for j in range(i + 1, len(filtered_structures)):
        fixed = filtered_structures[i][1]
        moving = filtered_structures[j][1]

        # ensure same number of atoms by truncating both to the shorter one
        # (atoms are paired by index, not by residue equivalence)
        min_len = min(len(fixed), len(moving))
        fixed_coordinates = fixed.coord[:min_len]
        moving_coordinates = moving.coord[:min_len]

        # superimpose 
        fitted, transformation = superimpose(fixed_coordinates, moving_coordinates)
        rmsd_val = rmsd(fixed_coordinates, fitted)
        rmsd_matrix[i][j] = rmsd_val
        rmsd_matrix[j][i] = rmsd_val

# calculate average RMSD for every protein
average_rmsds = rmsd_matrix.sum(axis=1) / (n - 1)

print("\nAverage RMSD for each protein structure:")
for i, (pdb_id, _) in enumerate(filtered_structures):
    print(f"{pdb_id}: {average_rmsds[i]:.3f} Å")


Average RMSD for each protein structure:
1IEP: 15.446 Å
1OPJ: 15.904 Å
1T46: 19.888 Å
2HYY: 16.215 Å
2OIQ: 16.222 Å
2PL0: 15.161 Å
3FW1: 20.147 Å
3GVU: 16.259 Å
3K5V: 16.050 Å
3MS9: 15.649 Å
3MSS: 15.420 Å
3OEZ: 16.491 Å
3PYY: 15.354 Å
4BKJ: 18.246 Å
4CSV: 16.570 Å
4R7I: 17.095 Å
6HD4: 16.205 Å
6HD6: 16.000 Å
6JOL: 15.282 Å
6NPE: 15.303 Å
6NPU: 14.793 Å
6NPV: 15.737 Å
7N9G: 16.463 Å
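To make the superposition step more concrete, here is a minimal NumPy sketch of a Kabsch-style optimal superposition followed by an RMSD calculation. It is an illustration of the general technique, not biotite's implementation, and the function name `kabsch_rmsd` and the random toy coordinates are assumptions. The second comparison shows why atom correspondence matters: when atoms are paired by index without being equivalent, as can happen with index-based truncation, the RMSD is large even for identical geometry.

```python
import numpy as np

def kabsch_rmsd(fixed, mobile):
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body fit."""
    fixed = np.asarray(fixed, dtype=float)
    mobile = np.asarray(mobile, dtype=float)
    # Centre both coordinate sets on their centroids
    fixed_c = fixed - fixed.mean(axis=0)
    mobile_c = mobile - mobile.mean(axis=0)
    # SVD of the covariance matrix yields the optimal rotation
    u, _, vt = np.linalg.svd(mobile_c.T @ fixed_c)
    # Flip one axis if needed so the result is a proper rotation (no reflection)
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    fitted = mobile_c @ rotation
    return float(np.sqrt(np.mean(np.sum((fixed_c - fitted) ** 2, axis=1))))

# Toy coordinates (hypothetical, not from a PDB entry)
rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(50, 3))

# Rotate by 40 degrees about the z axis and translate
angle = np.deg2rad(40.0)
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
moved = coords @ rot_z.T + np.array([5.0, -3.0, 12.0])

# Same atoms in the same order: superposition recovers the original geometry
rmsd_matched = kabsch_rmsd(coords, moved)

# Same atoms in reversed order: index pairing no longer matches equivalent atoms
rmsd_mismatched = kabsch_rmsd(coords, moved[::-1])

print(f"matched order:    {rmsd_matched:.3e}")
print(f"mismatched order: {rmsd_mismatched:.3f}")
```

In practice, restricting the comparison to equivalent atoms (for example, aligned C-alpha atoms) is a common way to avoid the mismatch shown in the second case.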


### Identify Oﬀ-Target Candidates

In [21]:
mean_rmsd = np.mean(average_rmsds)
std_rmsd = np.std(average_rmsds)

# identify outliers
outlier_indices = np.where(average_rmsds > mean_rmsd + 2 * std_rmsd)[0]

print("\nPotential outlier structures/off-target candidates:")
for idx in outlier_indices:
    pdb_id = filtered_structures[idx][0]
    print(f"{pdb_id}: avg RMSD = {average_rmsds[idx]:.3f} Å")


Potential outlier structures/off-target candidates:
1T46: avg RMSD = 19.888 Å
3FW1: avg RMSD = 20.147 Å


Imatinib is a kinase inhibitor that selectively binds to the structurally conserved ATP-binding site of tyrosine kinases, so its on-target structures are kinases. These structures resemble each other because they share the same function. Off-target structures bind Imatinib unintentionally, probably because they have similar binding sites or structural motifs that allow ligand-protein interactions. Because off-target proteins have a different function, their overall structure also differs from that of the kinases. Their average RMSD over all pairwise superpositions is therefore higher, and structures whose average RMSD exceeds the mean by more than two standard deviations are flagged as off-target candidates.
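As a worked check of the outlier criterion, the following sketch recomputes the threshold (mean plus two standard deviations) from the average RMSD values printed above. The values are copied from that output. The dictionary layout is an illustrative choice, and `np.std` is used with its default population formula, as in the cell above.

```python
import numpy as np

# Average RMSD values (Å) copied from the printed output above
average_rmsds = {
    "1IEP": 15.446, "1OPJ": 15.904, "1T46": 19.888, "2HYY": 16.215,
    "2OIQ": 16.222, "2PL0": 15.161, "3FW1": 20.147, "3GVU": 16.259,
    "3K5V": 16.050, "3MS9": 15.649, "3MSS": 15.420, "3OEZ": 16.491,
    "3PYY": 15.354, "4BKJ": 18.246, "4CSV": 16.570, "4R7I": 17.095,
    "6HD4": 16.205, "6HD6": 16.000, "6JOL": 15.282, "6NPE": 15.303,
    "6NPU": 14.793, "6NPV": 15.737, "7N9G": 16.463,
}

values = np.array(list(average_rmsds.values()))

# Outlier threshold: mean + 2 * (population) standard deviation
threshold = values.mean() + 2 * values.std()

# Structures whose average RMSD exceeds the threshold
outliers = [pdb_id for pdb_id, value in average_rmsds.items() if value > threshold]

print(f"threshold = {threshold:.3f} Å")
print(outliers)
```

The threshold comes out at roughly 19 Å, so 4BKJ (18.246 Å) falls just below it, while 1T46 and 3FW1 lie above.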

### Off-Target Validation

1T46 is the PDB ID for c-Kit [1]. c-Kit was crystallized in an autoinhibited state, in which its juxtamembrane region inserts into the active site and blocks the structural rearrangements needed for activation [2]. Because this autoinhibited conformation is distinct from active kinase conformations, and because c-Kit belongs to the type III transmembrane receptor protein tyrosine kinase (RPTK) subfamily, 1T46 was flagged as an off-target candidate.

3FW1 is the PDB ID for quinone reductase 2 (NQO2) [3]. NQO2 is classified as an oxidoreductase rather than a kinase and is therefore an off-target. Imatinib competitively inhibits NQO2, which makes treatment of chronic myeloid leukemia and other cancers with Imatinib more challenging [4].

[1] https://www.rcsb.org/structure/1T46 \
[2] Mol C. D. et al. Structural Basis for the Autoinhibition and STI-571 Inhibition of c-Kit Tyrosine Kinase. Journal of Biological Chemistry. 2004 Jul. Vol 279, Issue 30, P31655-31663. doi: 10.1074/jbc.M403319200 \
[3] https://www.rcsb.org/structure/3FW1 \
[4] Winger J. A. et al. The structure of the leukemia drug imatinib bound to human quinone reductase 2 (NQO2). BMC Struct Biol. 2009 Feb 24;9:7. doi: 10.1186/1472-6807-9-7. PMID: 19236722; PMCID: PMC2655291.