
# Designing a High Affinity Ligand for the A<sub>1</sub> Adenosine Receptor

Students: Zeynep Tufan (s1773453), Lili Hu (s2263424), Katie Lynch (s3863956)

Course: Advanced Computational Methods in Drug Discovery: AI and Physics Based Simulations

### Introduction
The human body contains a complex network of signaling receptors that are involved in a wide range of physiological processes. Among them, the adenosine A<sub>1</sub> receptor (A<sub>1</sub>R), a member of the G-protein coupled receptor (GPCR) family, is a pivotal mediator. This receptor is predominantly expressed in the brain, heart, and vasculature, and plays a crucial role in modulating neurotransmission, neuronal activity, and cardiovascular homeostasis.<sup>1</sup> Given its involvement in these processes, A<sub>1</sub>R has been implicated in numerous disorders, including neurological and cardiovascular conditions.<sup>2,3</sup> For instance, chronic agonist activation of A<sub>1</sub>R has been linked to A<sub>1</sub>R-dependent accumulation of α-synuclein, which promotes neurodegeneration in the long term.<sup>4,5</sup> Consequently, A<sub>1</sub>R has been suggested as a potential target in neurodegenerative disorders such as Parkinson’s disease.<sup>6</sup> Furthermore, the implications of A<sub>1</sub>R overexpression extend to cardiovascular conditions such as ischemic stroke. Activation of the A<sub>1</sub> receptor during ischemic events has been shown to aggravate neuronal damage and increase infarct size.<sup>7</sup> Thus, A<sub>1</sub> receptor antagonists may be a potential treatment option to alleviate the detrimental effects of an ischemic stroke.

As previously mentioned, the A<sub>1</sub>R is a GPCR consisting of a single polypeptide chain that traverses the membrane, starting from the extracellular N-terminus, to form seven transmembrane helices (TMs).<sup>8</sup> It is made up of 326 amino acids and has an approximate mass of 36.5 kDa.<sup>9</sup> Besides the A<sub>1</sub>R, there are three other adenosine receptor subtypes, namely A<sub>2A</sub>, A<sub>2B</sub>, and A<sub>3</sub>. All of these subtypes are activated non-selectively by the endogenous ligand adenosine.<sup>10</sup> Upon activation, the A<sub>1</sub> receptor preferentially couples to the G<sub>i</sub> protein to inhibit adenylate cyclase and, consequently, the production of cyclic AMP (cAMP).<sup>2</sup>

Although the A<sub>1</sub>R plays a pivotal role in cardiac, renal, and neuronal processes, it remains poorly targeted by drugs. Very few A<sub>1</sub>R drugs have successfully progressed through clinical trials, and currently no A<sub>1</sub>R drugs are on the market.<sup>9</sup> The failure of A<sub>1</sub>R drugs is often attributed to the widespread distribution of the receptor, which can cause many off-target effects. Some A<sub>1</sub>R ligands that have been developed are N<sup>6</sup>-cyclopentyladenosine (CPA), naxifylline, and rolofylline. CPA is an A<sub>1</sub>R agonist that was developed as an anti-nociceptive agent; however, it has many cardiovascular side effects.<sup>11</sup> Naxifylline was created for the treatment of congestive heart failure, but its side effects include headache, irregular heartbeat, irregular breathing, and difficulty sleeping.<sup>8</sup> Rolofylline is a diuretic for acute heart failure, but causes strokes as an off-target effect.<sup>9</sup> Because none of the A<sub>1</sub>R drugs are on the market, there are no ATC codes from which to gather more information about the drugs.

In conclusion, the adenosine A<sub>1</sub> receptor is intricately linked to neurological and cardiovascular conditions, including Parkinson's disease and ischemic stroke. The use of A<sub>1</sub> receptor antagonists has shown promise in managing these conditions, but off-target effects often prevent them from reaching the market. To identify novel therapeutic options, the investigation of new possible antagonists using machine learning and docking simulations has garnered considerable interest. These methods facilitate the screening of large compound libraries and aid in identifying potential ligands that can interact with the A<sub>1</sub> receptor with high affinity, ultimately leading to the development of more effective therapeutic interventions. For this reason, both computational methods will be used here, with the aim of designing novel ligands that target the A<sub>1</sub>R with a higher affinity than the co-crystallized ligand DU172 (DU1).<sup>9</sup> In short, machine learning and molecular docking will be employed to predict the affinity of DU1 towards A<sub>1</sub>R. To this end, two different docking programs will be used, namely AutoDock Vina and ICM-Pro (Molsoft). After the docking scores have been predicted, the binding poses and the binding site of DU1 and the ligand candidates will be evaluated.

### Related proteins (off-targets), based on sequence

First, the A<sub>1</sub>R sequence was acquired through UniProt. Using this sequence, similar targets were identified in order to evaluate through which receptors off-target effects could potentially be induced. The human A<sub>1</sub>R has the UniProt accession number P30542 and a crystal structure with PDB ID 5UEN. Based on the sequence of the human A<sub>1</sub>R (Target 0), a similar receptor was found through a BLAST run on UniProt. This was Target 1, with accession number Q5RF57. Another BLAST run based on the sequence of Target 1 yielded Target 2. An overview of these three targets can be found in *Table 1*.

When comparing the targets, the A<sub>1</sub>R of the Sumatran orangutan (Target 1) was more similar to the human A<sub>1</sub>R (Target 0) than the human A<sub>3</sub>R (Target 2) was. Target 0 and Target 1 share 99.1% sequence identity, while Target 0 and Target 2 share only 47.8%. This was expected: the first comparison involves the same protein in two different species, whereas the second involves two different proteins in the same species. It is therefore no surprise that the two A<sub>1</sub> receptors score highly, even though they come from different species.
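The comparison above can be illustrated with a minimal sketch of how a percent-identity value is derived from an already aligned pair of sequences. The function name `percent_identity` and the toy fragments are hypothetical, and the convention used here (ignoring gapped columns) is an assumption; BLAST and UniProt may normalize by a different length, so this will not necessarily reproduce the reported 99.1% or 47.8%.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: gapped columns are skipped; other tools may count them.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not the real receptor sequences).
toy_a = "MPPSISAFQA-AYIG"
toy_b = "MPPSISAFQAVAYLG"
toy_identity = percent_identity(toy_a, toy_b)  # 13 matches over 14 ungapped columns
```

In this toy case one column is skipped because of the gap and one residue differs, so 13 of the 14 compared positions match.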

**Table 1** An overview of A<sub>1</sub>R and potential off-targets

|        |Target 0|Target 1|Target 2|
|--------|:---|:---|:---|
|**Protein**|Adenosine receptor A<sub>1</sub>|Adenosine receptor A<sub>1</sub>|Adenosine receptor A<sub>3</sub>|
|**Accession number**|P30542|Q5RF57|P0DMS8|
|**Seq. length**|326 aa|326 aa|318 aa|
|**Identity**|-|99.1%|47.8%|
|**Species**|Human|*Pongo abelii*|Human|
|**Status**|UniProtKB reviewed|UniProtKB reviewed|UniProtKB reviewed|
|**Protein existence**|Evidence at protein level|Evidence at protein level|Evidence at protein level|
|**Mass**|36.5 kDa|36.5 kDa|36.2 kDa|


Next, the three proteins were aligned and compared on their similarity and on the following physical properties: hydrophobicity, negativity, positivity, and aromaticity, as shown in *Figure 1*. The Sumatran orangutan A<sub>1</sub>R is closer to the human A<sub>1</sub>R than the human A<sub>3</sub>R is on all of these properties. As the alignment shows, the Sumatran orangutan and human A<sub>1</sub>R have the same number of amino acids, while the human A<sub>3</sub>R has only 318 amino acids, which contributes to its significant difference from both A<sub>1</sub>Rs.


![Image info](img/align_targets_uniprot.png)
**Figure 1** Alignment of targets. *Target 0, 1, and 2 aligned on their similarity, and the following physical properties: hydrophobicity, negativity, positivity, and aromaticity*.

### Related proteins (off-targets), based on structure

Besides the sequence-based off-target search, a structure-based similarity search was performed through the RCSB Protein Data Bank, using the tertiary structure of the human A<sub>1</sub>R crystal structure as the query. The most similar protein, the human serotonin 2A receptor (PDB ID: 7WC4), was then selected for alignment with the human A<sub>1</sub>R (PDB ID: 5UEN) using the iCn3D Structure Viewer. The two protein complexes were first aligned as a whole, after which the A-chains were realigned based on sequence alignment (*Figure 2*). The root mean square deviations (RMSD) of the alignment and the realignment were 2.916 Å and 4.576 Å, respectively, suggesting general similarity between the two structures.

![Image info](img/alignment_ncbi.png)

**Figure 2** Alignment of the target protein and the structurally similar off-target. *Alignment of protein complexes 5UEN and 7WC4, corresponding to the human A<sub>1</sub>R and the human serotonin 2A receptor, respectively, and realignment of the A-chains based on sequence alignment. The alignment and realignment RMSDs are 2.916 Å and 4.576 Å, respectively.*
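As a reminder of what the RMSD values above measure, the following minimal sketch computes the RMSD between two equally sized sets of matched atom coordinates. It assumes the structures have already been superposed (the superposition itself was done in iCn3D and is not reproduced here); the function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between matched 3D coordinates (same units as input).

    Assumption: both coordinate sets are already superposed and paired atom by atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom is displaced by exactly 1 unit along z.
toy_ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
toy_moved = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
toy_rmsd = rmsd(toy_ref, toy_moved)
```

Because the RMSD averages squared distances, a few poorly matching regions (for example flexible loops) can raise the value considerably even when the transmembrane core overlaps well.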



As previously mentioned, the human A<sub>1</sub>R was crystallized with the antagonist DU1. To design an antagonist with higher affinity, it is important to investigate the key interactions between DU1 and the receptor. To this end, the 3D structure of the complex was downloaded from the RCSB Protein Data Bank and examined in protein viewer software. Receptor-ligand binding was evaluated using NGL Viewer and Mol*. This evaluation shows that DU1 binds between transmembrane helices 6 and 2, which, according to the literature, likely corresponds to the primary binding site (*Figure 3*).<sup>9</sup> The location of the binding site and the orientation of DU1 in the binding pocket are both important for the validation of the molecular docking model. Furthermore, key interactions between the ligand and target residues within 5 Å of the ligand were investigated with the protein-ligand interaction profiler (PLIP), as shown in *Figure 4*. Three types of interaction were observed, namely hydrophobic interactions, hydrogen bonds, and π-stacking. Most interactions were hydrophobic in nature, with one protein-ligand interaction classified as π-stacking. The π-stacking occurs between an aromatic group of the receptor and an aromatic ring in the xanthine of the ligand. Regarding the hydrogen bonds, the carbonyl groups in DU1 are particularly important for these interactions.

![Image info](img/NGL_viewer_binding.png)

**Figure 3** Protein-ligand binding between DU1 and A<sub>1</sub>R. *3D structure of the A<sub>1</sub>R with its co-crystallized ligand DU1 including the residues within 5 Å of DU1.*

![Image info](img/PLIP_with_legend.png)

**Figure 4** Protein-ligand interactions between DU1 and A<sub>1</sub>R. *The protein-ligand interaction profiler (PLIP) was used to evaluate the interactions between DU1 and A<sub>1</sub>R. The ligand DU1 is presented in orange and the receptor residues in blue. Interactions include hydrophobic interactions, hydrogen bonds, and π-stacking.*

Subsequently, the ligand and target were separated, using the coordinates from RCSB and Biopython, to obtain the target protein with an unoccupied orthosteric binding site. Prior to further evaluation and modelling, hydrogen atoms were added with LePro, as the files retrieved from the RCSB PDB do not contain these atoms due to the limited resolution of the experimental methods.

Furthermore, a 3D structure of the off-target 7WC4 with DU1 was generated using NGL Viewer, as shown in *Figure 5*. Sequence alignment with Biopython resulted in an alignment score of 209.0.



### Structure-activity relationships of DU1

Previous structure-activity relationship (SAR) studies have uncovered associations between the chemical structure of A<sub>1</sub>R antagonists, such as DU1, and their biological activity. For instance, x et al. suggest that high-affinity A<sub>1</sub>R antagonists substantially increase the thermal and chemical stability of the receptor. In addition, key interactions between the ligand and receptor are mainly formed at residues in transmembrane helices 1, 3, 6, and 7. Furthermore, the xanthine-containing scaffold of DU172 appears to be crucial for its high affinity to the target protein. Other A<sub>1</sub>R antagonists, such as DPCPX and rolofylline, have a similar scaffold, further supporting the importance of the xanthine ring. Previous research by Glukhova et al. suggests that the cyclohexyl and propyl side chains of DU1 contribute to the specificity of DU1 for the A<sub>1</sub>R (*Figure 6*).<sup>9</sup> Thus, the incorporation of the xanthine ring, a propyl side chain, and a cycloalkyl group will be considered while establishing a library of ligand candidates.

![Image info](img/structure_DU1_paper.png)

**Figure 6** Structure of DU1. *DU1 contains a xanthine ring with three side chains. The cycloalkyl and the propyl group both contribute to the specificity of DU1 to the A<sub>1</sub>R. Image retrieved from Glukhova et al.*<sup>9</sup>

Several databases were used to search for compounds similar to DU1, namely PubChem, ChEMBL, and ZINC. To this end, different identifiers were used, as shown in *Table 2*. A similarity search on PubChem retrieved 69 compounds similar to DU1. Lowering the Tanimoto threshold to 80% resulted in the retrieval of over 1000 similar compounds. A similarity search on ChEMBL yielded 3 compounds at 90% similarity and 2 compounds at 95% similarity. Lastly, a similarity search on ZINC with a Tanimoto threshold of 70% found 10 hits. For each search, a representative molecule is shown in *Figure 7*. These results show that the similarity threshold affects the number of similar compounds found. This was expected, as there are fewer molecules that are nearly identical to DU1 than molecules that merely share its scaffold or side chains. Thus, the higher the similarity threshold, the lower the number of similar compounds retrieved.

**Table 2** Overview of DU1 identifiers

| |Identifier|
|---|:---|
|**InChI key**|KAJVJPLKXGLLDA-UHFFFAOYSA-N|
|**CHEMBLID**|CHEMBL144360|
|**Canonical SMILES**|CCCN1C(=O)C2=C(N=C(N2)C3CCCCC3)N(C1=O)CCCNC(=O)C4=CC=C(C=C4)S(=O)(=O)F|

![Image info](img/similaritysearch.png)

**Figure 7** Representative molecule of similarity search. *Similarity searches with varying similarity thresholds were performed in PubChem, ChEMBL, and ZINC.*
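The relationship between the Tanimoto threshold and the number of hits can be illustrated with a small sketch. Fingerprints are represented here as sets of "on" bit positions; the toy fingerprints and names are hypothetical and do not correspond to DU1 or to any database compound, and the databases themselves may use different fingerprint types.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0
    return len(bits_a & bits_b) / union


def hits_at_threshold(query: set, library: dict, threshold: float) -> list:
    """Names of library compounds whose similarity to the query is >= threshold."""
    return [name for name, bits in library.items() if tanimoto(query, bits) >= threshold]


# Hypothetical toy fingerprints (sets of on-bit indices).
toy_query = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
toy_library = {
    "near_identical": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},     # Tanimoto = 10/10
    "one_bit_changed": {1, 2, 3, 4, 5, 6, 7, 8, 9, 11},    # Tanimoto = 9/11
    "shared_scaffold": {1, 2, 3, 4, 5, 6, 7, 12, 13, 14},  # Tanimoto = 7/13
}

hits_90 = hits_at_threshold(toy_query, toy_library, 0.90)
hits_80 = hits_at_threshold(toy_query, toy_library, 0.80)
hits_50 = hits_at_threshold(toy_query, toy_library, 0.50)
```

In this toy library, one compound passes at 0.90, two pass at 0.80, and all three pass at 0.50, mirroring the trend observed in the database searches.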

Based on the SAR of DU1 and other A<sub>1</sub>R antagonists, a library of 69 ligand candidates was established for further investigation on their affinity to A<sub>1</sub>R through machine learning and molecular docking. These candidates all harbor the same scaffold but have a different combination of modifications on their side chains (R<sub>1</sub>, R<sub>2</sub>, R<sub>3</sub>) (*Figure 8*).

![Image info](img/scaffold.png)

**Figure 8** General structure of ligand candidates. *Ligand candidates all harbor the same scaffold with different modifications on their R<sub>1</sub>, R<sub>2</sub>, and R<sub>3</sub> side chains.*

### Machine learning to classify ligands

The development of a machine learning-based prediction model for ligand affinity to the A<sub>1</sub>R required the retrieval of an activity dataset from ChEMBL. This set contained 16163 datapoints. The compounds were classified as active or inactive, with a compound labeled active if its pChEMBL value is above the activity threshold of 6.5. As the functional assay readout is not needed, it was dropped from the dataset, leaving 13010 datapoints. The compounds in this dataset are identified by their molecule ChEMBL ID and their SMILES. For the machine learning model, however, molecular fingerprints are used instead of SMILES to represent the molecular structure. Therefore, a function was defined to generate fingerprints from SMILES, using MACCS keys as the default method.
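The labeling step described above amounts to a simple threshold rule, sketched below. The compound identifiers and values are hypothetical placeholders, and treating a value of exactly 6.5 as inactive is an assumption based on the wording "above the threshold".

```python
ACTIVITY_THRESHOLD = 6.5  # pChEMBL cutoff stated in the text


def label_activity(pchembl_value: float, threshold: float = ACTIVITY_THRESHOLD) -> int:
    """Return 1 (active) if the pChEMBL value is above the threshold, else 0 (inactive).

    Assumption: the comparison is strict, so a value equal to the threshold is inactive.
    """
    return 1 if pchembl_value > threshold else 0


# Hypothetical records: (placeholder compound ID, pChEMBL value).
toy_records = [("CHEMBL_A", 7.8), ("CHEMBL_B", 6.5), ("CHEMBL_C", 5.1)]
toy_labels = {compound_id: label_activity(value) for compound_id, value in toy_records}
```

Since pChEMBL is a negative log10 of a molar activity, a cutoff of 6.5 corresponds to roughly 316 nM.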

Several ML approaches were used to classify compounds, namely random forest (RF), support vector machine (SVM), and artificial neural network (ANN). First, the predictive ability of the models was tested. The dataset was randomly split into a training and a test set of 10356 and 2589 datapoints, respectively. A function was defined that fits a model on this random train-test split and calculates measures such as accuracy, sensitivity, specificity, and AUC. In addition, receiver operating characteristic (ROC) curves were plotted to assess the performance of these classification models (*Figure 9*). All models generated an ROC curve that hugs the top left corner of the plot, indicating a high true positive rate at a low false positive rate. Besides this, the area under the curve (AUC) of all three plots is close to 1, implying that the models can make accurate predictions. Subsequently, the RF and ANN models were cross-validated in 3 folds using Morgan fingerprints, and the same statistics were measured. Mean accuracy, specificity, and AUC are all above 0.85 with low standard deviation. However, the sensitivity is relatively low (*Figure 10*).

![Image info](img/ROC_curves.png)

**Figure 9** Plotted ROC curves. *The ROC curves of the RF, SVM, and ANN models were plotted, and the AUC was measured.*
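To make the AUC values more tangible, the sketch below computes the AUC through its rank interpretation: the probability that a randomly chosen active compound receives a higher predicted score than a randomly chosen inactive one, with ties counted as one half. This is an illustrative pure-Python version with hypothetical toy data, not the routine used to produce *Figure 9*.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (active, inactive) pairs that are ranked correctly.

    labels: 1 for active, 0 for inactive. scores: predicted probability of 'active'.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Hypothetical toy predictions: one active (score 0.6) is ranked below one inactive (0.7).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs are ranked correctly
```

An AUC of 1 would mean every active is scored above every inactive, while 0.5 corresponds to random ranking.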

![Image info](img/crossvalidation.png)

**Figure 10** Cross-validation of RF and ANN model. *Cross-validation was done in 3 folds. Mean accuracy, sensitivity, specificity, and AUC were calculated.*
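The three threshold-dependent measures reported above follow directly from the confusion matrix. The sketch below, with hypothetical toy labels, shows how a model can reach a high specificity and a reasonable accuracy while its sensitivity stays low, which is the pattern seen in the cross-validation.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical toy data: 4 actives (2 of them missed) and 6 inactives (1 false alarm).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Here the accuracy is 0.7 and the specificity is 5/6, yet only half of the actives are recovered, so the sensitivity is 0.5.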

For the affinity prediction, a trained RF regressor model was used. Two measures of accuracy were generated, namely the mean absolute error (MAE) and the root mean square error (RMSE). These were 0.49 (std: 0.01) and 0.69 (std: 0.01), respectively. Generally, an MAE or an RMSE below 0.6 suggests that the model’s predictions are of high accuracy. Here, the MAE meets this criterion, while the RMSE lies slightly above it. The affinity of DU1 to the A<sub>1</sub>R predicted by this model corresponds to a pChEMBL of 7.34. In addition to this, the affinity of DU1 to A<sub>1</sub>R was also predicted using ICM-Pro, which gave a score of -28.15 after redocking. For the candidate library, the affinity to A<sub>1</sub>R was predicted by the RF regression model for a total of 69 candidates, and only candidates with a high predicted value (pChEMBL > 8.0) were considered for molecular docking. After applying this affinity threshold, 14 candidates remained for molecular docking.
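The two error measures used for the regression model can be sketched as follows. The measured and predicted pChEMBL values are hypothetical; the example only illustrates why the RMSE is always at least as large as the MAE and reacts more strongly to single large errors.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical pChEMBL values: three small errors and one large error of 1.5 units.
toy_measured = [7.0, 8.0, 6.0, 9.0]
toy_predicted = [7.5, 7.5, 6.0, 7.5]
toy_mae = mean_absolute_error(toy_measured, toy_predicted)
toy_rmse = root_mean_square_error(toy_measured, toy_predicted)
```

In this toy case the single 1.5-unit error pulls the RMSE well above the MAE, which is one way a model can show an MAE below a given cutoff and an RMSE above it.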

### Molecular docking of ligand candidates to the A<sub>1</sub>R

For the molecular docking experiments, the docking engine AutoDock Vina was used. This software predicts the preferred orientation of the ligand within the binding site. First, the target protein and ligand candidates were prepared. Subsequently, a search box in which the ligand is likely to interact with the receptor was defined using the radius of gyration and the center of geometry (COG) of the ligand. The radius of gyration was calculated to be 4.655 Å, and the COG coordinates were x=-20.174, y=8.76, and z=16.973. After docking the ligand to the receptor, the docking software predicts the binding affinity, which is then converted into a pChEMBL value. Besides the affinity score, a 3D visualization of the docked pose is generated. Notably, the pChEMBL values from ML and docking did not show a consistent trend: candidates with an ML-predicted pChEMBL higher than that of DU1 did not always score higher than DU1 in the molecular docking. For this reason, the average of both pChEMBL values was calculated to compare the 14 candidates. Ligand_59, ligand_65, and ligand_63 had the highest average scores (*Figure 11*). In addition, ligand_59 and ligand_65 had a higher pChEMBL value than DU1 for both computational methods separately. All three leads had similar modifications in their side chains, namely a cyclopentyl group in R<sub>1</sub> and a double bond in their R<sub>3</sub>-group (*Figure 12*). Remarkably, redocking of DU1 into the receptor yields a binding orientation that differs from that of the co-crystallized reference ligand, which is also DU1 (*Figure 13A*). Furthermore, docking of the most promising lead, ligand_59, shows that the scaffold of the ligand binds the receptor in a different way than DU1 (*Figure 13B*).
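The two geometric quantities used to place the docking box can be sketched as below. This version treats all atoms equally (no mass weighting), which is an assumption, and the helper `box_edge` with its scale factor is purely hypothetical, since the report does not state how the box size was derived from the radius of gyration.

```python
import math


def center_of_geometry(coords):
    """Unweighted mean position of the atoms (x, y, z)."""
    n = len(coords)
    return tuple(sum(atom[axis] for atom in coords) / n for axis in range(3))


def radius_of_gyration(coords):
    """Root mean square distance of the atoms from their center of geometry."""
    cx, cy, cz = center_of_geometry(coords)
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords) / len(coords)
    return math.sqrt(mean_sq)


def box_edge(rg, scale=2.857):
    """Hypothetical heuristic: cubic box edge as a multiple of the radius of gyration.

    Assumption: the scale factor is illustrative and not taken from the report.
    """
    return scale * rg


# Toy "ligand": four atoms placed symmetrically at distance 1 from the origin.
toy_atoms = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
toy_cog = center_of_geometry(toy_atoms)
toy_rg = radius_of_gyration(toy_atoms)
```

In practice the atom coordinates would come from the co-crystallized ligand, so that the COG becomes the box center and the radius of gyration sets the box dimensions.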

![Image info](img/prediction_score.png)

**Figure 11** Overview of predicted affinity scores. *Through ML and molecular docking simulations, affinity scores were predicted of DU1 and newly designed ligand candidates. The 3 most promising candidates were ligand_59, ligand_63, and ligand_65. DU1 is presented as ligand_00.*
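To compare docking scores with ML-predicted pChEMBL values, the docking energy has to be put on the same logarithmic scale. The report does not state which conversion was used; the sketch below shows one common thermodynamic option, assuming the docking score approximates a binding free energy ΔG in kcal/mol, so that pK = -ΔG / (RT ln 10). The function name and the default temperature are assumptions.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def docking_score_to_pk(delta_g_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Convert a binding free energy estimate (kcal/mol) into a pK-like value.

    Assumption: the docking score is treated as delta G, so pK = -dG / (R * T * ln 10).
    """
    return -delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k * math.log(10))


# Example with a hypothetical docking score of -10 kcal/mol.
toy_pk = docking_score_to_pk(-10.0)
```

At room temperature, RT ln 10 is about 1.36 kcal/mol, so every 1.36 kcal/mol of predicted binding energy corresponds to roughly one pK unit, and a score of -10 kcal/mol maps to a value of about 7.3. Because docking scores are only rough estimates of ΔG, the converted values should be read as a ranking aid rather than as measured affinities.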

![Image info](img/modifications.png)

**Figure 12** Structure of DU1 and ligand candidates. *The candidates with the highest average pChEMBL values showed similar modifications, namely, a cyclopentyl group, and a double bond in the side chain.*

![Image info](img/moleculardocking.png)

**Figure 13** Molecular docking of DU1 and ligand_59. *A 3D visualization of the molecular docking simulation of (A) DU1 and (B) promising lead ligand_59. Reference ligand is DU1 presented in magenta, while the docked ligand is presented in cyan.*

### Discussion and Conclusion

In this project, we have investigated the A<sub>1</sub>R and its interaction with its co-crystallized ligand DU1, designed novel ligands with a potentially higher affinity to the receptor, and used machine learning methods and molecular docking to predict the affinity of these ligands for the A<sub>1</sub>R.

Notably, different affinity scores were obtained from AutoDock Vina and the RF regression model. One possible reason for this is that the two methods use different algorithms to estimate the strength of the binding between the ligand and receptor. Therefore, another docking program that uses an algorithm similar to that of AutoDock Vina could be used, which would also serve as a validation of the docking scores. In addition, the redocking of DU1 to A<sub>1</sub>R resulted in a different binding orientation than expected. This could be because AutoDock Vina, as configured here, is not an optimal method for docking this ligand to the receptor. For this reason, the docking model should be optimized, or other software should be considered.

Regarding the molecular structure of DU1 and the three promising candidates (59, 65, and 63), two important modifications were observed. One of these was a cyclopentyl group instead of a cyclohexyl group on the R<sub>1</sub> side chain. Previous literature states that the cyclohexyl and propyl groups are both important for the selectivity of DU1 for the A<sub>1</sub>R. For this reason, the selectivity of the novel ligands for the A<sub>1</sub>R needs further investigation in order to assess possible off-target effects.

Moreover, it is known that most proteins in the human body are glycosylated, meaning that a glycan is attached to them. The A<sub>1</sub>R is also known to be glycosylated at three sites at the N-terminus. These glycans can affect the binding of ligands; therefore, the newly designed ligands, as well as DU1, need to be docked on the glycosylated structure of the A<sub>1</sub>R to obtain a more relevant predictive value.<sup>12,13</sup>

In conclusion, the affinities of DU1 and novel candidate ligands were predicted using a machine learning method and molecular docking. The cyclopentyl modification in the R<sub>1</sub>-group and the double bond in the R<sub>3</sub>-group were the modifications most associated with a predicted affinity to A<sub>1</sub>R higher than that of DU1. However, further optimization of the models and investigation of the glycosylated protein are needed to achieve a more relevant affinity score. In the future, the most promising novel ligand (ligand_59) can be synthesized and tested in in vitro studies for the treatment of diseases.

