<a href="https://colab.research.google.com/github/glevans/7ADD-workshop-2024/blob/main/Example_1_of_structures_available.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**<H1>Structures available</H1>**
**Using PDBe-KB & 3D-Beacons to gain understanding on genetic variants**
<img src="https://www.ebi.ac.uk/pdbe/docs_dev/logos/images/RGB/PDBe-logo-RGB_2013.png" height="300" align="right">

# Welcome to this notebook!

**To use this notebook:**
* you will need to have a Google account
* you will need to be logged-in to your Google account

<br>

**To access this notebook:**

(1) From the GitHub repository (https://github.com/PDBeurope/pdbe-notebooks/), click on the link at the top of the page when viewing the file in GitHub

<br>

(2) Visit the Colaboratory page:<br>
https://colab.research.google.com/<br>
and access the following Github repository *via* the interface:<br>
https://github.com/PDBeurope/pdbe-notebooks/

--or--

upload the file into the Colaboratory interface

(for example, after downloading from:
https://github.com/PDBeurope/pdbe-notebooks/)



---

## How to use this notebook
1. To run a code cell, click on the cell to select it. You will notice a play button (▶️) on the left side of the cell. Click on the play button or **press Shift+Enter** to run the code in the selected cell.
2. The code will start executing, and you will see the output, if any, displayed below the code cell.
3. Move to the next code cell and repeat steps 1 and 2 until you have executed all the desired code cells in sequence.
4. The currently running step is indicated by a circle with a stop sign next to it.
If you need to stop or interrupt the execution of a code cell, you can click on the stop button (■) located next to the play button.

*Remember to run the code cells in the correct order, as their execution might depend on variables or functions defined in previous cells. You can modify the code in a code cell and re-run it to see updated results.*

<br>


---

## Contact us

If you experience any bugs please contact pdbehelp@ebi.ac.uk and put "Help with" and the title of the notebook in the subject line of the message.


---
# Why use this notebook

<img src="https://github.com/glevans/7ADD-workshop-2024/blob/main/Images/API_image.png?raw=true" height="300" align="right">

This interactive Python notebook demonstrates how to access **Protein Data Bank (PDB)** data using some of the tools developed by the **Protein Data Bank in Europe (PDBe)**:

<br>

- [PDBe Knowledge Base -> pdbe-kb.org](https://www.ebi.ac.uk/pdbe/pdbe-kb/)

- [3D Beacons -> 3d-beacons.org](https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/)

<br>

<p>An API (Application Programming Interface) is a programmatic way to obtain information.</p>

<p>Virtually every webpage you visit is using one or more APIs.</p>

<p>We will be tapping into the potential of APIs to more quickly access and assess information.</p>

<p> </p>
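
As a rough illustration of what "programmatic access" means here, the sketch below builds the 3D-Beacons summary URL that this notebook's setup cell queries, and pulls model identifiers out of a parsed response. The helper names `build_summary_url` and `model_ids` are made up for this example, and the response body is a hypothetical, heavily trimmed stand-in so that the snippet runs without network access.

```python
import json

# Endpoint used by this notebook's setup cell for per-UniProt summaries.
BASE_URL = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/v2/uniprot/summary"


def build_summary_url(uniprot_id: str) -> str:
    """Return the summary URL for one UniProt accession (hypothetical helper)."""
    return f"{BASE_URL}/{uniprot_id.strip().upper()}.json"


def model_ids(payload: dict) -> list:
    """Collect the model identifiers from a parsed summary response."""
    return [s["summary"]["model_identifier"] for s in payload.get("structures", [])]


# Hypothetical, trimmed response body used instead of a live request.
sample_body = '{"structures": [{"summary": {"model_identifier": "AF-P49768-F1", "provider": "AlphaFold DB"}}]}'

print(build_summary_url("p49768"))
print(model_ids(json.loads(sample_body)))
```

With network access, the same URL could be passed to `requests.get(...)` and the result parsed with `.json()`, which is what the setup cell in section 1 does.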

---
*We will briefly cover:*
### **3D-Beacons**

The main purpose of 3D-Beacons is to provide programmatic access from one web location to BOTH:
* **predicted / theoretical structures** (from AlphaFold Database & more)
* **experimentally-determined structures** (from Protein Data Bank & more)


<br>

Some example predicted / theoretical model providers:

* AlphaFold DB (AFDB)
* AlphaFill
* ModelArchive
* Protein Ensemble Database (PED)†
* SWISS-MODEL

Experimentally-determined model providers:

* Protein Data Bank (PDB)
* Small angle scattering Biological Data Bank (SASBDB)

*† Some models have some experimental evidence, others were generated without explicit experimental data. For this data bank each id corresponds to multiple models of the same protein or protein complex and has been developed to capture insight on intrinsically disordered proteins.*

<br>

**MORE INFORMATION:**

A comprehensive list of current [data providers](https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/guidelines) is available.

A link to main webpage:
[3d-beacons.org](https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/)
<br>

For direct link to 3D-Beacons API interface, visit [3D-Beacons API page](https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/)

If you use this resource, please cite:
<br>
[M. Varadi, S. Nair, I. Sillitoe et al. 3D-Beacons: decreasing the gap between protein sequences and structures through a federated network of protein structure data resources. GigaScience (2022).](https://doi.org/10.1093/gigascience/giac118)

---
*We will briefly cover:*
### **Protein Data Bank in Europe Knowledge Base (PDBe-KB)**

**PDBe-KB** provides more information and analysis specifically for experimentally-determined structures deposited in the **Protein Data Bank**.

**3D Beacons** is being used <i>'behind-the-scenes'</i> on the **PDBe-KB** pages to enable predicted macromolecule models from AlphaFoldDB to be displayed alongside experimental models.

**PDBe-KB** provides information on 3D macromolecular structures in the context of other macromolecular structures.

With **PDBe-KB** webpages and its APIs, the following information can be accessed:

* Identify PDB ids (aka multiple structures) known to contain related data due to one or more protein chains being associated with the same UniProt ID.
* Insight on compounds & residues from considering related structures in the context with each other.
* Mappings to CATH, EC, GO, InterPro, Pfam, SCOP, etc
* and more...
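
The first point above can be sketched in a few lines. The response shape assumed here (a dictionary keyed by UniProt ID, holding a `data` list whose items carry an `accession`) mirrors how the setup cell in section 1 reads the PDBe-KB response; the helper name `pdb_ids_for_uniprot` and the trimmed sample response are illustrative only.

```python
def pdb_ids_for_uniprot(response: dict, uniprot_id: str) -> list:
    """Return the unique PDB ids listed for a UniProt ID, in first-seen order."""
    entries = response.get(uniprot_id, {}).get("data", [])
    # dict.fromkeys removes duplicates while keeping the original order
    return list(dict.fromkeys(entry["accession"] for entry in entries))


# Trimmed, illustrative response; a real one carries many more fields per entry.
sample_response = {
    "P49768": {
        "data": [
            {"accession": "8kcs"},
            {"accession": "7d8x"},
            {"accession": "8kcs"},
        ]
    }
}

print(pdb_ids_for_uniprot(sample_response, "P49768"))
```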


**MORE INFORMATION:**

A link to main webpage:
[pdbe-kb.org](https://www.ebi.ac.uk/pdbe/pdbe-kb/)
<br>

For direct link to PDBe-KB's Aggregated API, visit [PDBe-KB Aggregated API page](https://www.ebi.ac.uk/pdbe/graph-api/pdbe_doc/)
<br>

If you use this resource, please cite:
<br>
[PDBe-KB consortium, PDBe-KB: collaboratively defining the biological context of structural data, Nucleic Acids Research, Database Issue (2022).](https://doi.org/10.1093/nar/gkab988)

# 1)  Setting up

In [None]:
#@title Press shift+enter to start

# @markdown This section installs libraries and sets up Python code to extract information from PDBe-KB's APIs.
# @markdown <br>
# @markdown <br>
# @markdown After running this cell, this Notebook will be able to connect to the 3D-Beacons and PDBe-KB APIs.
# @markdown <br>
# @markdown It will create summary tables for some information about the structures associated with a UniProt ID.
# @markdown <br>
# @markdown <br>
# @markdown The information listed below can also be found either on PDBe-KB pages and/or utilizing 3D Beacons.
# @markdown <br>


######## LIBRARIES

import requests, sys, json
import pandas as pd
from pprint import pprint

print("Successfully installed!")

####### 3D BEACONS FUNCTIONS
#defining functions for search and summary

def _search_3DBeacons(UniProt_ID):
    try:
        api_url = f"https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/v2/uniprot/summary/{UniProt_ID}.json"
        response = requests.get(api_url)
        response.raise_for_status()
        return response.json().get("structures", [])
    except requests.exceptions.RequestException as err:
        print("Double check if the UniProt id is correct.")
        print()
        raise Exception(f"Error fetching data: {err}")

def _create_structure_dataframe(structures):
    if structures:
        details_list = [
            [
                s["summary"]["model_identifier"],
                s["summary"]["model_category"],
                s["summary"]["provider"],
                s["summary"]["uniprot_start"],
                s["summary"]["uniprot_end"],
                s["summary"]["coverage"],
            ]
            for s in structures
        ]
        columns = ["ID", "Type", "Provider", "UniProt start", "UniProt end","Coverage"]
        df = pd.DataFrame(details_list, columns=columns)
        df["Coverage as %"] = df["Coverage"] * 100
        df["Coverage"] = df["Coverage"].round(5)
        df["Coverage as %"] = df["Coverage as %"].round(2)
        return df
    else:
        raise Exception("No structures found for the given UniProt ID.")

def all_structure_data(UniProt_ID):
    try:
        structures = _search_3DBeacons(UniProt_ID)
        return _create_structure_dataframe(structures)
    except Exception as e:
        return f"Error: {e}"

def displaying_3dbeacon_summary(UniProt_ID):
    df_3D_beacons = all_structure_data(UniProt_ID)
    if type(df_3D_beacons) == str:
        print("Something went wrong. Check the UniProt ID.")
    elif df_3D_beacons is not None:
        num_rows = df_3D_beacons.shape[0]
        print(f"There are {num_rows} structures available associated with UniProt ID {UniProt_ID}")
        print()
        return df_3D_beacons.head(num_rows)
    else:
        print("Something went wrong. Check the UniProt ID.")

def displaying_3dbeacon_pred_summary(UniProt_ID):
    df_3D_beacons = all_structure_data(UniProt_ID)
    if type(df_3D_beacons) == str:
        print("Something went wrong. Check the UniProt ID.")
    elif df_3D_beacons is not None:
        df_3D_beacons2 = df_3D_beacons.drop(df_3D_beacons[df_3D_beacons['Type'] == 'EXPERIMENTALLY DETERMINED'].index)
        df_3D_beacons3 = df_3D_beacons2.drop(df_3D_beacons[df_3D_beacons['Type'] == 'CONFORMATIONAL ENSEMBLE'].index)
        df_3D_beacons4 = df_3D_beacons3.reset_index(drop=True)
        num_rows = df_3D_beacons4.shape[0]
        print(f"There are {num_rows} different predicted structures, generated using template-based or Ab initio methods, without explicit experimental data, associated with UniProt ID {UniProt_ID}")
        print()
        return df_3D_beacons4.head(num_rows)
    else:
        print("Something went wrong. Check the UniProt ID.")

def displaying_3dbeacon_exp_summary(UniProt_ID):
    df_3D_beacons = all_structure_data(UniProt_ID)
    if type(df_3D_beacons) == str:
        print("Something went wrong. Check the UniProt ID.")
    elif df_3D_beacons is not None:
        df_3D_beacons2 = df_3D_beacons.drop(df_3D_beacons[df_3D_beacons['Type'] == 'TEMPLATE-BASED'].index)
        df_3D_beacons3 = df_3D_beacons2.drop(df_3D_beacons[df_3D_beacons['Type'] == 'CONFORMATIONAL ENSEMBLE'].index)
        df_3D_beacons4 = df_3D_beacons3.drop(df_3D_beacons[df_3D_beacons['Type'] == 'AB-INITIO'].index)
        df_3D_beacons5 = df_3D_beacons4.reset_index(drop=True)
        num_rows = df_3D_beacons5.shape[0]
        print(f"There are {num_rows} different structures, where experimental data was used for the structure, associated with UniProt ID {UniProt_ID}")
        print()
        return df_3D_beacons5.head(num_rows)
    else:
        print("Something went wrong. Check the UniProt ID.")

def displaying_3dbeacon_pdb_summary(UniProt_ID):
    df_3D_beacons = all_structure_data(UniProt_ID)
    if type(df_3D_beacons) == str:
        print("Something went wrong. Check the UniProt ID.")
    elif df_3D_beacons is not None:
        df_3D_beacons2 = df_3D_beacons[df_3D_beacons['Provider'] == 'PDBe']
        df_3D_beacons3 = df_3D_beacons2.reset_index(drop=True)
        num_rows = df_3D_beacons3.shape[0]
        print(f"There are {num_rows} different structures, from the Protein Data Bank (PDB), associated with UniProt ID {UniProt_ID}.")
        print()
        return df_3D_beacons3.head(num_rows)
    else:
        print("Something went wrong. Check the UniProt ID.")

def displaying_3dbeacon_pdb_filtered(UniProt_ID,number_value):
    df_3D_beacons = all_structure_data(UniProt_ID)
    if type(df_3D_beacons) == str:
        print("Something went wrong. Check the UniProt ID.")
    elif df_3D_beacons is not None:
        df_3D_beacons2 = df_3D_beacons[df_3D_beacons['Provider'] == 'PDBe']
        df_3D_beacons3 = df_3D_beacons2.reset_index(drop=True)
        # Inclusive bounds, so a residue at the first or last mapped position also counts
        df_3D_beacons3['InRange'] = (df_3D_beacons3['UniProt start'] <= number_value) & (number_value <= df_3D_beacons3['UniProt end'])
        df_3D_beacons4 = df_3D_beacons3[df_3D_beacons3['InRange'] == True]
        num_rows = df_3D_beacons4.shape[0]
        print(f"There are {num_rows} different structures, from the Protein Data Bank (PDB) mapped to UniProt ID {UniProt_ID}, that cover residue position {number_value}.")
        print()
        return df_3D_beacons4.head(num_rows)
    else:
        print("Something went wrong. Check the UniProt ID.")

####### 3D BEACONS FUNCTIONS
#defining functions for reporting urls

def links_for_structures(UniProt_ID):
    try:
        structures = _search_3DBeacons(UniProt_ID)
        print(f"There are {len(structures)} listed associated with UniProt id {UniProt_ID}")
        for structure in structures:
            structure_details = structure['summary']
            model_identifier = structure_details['model_identifier']
            model_url = structure_details['model_url']
            model_format = structure_details['model_format']
            label = f"Click on url to download the coordinate file - {model_format} format:"
            details_list = {model_identifier : [label, model_url]}
            print()
            pprint(details_list)
            print()
    except Exception as e:
        return f"Error: {e}"

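def display_urls_for_structures(UniProt_ID):
    try:
        result = links_for_structures(UniProt_ID)
        # links_for_structures returns an error string on failure and None on success
        if isinstance(result, str):
            print(result)
    except Exception as e:
        print(f"An error occurred: {e}")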

######## PDBe-KB FUNCTIONS
#defining functions for search and summary for variants

def api_structure_PDBeKB(UniProt_ID):
    try:
        requestURL = f"https://www.ebi.ac.uk/pdbe/graph-api/uniprot/unipdb/{UniProt_ID}"
        response = requests.get(requestURL)
        response.raise_for_status()
        return json.loads(response.text)
    except requests.exceptions.RequestException as err:
        print(f"Request Error: {err}")
        return None

def _parse_variant_data(structure_data, UniProt_ID):

    if type(structure_data) == dict:
        structure_details = structure_data[UniProt_ID]['data']
        amount = len(structure_details)
        UniProt_ID = UniProt_ID
        UniProt_summary = []
        UniProt_summary_list = []
        row_list = []

        for index in range(0, amount): # PDB ids

            pdb_id = structure_details[index]['accession']
            accession_item = structure_details[index]

            modified_residues = _modified_list(accession_item['residues'])
            mutated_residues = _mutated_list(accession_item['residues'])

            if modified_residues == None and mutated_residues == None:
                pass
            elif modified_residues == None and mutated_residues != None:
                row_list = []
                for index in range(0, len(mutated_residues)):
                    mutated_residues[index].update({"PDB_ID" : pdb_id})
                row_list.append(mutated_residues)
            elif modified_residues != None and mutated_residues == None:
                row_list = []
                for index in range(0, len(modified_residues)):
                    modified_residues[index].update({"PDB_ID" : pdb_id})
                row_list.append(modified_residues)
            elif modified_residues != None and mutated_residues != None:
                row_list = []
                for index in range(0, len(modified_residues)):
                    modified_residues[index].update({"PDB_ID" : pdb_id})
                row_list.append(modified_residues)
                for index in range(0, len(mutated_residues)):
                    mutated_residues[index].update({"PDB_ID" : pdb_id})
                row_list.append(mutated_residues)
            UniProt_summary_list.extend(row_list)

        for number in range(0,len(UniProt_summary_list)):
            UniProt_summary += UniProt_summary_list[number]
        return UniProt_summary

def _modified_list(residues):
    modified_list = []  # Initialize the list outside of the loop
    for residue_info in residues:
        if 'modification' in residue_info:
            modified = _modification_info(residue_info)
            # _modification_info returns None for multi-residue ranges; skip those
            if modified is not None:
                modified_list.append(modified)
    if modified_list == []:
        return None
    else:
        return modified_list  # Return the list

def _mutated_list(residues):
    mutated_list = []  # Initialize the list outside of the loop
    for residue_info in residues:
        if 'mutation' in residue_info:
            mutation = _mutation_info(residue_info)
            # _mutation_info returns None for multi-residue ranges; skip those
            if mutation is not None:
                mutated_list.append(mutation)
    if mutated_list == []:
        return None
    else:
        return mutated_list  # Return the list

def _mutation_info(residue_info):
    if residue_info['startIndex'] == residue_info['endIndex']:
        residue_position = residue_info['startIndex']
        UniProt_amino_acid = residue_info['startCode']
        coordinate_amino_acid = residue_info['pdbCode']
        change = residue_info['mutationType']
        mutated_residue = {'change': change, 'position' : residue_position, 'from_aa' : UniProt_amino_acid, 'to_aa' : coordinate_amino_acid}
        return mutated_residue

def _modification_info(residue_info):
    if residue_info['startIndex'] == residue_info['endIndex']:
        residue_position = residue_info['startIndex']
        UniProt_amino_acid = residue_info['startCode']
        coordinate_amino_acid = residue_info['pdbCode']
        change = "Modification"
        modified_residue = {'change': change, 'position' : residue_position, 'from_aa' : UniProt_amino_acid, 'to_aa' : coordinate_amino_acid}
        return modified_residue

def variant_summary(UniProt_ID):
    try:
        structure_data = api_structure_PDBeKB(UniProt_ID)
        UniProt_info = _parse_variant_data(structure_data, UniProt_ID)
        return UniProt_info
    except Exception as e:
        print("Double check if the UniProt id is correct.")
        print()
        return f"Error: {e}"

def displaying_variant_summary(UniProt_ID):
    list_of_dict = variant_summary(UniProt_ID)
    if type(list_of_dict) == str:
        print("Something went wrong. Check the UniProt ID.")
    elif list_of_dict== []:
        print(f"No mutated or modified residues were found in the structures associated with UniProt ID {UniProt_ID}.")
    elif list_of_dict is not None:
        print(f"This is the output from entering the UniProt ID {UniProt_ID}.")
        df_variants = pd.DataFrame.from_dict(list_of_dict)
        df_variants2 = df_variants.sort_values(['position'], na_position='first', ascending=True, key=pd.to_numeric)
        df_variants3 = df_variants2.drop_duplicates() # Remove duplicate instance of same conflicts for same PDB ids
        df_variants4 = df_variants3.reset_index()
        df_variants5 = df_variants4.drop(['index'], axis=1)
        num_rows = df_variants5.shape[0]
        print()
        return df_variants5.head(num_rows)
    else:
        print("Something went wrong. Check the UniProt ID.")

######## PDBe-KB FUNCTIONS
#defining functions for search and summary

def parse_general_structure_data(structure_data, UniProt_ID):
    structure_details = structure_data[UniProt_ID]['data']

    accession_id_list = []
    UniProt_summary = {}
    print(f"There are {len(structure_details)} experimentally-determined structures associated with UniProt id {UniProt_ID}.")
    print()

    for accession_item in structure_details:
        accession_id = accession_item['accession']
        accession_id_list.append(accession_id)

        additionalData = accession_item['additionalData']
        experiment = additionalData['experiment']
        resolution = additionalData.get('resolution')

        # NMR entries are reported without a resolution; also guard against a missing value
        if experiment == 'Solution NMR' or resolution is None:
            resolution = None
        else:
            resolution = round(resolution, 3)

        UniProt_summary[accession_id] = [experiment, resolution]
    #print(UniProt_summary)
    return UniProt_summary

def general_summary(UniProt_ID):
  try:
    structure_data = api_structure_PDBeKB(UniProt_ID)
    UniProt_info = parse_general_structure_data(structure_data, UniProt_ID)
    return(UniProt_info)
  except Exception as e:
    print("Double check if the UniProt id is correct.")
    print()
    return f"Error: {e}"

def display_exp_structure_summary(UniProt_ID):
    try:
        data = general_summary(UniProt_ID)
        if type(data) == str:
            print()
            print("Something has gone wrong. Double-check if the UniProt ID is okay.")
        else:
            df1 = pd.DataFrame.from_dict(data, orient="index", columns=['Experimental Method', 'Resolution (in Angstroms)'])
            df2 = df1.reset_index()
            df3 = df2.rename(columns = {'index':'PDB_ID'})
            num_rows = df3.shape[0]
            return df3.head(num_rows)
    except Exception as e:
        print(f"An error occurred: {e}")

def display_exp_structure_summary2(UniProt_ID,resolution):
    try:
        data = general_summary(UniProt_ID)
        if type(data) == str:
            print()
            print("Something has gone wrong. Double-check if the UniProt ID is okay.")
        elif isinstance(resolution, str):
            print()
            print("Something has gone wrong. Make sure resolution is given as a number with or without decimals.")
        else:
            df1 = pd.DataFrame.from_dict(data, orient="index", columns=['Experimental Method', 'Resolution (in Angstroms)'])
            df2 = df1[df1['Resolution (in Angstroms)'] <= resolution]
            df3 = df2.reset_index()
            df4 = df3.rename(columns = {'index':'PDB_ID'})
            num_rows = df4.shape[0]
            print(f"There are {num_rows} experimentally-determined structures, with resolution of {resolution} Angstrom or better, associated with UniProt id {UniProt_ID}.")
            return df4.head(num_rows)
    except Exception as e:
        print(f"An error occurred: {e}")

def comma_pdb_id_list(UniProt_ID):
    try:
        data = general_summary(UniProt_ID)
        if type(data) == str:
            print()
            print("Something has gone wrong. Double-check if the UniProt ID is okay.")
        else:
            df1 = pd.DataFrame.from_dict(data, orient="index", columns=['Experimental Method', 'Resolution (in Angstroms)'])
            index_as_list = df1.index.tolist()
            print(', '.join(map(str, index_as_list)))
    except Exception as e:
        print(f"An error occurred: {e}")

def comma_pdb_id_list2(UniProt_ID, resolution):
    try:
        data = general_summary(UniProt_ID)
        if type(data) == str:
            print()
            print("Something has gone wrong. Double-check if the UniProt ID is okay.")
        else:
            df1 = pd.DataFrame.from_dict(data, orient="index", columns=['Experimental Method', 'Resolution (in Angstroms)'])
            df2 = df1[df1['Resolution (in Angstroms)'] <= resolution]
            num_rows = df2.shape[0]
            print(f"There are {num_rows} experimentally-determined structures, with resolution of {resolution} Angstrom or better, associated with UniProt id {UniProt_ID}.")
            index_as_list = df2.index.tolist()
            print(', '.join(map(str, index_as_list)))
    except Exception as e:
        print(f"An error occurred: {e}")

######## PDBe-KB FUNCTION
#generate comma-delimited list of nonmodified PDB IDs

def nonmodified_PDB_IDs(UniProt_ID):
    general = general_summary(UniProt_ID)
    variant = variant_summary(UniProt_ID)

    PDB_ID_list = []
    for item in range (0, len(variant)):
        PDBid = variant[item]["PDB_ID"]
        PDB_ID_list.append(PDBid)
    PDB_IDs_with_variant_list = list(dict.fromkeys(PDB_ID_list))
    all_PDB_ID = list(general.keys())

    s = set(PDB_IDs_with_variant_list)
    PDB_IDs_without_variant_list = [x for x in all_PDB_ID if x not in s]
    print(f"Of these there are {len(PDB_IDs_without_variant_list)} experimentally-determined structures without any modifications or discrepancies with UniProt id {UniProt_ID}.")

    print()

    PDB_ID_list = []
    for item in range (0, len(PDB_IDs_without_variant_list)):
        PDBid = PDB_IDs_without_variant_list[item]
        PDB_ID_list.append(PDBid)
    str_PDB_ID_list = [str(item) for item in PDB_ID_list]
    delimiter_comma = ", "
    print(delimiter_comma.join(str_PDB_ID_list))

######## GENERAL FUNCTION
#converting a json into a comma-delimited list

def getting_list_of_ids(load):
    ID_list = []
    if type(load) == int or type(load) == float:
        print("Something went wrong. Check input is JSON.")
    elif load == None:
        print("Something went wrong. Check input is JSON.")
    elif "PDB_ID" in list(load[0].keys()):
        for item in range (0, len(load)):
            PDBid = load[item]["PDB_ID"]
            ID_list.append(PDBid)
        str_PDB_ID_list = [str(item) for item in ID_list]
        delimiter_comma = ", "
        print(delimiter_comma.join(str_PDB_ID_list))
    elif "ID" in list(load[0].keys()):
        for item in range (0, len(load)):
            id = load[item]["ID"]
            ID_list.append(id)
        str_ID_list = [str(item) for item in ID_list]
        delimiter_comma = ", "
        print(delimiter_comma.join(str_ID_list))
    else:
        print("Examine JSON. Should be from table with column header of either 'ID' or 'PDB_ID'.")

Successfully installed!


# 2)  Structures available using **3D-Beacons**
This section reports on predicted and experimentally-determined structures for a UniProt ID that are available *via* **3D-Beacons**.

---
These structures/models have a *'type'* assigned such as:

* EXPERIMENTALLY DETERMINED
* CONFORMATIONAL ENSEMBLE
* TEMPLATE-BASED
* AB-INITIO

If the *'Type'* is **NOT** 'EXPERIMENTALLY DETERMINED' or 'CONFORMATIONAL ENSEMBLE' then it is a predicted structure.

If the *'Type'* is 'CONFORMATIONAL ENSEMBLE' then each result is a set of structures that may or may not have explicit experimental evidence.
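
The rule above can be written as a small helper. This is only a sketch: the function name `broad_class` and the three output labels are chosen for this example, and the list of types is made-up input rather than a real query result.

```python
from collections import Counter


def broad_class(model_type: str) -> str:
    """Map a 3D-Beacons 'Type' value onto a broad class, following the rule above."""
    if model_type == "EXPERIMENTALLY DETERMINED":
        return "experimental"
    if model_type == "CONFORMATIONAL ENSEMBLE":
        return "ensemble"
    # Anything else (for example TEMPLATE-BASED or AB-INITIO) is a predicted structure
    return "predicted"


# Made-up input for illustration only.
types = [
    "EXPERIMENTALLY DETERMINED",
    "TEMPLATE-BASED",
    "AB-INITIO",
    "AB-INITIO",
    "CONFORMATIONAL ENSEMBLE",
]
counts = Counter(broad_class(t) for t in types)
print(dict(counts))
```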

In [None]:
#@title 2.2. Get predicted / theoretical structural data only
#@markdown  In this sub-section you will be able to retrieve predicted / theoretical structures associated with a UniProt accession ID.
#@markdown  <br>
#@markdown  <br>
#@markdown  <br>


#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}

#Run this code to perform an API call and load result into a DataFrame
displaying_3dbeacon_pred_summary(UniProt_ID)

There are 8 different predicted structures, generated using template-based or Ab initio methods, without explicit experimental data, associated with UniProt ID P49768



| | ID | Type | Provider | UniProt start | UniProt end | Coverage | Coverage as % |
|---|---|---|---|---|---|---|---|
| 0 | P49768_75-467:6lqg.1.B | TEMPLATE-BASED | SWISS-MODEL | 75 | 467 | 0.842 | 84.2 |
| 1 | AF-P49768-F1 | AB-INITIO | AlphaFold DB | 1 | 467 | 1.0 | 100.0 |
| 2 | P49768 | TEMPLATE-BASED | AlphaFill | 1 | 467 | 1.0 | 100.0 |
| 3 | CHS.15884.1 | AB-INITIO | isoform.io | 1 | 467 | 1.0 | 100.0 |
| 4 | CHS.15884.10 | AB-INITIO | isoform.io | 1 | 467 | 1.0 | 100.0 |
| 5 | CHS.15884.13 | AB-INITIO | isoform.io | 1 | 467 | 1.0 | 100.0 |
| 6 | CHS.15884.15 | AB-INITIO | isoform.io | 1 | 467 | 1.0 | 100.0 |
| 7 | CHS.15884.4 | AB-INITIO | isoform.io | 1 | 467 | 1.0 | 100.0 |
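
The coverage values in the table above appear consistent with the fraction of the UniProt sequence spanned by each model. The sketch below reproduces the SWISS-MODEL row (residues 75 to 467) under two assumptions: that coverage is computed as covered residues divided by sequence length, and that the sequence length is 467, as suggested by the full-length models spanning 1 to 467. The helper names are hypothetical.

```python
def coverage(uniprot_start: int, uniprot_end: int, sequence_length: int) -> float:
    """Fraction of the sequence spanned by a model (both ends inclusive)."""
    return (uniprot_end - uniprot_start + 1) / sequence_length


def covers_position(uniprot_start: int, uniprot_end: int, position: int) -> bool:
    """True if a residue position falls inside the modelled range (inclusive)."""
    return uniprot_start <= position <= uniprot_end


swiss_model = coverage(75, 467, 467)
print(round(swiss_model, 3))        # fraction, as in the 'Coverage' column
print(round(swiss_model * 100, 1))  # percentage, as in 'Coverage as %'
print(covers_position(75, 467, 50))
```

A position check like `covers_position` is useful when a variant of interest sits near the start of the protein: the SWISS-MODEL entry above would not include residue 50, while the full-length models would.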


In [None]:
#@title 2.3. Get experimentally-determined structural data associated with a UniProt ID
#@markdown  In this sub-section you will be able to retrieve experimentally determined data (including the wwPDB and a curated repository of small angle scattering data and models) associated with a UniProt accession ID.
#@markdown  <br>
#@markdown  <br>
#@markdown  <br>


#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}

#Run this code to perform an API call and load result into a DataFrame
displaying_3dbeacon_exp_summary(UniProt_ID)

There are 26 different structures, where experimental data was used for the structure, associated with UniProt ID P49768



| | ID | Type | Provider | UniProt start | UniProt end | Coverage | Coverage as % |
|---|---|---|---|---|---|---|---|
| 0 | 8kcs | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 1 | 8kct | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 2 | 7d8x | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 3 | 6iyc | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 4 | 8k8e | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 5 | 8kcu | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 6 | 6idf | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 7 | 8kco | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 8 | 8x52 | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 9 | 8x54 | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |


In [None]:
#@title 2.1.  Get predicted structures, as well as experimentally determined structures associated with a UniProt ID
#@markdown  In this sub-section you will be able to retrieve BOTH predicted structures, as well as experimentally determined structures associated with a UniProt accession ID.
#@markdown  <br>
#@markdown  <br>
#@markdown  With 3D-Beacons there are options to search based on sequence and Ensembl id as well.
#@markdown  <br>
#@markdown  <br>

#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}

#Run this code to perform an API call and load result into a DataFrame
displaying_3dbeacon_summary(UniProt_ID)


There are 34 structures available associated with UniProt ID P49768



| | ID | Type | Provider | UniProt start | UniProt end | Coverage | Coverage as % |
|---|---|---|---|---|---|---|---|
| 0 | 8kcs | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 1 | 8kct | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 2 | 7d8x | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 3 | 6iyc | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 4 | 8k8e | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 5 | 8kcu | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 6 | 6idf | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 7 | 8kco | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 8 | 8x52 | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 9 | 8x54 | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |


In [None]:
#@title 2.4. Get experimentally-determined structural data from Protein Data Bank
#@markdown  In this sub-section you will be able to retrieve experimentally determined data associated with a UniProt accession ID for the Protein Data Bank only (aka excluding structures from small angle scattering, etc).
#@markdown  <br>
#@markdown  <br>

#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}

#Run this code to perform an API call and load result into a DataFrame
displaying_3dbeacon_pdb_summary(UniProt_ID)

There are 26 different structures, from the Protein Data Bank (PDB), associated with UniProt ID P49768.



| | ID | Type | Provider | UniProt start | UniProt end | Coverage | Coverage as % |
|---|---|---|---|---|---|---|---|
| 0 | 8kcs | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 1 | 8kct | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 2 | 7d8x | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 3 | 6iyc | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 4 | 8k8e | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 5 | 8kcu | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 6 | 6idf | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 7 | 8kco | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 8 | 8x52 | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |
| 9 | 8x54 | EXPERIMENTALLY DETERMINED | PDBe | 1 | 467 | 1.0 | 100.0 |


In [None]:
#@title 2.5.  Get clickable links to download coordinates
#@markdown  In this sub-section you will be able to retrieve urls for predicted and experimentally-determined structures associated with a UniProt ID.

#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}

#Run this code to perform an API call and display the URLs
display_urls_for_structures(UniProt_ID)


There are 34 listed associated with UniProt id P49768

{'8kcs': ['Click on url to download the coordinate file - MMCIF format:',
          'https://www.ebi.ac.uk/pdbe/static/entry/8kcs_updated.cif']}


{'8kct': ['Click on url to download the coordinate file - MMCIF format:',
          'https://www.ebi.ac.uk/pdbe/static/entry/8kct_updated.cif']}


{'7d8x': ['Click on url to download the coordinate file - MMCIF format:',
          'https://www.ebi.ac.uk/pdbe/static/entry/7d8x_updated.cif']}


{'6iyc': ['Click on url to download the coordinate file - MMCIF format:',
          'https://www.ebi.ac.uk/pdbe/static/entry/6iyc_updated.cif']}


{'8k8e': ['Click on url to download the coordinate file - MMCIF format:',
          'https://www.ebi.ac.uk/pdbe/static/entry/8k8e_updated.cif']}


{'8kcu': ['Click on url to download the coordinate file - MMCIF format:',
          'https://www.ebi.ac.uk/pdbe/static/entry/8kcu_updated.cif']}


{'6idf': ['Click on url to download the coordinate file - MMCIF
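
The download links listed above follow a visible pattern: the PDB id is followed by `_updated.cif` under `https://www.ebi.ac.uk/pdbe/static/entry/`. Below is a minimal sketch of building such links, assuming that pattern holds for every PDB entry. The helper names `updated_cif_url` and `urls_for` are hypothetical and are not functions defined in this notebook.

```python
# Hypothetical helpers: these are NOT the notebook's own functions.
# The URL pattern is copied from the links printed in sub-section 2.5.
PDBE_STATIC_ENTRY = "https://www.ebi.ac.uk/pdbe/static/entry"


def updated_cif_url(pdb_id: str) -> str:
    """Build the updated mmCIF download URL for one PDB id."""
    return f"{PDBE_STATIC_ENTRY}/{pdb_id.strip().lower()}_updated.cif"


def urls_for(pdb_ids):
    """Map each PDB id to its download URL, keeping the input order."""
    return {pdb_id: updated_cif_url(pdb_id) for pdb_id in pdb_ids}


links = urls_for(["8kcs", "8kct", "7d8x"])
for pdb_id, url in links.items():
    print(pdb_id, url)
```

The sketch only builds strings; it does not check that the file exists on the server.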

#3)  Structures available using **PDBe-KB**

This section reports on the PDB ids for the experimentally-determined structures where at least one of the protein chains in the structure is associated with a UniProt ID.

The identity of these structures as well as more analysis of structures are available *via* **PDBe-KB**.

---
*More details:*
<p>The experimentally-determined structures/data available <i>via</i> <strong>PDBe-KB</strong> are associated with UniProt IDs through both manual and semi-automated curation.</p>
<p>That is, a scientific curator checks the sequence alignments, with a feedback loop to the scientist(s) depositing the data, to confirm that the right UniProt IDs are assigned before the data is made public.</p>
<p>Structures with many engineered mutations, or where two or more proteins were engineered into a new polymer chain to gain scientific insight, may be harder to find with certain sequence-based search approaches, but they will be listed on <strong>PDBe-KB</strong> pages.</p>

<p> </p>

<p>If the appropriate UniProt IDs are not available at the time of publishing, a semi-automated process assigns the unmapped structures to the appropriate UniProt ID after the structures are published. A synchronisation process means that ultimately all structures associated with a UniProt ID end up listed on both <strong>PDBe-KB</strong> and UniProt pages.</p>
<p>
This mapping between protein structure and UniProt IDs is part of a process called:</p>

[Structure Integration with Function, Taxonomy and Sequence (SIFTS)](https://www.ebi.ac.uk/pdbe/docs/sifts/).
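
Because the mapping is made per protein chain, a single PDB entry can contribute several chains to the same UniProt ID. Below is a minimal sketch of collapsing chain-level records into one item per entry. The record layout (`pdb_id` and `chain_id` keys) and the chain letters are assumptions made for illustration and are not the output of any PDBe-KB call; only the PDB ids are taken from this notebook.

```python
# Illustrative chain-level records. The chain ids are made up for this
# example; the key names are an assumed layout, not a documented schema.
chain_records = [
    {"pdb_id": "7d8x", "chain_id": "A"},
    {"pdb_id": "7d8x", "chain_id": "B"},
    {"pdb_id": "6lr4", "chain_id": "A"},
]


def chains_per_entry(records):
    """Group chain ids by PDB id, preserving the first-seen order of entries."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["pdb_id"], []).append(record["chain_id"])
    return grouped


entries = chains_per_entry(chain_records)
print(f"{len(entries)} entries: {', '.join(entries)}")
```

Counting entries rather than chains is what keeps a structure with several mapped chains from being reported more than once.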

In [None]:
#@title 3.1. Get experimentally-determined coordinates associated with a UniProt ID
#@markdown  In this sub-section you will be able to retrieve experimentally-determined coordinates where, based on sequence and reported taxonomy, one or more chains are associated with the specified UniProt accession ID.

#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}

#Run this code to perform an API call and load the result into a DataFrame
display_exp_structure_summary(UniProt_ID)

There are 26 experimentally-determined structures associated with UniProt id P49768.



Unnamed: 0,PDB_ID,Experimental Method,Resolution (in Angstroms)
0,7d8x,Electron Microscopy,2.6
1,6lr4,Electron Microscopy,3.0
2,8kct,Electron Microscopy,2.6
3,7c9i,Electron Microscopy,3.1
4,7y5t,Electron Microscopy,2.9
5,6lqg,Electron Microscopy,3.1
6,8kcp,Electron Microscopy,3.0
7,6iyc,Electron Microscopy,2.6
8,6idf,Electron Microscopy,2.7
9,8x54,Electron Microscopy,2.9


In [None]:
#@title 3.2. Press shift+enter to generate a list for copying-and-pasting
#@markdown Here is a copy-and-paste comma-delimited list of all the PDB ids for one UniProt id.
#@markdown <br>
#@markdown <br>

#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}

#Run this code to perform an API call and show the results as a comma-delimited list
comma_pdb_id_list(UniProt_ID)

There are 26 experimentally-determined structures associated with UniProt id P49768.

7d8x, 6lr4, 8kct, 7c9i, 7y5t, 6lqg, 8kcp, 6iyc, 6idf, 8x54, 8kcs, 8kco, 8k8e, 8im7, 8kcu, 4uis, 8x53, 8oqy, 8x52, 8oqz, 5a63, 5fn4, 5fn5, 5fn3, 5fn2, 2kr6


<img src="https://github.com/glevans/7ADD-workshop-2024/blob/main/Images/Mol_Viewer_Loading_Multiples.png?raw=true" align="right">

The above list can be pasted into the [Molstar viewer](https://molstar.org/viewer/) and upon *'Apply'* all the coordinates will be loaded into the viewer.

In [None]:
#@title 3.3. Get experimentally-determined coordinates associated with a UniProt ID of resolution 'X' or better
#@markdown  In this sub-section you will be able to retrieve experimentally-determined coordinates where, based on sequence and reported taxonomy, one or more chains are associated with the specified UniProt accession ID.
#@markdown
#@markdown  This result is additionally filtered based on the resolution of the structures.
#@markdown  Resolution values correlate with the level of detail in the experimental data that supports the model.
#@markdown  The smaller the resolution value, the finer the level of detail in the experimental data.

#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}
Resolution =  3 #@param {type:"number"}

#Run this code to perform an API call and load the result into a DataFrame
display_exp_structure_summary2(UniProt_ID,Resolution)

There are 26 experimentally-determined structures associated with UniProt id P49768.

There are 14 experimentally-determined structures, with resolution of 3 Angstrom or better, associated with UniProt id P49768.


Unnamed: 0,PDB_ID,Experimental Method,Resolution (in Angstroms)
0,7d8x,Electron Microscopy,2.6
1,6lr4,Electron Microscopy,3.0
2,8kct,Electron Microscopy,2.6
3,7y5t,Electron Microscopy,2.9
4,8kcp,Electron Microscopy,3.0
5,6iyc,Electron Microscopy,2.6
6,6idf,Electron Microscopy,2.7
7,8x54,Electron Microscopy,2.9
8,8kcs,Electron Microscopy,2.4
9,8kco,Electron Microscopy,2.8


In [None]:
#@title 3.4. Press shift+enter to generate a list for copying-and-pasting
#@markdown Here is a copy-and-paste comma-delimited list of the PDB ids, at the specified resolution or better, for one UniProt id.
#@markdown <br>
#@markdown <br>

#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}
Resolution =  3 #@param {type:"number"}

#Run this code to perform an API call and show the results as a comma-delimited list
comma_pdb_id_list2(UniProt_ID,Resolution)

There are 26 experimentally-determined structures associated with UniProt id P49768.

There are 14 experimentally-determined structures, with resolution of 3 Angstrom or better, associated with UniProt id P49768.
7d8x, 6lr4, 8kct, 7y5t, 8kcp, 6iyc, 6idf, 8x54, 8kcs, 8kco, 8k8e, 8kcu, 8x53, 8x52
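
The resolution filter used in sub-sections 3.3 and 3.4 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's `display_exp_structure_summary2` implementation: the function name `filter_by_resolution` and the `pdb_id` / `resolution` keys are hypothetical, and entries without a reported resolution are assumed to be skipped.

```python
def filter_by_resolution(records, cutoff):
    """Keep records whose resolution value is <= cutoff (i.e. as good or better)."""
    kept = []
    for record in records:
        resolution = record.get("resolution")
        if resolution is None:
            # Assumption: entries with no reported resolution are left out.
            continue
        if resolution <= cutoff:
            kept.append(record)
    return kept


# Resolution values copied from the table in sub-section 3.1. The last
# entry is a made-up placeholder showing how a missing value is handled.
records = [
    {"pdb_id": "7d8x", "resolution": 2.6},
    {"pdb_id": "6lr4", "resolution": 3.0},
    {"pdb_id": "7c9i", "resolution": 3.1},
    {"pdb_id": "6lqg", "resolution": 3.1},
    {"pdb_id": "xxxx", "resolution": None},
]

selected = filter_by_resolution(records, cutoff=3)
print(", ".join(record["pdb_id"] for record in selected))
```

The comparison is inclusive, which matches the "3 Angstrom or better" wording: an entry at exactly 3.0 is kept, while 3.1 is dropped.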


#4)  Variant structures available using **PDBe-KB**

This section reports on the PDB ids for the experimentally-determined structures with a focus on the presence of protein sequence variants.

Structures of variants / engineered mutations of proteins are sometimes part of research studies. This section provides options to explore whether these exist and what they are.

In [None]:
#@title 4.1. Get summary of sequence discrepancy (*e.g.* variants) compared to the UniProt sequence
#@markdown In this sub-section the variants, engineered mutations, modified residues, etc. for experimental structures will be reported.
#@markdown <br>
#@markdown <br>
#@markdown These have been identified as positions that are in conflict with the UniProt sequence.

#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}

#@markdown <br>
#@markdown Use the filter button and input the residue number to see if there is an experimental structure for a particular genetic variant.
#@markdown
#@markdown <br>
#@markdown <br>

displaying_variant_summary(UniProt_ID)

This is the output from entering the UniProt ID P49768.



Unnamed: 0,change,position,from_aa,to_aa,PDB_ID
0,Engineered mutation,112,GLN,CYS,6iyc
1,Engineered mutation,112,GLN,CYS,6idf
2,Conflict,256,TYR,THR,5fn3
3,Conflict,256,TYR,THR,5fn2
4,Engineered mutation,257,ASP,ALA,8oqz
5,Engineered mutation,385,ASP,ALA,6idf
6,Modification,424,LEU,UNK,4uis
7,Modification,425,LEU,UNK,4uis
8,Modification,426,ALA,UNK,4uis
9,Modification,427,ILE,UNK,4uis


In [None]:
#@title 4.2. Press shift+enter to generate a list of PDB ids with no modifications
#@markdown Here is a copy-and-paste comma-delimited list of the PDB ids with no modifications or discrepancies compared to the UniProt sequence.
#@markdown <br>
#@markdown <br>

#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}

#Run this code to perform an API call and show the results as a comma-delimited list
nonmodified_PDB_IDs(UniProt_ID)

There are 26 experimentally-determined structures associated with UniProt id P49768.

Of these there are 20 experimentally-determined structures without any modifications or discrepancies with UniProt id P49768.

7d8x, 6lr4, 8kct, 7c9i, 7y5t, 6lqg, 8kcp, 8x54, 8kcs, 8kco, 8k8e, 8im7, 8kcu, 8x53, 8oqy, 8x52, 5a63, 5fn4, 5fn5, 2kr6
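
One way to read this result is as a set difference: the full list from sub-section 3.2 minus every PDB id that appears in the discrepancy table of sub-section 4.1. Below is a minimal sketch of that idea, assuming "no modifications" simply means the PDB id never appears in that table. The function name `unmodified_ids` is hypothetical, and the notebook's `nonmodified_PDB_IDs` may work differently.

```python
def unmodified_ids(all_ids, ids_with_discrepancies):
    """Return the ids that have no reported discrepancy, keeping the original order."""
    flagged = set(ids_with_discrepancies)
    return [pdb_id for pdb_id in all_ids if pdb_id not in flagged]


# The 26 PDB ids listed in sub-section 3.2.
all_ids = [
    "7d8x", "6lr4", "8kct", "7c9i", "7y5t", "6lqg", "8kcp", "6iyc", "6idf",
    "8x54", "8kcs", "8kco", "8k8e", "8im7", "8kcu", "4uis", "8x53", "8oqy",
    "8x52", "8oqz", "5a63", "5fn4", "5fn5", "5fn3", "5fn2", "2kr6",
]

# PDB ids appearing in the rows shown in sub-section 4.1 (repeats are harmless).
flagged_ids = ["6iyc", "6idf", "5fn3", "5fn2", "8oqz", "6idf", "4uis"]

clean = unmodified_ids(all_ids, flagged_ids)
print(len(clean))
print(", ".join(clean))
```

With these inputs the sketch leaves 20 ids, in the same order as the list printed above.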


In [None]:
#@title 4.3. Get experimentally-determined structural data from Protein Data Bank where sample contains variant position.
#@markdown  In this sub-section you will be able to retrieve experimentally determined data associated with a UniProt accession ID where the sample sequence covers a particular position.
#@markdown  <br>
#@markdown  <br>
#@markdown  <br>


#Run this code to display the widget
UniProt_ID = "P49768" #@param {type:"string"}
variant_position = 255 #@param {type:"number"}

#Run this code to perform an API call and load the result into a DataFrame
displaying_3dbeacon_pdb_filtered(UniProt_ID,variant_position)

There are 24 different structures, from the Protein Data Bank (PDB) mapped to UniProt ID P49768, with 255.



Unnamed: 0,ID,Type,Provider,UniProt start,UniProt end,Coverage,Coverage as %,InRange
0,8kcs,EXPERIMENTALLY DETERMINED,PDBe,1,467,1.0,100.0,True
1,8kct,EXPERIMENTALLY DETERMINED,PDBe,1,467,1.0,100.0,True
2,7d8x,EXPERIMENTALLY DETERMINED,PDBe,1,467,1.0,100.0,True
3,6iyc,EXPERIMENTALLY DETERMINED,PDBe,1,467,1.0,100.0,True
4,8k8e,EXPERIMENTALLY DETERMINED,PDBe,1,467,1.0,100.0,True
5,8kcu,EXPERIMENTALLY DETERMINED,PDBe,1,467,1.0,100.0,True
6,6idf,EXPERIMENTALLY DETERMINED,PDBe,1,467,1.0,100.0,True
7,8kco,EXPERIMENTALLY DETERMINED,PDBe,1,467,1.0,100.0,True
8,8x52,EXPERIMENTALLY DETERMINED,PDBe,1,467,1.0,100.0,True
9,8x54,EXPERIMENTALLY DETERMINED,PDBe,1,467,1.0,100.0,True
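
The `InRange` column can be thought of as a simple interval test on the `UniProt start` and `UniProt end` columns. Below is a minimal sketch under that assumption. The function names are hypothetical, the notebook's `displaying_3dbeacon_pdb_filtered` may compute the column differently, and the second sample row is a made-up segment added only to show a `False` result.

```python
def covers_position(uniprot_start, uniprot_end, position):
    """True if the position lies within the mapped UniProt range (inclusive)."""
    return uniprot_start <= position <= uniprot_end


def flag_in_range(rows, position):
    """Return copies of the rows with an added 'InRange' flag."""
    return [
        {**row, "InRange": covers_position(row["UniProt start"], row["UniProt end"], position)}
        for row in rows
    ]


rows = [
    # Range copied from the table above.
    {"ID": "8kcs", "UniProt start": 1, "UniProt end": 467},
    # Made-up partial segment, for illustration only.
    {"ID": "hypothetical", "UniProt start": 300, "UniProt end": 467},
]

flagged = flag_in_range(rows, position=255)
matching = [row["ID"] for row in flagged if row["InRange"]]
print(matching)
```

Note that a mapped range covering a position does not guarantee the residue was actually modelled in the coordinates; the test only concerns the sequence range.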


#5)  Convert table output from JSON into a comma-delimited list of IDs


In [None]:
#@title 5.1. Filter table for subset (*e.g.* variants) and convert from JSON to a comma-delimited list
#@markdown In this sub-section the table from any of the sections above can be input as JSON to generate a comma-delimited list of ids.
#@markdown <br>
#@markdown For example, this can list the PDB ids of experimental structures that have a specific variant, engineered mutation, modified residue, etc.
#@markdown <br>

#Run this code to display the widget
JSON = [{"index":0,"ID":"7tbw","Type":"EXPERIMENTALLY DETERMINED","Provider":"PDBe","UniProt start":1,"UniProt end":2261,"Coverage":"1.0","Coverage as %":"100.0"},{"index":1,"ID":"7tc0","Type":"EXPERIMENTALLY DETERMINED","Provider":"PDBe","UniProt start":1,"UniProt end":2261,"Coverage":"1.0","Coverage as %":"100.0"},{"index":2,"ID":"7tdt","Type":"EXPERIMENTALLY DETERMINED","Provider":"PDBe","UniProt start":1,"UniProt end":2261,"Coverage":"1.0","Coverage as %":"100.0"},{"index":3,"ID":"7tby","Type":"EXPERIMENTALLY DETERMINED","Provider":"PDBe","UniProt start":1,"UniProt end":2261,"Coverage":"1.0","Coverage as %":"100.0"},{"index":4,"ID":"7roq","Type":"EXPERIMENTALLY DETERMINED","Provider":"PDBe","UniProt start":1,"UniProt end":2261,"Coverage":"1.0","Coverage as %":"100.0"},{"index":5,"ID":"5xjy","Type":"EXPERIMENTALLY DETERMINED","Provider":"PDBe","UniProt start":1,"UniProt end":2261,"Coverage":"1.0","Coverage as %":"100.0"},{"index":6,"ID":"7tbz","Type":"EXPERIMENTALLY DETERMINED","Provider":"PDBe","UniProt start":1,"UniProt end":2261,"Coverage":"1.0","Coverage as %":"100.0"}] #@param {type:"raw"}

getting_list_of_ids(JSON)

7tbw, 7tc0, 7tdt, 7tby, 7roq, 5xjy, 7tbz
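
The conversion itself is small: read the records, pick the id column, and join the values with commas. Below is a minimal sketch, assuming the input is either a JSON string or an already-parsed list of records. The function name `ids_from_json` and its `id_key` parameter are hypothetical and are not the notebook's `getting_list_of_ids`.

```python
import json


def ids_from_json(table_json, id_key="ID"):
    """Return a comma-delimited string of the values stored under id_key."""
    # Accept either raw JSON text or a list of dicts that is already parsed.
    records = json.loads(table_json) if isinstance(table_json, str) else table_json
    return ", ".join(str(record[id_key]) for record in records)


# First three records of the example above, trimmed to the relevant keys.
sample = '[{"index":0,"ID":"7tbw"},{"index":1,"ID":"7tc0"},{"index":2,"ID":"7tdt"}]'
print(ids_from_json(sample))
```

For tables from section 3, where the column is named `PDB_ID`, the same sketch would be called with `id_key="PDB_ID"`.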


It should also be noted that there are cryo-electron microscopy (CryoEM) maps for which coordinates/models are NOT built, due to limited resolution or other reasons. These are not findable using the **PDBe-KB** or **3D Beacons** APIs. CryoEM maps without coordinates do not always have protein sequences listed, so they are generally less findable from a protein sequence point of view. However, these CryoEM maps may still provide insight, in particular for protein systems with a lot of flexibility / dynamics and/or few to no experimentally-determined coordinates available.

To find CryoEM Maps without coordinates please visit:
<br>
[Electron Microscopy Data Bank (EMDB)](https://www.ebi.ac.uk/emdb/)

Copyright 2024 EMBL - European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.