<a href="https://colab.research.google.com/github/paulynamagana/AFDB_notebooks/blob/main/AFDB_3DBeacons.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Accessing AlphaFold DB structures through 3D-Beacons**
<img src="https://raw.githubusercontent.com/3D-Beacons/3D-Beacons/main/assets/3D-Beacons-logo.png" height="100" align="right">

### Introduction

Welcome to our Google Colab tutorial on accessing AlphaFold DB structures using the 3D-Beacons API. 3D-Beacons is a federated network of protein structure data resources, and its API gives access, from a single entry point, to experimentally determined and predicted structures deposited in different databases.

This notebook serves as a practical resource for fetching predicted structures through the 3D-Beacons API.
To supplement your learning, we have provided links to the full paper and to the API documentation.

Documentation Link: [3D-Beacons API Documentation](https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/#/default/get_uniprot_summary_uniprot_summary__qualifier__json_get)

<br>


**Reference**

Varadi, Mihaly, et al. “3D-beacons: Decreasing the gap between protein sequences and structures through a federated network of protein structure data resources.” *GigaScience*, vol. 11, 2022, https://doi.org/10.1093/gigascience/giac118.

<br>

## How to use Google Colab <a name="Quick Start"></a>
1. To run a code cell, click on the cell to select it. A play button (▶️) appears on the left side of the cell. Click the play button or press Shift+Enter to run the code in the selected cell.
2. The code starts executing, and any output is displayed below the code cell.
3. Move to the next code cell and repeat steps 1 and 2 until you have executed all the desired code cells in sequence.
4. The currently running cell is indicated by a circle with a stop button (■). To stop or interrupt the execution of a code cell, click that stop button.

*Remember to run the code cells in the correct order, as their execution might depend on variables or functions defined in previous cells. You can modify the code in a code cell and re-run it to see updated results.*







---



In [None]:
#@title #1.&nbsp; Run this code to import and install libraries
### 2.1.&nbsp; Initialisation

#@markdown Run the cell code to install dependencies and create a function for searching in the 3D-Beacons Network
!pip install ijson prettytable &> /dev/null
import requests, sys, json
import ipywidgets as wgt
from prettytable import PrettyTable
import ijson
import pandas as pd
from decimal import Decimal
import urllib
from urllib.request import urlopen


def search_3d_beacons(uniprot_id):
    """Return the 3D-Beacons summary for a UniProt accession, or None on failure."""
    base_api = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/"

    try:
        # A timeout stops the cell from hanging if the server does not answer
        response = requests.get(f"{base_api}{uniprot_id}.json", timeout=60)
        response.raise_for_status()  # Raises an HTTPError for 4xx/5xx responses

        return response.json()  # The endpoint returns JSON

    except requests.exceptions.RequestException as e:
        print(f"Error during request: {e}")
        return None

### 2.2&nbsp; Single search

#@markdown The following block retrieves all structures available in 3D-Beacons for a single UniProt accession.
#@markdown As an example, let's retrieve protein structure entries for the **human Cellular tumor antigen p53**. The UniProt accession for this protein is **P04637**. You can find more information about this protein on [UniProt](https://www.uniprot.org/uniprotkb/P04637).

Uniprot_ID =  "P04637" #@param {type:"string"}

Filter_by = "" #@param {type:"string"}
#@markdown Leave `Filter_by` empty to list every category, or filter by one of:
#@markdown * EXPERIMENTALLY DETERMINED
#@markdown * CONFORMATIONAL ENSEMBLE
#@markdown * TEMPLATE-BASED
#@markdown * AB-INITIO

# Example usage:
result = search_3d_beacons(Uniprot_ID)

# Set up the table
table = PrettyTable()
table.field_names = ["Model Category", "coverage","Model Identifier", "Provider","Entity Type"]

# search_3d_beacons returns None when the request fails
structures = result["structures"] if result else []
wanted_category = Filter_by.strip().upper()

# Filter structures and add to the table
for structure in structures:
    summary = structure["summary"]
    model_category = summary["model_category"]
    # Skip other categories when a filter is set
    if wanted_category and model_category != wanted_category:
        continue
    model_identifier = summary["model_identifier"]
    provider = summary["provider"]
    coverage = summary["coverage"]

    # Add one row per entity in the model
    for entity in summary["entities"]:
        entity_type = entity["entity_type"]
        table.add_row([model_category, coverage, model_identifier, provider, entity_type])

# Print the table
print(table)


+---------------------------+----------+------------------------+--------------+-------------+
|       Model Category      | coverage |    Model Identifier    |   Provider   | Entity Type |
+---------------------------+----------+------------------------+--------------+-------------+
| EXPERIMENTALLY DETERMINED |  0.036   |          9c5s          |     PDBe     |   POLYMER   |
| EXPERIMENTALLY DETERMINED |  0.036   |          9c5s          |     PDBe     | NON-POLYMER |
| EXPERIMENTALLY DETERMINED |  0.509   |          3d06          |     PDBe     |   POLYMER   |
| EXPERIMENTALLY DETERMINED |  0.509   |          3d06          |     PDBe     | NON-POLYMER |
| EXPERIMENTALLY DETERMINED |  0.031   |          5mhc          |     PDBe     |   POLYMER   |
| EXPERIMENTALLY DETERMINED |  0.031   |          5mhc          |     PDBe     |   POLYMER   |
| EXPERIMENTALLY DETERMINED |  0.031   |          5mhc          |     PDBe     | NON-POLYMER |
| EXPERIMENTALLY DETERMINED |  0.031   |          
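
The table above can get long for well-studied proteins, so a quick tally per provider and model category can help before reading individual rows. The following is a minimal sketch, assuming only the `structures` → `summary` → `provider` / `model_category` layout used in the cell above. The function name `count_models` and the `example` dictionary are hypothetical and hand-written for illustration; they are not real API output.

```python
from collections import Counter


def count_models(summary):
    """Count (provider, model_category) pairs in a 3D-Beacons summary dict."""
    counts = Counter()
    # Tolerate a failed request (None) or a response without structures
    for structure in (summary or {}).get("structures", []):
        info = structure.get("summary", {})
        counts[(info.get("provider"), info.get("model_category"))] += 1
    return counts


# Hand-written fragment that mimics the response layout (illustration only)
example = {
    "structures": [
        {"summary": {"provider": "PDBe", "model_category": "EXPERIMENTALLY DETERMINED"}},
        {"summary": {"provider": "PDBe", "model_category": "EXPERIMENTALLY DETERMINED"}},
        {"summary": {"provider": "SWISS-MODEL", "model_category": "TEMPLATE-BASED"}},
    ]
}

for (provider, category), n in sorted(count_models(example).items()):
    print(f"{provider:12s} {category:26s} {n}")
```

In the notebook, `count_models(result)` could be called on the value returned by `search_3d_beacons`; a `None` result simply gives an empty tally.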

In [None]:
#@title #2.&nbsp; SEQUENCE-BASED SEARCH

#@markdown The 3D-Beacons Network has introduced sequence similarity search functionality which allows you to query the network using the amino acid sequence of a protein.

#@markdown The Sequence Similarity Search option available through the network uses the Basic Local Alignment Search Tool (BLAST, Altschul et al., 1990) to find regions of sequence similarity by aligning them with a query sequence. By evaluating the match between the network and query sequence, valuable insights into the structure, function, and evolutionary aspects can be obtained, thus facilitating targeted and systematic exploration of protein structures.

#@markdown The code presented below allows you to search the network by performing a sequence-based search via API.

sequence_query = "MNMLVINGTPRKHGRTRIAASYIAALYHTA" #@param {type:"string"}


import requests
import ijson

SEQUENCE_API = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/v2/sequence"


def sequence_search(sequence):
    """Submit a sequence search and return the parsed hits (empty list on failure)."""
    try:
        response = requests.post(f"{SEQUENCE_API}/search", json={"sequence": sequence})
        response.raise_for_status()  # Raise an exception for 4xx/5xx responses
    except requests.RequestException as e:
        # `response` may not exist here (e.g. connection error), so report the exception
        print(f"Sequence search request failed: {e}")
        return []

    if response.status_code != 200:
        print(f"Request failed with status code {response.status_code}")
        return []

    print("Your search was successful")
    job_id = response.json()["job_id"]
    return retrieve_results(job_id)


def retrieve_results(job_id):
    """Fetch the results of a sequence search job as a list of hits."""
    try:
        response = requests.get(f"{SEQUENCE_API}/result", params={"job_id": job_id})
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"An error occurred during the GET request: {e}")
        return []

    # Parse the JSON array with ijson (non-integer numbers come back as Decimal)
    return list(ijson.items(response.content, "item"))


# Keep the hits in `parsed_items`; the next cell turns them into a dataframe
parsed_items = sequence_search(sequence_query)
parsed_items

Mounted at /content/drive
Your search was successful


[{'accession': 'A0ABD4A0C4',
  'id': 'A0ABD4A0C4_BACIU',
  'description': 'NADPH-dependent FMN reductase-like domain-containing protein OS=Bacillus subtilis subsp. subtilis OX=135461 GN=B4067_1132 PE=3 SV=1',
  'hit_length': 174,
  'hit_hsps': [{'hsp_score': Decimal('150.0'),
    'hsp_bit_score': Decimal('62.4'),
    'hsp_align_len': 29,
    'hsp_identity': Decimal('100.0'),
    'hsp_positive': Decimal('100.0'),
    'hsp_qseq': 'MNMLVINGTPRKHGRTRIAASYIAALYHT',
    'hsp_hseq': 'MNMLVINGTPRKHGRTRIAASYIAALYHT',
    'hsp_mseq': 'MNMLVINGTPRKHGRTRIAASYIAALYHT',
    'hsp_expect': Decimal('5.9E-11')}],
  'summary': {'uniprot_entry': {'ac': 'A0ABD4A0C4',
    'id': 'A0ABD4A0C4_BACIU',
    'uniprot_checksum': '0005A4421A9C9F0A',
    'sequence_length': 174,
    'segment_start': 1,
    'segment_end': 174,
    'description': None},
   'structures': [{'summary': {'model_identifier': 'AF-A0ABD4A0C4-F1',
      'model_category': 'AB-INITIO',
      'model_url': 'https://alphafold.ebi.ac.uk/files/AF-A0AB
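
Each hit carries its BLAST statistics under `hit_hsps`, so the list can be narrowed before any structures are inspected. The following is a minimal sketch, assuming the `hit_hsps`, `hsp_expect` and `hsp_identity` keys shown in the output above, and assuming that `hsp_identity` is a percentage. The function name `filter_hits` and the default thresholds are arbitrary choices made for illustration, not recommendations from 3D-Beacons.

```python
from decimal import Decimal


def _to_float(value, default):
    """Convert ijson's Decimal (or any number) to float; use default when missing."""
    return default if value is None else float(value)


def filter_hits(hits, max_evalue=1e-5, min_identity=90.0):
    """Keep hits that have at least one HSP passing both thresholds."""
    kept = []
    for hit in hits:
        for hsp in hit.get("hit_hsps") or []:
            evalue = _to_float(hsp.get("hsp_expect"), float("inf"))
            identity = _to_float(hsp.get("hsp_identity"), 0.0)
            if evalue <= max_evalue and identity >= min_identity:
                kept.append(hit)
                break  # one passing HSP is enough for this hit
    return kept


# Hand-written hits; only the first mirrors values from the output above,
# the other two accessions are hypothetical
example_hits = [
    {"accession": "A0ABD4A0C4",
     "hit_hsps": [{"hsp_expect": Decimal("5.9E-11"), "hsp_identity": Decimal("100.0")}]},
    {"accession": "HYPOTHETICAL1",
     "hit_hsps": [{"hsp_expect": Decimal("0.5"), "hsp_identity": Decimal("35.0")}]},
    {"accession": "HYPOTHETICAL2", "hit_hsps": []},
]

print([hit["accession"] for hit in filter_hits(example_hits)])  # ['A0ABD4A0C4']
```

In the notebook, `filter_hits(parsed_items)` would apply the same thresholds to the real search results.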

In [None]:
#turn the results into a dataframe
rows = []

for item in parsed_items:
    acc = item.get("accession")
    desc = item.get("description")

    for struct in item.get("summary", {}).get("structures", []):
        s = struct.get("summary", {})

        # extract first HSP score safely
        hsp_list = item.get("hit_hsps", [])
        if hsp_list:
            raw_hsp = hsp_list[0].get("hsp_score")
            hsp_score = float(raw_hsp) if isinstance(raw_hsp, Decimal) else raw_hsp
            raw_evalue = hsp_list[0].get("hsp_expect")
            # Format the E-value only when it is a Decimal; float(None) would raise a TypeError
            hsp_expect = f"{float(raw_evalue):.3g}" if isinstance(raw_evalue, Decimal) else raw_evalue
        else:
            hsp_score = None
            hsp_expect = None

        rows.append({
            "accession": acc,
            "description": desc,
            "model_id": s.get("model_identifier"),
            "provider": s.get("provider"),
            "HSP_score": hsp_score,
            "E-value": hsp_expect,
            "category": s.get("model_category"),
            "coverage": float(s.get("coverage")) if isinstance(s.get("coverage"), Decimal) else s.get("coverage"),
            "confidence": float(s.get("confidence_avg_local_score")) if isinstance(s.get("confidence_avg_local_score"), Decimal) else s.get("confidence_avg_local_score"),
        })

df = pd.DataFrame(rows)
df


| | accession | description | model_id | provider | HSP_score | E-value | category | coverage | confidence |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A0ABD4A0C4 | NADPH-dependent FMN reductase-like domain-cont... | AF-A0ABD4A0C4-F1 | AlphaFold DB | 150.0 | 5.9e-11 | AB-INITIO | 1.0 | 96.38 |
| 1 | A0AAQ3EST2 | FMN-dependent NADPH-azoreductase OS=Bacillus s... | AF-A0AAQ3EST2-F1 | AlphaFold DB | 150.0 | 5.9e-11 | AB-INITIO | 1.0 | 96.5 |
| 2 | A0AAE2V9U0 | FMN-dependent NADPH-azoreductase OS=Bacillus s... | A0AAE2V9U0_3-169:3gfr.1.B | SWISS-MODEL | 150.0 | 5.9e-11 | TEMPLATE-BASED | 0.96 | 0.9 |
| 3 | A0AAE2V9U0 | FMN-dependent NADPH-azoreductase OS=Bacillus s... | AF-A0AAE2V9U0-F1 | AlphaFold DB | 150.0 | 5.9e-11 | AB-INITIO | 1.0 | 96.38 |
| 4 | A0A0A1M1A3 | Azoreductase OS=Bacillus subtilis OX=1423 GN=y... | A0A0A1M1A3_3-169:3gfr.1.B | SWISS-MODEL | 150.0 | 5.9e-11 | TEMPLATE-BASED | 0.96 | 0.9 |
| 5 | A0A0A1M1A3 | Azoreductase OS=Bacillus subtilis OX=1423 GN=y... | AF-A0A0A1M1A3-F1 | AlphaFold DB | 150.0 | 5.9e-11 | AB-INITIO | 1.0 | 96.5 |
| 6 | A0A6M3Z8V9 | NAD(P)H-dependent oxidoreductase OS=Bacillus s... | A0A6M3Z8V9_3-169:3gfr.1.B | SWISS-MODEL | 150.0 | 5.9e-11 | TEMPLATE-BASED | 0.96 | 0.9 |
| 7 | A0A6M3Z8V9 | NAD(P)H-dependent oxidoreductase OS=Bacillus s... | AF-A0A6M3Z8V9-F1 | AlphaFold DB | 150.0 | 5.9e-11 | AB-INITIO | 1.0 | 96.44 |
| 8 | Q8RR37 | Azoreductase OS=Geobacillus stearothermophilus... | AF-Q8RR37-F1 | AlphaFold DB | 150.0 | 5.9e-11 | AB-INITIO | 1.0 | 96.31 |
| 9 | O07529 | FMN-dependent NADPH-azoreductase OS=Bacillus s... | 3gfs | PDBe | 150.0 | 5.9e-11 | EXPERIMENTALLY DETERMINED | 1.0 | |
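
Several accessions in the table have more than one model, so it can be useful to keep a single model per accession. The following is a minimal sketch that keeps the model with the highest `coverage`, working on plain dictionaries shaped like the entries of the `rows` list built in the cell above. Coverage is used rather than `confidence` because the confidence values in the table appear to be on different scales for different providers (for example 96.38 and 0.9), so comparing them directly is probably not meaningful. The function name `best_coverage_per_accession` is hypothetical.

```python
def best_coverage_per_accession(rows):
    """Return {accession: row} keeping the row with the highest coverage."""
    best = {}
    for row in rows:
        coverage = row.get("coverage")
        if coverage is None:
            continue  # nothing to compare for this model
        current = best.get(row["accession"])
        # On a tie, the first row seen is kept
        if current is None or coverage > current["coverage"]:
            best[row["accession"]] = row
    return best


# Three rows copied from the table above (only the columns needed here)
example_rows = [
    {"accession": "A0AAE2V9U0", "model_id": "A0AAE2V9U0_3-169:3gfr.1.B",
     "provider": "SWISS-MODEL", "coverage": 0.96},
    {"accession": "A0AAE2V9U0", "model_id": "AF-A0AAE2V9U0-F1",
     "provider": "AlphaFold DB", "coverage": 1.0},
    {"accession": "O07529", "model_id": "3gfs", "provider": "PDBe", "coverage": 1.0},
]

for accession, row in best_coverage_per_accession(example_rows).items():
    print(accession, row["model_id"], row["coverage"])
```

Passing `rows` from the notebook gives the same selection for the real results, and `pd.DataFrame(list(best.values()))` would turn the selection back into a dataframe.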


When you run the next cell, you may be prompted to grant access to your Google Drive. Google Colab needs this access to save the downloaded model files to your Drive.
<br>
<br>
Please follow the on-screen instructions to grant the necessary permissions. Your data and files will not be accessed without your explicit permission. Once access is granted, the cell downloads the selected models to the `3DBeacons_files` folder in your Drive.

In [None]:
#@title Download
#@markdown Once you have found the model or models you need, you can customise this code to download the model files (mmCIF or PDB) to Google Drive.

#@markdown If multiple models are required, separate them with a comma.
Uniprot_ID = "O07529"# @param {type:"string"}
model_download = "3gfs,3gfr"  # @param {type:"string"}
#@markdown The values for model_download can be found in the key 'model_identifier' within every summary
import os
import urllib.error
import urllib.request
from urllib.request import urlopen

import ijson
from google.colab import drive

# Exact identifiers to download (a set avoids accidental substring matches)
requested_models = {m.strip() for m in model_download.split(",") if m.strip()}


def Search3DBeacons(ID):
  """Return the list of structures available for a UniProt accession."""
  WEBSITE_API = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/v2/uniprot/summary/"

  with urlopen(f"{WEBSITE_API}{ID}.json") as response:
    return list(ijson.items(response, "structures.item", use_float=True))


def download_file(model_url, destination_file_path):
  try:
    urllib.request.urlretrieve(model_url, destination_file_path)
    print(f'File downloaded successfully to: {destination_file_path}')
  except urllib.error.URLError as e:
    print(f'Error downloading file: {e}')  # Handle URL-related errors
  except Exception as e:  # Catch other general exceptions
    print(f'An unexpected error occurred: {e}')

structures = Search3DBeacons(Uniprot_ID)

# Mount Google Drive so the files can be written to it
drive.mount('/content/drive')
destination_path = '/content/drive/MyDrive/3DBeacons_files'

# Create the destination directory if it does not exist yet
if not os.path.exists(destination_path):
   os.makedirs(destination_path)
   print("The new directory is created!")

for structure in structures:
  summary = structure.get('summary', {})
  model = summary.get('model_identifier')
  if model in requested_models:
    model_url = summary.get("model_url")
    model_format = summary.get("model_format")
    file_extension = '.cif' if model_format == 'MMCIF' else '.pdb'
    destination_file_path = os.path.join(destination_path, model + file_extension)
    download_file(model_url, destination_file_path)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
File downloaded successfully to: /content/drive/MyDrive/3DBeacons_files/3gfs.cif
File downloaded successfully to: /content/drive/MyDrive/3DBeacons_files/3gfr.cif
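
The download cell builds each file name from `model_identifier` and `model_format`. Some identifiers returned by the network, such as `A0AAE2V9U0_3-169:3gfr.1.B` in the sequence search results, contain characters like `:` that are not valid in file names on every system. The following is a minimal sketch of a helper that replaces such characters and rejects unknown formats instead of silently treating them as PDB. The function name `model_filename` and the `EXTENSIONS` mapping are hypothetical; only the two formats handled in the cell above are covered.

```python
import re

# Formats handled by the download cell above; other values are rejected
EXTENSIONS = {"MMCIF": ".cif", "PDB": ".pdb"}


def model_filename(model_identifier, model_format):
    """Build a file name that is safe on common file systems."""
    extension = EXTENSIONS.get((model_format or "").upper())
    if extension is None:
        raise ValueError(f"Unsupported model_format: {model_format!r}")
    # Replace anything other than letters, digits, dot, underscore and hyphen
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", model_identifier)
    return safe_name + extension


print(model_filename("3gfs", "MMCIF"))
# The "PDB" format passed for this identifier is an assumption for illustration
print(model_filename("A0AAE2V9U0_3-169:3gfr.1.B", "PDB"))
```

Raising on an unknown format is a design choice for this sketch: it surfaces unexpected values early, whereas the cell above falls back to `.pdb`.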


## Contact us

If you experience any bugs, please contact afdbhelp@ebi.ac.uk.


