<a href="https://colab.research.google.com/github/paulynamagana/AFDB_notebooks/blob/main/AFDB_3DBeacons.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Accessing AlphaFold DB structures through 3D-Beacons**
<img src="https://raw.githubusercontent.com/3D-Beacons/3D-Beacons/main/assets/3D-Beacons-logo.png" height="100" align="right">

### Introduction

Welcome to our Google Colab tutorial on accessing AlphaFold DB structures using the 3D-Beacons API. In this tutorial, we will use the 3D-Beacons API to access structures deposited in the different databases that make up the 3D-Beacons Network.

This notebook serves as a practical resource for fetching predicted structures through the 3D-Beacons API.
To supplement your learning, we have provided links to the full paper and to the API documentation.

Documentation Link: [3D-Beacons API Documentation](https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/#/default/get_uniprot_summary_uniprot_summary__qualifier__json_get)

<br>


**Reference**

Varadi, Mihaly, et al. “3D-beacons: Decreasing the gap between protein sequences and structures through a federated network of protein structure data resources.” GigaScience, vol. 11, 2022, https://doi.org/10.1093/gigascience/giac118.

<br>

## How to use Google Colab <a name="Quick Start"></a>
1. To run a code cell, click on the cell to select it. A play button (▶️) appears on the left side of the cell. Click the play button or press Shift+Enter to run the code in the selected cell.
2. The code will start executing, and any output will be displayed below the code cell.
3. Move to the next code cell and repeat steps 1 and 2 until you have executed all the desired code cells in sequence.
4. The currently running cell is indicated by a circle with a stop sign next to it. To stop or interrupt the execution of a code cell, click the stop button (■) located next to the play button.

*Remember to run the code cells in the correct order, as their execution might depend on variables or functions defined in previous cells. You can modify the code in a code cell and re-run it to see updated results.*







---



In [None]:
# @title Run this code to import libraries
### 2.1.&nbsp; Initialisation

#@markdown Run this cell to import the dependencies and define a function for searching the 3D-Beacons Network

import requests, sys, json
import ipywidgets as wgt
from prettytable import PrettyTable


def search_3d_beacons(accession):
    """Return the 3D-Beacons summary for a UniProt accession, or None if the request fails."""
    base_api = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/"

    try:
        response = requests.get(f"{base_api}{accession}.json")
        response.raise_for_status()  # Raises an HTTPError for bad responses

        data = response.json()  # Assumes the response is in JSON format

        return data

    except requests.exceptions.RequestException as e:
        print(f"Error during request: {e}")
        return None

### 2.2&nbsp; Single search

#@markdown The following block retrieves all structures available in 3D-Beacons for a single UniProt accession.
#@markdown As an example, let's retrieve protein structure entries for the **human Cellular tumor antigen p53**. The UniProt accession for this protein is **P04637**. You can find more information about this protein on [UniProt](https://www.uniprot.org/uniprotkb/P04637).

Uniprot_ID =  "P04637" #@param {type:"string"}

Filter_by = "TEMPLATE-BASED" #@param {type:"string"}
#@markdown You can filter by AB-INITIO, CONFORMATIONAL ENSEMBLE, EXPERIMENTALLY DETERMINED or TEMPLATE-BASED

# Example usage:
result = search_3d_beacons(Uniprot_ID)

# Set up the table
table = PrettyTable()
table.field_names = ["Model Category", "coverage","Model Identifier", "Provider","Entity Type"]

# Add one row per entity of every structure.
# Note: Filter_by is not applied in this loop, so all model categories are listed.
structures = result["structures"] if result else []  # search_3d_beacons returns None on failure
for structure in structures:
    summary = structure["summary"]
    model_category = summary["model_category"]
    model_identifier = summary["model_identifier"]
    provider = summary["provider"]
    coverage = summary["coverage"]

    # Extract information from entities
    for entity in summary["entities"]:
        entity_type = entity["entity_type"]

        # Add the information to the table
        table.add_row([model_category, coverage, model_identifier, provider, entity_type])

# Print the table
print(table)


+---------------------------+----------+------------------------+--------------+-------------+
|       Model Category      | coverage |    Model Identifier    |   Provider   | Entity Type |
+---------------------------+----------+------------------------+--------------+-------------+
| EXPERIMENTALLY DETERMINED |  0.509   |          3d06          |     PDBe     |   POLYMER   |
| EXPERIMENTALLY DETERMINED |  0.509   |          3d06          |     PDBe     | NON-POLYMER |
| EXPERIMENTALLY DETERMINED |  0.031   |          5mhc          |     PDBe     |   POLYMER   |
| EXPERIMENTALLY DETERMINED |  0.031   |          5mhc          |     PDBe     |   POLYMER   |
| EXPERIMENTALLY DETERMINED |  0.031   |          5mhc          |     PDBe     | NON-POLYMER |
| EXPERIMENTALLY DETERMINED |  0.031   |          5mhc          |     PDBe     | NON-POLYMER |
| EXPERIMENTALLY DETERMINED |  0.557   |          6ggc          |     PDBe     |   POLYMER   |
| EXPERIMENTALLY DETERMINED |  0.557   |          
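
The `Filter_by` field suggests narrowing the listing to a single model category. A minimal sketch of that filtering step follows, assuming the response layout read by the cell above (`structures`, then `summary`, then `model_category`). The helper name `filter_by_category` and the sample dictionary are illustrative assumptions, not part of the 3D-Beacons API, and the second sample entry is invented purely to exercise the filter.

```python
def filter_by_category(summary_response, category):
    """Return the structures whose model_category matches `category` (case-insensitive)."""
    wanted = category.strip().upper()
    return [
        s for s in summary_response.get("structures", [])
        if s.get("summary", {}).get("model_category", "").upper() == wanted
    ]


# Hand-written sample mimicking the layout used above (not real API output)
sample_response = {
    "structures": [
        {"summary": {"model_identifier": "3d06", "provider": "PDBe",
                     "model_category": "EXPERIMENTALLY DETERMINED"}},
        # Hypothetical entry, invented for illustration only
        {"summary": {"model_identifier": "hypothetical-model-1", "provider": "Example provider",
                     "model_category": "TEMPLATE-BASED"}},
    ]
}

template_based = filter_by_category(sample_response, "template-based")
print([s["summary"]["model_identifier"] for s in template_based])
```

With a real response, the same call would be `filter_by_category(result, Filter_by)`, and the returned list could be looped over to fill the table.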

In [None]:
#@title 2.3.&nbsp; Sequence-based search

#@markdown The 3D-Beacons Network provides a sequence similarity search, which allows you to query the network using the amino acid sequence of a protein.

#@markdown The Sequence Similarity Search option uses the Basic Local Alignment Search Tool (BLAST, Altschul et al., 1990) to find regions of similarity between the query sequence and the sequences in the network. Evaluating these matches can provide insight into structure, function, and evolutionary relationships, and supports targeted, systematic exploration of protein structures.

#@markdown The code below performs a sequence-based search of the network through the API.

sequence_query = "RVKALRWQCIECKTCSSCRDQGKNADNMLFCDSCDRGFHMECCDPPLTRM" #@param {type:"string"}


import os
import csv
from google.colab import drive

folder_save = "KAT6A"  #@param {type:"string"}
drive.mount('/content/drive', force_remount=True)
destination_path = f"/content/drive/MyDrive/AFDB_3DB/{folder_save}"


isExist = os.path.exists(destination_path)
if not isExist:
    os.makedirs(destination_path)
    print("The new directory was created!")



import requests, json

# Defining function for sequence search
def sequence_search(sequence):
    """Submit a sequence search, save its results, and return the job ID (None on failure)."""
    post_url = "https://wwwdev.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/sequence/search"
    query_sequence = {"sequence": sequence}

    try:
        # Send POST request to perform sequence search
        response = requests.post(post_url, json=query_sequence)
        response.raise_for_status()  # Raise an exception if the request fails
        job_id = response.json()["job_id"]
        print("Your search was successful")
    except (requests.RequestException, KeyError) as e:
        # Return instead of calling exit(), which would stop the notebook kernel
        print(f"An error occurred during the POST request: {e}")
        return None

    retrieve_results(job_id)
    return job_id


def retrieve_results(job_id):
    """Fetch the results for a job ID and save them as JSON in the Drive folder."""
    get_url = "https://wwwdev.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/sequence/result"
    try:
        # Send GET request to retrieve the results from the search
        response = requests.get(get_url, params={"job_id": job_id})
        response.raise_for_status()  # Raise an exception if the request fails

        r = response.json()
        # Save output to the Drive folder created above
        json_file_path = os.path.join(destination_path, "summary_seqsearch.json")

        with open(json_file_path, 'w') as file:
            json.dump(r, file, indent=3)

        print("Output saved to `summary_seqsearch.json`")
    except Exception as e:
        print(f"An error occurred during the GET request: {e}")


job_id = sequence_search(sequence_query)
print(job_id)

Mounted at /content/drive
Your search was successful
Output saved to `summary_seqsearch.json`
bd37d526a01b96eefa64ce6c46523a00
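
Before submitting a query, it can help to check that the sequence contains only amino acid letters, since a stray space or digit is an easy mistake when pasting. The sketch below is one way to do this. The helper name `clean_sequence` and the choice to accept only the 20 standard one-letter codes are assumptions made for illustration; the 3D-Beacons service may accept a wider alphabet.

```python
# Assumption: only the 20 standard one-letter amino acid codes are allowed
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Remove whitespace, upper-case the sequence, and reject non-standard letters."""
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("The query sequence is empty")
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"Unexpected characters in sequence: {', '.join(invalid)}")
    return sequence


# The example query from the cell above passes the check unchanged
query = clean_sequence("RVKALRWQCIECKTCSSCRDQGKNADNMLFCDSCDRGFHMECCDPPLTRM")
print(len(query))
```

The cleaned string can then be passed to `sequence_search` in place of the raw input.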


### 2.4.&nbsp; Download model

To download a model, provide its `model identifier` in the input below.

In [None]:
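model_retrieve = "P34809" #@param {type:"string"}

# The identifier is also used as the UniProt accession for the lookup
structures = search_3d_beacons(model_retrieve)

### Run this code to filter for specific model
model_url = None
# The models are listed under the "structures" key; the search returns None on failure
for structure in (structures or {}).get("structures", []):
    summary = structure.get("summary", {})
    if summary.get("model_identifier") == model_retrieve:
        model_url = summary.get("model_url")

if model_url is None:
    print(f"No model with identifier {model_retrieve} was found")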

After running the next cell, you may be prompted to grant access to your Google Drive. Google Colab needs this access to download the model and save it to your Drive.
<br>
<br>
Please follow the on-screen instructions to grant the necessary permissions. Your data and files will not be accessed without your explicit permission.

In [None]:
# Import os for interacting with the operating system and drive from google.colab for mounting Google Drive
import os
from google.colab import drive

# Mount Google Drive to access files and directories
drive.mount('/content/drive')
destination_path = '/content/drive/MyDrive/3DBeacons_files'

if not os.path.exists(destination_path):
    # Create a new directory because it does not exist
    os.makedirs(destination_path)
    print("The new directory is created!")

# Function to download a file from a given URL and save it to the Google Drive
def download_file(url):
    os.chdir(destination_path)
    !wget "$url"

# Call download_file to download the file from the specified model_url
download_file(model_url)

Mounted at /content/drive
The new directory is created!
--2023-09-01 14:40:04--  https://alphafill.eu/v1/aff/P34809
Resolving alphafill.eu (alphafill.eu)... 192.87.97.253
Connecting to alphafill.eu (alphafill.eu)|192.87.97.253|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 833770 (814K) [text/plain]
Saving to: ‘P34809’


2023-09-01 14:40:06 (636 KB/s) - ‘P34809’ saved [833770/833770]
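
The `!wget` line relies on IPython shell syntax, so it only works inside a notebook. For a plain Python script, the same download could be sketched with the standard library. The helper names `filename_from_url` and `download_model` are hypothetical, and naming the file after the last URL path segment is an assumption that mirrors what `wget` did in the log above.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    """Use the last path segment of the URL as the file name."""
    name = os.path.basename(urlparse(url).path.rstrip("/"))
    return name or "model_download"  # fallback name when the URL has no path segment


def download_model(url, destination_dir):
    """Stream the file at `url` into `destination_dir` and return the saved path."""
    os.makedirs(destination_dir, exist_ok=True)
    target = os.path.join(destination_dir, filename_from_url(url))
    with urllib.request.urlopen(url) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return target


# File name that would be used for the model URL shown in the log above
print(filename_from_url("https://alphafill.eu/v1/aff/P34809"))
```

In the notebook, `download_model(model_url, destination_path)` would take the place of `download_file(model_url)` without changing the working directory.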



## Contact us

If you experience any bugs, please contact afdbhelp@ebi.ac.uk.


