<a href="https://colab.research.google.com/github/glevans/7ADD-workshop-2024/blob/main/1_basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Leveraging the power of PDBe's APIs**
<img src="https://github.com/paulynamagana/AFDB_notebooks/blob/main/api.png?raw=true" height="80" align="right">

Welcome to this notebook!

To use this notebook in Colab (link at top of the page):

*   you will need to have a Google account
*   you will need to be logged in to Google Colab (by logging into your Google account)

<br>

You can also download this notebook and view it *via* a local installation of [Jupyter](https://jupyter.org/) (*e.g.* the latest JupyterLab or the classic Jupyter Notebook) or a browser instance of [JupyterLab](https://jupyter.org/try-jupyter/lab/).

<br>

---

This interactive Python notebook is part of a series that will guide you through various ways of programmatically accessing Protein Data Bank in Europe (PDBe) data using APIs.

<img src="https://www.ebi.ac.uk/pdbe/docs_dev/logos/images/RGB/PDBe-logo-RGB_2013.png" height="300" align="right">


The REST API is a programmatic way to obtain information from the PDB and EMDB.

You can access details about:

* sample
* experiment
* models
* compounds
* cross-references
* publications
* quality
* assemblies
* and more...

For more information, visit http://www.ebi.ac.uk/pdbe/pdbe-rest-api

<br>

---

## How to use this notebook <a name="Quick Start"></a>
1. To run a code cell, click on the cell to select it. You will notice a play button (▶️) on the left side of the cell. Click on the play button or press Shift+Enter to run the code in the selected cell.
2. The code will start executing, and you will see the output, if any, displayed below the code cell.
3. Move to the next code cell and repeat steps 1 and 2 until you have executed all the desired code cells in sequence.
4. The currently running cell is indicated by a circle with a stop sign next to it. If you need to stop or interrupt the execution of a code cell, click on the stop button (■) located next to the play button.

*Remember to run the code cells in the correct order, as their execution might depend on variables or functions defined in previous cells. You can modify the code in a code cell and re-run it to see updated results.*

<br>

---

## Contact us

If you experience any bugs, please contact pdbehelp@ebi.ac.uk and put "Help with" followed by the title of the notebook in the subject line of the message.



# Notebook #1

This notebook is the first in the training material series. It lays the foundation for understanding how users can interact with the PDBe REST API using Python 3.

## 1) Making imports and setting variables

First, we import some packages that we will use, and set some variables.

We will be using Python packages / modules:

*   [re](https://docs.python.org/3/library/re.html) - provides regular expression matching operations similar to those found in Perl.
*   [requests](https://requests.readthedocs.io/) - sends HTTP requests (such as GET calls to the PDBe API) with very little code.
*   [pprint](https://docs.python.org/3/library/pprint.html) - prints data structures in a more readable ("pretty") layout.
*   [csv](https://docs.python.org/3/library/csv.html) - reads and writes CSV files.

<br>



---



*FURTHER INFORMATION:*

The full list of valid PDBe API URLs (API endpoints) is available from http://www.ebi.ac.uk/pdbe/api/doc/


In [1]:
# Importing Python packages / modules
import re
import requests
import pprint
import csv

# Defining variables to describe API urls
base_url = "https://www.ebi.ac.uk/pdbe/"

api_base = base_url + "api/"

summary_url = api_base + "pdb/entry/summary/"

# We have defined a variable called summary_url with the following value:
#### https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/

experiment_url = api_base + "pdb/entry/experiment/"

# We have defined a variable called experiment_url with the following value:
#### https://www.ebi.ac.uk/pdbe/api/pdb/entry/experiment/

## 2) Basic examples

We will start with some simple Python code that helps us work with the data returned by PDBe's APIs.

### 2.1) Getting a value for a key from a dictionary

Dictionaries are Python data structures that store values under unique keys.

PDBe API calls return JSON data, which can readily be converted into Python dictionaries, *i.e.* collections of key-value pairs.

A value can be retrieved from a dictionary either by accessing it directly with its key, or by using a simple function, as shown below.
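As an aside, Python dictionaries have a built-in `get()` method that does much the same job as the `get_value()` function defined in the code cell below: it returns a default value instead of raising a `KeyError` when the key is missing. A minimal sketch using the same two example values:

```python
# Same two-key example dictionary as in the code cell below
info = {"pdb_id": "8cau", "experimental_method": "Electron Microscopy"}

# Square-bracket access raises KeyError if the key is missing
print(info["pdb_id"])  # 8cau

# dict.get() returns the second argument when the key is missing
print(info.get("experimental_method", "no value"))  # Electron Microscopy
print(info.get("resolution", "no value"))           # no value
```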

In [None]:
# Basic example of a Python dictionary with two keys and two corresponding values.
# We are using {} to define that this input is a Python data-type dictionary.

simplified_example_information_block = {
    "pdb_id": "8cau",
    "experimental_method": "Electron Microscopy",
}

# Getting value directly using key:
print("Getting value directly:", simplified_example_information_block["pdb_id"])
print("Getting value directly:", simplified_example_information_block["experimental_method"])

# Getting value using a simple function we have named 'get_value'.
# This function needs two inputs and gets the value for a key from a dictionary.
def get_value(key, input_information):
    try:
        return input_information[key]
    except (KeyError, TypeError):
        # KeyError: the key is not in the dictionary
        # TypeError: input_information is not a dictionary (e.g. None)
        return "no value"

print()

print("Getting value using function:", get_value("pdb_id", input_information=simplified_example_information_block))
print("Getting value using function:", get_value("experimental_method", input_information=simplified_example_information_block))

### 2.2) Generating summary information


#### 2.2a) Creating a mock PDBe API output

The information block below will serve as an offline example of what an actual PDBe API call would return.

The example is the summary query result for the entry "8CAU".

This structure can be viewed on its PDBe webpage here:
https://www.ebi.ac.uk/pdbe/entry/pdb/8cau

<br>

In this notebook we demonstrate how the information shown on our webpages, as well as additional information from the wwPDB database, can be accessed programmatically.

We will create an object, 'example_summary', defined as a Python dictionary.

We will be using a function from the Python package **pprint** to create nice output.

<br>

---

*FURTHER INFORMATION:*

The information block below can also be generated using the JSON file that is available from the URL below:

https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/8cau
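The API sends its data as JSON text, and the standard-library `json` module converts such text into nested Python dictionaries and lists. The string below is a hand-written, trimmed stand-in shaped like the summary output (not a real API response), used only to show the conversion:

```python
import json

# Hand-written JSON text shaped like a trimmed summary endpoint response
json_text = """
{
  "8cau": [
    {
      "title": "human alpha7 nicotinic receptor in complex with the C4 nanobody and nicotine",
      "release_date": "20231011",
      "experimental_method": ["Electron Microscopy"]
    }
  ]
}
"""

# json.loads() parses JSON text into Python objects
data = json.loads(json_text)

print(type(data))                        # <class 'dict'>
print(type(data["8cau"]))                # <class 'list'>
print(data["8cau"][0]["release_date"])   # 20231011
```

For reference, the `.json()` method of a `requests` response, used later in this notebook, performs the same kind of conversion.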

In [None]:
# Example below shows a block of information in equivalent format to PDBe API output.
# The information is being converted into a Python data-type dictionary.
example_summary = {
   "8cau":[
      {
         "title":"human alpha7 nicotinic receptor in complex with the C4 nanobody and nicotine",
         "processing_site":"PDBE",
         "deposition_site":"PDBE",
         "deposition_date":"20230124",
         "release_date":"20231011",
         "revision_date":"20231011",
         "experimental_method_class":[
            "em"
         ],
         "experimental_method":[
            "Electron Microscopy"
         ],
         "split_entry":[

         ],
         "related_structures":[
            {
               "resource":"EMDB",
               "accession":"EMD-16534",
               "relationship":"associated EM volume"
            }
         ],
         "entry_authors":[
            "Prevost, M.S.",
            "Barilone, N.",
            "Dejean de la Batie, G.",
            "Pons, S.",
            "Ayme, G.",
            "England, P.",
            "Gielen, M.",
            "Bontems, F.",
            "Pehau-Arnaudet, G.",
            "Maskos, U.",
            "Lafaye, P.",
            "Corringer, P.-J."
         ],
         "number_of_entities":{
            "water":0,
            "polypeptide":2,
            "dna":0,
            "rna":0,
            "sugar":0,
            "ligand":2,
            "dna/rna":0,
            "other":0,
            "carbohydrate_polymer":1
         },
         "assemblies":[
            {
               "assembly_id":"1",
               "name":"decamer",
               "form":"hetero",
               "preferred":"true"
            }
         ]
      }
   ]
}

# We can print the dictionary object we have created to see the contents
print(example_summary)

print()

# Using the function pprint from Python package pprint makes the data look nice with indent and line-break formatting etc
pprint.pprint(example_summary)

print()

# Check the object we have made is Python data type dictionary.
type_of_obj = type(example_summary)
print("The data type for the object we have defined is", type_of_obj)

# Report the 'keys' present in the object we have made.
keys_in_obj = example_summary.keys()
print("The keys present in the object we have defined are", keys_in_obj)

# Report the number of 'keys' present in the object we have made.
number_of_keys_in_obj = len(example_summary)
print("The number of keys present in the object we have defined is", number_of_keys_in_obj)

<br>

---

*Note:*

The following built-in Python functions were used in the previous block of code:

*   [type()](https://docs.python.org/3/library/functions.html#type)
*   [len()](https://docs.python.org/3/library/functions.html#len)

We used these to query/check certain aspects of the Python object we have created.
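Two more built-ins are useful for the same kind of checks: `isinstance()` tests whether an object has a given type, and the `in` operator tests whether a key exists before you try to use it. A small sketch on a cut-down copy of the example data:

```python
# Cut-down copy of the example_summary structure
example = {"8cau": [{"release_date": "20231011"}]}

print(isinstance(example, dict))          # True
print(isinstance(example["8cau"], list))  # True: the entry data is wrapped in a list
print("8cau" in example)                  # True
print("3bow" in example)                  # False, and no KeyError is raised
print(len(example["8cau"]))               # 1: one dictionary inside the list
```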

#### 2.2b) Getting metadata for a single PDB entry from the mock PDBe API data

Now let's try to get summary information for a PDB entry using a simple function!
We will use the mock PDBe API data as an example to start.
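In the summary data, each PDB id maps to a *list* of dictionaries rather than a single dictionary. The `get_entry_info()` function in the next cell flattens that list by calling `dict.update()` on each item in turn. The sketch below shows the effect on a hypothetical two-item list (made-up values): keys from later items overwrite earlier ones. This is harmless for the one-item summary list, but worth knowing if a list ever holds several records.

```python
# Hypothetical list of two records for one entry (illustrative values only)
records = [
    {"title": "first record", "resolution": 3.0},
    {"resolution": 2.5, "experimental_method": "X-ray diffraction"},
]

merged = {}
for item in records:
    merged.update(item)  # a later value replaces an earlier one for the same key

print(merged)
# {'title': 'first record', 'resolution': 2.5, 'experimental_method': 'X-ray diffraction'}
```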

In [None]:
# The function below, called 'get_entry_info', does the following:
# --> Checks if the PDB id is listed in the information.
# --> If the PDB id is listed, returns the information for this id.
# --> The information is returned as a single dictionary, or None if it is not available.

def get_entry_info(pdb_id, input_information):
    try:
        output_for_entry_as_list = input_information[pdb_id]
        # The entry data is a list of dictionaries, so merge them into one dictionary
        output_for_entry_as_dict = {}
        for item in output_for_entry_as_list:
            output_for_entry_as_dict.update(item)
        return output_for_entry_as_dict
    except KeyError as error:
        print("Key error: ", error)
        return None
    except TypeError:
        # input_information is not a dictionary (e.g. None after a failed API call)
        print("No information to search for", pdb_id)
        return None

# Try to get PDB entry "3bow"
print("Trying with PDB id which is NOT in the information block:")
PDB_id_3bow_info = get_entry_info("3bow", example_summary)
print(PDB_id_3bow_info)

print()

# Try to get PDB entry "8cau"
print("Trying with PDB id which is in the information block:")
PDB_id_8cau_info = get_entry_info("8cau", example_summary)
pprint.pprint(PDB_id_8cau_info)

print()

# Check the object we have made is Python data type dictionary.
type_of_obj = type(PDB_id_8cau_info)
print("The data type for the object we have defined is", type_of_obj)

# Report the 'keys' present in the object we have made.
keys_in_obj = PDB_id_8cau_info.keys()
print("The keys present in the object we have defined are", keys_in_obj)

# Report the number of keys present in the object we have made.
number_of_keys_in_obj = len(PDB_id_8cau_info)
print("The number of keys present in the object we have defined is", number_of_keys_in_obj)

In [None]:
# The dictionary we have made in the previous code block is a more complex example of a Python dictionary object.

# Getting value directly using key:
print("Getting value directly: ", PDB_id_8cau_info["experimental_method"])

print()

# Getting value using a function:
print("Getting value using function: ", get_value("experimental_method", input_information=PDB_id_8cau_info))

#### 2.2c) Getting summary information for an entry (still using mock data)

Let's write a function that creates a brief summary of a PDB entry.

Note that some calls (*e.g.* POST calls) can return multiple PDB entries, but the GET summary call we use in this exercise always returns only one PDB entry.
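The summary function in the next cell slices the `YYYYMMDD` date string by hand and joins the experimental methods in a loop. For comparison, here is a sketch of the same two steps using `datetime.strptime()`/`strftime()` and `str.join()` from the standard library. The list of two methods is hypothetical and only shows how joining works.

```python
from datetime import datetime

release_date = "20231011"  # compact YYYYMMDD format used in the summary data

# Parse the compact date (this also rejects impossible dates), then print it as YYYY/MM/DD
formatted = datetime.strptime(release_date, "%Y%m%d").strftime("%Y/%m/%d")
print(formatted)  # 2023/10/11

# str.join() puts " and " between items, so the first item needs no special case
methods = ["X-ray diffraction", "Neutron Diffraction"]  # hypothetical list
print(" and ".join(methods))      # X-ray diffraction and Neutron Diffraction
print(" and ".join(methods[:1]))  # X-ray diffraction
```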

In [None]:
# The function below returns the information for a PDB id as a short text summary.

def make_entry_summary(pdb_id, input_information):
    entry_information = get_entry_info(pdb_id, input_information)

    # Getting the title of the entry
    title = get_value("title", entry_information)

    # Getting the release date of the entry (format YYYYMMDD)
    release_date = get_value("release_date", entry_information)
    # Formatting the release date as YYYY/MM/DD to make it more user-friendly
    formatted_release_date = "{}/{}/{}".format(release_date[:4], release_date[4:6], release_date[6:])

    # Getting the experimental methods.
    # There can be multiple methods, so this is a list that
    # needs to be iterated
    experimental_methods = ""
    for experimental_method in get_value("experimental_method", entry_information):
        if experimental_methods:
            experimental_methods += " and "
        experimental_methods += experimental_method

    # Creating the summary text using all the extracted information
    summary = "The entry titled '" + title + "' was released on " + formatted_release_date + ". "
    summary += "This entry was determined using " + experimental_methods + "."
    return summary

print(make_entry_summary("8cau",example_summary))

## 3) Switching to real API data

Now we will start using the PDBe API to make real calls and get data for any PDB entry of interest.

First, we need a function to communicate with the API.

Making calls over the network is more expensive than getting data from a mock dictionary, so we will include an additional check before making the call: we will check that the value of the pdb_id argument matches the pattern of a PDB id.

We will be using functions from the Python packages **re** and **requests** in the function we define below.
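A note on checking ids with regular expressions: `re.match()` only anchors at the *start* of a string, so trailing extra characters still pass, while `re.fullmatch()` requires the whole string to match. The sketch below compares two patterns on a few test strings. The stricter pattern assumes the classic four-character PDB id layout (a digit from 1 to 9 followed by three letters or digits), which also admits strings whose second character is a digit, such as `100d`; longer id formats are not covered.

```python
import re

loose_pattern = "[0-9][A-Za-z][A-Za-z0-9]{2}"  # second character must be a letter
strict_pattern = "[1-9][A-Za-z0-9]{3}"         # assumed classic four-character id layout

for candidate in ["8cau", "8cauXYZ", "whatever", "100d"]:
    loose = re.match(loose_pattern, candidate) is not None         # checks the start only
    strict = re.fullmatch(strict_pattern, candidate) is not None   # checks the whole string
    print(f"{candidate!r}: re.match -> {loose}, re.fullmatch -> {strict}")

# '8cau': re.match -> True, re.fullmatch -> True
# '8cauXYZ': re.match -> True, re.fullmatch -> False
# 'whatever': re.match -> False, re.fullmatch -> False
# '100d': re.match -> False, re.fullmatch -> True
```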

In [None]:
# This function will make a call to the PDBe API using the PDB id and API url provided as arguments.
def get_entry_from_api(pdb_id, api_url):

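    # Check the PDB id is formatted correctly: a digit from 1 to 9 followed by
    # three letters or digits (re.fullmatch checks the whole string, not just the start)
    if not re.fullmatch("[1-9][A-Za-z0-9]{3}", pdb_id):
        print("Invalid PDB id")
        return None

    # Make a GET call to the API URL (the timeout, in seconds, stops the call hanging forever)
    get_request = requests.get(url=api_url + pdb_id, timeout=30)

    if get_request.status_code == 200:
        # If there is data returned (with HTTP status code 200)
        # then return the data converted from JSON
        return get_request.json()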
    else:
        # If there is no data, print status code and response
        print(get_request.status_code, get_request.text)
        return None

# Try our GET function with an invalid PDB id
print("Trying to GET data with invalid PDB id:")
print(get_entry_from_api("whatever", summary_url))
print()

# Try our GET function with a valid PDB id
print("Trying to GET data with valid PDB id:")
pprint.pprint(get_entry_from_api("8cau", summary_url))

print()

print(make_entry_summary("8cau",get_entry_from_api("8cau", summary_url)))

# As you can hopefully see, the data displayed is very similar to
# what we had in the mock data in previous sections - however,
# this is actual data coming from the PDBe API




## 4) Trying the make_entry_summary() function with real API data

We can use our make_entry_summary() function on real API data. All we need to do is change the argument (data) we pass into it.

In [None]:
print("Example #1: 8cau")
print(make_entry_summary("8cau",get_entry_from_api("8cau", summary_url)))
print()
print("Example #2: 8ci1")
print(make_entry_summary("8ci1",get_entry_from_api("8ci1", summary_url)))
print()
print("Example #3: 8c9x")
print(make_entry_summary("8c9x",get_entry_from_api("8c9x", summary_url)))
print()
print("Example #4: 8ce4")
print(make_entry_summary("8ce4",get_entry_from_api("8ce4", summary_url)))
print()
print("Example #5: 7u61")
print(make_entry_summary("7u61",get_entry_from_api("7u61", summary_url)))
print()
print("Example #6: 2n63")
print(make_entry_summary("2n63",get_entry_from_api("2n63", summary_url)))

## 5) Further applications with real API data

Getting the resolution for an entry is another simple example of using PDBe APIs.

---

*FURTHER INFORMATION:*

The information blocks below can also be generated using the JSON files that are available from the URLs below:

*   https://www.ebi.ac.uk/pdbe/api/pdb/entry/experiment/8cau
*   https://www.ebi.ac.uk/pdbe/api/pdb/entry/experiment/7u61
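`get_entry_info()` merges all the dictionaries in an entry's list into one, so if an endpoint ever returned several records for one entry (for example, one per experimental method), only the last `resolution` value would survive the merge. The sketch below instead collects the value from every record and skips records without one. The ids and values are made up and only mimic the `{pdb_id: [record, ...]}` layout seen in this notebook; check the real field names against the JSON from the URLs above.

```python
# Hypothetical experiment-style data (made-up ids and values)
mock_experiment = {
    "0abc": [
        {"experimental_method": "X-ray diffraction", "resolution": 1.8},
        {"experimental_method": "Neutron Diffraction", "resolution": 2.2},
    ],
    "0nmr": [
        {"experimental_method": "Solution NMR"},  # no resolution key in this record
    ],
}


def all_resolutions(pdb_id, data):
    """Return every resolution reported for pdb_id (an empty list if there are none)."""
    return [record["resolution"]
            for record in data.get(pdb_id, [])
            if record.get("resolution") is not None]


print(all_resolutions("0abc", mock_experiment))  # [1.8, 2.2]
print(all_resolutions("0nmr", mock_experiment))  # []
print(all_resolutions("9zzz", mock_experiment))  # [] because the id is not present
```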

In [None]:
# Try our GET function with a different API endpoint & a valid PDB id for a structure determined by Electron Microscopy
print("Trying to GET data with valid PDB id:")
experiment_8cau = get_entry_from_api("8cau", experiment_url)
pprint.pprint(experiment_8cau)
print()
print()
print("Resolution is", get_value("resolution", get_entry_info("8cau", experiment_8cau)), "Angstrom.")

In [None]:
# Try our GET function with a different API endpoint & a valid PDB id for a structure determined by X-ray crystallography
print("Trying to GET data with valid PDB id:")
experiment_7u61 = get_entry_from_api("7u61", experiment_url)
pprint.pprint(experiment_7u61)
print()
print()
print("Resolution is", get_value("resolution", get_entry_info("7u61", experiment_7u61)), "Angstrom.")

We can get this type of information for any entry.

In [None]:
print("Example #1: 8cau")
print(make_entry_summary("8cau",get_entry_from_api("8cau", summary_url)))
print("Resolution is", get_value("resolution",(get_entry_info("8cau",(get_entry_from_api("8cau", experiment_url))))), "Angstrom.")
print()
print("Example #2: 8ci1")
print(make_entry_summary("8ci1",get_entry_from_api("8ci1", summary_url)))
print("Resolution is", get_value("resolution",(get_entry_info("8ci1",(get_entry_from_api("8ci1", experiment_url))))), "Angstrom.")
print()
print("Example #3: 8c9x")
print(make_entry_summary("8c9x",get_entry_from_api("8c9x", summary_url)))
print("Resolution is", get_value("resolution",(get_entry_info("8c9x",(get_entry_from_api("8c9x", experiment_url))))), "Angstrom.")
print()
print("Example #4: 8ce4")
print(make_entry_summary("8ce4",get_entry_from_api("8ce4", summary_url)))
print("Resolution is", get_value("resolution",(get_entry_info("8ce4",(get_entry_from_api("8ce4", experiment_url))))), "Angstrom.")
print()
print("Example #5: 7u61")
print(make_entry_summary("7u61",get_entry_from_api("7u61", summary_url)))
print("Resolution is", get_value("resolution",(get_entry_info("7u61",(get_entry_from_api("7u61", experiment_url))))), "Angstrom.")
print()
print("Example #6: 2n63")
print(make_entry_summary("2n63",get_entry_from_api("2n63", summary_url)))
print("Resolution is", get_value("resolution",(get_entry_info("2n63",(get_entry_from_api("2n63", experiment_url))))), "Angstrom.")

## 6) Summary

In this notebook we have converted information from API calls into Python objects of the dictionary data type.

<br>

We have made four Python functions to help get and navigate the information from PDBe API calls:

*   **get_value()**

    *- gets the value that corresponds to a key in a dictionary*

    *- output is the stored value (for example a string or a list), or the string 'no value' if the key is missing*

*   **get_entry_from_api()**

    *- makes a GET call to the PDBe API using the PDB id and API url as arguments*

    *- output is a dictionary, or 'None' if the id is invalid or the call fails*

*   **get_entry_info()**

    *- gets the data from an information block that corresponds to a PDB id*

    *- if the id is not found, it prints an error message and returns 'None'*

    *- takes output from **get_entry_from_api** as input*

    *- output is a dictionary*

*   **make_entry_summary()**

    *- creates a summary for a PDB entry*

    *- takes output from **get_entry_from_api** as input*

    *- uses the **get_entry_info** and **get_value** functions as part of the method*

    *- output is a string*

<br>

We have shown how one can write a 'print' statement that pulls information from a dictionary by specifying a key and displays the value that corresponds to that key on the screen. The **get_value** function can also be used to do this and is more easily incorporated into complex blocks of Python code.


## 7) Insight to help making your own notebooks

When you are building new notebooks using PDBe's API calls there are some helpful ways to view what data is available.

One option is to view the data as JSON files directly by URL, *e.g.*:
*   https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/8cau
*   https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/7u61
*   https://www.ebi.ac.uk/pdbe/api/pdb/entry/experiment/8cau
*   https://www.ebi.ac.uk/pdbe/api/pdb/entry/experiment/7u61

Another option is the API documentation page: https://www.ebi.ac.uk/pdbe/api/doc/pdb.html

You can look at API call results by using the web interface we have provided.

On this webpage you can make an API call by clicking the grey **'Run Call'** button, either with the PDB id that is loaded by default or after changing it to a PDB id of your choosing. This reveals the data structure that the API call returns and helps you decide how to use Python code to handle the output.
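Once you have a response loaded in Python, a small helper that walks the nested dictionaries and lists and lists each key with its value type can make the layout easier to read than raw JSON. The `outline_structure()` function below is a hypothetical helper written for this sketch, shown on a cut-down copy of the mock summary data from section 2.2a:

```python
def outline_structure(obj, indent=0, lines=None):
    """Return an indented outline of keys and value types in nested dicts/lists."""
    if lines is None:
        lines = []
    pad = "  " * indent
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            outline_structure(value, indent + 1, lines)
    elif isinstance(obj, list) and obj:
        # Only the first item is inspected, assuming the items share one layout
        lines.append(f"{pad}[0]: {type(obj[0]).__name__}")
        outline_structure(obj[0], indent + 1, lines)
    return lines


# Cut-down copy of the mock summary data
sample = {
    "8cau": [
        {
            "release_date": "20231011",
            "experimental_method": ["Electron Microscopy"],
            "number_of_entities": {"polypeptide": 2, "ligand": 2},
        }
    ]
}

print("\n".join(outline_structure(sample)))
# 8cau: list
#   [0]: dict
#     release_date: str
#     experimental_method: list
#       [0]: str
#     number_of_entities: dict
#       polypeptide: int
#       ligand: int
```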

## 8) EXERCISE: Input your own list of PDB ids, run API calls and generate summaries.

### 8.1) Make a list_to_summaries() function to generate multiple API calls for summaries for a list of PDB ids:

The Python function below takes a list of PDB ids as an argument and prints Python code that we can use as input in another code cell.

The generated Python code will run multiple 'GET' queries and print multiple short summary statements.

In the next notebook (*Notebook #2*) we will look into 'POST' queries.

'POST' queries are better suited than 'GET' queries to handling lists of PDB ids.

However, 'GET' can also be used with lists of PDB ids, as this exercise will demonstrate.
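Generating code to paste is one way to handle a list; a `for` loop that calls the functions directly is another. To keep this sketch self-contained and runnable offline, `fake_fetch()` is a hypothetical stand-in for `get_entry_from_api()` that looks ids up in a small mock dictionary, and `summarise_all()` is likewise a hypothetical helper, not part of the notebook.

```python
# Hypothetical offline stand-in for get_entry_from_api(pdb_id, summary_url)
MOCK_RESPONSES = {
    "8cau": {
        "8cau": [
            {
                "title": "human alpha7 nicotinic receptor in complex with the C4 nanobody and nicotine",
                "release_date": "20231011",
                "experimental_method": ["Electron Microscopy"],
            }
        ]
    }
}


def fake_fetch(pdb_id):
    """Return mock API-style data for pdb_id, or None if the id is unknown."""
    return MOCK_RESPONSES.get(pdb_id)


def summarise_all(pdb_ids, fetch):
    """Build one short summary line per PDB id, using the supplied fetch function."""
    lines = []
    for number, pdb_id in enumerate(pdb_ids, start=1):
        data = fetch(pdb_id)
        if not data or pdb_id not in data:
            # Nothing usable came back for this id, so record that and move on
            lines.append(f"Example #{number}: {pdb_id} - no data returned")
            continue
        entry = data[pdb_id][0]  # take the first (here, only) dictionary in the list
        methods = " and ".join(entry["experimental_method"])
        lines.append(f"Example #{number}: {pdb_id} - {entry['title']} ({methods})")
    return lines


for line in summarise_all(["8cau", "0xxx"], fake_fetch):
    print(line)
```

In the notebook itself, the equivalent would be a loop over `pdb_id_list` that prints `make_entry_summary(pdb_id, get_entry_from_api(pdb_id, summary_url))` for each `pdb_id`.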


In [None]:
# New function to generate Python code that will run multiple 'GET' queries
# The output from this function will need to be copy-and-pasted into another code cell.
def list_to_summaries(pdb_ids):
    # enumerate(..., start=1) numbers the examples from 1
    for number, pdb_id in enumerate(pdb_ids, start=1):
        print(f"print(\"Example #{number}: {pdb_id}\")")
        print(f"print(make_entry_summary(\"{pdb_id}\",get_entry_from_api(\"{pdb_id}\", summary_url)))")
        print(f"print(\"Resolution is\", get_value(\"resolution\",get_entry_info(\"{pdb_id}\",get_entry_from_api(\"{pdb_id}\", experiment_url))), \"Angstrom.\")")
        print("print()")

# You can replace the content inside the square brackets with PDB IDs for entries that interest you.
# Please put "" marks around each id to indicate that this is 'string' input.
pdb_id_list = ["8cau", "8ci1", "8c9x", "8ce4", "7u61", "2n63"]
list_to_summaries(pdb_id_list)


### PRACTICE:

### Please take the output from the above code cell (Ctrl+A, then Ctrl+C).
### Paste it (Ctrl+V) into the below code cell.

### Run the code to generate a set of summaries.




In [None]:
# Copy-and-paste the code from above output here:


### 8.2) Generate a csv output from a PDBe search

The code we will run in section 8.4 can use any comma-separated list of PDB ids.

Such lists are often found in *Data Availability* statements in publications.
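Ids copied from a *Data Availability* statement usually arrive as one piece of text rather than a Python list. A minimal sketch of turning such text into a list, assuming the ids are separated by commas (the text itself is hypothetical). Lower-casing keeps the ids consistent with the lowercase keys, such as "8cau", seen in the API output earlier.

```python
# Hypothetical text copied from a publication's Data Availability statement
statement_ids = "8CAU, 8CI1,8c9x , 8CE4"

# Split on commas, strip surrounding spaces, lower-case each id and skip empty pieces
ids_from_text = [item.strip().lower() for item in statement_ids.split(",") if item.strip()]
print(ids_from_text)  # ['8cau', '8ci1', '8c9x', '8ce4']
```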

We can also generate a list by searching for a chemical or protein of interest *via* the [PDBe searchbar](https://www.pdbe.org/).

For example, type 'nicotine' in the [PDBe searchbar](https://www.pdbe.org/) and select the appropriate option from the curated set of options in the pop-up.

The below image shows how to utilize the [PDBe searchbar](https://www.pdbe.org/):

<img src="https://github.com/glevans/7ADD-workshop-2024/blob/main/Images/Protein_Data_Bank_in_Europe_eg_nicotine.png?raw=true" height="350" align="center">

<br>

From the search results, you can download the list of PDB ids as a CSV file.

The below image shows how to do this.

<img src="https://github.com/glevans/7ADD-workshop-2024/blob/main/Images/Protein_Data_Bank_in_Europe_eg_nicotine2.png?raw=true" height="300" align="center">

<br>
<br>

---

*FURTHER INFORMATION:*

A key point when using the PDBe searchbar: do NOT treat it like the Google search bar by typing part of a word and pressing Enter.

The best way to use the PDBe searchbar is to look at the options that appear in the pop-up and then click on the appropriate one.

The downloaded CSV file needs to be placed where this notebook can access it. The code in section 8.3 expects the file to be named PDBe_search.csv.

In Colab:

1.   Click the folder-shaped icon <img src="https://github.com/glevans/7ADD-workshop-2024/blob/main/Images/Folder_icon.png?raw=true" height="20"> on the left-side of the screen.
2.   Click on the upload-document icon <img src="https://github.com/glevans/7ADD-workshop-2024/blob/main/Images/Upload-file_icon.png?raw=true" height="25"> that pops up and upload the file into the folder space associated with this notebook.

### 8.3) Convert the csv output from a PDBe search into a comma-separated list

The code below reads the PDBe_search.csv file, converts its first column into a comma-separated list of PDB ids and stores it as a variable named 'new_pdb_id_list'.
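To try the filtering logic without uploading a file, `csv.reader` can also read from an in-memory text buffer (`io.StringIO`). The CSV text below is hypothetical: it assumes, as the code cell below does, that the PDB ids sit in the first column, and it includes a header row, a repeated id and a blank line to show how each is handled.

```python
import csv
import io
import re

# Hypothetical CSV content; the real PDBe download may have different columns
csv_text = """pdb_id,title
8cau,first entry
8ci1,second entry
8cau,first entry again

"""

reader = csv.reader(io.StringIO(csv_text))
first_column = [row[0].strip() for row in reader if row]  # blank lines give empty rows

# Keep only strings shaped like a four-character PDB id, then remove duplicates in order
pdb_ids = [item for item in first_column if re.fullmatch("[1-9][A-Za-z0-9]{3}", item)]
pdb_ids = list(dict.fromkeys(pdb_ids))

print(first_column)  # ['pdb_id', '8cau', '8ci1', '8cau']
print(pdb_ids)       # ['8cau', '8ci1']
```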

In [None]:
# Read the csv file and collect the first column, which holds the PDB ids
with open('PDBe_search.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    column_1 = [row[0].strip() for row in reader if row]  # skip empty rows

# Keep only list items that match the format of a PDB id (this drops items such as a header row)
new_pdb_id_list = []
for item in column_1:
    if re.fullmatch("[1-9][A-Za-z0-9]{3}", item):
        new_pdb_id_list.append(item)

# Remove duplicates from the list while keeping the original order
new_pdb_id_list = list(dict.fromkeys(new_pdb_id_list))

print(column_1)
print(new_pdb_id_list)

### 8.4) Generate short summaries for PDB ids from PDBe search results

The code below will take the variable named 'new_pdb_id_list' and use it as input for the list_to_summaries() function.

In [None]:
# This code cell takes the list of PDB ids from the PDBe search.
# The code below will generate the output that can be copy and pasted into the next code block.
list_to_summaries(new_pdb_id_list)


### PRACTICE:

### Please take the output from the above code cell and copy-and-paste it into the below code cell.

### Run the code to generate a set of summaries.




In [16]:
# Copy-and-paste the code from above output here:


## This ends the first notebook - please proceed to other notebooks of your interest

Copyright 2024 EMBL - European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.