
# Using `rcsb-api` to access RCSB PDB's Data API

In [None]:
# Install `rcsb-api`
%pip install --upgrade rcsb-api

## Creating a Data API Query

We'll start by making a Data API query to find the experimental method used to determine PDB entry 4HHB.

A few arguments are required to create a query:

`input_type`: defines the starting point of your query. Some examples include `entries`, `polymer_entities`, and `chem_comps`. If you're unsure which `input_type` to choose, you can usually use `entries`.

`input_ids`: the identifiers of the given `input_type` that you want data for. Each `input_type` expects IDs in a specific format:

|Type|PDB ID Format|Example|
|---|---|---|
|entries|[entry_id]|4HHB|
|polymer, branched, or non-polymer entities|[entry_id]_[entity_id]|4HHB_1|
|polymer, branched, or non-polymer entity instances|[entry_id].[asym_id]|4HHB.A|
|biological assemblies|[entry_id]-[assembly_id]|4HHB-1|
|interfaces|[entry_id]-[assembly_id].[interface_id]|4HHB-1.1|

`return_data_list`: the fields to return for each ID in `input_ids`, written as dot-separated paths such as `exptl.method`.
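The ID formats in the table are plain string patterns, so `input_ids` lists for non-entry input types are easy to generate. A minimal sketch follows; the helper names are hypothetical and are not part of `rcsb-api`:

```python
# Hypothetical helpers (not part of rcsb-api) that assemble IDs in the
# formats accepted by each `input_type`.


def entity_id(entry_id: str, entity_num: int) -> str:
    """Polymer, branched, or non-polymer entity: [entry_id]_[entity_id]."""
    return f"{entry_id}_{entity_num}"


def instance_id(entry_id: str, asym_id: str) -> str:
    """Entity instance: [entry_id].[asym_id]."""
    return f"{entry_id}.{asym_id}"


def assembly_id(entry_id: str, assembly_num: int) -> str:
    """Biological assembly: [entry_id]-[assembly_id]."""
    return f"{entry_id}-{assembly_num}"


def interface_id(entry_id: str, assembly_num: int, interface_num: int) -> str:
    """Interface: [entry_id]-[assembly_id].[interface_id]."""
    return f"{assembly_id(entry_id, assembly_num)}.{interface_num}"


# Entities 1 and 2 of 4HHB, e.g. for input_type="polymer_entities"
entity_ids = [entity_id("4HHB", n) for n in (1, 2)]
print(entity_ids)  # ['4HHB_1', '4HHB_2']
print(instance_id("4HHB", "A"), assembly_id("4HHB", 1), interface_id("4HHB", 1, 1))
```

The resulting list can be passed as `input_ids` together with the matching `input_type`.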

In [None]:
from rcsbapi.data import DataQuery as Query

# Create a `DataQuery`/`Query` object
query = Query(
    input_type="entries",
    input_ids=["4HHB"],  # Computed Structure Model (CSM) IDs can be used as well
    return_data_list=["exptl.method"]
)

In [None]:
# Execute the query using the `.exec()` method
results = query.exec()

In [None]:
# `.exec()` returns the response
print(results)

In [None]:
# The response is also available from the query object via `get_response()`
print(query.get_response())

By using the Search API and Data API together, you can first refine a list of IDs that are of interest and then request data on those particular structures.

In the example below, we select human (Homo sapiens) structures that are associated with the phrase "interleukin" and linked to investigational or experimental drugs. Once we've narrowed down the structures of interest, we request each structure's experimental method and resolution.

In [None]:
from rcsbapi.search import TextQuery
from rcsbapi.search import search_attributes as attrs

# Query for structures associated with phrase "interleukin" from Homo sapiens with investigational or experimental drugs
q1 = TextQuery("interleukin")
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
q3 = attrs.drugbank_info.drug_groups == "investigational"
q4 = attrs.drugbank_info.drug_groups == "experimental"

search_query = q1 & q2 & (q3 | q4)
results = search_query()

# Get first 50 IDs from Search API query
id_list = list(results)[:50]
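The `&` and `|` operators combine queries with AND and OR logic, which the Search API evaluates on the server. To see which IDs survive `q1 & q2 & (q3 | q4)`, the same logic can be mirrored with Python sets; the placeholder IDs below are made up for illustration:

```python
# Made-up placeholder ID sets standing in for the hits of each subquery
text_hits = {"ID1", "ID2", "ID3", "ID4"}   # q1: "interleukin"
human = {"ID1", "ID2", "ID4", "ID5"}       # q2: Homo sapiens
investigational = {"ID1"}                  # q3
experimental = {"ID4", "ID5"}              # q4

# `&` keeps IDs present in both operands; `|` keeps IDs present in either
combined = text_hits & human & (investigational | experimental)
print(sorted(combined))  # ['ID1', 'ID4']
```

ID2 drops out because it matches neither drug group, ID3 because it lacks a human source organism, and ID5 because it does not match the text query.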

In [None]:
from rcsbapi.data import DataQuery as Query

# Use `id_list` to make Data API query
data_query = Query(
    input_type="entries",
    input_ids=id_list,
    return_data_list=["exptl.method", "diffrn_resolution_high.value"]
)

results = data_query.exec()
print(results)
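The printed response is nested JSON. A minimal sketch for flattening it into one row per entry follows. The sample response is hand-written with placeholder IDs and values, and its shape (a `data.entries` list, `exptl` as a list, and `diffrn_resolution_high` nested under `rcsb_entry_info`) is an assumption; print your own `results` to confirm the nesting before relying on these paths.

```python
from typing import Any


def get_path(obj: Any, path: str) -> list:
    """Collect every non-null value at a dot-separated path, descending into lists."""
    values = [obj]
    for key in path.split("."):
        next_values = []
        for value in values:
            # A list (e.g., `exptl`) holds one dict per item, so look inside each
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and item.get(key) is not None:
                    next_values.append(item[key])
        values = next_values
    # Flatten a trailing list so the caller always receives leaf values
    flat = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def to_rows(response: dict, fields: list) -> list:
    """Turn an entries response into one flat dict per entry."""
    rows = []
    for entry in response["data"]["entries"]:
        row = {"rcsb_id": entry.get("rcsb_id")}
        for field in fields:
            found = get_path(entry, field)
            # One value -> scalar, several -> list, none -> None
            row[field] = found[0] if len(found) == 1 else (found or None)
        rows.append(row)
    return rows


# Hand-written sample with placeholder IDs and values (not real PDB data)
SAMPLE_RESPONSE = {
    "data": {
        "entries": [
            {
                "rcsb_id": "ENTRY_A",
                "exptl": [{"method": "X-RAY DIFFRACTION"}],
                "rcsb_entry_info": {"diffrn_resolution_high": {"value": 2.0}},
            },
            {
                "rcsb_id": "ENTRY_B",
                "exptl": [{"method": "SOLUTION NMR"}],
                "rcsb_entry_info": {"diffrn_resolution_high": None},
            },
        ]
    }
}
FIELDS = ["exptl.method", "rcsb_entry_info.diffrn_resolution_high.value"]

for row in to_rows(SAMPLE_RESPONSE, FIELDS):
    print(row)
# With the live response (if its shape matches): to_rows(results, FIELDS)
```

Entries without a value at a path, such as the NMR placeholder above, get `None` rather than raising a `KeyError`.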

## Searching All Structures

If you're interested in archive-wide data, you can pass `ALL_STRUCTURES` as `input_ids` to request fields for every `entry` or `chem_comp` in the PDB. Because these queries cover the whole archive and are sent in batches, they take considerably longer to complete than queries on a short list of IDs.

In [None]:
from rcsbapi.data import ALL_STRUCTURES
from rcsbapi.data import DataQuery as Query

query = Query(
    input_type="chem_comps",
    input_ids=ALL_STRUCTURES,
    return_data_list=["drugbank_info.drugbank_id"]
)

# Set `progress_bar=True` to track the query's progress;
# the bar counts completed batches
results = query.exec(progress_bar=True)

# `ALL_STRUCTURES` can also be used with entries.
# The query below is resource-intensive, so it's commented out by default.
# query = Query(
#     input_type="entries",
#     input_ids=ALL_STRUCTURES,
#     return_data_list=["exptl.method"]
# )
# results = query.exec(progress_bar=True)
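The progress bar counts batches because a request for every structure is split into many smaller requests, so the total time grows roughly with the number of batches. A minimal sketch of that splitting; the batch size here is an arbitrary assumption, not the value `rcsb-api` uses:

```python
import math
from typing import Iterator, Sequence


def batched(ids: Sequence[str], batch_size: int) -> Iterator[list]:
    """Yield consecutive chunks of `ids`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        yield list(ids[start:start + batch_size])


# Placeholder IDs; 12 IDs in batches of 5 need ceil(12 / 5) = 3 requests
ids = [f"ID{i}" for i in range(12)]
batches = list(batched(ids, batch_size=5))
print([len(b) for b in batches])  # [5, 5, 2]
print(math.ceil(len(ids) / 5))    # 3
```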

## Visualizing and Manipulating Queries

Once you have constructed a query, you can visualize it in our Data API query editor by using the `get_editor_link` method.

In [None]:
from rcsbapi.data import DataQuery as Query

query = Query(input_type="entries", input_ids=["4HHB"], return_data_list=["exptl.method"])
print(query.get_editor_link())
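The editor shows your query as GraphQL, where each dot in a `return_data_list` path becomes a nested selection. The sketch below is a rough illustration of that mapping, not the package's own query builder; the text `rcsb-api` generates may differ (for example, it may expand partial paths or add fields). It assumes `entry_ids` as the argument name of the `entries` root field.

```python
def build_tree(paths: list) -> dict:
    """Merge dot-separated paths into a nested dict, sharing common prefixes."""
    tree: dict = {}
    for path in paths:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree


def render(tree: dict) -> str:
    """Render a nested dict as the body of a GraphQL selection set."""
    parts = []
    for name, children in tree.items():
        parts.append(f"{name} {{ {render(children)} }}" if children else name)
    return " ".join(parts)


def to_graphql(root: str, id_arg: str, ids: list, paths: list) -> str:
    """Build a one-line query of the form { root(id_arg: [...]) { ... } }."""
    id_list = ", ".join(f'"{i}"' for i in ids)
    return f"{{ {root}({id_arg}: [{id_list}]) {{ {render(build_tree(paths))} }} }}"


print(to_graphql("entries", "entry_ids", ["4HHB"], ["rcsb_id", "exptl.method"]))
# { entries(entry_ids: ["4HHB"]) { rcsb_id exptl { method } } }
```

Shared prefixes merge, so `["exptl.method", "exptl.details"]` renders as `exptl { method details }`.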

## Exploring the Schema

To explore the Data API schema from Python, use the `find_field_names` and `find_paths` methods of a `DataSchema` object.

In [None]:
from rcsbapi.data import DataSchema

# Initialize a schema object
schema = DataSchema()

# To search for fields use `find_field_names`
schema.find_field_names("ligand")

In [None]:
# Pick your intended field and find the path from your desired `input_type` using `find_paths`
schema.find_paths(
    input_type="entries",
    return_data_name="rcsb_ligand_neighbors"
)
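Conceptually, `find_field_names` matches a search term against field names, and `find_paths` walks the schema from an `input_type` to every place a field occurs. A toy version over a small hand-written schema fragment shows the idea. The fragment is illustrative and heavily truncated, and the real schema's types may refer back to each other, so a real search would also need to avoid revisiting types.

```python
# Toy, heavily truncated schema fragment (illustrative, not the real schema):
# each key is a field name, each value holds that field's subfields.
TOY_SCHEMA = {
    "entries": {
        "rcsb_id": {},
        "exptl": {"method": {}},
        "polymer_entities": {
            "rcsb_id": {},
            "polymer_entity_instances": {
                "rcsb_id": {},
                "rcsb_ligand_neighbors": {},
            },
        },
    },
}


def toy_find_field_names(schema: dict, term: str) -> list:
    """Return every distinct field name containing `term` (case-insensitive)."""
    names = set()

    def walk(node: dict) -> None:
        for name, children in node.items():
            if term.lower() in name.lower():
                names.add(name)
            walk(children)

    walk(schema)
    return sorted(names)


def toy_find_paths(schema: dict, input_type: str, field_name: str) -> list:
    """Return the dot-separated path to every occurrence of `field_name`."""
    found = []

    def walk(node: dict, prefix: list) -> None:
        for name, children in node.items():
            path = prefix + [name]
            if name == field_name:
                found.append(".".join(path))
            walk(children, path)

    walk(schema[input_type], [])
    return found


print(toy_find_field_names(TOY_SCHEMA, "ligand"))
print(toy_find_paths(TOY_SCHEMA, "entries", "rcsb_id"))
```

A field such as `rcsb_id` can occur at several depths, which is why a path search can return more than one candidate for you to choose from.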

You can also browse the schema in the Documentation Explorer of our [Data API query editor](https://data.rcsb.org/graphql/index.html).

## Further Documentation

For more extensive examples and implementation details, see our [documentation on Read the Docs](https://rcsbapi.readthedocs.io/en/latest/data_api/quickstart.html).