<a href="https://colab.research.google.com/github/rcsb/rcsb-training-resources/blob/master/training-events/2025/search_api_streamlining_access_to_rcsb_pdb_apis_with_python/data_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using `py-rcsb-api` to access RCSB PDB's Data API

In [None]:
# Install `py-rcsb-api`
%pip install --upgrade rcsb-api

## Creating a Data API Query

A few arguments are required to create a query:

`input_type`: defines the starting point of your query. Some examples include `entries`, `polymer_entities`, `chem_comps`.

`input_ids`: the specific identifiers of the given `input_type` that you would like to request data for. If you're unsure which `input_type` to choose, you can usually use `entries`.

|Type|PDB ID Format|Example|
|---|---|---|
|entries|entry id|4HHB|
|polymer, branched, or non-polymer entities|[entry_id]_[entity_id]|4HHB_1|
|polymer, branched, or non-polymer entity instances|[entry_id].[asym_id]|4HHB.A|
|biological assemblies|[entry_id]-[assembly_id]|4HHB-1|
|interfaces|[entry_id]-[assembly_id].[interface_id]|4HHB-1.1|

`return_data_list`: the data fields to request for each of the given `input_ids`.
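
If you're not sure which kind of ID you're holding, the formats in the table above can be told apart with a few regular expressions. The sketch below is only an illustration: `classify_id` and `ID_PATTERNS` are hypothetical helpers, not part of `py-rcsb-api`, and the patterns assume classic four-character PDB IDs, so CSM IDs and other identifier styles will come back as `None`.

```python
import re

# Assumption: a classic PDB entry ID is a digit followed by three alphanumerics.
ENTRY = r"[0-9][A-Za-z0-9]{3}"

# One pattern per row of the table above (labels are illustrative).
ID_PATTERNS = [
    ("entry", re.compile(rf"{ENTRY}")),
    ("entity", re.compile(rf"{ENTRY}_[0-9]+")),
    ("entity instance", re.compile(rf"{ENTRY}\.[A-Za-z0-9]+")),
    ("assembly", re.compile(rf"{ENTRY}-[0-9]+")),
    ("interface", re.compile(rf"{ENTRY}-[0-9]+\.[0-9]+")),
]


def classify_id(identifier):
    """Return the table row an identifier matches, or None if it matches none."""
    for label, pattern in ID_PATTERNS:
        # `fullmatch` requires the whole string to fit the pattern.
        if pattern.fullmatch(identifier):
            return label
    return None


for example in ["4HHB", "4HHB_1", "4HHB.A", "4HHB-1", "4HHB-1.1"]:
    print(example, "->", classify_id(example))
```

Note that the ID alone can't tell you whether an entity is polymer, branched, or non-polymer, so you still need to choose the matching `input_type` yourself.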

We'll start by making a Data API query to find the experimental method used to determine PDB entry 4HHB.

In [None]:
from rcsbapi.data import DataQuery as Query

# Create a `DataQuery`/`Query` object
query = Query(
    input_type="entries",
    input_ids=["4HHB"],  # CSM IDs can be used as well
    return_data_list=["exptl.method"]
)

results = query.exec()
print(results)
# print(query.get_response()) would be equivalent
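
The results come back as a nested dictionary, so you'll usually want to pull out just the values you asked for. The sketch below uses a hand-written `sample_results` dictionary so it runs without a network connection. Its shape (a top-level `data` key, a list under `entries`, and an `rcsb_id` on each entry) is an assumption about what the query above returns, and `methods_by_entry` is a hypothetical helper, so compare it against your own printed output before relying on it.

```python
# Assumed shape of the response for the 4HHB query above (illustrative only).
sample_results = {
    "data": {
        "entries": [
            {"rcsb_id": "4HHB", "exptl": [{"method": "X-RAY DIFFRACTION"}]}
        ]
    }
}


def methods_by_entry(results):
    """Map each entry ID to the list of its experimental methods."""
    return {
        # `exptl` is a list because an entry can have more than one method.
        entry["rcsb_id"]: [exptl["method"] for exptl in entry.get("exptl") or []]
        for entry in results["data"]["entries"]
    }


print(methods_by_entry(sample_results))
```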

By using the Search API and Data API together, you can refine a list of IDs that are of interest to you and then request data on those particular structures.

In the example below, I select human structures that are associated with the phrase "interleukin" and have investigational or experimental drugs. Once I've narrowed down my structures of interest, I request each structure's experimental method and resolution.

In [None]:
from rcsbapi.search import TextQuery
from rcsbapi.search import search_attributes as attrs

# Query for structures associated with phrase "interleukin" from Homo sapiens with investigational or experimental drugs

q1 = TextQuery("interleukin")
# You can also make `AttributeQuery`s using `search_attributes` object and operators
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
q3 = attrs.drugbank_info.drug_groups == "investigational"
q4 = attrs.drugbank_info.drug_groups == "experimental"

query = q1 & q2 & (q3 | q4)
results = query()

# Get IDs from Search API query
id_list = list(results)

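# Use `id_list` to make a Data API query for experimental method and resolution
# (`Query` is the `DataQuery` alias imported in the earlier cell)
data_query = Query(
    input_type="entries",
    input_ids=id_list,
    return_data_list=["exptl.method", "rcsb_entry_info.resolution_combined"]
)
print(data_query.exec())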

If you're interested in archive-wide data, you can use `ALL_STRUCTURES` to request fields for every `entry` or `chem_comp` in the PDB. Because these queries cover the whole archive, they take noticeably longer to complete than queries for a handful of structures.
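
As a rough sketch of an archive-wide request, the cell below asks for the experimental method of every entry. It assumes that `ALL_STRUCTURES` can be imported from `rcsbapi.data` and passed as `input_ids` in place of a list of IDs. `request_method_for_all_entries` is a hypothetical wrapper name, used here so that nothing is sent until you call it explicitly.

```python
def request_method_for_all_entries():
    """Request `exptl.method` for every entry in the archive (slow, large response)."""
    # Imported here so that defining the function sends no requests.
    # Assumption: `ALL_STRUCTURES` is exported by `rcsbapi.data`.
    from rcsbapi.data import ALL_STRUCTURES, DataQuery

    query = DataQuery(
        input_type="entries",
        input_ids=ALL_STRUCTURES,  # stands in for an explicit list of IDs
        return_data_list=["exptl.method"],
    )
    return query.exec()


# Uncomment to run the archive-wide query:
# all_results = request_method_for_all_entries()
```

Keeping `return_data_list` short is a sensible default here, since every extra field is fetched for every structure in the archive.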