<a href="https://colab.research.google.com/github/rcsb/rcsb-training-resources/blob/master/training-events/2025/python-rcsb-api/search_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using `rcsb-api` to access RCSB PDB's Search API

In [None]:
# Install `rcsb-api`
%pip install --upgrade rcsb-api

## Creating a Search API Query

We'll start by using `TextQuery` and `AttributeQuery` to search for PDB entries that match the phrase "Hemoglobin" and come from *Homo sapiens*.

In [None]:
from rcsbapi.search import TextQuery, AttributeQuery

# Search for structures associated with the phrase "Hemoglobin"
q1 = TextQuery(value="Hemoglobin")
# Search for structures with Homo sapiens as a source organism
q2 = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",  # Other operators include "contains_phrase", "exists", and more
    value="Homo sapiens"
)

# Use operators to combine queries
# & = AND
# | = OR
# ~ = NOT
query = q1 & q2
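
To make the operator semantics concrete, here is a minimal, standard-library-only sketch of how `&`, `|`, and `~` can be overloaded to build a query tree. The `Node`, `Term`, and `Group` classes are hypothetical stand-ins written for this illustration. They are not the classes `rcsb-api` uses internally, and the dictionary layout only loosely mirrors the Search API's JSON.

```python
from dataclasses import dataclass


class Node:
    """Hypothetical base class giving every query node the & and | operators."""

    def __and__(self, other):
        return Group("and", [self, other])

    def __or__(self, other):
        return Group("or", [self, other])


@dataclass
class Term(Node):
    """A single condition on one attribute (illustrative, not rcsb-api's class)."""

    attribute: str
    value: str
    negation: bool = False

    def __invert__(self):
        # ~term flips the negation flag instead of wrapping the term
        return Term(self.attribute, self.value, not self.negation)

    def to_dict(self):
        return {
            "type": "terminal",
            "attribute": self.attribute,
            "value": self.value,
            "negation": self.negation,
        }


@dataclass
class Group(Node):
    """Several nodes joined by one logical operator."""

    logical_operator: str
    nodes: list

    def __invert__(self):
        # De Morgan: NOT (a AND b) == (NOT a) OR (NOT b), and vice versa
        flipped = "or" if self.logical_operator == "and" else "and"
        return Group(flipped, [~node for node in self.nodes])

    def to_dict(self):
        return {
            "type": "group",
            "logical_operator": self.logical_operator,
            "nodes": [node.to_dict() for node in self.nodes],
        }


human = Term("rcsb_entity_source_organism.scientific_name", "Homo sapiens")
mouse = Term("rcsb_entity_source_organism.scientific_name", "Mus musculus")

# ~ binds more tightly than &, so this reads as: human AND (NOT mouse)
toy_query = human & ~mouse
print(toy_query.to_dict())
```

Because `~` has higher precedence than `&` and `|` in Python, and `&` has higher precedence than `|`, it is safest to add parentheses whenever a query mixes AND and OR.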

In [None]:
# Execute the query by running it as a function
results = query()

In [None]:
# Results are returned as an iterator of result identifiers.
for rid in results:
    print(rid)

In [None]:
# You can also convert the results to a list
list(results)

By default, queries return only the IDs of experimentally determined models. You can control whether Computed Structure Models (CSMs) are returned through the `return_content_type` parameter.

In [None]:
# Using the above query, return both experimental models and CSMs
results = query(return_content_type=["computational", "experimental"])
list(results)

The Search API offers many other types of search besides `TextQuery` and `AttributeQuery`, and the Python package supports these as well.

|Search service                    |QueryType                 |
|----------------------------------|--------------------------|
|Full-text                         |`TextQuery()`             |
|Attribute (structure or chemical) |`AttributeQuery()`        |
|Sequence similarity               |`SeqSimilarityQuery()`    |
|Sequence motif                    |`SeqMotifQuery()`         |
|Structure similarity              |`StructSimilarityQuery()` |
|Structure motif                   |`StructMotifQuery()`      |
|Chemical similarity               |`ChemSimilarityQuery()`   |

Search types can be combined for more specific results. Below we will use sequence similarity and attribute search to identify polymer entities that share at least 90% sequence identity with the GTPase HRas protein from humans. We'll use attribute search to exclude structures that have more than one mutation.

In [None]:
from rcsbapi.search import AttributeQuery, SeqSimilarityQuery

q1 = SeqSimilarityQuery(
    value="MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH",
    identity_cutoff=0.9
)
# Keep only entities with at most one mutation
q2 = AttributeQuery(
    attribute="entity_poly.rcsb_mutation_count",
    operator="less_or_equal",
    value=1
)

query = q1 & q2
# For sequence similarity search, return type should be "polymer_entity"
results = query(return_type="polymer_entity")

print(list(results)[:10])
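
To give a feel for what an `identity_cutoff` of 0.9 means, the sketch below computes the fraction of identical residues between two already-aligned sequences. This is only an illustration of the concept: `fraction_identity` is a hypothetical helper, the one-residue variant is made up, and the Search API may define and compute identity differently.

```python
def fraction_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of non-gap alignment columns in which both residues are the same."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return matches / compared if compared else 0.0


hras_start = "MTEYKLVVVG"  # first 10 residues of the HRas sequence used above
variant = "MTEYKLVVVA"     # made-up variant with one substitution (G -> A)

identity = fraction_identity(hras_start, variant)
print(identity, identity >= 0.9)
```

With one mismatch in ten compared positions, the variant sits exactly at the 0.9 threshold.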


Another useful search type for comparing structures is structure similarity search. You can use structure similarity search with PDB entry IDs, file URLs, or local files.

In [None]:
from rcsbapi.search import StructSimilarityQuery

# Using file_url
q3 = StructSimilarityQuery(
    structure_search_type="file_url",
    file_url="https://files.rcsb.org/view/4HHB.cif",
    file_format="cif"
)
list(q3())

# You can also search using a local file. Check our documentation for examples.

For more examples using these and other search types, check out our [readthedocs](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html).

## Faceted Queries

Both the Search API and the `rcsb-api` package support `facets`, which group results into buckets based on the values of an attribute. This lets you calculate statistics for your results.

For example, you can search for how many models released since 2000 were determined by each experimental method.

In [None]:
from rcsbapi.search import AttributeQuery
from rcsbapi.search import Facet

# Define the query
q = AttributeQuery(
    attribute="rcsb_accession_info.initial_release_date",
    operator="greater",
    value="2000-01-01",
)

In [None]:
# Add a facet when executing the query
results = q(
    facets=Facet(
        name="Experimental Methods",
        aggregation_type="terms",
        attribute="exptl.method",
        min_interval_population=1000
    )
)

In [None]:
# Accessing facet results
results.facets
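
Once you have facet results, a common next step is to turn the bucket counts into shares of the total. The sketch below assumes the facets come back as a list of dictionaries with a `name` and a list of `buckets`, each bucket holding a `label` and a `population`. That shape is an assumption based on the Search API's facet response, and the labels and counts here are placeholders, not real data. `bucket_shares` is a hypothetical helper.

```python
# Placeholder facet output: the shape is assumed and the numbers are made up.
sample_facets = [
    {
        "name": "Experimental Methods",
        "buckets": [
            {"label": "METHOD B", "population": 3000},
            {"label": "METHOD A", "population": 6000},
            {"label": "METHOD C", "population": 1000},
        ],
    }
]


def bucket_shares(facets, name):
    """Return {label: share of total} for the named facet, largest bucket first."""
    facet = next(f for f in facets if f["name"] == name)
    total = sum(bucket["population"] for bucket in facet["buckets"])
    ranked = sorted(facet["buckets"], key=lambda b: b["population"], reverse=True)
    return {bucket["label"]: bucket["population"] / total for bucket in ranked}


shares = bucket_shares(sample_facets, "Experimental Methods")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

To use this with real results, you would pass `results.facets` in place of `sample_facets`, after checking that its structure matches the assumption above.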

For more information on using `facets`, check out our [API documentation](https://search.rcsb.org/#using-facets) and [readthedocs](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html#faceted-query-examples).

You can find additional examples utilizing other API features like [grouping results](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html#groupby-example) and [sorting results](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html#sort-example) on our readthedocs.

## Visualizing and Manipulating Queries

Once you have a query constructed, you can look at the full query in our Search API query editor or in the advanced query builder.

In [19]:
from rcsbapi.search import TextQuery
from rcsbapi.search import search_attributes as attrs

# Query for structures associated with the phrase "interleukin" from Homo sapiens with investigational or experimental drugs

q1 = TextQuery("interleukin")
# You can also build `AttributeQuery` objects using the `search_attributes` object and comparison operators
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
q3 = attrs.drugbank_info.drug_groups == "investigational"
q4 = attrs.drugbank_info.drug_groups == "experimental"

# Construct the query using AND and OR operators
query = q1 & q2 & (q3 | q4)
results = query()

In [20]:
# Get link to Search API query editor
results.get_editor_link()

'https://search.rcsb.org/query-editor.html?json=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22full_text%22%2C%22parameters%22%3A%7B%22value%22%3A%22interleukin%22%7D%2C%22node_id%22%3A0%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.scientific_name%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22Homo%20sapiens%22%7D%2C%22node_id%22%3A1%7D%5D%7D%2C%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22or%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_groups%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22investigational%22%7D%2C%22nod

In [21]:
# Get a link to the advanced query builder populated with the query
# From the builder, you can edit the query and also open the query editor
# TODO: not populating correctly?
results.get_query_builder_link()

'https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22full_text%22%2C%22parameters%22%3A%7B%22value%22%3A%22interleukin%22%7D%2C%22node_id%22%3A0%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.scientific_name%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22Homo%20sapiens%22%7D%2C%22node_id%22%3A1%7D%5D%7D%2C%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22or%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text_chem%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22drugbank_info.drug_groups%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22investigational%22%7D%2C%22node_id%22%3A0
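
The query editor link shown earlier in this section is the query serialized as compact JSON and then percent-encoded into the `json` URL parameter. The sketch below rebuilds a simplified version of that link with the standard library, using two of the terminal nodes visible in the output. It is an illustration rather than the package's implementation: the real link also carries fields such as `node_id` and nests the groups differently.

```python
import json
from urllib.parse import quote, unquote

# Simplified payload: a full-text node AND an attribute node, as seen in the link above.
payload = {
    "query": {
        "type": "group",
        "logical_operator": "and",
        "nodes": [
            {
                "type": "terminal",
                "service": "full_text",
                "parameters": {"value": "interleukin"},
            },
            {
                "type": "terminal",
                "service": "text",
                "parameters": {
                    "attribute": "rcsb_entity_source_organism.scientific_name",
                    "operator": "exact_match",
                    "negation": False,
                    "value": "Homo sapiens",
                },
            },
        ],
    }
}

EDITOR_URL = "https://search.rcsb.org/query-editor.html?json="

# Compact separators avoid spaces; quote() with safe="" encodes every reserved character.
compact = json.dumps(payload, separators=(",", ":"))
link = EDITOR_URL + quote(compact, safe="")
print(link)

# Decoding the link recovers the original query.
decoded = json.loads(unquote(link[len(EDITOR_URL):]))
print(decoded == payload)
```

The same decoding step is handy in the other direction: applying `unquote` and `json.loads` to a link copied from the browser lets you inspect the query it encodes.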

## Exploring the API Schema

The package offers ways to explore attributes and their descriptions.

In [22]:
from rcsbapi.search import search_attributes as attrs

# Search attributes based on a string or regex pattern
attrs.search("ligand")

[Attr(attribute='rcsb_ligand_neighbors.alt_id', type='text', description='Alternate conformer identifier for the target instance.'),
 Attr(attribute='rcsb_ligand_neighbors.atom_id', type='text', description='The atom identifier for the target instance.'),
 Attr(attribute='rcsb_ligand_neighbors.auth_seq_id', type='text', description='The author residue index for the target instance.'),
 Attr(attribute='rcsb_ligand_neighbors.comp_id', type='text', description='Component identifier for the target instance.'),
 Attr(attribute='rcsb_ligand_neighbors.distance', type='text', description='Distance value for this ligand interaction.'),
 Attr(attribute='rcsb_ligand_neighbors.ligand_alt_id', type='text', description='Alternate conformer identifier for the ligand interaction.'),
 Attr(attribute='rcsb_ligand_neighbors.ligand_asym_id', type='text', description='The entity instance identifier for the ligand interaction.'),
 Attr(attribute='rcsb_ligand_neighbors.ligand_atom_id', type='text', descripti

In [23]:
# If you already know the name of the attribute, you can search using `get_attribute_details`
attrs.get_attribute_details(attribute="rcsb_ligand_neighbors.ligand_is_bound")

Attr(attribute='rcsb_ligand_neighbors.ligand_is_bound', type='text', description='A flag to indicate the nature of the ligand interaction is covalent or metal-coordination.')
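
Since the search accepts a regex pattern, you can narrow results more precisely than with a plain substring. The sketch below shows the idea with the standard library on a handful of attribute names copied from the output above. `search_names` is a hypothetical helper that matches names only. Whether `attrs.search` also looks at descriptions is not shown in this notebook.

```python
import re

# Attribute names taken from the `attrs.search("ligand")` output above.
ATTRIBUTE_NAMES = [
    "rcsb_ligand_neighbors.alt_id",
    "rcsb_ligand_neighbors.atom_id",
    "rcsb_ligand_neighbors.distance",
    "rcsb_ligand_neighbors.ligand_alt_id",
    "rcsb_ligand_neighbors.ligand_asym_id",
    "rcsb_ligand_neighbors.ligand_atom_id",
    "rcsb_ligand_neighbors.ligand_is_bound",
]


def search_names(pattern, names):
    """Return the names that contain a match for the regex pattern."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# A plain string matches every name containing "ligand"
print(len(search_names("ligand", ATTRIBUTE_NAMES)))

# A regex can target only the ligand-side identifier fields
ligand_ids = search_names(r"\.ligand_\w+_id$", ATTRIBUTE_NAMES)
print(ligand_ids)
```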

You can also look at our Search API documentation. Attribute information is split into [structure attributes](https://search.rcsb.org/structure-search-attributes.html) and [chemical attributes](https://search.rcsb.org/chemical-search-attributes.html).

If you've built a query in the advanced query builder and would like to know the corresponding Search API attribute, you can check this [attribute details](https://www.rcsb.org/docs/search-and-browse/advanced-search/attribute-details) page.

## Further Documentation

See our [readthedocs](https://rcsbapi.readthedocs.io/en/latest/search_api/api.html) page for additional examples.