<a href="https://colab.research.google.com/github/rcsb/rcsb-training-resources/blob/master/training-events/2025/python-rcsb-api/search_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using `rcsb-api` to access RCSB PDB's Search API

In [None]:
# Install `rcsb-api`
%pip install --upgrade rcsb-api

## Creating a Search API Query

We'll start by using `TextQuery` and `AttributeQuery` to search for the PDB IDs of structures that are associated with the phrase "Hemoglobin" and whose source organism is Homo sapiens.

In [None]:
from rcsbapi.search import TextQuery, AttributeQuery

# Search for structures associated with the phrase "Hemoglobin"
q1 = TextQuery(value="Hemoglobin")
# Search for structures with Homo sapiens as a source organism
q2 = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",  # Other operators include "contains_phrase", "exists", and more
    value="Homo sapiens"
)

# Use operators to combine queries
# & = AND
# | = OR
# ~ = NOT
query = q1 & q2

In [None]:
# Execute the query by running it as a function
results = query()

In [None]:
# Results are returned as an iterator of result identifiers.
for rid in results:
    print(rid)

In [None]:
# You can also convert the results to a list
list(results)
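
For orientation, the combined query above roughly corresponds to the JSON request body that the Search API accepts. The sketch below builds such a body by hand using only the standard library. It does not call `rcsb-api` or the network, the helper names `terminal` and `group` are hypothetical, and the payload the package actually generates may contain additional fields.

```python
import json


def terminal(service, **parameters):
    """Build a terminal (leaf) node for one search service (hypothetical helper)."""
    return {"type": "terminal", "service": service, "parameters": parameters}


def group(logical_operator, *nodes):
    """Combine nodes with "and" or "or" (hypothetical helper)."""
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


# Full-text search, analogous to TextQuery(value="Hemoglobin")
n1 = terminal("full_text", value="Hemoglobin")

# Attribute search, analogous to the AttributeQuery above
n2 = terminal(
    "text",
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",
    value="Homo sapiens",
)

# `q1 & q2` becomes a group node whose logical_operator is "and"
request_body = {"query": group("and", n1, n2), "return_type": "entry"}
print(json.dumps(request_body, indent=2))
```

Seeing the query as nested nodes can help when reading the output of the query editor, since each `&` or `|` in Python maps to one group node.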

By default, queries return only the IDs of experimentally determined models. You can control whether Computed Structure Models (CSMs) are also returned through the `return_content_type` parameter.

In [None]:
# Using the above query, return both experimental models and CSMs
results = query(return_content_type=["computational", "experimental"])
list(results)
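
At the Search API level, this choice travels in the request's `request_options` under the `results_content_type` key. The following sketch shows the idea with plain dictionaries. The helper `with_content_types` is hypothetical (it is not part of `rcsb-api`), and the query node is a minimal stand-in rather than the full query above.

```python
import json

# The two content types mentioned in this section
ALLOWED_CONTENT_TYPES = {"experimental", "computational"}


def with_content_types(request_body, content_types):
    """Return a copy of request_body that asks for the given content types.

    Hypothetical helper for illustration; the original dict is left unchanged.
    """
    unknown = set(content_types) - ALLOWED_CONTENT_TYPES
    if unknown:
        raise ValueError(f"Unknown content type(s): {sorted(unknown)}")
    updated = dict(request_body)
    options = dict(updated.get("request_options", {}))
    options["results_content_type"] = list(content_types)
    updated["request_options"] = options
    return updated


# Minimal stand-in request: a single full-text terminal node
base_body = {
    "query": {
        "type": "terminal",
        "service": "full_text",
        "parameters": {"value": "Hemoglobin"},
    },
    "return_type": "entry",
}

# Ask for both experimental models and CSMs
both = with_content_types(base_body, ["computational", "experimental"])
print(json.dumps(both["request_options"], indent=2))
```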

The Search API offers many other types of search besides `TextQuery` and `AttributeQuery`, and the Python package supports these as well.

For examples of the query types below, see our [readthedocs](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html) page.

|Search service                    |QueryType                 |
|----------------------------------|--------------------------|
|Full-text                         |`TextQuery()`             |
|Attribute (structure or chemical) |`AttributeQuery()`        |
|Sequence similarity               |`SeqSimilarityQuery()`    |
|Sequence motif                    |`SeqMotifQuery()`         |
|Structure similarity              |`StructSimilarityQuery()` |
|Structure motif                   |`StructMotifQuery()`      |
|Chemical similarity               |`ChemSimilarityQuery()`   |

Our Search API and Search API package support `facets`, which will sort results into buckets based on the returned values. This allows you to calculate statistics for your results.

For example, you can search for how many models released since 2000 were determined by each experimental method.

In [17]:
from rcsbapi.search import AttributeQuery, Facet

# Define the query
q = AttributeQuery(
    attribute="rcsb_accession_info.initial_release_date",
    operator="greater",
    value="2000-01-01",
)

In [18]:
# Add a facet when executing the query
results = q(
    facets=Facet(
        name="Experimental Methods",
        aggregation_type="terms",
        attribute="exptl.method",
        min_interval_population=1000
    )
)

In [None]:
# Accessing facet results
results.facets
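
Once you have facet results, a little post-processing turns the buckets into statistics. The sketch below assumes the facet results follow the Search API response shape: a list of facets, each with a `name` and a list of `buckets` holding a `label` and a `population`. The sample data is made up for illustration (real counts will differ), and `bucket_fractions` is a hypothetical helper, not part of `rcsb-api`.

```python
# Made-up sample data in the assumed shape; real counts will differ.
sample_facets = [
    {
        "name": "Experimental Methods",
        "buckets": [
            {"label": "X-RAY DIFFRACTION", "population": 900},
            {"label": "ELECTRON MICROSCOPY", "population": 80},
            {"label": "SOLUTION NMR", "population": 20},
        ],
    }
]


def bucket_fractions(facets, name):
    """Return {label: fraction of the facet's total} for the named facet.

    Hypothetical helper for illustration.
    """
    for facet in facets:
        if facet["name"] == name:
            total = sum(bucket["population"] for bucket in facet["buckets"])
            if total == 0:
                return {}
            return {
                bucket["label"]: bucket["population"] / total
                for bucket in facet["buckets"]
            }
    raise KeyError(f"No facet named {name!r}")


fractions = bucket_fractions(sample_facets, "Experimental Methods")
for label, fraction in fractions.items():
    print(f"{label}: {fraction:.1%}")
```

Note that the fractions are relative to the buckets returned, so a `min_interval_population` threshold that drops small buckets also removes them from the total.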

For more information on using `facets`, check out our [API documentation](https://search.rcsb.org/#using-facets) and [readthedocs](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html#faceted-query-examples).

You can find additional examples utilizing other API features like [grouping results](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html#groupby-example) and [sorting results](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html#sort-example) on our readthedocs.

## Visualizing and Manipulating Queries

Once you have constructed a query, you can view the full query in our Search API query editor or in the advanced query builder.

In [None]:
from rcsbapi.search import TextQuery
from rcsbapi.search import search_attributes as attrs

# Query for structures associated with phrase "interleukin" from Homo sapiens with investigational or experimental drugs

q1 = TextQuery("interleukin")
# You can also build `AttributeQuery` objects using the `search_attributes` object and comparison operators
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
q3 = attrs.drugbank_info.drug_groups == "investigational"
q4 = attrs.drugbank_info.drug_groups == "experimental"

# Construct the query using AND and OR operators
query = q1 & q2 & (q3 | q4)
results = query()

In [None]:
# Get link to Search API query editor
results.get_editor_link()

In [None]:
# Get link to advanced query builder populated with query
# From the builder, you can edit the query and also open the query editor
# TODO: not populating correctly?
results.get_query_builder_link()

## Exploring the API Schema

The package offers ways to explore attributes and their descriptions.

In [None]:
from rcsbapi.search import search_attributes as attrs

# Search attributes based on a string or regex pattern
attrs.search("ligand")

In [None]:
# If you already know the name of the attribute, you can look it up with `get_attribute_details`
attrs.get_attribute_details(attribute="rcsb_ligand_neighbors.ligand_is_bound")

You can also look at our Search API documentation. Attribute information is split into [structure attributes](https://search.rcsb.org/structure-search-attributes.html) and [chemical attributes](https://search.rcsb.org/chemical-search-attributes.html).

If you've built a query in the advanced query builder and would like to know the corresponding Search API attribute, you can check this [attribute details](https://www.rcsb.org/docs/search-and-browse/advanced-search/attribute-details) page.

## Further Documentation

See our [readthedocs](https://rcsbapi.readthedocs.io/en/latest/search_api/api.html) page for additional examples using different search services and features like facets and grouping.