
# Using `rcsb-api` to access RCSB PDB's Search API

In [2]:
# Install `rcsb-api`
%pip install --upgrade rcsb-api

Collecting rcsb-api
  Downloading rcsb_api-1.1.2-py2.py3-none-any.whl.metadata (10 kB)
Collecting rustworkx (from rcsb-api)
  Downloading rustworkx-0.16.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting graphql-core (from rcsb-api)
  Downloading graphql_core-3.2.6-py3-none-any.whl.metadata (11 kB)
Downloading rcsb_api-1.1.2-py2.py3-none-any.whl (45 kB)
Downloading graphql_core-3.2.6-py3-none-any.whl (203 kB)
Downloading rustworkx-0.16.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Installing collected packages: rustworkx, graphql-core, rcsb-api
Successfully insta

## Creating a Search API Query

We'll start by using `TextQuery` and `AttributeQuery` to search for PDB IDs of structures that are associated with the phrase "Hemoglobin" and whose source organism is *Homo sapiens*.

In [3]:
from rcsbapi.search import TextQuery, AttributeQuery

# Search for structures associated with the phrase "Hemoglobin"
q1 = TextQuery(value="Hemoglobin")
# Search for structures with Homo sapiens as a source organism
q2 = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",  # Other operators include "contains_phrase", "exists", and more
    value="Homo sapiens"
)

# Use operators to combine queries
# & = AND
# | = OR
# ~ = NOT
query = q1 & q2
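To make the effect of `&` more concrete, the sketch below builds the kind of JSON payload that the Search API's query language uses: `terminal` nodes combined under a `group` node. The helper names (`text_node`, `attribute_node`, `group`) are hypothetical and exist only for this illustration; they are not part of `rcsb-api`, and the payload the package actually sends may differ in detail.

```python
import json


def text_node(value):
    """Full-text terminal node (hypothetical helper, roughly what TextQuery expresses)."""
    return {"type": "terminal", "service": "full_text", "parameters": {"value": value}}


def attribute_node(attribute, operator, value):
    """Attribute terminal node (hypothetical helper, roughly what AttributeQuery expresses)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def group(logical_operator, *nodes):
    """Combine nodes with "and" or "or", mirroring the & and | operators."""
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


payload = {
    "query": group(
        "and",
        text_node("Hemoglobin"),
        attribute_node(
            "rcsb_entity_source_organism.scientific_name", "exact_match", "Homo sapiens"
        ),
    ),
    "return_type": "entry",
}

print(json.dumps(payload, indent=2))
```

Nesting `group(...)` calls inside one another gives the same shape as parenthesized expressions such as `q1 & (q2 | q3)`.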

In [4]:
# Execute the query by running it as a function
results = query()

In [5]:
# Results are returned as an iterator of result identifiers.
for rid in results:
    print(rid)

1SHR
1SI4
1Y01
1Z8U
1W09
1W0A
1W0B
3IA3
1I3D
1I3E
1FDH
1FN3
1XZY
4MQJ
6NQ5
6LCW
6LCX
3D7O
3ONZ
5X2R
5X2U
1JEB
7QU4
1YZI
3OO5
4MQK
1MKO
1NQP
6KA9
6KAE
6KAH
6KAI
1G9V
1YH9
1YHE
1YHR
2D5Z
2D60
3DUT
3S66
6BB5
6BWP
6DI4
6KAS
6KAT
6KAU
6KAV
6L5X
6L5Y
4N7N
4N7O
4N7P
1A9W
5X2S
5X2T
1ABY
1QSH
1QSI
1RPS
1XXT
1Y0D
1Y46
1Y4F
1Y4G
1Y4P
1Y85
3D17
3KMF
3OO4
4N8T
4ROL
4ROM
5WOG
5WOH
1J3Y
1UIW
3IC0
3IC2
3S65
5E29
5HY8
1LFQ
1LFT
1LFV
1LFY
1LFZ
3P5Q
1HAB
1HAC
1HBB
1NEJ
1RQ4
1XY0
1XZ5
1XZ7
1XZU
1XZV
1Y09
1Y0A
1Y0C
1Y0T
1Y22
1Y2Z
1Y31
1Y35
1Y45
1Y4B
1Y4Q
1Y4R
1Y4V
1Y5F
1Y5J
1Y5K
1Y7C
1Y7D
1Y7G
1Y7Z
1Y83
1YE1
2DN1
2DN2
2H35
3NL7
4M4A
4M4B
5UCU
6BNR
6KAO
6KAP
6KAQ
6KAR
6L5V
6L5W
6XD9
6XDT
6XE7
1J3Z
1J40
1J41
1LFL
1YFF
5JDO
1BAB
1BBB
1BIJ
1BUW
1BZ0
1CMY
1DKE
1M9P
1R1X
1R1Y
1SDK
1SDL
1XYE
2HBE
2HHD
3HXN
4HHB
4L7Y
4MQC
4MQG
4MQH
4MQI
4NI0
4NI1
6BWU
7DY3
7DY4
1GBV
1HBS
1VWT
1Y8W
1YDZ
1YE0
1YE2
1YEN
1YEO
1YEQ
1YEU
1YEV
1YG5
1YGD
1YGF
1YIE
1YIH
2M6Z
2YRS
3NMM
5KSI
5KSJ
5U3I
5UFJ
6HBW
6HK2
7JJQ
1JY7
3B75
6KYE
1CLS


In [6]:
# Can also convert results to a list
# Show first 10 results
list(results)[:10]

['1SHR',
 '1SI4',
 '1Y01',
 '1Z8U',
 '1W09',
 '1W0A',
 '1W0B',
 '3IA3',
 '1I3D',
 '1I3E']
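If you only need the first few identifiers, you may not need to build the full list first. A minimal sketch, assuming `results` behaves like an ordinary Python iterable (as the `for` loop above suggests): `itertools.islice` stops consuming after the requested number of items. The `fake_results` generator is a hypothetical stand-in so the example runs without network access.

```python
from itertools import islice


def fake_results():
    """Stand-in for `query()`; yields the first few IDs shown above."""
    for rid in ["1SHR", "1SI4", "1Y01", "1Z8U", "1W09"]:
        yield rid


# Take only the first three identifiers without materializing the rest
first_three = list(islice(fake_results(), 3))
print(first_three)
```

With a real query you would pass `results` to `islice` in place of `fake_results()`.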

By default, queries return only the IDs of experimentally determined models. To include Computed Structure Models (CSMs) in the results, pass the `return_content_type` parameter when executing the query.

In [7]:
# Using the above query, return both experimental models and CSMs
results = query(return_content_type=["computational", "experimental"])

# Show first 10 results
list(results)[:10]

['1SHR',
 '1SI4',
 'AF_AFP09105F1',
 '1Y01',
 '1Z8U',
 '1W09',
 '1W0A',
 '1W0B',
 '3IA3',
 'AF_AFP69891F1']
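When experimental models and CSMs come back in one list, it can be handy to separate them. The sketch below assumes that CSM identifiers carry a source prefix such as `AF_` (as in the output above) or `MA_`, while experimental entries use plain PDB IDs. The `is_csm` helper and the prefix list are illustrative assumptions, not part of `rcsb-api`.

```python
# Assumed CSM identifier prefixes; adjust if your results use others
CSM_PREFIXES = ("AF_", "MA_")


def is_csm(identifier):
    """Return True if the identifier looks like a Computed Structure Model ID."""
    return identifier.upper().startswith(CSM_PREFIXES)


# The first 10 results shown above
ids = ["1SHR", "1SI4", "AF_AFP09105F1", "1Y01", "1Z8U",
       "1W09", "1W0A", "1W0B", "3IA3", "AF_AFP69891F1"]

experimental = [i for i in ids if not is_csm(i)]
computed = [i for i in ids if is_csm(i)]

print("Experimental:", experimental)
print("Computed:", computed)
```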

The Search API offers several other search types besides full-text and attribute search, and the Python package supports all of the ones listed below.

|Search service                    |QueryType                 |
|----------------------------------|--------------------------|
|Full-text                         |`TextQuery()`             |
|Attribute (structure or chemical) |`AttributeQuery()`        |
|Sequence similarity               |`SeqSimilarityQuery()`    |
|Sequence motif                    |`SeqMotifQuery()`         |
|Structure similarity              |`StructSimilarityQuery()` |
|Structure motif                   |`StructMotifQuery()`      |
|Chemical similarity               |`ChemSimilarityQuery()`   |

Search types can be combined for more specific results. Below, we use sequence similarity search to identify polymer entities that share at least 90% sequence identity with the human GTPase HRas protein, and attribute search to exclude entities that have more than one mutation.

In [8]:
from rcsbapi.search import AttributeQuery, SeqSimilarityQuery

q1 = SeqSimilarityQuery(
    value="MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH",
    identity_cutoff=0.9
)
# We would like to look at structures with few or no mutations
q2 = AttributeQuery(
    attribute="entity_poly.rcsb_mutation_count",
    operator="less_or_equal",
    value=1
)

query = q1 & q2
# For sequence similarity search, return type should be "polymer_entity"
results = query(return_type="polymer_entity")

# Print first 10 results
print(list(results)[:10])


['121P_1', '1AA9_1', '1BKD_1', '1CRP_1', '1CRQ_1', '1CRR_1', '1CTQ_1', '1GNP_1', '1GNQ_1', '1GNR_1']
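The polymer entity identifiers above appear to combine an entry ID and an entity number, joined by an underscore. A small sketch, assuming that format holds for all results, splits them apart and groups entity numbers by entry. The `split_entity_id` helper is hypothetical and not part of `rcsb-api`.

```python
from collections import defaultdict


def split_entity_id(entity_id):
    """Split "121P_1" into ("121P", 1), using the last underscore as the separator."""
    entry_id, entity_number = entity_id.rsplit("_", 1)
    return entry_id, int(entity_number)


# The first 10 results shown above
entity_ids = ["121P_1", "1AA9_1", "1BKD_1", "1CRP_1", "1CRQ_1",
              "1CRR_1", "1CTQ_1", "1GNP_1", "1GNQ_1", "1GNR_1"]

# Map each entry ID to the entity numbers that matched the query
entities_by_entry = defaultdict(list)
for entity_id in entity_ids:
    entry_id, entity_number = split_entity_id(entity_id)
    entities_by_entry[entry_id].append(entity_number)

print(dict(entities_by_entry))
```

Splitting on the last underscore (`rsplit`) rather than the first keeps the helper usable for identifiers whose entry part itself contains underscores.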


Another useful search type for comparing structures is structure similarity search. You can run it with a PDB entry ID, a file URL, or a local file.

In [None]:
from rcsbapi.search import StructSimilarityQuery

# Using file_url
q3 = StructSimilarityQuery(
    structure_search_type="file_url",
    file_url="https://files.rcsb.org/download/4HHB.cif",
    file_format="cif"
)

# Show first 10 results
list(q3())[:10]

# You can also search using a local file. Check our documentation for examples.

['4HHB',
 '1G9V',
 '2HHB',
 '1BZ0',
 '1K0Y',
 '1COH',
 '3HHB',
 '1QSH',
 '1VWT',
 '1BZZ']

For more examples using these and other search types, check out our [readthedocs](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html).

## Faceted Queries

The Search API and the `rcsb-api` package both support `facets`, which sort results into buckets based on the values of a chosen attribute. This lets you calculate summary statistics for a result set.

For example, you can count how many models released since 2000 were determined by each experimental method.

In [None]:
from rcsbapi.search import AttributeQuery
from rcsbapi.search import Facet

# Define the query
q = AttributeQuery(
    attribute="rcsb_accession_info.initial_release_date",
    operator="greater",
    value="2000-01-01",
)

In [None]:
# Add a facet when executing the query
results = q(
    facets=Facet(
        name="Experimental Methods",
        aggregation_type="terms",
        attribute="exptl.method",
    )
)

In [None]:
# Accessing facet results
results.facets
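For orientation, the sketch below writes out a raw Search API request that corresponds roughly to the faceted query above, with the facet definition placed under `request_options`. It is a plain dictionary built by hand under the assumption that it mirrors the Search API's documented JSON format; the payload that `rcsb-api` generates may contain additional fields.

```python
import json

# Facet definition using the same name, aggregation type, and attribute as above
facet = {
    "name": "Experimental Methods",
    "aggregation_type": "terms",
    "attribute": "exptl.method",
}

payload = {
    "query": {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": "rcsb_accession_info.initial_release_date",
            "operator": "greater",
            "value": "2000-01-01",
        },
    },
    # Facets are requested alongside the query rather than inside it
    "request_options": {"facets": [facet]},
    "return_type": "entry",
}

print(json.dumps(payload, indent=2))
```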

For more information on using `facets`, check out our [API documentation](https://search.rcsb.org/#using-facets) and [readthedocs](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html#faceted-query-examples).

You can find additional examples utilizing other API features like [grouping results](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html#groupby-example) and [sorting results](https://rcsbapi.readthedocs.io/en/latest/search_api/additional_examples.html#sort-example) on our readthedocs.

## Visualizing and Manipulating Queries

Once you have a query constructed, you can look at the full query in our Search API query editor.

In [None]:
from rcsbapi.search import TextQuery
from rcsbapi.search import search_attributes as attrs

# Query for structures associated with the phrase "interleukin" from Homo sapiens with investigational or experimental drugs

q1 = TextQuery("interleukin")
# You can also build `AttributeQuery` objects using the `search_attributes` object and comparison operators
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
q3 = attrs.drugbank_info.drug_groups == "investigational"
q4 = attrs.drugbank_info.drug_groups == "experimental"

# Construct the query using AND and OR operators
query = q1 & q2 & (q3 | q4)
results = query()

In [None]:
# Get link to Search API query editor
results.get_editor_link()
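Conceptually, an editor link of this kind is a URL that carries the query as URL-encoded JSON. The sketch below shows that idea using only the standard library. Both the base URL and the `json` parameter name are assumptions made for illustration, and `editor_link` is a hypothetical helper; prefer `get_editor_link()` for real use.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit

# Assumed location of the query editor; not confirmed by this notebook
EDITOR_URL = "https://search.rcsb.org/query-editor.html"


def editor_link(payload):
    """Embed a query payload in a URL as percent-encoded JSON (illustrative only)."""
    return f"{EDITOR_URL}?json={quote(json.dumps(payload))}"


payload = {
    "query": {
        "type": "terminal",
        "service": "full_text",
        "parameters": {"value": "interleukin"},
    },
    "return_type": "entry",
}

link = editor_link(payload)
print(link)

# Decoding the link recovers the original payload
decoded = json.loads(parse_qs(urlsplit(link).query)["json"][0])
```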

## Exploring the API Schema

The package offers ways to explore attributes and their descriptions.

In [None]:
from rcsbapi.search import search_attributes as attrs

# Search attributes based on a string or regex pattern
attrs.search("ligand")

In [None]:
# If you already know the name of the attribute, you can search using `get_attribute_details`
attrs.get_attribute_details(attribute="rcsb_ligand_neighbors.ligand_is_bound")

You can also look at our Search API documentation. Attribute information is split into [structure attributes](https://search.rcsb.org/structure-search-attributes.html) and [chemical attributes](https://search.rcsb.org/chemical-search-attributes.html).

If you've built a query in the advanced query editor and would like to know the corresponding Search API attribute, you can check the [attribute details](https://www.rcsb.org/docs/search-and-browse/advanced-search/attribute-details) page.

## Further Documentation

See our [readthedocs](https://rcsbapi.readthedocs.io/en/latest/search_api/api.html) page for additional examples.