<a href="https://colab.research.google.com/github/rcsb/py-rcsb-api/blob/master/notebooks/search_quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RCSB PDB Search API: Quickstart

This Quickstart notebook walks through the basics of creating and executing queries using the `rcsbapi.search` module of the `rcsb-api` package. For more in-depth documentation, see the [readthedocs page](https://rcsbapi.readthedocs.io/en/latest/search_api/quickstart.html).

Before beginning, you must install the package:

```pip install rcsb-api```

In [None]:
%pip install rcsb-api

In [17]:
from rcsbapi.search import TextQuery, AttributeQuery
from rcsbapi.search import search_attributes as attrs

## Full-text search
To perform a "full-text" search for structures associated with the term "Hemoglobin", you can create a `TextQuery`:

In [18]:
# Search for structures associated with the phrase "Hemoglobin"
query = TextQuery(value="Hemoglobin")

# Execute the query by running it as a function
results = query()

# Results are returned as an iterator of result identifiers.
for rid in results:
    print(rid)

2PGH
3PEL
3GOU
1NGK
6IHX
1FHJ
4YU3
1G08
1G09
1G0A
2QSP
5C6E
3CIU
2QLS
3PI8
3PI9
3PIA
1FSX
1QPW
6II1
3GKV
1G0B
1HV4
2ZLU
4YU4
2ZFB
3WR1
1HBR
2D5X
2QSS
2ZLW
3D4X
6R2O
1XQ5
8PUQ
1HDS
3GQP
2RAO
3K8B
2GTL
1SPG
1V75
1WMU
2Z6N
1IBE
8PUR
1HDA
1V4X
6SVA
1IWH
2H8D
2H8F
2ZLT
2ZLV
5LFG
2ZLX
2QRW
3EOK
1S5X
3WTG
1S5Y
1T1N
2PEG
3EU1
3FS4
3MJP
6RP5
8WIY
1NS6
1NS9
2B7H
3GQG
2AA1
3GDJ
3NFE
3NG6
1C40
1A4F
3MJU
6ZMX
1FAW
1GCV
3D1A
1LA6
1S0H
2DHB
2MHB
3A0G
3FH9
3HYU
4G51
1V4U
1V4W
1Y8H
1Y8I
1Y8K
3LQD
1SHR
1SI4
1Y01
3GQR
4X0J
1HBH
8WIX
8WIZ
1CG5
1CG8
1OUU
2QMB
6ZMY
1OUT
1W09
1W0A
1W0B
1XZY
1DLW
1Z8U
3BOM
3CY5
3VRF
3VRG
4IRO
1GCW
2R80
3BCQ
4ODC
2QU0
3GYS
3IA3
7QU4
1FN3
4H2L
1JEB
3A59
1PBX
3DHT
1FDH
3VRE
6LCW
6LCX
1I3D
1I3E
5EUI
4MQJ
6BB5
6NQ5
3AQ5
3D7O
3HF4
3ONZ
3OO5
2R1H
5X2R
5X2U
5HY8
1YZI
4L7Y
4MQK
5UCU
6KAO
6KAP
6KAQ
6KAR
6L5V
6L5W
3DHR
3AT5
3AT6
1MKO
1NQP
2KSC
2D2M
2D2N
6KA9
6KAE
6KAH
6KAI
1CMY
1R1X
3HRW
1G9V
1YH9
1YHE
1YHR
1YIE
1YIH
2D5Z
2D60
2M6Z
3DUT
3S66
5U3I
5UFJ
6BWP
6DI4
6KAS
6KAT
6KAU
6KAV
6L5X
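
Because the results come back as an iterator, you do not have to print every identifier. The snippet below is a minimal sketch of taking only the first few items using the standard-library `itertools.islice`. The helper name `take` and the stand-in generator `sample_results` are hypothetical (they are not part of `rcsbapi`); the stand-in replaces the live query so the snippet runs offline.

```python
from itertools import islice


def take(results, n):
    """Return at most the first n items of any iterable as a list."""
    return list(islice(results, n))


def sample_results():
    """Stand-in for query(): yields a few identifiers from the output above."""
    yield from ["2PGH", "3PEL", "3GOU", "1NGK", "6IHX"]


first_three = take(sample_results(), 3)
print(first_three)
```

With a live query, `take(query(), 10)` would follow the same pattern.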


## Attribute search
To perform a search for specific structure or chemical attributes, you can create an `AttributeQuery`.

In [19]:
# Construct a query searching for structures from humans
query = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",  # Other operators include "contains_phrase", "exists", and more
    value="Homo sapiens"
)

# Execute query and construct a list from results
results = list(query())
print(results)

['10GS', '11GS', '121P', '12CA', '12GS', '133L', '134L', '13GS', '14GS', '16GS', '17GS', '18GS', '19GS', '1A00', '1A01', '1A02', '1A07', '1A08', '1A09', '1A0L', '1A0N', '1A0U', '1A0Z', '1A12', '1A17', '1A1A', '1A1B', '1A1C', '1A1E', '1A1M', '1A1N', '1A1O', '1A1U', '1A1W', '1A1X', '1A1Z', '1A22', '1A27', '1A28', '1A2B', '1A2C', '1A31', '1A35', '1A36', '1A3B', '1A3E', '1A3K', '1A3N', '1A3O', '1A3Q', '1A3S', '1A42', '1A46', '1A4I', '1A4P', '1A4R', '1A4V', '1A4W', '1A4Y', '1A52', '1A5E', '1A5G', '1A5H', '1A5R', '1A5Y', '1A61', '1A66', '1A6A', '1A6Q', '1A6Y', '1A6Z', '1A7A', '1A7C', '1A7F', '1A7S', '1A7X', '1A81', '1A85', '1A86', '1A8E', '1A8F', '1A8J', '1A8M', '1A93', '1A9B', '1A9E', '1A9N', '1A9U', '1A9W', '1AA2', '1AA9', '1AAP', '1AAX', '1AB2', '1ABI', '1ABJ', '1ABN', '1ABW', '1ABY', '1AD0', '1AD5', '1AD6', '1AD8', '1AD9', '1ADQ', '1ADS', '1ADX', '1ADZ', '1AE5', '1AE8', '1AFE', '1AFO', '1AGB', '1AGC', '1AGD', '1AGE', '1AGF', '1AGN', '1AGP', '1AGW', '1AH1', '1AHT', '1AHW', '1AI0', '1AI8',
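
For orientation, the three arguments of the `AttributeQuery` above map onto the fields of a JSON "terminal" node in a Search API request. The sketch below builds such a request body with plain dictionaries, assuming the JSON query layout described in the RCSB PDB Search API documentation. The variable names are illustrative, and the payload that `rcsbapi` actually sends may contain additional fields.

```python
import json

# Assumed layout: one "terminal" node per attribute condition,
# wrapped in a request that names the kind of identifier to return.
terminal_node = {
    "type": "terminal",
    "service": "text",
    "parameters": {
        "attribute": "rcsb_entity_source_organism.scientific_name",
        "operator": "exact_match",
        "value": "Homo sapiens",
    },
}

request_body = {
    "query": terminal_node,
    "return_type": "entry",  # assumption: entry-level IDs such as '10GS'
}

print(json.dumps(request_body, indent=2))
```

The package builds and submits this kind of request for you, so the sketch is only meant to show what the attribute, operator, and value correspond to.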

Refer to the [Search Attributes](https://search.rcsb.org/structure-search-attributes.html) and [Chemical Attributes](https://search.rcsb.org/chemical-search-attributes.html) documentation for a full list of attributes and applicable operators.

Alternatively, you can construct attribute queries with Python comparison operators using the `search_attributes` object (which also allows attribute names to be tab-completed):

In [20]:
# Search for structures from humans
query = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"

# Run query and construct a list from results
results = list(query())
print(results)

['10GS', '11GS', '121P', '12CA', '12GS', '133L', '134L', '13GS', '14GS', '16GS', '17GS', '18GS', '19GS', '1A00', '1A01', '1A02', '1A07', '1A08', '1A09', '1A0L', '1A0N', '1A0U', '1A0Z', '1A12', '1A17', '1A1A', '1A1B', '1A1C', '1A1E', '1A1M', '1A1N', '1A1O', '1A1U', '1A1W', '1A1X', '1A1Z', '1A22', '1A27', '1A28', '1A2B', '1A2C', '1A31', '1A35', '1A36', '1A3B', '1A3E', '1A3K', '1A3N', '1A3O', '1A3Q', '1A3S', '1A42', '1A46', '1A4I', '1A4P', '1A4R', '1A4V', '1A4W', '1A4Y', '1A52', '1A5E', '1A5G', '1A5H', '1A5R', '1A5Y', '1A61', '1A66', '1A6A', '1A6Q', '1A6Y', '1A6Z', '1A7A', '1A7C', '1A7F', '1A7S', '1A7X', '1A81', '1A85', '1A86', '1A8E', '1A8F', '1A8J', '1A8M', '1A93', '1A9B', '1A9E', '1A9N', '1A9U', '1A9W', '1AA2', '1AA9', '1AAP', '1AAX', '1AB2', '1ABI', '1ABJ', '1ABN', '1ABW', '1ABY', '1AD0', '1AD5', '1AD6', '1AD8', '1AD9', '1ADQ', '1ADS', '1ADX', '1ADZ', '1AE5', '1AE8', '1AFE', '1AFO', '1AGB', '1AGC', '1AGD', '1AGE', '1AGF', '1AGN', '1AGP', '1AGW', '1AH1', '1AHT', '1AHW', '1AI0', '1AI8',

## Grouping sub-queries

You can combine multiple queries using Python's bitwise operators: `&` for AND and `|` for OR. Use parentheses to control how the sub-queries are grouped.

In [21]:
# Query for human epidermal growth factor receptor (EGFR) structures (UniProt ID P00533)
#  with investigational or experimental drugs bound
q1 = attrs.rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers.database_accession == "P00533"
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
q3 = attrs.drugbank_info.drug_groups == "investigational"
q4 = attrs.drugbank_info.drug_groups == "experimental"

# Structures matching UniProt ID P00533 AND from humans
#  AND (investigational OR experimental drug group)
query = q1 & q2 & (q3 | q4)

# Execute query and print first 10 ids
results = list(query())
print(results[:10])

['2RFD', '3B2U', '3PFV', '3POZ', '3W2S', '3W32', '3W33', '4I22', '4RJ6', '4RJ7']
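
The parentheses in `q1 & q2 & (q3 | q4)` matter because Python gives `&` higher precedence than `|`. The toy sketch below illustrates how operator overloading can turn such an expression into a nested tree of groups. The classes `Node`, `Terminal`, and `Group` are hypothetical stand-ins, not the classes `rcsbapi` uses, and the attribute names are shortened for readability.

```python
class Node:
    """Toy query node; a stand-in, not a class from rcsbapi."""

    def __and__(self, other):
        return Group("and", [self, other])

    def __or__(self, other):
        return Group("or", [self, other])


class Terminal(Node):
    """A single attribute condition."""

    def __init__(self, attribute, value):
        self.attribute = attribute
        self.value = value

    def describe(self):
        return f"{self.attribute} == {self.value!r}"


class Group(Node):
    """Two sub-queries joined by a logical operator."""

    def __init__(self, operator, nodes):
        self.operator = operator
        self.nodes = nodes

    def describe(self):
        joiner = f" {self.operator.upper()} "
        return "(" + joiner.join(n.describe() for n in self.nodes) + ")"


q1 = Terminal("database_accession", "P00533")
q2 = Terminal("scientific_name", "Homo sapiens")
q3 = Terminal("drug_groups", "investigational")
q4 = Terminal("drug_groups", "experimental")

# With parentheses: the top-level group is an AND.
tree = q1 & q2 & (q3 | q4)
print(tree.describe())

# Without parentheses, Python parses this as (q1 & q2 & q3) | q4,
# so the top-level group becomes an OR.
no_parens = q1 & q2 & q3 | q4
print(no_parens.describe())
```

In the toy version, dropping the parentheses changes the top-level operator from AND to OR, which is why the query above groups the two drug conditions explicitly.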


These examples are in "operator" syntax. You can also make queries in "fluent" syntax. Learn more about both syntaxes and implementation details in [Query Syntax and Execution](https://rcsbapi.readthedocs.io/en/latest/search_api/query_construction.html#query-syntax-and-execution).

## Supported search services
The supported search service types are listed in the table below. For more details on their usage, see [Search Service Types](https://rcsbapi.readthedocs.io/en/latest/search_api/query_construction.html#search-service-types).

|Search service                    |QueryType                 |
|----------------------------------|--------------------------|
|Full-text                         |`TextQuery()`             |
|Attribute (structure or chemical) |`AttributeQuery()`        |
|Sequence similarity               |`SeqSimilarityQuery()`    |
|Sequence motif                    |`SeqMotifQuery()`         |
|Structure similarity              |`StructSimilarityQuery()` |
|Structure motif                   |`StructMotifQuery()`      |
|Chemical similarity               |`ChemSimilarityQuery()`   |

Learn more about available search services on the [RCSB PDB Search API docs](https://search.rcsb.org/#search-services).

For more in-depth documentation, see the [readthedocs](https://rcsbapi.readthedocs.io/en/latest/index.html) site.