# rcsbsearchapi quickstart

This notebook contains examples from the rcsbsearchapi [quickstart](https://rcsbsearchapi.readthedocs.io/en/latest/quickstart.html).

In [1]:
from rcsbsearchapi.search import TextQuery, Terminal
from rcsbsearchapi import rcsb_attributes as attrs

## Operator syntax

Here is an example from the RCSB Search API page, using the operator syntax. This query finds symmetric dimers having a twofold rotation with the DNA-binding domain of a heat-shock transcription factor.

Note the use of standard comparison operators (`==`, `>`, etc.) for RCSB attributes and bitwise operators (`&`, `|`, `~`) for combining queries.

In [8]:
# Create terminals for each query
q1 = TextQuery("heat-shock transcription factor")
q2 = attrs.rcsb_struct_symmetry.symbol == "C2"
q3 = attrs.rcsb_struct_symmetry.kind == "Global Symmetry"
q4 = attrs.rcsb_entry_info.polymer_entity_count_DNA >= 1

# Combine the terminals using bitwise operators (&, |, ~)
query = q1 & q2 & q3 & q4  # AND of all queries

# Call the query to execute it
for assemblyid in query("assembly"): # return type specified as "assembly"
    print(assemblyid)


1FYK-1
1FYL-1
1FYL-2
1FYM-1
3HTS-1
5D5X-1
5D5W-1
5D8K-1
5D8L-1
5D8L-2
7DCI-1
5D5U-1
5D5V-1
7DCJ-1
7DCT-1
7DCT-2
5HDN-1
5HDN-2
8HKC-1
2HAX-1
5D4R-1
5D4R-3
5D4S-1
5D4S-3
7LBW-1
7LBX-1
3FYL-1
3G6P-1
3G6Q-1
3G6Q-2
3G6R-1
3G6T-1
3G6U-1
3G8U-1
3G8X-1
3G97-1
3G99-1
3G9I-1
3G9I-2
3G9J-1
3G9M-1
3G9O-1
3G9P-1
1LAT-1
1GLU-1
1R4O-1
1R4R-1
4NNU-1
6XAV-1
6XAS-1
7UBM-1
5EMC-1
5EMP-1
5EMQ-1
6BQU-1
5VA0-1
5VA7-1
5E69-1
5E6A-1
5E6B-1
5E6C-1
5E6D-1
4HN5-1
4HN6-1
5CBX-1
5CBX-2
5CBY-1
5CBZ-1
5CBZ-2
5CC1-1
5CC1-2
6X6D-1
6X6E-1
6KON-1
6KOQ-1
5ZX2-1
6JCX-1
6KOO-1
6KOP-1
6JCY-1
7F0R-1
5NSS-1
4H10-1
1NFK-1
1DH3-1
8HAL-1
8HAM-1
8HAN-1
2HAN-1
8HSR-1
2V2T-1
8OSJ-1
8OSK-1
8OSL-1
5H3R-1
7ZQS-1
5J5P-1
5J5Q-1
5J5Q-2
3P57-1
7PGV-1
7PGW-1
8HAG-1
8HAH-1
8HAI-1
8HAJ-1
8HAK-1
7PH5-1
7PH6-1
3N97-1
7W9V-1
8BZ1-1
7CVO-1
6S01-1
4I3H-1
4YLN-1
4YLN-2
4YLN-3
7Z1Z-1
7EDX-1
7EG7-1
1LEI-1
1VKX-1
1LE9-1
1LE9-2
7CHW-1
1LE5-1
1LE5-2
2I9T-1
6RI7-1
6GH6-1
4YLO-1
4YLO-2
4YLO-3
5IPM-1
5IPN-1
7ZS9-1
7NKY-1
7NKX-1
6PMI-1
6PMJ-1
8AD1-1
6UU0-1
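
The operator syntax relies on Python operator overloading: a comparison on an attribute yields a terminal node rather than a boolean, and `&` merges nodes into a group. The following is a minimal sketch of that idea, not the rcsbsearchapi implementation. The classes `SketchAttr`, `SketchTerminal`, and `SketchGroup` are hypothetical, and the dictionary layout is an assumption modeled on the JSON query format described in the RCSB Search API documentation.

```python
import json
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class SketchTerminal:
    """A single attribute comparison (hypothetical, for illustration only)."""
    attribute: str
    operator: str
    value: Union[str, int, float]

    def __and__(self, other):
        return SketchGroup("and", (self, other))

    def to_dict(self):
        return {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": self.attribute,
                "operator": self.operator,
                "value": self.value,
            },
        }


@dataclass(frozen=True)
class SketchGroup:
    """Several nodes joined by one logical operator (hypothetical)."""
    logical_operator: str
    nodes: Tuple

    def __and__(self, other):
        # Chained ANDs are flattened into one group instead of nesting.
        if self.logical_operator == "and":
            return SketchGroup("and", self.nodes + (other,))
        return SketchGroup("and", (self, other))

    def to_dict(self):
        return {
            "type": "group",
            "logical_operator": self.logical_operator,
            "nodes": [node.to_dict() for node in self.nodes],
        }


class SketchAttr:
    """Attribute name whose comparisons build terminals instead of booleans."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return SketchTerminal(self.name, "exact_match", value)

    def __ge__(self, value):
        return SketchTerminal(self.name, "greater_or_equal", value)


symbol = SketchAttr("rcsb_struct_symmetry.symbol") == "C2"
kind = SketchAttr("rcsb_struct_symmetry.kind") == "Global Symmetry"
dna = SketchAttr("rcsb_entry_info.polymer_entity_count_DNA") >= 1

sketch_query = symbol & kind & dna
payload = {"query": sketch_query.to_dict(), "return_type": "assembly"}
print(json.dumps(payload, indent=2))
```

Because `&` binds more tightly than `==` in Python, comparisons written inline need parentheses, for example `(a == "C2") & (b >= 1)`. Assigning each comparison to its own variable first avoids the issue.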

Attribute names can be found in the [RCSB schema](http://search.rcsb.org/rcsbsearch/v2/metadata/schema). They can also be found via tab completion, or by iterating:

In [9]:
[a.attribute for a in attrs if "authors" in a.attribute]

['citation.rcsb_authors',
 'pdbx_nmr_software.authors',
 'rcsb_primary_citation.rcsb_authors']

## Fluent syntax

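Here is the same example using the fluent syntax. Note that the search text here is wrapped in double quotes, which requests an exact phrase match, so it returns fewer results than the unquoted operator-syntax query: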

In [10]:
# Start with an Attr or TextQuery, then add terms
results = TextQuery('"heat-shock transcription factor"') \
    .and_("rcsb_struct_symmetry.symbol").exact_match("C2") \
    .and_("rcsb_struct_symmetry.kind").exact_match("Global Symmetry") \
    .and_("rcsb_entry_info.polymer_entity_count_DNA").greater_or_equal(1) \
    .exec("assembly")

# Exec produces an iterator of IDs
for assemblyid in results:
    print(assemblyid)

3HTS-1
1FYK-1
1FYL-1
1FYL-2
1FYM-1
5D5X-1
5D5W-1
5D8K-1
5D8L-1
5D8L-2
7DCI-1
5D5U-1
5D5V-1
5HDN-1
5HDN-2
7DCJ-1
7DCT-1
7DCT-2
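
The fluent style works because every method returns an object that accepts the next call. The sketch below illustrates that chaining pattern with a hypothetical `FluentSketch` class that only records terms locally. It is an assumption-laden illustration, not how rcsbsearchapi is implemented, and it never contacts the search service.

```python
class FluentSketch:
    """Hypothetical builder that records (attribute, operator, value) terms."""

    def __init__(self, text):
        self.text = text
        self.terms = []       # completed (attribute, operator, value) triples
        self._pending = None  # attribute named by and_() awaiting a comparison

    def and_(self, attribute):
        if self._pending is not None:
            raise ValueError(f"{self._pending!r} has no comparison yet")
        self._pending = attribute
        return self  # returning self is what makes chaining possible

    def exact_match(self, value):
        return self._finish("exact_match", value)

    def greater_or_equal(self, value):
        return self._finish("greater_or_equal", value)

    def _finish(self, operator, value):
        if self._pending is None:
            raise ValueError("call and_() before a comparison method")
        self.terms.append((self._pending, operator, value))
        self._pending = None
        return self


# Parentheses allow line breaks without trailing backslashes.
builder = (
    FluentSketch('"heat-shock transcription factor"')
    .and_("rcsb_struct_symmetry.symbol").exact_match("C2")
    .and_("rcsb_struct_symmetry.kind").exact_match("Global Symmetry")
    .and_("rcsb_entry_info.polymer_entity_count_DNA").greater_or_equal(1)
)

for term in builder.terms:
    print(term)
```

Wrapping the chain in parentheses is an alternative to the backslash continuations used in the cell above; both forms are equivalent in Python.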


## Computed Structure Models

The RCSB PDB Search API page explains how to include Computed Structure Models (CSMs) in a search query. A code example follows.

This query returns IDs for experimental and computed models associated with "hemoglobin". Queries can also be restricted to only computed models or only experimental models.

In [9]:
q1 = TextQuery("hemoglobin")
# Pass return_content_type as a list containing "computational", "experimental", or both
q2 = q1(return_content_type=["computational", "experimental"])
list(q2)

['2GTL',
 '1HV4',
 '4YU4',
 '4V93',
 '1WMU',
 '1XQ5',
 '2Z6N',
 '4YU3',
 '3GOU',
 '3D4X',
 '2QLS',
 '1G08',
 '1G09',
 '1G0A',
 '3PEL',
 '1XZY',
 '3EU1',
 '3GQP',
 '3GQR',
 '3GYS',
 '3MJP',
 '3PI9',
 '3PIA',
 '1HDS',
 '2ZFB',
 '3A59',
 '3WR1',
 '3BCQ',
 '2D2M',
 '2D2N',
 '2PGH',
 '3CIU',
 '1V4W',
 '1V4X',
 '2RAO',
 '1V75',
 '3CY5',
 '1SHR',
 '1SI4',
 '3WTG',
 '1G0B',
 '3GDJ',
 '3K8B',
 '3PI8',
 '6IHX',
 '3BOM',
 '1Y01',
 '1HBR',
 '3D1A',
 '3DHR',
 '1V4U',
 '3FS4',
 '2B7H',
 '1NQP',
 '6II1',
 '1FHJ',
 '6SVA',
 '3GKV',
 '7E96',
 '7E97',
 '7E99',
 '2R80',
 '3EOK',
 '3LQD',
 '1LA6',
 '3NG6',
 '2D5X',
 '3MJU',
 '1IWH',
 '1SPG',
 '1W09',
 '1W0A',
 '1W0B',
 '2H8F',
 '4NI1',
 '1S5X',
 '1S5Y',
 '2AA1',
 '1QPW',
 '3IA3',
 '2QMB',
 '2QU0',
 '3A0G',
 '1FN3',
 '1FSX',
 '5C6E',
 '1Y8H',
 '1Y8I',
 '1Y8K',
 '3NFE',
 '5LFG',
 '1FAW',
 '2DHB',
 '4ODC',
 '4MQJ',
 '2H8D',
 '4H2L',
 '3HYU',
 '4NI0',
 '4MQK',
 '6R2O',
 '7E98',
 '6HBW',
 '5M3L',
 '2RI4',
 '3FH9',
 '3DHT',
 '3S65',
 '3S66',
 '6NQ5',
 '1RTX',
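
When a result list mixes both content types, the IDs can be separated after the fact by their shape. The sketch below assumes that computed structure model IDs carry an `AF_` or `MA_` prefix (AlphaFold DB and ModelArchive models) while experimental entries use plain four-character PDB IDs. The helper `split_by_content_type` is hypothetical, and the `AF_` sample ID is an illustrative input rather than a result copied from the query above.

```python
# Assumed ID prefixes for computed structure models (AlphaFold DB, ModelArchive).
CSM_PREFIXES = ("AF_", "MA_")


def split_by_content_type(ids):
    """Return (experimental, computational) lists, preserving input order."""
    experimental, computational = [], []
    for identifier in ids:
        # str.startswith accepts a tuple and matches any of its members.
        if identifier.upper().startswith(CSM_PREFIXES):
            computational.append(identifier)
        else:
            experimental.append(identifier)
    return experimental, computational


# The first, second, and last IDs appear in the output above;
# "AF_AFP69905F1" is an illustrative computed-model ID.
sample_ids = ["2GTL", "1HV4", "AF_AFP69905F1", "4YU4"]

experimental, computational = split_by_content_type(sample_ids)
print("experimental:", experimental)
print("computational:", computational)
```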
 

## Return Types

A search query can return different result types when a return type is specified. Below are examples of specifying the return types Polymer Entities, Non-polymer Entities, Polymer Instances, and Molecular Definitions. More information on return types can be found on the RCSB PDB Search API page.

In [12]:
q1 = Terminal("rcsb_entry_container_identifiers.entry_id", "in", ["4HHB"]) # query for 4HHB deoxyhemoglobin

print("Polymer Entities:")
for poly in q1("polymer_entity"): # include return type as a string parameter for query object
    print(poly)
print("Non-polymer Entities:")
for nonPoly in q1("non_polymer_entity"):
    print(nonPoly)
print("Polymer Instances:")
for polyInst in q1("polymer_instance"):
    print(polyInst)
print("Molecular Definitions:")
for mol in q1("mol_definition"):
    print(mol)

Polymer Entities:
4HHB_1
4HHB_2
Non-polymer Entities:
4HHB_3
4HHB_4
Polymer Instances:
4HHB.A
4HHB.B
4HHB.C
4HHB.D
Molecular Definitions:
ALA
ARG
ASN
ASP
CYS
GLN
GLU
GLY
HEM
HIS
LEU
LYS
MET
PHE
PO4
PRO
SER
THR
TRP
TYR
VAL
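
Each return type uses a different identifier shape in the outputs shown in this notebook: `1FYK-1` for assemblies, `4HHB_1` for entities, `4HHB.A` for instances, and bare codes such as `HEM` for molecular definitions. The sketch below splits such identifiers into their parts. The function `parse_identifier` is hypothetical, and the patterns assume classic four-character PDB IDs that begin with a digit, so other ID schemes fall through to `"other"`.

```python
import re

# Assumed shape of a classic PDB entry ID: a digit followed by three alphanumerics.
_ENTRY = r"([0-9][A-Za-z0-9]{3})"

# Patterns inferred from the outputs shown in this notebook.
_PATTERNS = [
    ("assembly", re.compile(rf"^{_ENTRY}-(\w+)$")),   # e.g. 1FYK-1
    ("entity", re.compile(rf"^{_ENTRY}_(\d+)$")),     # e.g. 4HHB_1
    ("instance", re.compile(rf"^{_ENTRY}\.(\w+)$")),  # e.g. 4HHB.A
    ("entry", re.compile(rf"^{_ENTRY}$")),            # e.g. 4HHB
]


def parse_identifier(identifier):
    """Return (kind, entry_or_code, suffix) for a search result identifier."""
    for kind, pattern in _PATTERNS:
        match = pattern.match(identifier)
        if match:
            # The "entry" pattern has a single group, so it has no suffix.
            suffix = match.group(2) if pattern.groups > 1 else None
            return kind, match.group(1), suffix
    # Anything else (e.g. chemical component codes such as HEM) is left as-is.
    return "other", identifier, None


for example in ["1FYK-1", "4HHB_1", "4HHB.A", "4HHB", "HEM"]:
    print(example, "->", parse_identifier(example))
```

The identifier alone does not distinguish polymer entities from non-polymer entities, since both use the `ENTRY_N` form; that distinction comes from the return type requested in the query.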


For a more practical example, see the [Covid-19 notebook](covid.ipynb).