# rcsbsearchapi

Access the RCSB advanced search from Python: [rcsbsearchapi.readthedocs.io](https://rcsbsearchapi.readthedocs.io)

    pip install rcsbsearchapi


In [9]:
from rcsbsearchapi import rcsb_attributes as attrs, TextQuery

## Demo

We are interested in how the antiviral drug boceprevir interacts with the COVID-19 virus, so we search for structures that meet these criteria:
- Source Organism is "COVID-19 virus"
- Structure title contains "protease"
- Bound to ligand "Boceprevir"

[RCSB Query](http://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.taxonomy_lineage.name%22%2C%22operator%22%3A%22exact_match%22%2C%22value%22%3A%22COVID-19%22%2C%22negation%22%3Afalse%7D%2C%22node_id%22%3A0%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22value%22%3A%22protease%22%2C%22negation%22%3Afalse%7D%2C%22node_id%22%3A1%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22chem_comp.name%22%2C%22operator%22%3A%22contains_words%22%2C%22value%22%3A%22Boceprevir%22%2C%22negation%22%3Afalse%7D%2C%22node_id%22%3A2%7D%5D%7D%2C%22return_type%22%3A%22entry%22%2C%22request_info%22%3A%7B%22query_id%22%3A%2270e677a6376b4c5eba8b4f2b73866c92%22%2C%22src%22%3A%22ui%22%7D%7D)
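The link above carries the whole query as URL-encoded JSON in its `request` parameter. Here is a minimal sketch, using only the standard library, of how such a link can be built and decoded. The shortened payload is an illustrative assumption modelled on fields visible in the link (one terminal node rather than the full group of three), not the actual query.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit

# Illustrative payload modelled on fields visible in the link above
# (a single terminal node; the real link groups three of them).
request = {
    "query": {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": "chem_comp.name",
            "operator": "contains_words",
            "value": "Boceprevir",
            "negation": False,
        },
    },
    "return_type": "entry",
}

# Encode: compact JSON, percent-encoded into the `request` parameter.
encoded = quote(json.dumps(request, separators=(",", ":")), safe="")
search_url = f"http://www.rcsb.org/search?request={encoded}"

# Decode: pull the parameter back out of the URL and parse it as JSON.
decoded = json.loads(parse_qs(urlsplit(search_url).query)["request"][0])
print(decoded["query"]["parameters"]["value"])
```

Decoding a link this way can be a convenient way to inspect which attributes and operators a query built in the web interface actually uses.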

## Operator syntax
- Use Python comparison operators for basic attributes (`==`, `<`, `<=`, etc.)
- Combine queries using set operators (`&`, `|`, `~`, etc.)
- Execute queries by calling them as functions

In [None]:
# One query per search criterion
q1 = attrs.rcsb_entity_source_organism.taxonomy_lineage.name == "COVID-19 virus"
q2 = TextQuery("protease")
q3 = attrs.chem_comp.name.contains_words("Boceprevir")
q4 = attrs.rcsb_entry_info.resolution_combined > 1.5  # additional filter on resolution
# Combine all four with logical AND
query = q1 & q2 & q3 & q4

list(query())
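To give a feel for how comparison and set operators can assemble a query, here is a toy sketch of the operator-overloading pattern. The `ToyAttr`, `ToyTerminal`, and `ToyGroup` classes and the operator strings are hypothetical stand-ins written for this illustration. They are not rcsbsearchapi's implementation, and nothing here contacts the RCSB service.

```python
from dataclasses import dataclass


class ToyQuery:
    """Base class: `&` and `|` build larger queries from smaller ones."""

    def __and__(self, other):
        return ToyGroup("and", [self, other])

    def __or__(self, other):
        return ToyGroup("or", [self, other])


@dataclass
class ToyTerminal(ToyQuery):
    attribute: str
    operator: str
    value: object
    negation: bool = False

    def __invert__(self):
        # `~` flips the negation flag of a single condition
        return ToyTerminal(self.attribute, self.operator, self.value, not self.negation)

    def to_dict(self):
        return {
            "type": "terminal",
            "parameters": {
                "attribute": self.attribute,
                "operator": self.operator,
                "value": self.value,
                "negation": self.negation,
            },
        }


@dataclass
class ToyGroup(ToyQuery):
    logical_operator: str
    nodes: list

    def __invert__(self):
        # De Morgan: not (a and b) == (not a) or (not b)
        flipped = "or" if self.logical_operator == "and" else "and"
        return ToyGroup(flipped, [~node for node in self.nodes])

    def to_dict(self):
        return {
            "type": "group",
            "logical_operator": self.logical_operator,
            "nodes": [node.to_dict() for node in self.nodes],
        }


class ToyAttr:
    """Comparison operators return query terminals instead of booleans."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return ToyTerminal(self.name, "exact_match", value)

    def __gt__(self, value):
        return ToyTerminal(self.name, "greater", value)

    def __lt__(self, value):
        return ToyTerminal(self.name, "less", value)


organism = ToyAttr("rcsb_entity_source_organism.taxonomy_lineage.name") == "COVID-19 virus"
coarse = ToyAttr("rcsb_entry_info.resolution_combined") > 1.5
toy_query = organism & ~coarse
print(toy_query.to_dict())
```

Because `==` and `>` bind more loosely than `&` in Python, each comparison is assigned to its own variable (or wrapped in parentheses) before being combined.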

## Fluent syntax

A second syntax is available with a [fluent interface](https://en.wikipedia.org/wiki/Fluent_interface), similar to popular data science packages like tidyverse and Apache Spark. Function calls are chained together, each one adding a term to the query.

Here's an example built around a second antiviral, remdesivir. The drug interferes with the viral RNA polymerase: it is incorporated into the growing RNA in place of adenosine and causes early chain termination. When integrated into RNA, the nucleotide formed from remdesivir has residue code F86.

In [None]:
attrs.struct.title.contains_phrase("RNA polymerase")\
    .or_(attrs.struct.title).contains_words("RdRp")\
    .and_(attrs.rcsb_entity_source_organism.taxonomy_lineage.name).exact_match("COVID-19 virus")\
    .and_(attrs.rcsb_chem_comp_container_identifiers.comp_id).exact_match("F86")\
    .exec()\
    .iquery()


To retrieve a structure from the PDB, you can submit a request to [RCSB PDB's ModelServer](https://models.rcsb.org/).

In [23]:
import requests
from pprint import pprint

pdb_id = "7C6S"  # COVID-19 structure bound to Boceprevir
download = False  # Change to True if you would like to download locally

url = f"https://models.rcsb.org/v1/{pdb_id}/full?encoding=cif&copy_all_categories=false&download={download}"

pprint(requests.get(url).content)


(b'data_7C6S\n#\n_model_server_result.job_id            pdmQkksredH1gd1BfEvx4'
 b"Q \n_model_server_result.datetime_utc      '2024-10-04 13:51:21' \n_model_"
 b'server_result.server_version    0.9.11 \n_model_server_result.query_name '
 b'       full \n_model_server_result.source_id         pdb-bcif \n_model_ser'
 b'ver_result.entry_id          7C6S \n#\n_entry.id    7C6S \n#\n_exptl.entry_i'
 b"d    7C6S \n_exptl.method      'X-RAY DIFFRACTION' \n#\nloop_\n_entity.detai"
 b'ls\n_entity.formula_weight\n_entity.id\n_entity.src_method\n_entity.type'
 b'\n_entity.pdbx_description\n_entity.pdbx_number_of_molecules\n_entity.pdbx_'
 b'mutation\n_entity.pdbx_fragment\n_entity.pdbx_ec\n? 33825.547 1 man polymer'
 b" '3C-like proteinase' 1 ? ? 3.4.22.69 \n? 521.693 2 syn non-polymer 'boce"
 b"previr (bound form)' 1 ? ? ? \n? 18.015 3 nat water water 312 ? ? ? \n#\n_c"
 b'ell.angle_alpha         90 \n_cell.angle_beta          116.717 \n_cell.ang'
 b'le_gamma         90 \n_cell.entry_id          
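The request URL above is assembled with an f-string, which renders Python's `False` as `False` in the query string, while the hard-coded `copy_all_categories` flag uses lowercase `false`. Here is a sketch of a helper that builds the same URL from keyword arguments and lowercases booleans consistently. `model_server_url` is a hypothetical name introduced for this example, and whether the server treats the two spellings differently is not established here.

```python
from urllib.parse import urlencode


def model_server_url(pdb_id, encoding="cif", copy_all_categories=False, download=False):
    """Build a ModelServer 'full' query URL (hypothetical helper for this sketch)."""
    params = {
        "encoding": encoding,
        # Assumption: lowercase "true"/"false", matching the hard-coded flag above
        "copy_all_categories": str(copy_all_categories).lower(),
        "download": str(download).lower(),
    }
    return f"https://models.rcsb.org/v1/{pdb_id}/full?{urlencode(params)}"


url = model_server_url("7C6S")
print(url)
```

The resulting string can be passed to `requests.get` exactly as in the cell above.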

## Try it!

[rcsbsearchapi.readthedocs.io](https://rcsbsearchapi.readthedocs.io)