<a href="https://colab.research.google.com/github/rcsb/py-rcsb-api/blob/master/notebooks/multisearch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Enabling Computational Biology Research

This tool can be an integral resource for computational biologists who run data analyses or iterative processes on large datasets from the RCSB PDB. Our tool supports automated data retrieval, which is essential for any researcher working with large datasets. It can also be incorporated into a larger research workflow to retrieve RCSB PDB data quickly and programmatically.

Below is an example of how a computational biologist might use our tool to automate data retrieval for their research. The first query finds protein structures whose sequence is similar to that of a target protein. The first five results are then used as search parameters for a set of iterative queries that find structurally similar proteins bound to small molecules. The researcher can then use their own workflow to investigate how the protein structures and small molecules interact.
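Results returned with the `polymer_entity` return type are identifiers such as `4HGM_2`, which combine an entry ID and an entity ID, while the structure similarity search takes an entry ID. The sketch below shows one way to convert between the two in plain Python, assuming every identifier has the form `<entry ID>_<entity ID>`. The helper names `split_entity_id` and `first_entry_ids` are hypothetical and are not part of `rcsb-api`.

```python
from typing import Iterable, List, Tuple


def split_entity_id(polymer_entity_id: str) -> Tuple[str, str]:
    """Split an ID such as '4HGM_2' into ('4HGM', '2').

    Assumes the '<entry ID>_<entity ID>' form; raises ValueError otherwise.
    """
    entry_id, sep, entity_id = polymer_entity_id.rpartition("_")
    if not sep or not entry_id or not entity_id:
        raise ValueError(f"Unexpected polymer entity ID: {polymer_entity_id!r}")
    return entry_id, entity_id


def first_entry_ids(polymer_entity_ids: Iterable[str], limit: int = 5) -> List[str]:
    """Return up to `limit` distinct entry IDs, preserving result order."""
    entry_ids: List[str] = []
    for polymer_entity_id in polymer_entity_ids:
        if len(entry_ids) >= limit:
            break
        entry_id, _ = split_entity_id(polymer_entity_id)
        # Skip entries already seen, e.g. two entities from the same structure
        if entry_id not in entry_ids:
            entry_ids.append(entry_id)
    return entry_ids
```

Splitting on the underscore rather than trimming a fixed number of characters keeps the helper correct for entity IDs with more than one digit, and the `limit` guard means a short result list does not raise an error.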

In [None]:
%pip install rcsb-api

In [1]:
from rcsbapi.search import SeqSimilarityQuery, AttributeQuery, StructSimilarityQuery

In [2]:
# Search for similar sequences to a protein of interest
q1 = SeqSimilarityQuery("DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCPFDEHVKLVNEL" + 
                   "TEFAKTCVADESHAGCEKSLHTLFGDELCKVASLRETYGDMADCCE" + 
                   "KQEPERNECFLSHKDDSPDLPKLKPDPNTLCDEFKADEKKFWGKYL" + 
                   "YEIARRHPYFYAPELLYYANKYNGVFQECCQAEDKGACLLPKIETM" + 
                   "REKVLTSSARQRLRCASIQKFGERALKAWSVARLSQKFPKAEFVEV" + 
                   "TKLVTDLTKVHKECCHGDLLECADDRADLAKYICDNQDTISSKLKE" + 
                   "CCDKPLLEKSHCIAEVEKDAIPENLPPLTADFAEDKDVCKNYQEAK" + 
                   "DAFLGSFLYEYSRRHPEYAVSVLLRLAKEYEATLEECCAKDDPHAC" +
                   "YSTVFDKLKHLVDEPQNLIKQNCDQFEKLGEYGFQNALIVRYTRKV" + 
                   "PQVSTPTLVEVSRSLGKVGTRCCTKPESERMPCTEDYLSLILNRLC" + 
                   "VLHEKTPVSEKVTKCCTESLVNRRPCFSALTPDETYVPKAFDEKLF" + 
                   "TFHADICTLPDTEKQIKKQTALVELLKHKPKATEEQLKTVMENFVA" +
                   "FVDKCCAADDKEACFAVEGPKLVVSTQTALA")

sequence_similarity_results = list(q1(return_type="polymer_entity"))
print("Sequences similar to query:")
print(sequence_similarity_results)

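# Use at most the first five hits; slicing avoids an IndexError if fewer are returned
for similar_protein in sequence_similarity_results[:5]:

    # Polymer entity IDs look like "4HGM_2"; the entry ID is the part before the underscore
    entry_id = similar_protein.split("_")[0]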

    # Search for structures with small molecule(s)
    small_molecule_query = AttributeQuery(
        attribute="rcsb_nonpolymer_entity_annotation.comp_id",
        operator="exists",
        value=None
    )

    # Search for structurally similar proteins
    struct_similarity_query = StructSimilarityQuery(
        structure_search_type="entry_id",
        entry_id=entry_id,
        structure_input_type="assembly_id",
        assembly_id="1",  # "1" is the default assembly ID
        operator="strict_shape_match",
        target_search_space="assembly"
    )

    group_query = struct_similarity_query & small_molecule_query

    print("Protein structures similar to", similar_protein, "bound to a small molecule:")
    print(list(group_query(return_type="assembly")))

Sequences similar to query:
['3V03_1', '4F5S_1', '4JK4_1', '4OR0_1', '6RJV_1', '6QS9_1', '8KFO_1', '8WDD_1', '4LUF_1', '4LUH_1', '5ORF_1', '6HN0_1', '5ORI_1', '5OSW_1', '5OTB_1', '6HN1_1', '5YXE_1', '5GHK_1', '1AO6_1', '1BJ5_1', '1BM0_1', '1E78_1', '1E7A_1', '1E7B_1', '1E7C_1', '1E7E_1', '1E7F_1', '1E7G_1', '1E7H_1', '1E7I_1', '1GNI_1', '1GNJ_1', '1H9Z_1', '1HA2_1', '1HK1_1', '1HK4_1', '1N5U_1', '1O9X_1', '1UOR_1', '2BX8_1', '2BXA_1', '2BXB_1', '2BXC_1', '2BXD_1', '2BXE_1', '2BXF_1', '2BXG_1', '2BXH_1', '2BXI_1', '2BXK_1', '2BXL_1', '2BXM_1', '2BXN_1', '2BXO_1', '2BXP_1', '2BXQ_1', '2ESG_3', '2I2Z_1', '2I30_1', '2VUE_1', '2VUF_1', '2XSI_1', '2XVQ_1', '2XVU_1', '2XVV_1', '2XVW_1', '2XW0_1', '2XW1_1', '2YDF_1', '3A73_1', '3B9L_1', '3B9M_1', '3JQZ_1', '3JRY_1', '3LU6_1', '3LU7_1', '3LU8_1', '3TDL_1', '3UIV_1', '4BKE_1', '4E99_1', '4EMX_1', '4G03_1', '4G04_1', '4HGK_1', '4HGM_2', '4IW1_1', '4IW2_1', '4K2C_1', '4L8U_1', '4L9K_1', '4L9Q_1', '4LA0_1', '4LB2_1', '4LB9_1', '4N0F_3', '4S1Y_1', '