<a href="https://colab.research.google.com/github/rcsb/py-rcsb-api/blob/master/notebooks/sequence_coord_quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RCSB PDB Sequence Coordinates API: Quickstart and examples

This notebook provides a quickstart to the `rcsbapi.sequence` module, which enables access to the RCSB PDB [Sequence Coordinates API](https://sequence-coordinates.rcsb.org/#sequence-coordinates-api) service.

For further details and documentation, see the [Sequence Coordinates API pages on Read the Docs](https://rcsbapi.readthedocs.io/en/latest/seq_api/quickstart.html).

## Installation
Start by installing the package:

    pip install rcsb-api

In [None]:
%pip install rcsb-api

In [None]:
from rcsbapi.sequence import Alignments, Annotations

## Getting Started

The [RCSB PDB Sequence Coordinates API](https://sequence-coordinates.rcsb.org/#sequence-coordinates-api) allows querying for alignments between structural and sequence databases as well as protein positional annotations/features integrated from multiple resources. Alignment data is available for NCBI [RefSeq](https://www.ncbi.nlm.nih.gov/refseq/) (including protein and genomic sequences), UniProt and PDB sequences. Protein positional features are integrated from [UniProt](https://www.uniprot.org/), [CATH](https://www.cathdb.info/), [SCOPe](https://scop.berkeley.edu/) and [RCSB PDB](https://www.rcsb.org/) and collected from the [RCSB PDB Data Warehouse](https://data.rcsb.org/#data-api).

Alignments and positional features provided by this API include Experimental Structures from the [PDB](https://www.rcsb.org/) and [select Computed Structure Models (CSMs)](https://www.rcsb.org/docs/general-help/computed-structure-models-and-rcsborg#what-csms-are-available). Alignments and positional features for CSMs can be requested with the same parameters as for Experimental Structures, by providing CSM IDs.

The API supports requests using [GraphQL](https://graphql.org/), a language for API queries. This package simplifies generating queries in GraphQL syntax. 
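
To give a feel for what "generating queries in GraphQL syntax" means, here is a minimal sketch that assembles query text by hand. The helper `build_alignments_query` is hypothetical, and the argument names (`from`, `to`, `queryId`) are assumptions about the API schema; the text the package actually generates may differ.

```python
# Illustrative only: hand-built GraphQL text resembling an alignments request.
# ASSUMPTION: the argument names (from, to, queryId) mirror the API schema.
def build_alignments_query(db_from: str, db_to: str, query_id: str, fields: list[str]) -> str:
    """Return GraphQL query text for scalar fields (hypothetical helper)."""
    field_block = "\n    ".join(fields)
    return (
        "{\n"
        f'  alignments(from: {db_from}, to: {db_to}, queryId: "{query_id}") {{\n'
        f"    {field_block}\n"
        "  }\n"
        "}"
    )


graphql_text = build_alignments_query(
    "UNIPROT", "PDB_ENTITY", "P01112", ["query_sequence", "alignment_length"]
)
print(graphql_text)
```

With the query classes in this package, the same intent is expressed through keyword arguments, so the query text does not have to be written by hand.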

There are two main types of queries: `Alignments` and `Annotations`.

### Alignments

`Alignments` queries request alignments between an object in one supported database and all objects in another supported database.

In [None]:
from rcsbapi.sequence import Alignments

# Fetch alignments between a UniProt Accession and PDB Entities
query = Alignments(
    db_from="UNIPROT",
    db_to="PDB_ENTITY",
    query_id="P01112",
    return_data_list=["query_sequence", "target_alignments", "alignment_length"]
)
result_dict = query.exec()
print(result_dict)
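
The printed dictionary can be large, so it often helps to reduce it to a short summary. The sketch below assumes the result is nested as `data` → `alignments` → `target_alignments`, with each target carrying `target_id` and a list of `aligned_regions` that have `query_begin` and `query_end`. That shape is an assumption, the sample is hand-written rather than real API output, and `summarize_alignments` is a hypothetical helper.

```python
from typing import Any

# Hand-written sample mimicking the ASSUMED response shape; not real API output.
sample_result: dict[str, Any] = {
    "data": {
        "alignments": {
            "query_sequence": "MTEYKLVVVG",
            "target_alignments": [
                {
                    "target_id": "EXAMPLE_1",
                    "aligned_regions": [
                        {"query_begin": 1, "query_end": 10, "target_begin": 3, "target_end": 12}
                    ],
                }
            ],
        }
    }
}


def summarize_alignments(result: dict[str, Any]) -> list[tuple[Any, int]]:
    """Return (target_id, number of aligned query positions) for each target."""
    alignments = (result.get("data") or {}).get("alignments") or {}
    summary = []
    for target in alignments.get("target_alignments") or []:
        regions = target.get("aligned_regions") or []
        # Region bounds are treated as inclusive, hence the +1.
        aligned = sum(r["query_end"] - r["query_begin"] + 1 for r in regions)
        summary.append((target.get("target_id"), aligned))
    return summary


print(summarize_alignments(sample_result))
```

Passing the dictionary returned by `query.exec()` in place of `sample_result` should work if the actual response follows the assumed layout.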

### Annotations
`Annotations` queries request annotation data about a sequence (e.g., residue-level annotations/features). Protein positional features are integrated from [UniProt](https://www.uniprot.org/), [CATH](https://www.cathdb.info/), [SCOPe](https://scop.berkeley.edu/) and [RCSB PDB](https://www.rcsb.org/) and collected from the [RCSB PDB Data Warehouse](https://data.rcsb.org/#data-api). 

In [None]:
from rcsbapi.sequence import Annotations

# Fetch UniProt positional features for a particular PDB Instance
query = Annotations(
    reference="PDB_INSTANCE",
    query_id="2UZI.C",
    sources=["UNIPROT"],
    return_data_list=["target_id", "features"]
)
result_dict = query.exec()
print(result_dict)
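
Annotation results are nested (annotations, then features, then positions), which makes them awkward to scan. The sketch below flattens them into rows. It assumes a `data` → `annotations` list whose items have `target_id` and `features`, and that each feature has `type`, `name`, and `feature_positions` with `beg_seq_id` and `end_seq_id`. These field names are assumptions, the sample is hand-written, and `flatten_features` is a hypothetical helper.

```python
from typing import Any

# Hand-written sample mimicking the ASSUMED response shape; not real API output.
sample_annotations: dict[str, Any] = {
    "data": {
        "annotations": [
            {
                "target_id": "EXAMPLE_ACC",
                "features": [
                    {
                        "type": "EXAMPLE_TYPE",
                        "name": "example feature",
                        "feature_positions": [
                            {"beg_seq_id": 5, "end_seq_id": 9},
                            {"beg_seq_id": 20, "end_seq_id": 22},
                        ],
                    }
                ],
            }
        ]
    }
}


def flatten_features(result: dict[str, Any]) -> list[tuple[Any, Any, Any, Any, Any]]:
    """Return one (target_id, type, name, begin, end) row per feature position."""
    rows = []
    for annotation in (result.get("data") or {}).get("annotations") or []:
        for feature in annotation.get("features") or []:
            for pos in feature.get("feature_positions") or []:
                rows.append(
                    (
                        annotation.get("target_id"),
                        feature.get("type"),
                        feature.get("name"),
                        pos.get("beg_seq_id"),
                        pos.get("end_seq_id"),
                    )
                )
    return rows


for row in flatten_features(sample_annotations):
    print(row)
```

Rows in this form are easy to load into a table or DataFrame for further analysis.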

## Additional Usage and Examples

The `rcsbapi.sequence` module also supports more advanced query types such as `GroupAlignments`, `GroupAnnotations`, and `GroupAnnotationsSummary`, as well as filtering of the returned result set.

The examples below illustrate the usage of these query types.

### Alignments Query with Range
Filter alignments to a particular range:

In [None]:
from rcsbapi.sequence import Alignments

# Only return alignment data that falls within the given range
query = Alignments(
    db_from="NCBI_PROTEIN",
    db_to="PDB_ENTITY",
    query_id="XP_642496",
    range=[1, 100],
    return_data_list=["target_alignments"]
)
query.exec()

### Annotations Query with Filter
You can use the `filters` argument in combination with `AnnotationFilterInput` to select which annotations to retrieve.

For example, to select just the binding site annotations:

In [None]:
from rcsbapi.sequence import Annotations, AnnotationFilterInput

# Fetch protein-ligand binding sites for PDB Instances of UniProt Q6P1M3
query = Annotations(
    reference="UNIPROT",
    query_id="Q6P1M3",
    sources=["PDB_INSTANCE"],
    filters=[
        AnnotationFilterInput(
            field="TYPE",
            operation="EQUALS",
            values=["BINDING_SITE"],
            source="PDB_INSTANCE"
        )
    ],
    return_data_list=["target_id", "features"]
)
query.exec()
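
One way to check that a filter did what was intended is to tally the feature types in the result. The sketch below assumes the result holds a `data` → `annotations` list whose items contain `features` with a `type` field. That layout is an assumption, the sample is hand-written rather than real API output, and `count_feature_types` is a hypothetical helper.

```python
from collections import Counter
from typing import Any

# Hand-written sample mimicking the ASSUMED response shape; not real API output.
sample_filtered: dict[str, Any] = {
    "data": {
        "annotations": [
            {"target_id": "EXAMPLE.A", "features": [{"type": "BINDING_SITE"}]},
            {"target_id": "EXAMPLE.B", "features": [{"type": "BINDING_SITE"}]},
        ]
    }
}


def count_feature_types(result: dict[str, Any]) -> Counter:
    """Count how many features of each type appear across all annotations."""
    counts: Counter = Counter()
    for annotation in (result.get("data") or {}).get("annotations") or []:
        for feature in annotation.get("features") or []:
            counts[feature.get("type")] += 1
    return counts


type_counts = count_feature_types(sample_filtered)
print(type_counts)
```

If the filter was applied as expected, only the requested type (here `BINDING_SITE`) should appear as a key. Note that `type` has to be included in the returned fields for this check to work.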

### GroupAlignments
Use `GroupAlignments` to get alignments for groups of sequences (e.g., for [UniProt P01112](https://www.rcsb.org/groups/sequence/polymer_entity/P01112)).

In [None]:
from rcsbapi.sequence import GroupAlignments

query = GroupAlignments(
    group="MATCHING_UNIPROT_ACCESSION",
    group_id="P01112",
    return_data_list=["target_alignments.aligned_regions", "target_id"],
)
query.exec()

### GroupAlignments with Filter

To filter the results down to a specific set of PDB entity IDs, use the `filter` option:

In [None]:
from rcsbapi.sequence import GroupAlignments

query = GroupAlignments(
    group="MATCHING_UNIPROT_ACCESSION",
    group_id="P01112",
    return_data_list=["target_alignments.aligned_regions", "target_id"],
    filter=["8CNJ_1", "8FG4_1"]
)
query.exec()
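
When filtering by entity ID, it can be useful to confirm that every requested ID actually came back. The sketch below assumes the result is nested as `data` → `group_alignments` → `target_alignments`, each with a `target_id`. That layout is an assumption, and `missing_targets` is a hypothetical helper. The sample is hand-written and deliberately omits one ID to show the check; it is not real API output.

```python
from typing import Any

# Hand-written sample mimicking the ASSUMED response shape; not real API output.
# One requested ID is left out on purpose to demonstrate the check.
sample_group_result: dict[str, Any] = {
    "data": {
        "group_alignments": {
            "target_alignments": [
                {"target_id": "8CNJ_1", "aligned_regions": []},
            ]
        }
    }
}


def missing_targets(result: dict[str, Any], requested_ids: list[str]) -> list[str]:
    """Return the requested IDs that do not appear in the result, sorted."""
    group = (result.get("data") or {}).get("group_alignments") or {}
    returned = {t.get("target_id") for t in group.get("target_alignments") or []}
    return sorted(set(requested_ids) - returned)


print(missing_targets(sample_group_result, ["8CNJ_1", "8FG4_1"]))
```

An empty list means every requested entity ID was found in the result.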

### GroupAnnotations

Use `GroupAnnotations` to get annotations for groups of sequences.

In [None]:
from rcsbapi.sequence import GroupAnnotations

query = GroupAnnotations(
    group="MATCHING_UNIPROT_ACCESSION",
    group_id="P01112",
    sources=["PDB_ENTITY"],
    return_data_list=["features.name", "features.feature_positions", "target_id"]
)
query.exec()

### GroupAnnotations with Filter

Use the `filters` argument in combination with `AnnotationFilterInput` to select which annotations to retrieve.

In [None]:
from rcsbapi.sequence import GroupAnnotations, AnnotationFilterInput

# Fetch only "BINDING_SITE" annotations from PDB instances
query = GroupAnnotations(
    group="MATCHING_UNIPROT_ACCESSION",
    group_id="P01112",
    sources=["PDB_INSTANCE"],
    filters=[
        AnnotationFilterInput(
            field="TYPE",
            operation="EQUALS",
            values=["BINDING_SITE"],
            source="PDB_INSTANCE"
        )
    ],
    return_data_list=["features.name", "features.type", "features.feature_positions", "target_id"],
)
query.exec()

### GroupAnnotationsSummary
Use `GroupAnnotationsSummary` to get annotation summaries for groups of sequences.

In [None]:
from rcsbapi.sequence import GroupAnnotationsSummary

query = GroupAnnotationsSummary(
    group="MATCHING_UNIPROT_ACCESSION",
    group_id="P01112",
    sources=["PDB_INSTANCE"],
    return_data_list=["target_id", "features.type", "features.value"]
)
query.exec()

### GroupAnnotationsSummary with Filter

Use the `filters` argument in combination with `AnnotationFilterInput` to select which annotation summaries to retrieve.

In [None]:
from rcsbapi.sequence import GroupAnnotationsSummary, AnnotationFilterInput

# Fetch only the "LIGAND_INTERACTION" annotation summary information
query = GroupAnnotationsSummary(
    group="MATCHING_UNIPROT_ACCESSION",
    group_id="P01112",
    sources=["PDB_INSTANCE"],
    filters=[
        AnnotationFilterInput(
            field="TYPE",
            operation="EQUALS",
            values=["LIGAND_INTERACTION"],
            source="PDB_INSTANCE"
        )
    ],
    return_data_list=["target_id", "features.type", "features.value"]
)
query.exec()