<a href="https://colab.research.google.com/github/rcsb/rcsb-training-resources/blob/master/training-events/2025/python-rcsb-api/search_data_workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install `rcsb-api`
%pip install --upgrade rcsb-api

# Using Search and Data APIs Together

The Search and Data APIs are most powerful when used together.

You can use the Search API to identify structures of interest and then use the Data API to request information about those structures.

In the example below, we will use the Search API to find structures with the HIV protease inhibitor ritonavir bound. Then, we will use the Data API to identify which amino acid residues interact with ritonavir.

## Search API Query

In [None]:
from rcsbapi.search import search_attributes as attrs

# Search for all structures bound to ritonavir
# "J05AE03" is the ATC (Anatomical Therapeutic Chemical) code for ritonavir
q1 = attrs.rcsb_chem_comp_annotation.annotation_lineage.id == "J05AE03"
q2 = attrs.rcsb_chem_comp_annotation.type == "ATC"

search_query = q1 & q2
search_results = list(search_query())

# Check first 10
print(search_results[:10])

Once you have this list of structures, you can request data on each structure's interactions with ritonavir.

## Data API Query

In [None]:
from rcsbapi.data import DataQuery as Query

# Ligand interactions are contained in instance features
# Use the Search API results for the `input_ids` argument
data_query = Query(
    input_type="entries",
    input_ids=search_results,
    return_data_list=["rcsb_polymer_instance_feature"]
)

data_results = data_query.exec()

print(data_results)

## Parse Results

Responses to Data API queries will be returned in JSON format. Once you get a response, you can parse it into a format that is most helpful for you.

Below, we will parse the results into a nested dictionary where the keys are entry IDs and each value is a list of ligand interactions.
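
To make the nesting concrete before parsing a real response, the sketch below walks a small hand-written mock of the response shape. The field names mirror those used in this notebook; the entry ID, residues, and chain ID are made up for illustration, and `iter_instance_features` is a hypothetical helper, not part of `rcsb-api`.

```python
# Hand-written mock response; all values are illustrative, not real PDB data
mock_results = {
    "data": {
        "entries": [
            {
                "rcsb_id": "XXXX",
                "polymer_entities": [
                    {
                        "polymer_entity_instances": [
                            {
                                "rcsb_polymer_instance_feature": [
                                    {
                                        "feature_id": "LIGAND_INTERACTION_1",
                                        "feature_positions": [
                                            {"beg_comp_id": "ASP", "beg_seq_id": 25, "end_seq_id": 25},
                                            {"beg_comp_id": "GLY", "beg_seq_id": 27, "end_seq_id": 27},
                                        ],
                                        "additional_properties": [
                                            {"name": "PARTNER_ASYM_ID", "values": ["C"]}
                                        ],
                                    },
                                    # A feature without a feature_id, to show why a None check is needed
                                    {
                                        "feature_id": None,
                                        "feature_positions": [],
                                        "additional_properties": [],
                                    },
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    }
}


def iter_instance_features(results):
    """Yield (entry_id, feature) pairs for every polymer instance feature."""
    for entry in results["data"]["entries"]:
        for polymer_entity in entry["polymer_entities"]:
            for instance in polymer_entity["polymer_entity_instances"]:
                for feature in instance["rcsb_polymer_instance_feature"]:
                    yield entry["rcsb_id"], feature


for entry_id, feature in iter_instance_features(mock_results):
    print(entry_id, feature["feature_id"], len(feature["feature_positions"]))
```

Each level of the loop corresponds to one level of the JSON: entry, polymer entity, polymer entity instance, and finally the instance's features.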

In [None]:
# For easier-to-read output
from pprint import pprint

# We will store our parsed results in the `ligand_interactions` dict
ligand_interactions = {}

# Loop through entry results
entry_responses = data_results["data"]["entries"]
for entry in entry_responses:
    entry_id = entry["rcsb_id"]
    
    # Navigate to instance features, which is where ligand interactions will be noted
    for polymer_entity in entry["polymer_entities"]:
        for instance in polymer_entity["polymer_entity_instances"]:
            for instance_feature in instance["rcsb_polymer_instance_feature"]:
                # Other instance features will also be in this list, so check the "feature_id"
                # Also check for None values
                feature_id = instance_feature["feature_id"]
                if not feature_id or "LIGAND_INTERACTION" not in feature_id:
                    continue

                # Collect residue identity, beginning sequence position, and end sequence position
                # for every residue in this feature
                residue_list = [
                    {
                        "residue": residue["beg_comp_id"],
                        "beg_seq_id": residue["beg_seq_id"],
                        "end_seq_id": residue["end_seq_id"],
                    }
                    for residue in instance_feature["feature_positions"]
                ]

                # Find instance ID
                instance_id = ""
                for prop in instance_feature["additional_properties"]:
                    if prop["name"] == "PARTNER_ASYM_ID":
                        instance_id = prop["values"][0]
                        break

                full_id = f"{entry_id}.{instance_id}"

                # Some entries will have multiple ligand interactions, so append to the
                # entry's list, creating it on first use
                ligand_interactions.setdefault(entry_id, []).append({full_id: residue_list})

pprint(ligand_interactions)

This example outlined the general workflow of using the Search API to find a list of structures of interest, then using the Data API to find data about each structure, and finally parsing the Data API response into a data structure that suits your project.

For further documentation on the `rcsb-api` package, check our [readthedocs](https://rcsbapi.readthedocs.io/en/latest/index.html).

# DWP SUGGESTIONS

## Data API Query

In [None]:
from rcsbapi.data import DataQuery as Query

# Ligand interactions are contained in instance features
# Also explicitly request the instance-level `rcsb_id`
# Use the Search API results for the `input_ids` argument
data_query = Query(
    input_type="entries",
    input_ids=search_results,
    return_data_list=["rcsb_polymer_instance_feature", "polymer_entity_instances.rcsb_id"]
)

data_results = data_query.exec()

# Print the first result
print(data_results["data"]["entries"][0])

## Parse Results

Responses to Data API queries will be returned in JSON format. Once you get a response, you can parse it into a format that is most helpful for you.

Below, we will parse the results into a nested dictionary where the keys are entity instance IDs and each value is a list of ligand interactions.

In [None]:
# For easier-to-read output
from pprint import pprint

# We will store our parsed results in the `ligand_interactions` dict
ligand_interactions = {}

def extract_feature_positions(data, feature_type, feature_name):
    """Collect feature positions for features of type `feature_type` (e.g., "LIGAND_INTERACTION")
    and name `feature_name` (e.g., "ligand RIT"), keyed by instance `rcsb_id`."""
    result = {}
    for entry in data.get("data", {}).get("entries", []):
        for polymer_entity in entry.get("polymer_entities", []):
            for instance in polymer_entity.get("polymer_entity_instances", []):
                rcsb_id = instance.get("rcsb_id")  # Extract instance rcsb_id
                if not rcsb_id:
                    continue
                for feature in instance.get("rcsb_polymer_instance_feature", []):
                    if feature.get("type") == feature_type and feature.get("name") == feature_name:
                        if rcsb_id not in result:
                            result[rcsb_id] = []
                        # Extract the minimal uniquely-identifying details
                        result[rcsb_id].append({
                            "additional_properties": feature.get("additional_properties", []),
                            "description": feature.get("description", ""),
                            "feature_positions": [
                                {"beg_seq_id": pos["beg_seq_id"], "beg_comp_id": pos["beg_comp_id"]}
                                for pos in feature.get("feature_positions", [])
                            ]
                        })
    return result

ligand_interactions = extract_feature_positions(data_results, "LIGAND_INTERACTION", "ligand RIT")

pprint(ligand_interactions)
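
One natural follow-up is to tally which residues appear most often across the parsed interactions. The sketch below assumes a dictionary shaped like the output of `extract_feature_positions` above. `sample_interactions` is made-up illustrative data (the instance IDs and residues are placeholders), and `count_contact_residues` is a hypothetical helper, not part of `rcsb-api`.

```python
from collections import Counter

# Made-up data in the shape returned by `extract_feature_positions`
sample_interactions = {
    "XXXX.A": [
        {
            "additional_properties": [{"name": "PARTNER_ASYM_ID", "values": ["C"]}],
            "description": "",
            "feature_positions": [
                {"beg_seq_id": 25, "beg_comp_id": "ASP"},
                {"beg_seq_id": 27, "beg_comp_id": "GLY"},
            ],
        }
    ],
    "XXXX.B": [
        {
            "additional_properties": [{"name": "PARTNER_ASYM_ID", "values": ["C"]}],
            "description": "",
            "feature_positions": [
                {"beg_seq_id": 25, "beg_comp_id": "ASP"},
            ],
        }
    ],
}


def count_contact_residues(interactions):
    """Count (residue name, sequence position) pairs across all instances."""
    counts = Counter()
    for features in interactions.values():
        for feature in features:
            for pos in feature["feature_positions"]:
                counts[(pos["beg_comp_id"], pos["beg_seq_id"])] += 1
    return counts


residue_counts = count_contact_residues(sample_interactions)
for (comp_id, seq_id), n in residue_counts.most_common():
    print(f"{comp_id}{seq_id}: {n}")
```

To run this on real results, pass `ligand_interactions` in place of `sample_interactions`. Keep in mind that sequence positions are only directly comparable between structures whose sequences are numbered consistently.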