Open and run this in Google Colab: <a href="https://colab.research.google.com/github/rcsb/rcsb-training-resources/blob/master/example-use-cases/ligand_file_download/ligand_file_download.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Specific CC to PDB Documentation

## Introduction

This notebook demonstrates how to query the RCSB Search API to build a mapping between chemical component IDs (ligands) and the PDB structures that contain them. Using the `rcsb-api` Python package, it runs one exact-match attribute search per ligand and writes the results to a simple TSV file for downstream analysis or curation.

### Inputs and Associated Variables

* **`ccids`**: A list of chemical component IDs (e.g., `"ATP"`, `"GTP"`, `"KYW"`) representing ligands of interest.
* **`AttributeQuery`**: A query class from the `rcsbapi.search` module that enables structured searches against PDB metadata fields.
* **`output_file`**: Output file path, set to `"cc-to-pdb.tsv"` inside the function, where the final ligand-to-PDB ID mapping is stored.
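
Because each query matches a CCID exactly, stray whitespace or duplicate entries in `ccids` can lead to empty or repeated lookups. A minimal sketch of cleaning the list first, assuming IDs are stored in uppercase; `normalize_ccids` is a hypothetical helper and not part of `rcsb-api`:

```python
def normalize_ccids(raw_ids):
    """Strip whitespace, uppercase, and drop blanks and duplicates (order preserved)."""
    seen = set()
    cleaned = []
    for raw in raw_ids:
        ccid = raw.strip().upper()  # assumption: IDs are uppercase in the archive
        if ccid and ccid not in seen:
            seen.add(ccid)
            cleaned.append(ccid)
    return cleaned


ccids = normalize_ccids([" atp", "GTP", "ATP ", "", "kyw"])
print(ccids)  # ['ATP', 'GTP', 'KYW']
```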

### Outputs and Associated Variables

* **`cc-to-pdb.tsv`**: A tab-separated file where each line contains a ligand ID, a tab, and a sorted, space-separated list of the PDB IDs that contain it.

### Major Steps in the Coding Process

1. **Construct and execute attribute queries** for each input ligand to identify PDB entries where it occurs.
2. **Collect the results**, converting them to sets of unique PDB IDs.
3. **Write the mapping to a TSV file**, with each row formatted as `<ccid>    <pdb_id1> <pdb_id2> ...`.
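
Steps 2 and 3 can be sketched without contacting the API. The example below uses made-up placeholder IDs (not real search results) and a hypothetical helper, `format_row`, to show how a list of hits becomes one TSV row:

```python
def format_row(ccid, hits):
    """Build one TSV row: the CCID, a tab, then unique PDB IDs sorted and space-separated."""
    unique_ids = set(hits)  # step 2: drop duplicate IDs
    return f"{ccid}\t{' '.join(sorted(unique_ids))}"


# Placeholder IDs for illustration only; real values come from the search results.
row = format_row("ATP", ["2XXX", "1XXX", "2XXX"])
print(repr(row))  # 'ATP\t1XXX 2XXX'
```

Sorting the IDs keeps the file stable between runs, which makes it easier to compare two versions of the output.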

### Questions

This notebook helps answer:

* Which PDB entries contain a specific ligand of interest?
* How can I programmatically retrieve ligand-to-PDB mappings using the RCSB API?
* How can this mapping support further structural or biochemical analysis?

### Learning Objectives

By using this notebook, users will learn to:

* Use the `rcsb-api` Python package to query chemical component metadata.
* Automate batch retrieval of PDB associations for a list of ligands.
* Write structured results to a simple, readable TSV file for further use.

### Purpose

This notebook supports ligand-centric analysis of PDB structures by enabling fast, automated construction of ligand-to-PDB mappings. It can serve as a foundational step for workflows in structural bioinformatics, drug discovery, and chemical component tracking across the PDB archive.

## Libraries

Below is a list of libraries that need to be installed and imported to complete the tasks in this notebook.

|      Library     | Abbreviation | Contents                                                                            | Source                                                 |
| :--------------: | :----------: | :---------------------------------------------------------------------------------- | :----------------------------------------------------- |
| `rcsbapi.search` |      N/A     | Module for building and executing structured attribute queries against the RCSB PDB | [rcsb-api on PyPI](https://pypi.org/project/rcsb-api/) |

> **Note:**
> The external dependency `rcsb-api` can be installed via:
>
> ```bash
> pip install rcsb-api
> ```


## Notebook Contents

The next section of the notebook contains the complete code for this example in a single cell. **Experienced coders** can copy and use this directly, whether in a Jupyter notebook, script, or other development environment.

For **novice and intermediate coders**, the code is organized into sequential steps that walk through the entire process. This notebook includes the following components:

1. **Install and Import Required Libraries**
   Install and import the `rcsb-api` Python package used to perform structured searches against the RCSB PDB.

2. **Define a Function to Query Ligand-to-PDB Mappings**
   Create a function that uses attribute-based search to find which PDB entries contain each ligand (chemical component ID).

3. **Write the Mappings to a TSV File**
   Format the results into a tab-separated file where each row shows a ligand ID and the corresponding list of PDB IDs.

4. **Run the Search and Save the Output**
   Define input ligand IDs, run the function, and generate the output file (`cc-to-pdb.tsv`) for downstream analysis.

In [None]:
# These imports are from the RCSB API Python package.
# You can install it using:
#     pip install rcsb-api
from rcsbapi.search import AttributeQuery


# This function queries the RCSB Search API for specific chemical component IDs (CCIDs)
# and finds all PDB entries that contain each of them.
# It writes a mapping from CCID to corresponding PDB IDs to a TSV file.
def get_specific_cc_to_pdb_and_write_to_file(ccids):
    chem_comp_to_pdb_map = {}

    # Iterate through each chemical component ID (e.g., "ATP", "GTP", etc.)
    for ccid in ccids:
        # Build a query for an exact match on the chemical component ID
        query = AttributeQuery(
            attribute="rcsb_chem_comp_container_identifiers.comp_id",  # Path to the comp_id field
            operator="exact_match",                                    # Use exact string matching
            value=ccid                                                 # Current CCID being searched
        )

        # Execute the query and collect PDB IDs that contain this CCID
        results = list(query(return_type="entry"))  # Get entries (PDB IDs) as the return type
        pdb_ids = set(results)                      # Convert results into a set of unique PDB IDs

        # Store the mapping: CCID → set of associated PDB IDs
        chem_comp_to_pdb_map[ccid] = pdb_ids

    # Write the final mapping to a TSV file
    # Format: <ccid>    <pdb_id1> <pdb_id2> ...
    output_file = "cc-to-pdb.tsv"
    with open(output_file, "w", encoding="utf-8") as f:
        for ccid, pdb_ids in chem_comp_to_pdb_map.items():
            f.write(f"{ccid}\t{' '.join(sorted(pdb_ids))}\n")

    return output_file


# Example usage
ccids = ["ATP", "GTP", "KYW"]  # List of CCIDs to search for
output_path = get_specific_cc_to_pdb_and_write_to_file(ccids)
print(f"File saved at: {output_path}")


# Stepwise Code for NOVICE and INTERMEDIATE CODERS

This notebook section demonstrates querying the RCSB Search API for specific chemical component IDs (CCIDs) and retrieving all PDB entries that contain them. The results are saved as a mapping from CCID to PDB IDs in a TSV file.

## Step 1: Install Required Library

We need the `rcsb-api` Python package. Install it with:

In [None]:
!pip install rcsb-api

## Step 2: Import Required Libraries

Import the `AttributeQuery` class used to build and execute search queries.

In [None]:
from rcsbapi.search import AttributeQuery

## Step 3: Define Function to Query Specific CCIDs and Write Output

This function takes a list of CCIDs, queries the RCSB Search API for each one, and saves the mapping to a TSV file.

In [None]:
def get_specific_cc_to_pdb_and_write_to_file(ccids):
    chem_comp_to_pdb_map = {}

    for ccid in ccids:
        query = AttributeQuery(
            attribute="rcsb_chem_comp_container_identifiers.comp_id",
            operator="exact_match",
            value=ccid
        )

        results = list(query(return_type="entry"))
        pdb_ids = set(results)
        chem_comp_to_pdb_map[ccid] = pdb_ids

    output_file = "cc-to-pdb.tsv"
    with open(output_file, "w", encoding="utf-8") as f:
        for ccid, pdb_ids in chem_comp_to_pdb_map.items():
            f.write(f"{ccid}\t{' '.join(sorted(pdb_ids))}\n")

    return output_file
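
The function above queries and writes in one pass, so an exception raised for one CCID would stop the whole batch before any file is written. One possible variant, shown as a sketch only, separates the lookup from the file writing and records failures instead of raising. Here `search` is a hypothetical stand-in callable (in this notebook it would wrap the `AttributeQuery` call), and catching the broad `Exception` is an assumption, since the specific errors raised by `rcsb-api` are not covered here.

```python
def build_mapping(ccids, search):
    """Return (mapping, failed), where mapping is CCID -> set of PDB IDs."""
    mapping = {}
    failed = []
    for ccid in ccids:
        try:
            mapping[ccid] = set(search(ccid))
        except Exception as err:  # assumption: broad catch; narrow it once the error types are known
            print(f"Lookup failed for {ccid}: {err}")
            failed.append(ccid)
    return mapping, failed


def fake_search(ccid):
    """Offline stand-in for the real query; returns placeholder IDs."""
    if ccid == "BAD":
        raise RuntimeError("simulated network error")
    return ["1XXX", "2XXX"]


mapping, failed = build_mapping(["ATP", "BAD"], fake_search)
print(mapping, failed)
```

Passing the lookup in as an argument also makes the loop easy to exercise offline, as the `fake_search` stand-in shows.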

## Step 4: Run the Function with Example CCIDs

Use a sample list of CCIDs to generate the output TSV file.

In [None]:
ccids = ["ATP", "GTP", "KYW"]  # Example chemical component IDs
output_path = get_specific_cc_to_pdb_and_write_to_file(ccids)
print(f"File saved at: {output_path}")

## Output

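The output `cc-to-pdb.tsv` file contains one line per CCID. The lines look like the following, where the PDB IDs are illustrative and the CCID is separated from the ID list by a tab: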
- ATP 1A2B 2C3D 4E5F ...
- GTP 3F4G 5H6J 7K8L ...
- KYW 1M2N 3O4P 5Q6R ...

Each line lists a chemical component ID followed by the PDB IDs where it occurs.
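
For downstream use, the file can be read back into a dictionary. A minimal sketch, assuming the row format described above (CCID, a tab, then space-separated PDB IDs); `read_cc_to_pdb` is a hypothetical helper, and the demo file contains placeholder IDs rather than real results:

```python
import os
import tempfile


def read_cc_to_pdb(path):
    """Parse the TSV into a dict mapping each CCID to a set of PDB IDs."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            ccid, _, ids = line.partition("\t")
            mapping[ccid] = set(ids.split())
    return mapping


# Demo with a temporary file holding placeholder IDs.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "cc-to-pdb.tsv")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("ATP\t1XXX 2XXX\nGTP\t2XXX 3XXX\nKYW\t\n")
    mapping = read_cc_to_pdb(demo_path)

# Entries that contain both ATP and GTP
shared = mapping["ATP"] & mapping["GTP"]
print(sorted(shared))  # ['2XXX']
```

Storing the IDs as sets makes questions such as "which entries contain both ligands?" a single intersection.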