# XLRanker Jupyter Notebook

This Jupyter Notebook walks through an example of going from a peptide network to the final protein network, which can then be used for functional analysis.

## Example Dataset

The example data used in this notebook comes from the HEK293 cell line. The PPI-XL data was taken from https://pubs.acs.org/doi/10.1021/acs.analchem.1c04485, and the proteomics data from https://www.ebi.ac.uk/pride/archive/projects/PXD052801. The notebook downloads the data to the current instance.

### Data Citations

```txt
Bakhtina AA, Wippel HH, Chavez JD, Bruce JE. Combining Quantitative Proteomics and Interactomics for a Deeper Insight into Molecular Differences between Human Cell Lines. J Proteome Res. 2024;23(12):5360-5371. doi: 10.1021/acs.jproteome.4c00503.
Jiao F, Yu C, Wheat A, Wang X, Rychnovsky SD, Huang L. Two-Dimensional Fractionation Method for Proteome-Wide Cross-Linking Mass Spectrometry Analysis. Anal Chem. 2022 Mar 15;94(10):4236-4242. doi: 10.1021/acs.analchem.1c04485. Epub 2022 Mar 2. PMID: 35235311; PMCID: PMC9056026.
```

## Install xlranker

If `xlranker` is not already installed, run the command below:

In [None]:
%pip install xlranker # install xlranker
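If you are not sure whether `xlranker` is already installed, you can check from Python with the standard-library `importlib.metadata` module. This is a minimal sketch; `installed_version` is a helper written for this notebook, not part of xlranker.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


xlranker_version = installed_version("xlranker")
if xlranker_version is None:
    print("xlranker is not installed; run the %pip cell above.")
else:
    print(f"xlranker {xlranker_version} is already installed.")
```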

## Data Download

In [None]:
import tarfile
from io import BytesIO

import requests

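# Download the example data archive (timeout in seconds so a stalled connection fails)
url = "https://github.com/bzhanglab/xlranker/raw/refs/heads/master/notebooks/downloads/example_data.tar.gz"
response = requests.get(url, timeout=120)
response.raise_for_status()

# Extract the archive to /content/ (the default working directory on Google Colab).
# The next cell reads the files by relative path, so outside Colab change `path`
# to the directory the notebook runs from.
with tarfile.open(fileobj=BytesIO(response.content), mode="r:gz") as tar:
    tar.extractall(path="/content/")
    print("Files extracted to /content/")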
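The Getting Started cell reads `human_2019_04.fasta`, `init_network.tsv`, and `omic_data/` by relative path, so they must sit in the working directory. The sketch below checks for them before running the pipeline. It assumes the archive places these three entries at its top level, which is what the relative paths in that cell imply; `missing_example_files` and `EXPECTED_INPUTS` are names made up for this notebook.

```python
from pathlib import Path
from typing import List

# Inputs read by the Getting Started cell (paths copied from that cell)
EXPECTED_INPUTS = ["human_2019_04.fasta", "init_network.tsv", "omic_data"]


def missing_example_files(base_dir: str = ".") -> List[str]:
    """Return the expected example inputs that do not exist under base_dir."""
    base = Path(base_dir)
    return [name for name in EXPECTED_INPUTS if not (base / name).exists()]


missing = missing_example_files(".")
if missing:
    print("Missing inputs in the working directory:", missing)
else:
    print("All example inputs found.")
```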

## Getting Started

In [None]:
import xlranker
from xlranker.config import config as xlranker_config
from xlranker.util.mapping import FastaType, PeptideMapper

xlranker_config.reduce_fasta = False  # True would keep only the longest sequence; False keeps all sequences
xlranker_config.output = "xlranker_output/"  # output folder


xlranker.lib.setup_logging()  # enable logging

xlranker.util.set_seed(10)  # set seed for reproducibility (optional)

mapper = PeptideMapper(
    mapping_table_path="human_2019_04.fasta",
    is_fasta=True,
    fasta_type=FastaType.UNIPROT,
)  # Use the UniProt FASTA file as a custom mapping table

data_set = xlranker.lib.XLDataSet.load_from_network(
    "init_network.tsv", "omic_data/", custom_mapper=mapper
)

xlranker.run_full_pipeline(data_set)
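The mapper above is built from a UniProt FASTA file, and the `reduce_fasta` option concerns keeping only the longest sequence. To get a feel for how many entries share a gene in `human_2019_04.fasta`, here is a standalone sketch that groups entries by the `GN=` gene name in UniProt headers and records the longest sequence per gene. Grouping by `GN=` is an assumption made for illustration; this is not xlranker's implementation, and `read_fasta` and `longest_per_gene` are hypothetical helpers.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# UniProt headers carry the gene name as "GN=<name>"
GENE_RE = re.compile(r"\bGN=(\S+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_per_gene(lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map each GN= gene name to (accession, length) of its longest sequence."""
    best: Dict[str, Tuple[str, int]] = {}
    for header, seq in read_fasta(lines):
        match = GENE_RE.search(header)
        if match is None:
            continue  # entries without a GN= field are skipped in this sketch
        # UniProt headers look like "sp|P12345|NAME_HUMAN ..."; field 1 is the accession
        fields = header.split("|")
        accession = fields[1] if len(fields) >= 3 else header.split()[0]
        gene = match.group(1)
        if gene not in best or len(seq) > best[gene][1]:
            best[gene] = (accession, len(seq))
    return best
```

For example, pass an open file handle for `human_2019_04.fasta` to `longest_per_gene` and compare the number of genes it returns with the total number of FASTA entries.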

## Analysis

One way to evaluate xlranker's model is to check how well the scores of prioritized pairs separate from those of unprioritized pairs.

To plot the score distributions, install `pandas`, `seaborn`, and `scipy` (`matplotlib` is installed as a dependency of `seaborn`):

In [None]:
%pip install pandas seaborn scipy # Install if not done already

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from xlranker.status import PrioritizationStatus

# Map each ML prioritization status to its plot label; other statuses are skipped
status_labels = {
    PrioritizationStatus.ML_PRIMARY_SELECTED: "ML Primary Selected",
    PrioritizationStatus.ML_SECONDARY_SELECTED: "ML Secondary Selected",
    PrioritizationStatus.ML_NOT_SELECTED: "ML Not Selected",
}

rows = [
    {"type": status_labels[pair.prioritization_status], "score": pair.score}
    for pair in data_set.protein_pairs.values()
    if pair.prioritization_status in status_labels
]

df = pd.DataFrame(rows)
sns.set_theme(context="talk", style="white")
# common_norm=False scales each group's density on its own, so group sizes don't hide the shapes
sns.kdeplot(data=df, hue="type", x="score", fill=True, common_norm=False)
plt.tight_layout()
plt.show()
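The density plot shows separation visually. To summarize it as a single number, one common choice (not something xlranker reports here) is the rank-based area under the ROC curve, which equals the Mann-Whitney U statistic divided by the product of the group sizes: the probability that a randomly chosen selected pair scores higher than a randomly chosen not-selected pair, with ties counted as half. Values near 0.5 mean no separation; values near 1.0 mean selected pairs almost always score higher. `separation_auc` is a hypothetical helper written for this notebook.

```python
from typing import Iterable


def separation_auc(positive: Iterable[float], negative: Iterable[float]) -> float:
    """Rank-based AUC: P(random positive score > random negative score), ties count 0.5."""
    pos, neg = list(positive), list(negative)
    if not pos or not neg:
        raise ValueError("both groups need at least one score")
    # Pool the scores, remembering which group each came from, and sort by score
    pooled = sorted([(s, True) for s in pos] + [(s, False) for s in neg], key=lambda t: t[0])
    rank_sum_pos = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied scores pooled[i..j] and give each the average 1-based rank
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        rank_sum_pos += avg_rank * sum(1 for _, is_pos in pooled[i : j + 1] if is_pos)
        i = j + 1
    # Mann-Whitney U for the positive group, normalized to [0, 1]
    u_pos = rank_sum_pos - len(pos) * (len(pos) + 1) / 2
    return u_pos / (len(pos) * len(neg))
```

With the `df` from the plot cell, pass `df.loc[df["type"] != "ML Not Selected", "score"]` as `positive` and `df.loc[df["type"] == "ML Not Selected", "score"]` as `negative`. Since `scipy` is installed above, `scipy.stats.mannwhitneyu(positive, negative).statistic / (len(positive) * len(negative))` should give the same value in recent SciPy releases, where `statistic` is U for the first sample.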