# EuBIC-MS Winterschool: Combined Immunopeptidomics Workflow

This notebook guides you through a basic downstream analysis of immunopeptidomics discovery data.

**The Scenario**:
1.  **Discovery**: You have raw mass spec data from a CLL patient (UPN49). Can we find one or more tumor-exclusive peptides such that we can define a peptide vaccine for UPN49 or even CLL in general?
2.  **Reference**: You need to compare it to a healthy baseline (HLA Ligand Atlas).
3.  **Analysis**: You want to find tumor-specific antigens based on a CLL cohort (Waterfall Plot).
4.  **Validation**: You want to check whether your CLL peptides are actual HLA binders.

In [None]:
# Run this in your terminal to install packages
# pip install pandas matplotlib seaborn requests matplotlib-venn

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib_venn import venn2
import seaborn as sns
import os

# Configure plots
sns.set_theme(style="whitegrid")

## 1. Discovery: Processing CLL Patient Data

You will process the raw data to identify peptides that are presented on **Chronic Lymphocytic Leukemia (CLL)** cells. For this we will use `nf-core/mhcquant`.

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=master&repo=nf-core/mhcquant)
> **Recommendation**: Use the `nf-core/mhcquant` Codespace in this part of the workshop and follow the steps below in the new codespace. You can have a maximum of 2 running codespaces.

-------------

We start with HLA class I raw data from a patient with **Chronic Lymphocytic Leukemia (CLL)** 
- ID: UPN49
- HLA types: A*02:01;B*08:01;B*40:01;C*03:04;C*07:02

The immunopeptidomics data is located here: [PRIDE PXD024871](https://www.ebi.ac.uk/pride/archive/projects/PXD024871).
- Click the FTP button to get to the raw data.
- Download the raw data files (e.g., `UPN49_class_I_Rep1.raw`), for example with `wget` and the full FTP address of each file.

```bash
wget https://ftp.pride.ebi.ac.uk/pride/data/archive/2021/09/PXD024871/UPN49_class_I_Rep1.raw
```

### 1.1 Download Raw Data

**Aim of this task**

Download mass spectrometry raw files from PRIDE for patient **UPN49**:
- **Dataset**: PXD024871 (CLL immunopeptidome study)
- **Patient**: UPN49
- **HLA-I type**: A*02:01;B*08:01;B*40:01;C*03:04;C*07:02
- **Replicates**: 3 technical replicates of immunoprecipitation

<details>
<summary><b>Deep Dive: The CLL Immunopeptidome Study</b></summary>

**Study Background**

Chronic Lymphocytic Leukemia (CLL) is a blood cancer affecting B lymphocytes. The original study profiled the HLA ligandome of CLL patients to identify potential immunotherapy targets.

| Aspect | Details |
|--------|--------|
| **PRIDE ID** | PXD024871 |
| **Disease** | Chronic Lymphocytic Leukemia |
| **Sample type** | Peripheral blood mononuclear cells (PBMCs) |
| **HLA isolation** | Immunoprecipitation with W6/32 (pan-HLA Class I) |
| **MS platform** | Q Exactive Plus (Thermo) |
| **Acquisition** | DDA (Data-Dependent Acquisition) |

**Why UPN49?**

Patient UPN49 is **HLA-A*02:01** positive - one of the most common HLA alleles worldwide (~40% of Caucasians). This makes the analysis:
1. Clinically relevant (applicable to many patients)
2. Well-characterized (good prediction tools for A*02:01)
3. Comparable (many Atlas donors share this allele)

**Data structure**:
```
UPN49_IP_#1.raw  ─┐
UPN49_IP_#2.raw  ─┼─ Technical replicates (same sample, same IP)
UPN49_IP_#3.raw  ─┘
```

</details>

<details>
<summary><b>Reference: Downloading from PRIDE</b></summary>

**PRIDE Archive** stores proteomics datasets with persistent identifiers:

```bash
# FTP download pattern
ftp://ftp.pride.ebi.ac.uk/pride/data/archive/YEAR/MONTH/PXDXXXXXX/filename.raw

# Example for PXD024871
ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2021/09/PXD024871/
```

**Using wget**:
```bash
# Single file
wget -P data/raw/ ftp://ftp.pride.ebi.ac.uk/.../file.raw

# Multiple files (use -i with file list)
wget -i files_to_download.txt -P data/raw/
```

**File sizes**: Raw files are typically 0.5-5 GB each. For the workshop, we'll use a subset or pre-processed data if bandwidth is limited.

**Alternative**: Use the PRIDE web interface to browse and download: https://www.ebi.ac.uk/pride/archive/projects/PXD024871

</details>

### 1.2 Run nf-core/mhcquant

**Aim of this task**

Create a samplesheet of UPN49 for the nf-core/mhcquant pipeline. The samplesheet tells the pipeline:
- Which files to process
- How to group replicates
- What conditions each sample belongs to

The following columns are required:
- `ID`: Unique identifier for the run (e.g. 1,2,3)
- `Sample`: Sample Name (Patient ID)
- `Condition`: Condition (e.g., Tumor/Healthy)
- `ReplicateFileName`: Path to the raw file

**See**: [nf-core/mhcquant Usage](https://nf-co.re/mhcquant/usage)

Generate a samplesheet of UPN49 for the `nf-core/mhcquant` pipeline.

> **Note**: The samplesheet columns must be separated by tabs. GitHub Codespaces sometimes inserts spaces instead of tabs, so create the samplesheet with tab separation in an editor of your choice and copy and paste it into the MHCquant codespace.
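
One way to avoid the tab-versus-space problem is to generate the samplesheet programmatically. The following is a minimal sketch using Python's standard `csv` module. The file paths are hypothetical: only `UPN49_class_I_Rep1.raw` is named above, and the names of the other two replicates are assumed to follow the same pattern.

```python
import csv
import io

# Hypothetical file locations: only Rep1 is named in the text above;
# Rep2 and Rep3 are assumed to follow the same naming pattern.
raw_files = [
    "data/raw/UPN49_class_I_Rep1.raw",
    "data/raw/UPN49_class_I_Rep2.raw",
    "data/raw/UPN49_class_I_Rep3.raw",
]


def build_samplesheet(files, sample, condition):
    """Return the samplesheet as tab-separated text, one row per raw file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "Sample", "Condition", "ReplicateFileName"])
    for run_id, path in enumerate(files, start=1):
        # All replicates share the same Sample value so they are grouped.
        writer.writerow([run_id, sample, condition, path])
    return buffer.getvalue()


samplesheet_text = build_samplesheet(raw_files, sample="UPN49", condition="Tumor")
print(samplesheet_text)
```

To save the result, write `samplesheet_text` to `samplesheet.tsv`, for example with `pathlib.Path("samplesheet.tsv").write_text(samplesheet_text)`.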

------------

You can run the pipeline with a simple command in the terminal:
```bash
nextflow run nf-core/mhcquant --input samplesheet.tsv --outdir UPN49_class1 -profile docker
```

Other input data may require different pipeline parameters. For our purpose, the default parameters are sufficient. If you are curious about all parameters, check out the [parameter documentation](https://nf-co.re/mhcquant/3.1.0/parameters/).

Once the pipeline has finished, you can find the results in the `UPN49_class1` directory. Take a quick look at the folder contents. The most important files are:
- The tsv file(s)
- The MultiQC report <-- Visual summary and quality control of your input runs

A detailed description of the output files can be found [here](https://nf-co.re/mhcquant/3.1.0/output/).

---------------

### Exercise 1: Understanding the Samplesheet

1. If you had two patients (UPN49 and UPN52), each with 3 replicates, how many rows would the samplesheet have?

2. What would happen if you gave each replicate a different `Sample` value instead of grouping them?

<details>
<summary><b>Click to reveal the answers</b></summary>

**Answer 1**: **6 rows** (2 patients × 3 replicates each)

```tsv
ID	Sample	Condition	ReplicateFileName
UPN49_1	UPN49	Tumor	...
UPN49_2	UPN49	Tumor	...
UPN49_3	UPN49	Tumor	...
UPN52_1	UPN52	Tumor	...
UPN52_2	UPN52	Tumor	...
UPN52_3	UPN52	Tumor	...
```

**Answer 2**: The pipeline would treat each replicate as a separate sample:
- No replicate-level summarization
- Peptide frequencies calculated per-replicate instead of per-patient
- Loss of statistical power for quantification

This is a common mistake! Always check that replicates share the same `Sample` value.

</details>


## 2. Reference: The HLA Ligand Atlas

To know what is "tumor-specific", we need to know what is "benign".
We will use the **HLA Ligand Atlas** as our healthy baseline.

### 2.1 Get the Data

**Aim of this task**

We want to download and explore the HLA Ligand Atlas data:

- Download the **donor metadata** (information about each sample)
- Download the **peptide data** (sequences with their HLA associations)
- Join these tables to create a complete dataset

The HLA Ligand Atlas contains peptides identified from healthy human tissues - these are the "self" peptides that our immune system sees every day.

Get the data: https://hla-ligand-atlas.org/data
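
The join step can be sketched with `pandas.merge` on small in-memory tables. The tables and the column names `donor`, `hla_allele`, and `peptide_sequence` are assumptions for illustration; the downloaded Atlas files use their own column names, so check them with `df.columns` first.

```python
import pandas as pd

# Toy stand-ins for the downloaded tables; the real Atlas files use their
# own column names, so adapt `donor`, `hla_allele`, and `peptide_sequence`.
donors = pd.DataFrame({
    "donor": ["D1", "D1", "D2"],
    "hla_allele": ["A*02:01", "B*08:01", "A*01:01"],
})
peptides = pd.DataFrame({
    "peptide_sequence": ["SLLQHLIGL", "ELRSRYWAI", "YLEPGPVTA"],
    "donor": ["D1", "D2", "D3"],
})

# how="left" keeps every peptide row; indicator=True adds a `_merge` column
# that flags peptides whose donor is missing from the donor table.
merged = peptides.merge(donors, on="donor", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print("Peptides without donor metadata:", unmatched["peptide_sequence"].tolist())
```

Because a donor carries several alleles, the merged table has one row per peptide and allele. Checking the row count before and after the join is a quick way to spot unexpected duplication.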

<details>
<summary><b>Deep Dive: What is the HLA Ligand Atlas?</b></summary>

**The Immunopeptidome and Why It Matters**

Every cell in your body constantly chops up its proteins and presents peptide fragments on its surface via HLA molecules. This is how T cells distinguish "self" from "non-self" (like viruses or cancer).

```
Protein → Proteasome → Peptides → HLA Loading → Cell Surface → T Cell Recognition
```

**The HLA Ligand Atlas** is a database of peptides identified from healthy human tissues:

| Feature | Description |
|---------|-------------|
| **Source** | Benign human tissue samples |
| **Method** | Immunoprecipitation + LC-MS/MS |
| **Content** | ~180,000 unique peptides |
| **HLA Types** | Class I (A, B, C) and Class II (DR, DQ, DP) |
| **Tissues** | 29 different tissue types |

**Why is this useful?**

1. **Basic research**: Which peptides humans present to T cells
2. **Cancer immunotherapy**: Identify tumor-specific peptides not in normal tissues
3. **Vaccine design**: Help prioritize peptides for infectious disease or cancer vaccines using a benign reference

</details>



<details>
<summary><b>New to Python? Understanding pandas DataFrames</b></summary>

A DataFrame is like an Excel spreadsheet in Python:

**1. Loading data:**
```python
df = pd.read_csv("file.csv")           # From CSV
df = pd.read_csv("file.tsv", sep="\t") # From TSV (tab-separated)
```

**2. Exploring data:**
```python
df.head()              # First 5 rows
df.shape               # (rows, columns)
df.columns             # Column names
df.info()              # Data types and missing values
df['column'].unique()  # Unique values in a column
```

**3. Filtering data:**
```python
# Single condition
filtered = df[df['column'] == 'value']

# Multiple conditions (use & for AND, | for OR)
filtered = df[(df['col1'] == 'A') & (df['col2'] > 10)]

# String methods
filtered = df[df['name'].str.startswith('AUT')]
```

**4. Merging tables:**
```python
merged = pd.merge(df1, df2, on='common_column')
```

</details>

In [None]:
# Load HLA Ligand Atlas DataFrames using pandas
# ...

In [None]:
# Explore the donors table
# ...

In [None]:
# Explore the peptides table
# ...

In [None]:
# Join donor and peptide tables. This connects each peptide to its donor's HLA allele information
# ...

### 2.2 Excursion: HLA Peptide Properties (Optional)

Now that we have a bigger dataset at hand, let's explore the HLA Ligand Atlas in a bit more detail. This analysis helps us understand:
- Are our peptides the expected length for their HLA class?
- Which alleles are most represented?
- How many proteins do these peptides cover?

#### 2.2.1 Peptide Length Distribution

**Aim of this task**

HLA Class I and Class II molecules present peptides of different lengths:
- **Class I**: Typically 8-12 amino acids (closed binding groove)
- **Class II**: Typically 12-25 amino acids (open binding groove)

We will visualize these distributions to confirm our data matches expected biology.
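
Before plotting, the length distribution can be tabulated directly. This is a small sketch on toy sequences; in the notebook, the input would be the peptide sequence column of the Atlas table, whose name is not assumed here.

```python
import pandas as pd

# Toy Class I peptides; in the notebook this would be the Atlas peptide column.
toy_peptides = pd.Series(
    ["SLLQHLIGL", "YLEPGPVTA", "ELRSRYWAI", "FLRGRAYGL", "EIYKRWII", "KLGGALQAKA"]
)

lengths = toy_peptides.str.len()
# Fraction of peptides at each length, ordered by length rather than by count.
length_share = lengths.value_counts(normalize=True).sort_index()
most_common_length = int(lengths.mode().iloc[0])

print(length_share)
print("Most common length:", most_common_length)
```

Calling `length_share.plot(kind="bar")` on the result is one way to turn the table into a bar chart.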

<details>
<summary><b>Deep Dive: Why Different Peptide Lengths?</b></summary>

**HLA Class I vs Class II Structure**

The structural difference in the binding groove determines peptide length:

**Class I (HLA-A, B, C)**:
```
    Closed groove
    ┌─────────────┐
    │ N─peptide─C │  ← Peptide fits snugly (8-12 aa)
    └─────────────┘
    Anchor residues at P2 and P9 (C-terminus)
```

**Class II (HLA-DR, DQ, DP)**:
```
    Open groove (ends open)
    ───┬─────────────┬───
       │   peptide   │     ← Peptide can extend beyond (12-25 aa)
    ───┴─────────────┴───
    Core 9-mer binds, flanking regions extend out
```

**Biological significance**:
- Class I: Presents intracellular peptides to CD8+ cytotoxic T cells
- Class II: Presents extracellular peptides to CD4+ helper T cells

**Expected distributions**:
| Class | Peak Length | Range | Most Common |
|-------|-------------|-------|-------------|
| I | 9 aa | 8-12 aa | 9-mers (~60-70%) |
| II | 15 aa | 12-25 aa | 15-mers |

</details>

In [None]:
# Plot the length distribution of all HLA class I peptides
# ...

In [None]:
# Plot HLA class II peptide length distribution
# ...

Look at the histograms above and answer:

1. What is the most common peptide length for Class I and for Class II in the HLA Ligand Atlas?
2. Why might we see some 8-mers and 12-mers even though 9-mers are most common?
3. If you saw a large peak at 15-mers in Class I data, what might be wrong?

<details>
<summary><b>Click to reveal the answers</b></summary>

**1. Most common length**: 9 amino acids (9-mers) for Class I, typically 60-70% of Class I peptides. For Class II, expect the peak around 15 amino acids.

**2. 8-mers and 12-mers exist because**:
- HLA binding grooves have some flexibility
- Different HLA alleles prefer slightly different lengths
- HLA-B*08:01 tends to prefer 8-mers; some HLA-B alleles accommodate 10-11 mers
- The binding is determined by anchor residues, not strict length

**3. A 15-mer peak in Class I would indicate**:
- Possible Class II contamination in the sample
- Sample mislabeling (Class II labeled as Class I)
- Cross-contamination during immunoprecipitation
- Data processing error

This is why length distribution analysis is a key quality control step!

</details>

#### 2.2.2 Allele Distribution

**Aim of this task**

Explore which HLA alleles are most represented in our dataset. Which alleles are over-represented?
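
Counting donors per allele is a group-and-count operation. The sketch below assumes a long-format donor table with one row per donor and allele; the column names `donor` and `hla_allele` are hypothetical.

```python
import pandas as pd

# Toy donor table in long format (one row per donor and allele);
# the column names are assumptions.
donor_alleles = pd.DataFrame({
    "donor": ["D1", "D1", "D2", "D2", "D3", "D3"],
    "hla_allele": ["A*02:01", "B*08:01", "A*02:01", "B*07:02", "A*01:01", "B*08:01"],
})

# nunique() counts each donor once per allele, even if rows are duplicated.
donors_per_allele = (
    donor_alleles.groupby("hla_allele")["donor"]
    .nunique()
    .sort_values(ascending=False)
)
print(donors_per_allele)
```

Using `nunique()` rather than `size()` matters once the donor table has been joined with peptides, because each donor then appears in many rows.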

In [None]:
# Count donors per allele and plot a barplot
# ...

#### 2.2.3 Motif Analysis

**Aim of this task**

Investigate the binding motifs of the peptides identified for UPN49.

**Steps:**
1. Copy peptides to clipboard
2. Paste into [GibbsCluster-2.0](https://services.healthtech.dtu.dk/services/GibbsCluster-2.0/)
3. Compare results to the [MHC Motif Atlas](http://mhcmotifatlas.org/home)

Do you see which cluster belongs to which allele motif?

<details>
<summary><b>Deep Dive: What Are HLA Binding Motifs?</b></summary>

**Anchor Residues and Binding Pockets**

Each HLA allele has specific "pockets" that prefer certain amino acids:

```
Example: HLA-A*02:01 binding motif (9-mer)

Position:  1   2   3   4   5   6   7   8   9
           │   │                       │   │
           ▼   ▼                       ▼   ▼
Anchor:    -   L   -   -   -   -   -   -   V
           -   M   -   -   -   -   -   -   L
           
P2 = Leucine/Methionine (B pocket)
P9 = Valine/Leucine (F pocket, C-terminus)
```

**Sequence Logos**

A sequence logo visualizes amino acid preferences:
- **Height** = information content (bits) - how conserved is this position?
- **Letters** = amino acids, sized by frequency
- Tall stacks at P2 and P9 indicate strong anchor positions

**How GibbsCluster works**

1. Takes a list of peptides (unknown HLA assignment)
2. Uses unsupervised clustering to group by binding motif
3. Returns predicted motifs and cluster assignments
4. You can then match clusters to known HLA alleles

**Common Class I Motifs**:
| Allele | P2 Anchor | P9 Anchor |
|--------|-----------|----------|
| A*02:01 | L, M | V, L |
| A*01:01 | T, S | Y |
| B*07:02 | P | L |
| B*08:01 | (main anchors at P3 and P5: K, R) | L |

</details>
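
The anchor positions described in the Deep Dive can also be checked by hand on a cluster of 9-mers. The following sketch counts amino acids at a given position and computes the Shannon information content that determines stack height in a sequence logo (without the small-sample correction that logo tools may apply). The peptides are constructed toy sequences that follow the A*02:01 pattern, not measured ligands.

```python
from collections import Counter
from math import log2

# Toy 9-mers built to follow the A*02:01 pattern from the Deep Dive
# (L or M at P2, V or L at P9); they are not measured ligands.
toy_cluster = ["ALAAAAAAV", "KLGDEFSTV", "SMPQRTWYL", "GLNHIKCEV"]


def position_frequencies(peptides, position):
    """Relative amino-acid frequencies at a 1-based position."""
    counts = Counter(peptide[position - 1] for peptide in peptides)
    total = sum(counts.values())
    return {residue: count / total for residue, count in counts.items()}


def information_content(frequencies, alphabet_size=20):
    """Shannon information in bits, without small-sample correction."""
    entropy = -sum(p * log2(p) for p in frequencies.values() if p > 0)
    return log2(alphabet_size) - entropy


p2 = position_frequencies(toy_cluster, 2)
p9 = position_frequencies(toy_cluster, 9)
print("P2:", p2, round(information_content(p2), 2), "bits")
print("P9:", p9, round(information_content(p9), 2), "bits")
```

A position where one or two residues dominate has low entropy and therefore high information content, which is why anchor positions appear as tall stacks.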

Look at the amino acid frequency of each cluster created by GibbsCluster:

1. What amino acids are enriched at position 2 (P2)?
2. What amino acids are enriched at position 9 (P9)?
3. Based on the motif reference below, which HLA allele(s) might this sample have?

<details>
<summary><b>Click for examples</b></summary>

**Reference motifs:**
| Allele | P2 | P9 |
|--------|-----|-----|
| HLA-A*02:01 | L, M | V, L |
| HLA-A*01:01 | T, S | Y |
| HLA-A*03:01 | L, V, M | K, R |
| HLA-B*07:02 | P | L |
</details>


## 3. Comparison: Tumor vs Benign

**Aim of this task**

Compare the CLL immunopeptidome against the benign HLA Ligand Atlas to identify:
- **Tumor-enriched peptides**: Found frequently in tumor, rarely in healthy tissue
- **Shared peptides**: Common to both (normal self-peptides)
- **Benign-enriched peptides**: Found in healthy tissue but not tumor

We need two datasets:
1. **Tumor peptides**: We will now work with the full CLL immunopeptidome dataset (containing UPN49). You can import the dataset from `./data/cll_warehouse_class1.tsv` using pandas.
2. **Benign peptides**: From HLA Ligand Atlas

<details>
<summary><b>Deep Dive: What is a Waterfall Plot?</b></summary>

**Concept**

A waterfall plot visualizes differential peptide presentation between two conditions:

```
Frequency
   ↑
 1.0│  ████                                    Tumor-enriched
    │  ████ ███                                (potential TAAs)
    │  ████ ███ ██                          
 0.5│  ████ ███ ██ █                        
    │  ████ ███ ██ █                        
────┼─────────────────────────────────────── Peptides →
    │                    █ ██ ███ ████      
-0.5│                    █ ██ ███ ████      
    │                       ██ ███ ████████  Benign-enriched
-1.0│                          ███ ████████  (normal self)
   ↓
```

**How to read it**:
- **Bars pointing UP**: Peptide frequency in tumor
- **Bars pointing DOWN**: Peptide frequency in benign
- **Peptides on the left**: Tumor-specific or tumor-enriched (TAA candidates)
- **Peptides on the right**: Benign-specific or benign-enriched

**Ideal TAA candidates**:
- High bar pointing up (frequent in tumor)
- No bar pointing down (absent in benign)

**Why "waterfall"?**
When sorted by tumor frequency, the plot resembles a waterfall cascading down.

</details>

In [None]:
# Load CLL HLA-I data from './data/cll_warehouse_class1.tsv'
# ...

In [None]:
# Filter the HLA Ligand Atlas for HLA-I peptides
# ...

---

### Exercise 2: Predict the Overlap


Before we create the waterfall plot:

1. What percentage of tumor peptides do you expect to find in the benign HLA Ligand Atlas?
2. Why might some true tumor peptides also appear in the benign reference?

<details>
<summary><b>Click to reveal the answers</b></summary>

**Answer 1**: Typically **70-90%** of tumor peptides are also found in benign tissue (depending on the size of the benign dataset).

This might seem surprising, but remember:
- Most peptides are from normal cellular proteins (housekeeping genes)
- The immune system already "sees" these peptides every day
- True tumor-specific peptides are a small minority

**Answer 2**: Reasons for overlap:
1. **Normal self-peptides**: From proteins expressed in both healthy and cancer cells
2. **Tissue-of-origin peptides**: Cancer cells retain some normal tissue markers
3. **Shared HLA alleles**: Different donors present similar peptides if they share HLA types
4. **Technical overlap**: Common contaminants or highly abundant proteins

**The goal** is to find the 10-30% that are tumor-enriched or even tumor-specific!

</details>
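
The overlap itself is a set operation. This sketch uses placeholder peptide names; in the notebook, the two sets would hold the unique sequences from the CLL table and from the HLA-I subset of the Atlas.

```python
# Toy peptide sets; in the notebook these would be the unique sequences
# from the CLL table and from the HLA-I subset of the Atlas.
tumor_peptides = {"PEPA", "PEPB", "PEPC", "PEPD"}
benign_peptides = {"PEPB", "PEPC", "PEPD", "PEPE", "PEPF"}

shared = tumor_peptides & benign_peptides
tumor_only = tumor_peptides - benign_peptides
benign_only = benign_peptides - tumor_peptides

# Share of tumor peptides that also occur in the benign reference.
overlap_pct = 100 * len(shared) / len(tumor_peptides)
print(f"{overlap_pct:.0f}% of tumor peptides are also in the benign set")

# Counts in the order (first set only, second set only, shared),
# which is the order venn2 uses for its `subsets` argument.
venn_counts = (len(tumor_only), len(benign_only), len(shared))
print(venn_counts)
```

Passing `venn_counts` to `venn2(subsets=venn_counts, set_labels=("CLL", "Atlas"))` should draw the corresponding diagram; the labels are only a suggestion.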

---

In [None]:
# Plot Venn diagram of HLA Ligand Atlas and CLL cohort

### 3.1 Calculate Peptide Frequencies

**Aim of this task**

Calculate how frequently each peptide appears across samples:

$$\text{Frequency} = \frac{\text{Number of samples with peptide}}{\text{Total samples in group}}$$

**Filtering**: Remove "one-hit wonders" (peptides seen in only 1 sample) to reduce noise.
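
As a minimal sketch of the formula and the filter, assume a long-format table with one row per sample and identified peptide. The column names `sample` and `peptide` and the function name `peptide_frequencies` are hypothetical.

```python
import pandas as pd

# Toy long-format table: one row per (sample, peptide) identification.
# Column names `sample` and `peptide` are assumptions.
hits = pd.DataFrame({
    "sample": ["S1", "S1", "S2", "S2", "S3", "S3", "S3"],
    "peptide": ["PEPA", "PEPB", "PEPA", "PEPC", "PEPA", "PEPB", "PEPB"],
})


def peptide_frequencies(table, min_samples=2):
    """Fraction of samples containing each peptide, dropping one-hit wonders."""
    total_samples = table["sample"].nunique()
    # nunique() ignores repeated identifications within the same sample.
    n_samples = table.groupby("peptide")["sample"].nunique()
    kept = n_samples[n_samples >= min_samples]
    return kept / total_samples


frequencies = peptide_frequencies(hits)
print(frequencies)
```

Here the denominator is the number of samples before filtering, so removing one-hit wonders does not change the frequency of the remaining peptides. Apply the function to the CLL table and the Atlas table separately.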

In [None]:
# Filter out one hit wonders
# ...

# Compute peptide frequencies in CLL dataframe and HLA Ligand Atlas dataframe separately
# ...

### 3.2 Create the Waterfall Plot

**Aim of this task**

Visualize the comparison:
- Tumor frequency as positive (upward) bars
- Benign frequency as negative (downward) bars
- Compute the ratio of tumor to benign frequency
- Sort by this ratio and then by tumor frequency (descending)
- [Reference plot](https://www.frontiersin.org/files/Articles/705974/fimmu-12-705974-HTML/image_m/fimmu-12-705974-g001.jpg)
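
The sorting and plotting steps listed above can be sketched as follows. The frequencies are toy values and the function names are hypothetical. Peptides absent from the benign set get an infinite ratio, which is one possible way to place tumor-only peptides first; a pseudocount would be an alternative.

```python
import pandas as pd

# Toy per-peptide frequencies; in the notebook these come from section 3.1.
tumor_freq = pd.Series({"PEPA": 0.2, "PEPB": 0.5, "PEPC": 0.8, "PEPE": 0.3})
benign_freq = pd.Series({"PEPA": 0.4, "PEPB": 0.1, "PEPD": 0.6})


def waterfall_table(tumor, benign):
    """Align both groups on the union of peptides and sort for plotting."""
    table = pd.DataFrame({"tumor": tumor, "benign": benign}).fillna(0.0)
    # A benign frequency of 0 gives an infinite ratio, which places
    # tumor-only peptides first when sorting in descending order.
    table["ratio"] = table["tumor"] / table["benign"]
    return table.sort_values(["ratio", "tumor"], ascending=False)


def plot_waterfall(table):
    """Draw tumor bars upward and benign bars downward, one bar per peptide."""
    import matplotlib.pyplot as plt  # imported here so the table works without it

    fig, ax = plt.subplots(figsize=(8, 4))
    x = range(len(table))
    ax.bar(x, table["tumor"], width=1.0, color="firebrick", label="Tumor")
    ax.bar(x, -table["benign"], width=1.0, color="steelblue", label="Benign")
    ax.axhline(0, color="black", linewidth=0.8)
    ax.set_xlabel("Peptides")
    ax.set_ylabel("Frequency")
    ax.legend()
    return ax


table = waterfall_table(tumor_freq, benign_freq)
print(table)
```

Calling `plot_waterfall(table)` draws the figure. The first rows of `table` are the tumor-only peptides with the highest tumor frequency, which is also a convenient starting point for a "top X" candidate list.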

In [None]:
# Plot waterfall (hint: plot bar per peptide)

---

### Exercise 3: Interpret the Waterfall Plot

Look at the waterfall plot above and answer:

1. What does the "waterfall" shape tell you about the data?
2. Why are the leftmost (tumor-only) bars important?
3. What would a "flat" plot indicate?

<details>
<summary><b>Click to reveal the answers</b></summary>

**Answer 1: Waterfall shape interpretation**

The descending "waterfall" from left to right shows:
- Left side: Peptides with HIGH tumor frequency
- Right side: Peptides with LOW tumor frequency
- The steepness indicates how many peptides are at each frequency level

**Answer 2: Leftmost bars (tumor-only)**

These are the most promising TAA candidates because:
- Present in tumor (detected in patient)
- Absent in benign (not normal self-peptide)
- Less likely to cause autoimmunity if targeted
- Potentially tumor-specific or over-expressed

**Answer 3: A flat plot would indicate**
- All peptides at similar frequency
- No clear tumor-enriched or benign-enriched populations
- Could mean:
  - Sample quality issues
  - Wrong comparison (mismatched HLA types)
  - The tumor is very similar to normal tissue

</details>

---

In [None]:
# Show top X most suitable peptides
# ...

In [None]:
# Check if UPN49 has one of the top X peptides
# ...

### 3.3 Re-do waterfall with A*02:01 restricted background (Advanced)
Since UPN49 is A*02:01 positive, let's restrict the Atlas background to design vaccine targets only for A*02:01+ donors.

**Aim of this task**

Re-do the analysis from 3.2, but this time only consider A*02:01+ donors.
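
Restricting the background amounts to selecting donors that carry the allele and keeping only their peptides. The sketch below uses toy tables with assumed column names (`donor`, `hla_allele`, `peptide`).

```python
import pandas as pd

# Toy tables; the column names are assumptions.
donor_alleles = pd.DataFrame({
    "donor": ["D1", "D1", "D2", "D3", "D3"],
    "hla_allele": ["A*02:01", "B*08:01", "A*01:01", "A*02:01", "B*07:02"],
})
atlas_hits = pd.DataFrame({
    "donor": ["D1", "D2", "D2", "D3"],
    "peptide": ["PEPA", "PEPA", "PEPB", "PEPC"],
})

target_allele = "A*02:01"
positive_donors = set(
    donor_alleles.loc[donor_alleles["hla_allele"] == target_allele, "donor"]
)
# Keep all peptides from allele-positive donors, whatever allele presented them.
restricted = atlas_hits[atlas_hits["donor"].isin(positive_donors)]
print(sorted(positive_donors), restricted["peptide"].tolist())
```

The same filter can be applied to the CLL cohort so that both sides of the waterfall plot contain only A*02:01+ individuals. Note that the frequency denominator then becomes the number of allele-positive samples.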


In [None]:
# ...

## 4. Bonus: Binding Prediction

Are our CLL peptides actually good HLA binders?
Let's revisit the CLL data and look at the binding predictions that were already computed.

### 4.1 Binding Affinity vs Percentile Rank

**Key concepts:**

- **Binding Affinity (IC50)**: Predicted concentration for 50% inhibition (Abbrev. `BA`)
  - Lower = Stronger binder
  - <50 nM = Strong binder
  - <500 nM = Weak binder
  - >500 nM = Non-binder

- **Percentile Rank**: How this peptide compares to random peptides (Abbrev. `rank`)
  - Lower = Better binder
  - <0.5% = Strong binder
  - <2% = Weak binder
  - >2% = Non-binder

**We typically use Percentile Rank for filtering** because it's comparable across alleles.
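
The rank thresholds listed above translate into a small helper. The function name `classify_by_rank` is hypothetical, and the treatment of values exactly at a cutoff (assigned to the weaker class) is an assumption, since the thresholds above are stated only as "less than".

```python
def classify_by_rank(rank_percent, strong_cutoff=0.5, weak_cutoff=2.0):
    """Map a percentile rank (in %) to the binder classes listed above."""
    if rank_percent < strong_cutoff:
        return "strong"
    if rank_percent < weak_cutoff:
        # Values exactly at a cutoff fall into the weaker class (assumption).
        return "weak"
    return "non-binder"


for rank in (0.1, 1.3, 7.5):
    print(rank, classify_by_rank(rank))
```

With pandas, the same function can be applied to a rank column via `df["rank"].apply(classify_by_rank)`.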

<details>
<summary><b>Deep Dive: IC50 vs Percentile Rank - Which Should You Use?</b></summary>

**The Problem with IC50**

IC50 values are not comparable across HLA alleles:

```
HLA-A*02:01: Average IC50 of good binders = ~50 nM
HLA-B*08:01: Average IC50 of good binders = ~200 nM
```

Using a fixed cutoff (e.g., <500 nM) for all alleles would:
- Include many weak A*02:01 binders (false positives)
- Miss true B*08:01 binders (false negatives)

**The Solution: Percentile Rank**

For each allele, a prediction tool such as MHCflurry, NetMHCpan, or MixMHCpred:
1. Predicts binding for 100,000+ random peptides
2. Ranks your peptide against this background
3. Returns the percentile (what % of random peptides bind better)

| Rank | Interpretation | Typical Use |
|------|---------------|-------------|
| <0.5% | Strong binder | High-confidence targets |
| 0.5-2% | Weak binder | Include with caution |
| >2% | Non-binder | Usually exclude |

**Recommendation**: Use percentile rank <2% as your standard cutoff.

</details>
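
Some predictors report binding affinity as a score between 0 and 1 rather than in nM. A possible `ba2ic50` conversion is sketched below. It assumes the NetMHCpan-style transform `BA = 1 - log(IC50) / log(50000)`; whether the `BA` column in the CLL table uses this transform is an assumption that should be checked against the predictor's documentation. The helper `ic502ba` is added only to verify the round trip.

```python
import math

MAX_IC50_NM = 50000.0  # upper bound assumed by the NetMHCpan-style transform


def ba2ic50(ba):
    """Convert a 0-1 binding-affinity score to IC50 in nM.

    Assumes BA = 1 - log(IC50) / log(50000); check the predictor's
    documentation before applying this to real output.
    """
    return MAX_IC50_NM ** (1.0 - ba)


def ic502ba(ic50_nm):
    """Inverse transform, useful for translating nM cutoffs into scores."""
    return 1.0 - math.log(ic50_nm) / math.log(MAX_IC50_NM)


print(ba2ic50(1.0), ba2ic50(0.0))
print(round(ic502ba(500.0), 3))
```

Under this assumption, a higher score means a lower IC50, so taking the best binder per peptide means the maximum `BA` but the minimum IC50.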

In [None]:
# Revisit initial CLL dataset again
#  ...

In [None]:
# BA is a transformed IC50 value. Write a function ba2ic50 to 
# transform binding affinities to IC50 values
# ...

# Get best IC50 (min) and best rank (min) per peptide over all alleles of the patient (up to 6)
# ...

# Visualize the relationship between IC50 and Percentile Rank
# ...

In [None]:
# Compute Sample Purity (n_binders(UPNX) / n_peptides(UPNX)) and plot
# ...

---

### Exercise 4: Interpret the Purity Results

Based on the purity analysis above:

1. Is this a high-quality immunopeptidomics sample? What purity would you expect?
2. Why might some true HLA ligands be predicted as non-binders?
3. How would you interpret a sample with only 30% purity?

<details>
<summary><b>Click to reveal the answers</b></summary>

**1. Quality expectations**:
- High-quality samples: 70-90% binders (rank < 2%)
- Good samples: 50-70% binders
- Poor samples: <50% binders

**2. True ligands predicted as non-binders could be due to**:
- Prediction tool limitations (not all alleles modeled equally well)
- Novel binding modes not in training data
- Post-translational modifications not considered
- Peptides binding to alleles missing from the HLA typing (incomplete or incorrect typing)

**3. Low purity (30%) interpretations**:
- Sample contamination with non-HLA peptides
- Incorrect HLA typing (wrong alleles specified)
- Technical issues during immunoprecipitation
- Allele not well-represented in prediction tool

**Action**: Always investigate low-purity samples before downstream analysis!

</details>
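
Sample purity as defined in this section (binders divided by all peptides of a sample) can be computed with a grouped mean. The sketch below uses toy values, placeholder sample names, and assumed column names; the 2% cutoff follows the rank thresholds given earlier.

```python
import pandas as pd

# Toy table: best (minimum) percentile rank per peptide across a sample's
# alleles. Sample names and column names are assumptions.
predictions = pd.DataFrame({
    "sample": ["UPN_A"] * 4 + ["UPN_B"] * 4,
    "best_rank": [0.1, 0.8, 1.5, 6.0, 0.3, 3.2, 9.0, 12.0],
})

RANK_CUTOFF = 2.0  # percentile rank below which a peptide counts as a binder

# The mean of a boolean column is the fraction of True values.
purity = (
    predictions.assign(is_binder=predictions["best_rank"] < RANK_CUTOFF)
    .groupby("sample")["is_binder"]
    .mean()
)
print(purity)
```

In this toy example one sample would fall into the high-quality range and the other into the poor range of the answer above.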

---

## Summary

Congratulations! You've completed the Combined Immunopeptidomics Workflow. You learned:

| Topic | Key Concept |
|-------|-------------|
| **Discovery** | Used `nf-core/mhcquant` to identify peptides from raw CLL data |
| **Reference** | Used HLA Ligand Atlas as a benign baseline for comparison |
| **Comparison** | Created Waterfall Plots to identify Tumor-Associated Antigens (TAAs) |
| **Quality** | Verified peptide quality using Length distribution and Binding Prediction |


## Resources

- **HLA Ligand Atlas**: https://hla-ligand-atlas.org/
- **nf-core/mhcquant**: https://nf-co.re/mhcquant/
- **nf-core/epitopeprediction**: https://nf-co.re/epitopeprediction/
- **GibbsCluster**: https://services.healthtech.dtu.dk/services/GibbsCluster-2.0/
- **MHC Motif Atlas**: http://mhcmotifatlas.org/