# AMULETY CLI Tutorial

## Introduction

This tutorial demonstrates how to use the AMULETY command-line interface (CLI) to translate and embed both BCR (B-cell receptor) and TCR (T-cell receptor) sequences.

AMULETY supports a wide range of embedding models for different immune receptor types. For a full list of the supported models, please check the [Usage](../usage.md) documentation page.

## Installation

Before getting started, please install AMULETY.

Option 1: pip installation

In [None]:
pip install amulety

Processing /home/gisela/Projects/immcantation/amulety
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: amulety
  Building wheel for amulety (pyproject.toml) ... done
  Created wheel for amulety: filename=amulety-0.1.1-py3-none-any.whl size=56458 sha256=d421262a9192f02535d3b0e58a8045b90d5ceb82e3348ebb6d131320ba127530
  Stored in directory: /tmp/pip-ephem-wheel-cache-ocgnqz9w/wheels/62/9c/ae/cf264ec5dc3182abb1db72271ca902b4604f165dec1643f073
Successfully built amulety
Installing collected packages: amulety
  Attempting uninstall: amulety
    Found existing installation: amulety 0.1.1
    Uninstalling amulety-0.1.1:
      Successfully uninstalled amulety-0.1.1
Successfully installed amulety-0.1.1
Note: you may need to restart the kernel to use updated packages.


Option 2: conda installation

In [None]:
conda install -c bioconda amulety

To verify the installation and print the help message, run:

In [34]:
! amulety --help


 █████  ███    ███ ██    ██ ██      ███████ ████████     ██    ██
██   ██ ████  ████ ██    ██ ██      ██         ██         ██  ██
███████ ██ ████ ██ ██    ██ ██      █████      ██          ████
██   ██ ██  ██  ██ ██    ██ ██      ██         ██           ██
██   ██ ██      ██  ██████  ███████ ███████    ██           ██

AMULETY: Adaptive imMUne receptor Language model Embedding Tool
 version 0.1.1

 Usage: amulety [OPTIONS] COMMAND [ARGS]...

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.      │
│ --

## Downloading example data and reference database

The following commands download an example AIRR-format file of BCR sequences and the IgBlast reference database.

In [35]:
# Create tutorial directory and download example data
! mkdir -p tutorial
! mkdir -p tutorial/output
! wget -P tutorial https://zenodo.org/records/11373741/files/AIRR_subject1_FNA_d0_1_Y1.tsv

# Download and extract IgBlast reference database
! wget -P tutorial -c https://github.com/nf-core/test-datasets/raw/airrflow/database-cache/igblast_base.zip
! unzip tutorial/igblast_base.zip -d tutorial
! rm tutorial/igblast_base.zip

--2025-09-15 11:37:21--  https://zenodo.org/records/11373741/files/AIRR_subject1_FNA_d0_1_Y1.tsv
Resolving zenodo.org (zenodo.org)... 188.185.48.194, 188.185.43.25, 188.185.45.92, ...
Connecting to zenodo.org (zenodo.org)|188.185.48.194|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 479789 (469K) [application/octet-stream]
Saving to: ‘tutorial/AIRR_subject1_FNA_d0_1_Y1.tsv’


2025-09-15 11:37:22 (835 KB/s) - ‘tutorial/AIRR_subject1_FNA_d0_1_Y1.tsv’ saved [479789/479789]

--2025-09-15 11:37:23--  https://github.com/nf-core/test-datasets/raw/airrflow/database-cache/igblast_base.zip
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/nf-core/test-datasets/airrflow/database-cache/igblast_base.zip [following]
--2025-09-15 11:37:23--  https://raw.githubusercontent.com/nf-core/test-datasets/airrflow/databas

## Translating nucleotides to amino acid sequences

The inputs to the embedding models are [AIRR format files](https://docs.airr-community.org/en/stable/datarep/overview.html#datarepresentations) with immune receptor amino acid sequences. If the AIRR file contains only nucleotide sequences, the `amulety translate-igblast` command can translate them. It takes three positional arguments, in this order:

- Path to the V(D)J sequence AIRR file
- Output directory path to write the translated sequences
- Reference IgBlast database to perform alignment and translation

In [36]:
! amulety translate-igblast tutorial/AIRR_subject1_FNA_d0_1_Y1.tsv tutorial tutorial/igblast_base


2025-09-15 11:37:31,870 - INFO - Converting AIRR table to FastA for IgBlast translation...
2025-09-15 11:37:31,875 - INFO - Calling IgBlast for running translation...
2025-09-15 11:37:33,352 - INFO - Saved the translations in the dataframe (sequence_aa contains the full translation and sequence_vdj_aa contains the VDJ translation).
2025-09-15 11:37:33,354 - INFO - Took 1.48 seconds
2025-09-15 11:37:33,354 - INFO - Saved the translations in tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv file.
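
The log above names two columns that hold the translations, `sequence_aa` and `sequence_vdj_aa`. As a quick sanity check before embedding, the following minimal sketch counts rows where either column is absent or empty. It uses only the Python standard library; the function name `count_missing_translations` is a hypothetical helper, not part of AMULETY.

```python
import csv
from pathlib import Path

# Column names taken from the translate-igblast log message above.
TRANSLATION_COLUMNS = ("sequence_aa", "sequence_vdj_aa")


def count_missing_translations(airr_path, columns=TRANSLATION_COLUMNS):
    """Return {column: number of rows where the column is absent or empty}."""
    missing = {column: 0 for column in columns}
    with Path(airr_path).open(newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for column in columns:
                if not (row.get(column) or "").strip():
                    missing[column] += 1
    return missing


if __name__ == "__main__":
    # Path written by the translate-igblast call in this tutorial.
    path = Path("tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv")
    if path.exists():
        print(count_missing_translations(path))
```

A non-zero count for `sequence_vdj_aa` would suggest that some sequences were not translated and may be worth inspecting before running `amulety embed`.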


## Embedding sequences

Now we are ready to embed the sequences using various models. AMULETY uses a unified `embed` command that supports all available models.

To print the help message for the embedding command run:

In [37]:
! amulety embed --help


 Usage: amulety embed [OPTIONS]

 Embeds sequences from an AIRR rearrangement file using the specified model.
 Example usage:
 amulety embed --chain HL --model antib

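Because every model goes through the same `embed` command, running several models over one file can be scripted. The sketch below only assembles and prints the commands, using the flags that appear in this tutorial (`--input-airr`, `--chain`, `--model`, `--batch-size`, `--output-file-path`). The helper name `build_embed_command` and the output file names under `tutorial/output` are illustrative assumptions.

```python
import shlex


def build_embed_command(input_airr, chain, model, output_path, batch_size=2):
    """Assemble an `amulety embed` argument list from the flags used in this tutorial."""
    return [
        "amulety", "embed",
        "--input-airr", str(input_airr),
        "--chain", chain,
        "--model", model,
        "--batch-size", str(batch_size),
        "--output-file-path", str(output_path),
    ]


if __name__ == "__main__":
    airr = "tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv"
    for model in ("antiberty", "antiberta2", "esm2"):
        # Hypothetical output naming scheme: one file per model.
        cmd = build_embed_command(airr, "H", model, f"tutorial/output/{model}_H.pt")
        # Print rather than execute; subprocess.run(cmd, check=True) would run it.
        print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.
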
### BCR embedding examples

Let's demonstrate embedding BCR sequences using different models:

#### AntiBERTy (BCR-specific model)

In [38]:
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model antiberty --batch-size 2 --output-file-path tutorial/test_embedding.pt


2025-09-15 11:37:46,304 - INFO - Detected single-cell data format
2025-09-15 11:37:46,306 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-15 11:37:46,306 - INFO - Removed 102 sequences not matching H chain
2025-09-15 11:37:46,738 - INFO - AntiBERTy loaded. Size: 26.03 M
2025-09-15 11:37:46,739 - INFO - Batch 1/48
2025-09-15 11:37:47,405 - INFO - Batch 2/48
2025-09-15 11:37:47,470 - INFO - Batch 3/48
2025-09-15 11:37:47,522 - INFO - Batch 4/48
2025-09-15 11:37:47,568 - INFO - Batch 5/48
2025-09-15 11:37:47,610 - INFO - Batch 6/48
2025-09-15 1
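
The batch counter in the log follows from the number of sequences left after chain filtering and the `--batch-size` value. Assuming the usual ceiling division (an assumption about AMULETY's batching, not confirmed from its source), 48 batches at a batch size of 2 correspond to 95 or 96 heavy-chain sequences. A minimal sketch of that arithmetic:

```python
import math


def n_batches(n_sequences, batch_size):
    """Number of batches needed to cover n_sequences (the last batch may be partial)."""
    return math.ceil(n_sequences / batch_size)


def sequence_range(batches, batch_size):
    """Smallest and largest sequence counts that yield exactly `batches` batches."""
    return (batches - 1) * batch_size + 1, batches * batch_size


print(n_batches(96, 2))       # 48
print(sequence_range(48, 2))  # (95, 96)
```

Raising `--batch-size` reduces the number of batches proportionally, at the cost of more memory per batch.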

#### AntiBERTa2 (BCR-specific model)

In [39]:
# Embed heavy chains only using AntiBERTa2
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model antiberta2 --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_antiberta2.pt



2025-09-15 11:37:53,629 - INFO - Detected single-cell data format
2025-09-15 11:37:53,631 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-15 11:37:53,631 - INFO - Removed 102 sequences not matching H chain
2025-09-15 11:37:54,593 - INFO - AntiBERTa2 loaded. Size: 202.642462 M
2025-09-15 11:37:54,593 - INFO - Batch 1/48.
2025-09-15 11:37:55,519 - INFO - Batch 2/48.
2025-09-15 11:37:55,900 - INFO - Batch 3/48.
2025-09-15 11:37:56,308 - INFO - Batch 4/48.
2025-09-15 11:37:56,734 - INFO - Batch 5/48.
2025-09-15 11:37:57,138 - INFO - Batch 6/48.


#### AbLang (BCR-specific model with separate heavy/light models)

In [40]:
# Embed both heavy and light chains separately using AbLang
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H+L --model ablang --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_ablang.pt


2025-09-15 11:39:39,910 - INFO - Detected single-cell data format
2025-09-15 11:39:39,912 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-15 11:39:41,311 - INFO - AbLang heavy chain model loaded
2025-09-15 11:39:41,311 - INFO - Batch 1/99
2025-09-15 11:39:41,452 - INFO - Batch 2/99
2025-09-15 11:39:41,573 - INFO - Batch 3/99
2025-09-15 11:39:41,674 - INFO - Batch 4/99
2025-09-15 11:39:41,781 - INFO - Batch 5/99
2025-09-15 11:39:41,891 - INFO - Batch 6/99
2025-09-15 11:39:42,007 - INFO - Batch 7/99
2025-09-15 11:39:42,106 - INFO - Batch 8/99


#### BALM-paired (BCR paired chains)

BALM-paired is a BCR-specific model trained on paired heavy and light chains. To embed concatenated heavy and light chains with AMULETY, pass the `--chain HL` option.

In [42]:
# Embed heavy-light chain pairs using BALM-paired
# The model will be automatically downloaded on first use
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain HL --model balm-paired --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_balm_paired.pt


2025-09-15 11:41:32,794 - INFO - Detected single-cell data format
2025-09-15 11:41:32,795 - INFO - Single-cell AIRR data detected (all entries have cell_id).
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/gisela/miniconda3/envs/amulet/lib/python3.11/site-packages/amulety/amu │
│ lety.py:839 in embed                                                         │
│

### Protein Language Models

Next, we embed the same dataset with general-purpose protein language models.

#### ESM2 (Protein language model)

In [1]:
# Embed heavy chains only using ESM2
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model esm2 --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_esm2.pt


2025-09-08 22:52:48,775 - INFO - Detected single-cell data format
2025-09-08 22:52:48,777 - INFO - Processing both BCR and TCR sequences from the file.
2025-09-08 22:52:48,777 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-08 22:52:48,778 - INFO - Removed 102 sequences not matching H chain
tokenizer_config.json: 100%|██████████████████| 95.0/95.0 [00:00<00:00, 281kB/s]
vocab.txt: 100%|█████████████████████████████| 93.0/93.0 [00:00<00:00, 1.31MB/s]
special_tokens_map.json: 100%|██████████████████| 125/125 [00:00<00:00, 592kB/s]
config.json:

### Immune2Vec

Immune2Vec is not installed with AMULETY and must be set up manually. Clone the repository, then pass its location to `amulety embed` with `--immune2vec-path`:

In [None]:
# Install Immune2Vec by cloning its repository
! git clone https://bitbucket.org/yaarilab/immune2vec_model.git

# Replace /path/to/immune2vec_model with the location of the cloned repository
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model immune2vec --immune2vec-path /path/to/immune2vec_model --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_immune2vec.pt

### Custom/Fine-tuned models

You can use custom or fine-tuned models from HuggingFace or local paths using the `custom` model type:

In [2]:
# Example: Using a fine-tuned ESM2 model from HuggingFace
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model custom \
  --model-path "AmelieSchreiber/esm2_t6_8M_UR50D-finetuned-localization" \
  --embedding-dimension 320 \
  --max-length 512 \
  --batch-size 2 \
  --output-file-path tutorial/custom_model_embeddings.pt


2025-09-08 23:12:04,178 - INFO - Detected single-cell data format
2025-09-08 23:12:04,179 - INFO - Processing both BCR and TCR sequences from the file.
2025-09-08 23:12:04,179 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-08 23:12:04,180 - INFO - Removed 102 sequences not matching H chain
tokenizer_config.json: 100%|████████████████████| 108/108 [00:00<00:00, 247kB/s]
vocab.txt: 100%|█████████████████████████████| 93.0/93.0 [00:00<00:00, 1.22MB/s]
special_tokens_map.json: 100%|██████████████████| 125/125 [00:00<00:00, 599kB/s]
config.json:

### TCR embedding examples

AMULETY also supports TCR-specific models. Example TCR data can be downloaded to try them out:

In [None]:
# Create tutorial directory and download TCR example data
# TBD...
! wget -P tutorial https://zenodo.org/records/11373741/TBD...

#### TCR-BERT (TCR-specific model)

In [3]:
# Embed TCR beta-alpha chain pairs using TCR-BERT
# Note: This assumes you have TCR data in AIRR format
! amulety embed --input-airr tutorial/AIRR_tcr_sample.tsv --chain HL --model tcr-bert --batch-size 2 --output-file-path tutorial/tcr_embeddings_tcrbert.pt


2025-09-06 15:41:06,325 - INFO - Detected single-cell data format
2025-09-06 15:41:06,326 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-06 15:41:06,330 - INFO - Loading TCR-BERT model for TCR embedding...
2025-09-06 15:41:06,937 - INFO - Successfully loaded TCR-BERT model
2025-09-06 15:41:06,937 - INFO - TCR-BERT model loaded. Size: 57.39 M
2025-09-06 15:41:06,937 - INFO - TCR-BERT Batch 1/25.
2025-09-06 15:41:06,989 - INFO - TCR-BERT Batch 2/25.
2025-09-06 15:41:07,022 - INFO - TCR-BERT Batch 3/25.
2025-09-06 15:41:07,052 - INFO - TCR-BER

#### TCRT5 (TCR beta chain only)

In [2]:
# Embed TCR beta chains using TCRT5 (only supports H/beta chains)
! amulety embed --input-airr tutorial/AIRR_tcr_sample.tsv --chain H --model tcrt5 --batch-size 2 --output-file-path tutorial/tcr_embeddings_tcrt5.pt


2025-09-06 15:40:29,129 - INFO - Detected single-cell data format
2025-09-06 15:40:29,131 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-06 15:40:29,131 - INFO - Removed 100 sequences not matching H chain
2025-09-06 15:40:29,133 - INFO - Loading TCRT5 model for TCR embedding...
tokenizer_config.json: 21.1kB [00:00, 6.70MB/s]
spiece.model: 100%|██████████████████████████| 238k/238k [00:00<00:00, 2.87MB/s]
added_tokens.json: 2.35kB [00:00, 10.1MB/s]
special_tokens_map.json: 2.64kB [00:00, 9.78MB/s]
The tokenizer class you load from this check

## Checking dependencies

Some models require additional dependencies that are not installed by default. You can check which dependencies are missing:

In [12]:
# Check which optional dependencies are missing
! amulety check-deps


Checking AMULETY dependencies...

IgBlast (for translate-igblast command):
  IgBlast (igblastn) is available

Embedding model dependencies:
2025-09-08 23:27:12,074 - INFO - Available models: AntiBERTy, AbLang, TCREMP, TCR-BERT, TCRT5, ESM2, ProtT5
  1 dependencies are missing.
  AMULETY will raise ImportError with installation instructions when these models are used.

  To install missing dependencies:
    • Immune2Vec: git clone https://bitbucket.org/yaarilab/immune2vec_model.git && add to Python path

  Note: Models will provide detailed installation instructions whe
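
The IgBlast line in the report above refers to the `igblastn` executable. A similar check can be done from Python before calling `translate-igblast` in a script. The sketch below uses the standard library's `shutil.which`; the helper name `find_executable` is hypothetical, and how AMULETY itself locates IgBlast is not shown in this tutorial.

```python
import shutil


def find_executable(name="igblastn"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("igblastn not found on PATH; translate-igblast will likely fail")
    else:
        print(f"igblastn found at {path}")
```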