# AMULETY CLI Tutorial

## Introduction

This tutorial demonstrates how to use the AMULETY command-line interface (CLI) to translate and embed both BCR (B-cell receptor) and TCR (T-cell receptor) sequences.

AMULETY supports a wide range of embedding models for different immune receptor types. For a full list of the supported models, please check the [Usage](../usage.md) documentation page.

## Installation

Before getting started, install AMULETY with either conda or pip. The conda package installs the IgBlast dependency automatically; with pip, IgBlast must be installed separately.

Install AMULETY through conda:

In [None]:
conda install -c bioconda amulety

To verify the installation and print the help message, run:

In [1]:
! amulety --help


 █████  ███    ███ ██    ██ ██      ███████ ████████     ██    ██
██   ██ ████  ████ ██    ██ ██      ██         ██         ██  ██
███████ ██ ████ ██ ██    ██ ██      █████      ██          ████
██   ██ ██  ██  ██ ██    ██ ██      ██         ██           ██
██   ██ ██      ██  ██████  ███████ ███████    ██           ██

AMULETY: Adaptive imMUne receptor Language model Embedding tool for TCR and 
antibodY
 version 2.0

 Usage: amulety [OPTIONS] COMMAND [ARGS]...

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.      │
│ -

## Downloading example data and reference database

The following commands download an example AIRR-format file of BCR sequences and the IgBlast reference database.

In [None]:
# Create tutorial directory and download example data
! mkdir -p tutorial
! wget -P tutorial https://zenodo.org/records/17186858/files/AIRR_subject1_FNA_d0_1_Y1.tsv

# Download and extract IgBlast reference database
! wget -P tutorial -c https://github.com/nf-core/test-datasets/raw/airrflow/database-cache/igblast_base.zip
! unzip tutorial/igblast_base.zip -d tutorial
! rm tutorial/igblast_base.zip

--2025-09-24 13:00:01--  https://zenodo.org/records/17186858/files/AIRR_subject1_FNA_d0_1_Y1.tsv
Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.43.25, 188.185.48.194, ...
Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 479753 (469K) [application/octet-stream]
Saving to: 'tutorial/AIRR_subject1_FNA_d0_1_Y1.tsv.2'


2025-09-24 13:00:02 (578 KB/s) - 'tutorial/AIRR_subject1_FNA_d0_1_Y1.tsv.2' saved [479753/479753]

--2025-09-24 13:00:03--  https://github.com/nf-core/test-datasets/raw/airrflow/database-cache/igblast_base.zip
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/nf-core/test-datasets/airrflow/database-cache/igblast_base.zip [following]
--2025-09-24 13:00:03--  https://raw.githubusercontent.com/nf-core/test-datasets/airrflow/data

## Translating nucleotides to amino acid sequences

The embedding models take [AIRR format files](https://docs.airr-community.org/en/stable/datarep/overview.html#datarepresentations) containing immune receptor amino acid sequences as input. If your AIRR file contains only nucleotide sequences, use the `amulety translate-igblast` command to translate them. The command requires:

- Path to the V(D)J sequence AIRR file
- Output directory path to write the translated sequences
- Reference IgBlast database to perform alignment and translation
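
Before translating, it can help to check whether the AIRR file already carries amino acid columns. The following is a minimal sketch using only Python's standard library. It assumes the translated sequences live in the `sequence_aa` and `sequence_vdj_aa` columns (the names that `translate-igblast` reports writing in its log); the helper name `missing_aa_columns` is hypothetical and not part of AMULETY.

```python
import csv

# Columns that `amulety translate-igblast` reports writing (assumed here to be
# the ones worth checking before deciding whether translation is needed).
AA_COLUMNS = ("sequence_aa", "sequence_vdj_aa")


def missing_aa_columns(airr_path, required=AA_COLUMNS):
    """Return the required amino acid columns absent from an AIRR TSV header."""
    with open(airr_path, newline="") as handle:
        # Only the first (header) row is read; an empty file yields an empty header.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]
```

Calling `missing_aa_columns("tutorial/AIRR_subject1_FNA_d0_1_Y1.tsv")` lists the columns that are still missing; an empty list suggests the file may not need the translation step.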

In [3]:
! amulety translate-igblast tutorial/AIRR_subject1_FNA_d0_1_Y1.tsv tutorial tutorial/igblast_base


2025-09-24 11:28:42,713 - INFO - Converting AIRR table to FastA for IgBlast translation...
2025-09-24 11:28:42,720 - INFO - Calling IgBlast for running translation...
2025-09-24 11:28:44,404 - INFO - Saved the translations in the dataframe (sequence_aa contains the full translation and sequence_vdj_aa contains the VDJ translation).
2025-09-24 11:28:44,407 - INFO - Took 1.69 seconds
2025-09-24 11:28:44,408 - INFO - Saved the translations in tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv file.


## Embedding sequences

Now we are ready to embed the sequences using various models. AMULETY uses a unified `embed` command that supports all available models.
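
Because every model goes through the same command, the invocations in this tutorial differ only in a handful of flag values. As a rough sketch (the helper `build_embed_command` and the output file names are hypothetical; only the flags themselves are taken from the commands shown in this tutorial), the argument list can be assembled in Python:

```python
import shlex


def build_embed_command(input_airr, chain, model, output_path, batch_size=2):
    """Assemble an `amulety embed` argument list from the flags used in this tutorial."""
    return [
        "amulety", "embed",
        "--input-airr", str(input_airr),
        "--chain", chain,
        "--model", model,
        "--batch-size", str(batch_size),
        "--output-file-path", str(output_path),
    ]


if __name__ == "__main__":
    translated = "tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv"
    # Model/chain pairs taken from the examples in this tutorial;
    # the output paths are placeholders.
    for model, chain in [("antiberty", "H"), ("ablang", "H+L"), ("balm-paired", "HL")]:
        cmd = build_embed_command(translated, chain, model, f"tutorial/{model}.pt")
        # Print the shell-quoted command; pass `cmd` to
        # subprocess.run(cmd, check=True) to actually execute it.
        print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell-quoting problems when paths contain spaces.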

To print the help message for the embedding command, run:

In [5]:
! amulety embed --help


 Usage: amulety embed [OPTIONS]

 Embeds sequences from an AIRR rearrangement file using the specified model. It
 returns the

 Example usage:

### BCR embedding examples

Let's demonstrate embedding BCR sequences using different models:

#### AntiBERTy (BCR-specific model)

In [4]:
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model antiberty --batch-size 2 --output-file-path tutorial/test_embedding.pt


2025-09-24 11:28:55,583 - INFO - Detected single-cell data format
2025-09-24 11:28:55,585 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-24 11:28:55,586 - INFO - Removed 102 sequences not matching H chain
2025-09-24 11:29:02,850 - INFO - AntiBERTy loaded. Size: 26.03 M
2025-09-24 11:29:02,850 - INFO - Batch 1/48
2025-09-24 11:29:02,887 - INFO - Batch 2/48
2025-09-24 11:29:02,912 - INFO - Batch 3/48
2025-09-24 11:29:02,933 - INFO - Batch 4/48
2025-09-24 11:29:02,955 - INFO - Batch 5/48
2025-09-24 11:29:02,976 - INFO - Batch 6/48
202

#### AntiBERTa2 (BCR-specific model)

In [10]:
# Embed heavy chains only using AntiBERTa2
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model antiberta2 --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_antiberta2.pt


2025-09-24 11:44:05,686 - INFO - Detected single-cell data format
2025-09-24 11:44:05,688 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-24 11:44:05,688 - INFO - Removed 102 sequences not matching H chain
tokenizer_config.json: 100%|████████████████████| 116/116 [00:00<00:00, 339kB/s]
vocab.txt: 100%|█████████████████████████████| 80.0/80.0 [00:00<00:00, 1.56MB/s]
special_tokens_map.json: 100%|█████████████████| 124/124 [00:00<00:00, 1.24MB/s]
config.json: 100%|█████████████████████████████| 575/575 [00:00<00:00, 1.76MB/s]
Xet Stor

#### AbLang (BCR-specific model with separate heavy/light models)

In [11]:
# Embed both heavy and light chains separately using AbLang
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H+L --model ablang --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_ablang.pt


2025-09-24 11:45:05,420 - INFO - Detected single-cell data format
2025-09-24 11:45:05,421 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-24 11:45:06,418 - INFO - AbLang heavy chain model loaded
2025-09-24 11:45:06,418 - INFO - Batch 1/99
2025-09-24 11:45:06,500 - INFO - Batch 2/99
2025-09-24 11:45:06,570 - INFO - Batch 3/99
2025-09-24 11:45:06,642 - INFO - Batch 4/99
2025-09-24 11:45:06,712 - INFO - Batch 5/99
2025-09-24 11:45:06,790 - INFO - Batch 6/99
2025-09-24 11:45:06,864 - INFO - Batch 7/99
2025-09-24 11:45:06,939 - INFO - Ba

#### BALM-paired model (BCR paired chains)

BALM-paired is a BCR-specific model trained on paired heavy and light chains. To embed concatenated heavy and light chains with AMULETY, pass the `--chain HL` option.

In [13]:
# Embed heavy-light chain pairs using BALM-paired
# The model will be automatically downloaded on first use
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain HL --model balm-paired --batch-size 2 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_balm_paired.pt


2025-09-24 11:51:36,752 - INFO - Detected single-cell data format
2025-09-24 11:51:36,754 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-24 12:02:54,987 - INFO - Model size: 303.92M
Batch 1/48
Batch 2/48
Batch 3/48
Batch 4/48
Batch 5/48
Batch 6/48
Batch 7/48
Batch 8/48
Batch 9/48
Batch 10/48
Batch 11/48
Batch 12/48
Batch 13/48
Batch 14/48
Batch 15/48
Batch 16/48
Batch 17/48
Batch 18/48
Batch 19/48
Batch 20/48
Batch 21/48
Batch 22/48
Batch 23/48
Batch 24/48
Batch 25/48
Batch 26/48
Batch 27/48



### Protein Language Models

Next, we embed the same dataset using general protein language models.

#### ESM2 (Protein language model)

In [5]:
# Embed heavy chains only using ESM2
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model esm2 --batch-size 1 --output-file-path tutorial/AIRR_subject1_FNA_d0_1_Y1_esm2.pt


2025-09-24 11:29:55,935 - INFO - Detected single-cell data format
2025-09-24 11:29:55,935 - INFO - Processing both BCR and TCR sequences from the file.
2025-09-24 11:29:55,936 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-24 11:29:55,936 - INFO - Removed 102 sequences not matching H chain
tokenizer_config.json: 100%|██████████████████| 95.0/95.0 [00:00<00:00, 157kB/s]
vocab.txt: 100%|█████████████████████████████| 93.0/93.0 [00:00<00:00, 1.33MB/s]
special_tokens_map.json: 100%|██████████████████| 125/125 [00:00<00:00, 448kB/s]
con

### Custom/Fine-tuned models

You can use custom or fine-tuned models from HuggingFace or local paths using the `custom` model type:

In [14]:
# Example: Using a fine-tuned ESM2 model from HuggingFace
! amulety embed --input-airr tutorial/AIRR_subject1_FNA_d0_1_Y1_translated.tsv --chain H --model custom \
  --model-path "AmelieSchreiber/esm2_t6_8M_UR50D-finetuned-localization" \
  --embedding-dimension 320 \
  --max-length 512 \
  --batch-size 2 \
  --output-file-path tutorial/custom_model_embeddings.pt


2025-09-24 12:30:43,593 - INFO - Detected single-cell data format
2025-09-24 12:30:43,595 - INFO - Processing both BCR and TCR sequences from the file.
2025-09-24 12:30:43,596 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-24 12:30:43,597 - INFO - Removed 102 sequences not matching H chain
Some weights of EsmForMaskedLM were not initialized from the model checkpoint at AmelieSchreiber/esm2_t6_8M_UR50D-finetuned-localization and are newly initialized: ['lm_head.bias', 'lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.layer_norm
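
The `--embedding-dimension 320` value in the command above matches the hidden size of the `esm2_t6_8M` architecture, so the flag presumably needs to agree with the custom model's hidden size. As a hedged sketch, assuming a Hugging Face-style `config.json` with a `hidden_size` field has been saved locally (the helper name and any path passed to it are hypothetical), the dimension can be read from the config instead of being hard-coded:

```python
import json


def embedding_dimension_from_config(config_path):
    """Read `hidden_size` from a Hugging Face-style config.json (assumed layout)."""
    with open(config_path) as handle:
        config = json.load(handle)
    if "hidden_size" not in config:
        # Some architectures name this field differently; fail loudly in that case.
        raise KeyError(f"no 'hidden_size' field in {config_path}")
    return int(config["hidden_size"])
```

The returned integer can then be passed as the `--embedding-dimension` value when building the `amulety embed` command.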

### TCR embedding examples

AMULETY also supports TCR-specific models. Download the TCR example data to try them:

In [7]:
# Download TCR example data
! wget -P tutorial https://zenodo.org/records/17186858/files/AIRR_tcr_sample.tsv

--2025-09-24 11:35:16--  https://zenodo.org/records/17186858/files/AIRR_tcr_sample.tsv
Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.48.194, 188.185.43.25, ...
Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 40915 (40K) [application/octet-stream]
Saving to: 'tutorial/AIRR_tcr_sample.tsv'


2025-09-24 11:35:17 (166 KB/s) - 'tutorial/AIRR_tcr_sample.tsv' saved [40915/40915]



#### TCR-BERT (TCR-specific model)

In [15]:
# Embed TCR beta-alpha chain pairs using TCR-BERT
# Note: This assumes you have TCR data in AIRR format
! amulety embed --input-airr tutorial/AIRR_tcr_sample.tsv --chain HL --model tcr-bert --batch-size 2 --output-file-path tutorial/tcr_embeddings_tcrbert.pt


2025-09-24 12:31:02,594 - INFO - Detected single-cell data format
2025-09-24 12:31:02,595 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-24 12:31:02,599 - INFO - Dropping 100 cells with missing heavy or light chain...
2025-09-24 12:31:02,600 - INFO - Loading TCR-BERT model for TCR embedding...
2025-09-24 12:31:04,618 - INFO - Successfully loaded TCR-BERT model
2025-09-24 12:31:04,619 - INFO - TCR-BERT model loaded. Size: 57.39 M
2025-09-24 12:31:04,619 - INFO - TCR-BERT Batch 1/25.
2025-09-24 12:31:04,663 - INFO - TCR-BERT Batch 2/

#### TCRT5 (TCR beta chain only)

In [16]:
# Embed TCR beta chains using TCRT5 (only supports H/beta chains)
! amulety embed --input-airr tutorial/AIRR_tcr_sample.tsv --chain H --model tcrt5 --batch-size 2 --output-file-path tutorial/tcr_embeddings_tcrt5.pt


2025-09-24 12:31:11,221 - INFO - Detected single-cell data format
2025-09-24 12:31:11,221 - INFO - Single-cell AIRR data detected (all entries have cell_id).
2025-09-24 12:31:11,222 - INFO - Removed 100 sequences not matching H chain
2025-09-24 12:31:11,222 - INFO - Loading TCRT5 model for TCR embedding...
tokenizer_config.json: 21.1kB [00:00, 23.3MB/s]
spiece.model: 100%|██████████████████████████| 238k/238k [00:00<00:00, 2.78MB/s]
added_tokens.json: 2.35kB [00:00, 16.2MB/s]
special_tokens_map.json: 2.64kB [00:00, 12.0MB/s]
The tokenizer class you load from t

## Checking dependencies

Some models require additional dependencies that are not installed by default. You can check which dependencies are missing:

In [1]:
# Check which optional dependencies are missing
! amulety check-deps


Checking AMULETY dependencies...

IgBlast (for translate-igblast command):
  IgBlast (igblastn) is available

Embedding model dependencies:
2025-09-24 12:51:20,234 - INFO - Available models: AntiBERTy, AbLang, TCR-BERT, TCRT5, ESM2, ProtT5
  1 dependencies are missing.
  AMULETY will raise ImportError with installation instructions when these models are used.

  To install missing dependencies:
    • Immune2Vec: git clone https://bitbucket.org/yaarilab/immune2vec_model.git && add to Python path

  Note: Models will provide detailed installation instructions wh
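
A similar pre-flight check can be scripted, for example at the start of a pipeline. The sketch below is an assumption-laden approximation, not what `amulety check-deps` does internally: it only tests whether the `igblastn` executable named in the output above is on `PATH` and whether given Python modules can be located. The helper names are hypothetical, and the module names to check depend on which models you plan to use.

```python
import importlib.util
import shutil


def missing_executables(names=("igblastn",)):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from both helpers means nothing was found missing by this simple test; `amulety check-deps` remains the authoritative check.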