
Analysis modules

Table of VDJtools modules

The VDJtools software package contains a comprehensive set of immune repertoire post-analysis routines, which are subdivided into several analysis modules. Each module's section provides the command line usage syntax and parameter descriptions for each routine, as well as an example and description of the output.


Summary statistics, spectratyping, etc


Repertoire richness and diversity


Clonotype sharing between samples


Filtering and resampling


Clonotype table operations


Functional annotation of clonotype tables (antigen specificity, amino acid properties, etc)

  • :ref:`CalcCdrAAProfile` Builds a profile of CDR3 regions (V germline, V-D junction, ...) using a set of amino-acid physical properties
  • :ref:`Annotate2` Computes a set of basic (insert size, ...) and amino acid physical properties (GRAVY, ...) for clonotypes
  • :ref:`ScanDatabase` Queries a database containing clonotypes of known antigen specificity.


Some useful utilities


Each routine generates a comprehensive tabular output, and some produce optional graphical output. For graphical output, the corresponding R script is stored in the analysis folder, with the arguments it was run with given as comments at the beginning of the script. The user can thus uncomment the script arguments, modify the script and re-run it. This behavior can be disabled by running VDJtools with the discard_scripts argument before the routine name.

By default, all graphical output is generated in PDF format. To generate PNG images, use the ``--plot-type png`` option.

When running routines that output clonotype tables, consider the following:

  • Joint and pooled samples are stored in VDJtools format
  • Samples produced using the :ref:`ScanDatabase` or :ref:`Annotate` routine are in VDJtools format and include additional annotation columns. Annotation columns are retained when running most VDJtools routines
  • When loading a joint/pooled sample into VDJtools, clonotype abundance vectors, incidence counts, etc. will be treated as clonotype-level annotations
  • Annotation columns will not be preserved when joining/pooling annotated samples; a workaround is to use the :ref:`ApplySampleAsFilter` routine


When importing a table generated by one of the VDJtools routines into R, use the following command to parse the input correctly:

read.table("some_table.txt", header = TRUE, quote = "", sep = "\t")
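For readers who post-process tables in Python rather than R, the same parsing rules (header row, tab separator, quoting disabled) can be sketched with the standard ``csv`` module. The table contents and column names below are made up for illustration and do not correspond to the output of any particular VDJtools routine.

```python
import csv
import io

# Hypothetical table contents; real VDJtools output has routine-specific columns.
SAMPLE = 'sample_id\tnote\tvalue\nA1\tcontains "quotes"\t0.25\nB2\tplain\t0.75\n'


def read_table(handle):
    """Read a tab-delimited table with a header row and quoting disabled."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_table(io.StringIO(SAMPLE))
for row in rows:
    print(row["sample_id"], row["note"], float(row["value"]))
```

Disabling quoting mirrors ``quote=""`` in the R call: a field that happens to contain a quote character is kept verbatim instead of being treated as the start of a quoted string.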

Common parameters

There are several parameters that are commonly used among analysis routines:

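+-----------+------------------+-----------+------------------------------------------------------------------------+
| Shorthand | Long name        | Argument  | Description                                                            |
+===========+==================+===========+========================================================================+
| -h        | --help           |           | Brings up the help message for the selected routine.                   |
+-----------+------------------+-----------+------------------------------------------------------------------------+
| -m        | --metadata       | path      | Path to the metadata file. Should point to a tab-delimited file whose  |
|           |                  |           | first two columns contain the sample path and sample id respectively,  |
|           |                  |           | and whose remaining columns contain user-specified data. See the       |
|           |                  |           | :ref:`metadata` section.                                               |
+-----------+------------------+-----------+------------------------------------------------------------------------+
| -u        | --unweighted     |           | If the routine offers this option and it is not set, all statistics    |
|           |                  |           | are weighted by clonotype frequency.                                   |
+-----------+------------------+-----------+------------------------------------------------------------------------+
| -i        | --intersect-type | string    | :ref:`overlap_type` that specifies which clonotype features (CDR3      |
|           |                  |           | sequence, V/J segments, hypermutations) are compared when checking     |
|           |                  |           | whether two clonotypes match. Allowed values: strict, nt, ntV, ntVJ,   |
|           |                  |           | aa, aaV, aaVJ and aa!nt.                                               |
+-----------+------------------+-----------+------------------------------------------------------------------------+
| -p        | --plot           |           | [plotting] Enable plotting for routines that support it.               |
+-----------+------------------+-----------+------------------------------------------------------------------------+
|           | --plot-type      | <pdf|png> | [plotting] Specifies whether to generate a PDF or PNG file. While      |
|           |                  |           | the latter can be easily embedded, PDF plots have superior quality.    |
+-----------+------------------+-----------+------------------------------------------------------------------------+
| -f        | --factor         | string    | [plotting] Name of the sample metadata column that should be treated   |
|           |                  |           | as a factor. If the name contains spaces, the argument should be       |
|           |                  |           | surrounded with double quotes, e.g. -f "Treatment type".               |
+-----------+------------------+-----------+------------------------------------------------------------------------+
| -n        | --factor-numeric |           | [plotting] Treat the factor as numeric.                                |
+-----------+------------------+-----------+------------------------------------------------------------------------+
| -l        | --label          | string    | [plotting] Name of the sample metadata column that should be treated   |
|           |                  |           | as a label. If the name contains spaces, the argument should be        |
|           |                  |           | surrounded with double quotes, e.g. -l "Patient id".                   |
+-----------+------------------+-----------+------------------------------------------------------------------------+
| -c        | --compress       | path      | Compress resulting clonotype tables using GZIP.                        |
+-----------+------------------+-----------+------------------------------------------------------------------------+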
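The metadata layout expected by ``--metadata`` (sample path, then sample id, then user-specified columns) can be illustrated with a short Python sketch. The file contents, the column names and the ``load_metadata`` and ``factor_values`` helpers are hypothetical, and the presence of a header row is an assumption made here; this is not how VDJtools itself reads the file.

```python
import csv
import io

# Hypothetical metadata: column 1 is the sample path, column 2 the sample id,
# the remaining columns are user-specified. A header row is assumed.
METADATA = (
    "file_name\tsample_id\tTreatment type\tPatient id\n"
    "samples/a.txt\tA\tcontrol\tp1\n"
    "samples/b.txt\tB\tdrug\tp2\n"
)


def load_metadata(handle):
    """Return (header, records); each record holds path, id and extra columns."""
    rows = list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
    header, body = rows[0], rows[1:]
    records = []
    for fields in body:
        extra = dict(zip(header[2:], fields[2:]))
        records.append({"path": fields[0], "id": fields[1], "extra": extra})
    return header, records


def factor_values(records, column):
    """Map each sample id to its value in one user-specified column."""
    return {rec["id"]: rec["extra"][column] for rec in records}


header, records = load_metadata(io.StringIO(METADATA))
print(factor_values(records, "Treatment type"))
```

The column name ``Treatment type`` contains a space, which is why the corresponding command line argument has to be wrapped in double quotes, as in -f "Treatment type".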

Overlap type

Some VDJtools routines require a clonotype matching strategy to be defined when computing clonotype sharing between samples. This parameter is also used when collapsing clonotype tables. For example, a common task is to estimate the extent of convergent recombination, i.e. the number of distinct CDR3 nucleotide sequences per CDR3 amino acid sequence. This requires collapsing the clonotype table by identical CDR3aa field.
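The convergent recombination estimate described above can be sketched in Python: group clonotypes by CDR3 amino acid sequence and count the distinct CDR3 nucleotide sequences in each group. The clonotype records and the field names ``cdr3nt`` and ``cdr3aa`` are illustrative assumptions, and the sketch is not the VDJtools implementation.

```python
from collections import defaultdict

# Hypothetical clonotypes; two nucleotide variants encode the same CDR3aa.
clonotypes = [
    {"cdr3nt": "TGTGCCAGCAGCTTC", "cdr3aa": "CASSF"},
    {"cdr3nt": "TGCGCCAGCAGCTTC", "cdr3aa": "CASSF"},
    {"cdr3nt": "TGTGCCAGCAGCTTC", "cdr3aa": "CASSF"},  # repeated nucleotide variant
    {"cdr3nt": "TGTGCCAGCAGCTTG", "cdr3aa": "CASSL"},
]


def variants_per_cdr3aa(records):
    """Count distinct CDR3 nucleotide sequences per CDR3 amino acid sequence."""
    variants = defaultdict(set)
    for rec in records:
        variants[rec["cdr3aa"]].add(rec["cdr3nt"])
    return {aa: len(nts) for aa, nts in variants.items()}


convergence = variants_per_cdr3aa(clonotypes)
print(convergence)
```

Using a set per amino acid sequence means a nucleotide variant that appears more than once is counted only once.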

The list of strategies is defined below.

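+-----------+-----------------------------------+-------------------------------------------------------------+
| Shorthand | Rule                              | Note                                                        |
+===========+===================================+=============================================================+
| strict    | CDR3nt (AND) V (AND) J (AND) SHMs | Require full match for receptor nucleotide sequence         |
+-----------+-----------------------------------+-------------------------------------------------------------+
| nt        | CDR3nt                            |                                                             |
+-----------+-----------------------------------+-------------------------------------------------------------+
| ntV       | CDR3nt (AND) V                    |                                                             |
+-----------+-----------------------------------+-------------------------------------------------------------+
| ntVJ      | CDR3nt (AND) V (AND) J            |                                                             |
+-----------+-----------------------------------+-------------------------------------------------------------+
| aa        | CDR3aa                            |                                                             |
+-----------+-----------------------------------+-------------------------------------------------------------+
| aaV       | CDR3aa (AND) V                    |                                                             |
+-----------+-----------------------------------+-------------------------------------------------------------+
| aaVJ      | CDR3aa (AND) V (AND) J            |                                                             |
+-----------+-----------------------------------+-------------------------------------------------------------+
| aa!nt     | CDR3aa (AND) ((NOT) CDR3nt)       | Removes nearly all contamination bias from overlap results. |
|           |                                   | Should not be used for samples from the same donor or from  |
|           |                                   | tracking experiments.                                       |
+-----------+-----------------------------------+-------------------------------------------------------------+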

As somatic hypermutations (SHMs) are currently not supported by VDJtools, the strict and ntVJ options are identical. See the VDJtools :ref:`clonotype_spec` specification for details.
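The matching rules listed for each overlap type can be written down as a small Python sketch. The ``Clonotype`` tuple, the ``KEY_FIELDS`` mapping, the ``matches`` helper and the example segment names are hypothetical and serve only to make the rules concrete; SHMs are left out, so ``strict`` compares the same fields as ``ntVJ``.

```python
from typing import NamedTuple


class Clonotype(NamedTuple):
    cdr3nt: str
    cdr3aa: str
    v: str
    j: str


# Fields compared by each key-based strategy. SHMs are not modelled here,
# so "strict" compares the same fields as "ntVJ".
KEY_FIELDS = {
    "strict": ("cdr3nt", "v", "j"),
    "nt": ("cdr3nt",),
    "ntV": ("cdr3nt", "v"),
    "ntVJ": ("cdr3nt", "v", "j"),
    "aa": ("cdr3aa",),
    "aaV": ("cdr3aa", "v"),
    "aaVJ": ("cdr3aa", "v", "j"),
}


def matches(a, b, strategy):
    """Return True if clonotypes a and b match under the given strategy."""
    if strategy == "aa!nt":
        # Same amino acid sequence, but a different nucleotide sequence.
        return a.cdr3aa == b.cdr3aa and a.cdr3nt != b.cdr3nt
    fields = KEY_FIELDS[strategy]
    return all(getattr(a, f) == getattr(b, f) for f in fields)


# Illustrative clonotypes: same CDR3aa and V, different CDR3nt and J.
x = Clonotype("TGTGCCAGCAGCTTC", "CASSF", "TRBV5-1", "TRBJ2-1")
y = Clonotype("TGCGCCAGCAGCTTC", "CASSF", "TRBV5-1", "TRBJ2-7")

for name in ("nt", "aa", "aaV", "aaVJ", "aa!nt"):
    print(name, matches(x, y, name))
```

In this sketch the two clonotypes match under aa, aaV and aa!nt, but not under nt (different nucleotide sequence) or aaVJ (different J segment).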
