The align command aligns raw sequencing reads to reference V, D, J and C genes of T- and B-cell receptors. It has the following syntax:
mixcr align [options] input_file1 [input_file2] output_file.vdjca
MiXCR supports fasta, fastq, fastq.gz and paired-end fastq and fastq.gz input. In case of paired-end reads, two input files should be specified.
The following table describes the command line options for align:
Option | Default value | Description |
---|---|---|
-h, --help | | Print help message. |
-r {file} --report ... | | Report file name. If this option is not specified, no report file will be produced. |
-c {chain} --chains ... | ALL | Target immunological chain list separated by ",". Available values: IGH, IGL, IGK, TRA, TRB, TRG, TRD, IG (for all immunoglobulin chains), TCR (for all T-cell receptor chains), ALL (for all chains). It is highly recommended to use the default value for this parameter in most cases at the align step. Filtering is also possible at the export step. |
-s {speciesName} --species ... | HomoSapiens | Species (organism). Possible values: hsa (or HomoSapiens) and mmu (or MusMusculus), or any species that was provided during import of segments (see import segments <ref-importSegments>). |
-p {parameterName} --parameters ... | default | Preset of parameters. Possible values: default and rna-seq. The rna-seq preset is specifically optimized for analysis of RNA-Seq data (see below <ref-alignRNASeq>). |
-i, --diff-loci | | Accept alignments with different loci of V and J genes (by default such alignments are dropped). |
-t {numberOfThreads} --threads ... | number of available CPU cores | Number of processing threads. |
-n {numberOfReads} --limit ... | | Limit the number of sequences that will be analysed (only the first -n sequences will be processed from the input file(s)). |
-a, --save-description | | Copy read description line(s) from .fastq or .fasta to the .vdjca file (they can then be exported with the -descrR1 and -descrR2 options in the exportAlignments <ref-export> action). |
-v, --write-all | | Write alignment results for all input reads, including empty results for non-aligned reads. This option also turns off the "same locus filter", so --diff-loci has no effect if this option is specified. |
-g, --save-reads | | Copy read(s) from .fastq or .fasta to the .vdjca file (this is required for exporting reads aggregated by clones; see this section <ref-exporting-reads>). |
--not-aligned-R1 | | Write all not aligned reads (R1) to the specified file. |
--not-aligned-R2 | | Write all not aligned reads (R2) to the specified file. |
-Oparameter=value | | Overrides the default value of an aligner parameter (see next subsection). |
All parameters are optional.
MiXCR uses a wide range of parameters that control aligner behaviour. There are global parameters and gene-specific parameters organized in groups: vParameters, dParameters, jParameters and cParameters. Each group of parameters may contain further subgroups of parameters, and so on. To override a parameter value, use -O followed by the fully qualified parameter name and the parameter value (e.g. -Ogroup1.group2.parameter=value).
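The fully qualified naming scheme can be illustrated with a small Python sketch that flattens a nested dictionary of overrides into -O arguments. The helper name flatten_overrides and the file names input.fastq and output.vdjca are hypothetical and are not part of MiXCR; the parameter names are taken from the tables in this section. The command is only printed, not executed.

```python
def flatten_overrides(overrides, prefix=""):
    """Turn a nested dict of aligner parameters into -O command line arguments.

    Hypothetical helper for illustration only; it is not part of MiXCR.
    """
    args = []
    for name, value in overrides.items():
        # Build the fully qualified name, e.g. jParameters.parameters.mapperKValue
        qualified = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # A nested dict corresponds to a parameter (sub)group.
            args.extend(flatten_overrides(value, qualified))
        else:
            # Booleans are written in lower case, as in the parameter tables.
            text = str(value).lower() if isinstance(value, bool) else str(value)
            args.append(f"-O{qualified}={text}")
    return args


overrides = {
    "maxHits": 3,
    "vParameters": {"geneFeatureToAlign": "VTranscript"},
    "jParameters": {"parameters": {"mapperKValue": 7, "floatingRightBound": False}},
}

# File names below are placeholders (assumption).
command = ["mixcr", "align", *flatten_overrides(overrides), "input.fastq", "output.vdjca"]
print(" ".join(command))
```

Keeping the overrides in a nested structure mirrors the group and subgroup layout described above, so a typo in a group name is easier to spot than in a long hand-written argument list.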
One of the key MiXCR features is the ability to specify particular gene regions <ref-geneFeatures> which will be extracted from the reference and used as targets for alignment. Thus, each sequencing read will be aligned to these extracted reference regions. The parameters responsible for target gene regions are:
Parameter | Default value | Description |
---|---|---|
vParameters.geneFeatureToAlign | VRegion | Region in V gene which will be used as target in align |
dParameters.geneFeatureToAlign | DRegion | Region in D gene which will be used as target in align |
jParameters.geneFeatureToAlign | JRegion | Region in J gene which will be used as target in align |
cParameters.geneFeatureToAlign | CExon1 | Region in C gene which will be used as target in align |
It is important to specify these gene regions such that they fully cover the target clonal gene region which will be used in assemble <ref-assemble> (e.g. CDR3).
One can override default gene regions in the following way:
mixcr align -OvParameters.geneFeatureToAlign=VTranscript input_file1 [input_file2] output_file.vdjca
Other global aligner parameters are:
Parameter | Default value | Description |
---|---|---|
 | 120.0 | Minimal total alignment score value of V and J genes. |
maxHits | 5 | Maximal number of hits for each gene type: if the input sequence aligns to more than maxHits targets, then only the top maxHits hits will be kept. |
 | 12 | Minimal clonal sequence length (e.g. minimal sequence of CDR3 to be used for clone assembly). |
 | VThenJ | Order in which V and J genes are aligned in target (possible values: JThenV and VThenJ). This parameter affects only single-read alignments and alignments of overlapped paired-end reads. Non-overlapping paired-end reads are always processed in VThenJ mode. JThenV can be used for short reads (~100bp) with full (or nearly full) J gene coverage. |
relativeMinVFR3CDR3Score (only for paired-end analysis) | 0.7 | Relative minimal alignment score of the FR3+VCDR3Part region for V gene. A V hit will be kept only if its FR3+VCDR3Part part aligns with a score greater than relativeMinVFR3CDR3Score * maxFR3CDR3Score, where maxFR3CDR3Score is the maximal alignment score for the FR3+VCDR3Part region among all V hits for the current input read pair. |
readsLayout (only for paired-end analysis) | Opposite | Relative orientation of paired reads. Available values: Opposite, Collinear, Unknown. |
One can override these parameters in the following way:
mixcr align -OmaxHits=3 input_file1 [input_file2] output_file.vdjca
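The relativeMinVFR3CDR3Score rule described in the table above can be sketched in a few lines of Python. This is an illustration of the stated rule only, not MiXCR's implementation; the function name, the gene names and the scores are made up for the example.

```python
def filter_v_hits(hits, relative_min_vfr3cdr3_score=0.7):
    """Keep V hits whose FR3+VCDR3Part score exceeds the relative threshold.

    `hits` is a list of (gene_name, fr3_cdr3_score) tuples for one read pair.
    Illustrative sketch only; names and data layout are assumptions.
    """
    if not hits:
        return []
    # maxFR3CDR3Score: best FR3+VCDR3Part score among all V hits of this read pair
    max_fr3_cdr3_score = max(score for _, score in hits)
    threshold = relative_min_vfr3cdr3_score * max_fr3_cdr3_score
    # The table states the hit is kept only if its score is greater than the threshold.
    return [(gene, score) for gene, score in hits if score > threshold]


# Hypothetical V hits for a single read pair.
v_hits = [("V-gene-A", 100.0), ("V-gene-B", 80.0), ("V-gene-C", 60.0)]
print(filter_v_hits(v_hits))
```

With the default value of 0.7 and a best score of 100.0, the threshold is about 70, so the hit with score 60.0 is dropped while the other two are kept.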
MiXCR uses the same type of aligner to align V, J and C genes (KAligner from MiLib; the idea of KAligner is inspired by this article). The parameters of these aligners are placed in the parameters subgroup and can be overridden using e.g. -OjParameters.parameters.mapperKValue=7. The following parameters for V, J and C aligners are available:
Parameter | Default V value | Default J value | Default C value | Description |
---|---|---|---|---|
mapperKValue | 5 | 5 | 5 | Length of seeds used in aligner. |
floatingLeftBound | true | true | false | Specifies whether the left bound of the alignment is fixed or floating: if floatingLeftBound is set to false, the left bound of either target or query will be aligned. Default values are suitable in most cases. |
floatingRightBound | true | true | false | Specifies whether the right bound of the alignment is fixed or floating: if floatingRightBound is set to false, the right bound of either target or query will be aligned. Default values are suitable in most cases. If your target molecules have no primer sequences in J Region (e.g. the library was amplified using a primer to the C region), you can change the value of this parameter for J gene to false to increase J gene identification accuracy and overall specificity of alignments. |
minAlignmentLength | 15 | 15 | 15 | Minimal length of aligned region. |
maxAdjacentIndels | 2 | 2 | 2 | Maximum number of indels between two seeds. |
absoluteMinScore | 40.0 | 40.0 | 40.0 | Minimal score of alignment: alignments with a smaller score will be dropped. |
relativeMinScore | 0.87 | 0.87 | 0.87 | Minimal relative score of alignments: if the alignment score is smaller than relativeMinScore * maxScore, where maxScore is the best score among all alignments for the particular gene type (V, J or C) and input sequence, it will be dropped. |
maxHits | 7 | 7 | 7 | Maximal number of hits: if the input sequence aligns with more than maxHits queries, only the top maxHits hits will be kept. |
These parameters can be overridden like in the following example:
mixcr align -OvParameters.parameters.minAlignmentLength=30 \
-OjParameters.parameters.relativeMinScore=0.7 \
input_file1 [input_file2] output_file.vdjca
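How absoluteMinScore, relativeMinScore and maxHits interact can be sketched as follows. This is a simplified illustration of the rules stated in the table, not the KAligner code: the function name and the example hits are hypothetical, and the order in which the three filters are applied is an assumption.

```python
def filter_hits(hits, absolute_min_score=40.0, relative_min_score=0.87, max_hits=7):
    """Apply the three hit filters described in the table to one input sequence.

    `hits` is a list of (gene_name, score) tuples for a single gene type.
    Sketch only; the filter order is an assumption, not taken from MiXCR.
    """
    if not hits:
        return []
    # maxScore: best score among all alignments for this gene type and sequence
    max_score = max(score for _, score in hits)
    kept = [
        (gene, score)
        for gene, score in hits
        # Alignments with a smaller score than either threshold are dropped.
        if score >= absolute_min_score and score >= relative_min_score * max_score
    ]
    # Keep only the top max_hits hits, best score first.
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:max_hits]


# Hypothetical J hits for one read.
j_hits = [("J-gene-A", 100.0), ("J-gene-B", 90.0), ("J-gene-C", 80.0), ("J-gene-D", 35.0)]
print(filter_hits(j_hits))                           # default thresholds
print(filter_hits(j_hits, relative_min_score=0.7))   # relaxed relative threshold
```

Lowering relativeMinScore from 0.87 to 0.7, as in the command above, moves the relative threshold from about 87 to about 70 for a best score of 100.0, so the hit with score 80.0 is no longer dropped.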
Scoring used in aligners is specified by the scoring subgroup of parameters. It contains the following parameters:
Parameter | Default value | Description |
---|---|---|
subsMatrix | | Substitution matrix. Available types: |
gapPenalty | -12 | Penalty for gap. |
Scoring parameters can be overridden in the following way:
mixcr align -OvParameters.parameters.scoring.gapPenalty=-20 input_file1 [input_file2] output_file.vdjca
mixcr align -OvParameters.parameters.scoring.subsMatrix='simple(match=4,mismatch=-11)' \
input_file1 [input_file2] output_file.vdjca
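To get a feeling for how a simple(match=..., mismatch=...) substitution matrix and gapPenalty combine, the following Python sketch scores an already aligned pair of sequences. It assumes a linear gap model in which every gap position costs gapPenalty; that model, the function name and the example sequences are assumptions made for illustration and are not taken from the MiXCR or MiLib sources.

```python
def score_alignment(aligned_target, aligned_query, match=4, mismatch=-11, gap_penalty=-12):
    """Score a gapped pairwise alignment given as two equal-length strings.

    '-' marks a gap position. Assumes (illustration only) that each gap
    position costs `gap_penalty` and that substitutions use a simple
    match/mismatch matrix.
    """
    if len(aligned_target) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for t, q in zip(aligned_target, aligned_query):
        if t == "-" or q == "-":
            score += gap_penalty      # gap in either sequence
        elif t == q:
            score += match            # identical nucleotides
        else:
            score += mismatch         # substitution
    return score


# Hypothetical alignments: one deletion, then one substitution.
print(score_alignment("ACGTAC", "ACG-AC"))   # 5 matches and 1 gap
print(score_alignment("ACGT", "ACCT"))       # 3 matches and 1 mismatch
```

Under these assumptions a single mismatch (-11) is almost as costly as a gap (-12), so making gapPenalty more negative, as in the first command above, mainly shifts the balance toward alignments with substitutions instead of indels.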
The following parameters can be overridden for the D aligner:
Parameter | Default value | Description |
---|---|---|
absoluteMinScore | 30.0 | Minimal score of alignment: alignments with smaller scores will be dropped. |
relativeMinScore | 0.85 | Minimal relative score of alignment: if the alignment score is smaller than relativeMinScore * maxScore, where maxScore is the best score among all alignments for the particular sequence, it will be dropped. |
maxHits | 3 | Maximal number of hits: if the input sequence aligns with more than maxHits queries, only the top maxHits hits will be kept. |
One can override these parameters like in the following example:
mixcr align -OdParameters.absoluteMinScore=10 input_file1 [input_file2] output_file.vdjca
Scoring parameters for D aligner are the following:
Parameter | Default value | Description |
---|---|---|
type | affine | Type of scoring. Possible values: affine, linear. |
subsMatrix | | Substitution matrix. Available types: |
gapOpenPenalty | -10 | Penalty for gap opening. |
gapExtensionPenalty | -1 | Penalty for gap extension. |
These parameters can be overridden in the following way:
mixcr align -OdParameters.scoring.gapExtensionPenalty=-5 input_file1 [input_file2] output_file.vdjca