The default V, D, J and C gene lists and sequences used by MiXCR are taken from GenBank. The accession numbers of the records used for each locus are listed in the following table:
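Species      | Locus   | GenBank accession
-------------+---------+------------------
Homo sapiens | TRA/TRD | NG_001332.2
Homo sapiens | TRB     | NG_001333.2
Homo sapiens | TRG     | NG_001336.2
Homo sapiens | IGH     | NG_001019.5
Homo sapiens | IGK     | NG_000834.1
Homo sapiens | IGL     | NG_000002.1
Mus musculus | TRA/TRD | NG_007044.1
Mus musculus | TRB     | NG_006980.1
Mus musculus | TRG     | NG_007033.1
Mus musculus | IGH     | NG_005838.1
Mus musculus | IGK     | NG_005612.1
Mus musculus | IGL     | NG_004051.1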
In the output of exportClones and exportAlignments, MiXCR encodes each alignment as a list of 7 fields separated by the | symbol:
targetFrom | targetTo | targetLength | queryFrom | queryTo | mutations | alignmentScore
where:
- targetFrom: position of the first aligned nucleotide in the target sequence (the sequence of the gene feature from the reference V, D, J or C gene used in the alignment, e.g. VRegion in TRBV12-2); this boundary is inclusive
- targetTo: position right after the last aligned nucleotide in the target sequence; this boundary is exclusive
- targetLength: length of the target sequence (e.g. the length of VRegion in TRBV12-2)
- queryFrom: position of the first aligned nucleotide in the query sequence (the sequence of the sequencing read or the clonal sequence); this boundary is inclusive
- queryTo: position right after the last aligned nucleotide in the query sequence; this boundary is exclusive
- mutations: list of mutations that transform the target sequence into the query sequence (see below)
- alignmentScore: score of the alignment

All positions are zero-based (i.e. the first nucleotide has index 0).
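As an illustration of how a downstream script might read this notation, the Python sketch below splits one alignment string into its seven fields. The Alignment class, its field names and parse_alignment are hypothetical helpers written for this example, not part of MiXCR; the sketch assumes the string holds exactly one alignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One alignment in the 7-field notation (hypothetical helper type)."""
    target_from: int    # zero-based, inclusive
    target_to: int      # zero-based, exclusive
    target_length: int  # length of the whole target sequence
    query_from: int     # zero-based, inclusive
    query_to: int       # zero-based, exclusive
    mutations: str      # encoded mutation list; empty string if there are none
    score: float        # alignment score


def parse_alignment(record: str) -> Alignment:
    """Split a 'targetFrom|targetTo|...|alignmentScore' string into its fields."""
    fields = record.split("|")
    if len(fields) != 7:
        raise ValueError(f"expected 7 '|'-separated fields, got {len(fields)}: {record!r}")
    t_from, t_to, t_len, q_from, q_to, mutations, score = fields
    return Alignment(int(t_from), int(t_to), int(t_len),
                     int(q_from), int(q_to), mutations, float(score))


aln = parse_alignment("2|17|19|3|18||75.0")
# Half-open [from, to) ranges give the aligned length as a plain difference.
print(aln.target_to - aln.target_from)  # 15
```

Because the end boundaries are exclusive, the ranges can be passed straight to Python slicing (target[aln.target_from:aln.target_to]) without any +1 adjustment.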
Mutations are encoded as a list of single-nucleotide edits (insertions, deletions and substitutions, as in the definition of the Levenshtein distance). Applying these mutations to the aligned subsequence of the target sequence yields the aligned subsequence of the query sequence.
Each single mutation (single-nucleotide edit) is encoded as follows, without any spaces (fields in square brackets are present only for some mutation types, see below):
type[fromNucleotide]position[toNucleotide]
- type: type of the mutation, one letter: S for substitution, D for deletion, I for insertion
- fromNucleotide: the nucleotide in the target sequence affected by the mutation (present only for substitutions and deletions; absent for insertions)
- position: zero-based absolute position in the target sequence affected by the mutation; for insertions, the position in the target sequence right after the inserted nucleotide
- toNucleotide: the nucleotide after the mutation (present only for substitutions and insertions; absent for deletions)
Note that for deletions and substitutions
targetSequence[position] == fromNucleotide
i.e. the target sequence always has fromNucleotide at index position; for insertions the fromNucleotide field is absent.
Here are several examples of single mutations:
- SA4T: substitution of A at position 4 with T
- DC12: deletion of C at position 12
- I15G: insertion of G before position 15
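A decoder for this encoding can be sketched in Python with one regular expression that has a separate alternative per mutation type, so a substitution without toNucleotide or an insertion with a fromNucleotide is rejected. The Mutation tuple and parse_mutations are hypothetical names for this sketch, and limiting nucleotides to A, C, G, T and N is an assumption, not something stated above.

```python
import re
from typing import List, NamedTuple


class Mutation(NamedTuple):
    kind: str             # "S", "D" or "I"
    from_nucleotide: str  # empty for insertions
    position: int         # zero-based position in the target sequence
    to_nucleotide: str    # empty for deletions


# One alternative per mutation type (assumed alphabet: A, C, G, T, N).
_MUTATION = re.compile(r"S([ACGTN])(\d+)([ACGTN])|D([ACGTN])(\d+)|I(\d+)([ACGTN])")


def parse_mutations(encoded: str) -> List[Mutation]:
    """Decode a concatenated mutation list such as 'SA4T' or 'DG7SC9TI13C'."""
    result = []
    offset = 0
    while offset < len(encoded):
        m = _MUTATION.match(encoded, offset)
        if m is None:
            raise ValueError(f"cannot parse mutation at offset {offset}: {encoded[offset:]!r}")
        if m.group(1) is not None:    # substitution: S<from><pos><to>
            result.append(Mutation("S", m.group(1), int(m.group(2)), m.group(3)))
        elif m.group(4) is not None:  # deletion: D<from><pos>
            result.append(Mutation("D", m.group(4), int(m.group(5)), ""))
        else:                         # insertion: I<pos><to>
            result.append(Mutation("I", "", int(m.group(6)), m.group(7)))
        offset = m.end()
    return result


for code in ("SA4T", "DC12", "I15G"):
    print(code, parse_mutations(code))
```

An empty mutations field decodes to an empty list, which matches an alignment without mutations.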
Consider the following alignments, shown both in BLAST-like form and in MiXCR notation:
Alignment without mutations
target = TTGTGCTGACAGATACCCC
query  = CGAGTGCTGACAGATACCGTCGATGCT

BLAST-like alignment:
2 GTGCTGACAGATACC 16
  |||||||||||||||
3 GTGCTGACAGATACC 17

MiXCR alignment:
2|17|19|3|18||75.0
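The subsequence of the target from position 2 (inclusive) to position 17 (exclusive) is identical to the subsequence of the query from position 3 (inclusive) to position 18 (exclusive), so the mutations field is empty. Note that the BLAST-like view prints inclusive end positions (16 and 17), while MiXCR stores exclusive ones (17 and 18).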
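This example can be checked with ordinary string slicing, since Python slices use the same zero-based, end-exclusive convention as the MiXCR fields. The variable names below are only for illustration.

```python
target = "TTGTGCTGACAGATACCCC"
query = "CGAGTGCTGACAGATACCGTCGATGCT"
record = "2|17|19|3|18||75.0"

fields = record.split("|")
t_from, t_to, t_len, q_from, q_to = (int(x) for x in fields[:5])
mutations = fields[5]

assert t_len == len(target)  # targetLength is the length of the whole target
assert mutations == ""       # no edits: the aligned pieces must be identical

# Exclusive ends map directly onto Python slices.
aligned_target = target[t_from:t_to]  # positions 2..16
aligned_query = query[q_from:q_to]    # positions 3..17
print(aligned_target, aligned_query, aligned_target == aligned_query)
# GTGCTGACAGATACC GTGCTGACAGATACC True
```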
Alignment with mutations
target = TTGTGCTGACAGATACCCC
query  = CGAGTGCTATAGACTACCGTCGATGCT

BLAST-like alignment:
2 GTGCTGACAGA-TACC 16
  ||||| | ||| ||||
3 GTGCT-ATAGACTACC 17

MiXCR alignment:
2|17|19|3|18|DG7SC9TI13C|41.0
So, to obtain the subsequence of the query sequence from 3 to 18, we need to apply the following mutations to the subsequence of the target sequence from 2 to 17 (end positions exclusive):
- deletion of G at position 7
- substitution of C at position 9 with T
- insertion of C before position 13
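To check this example mechanically, the Python sketch below replays a mutation list on the target and compares the result with the query. parse_mutations and apply_mutations are hypothetical helpers written for this illustration, not MiXCR code. The sketch assumes that all mutation positions lie within the aligned target range and that there is at most one substitution or deletion per position; insertions at a position are emitted before the target nucleotide at that position, as the definition of position for insertions implies.

```python
import re
from typing import Dict, List, Tuple

# (kind, fromNucleotide, position, toNucleotide); absent fields are "".
Mutation = Tuple[str, str, int, str]

# Assumed nucleotide alphabet: A, C, G, T, N.
_MUTATION = re.compile(r"S([ACGTN])(\d+)([ACGTN])|D([ACGTN])(\d+)|I(\d+)([ACGTN])")


def parse_mutations(encoded: str) -> List[Mutation]:
    result, offset = [], 0
    while offset < len(encoded):
        m = _MUTATION.match(encoded, offset)
        if m is None:
            raise ValueError(f"cannot parse {encoded[offset:]!r}")
        if m.group(1):
            result.append(("S", m.group(1), int(m.group(2)), m.group(3)))
        elif m.group(4):
            result.append(("D", m.group(4), int(m.group(5)), ""))
        else:
            result.append(("I", "", int(m.group(6)), m.group(7)))
        offset = m.end()
    return result


def apply_mutations(target: str, t_from: int, t_to: int, mutations: List[Mutation]) -> str:
    """Rebuild the aligned query subsequence from target[t_from:t_to]."""
    by_position: Dict[int, List[Mutation]] = {}
    for mut in mutations:
        kind, frm, pos, _ = mut
        # For S and D the target must carry fromNucleotide at that position.
        if kind != "I" and target[pos] != frm:
            raise ValueError(f"target[{pos}] is {target[pos]!r}, expected {frm!r}")
        by_position.setdefault(pos, []).append(mut)

    out = []
    # Visit t_to as well, so an insertion after the last aligned nucleotide is kept.
    for pos in range(t_from, t_to + 1):
        edits = by_position.get(pos, [])
        # Insertions at pos come before target[pos], in the order listed.
        out.extend(to for kind, _, _, to in edits if kind == "I")
        if pos == t_to:
            break
        other = [mut for mut in edits if mut[0] != "I"]
        if not other:
            out.append(target[pos])   # unchanged nucleotide
        elif other[0][0] == "S":
            out.append(other[0][3])   # substituted nucleotide
        # Deletion: target[pos] is skipped.
    return "".join(out)


target = "TTGTGCTGACAGATACCCC"
query = "CGAGTGCTATAGACTACCGTCGATGCT"
t_from, t_to, t_len, q_from, q_to, encoded, score = "2|17|19|3|18|DG7SC9TI13C|41.0".split("|")
rebuilt = apply_mutations(target, int(t_from), int(t_to), parse_mutations(encoded))
print(rebuilt, rebuilt == query[int(q_from):int(q_to)])
# GTGCTATAGACTACC True
```

With an empty mutation list the function simply returns target[t_from:t_to], which covers the alignment without mutations as well.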