New export fields. V5BeginTrimmed naming fixed. Docs.
dbolotin committed Sep 25, 2015
1 parent 35d9562 commit 622b80f
Showing 11 changed files with 294 additions and 121 deletions.
5 changes: 3 additions & 2 deletions CHANGELOG_CURRENT
@@ -1,5 +1,6 @@
New feature: optional short column names in `export...` action to simplify further data analysis using data table processing libraries like Pandas or R/DataFrames. (`-s` / `--no-spaces` in `exportAlignments` and `exportClones`)
Added `UTR5BeginTrimmed` reference point
New export fields: `-defaultAnchorPoints` outputs positions of default anchor points in aligned reads or clonal sequence (this column is added to default output format), `-positionOf` outputs position of specified anchor point, `-lengthOf` outputs length of specified gene feature
Added `V5UTRBeginTrimmed` anchor point, `V5UTR` gene feature renamed to `V5UTRGermline`, trimmed `V5UTR` gene feature added
minor: some column names in output tab-delimited files slightly changed
minor: NPE in exportAlignmentsPretty fixed
minor: New reference points added to exportAlignmentsPretty output
minor: New anchor points added to exportAlignmentsPretty output
77 changes: 77 additions & 0 deletions doc/export.rst
@@ -146,6 +146,16 @@ The following fields can be exported both for alignments and clones:
+-----------------------------------+----------------------------------------------------------+
| ``-minFeatureQuality [feature]``  | Minimal quality of sequence of specified gene feature.   |
+-----------------------------------+----------------------------------------------------------+
| ``-defaultAnchorPoints``          | Outputs a list of default anchor points (see table       |
|                                   | below for the list of anchor points and format).         |
+-----------------------------------+----------------------------------------------------------+
| ``-lengthOf [feature]``           | Outputs length of specified gene feature.                |
+-----------------------------------+----------------------------------------------------------+
| ``-positionOf [anchorPoint]``     | Outputs position of specified anchor point in the        |
|                                   | clonal sequence or aligned read.                         |
+-----------------------------------+----------------------------------------------------------+



The following fields are specific for alignments:

@@ -183,6 +193,73 @@ The following fields are specific for clones:
| ``-targets`` | Number of targets, i.e. number of gene regions used to assemble clones. |
+---------------+----------------------------------------------------------------------------------------+

Default anchor point positions
------------------------------

Positions of anchor points produced by the ``-defaultAnchorPoints`` option are output as a colon-separated list.
If an anchor point is not covered by the target sequence, nothing is printed for it, but the flanking colons are
preserved so that the remaining positions keep fixed indices in the array. For example:

::

:::::::::108:117:125:152:186:213:243:244:

If there are several target sequences (e.g. paired-end reads or a multi-part clonal sequence), one array is output for
each target sequence, and the arrays are separated by commas:

::

2:61:107:107:118:::::::::::::,:::::::::103:112:120:147:181:208:238:239:

An array is output for every target sequence, even if one of the parts covers no anchor points:

::

:::::::::::::::::,:::::::::108:117:125:152:186:213:243:244:


The following table shows the correspondence between anchor points and positions in the default anchor point array:

+--------------------------+---------------------+--------------------+
| Anchor point             | Zero-based position | One-based position |
+==========================+=====================+====================+
| V5UTRBeginTrimmed        | 0                   | 1                  |
+--------------------------+---------------------+--------------------+
| V5UTREnd / L1Begin       | 1                   | 2                  |
+--------------------------+---------------------+--------------------+
| L1End / VIntronBegin     | 2                   | 3                  |
+--------------------------+---------------------+--------------------+
| VIntronEnd / L2Begin     | 3                   | 4                  |
+--------------------------+---------------------+--------------------+
| L2End / FR1Begin         | 4                   | 5                  |
+--------------------------+---------------------+--------------------+
| FR1End / CDR1Begin       | 5                   | 6                  |
+--------------------------+---------------------+--------------------+
| CDR1End / FR2Begin       | 6                   | 7                  |
+--------------------------+---------------------+--------------------+
| FR2End / CDR2Begin       | 7                   | 8                  |
+--------------------------+---------------------+--------------------+
| CDR2End / FR3Begin       | 8                   | 9                  |
+--------------------------+---------------------+--------------------+
| FR3End / CDR3Begin       | 9                   | 10                 |
+--------------------------+---------------------+--------------------+
| VEndTrimmed              | 10                  | 11                 |
+--------------------------+---------------------+--------------------+
| DBeginTrimmed            | 11                  | 12                 |
+--------------------------+---------------------+--------------------+
| DEndTrimmed              | 12                  | 13                 |
+--------------------------+---------------------+--------------------+
| JBeginTrimmed            | 13                  | 14                 |
+--------------------------+---------------------+--------------------+
| CDR3End / FR4Begin       | 14                  | 15                 |
+--------------------------+---------------------+--------------------+
| FR4End                   | 15                  | 16                 |
+--------------------------+---------------------+--------------------+
| CBegin                   | 16                  | 17                 |
+--------------------------+---------------------+--------------------+
| CExon1End                | 17                  | 18                 |
+--------------------------+---------------------+--------------------+
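
As a rough illustration of the format described above, the following Python sketch splits a ``-defaultAnchorPoints`` value into one dictionary per target sequence. The function name ``parse_default_anchor_points`` and the ``ANCHOR_POINTS`` list are illustrative and not part of MiXCR; the names are copied from the table above, and positions are assumed to be plain integers.

```python
# Anchor point names in array order (zero-based), copied from the table above.
ANCHOR_POINTS = [
    "V5UTRBeginTrimmed",
    "V5UTREnd / L1Begin",
    "L1End / VIntronBegin",
    "VIntronEnd / L2Begin",
    "L2End / FR1Begin",
    "FR1End / CDR1Begin",
    "CDR1End / FR2Begin",
    "FR2End / CDR2Begin",
    "CDR2End / FR3Begin",
    "FR3End / CDR3Begin",
    "VEndTrimmed",
    "DBeginTrimmed",
    "DEndTrimmed",
    "JBeginTrimmed",
    "CDR3End / FR4Begin",
    "FR4End",
    "CBegin",
    "CExon1End",
]


def parse_default_anchor_points(field):
    """Return one {anchor point name: position} dict per target sequence.

    Hypothetical helper: arrays are separated by commas, cells by colons,
    and an empty cell means the anchor point is not covered by the target.
    """
    targets = []
    for array in field.split(","):
        cells = array.split(":")
        targets.append(
            {name: int(cell) for name, cell in zip(ANCHOR_POINTS, cells) if cell}
        )
    return targets


if __name__ == "__main__":
    # Paired-end example string taken from the text above.
    paired = "2:61:107:107:118:::::::::::::,:::::::::103:112:120:147:181:208:238:239:"
    for index, anchors in enumerate(parse_default_anchor_points(paired)):
        print(index, anchors)
```

Keeping uncovered anchor points out of the dictionary, rather than storing a placeholder, makes a simple membership test enough to check whether a target covers a given point.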

Examples
--------

166 changes: 87 additions & 79 deletions doc/geneFeatures.rst
@@ -89,6 +89,12 @@ If D gene is not found in the sequence or is not present in target locus
(e.g. TRA), ``DBeginTrimmed`` and ``DEndTrimmed`` anchor points as well
as ``VDJunction`` and ``DJJunction`` gene features are not defined.

Similar to the ``...Trimmed`` anchor points in the V(D)J junction, there is a
``V5UTRBeginTrimmed`` anchor point that marks the left bound of the alignment
upstream of the start codon. This point is required because the 5'UTR length
can vary from transcript to transcript, and because the library of
gene segments inside MiXCR doesn't have information on exact 5'UTR lengths.

.. _ref-featureSyntax:

Gene feature syntax
@@ -119,82 +125,84 @@ it is by example:
List of predefined gene features
--------------------------------

+----------------------------+--------------------------------------+
| Gene Feature Name | Gene feature decomposition |
+============================+======================================+
| VGene | {UTR5Begin:VEnd} |
+----------------------------+--------------------------------------+
| VDJTranscript | {UTR5Begin:L1End}+{L2Begin:FR4End} |
+----------------------------+--------------------------------------+
| V5UTR | {UTR5Begin:UTR5End} |
+----------------------------+--------------------------------------+
| VTranscript | {UTR5Begin:L1End}+{L2Begin:VEnd} |
+----------------------------+--------------------------------------+
| Exon1 | {L1Begin:L1End} |
+----------------------------+--------------------------------------+
| L | {L1Begin:L1End}+{L2Begin:L2End} |
+----------------------------+--------------------------------------+
| VTranscriptWithout5UTR | {L1Begin:L1End}+{L2Begin:VEnd} |
+----------------------------+--------------------------------------+
| VLIntronL | {L1Begin:L2End} |
+----------------------------+--------------------------------------+
| VDJTranscriptWithout5UTR | {L1Begin:L1End}+{L2Begin:FR4End} |
+----------------------------+--------------------------------------+
| Intron | {VIntronBegin:VIntronEnd} |
+----------------------------+--------------------------------------+
| VExon2 | {L2Begin:VEnd} |
+----------------------------+--------------------------------------+
| Exon2 | {L2Begin:FR4End} |
+----------------------------+--------------------------------------+
| L2 | {L2Begin:L2End} |
+----------------------------+--------------------------------------+
| VExon2Trimmed | {L2Begin:VEndTrimmed} |
+----------------------------+--------------------------------------+
| FR1 | {FR1Begin:FR1End} |
+----------------------------+--------------------------------------+
| VRegionTrimmed | {FR1Begin:VEndTrimmed} |
+----------------------------+--------------------------------------+
| VRegion | {FR1Begin:VEnd} |
+----------------------------+--------------------------------------+
| VDJRegion | {FR1Begin:FR4End} |
+----------------------------+--------------------------------------+
| CDR1 | {CDR1Begin:CDR1End} |
+----------------------------+--------------------------------------+
| FR2 | {FR2Begin:FR2End} |
+----------------------------+--------------------------------------+
| CDR2 | {CDR2Begin:CDR2End} |
+----------------------------+--------------------------------------+
| FR3 | {FR3Begin:FR3End} |
+----------------------------+--------------------------------------+
| VCDR3Part | {CDR3Begin:VEndTrimmed} |
+----------------------------+--------------------------------------+
| CDR3 | {CDR3Begin:CDR3End} |
+----------------------------+--------------------------------------+
| GermlineVCDR3Part | {CDR3Begin:VEnd} |
+----------------------------+--------------------------------------+
| ShortCDR3 | {CDR3Begin(3):CDR3End(-3)} |
+----------------------------+--------------------------------------+
| VDJunction | {VEndTrimmed:DBeginTrimmed} |
+----------------------------+--------------------------------------+
| VJJunction | {VEndTrimmed:JBeginTrimmed} |
+----------------------------+--------------------------------------+
| DRegion | {DBegin:DEnd} |
+----------------------------+--------------------------------------+
| DCDR3Part | {DBeginTrimmed:DEndTrimmed} |
+----------------------------+--------------------------------------+
| DJJunction | {DEndTrimmed:JBeginTrimmed} |
+----------------------------+--------------------------------------+
| GermlineJCDR3Part | {JBegin:CDR3End} |
+----------------------------+--------------------------------------+
| JRegion | {JBegin:FR4End} |
+----------------------------+--------------------------------------+
| JRegionTrimmed | {JBeginTrimmed:FR4End} |
+----------------------------+--------------------------------------+
| JCDR3Part | {JBeginTrimmed:CDR3End} |
+----------------------------+--------------------------------------+
| FR4 | {FR4Begin:FR4End} |
+----------------------------+--------------------------------------+
| CExon1 | {CBegin:CExon1End} |
+----------------------------+--------------------------------------+
| CRegion | {CBegin:CEnd} |
+----------------------------+--------------------------------------+
+---------------------------+-------------------------------------+
| Gene Feature Name | Gene feature decomposition |
+===========================+=====================================+
| V5UTRGermline | {UTR5Begin:V5UTREnd} |
+---------------------------+-------------------------------------+
| VGene | {UTR5Begin:VEnd} |
+---------------------------+-------------------------------------+
| VTranscript | {UTR5Begin:L1End}+{L2Begin:VEnd} |
+---------------------------+-------------------------------------+
| VDJTranscript | {UTR5Begin:L1End}+{L2Begin:FR4End} |
+---------------------------+-------------------------------------+
| V5UTR | {V5UTRBeginTrimmed:V5UTREnd} |
+---------------------------+-------------------------------------+
| VDJTranscriptWithout5UTR | {L1Begin:L1End}+{L2Begin:FR4End} |
+---------------------------+-------------------------------------+
| VLIntronL | {L1Begin:L2End} |
+---------------------------+-------------------------------------+
| L | {L1Begin:L1End}+{L2Begin:L2End} |
+---------------------------+-------------------------------------+
| VTranscriptWithout5UTR | {L1Begin:L1End}+{L2Begin:VEnd} |
+---------------------------+-------------------------------------+
| Exon1 | {L1Begin:L1End} |
+---------------------------+-------------------------------------+
| Intron | {VIntronBegin:VIntronEnd} |
+---------------------------+-------------------------------------+
| Exon2 | {L2Begin:FR4End} |
+---------------------------+-------------------------------------+
| VExon2Trimmed | {L2Begin:VEndTrimmed} |
+---------------------------+-------------------------------------+
| L2 | {L2Begin:L2End} |
+---------------------------+-------------------------------------+
| VExon2 | {L2Begin:VEnd} |
+---------------------------+-------------------------------------+
| VRegionTrimmed | {FR1Begin:VEndTrimmed} |
+---------------------------+-------------------------------------+
| VDJRegion | {FR1Begin:FR4End} |
+---------------------------+-------------------------------------+
| FR1 | {FR1Begin:FR1End} |
+---------------------------+-------------------------------------+
| VRegion | {FR1Begin:VEnd} |
+---------------------------+-------------------------------------+
| CDR1 | {CDR1Begin:CDR1End} |
+---------------------------+-------------------------------------+
| FR2 | {FR2Begin:FR2End} |
+---------------------------+-------------------------------------+
| CDR2 | {CDR2Begin:CDR2End} |
+---------------------------+-------------------------------------+
| FR3 | {FR3Begin:FR3End} |
+---------------------------+-------------------------------------+
| GermlineVCDR3Part | {CDR3Begin:VEnd} |
+---------------------------+-------------------------------------+
| VCDR3Part | {CDR3Begin:VEndTrimmed} |
+---------------------------+-------------------------------------+
| CDR3 | {CDR3Begin:CDR3End} |
+---------------------------+-------------------------------------+
| ShortCDR3 | {CDR3Begin(3):CDR3End(-3)} |
+---------------------------+-------------------------------------+
| VJJunction | {VEndTrimmed:JBeginTrimmed} |
+---------------------------+-------------------------------------+
| VDJunction | {VEndTrimmed:DBeginTrimmed} |
+---------------------------+-------------------------------------+
| DRegion | {DBegin:DEnd} |
+---------------------------+-------------------------------------+
| DCDR3Part | {DBeginTrimmed:DEndTrimmed} |
+---------------------------+-------------------------------------+
| DJJunction | {DEndTrimmed:JBeginTrimmed} |
+---------------------------+-------------------------------------+
| JRegion | {JBegin:FR4End} |
+---------------------------+-------------------------------------+
| GermlineJCDR3Part | {JBegin:CDR3End} |
+---------------------------+-------------------------------------+
| JCDR3Part | {JBeginTrimmed:CDR3End} |
+---------------------------+-------------------------------------+
| JRegionTrimmed | {JBeginTrimmed:FR4End} |
+---------------------------+-------------------------------------+
| FR4 | {FR4Begin:FR4End} |
+---------------------------+-------------------------------------+
| CRegion | {CBegin:CEnd} |
+---------------------------+-------------------------------------+
| CExon1 | {CBegin:CExon1End} |
+---------------------------+-------------------------------------+
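
The decompositions in the table can be read mechanically. The Python sketch below parses only the forms that appear in the table (ranges in braces joined by ``+``, with optional integer offsets in parentheses) into anchor point and offset pairs. It is an illustration only: ``parse_gene_feature`` is a hypothetical helper, not MiXCR's own parser, and the full gene feature syntax may allow forms it does not handle.

```python
import re

# One "{Begin(offset):End(offset)}" range; both offsets are optional integers.
_RANGE = re.compile(r"\{(\w+)(?:\((-?\d+)\))?:(\w+)(?:\((-?\d+)\))?\}")


def parse_gene_feature(text):
    """Parse a decomposition string into a list of
    ((begin_point, begin_offset), (end_point, end_offset)) tuples.

    Assumption: a missing offset is treated as 0, and ranges are joined by "+".
    """
    ranges = []
    for part in text.split("+"):
        match = _RANGE.fullmatch(part.strip())
        if match is None:
            raise ValueError("unsupported gene feature range: %r" % part)
        begin, begin_offset, end, end_offset = match.groups()
        ranges.append(
            ((begin, int(begin_offset or 0)), (end, int(end_offset or 0)))
        )
    return ranges


if __name__ == "__main__":
    # Two decompositions copied from the table above.
    print(parse_gene_feature("{CDR3Begin(3):CDR3End(-3)}"))
    print(parse_gene_feature("{UTR5Begin:L1End}+{L2Begin:FR4End}"))
```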
@@ -86,8 +86,8 @@ public static MultiAlignmentHelper getTargetAsMultiAlignment(VDJCAlignments vdjc
}

public static final PointToDraw[] points = new PointToDraw[]{
pd(ReferencePoint.UTR5BeginTrimmed, "<5'UTR"),
pd(ReferencePoint.UTR5End, "5'UTR><L1"),
pd(ReferencePoint.V5UTRBeginTrimmed, "<5'UTR"),
pd(ReferencePoint.V5UTREnd, "5'UTR><L1"),
pd(ReferencePoint.L1End, "L1>"),
pd(ReferencePoint.L2Begin, "<L2"),
pd(ReferencePoint.FR1Begin, "L2><FR1"),