Skip to content

Commit

Permalink
Change the output format to VCF 4.1
Browse files Browse the repository at this point in the history
Add license and readme files
  • Loading branch information
jiantao committed Aug 29, 2012
1 parent 39046df commit 9b8190d
Show file tree
Hide file tree
Showing 6 changed files with 255 additions and 6 deletions.
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License

Copyright (c) 2012 Jiantao Wu, Gabor Marth

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
71 changes: 71 additions & 0 deletions README.md
@@ -0,0 +1,71 @@
=========================================================================================
Tangram 0.1.0 Release Distribution Documentation 2012-08-29
Jiantao Wu (jiantaowu.xining@gmail.com), Marth Lab [1], Boston College Biology Department
=========================================================================================


Introduction
=========================

Tangram is a C/C++ command line toolbox for structural variation(SV) detection based on
MOSAIK [2] alignments. It takes advantage of both read-pair and split-read algorithms
and is extremely fast and memory-efficient. Powered by the Bamtools API [3], Tangram can
call SV events on multiple BAM files (a population) simutaneously to increase the
sensitivity on low-coverage dataset.Currently it reports mobile element insertions (MEI).
More other SV event types will be introduced soon.


Obtaining and Compiling
=========================

> git clone git://github.com/jiantao/Tangram.git
> cd src
> make

Detection pipeline
=========================

Currently, Tangram contains four sub-programs:

1. tangram_scan : Scan through the bam file and calculate the fragment length distribution
for each library in that bam file. It will output the fragment length
distribution files for each input bam file.

2. tangram_merge : If more than one bam files need to be scanned, this program will combine
all the fragment length distribution files together. It will output the
merged fragment length distribution file that enable the detection of
multiple bam files simutaneously. This step is optional if only one bam
file was used.

3. tangram_index : Index the normal and special (MEI sequences) reference file. It will output
the indexed refrence file. This step is required for split read algorithm.

4. tangram_detect : Detect the SV events from the MOSAIK aligned BAM files. It will output the
unfiltered VCF files.


The overall detection pipeline for Tangram looks like the following

tangram_scan (input: BAM files) --> fragment length distribution files \
\
-----> tangram_detect (BAM files) --> VCF files
/
tangram_index (input: reference fasta files) --> indexed reference file /

For the detailed usage of each program, please run "$PROGRAM -help"


Bug Report
=========================

Please report bugs using the built-in bug reporting feature in github or by sending the author
an email.


References
=========================

[1] http://bioinformatics.bc.edu/marthlab/Main_Page
[2] https://github.com/wanpinglee/MOSAIK
[3] https://github.com/pezmaster31/bamtools
6 changes: 6 additions & 0 deletions src/TangramDetect/TGM_Array.h
Expand Up @@ -34,6 +34,7 @@ namespace Tangram
~Array();

inline void Init(unsigned int capacity);
inline void MemSet(int value);
inline void InitToEnd(void);

void ResizeNoCopy(unsigned int newCap);
Expand Down Expand Up @@ -113,6 +114,11 @@ namespace Tangram
memset(data + size, 0, sizeof(T) * (capacity - size));
}

template <class T> inline void Array<T>::MemSet(int value)
{
memset(data, value, sizeof(T) * (capacity));
}

template <class T> void Array<T>::ResizeNoCopy(unsigned int newCap)
{
if (capacity < newCap)
Expand Down
5 changes: 5 additions & 0 deletions src/TangramDetect/TGM_LibTable.h
Expand Up @@ -69,6 +69,11 @@ namespace Tangram

bool GetSpecialRefID(uint32_t& specialRefID, const char* specialRefName) const;

inline unsigned int GetNumSamples(void) const
{
return sampleNames.Size();
}

inline const Array<char*>& GetAnchorNames(void) const
{
return anchorNames;
Expand Down
146 changes: 142 additions & 4 deletions src/TangramDetect/TGM_Printer.cpp
Expand Up @@ -16,6 +16,7 @@
* =====================================================================================
*/

#include <time.h>
#include "TGM_Printer.h"

using namespace Tangram;
Expand Down Expand Up @@ -50,6 +51,11 @@ void Printer::Print(void)
void Printer::PrintSpecial(void)
{
unsigned int numSp = libTable.GetNumSpecialRef();
unsigned int numSamples = libTable.GetNumSamples();

PrintSpecialHeader();
sampleMap.Init(numSamples);
sampleMap.SetSize(numSamples);

for (unsigned int i = 0; i != numSp; ++i)
{
Expand All @@ -69,7 +75,7 @@ void Printer::PrintSpecial(void)

while (GetNextSpecial(pRpSpecials, pSplitSpecials))
{
sampleMap.clear();
sampleMap.MemSet(0);
InitFeatures();

if (printIdx.pRpSpecial != NULL)
Expand All @@ -78,12 +84,15 @@ void Printer::PrintSpecial(void)
if (printIdx.pSplitEvent != NULL)
SetSampleInfoSplit(*(printIdx.pSplitEvent));

SetSampleString();
// SetSampleString();
SetSpecialFeatures(i);
PrintSpecialBody();

/*
printf("chr%s\t%d\t%d\t%c\t%d\t%d\t%d\t%d\t%s\t%d\t%d\t%d\t%d\t%s\n", features.anchorName, features.pos, features.pos + features.len + 1, features.strand,
features.rpFrag[0], features.rpFrag[1], features.splitFrag[0], features.splitFrag[1], features.spRefName, features.pos5[0], features.pos5[1],
features.pos3[0], features.pos3[1], formatted.str().c_str());
*/
}
}

Expand All @@ -101,23 +110,150 @@ void Printer::PrintSpecial(void)

for (unsigned int j = 0; j != len; ++j)
{
sampleMap.clear();
sampleMap.MemSet(0);
InitFeatures();

SetSampleInfoSplit(pSplitSpecials[j]);

SetSampleString();
// SetSampleString();
SetSpecialFeaturesFromSplit(pSplitSpecials[j]);

PrintSpecialBody();

/*
printf("chr%s\t%d\t%d\t%c\t%d\t%d\t%d\t%d\t%s\t%d\t%d\t%d\t%d\t%s\n", features.anchorName, features.pos, features.pos + features.len + 1, features.strand,
features.rpFrag[0], features.rpFrag[1], features.splitFrag[0], features.splitFrag[1], features.spRefName, features.pos5[0], features.pos5[1],
features.pos3[0], features.pos3[1], formatted.str().c_str());
*/
}
}
}
}
}

void Printer::PrintSpecialHeader(void)
{
time_t now = time(0);
struct tm tstruct;
char buf[80];
tstruct = *localtime(&now);
strftime(buf, sizeof(buf), "%Y%m%d", &tstruct);

printf("##fileformat=VCFv4.1\n"
"##fileDate=%s\n"
"##source=Tangram\n"
"##ALT=<ID=INS:ME:AL,Description=\"Insertion of ALU element\">\n"
"##ALT=<ID=INS:ME:L1,Description=\"Insertion of L1 element\">\n"
"##ALT=<ID=INS:ME:SV,Description=\"Insertion of SVA element\">\n"
"##ALT=<ID=INS:ME:HE,Description=\"Insertion of HERV element\">\n"
"##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description=\"Imprecise structural variation\">\n"
"##INFO=<ID=STRAND,Number=1,Type=String,Description=\"Orientation of the inserted mobile elements. '/' means the strand information is not available\">\n"
"##INFO=<ID=CIPOS,Number=2,Type=Integer,Description=\"Confidence interval around POS for imprecise variants. Only presents if the 'IMPRECISE' flag is set\">\n"
"##INFO=<ID=MEILEN,Number=1,Type=Integer,Description=\"Inserted length of MEI. -1 means the inserted length is not available.\">\n"
"##INFO=<ID=FRAG,Number=4,Type=Integer,Description=\"Detailed information of supporting fragments: 5' read-pair fragments, 3' read-pair fragments,"
"5' split fragments and 3' split fragments \">\n"
"##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">\n"
"##FORMAT=<ID=AD,Number=1,Type=Integer,Description=\"Allele Depth, how many reads support this allele\">\n",
buf);

printf("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT");

const Array<char*>& sampleNames = libTable.GetSampleNames();

unsigned int numSamples = sampleNames.Size();
for (unsigned int i = 0; i != numSamples; ++i)
{
printf("\t%s", sampleNames[i]);
}

printf("\n");
}

void Printer::PrintSpecialBody(void)
{
int insertedLen = -1;
if (printIdx.pSplitEvent != NULL)
{
if (printIdx.pSplitEvent->pSpecialData->end >= 0 && printIdx.pSplitEvent->pSpecialData->pos >=0)
insertedLen = printIdx.pSplitEvent->pSpecialData->end - printIdx.pSplitEvent->pSpecialData->pos + 1;
}

int splitFrag = features.splitFrag[0] + features.splitFrag[1];
char refChar = char_table[pRef->refSeq[features.pos - pRef->pos]];

formatted.clear();
formatted.str("");

if (splitFrag == 0)
{
int ciPos1 = 0;
int ciPos2 = 0;
if (features.pos5[1] < features.pos3[0])
{
ciPos1 = features.pos5[0] - features.pos5[1];
ciPos2 = features.pos3[1] - features.pos5[1];
}
else
{
ciPos1 = features.pos5[0] - features.pos3[0];
ciPos2 = features.pos3[1] - features.pos3[0];
}

formatted << "CIPOS=" << ciPos1 << "," << ciPos2 << ";";
}

printf("chr%s\t"
"%d\t"
".\t"
"%c\t"
"<INS:ME:%s>\t"
".\t"
".\t"
"%s%s",
features.anchorName,
features.pos,
refChar,
features.spRefName,
splitFrag > 0 ? "" : "IMPRECISE;",
formatted.str().c_str()
);

formatted.clear();
formatted.str("");

formatted << "FRAG=" << features.rpFrag[0] << "," << features.rpFrag[1] << "," << features.splitFrag[0] << "," << features.splitFrag[1] << ";";

printf("%sSTRAND=%c;MEILEN=%d\t"
"GT:AD",
formatted.str().c_str(),
features.strand,
insertedLen
);

unsigned int numSamples = libTable.GetNumSamples();
char buff[4];
buff[1] = '/';
buff[3] = '\0';

for (unsigned int i = 0; i != numSamples; ++i)
{
if (sampleMap[i] > 0)
{
buff[0] = '1';
buff[2] = '.';
}
else
{
buff[0] = '0';
buff[2] = '0';
}

printf("\t%s:%d", buff, sampleMap[i]);
}

printf("\n");
}

bool Printer::GetNextSpecial(const SpecialEvent* pRpSpecials, const SplitEvent* pSplitSpecials)
{
if (printIdx.rpIdx == printIdx.rpSize)
Expand Down Expand Up @@ -238,6 +374,7 @@ void Printer::SetSampleInfoSplit(const SplitEvent& splitEvent)
}
}

/*
void Printer::SetSampleString(void)
{
const Array<char*>& sampleNames = libTable.GetSampleNames();
Expand All @@ -252,6 +389,7 @@ void Printer::SetSampleString(void)
formatted << name << "_" << itor->second << "_";
}
}
*/

void Printer::SetSpecialFeatures(unsigned int zaSpRefID)
{
Expand Down
12 changes: 10 additions & 2 deletions src/TangramDetect/TGM_Printer.h
Expand Up @@ -80,6 +80,10 @@ namespace Tangram

void PrintSpecial(void);

void PrintSpecialHeader(void);

void PrintSpecialBody(void);

inline void InitFeatures(void)
{
features.anchorName = NULL;
Expand Down Expand Up @@ -115,21 +119,25 @@ namespace Tangram

void SetSampleInfoSplit(const SplitEvent& splitEvent);

void SetSampleString(void);
// void SetSampleString(void);

void SetSpecialFeatures(unsigned int zaSpRefID);

void SetSpecialFeaturesFromSplit(const SplitEvent& splitEvent);

void SetSpecialFeaturesFromRp(const SpecialEvent& rpSpecial, unsigned int zaSpRefID);

bool SpecialPrintFilter(void);

private:

PrintIdx printIdx;

PrintFeatures features;

std::map<unsigned int, unsigned int> sampleMap;
// std::map<unsigned int, unsigned int> sampleMap;

Array<unsigned int> sampleMap;

std::ostringstream formatted;

Expand Down

0 comments on commit 9b8190d

Please sign in to comment.