Skip to content

Allele-Specific Quantification of Structural Variations in Cancer Genomes

Notifications You must be signed in to change notification settings

leofountain/Weaver

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Weaver

Allele specific base-pair resolution quantification of Strcutrual variations in cancer genome

yangli9@illinois.edu
leofountain@gmail.com

Version 0.20

----------------------------
INSTALL
----------------------------

Bamtools (https://github.com/pezmaster31/bamtools) libraries are needed

	included in Weaver_SV/lib and Weaver_SV/inc

export LD_LIBRARY_PATH=<PREFIX>/Weaver/Weaver_SV/lib/:$LD_LIBRARY_PATH

libz required //-lz flag

Parallel::ForkManager (http://search.cpan.org/~szabgab/Parallel-ForkManager-1.06/lib/Parallel/ForkManager.pm) perl package is needed

Bedtools (https://github.com/arq5x/bedtools)

Samtools (http://samtools.sourceforge.net/)

BOOST C++ library (http://www.boost.org/)

BWA (http://bio-bwa.sourceforge.net/)

Bowtie (http://bowtie-bio.sourceforge.net/index.shtml)




1	Modify the required BOOST directory in src/Makefile

2	./INSTALL.sh


-----------------------------
DATA
-----------------------------

wget http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_data.tar.gz





-----------------------------
EXAMPLE DATA
-----------------------------

wget http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_example.tar.gz



RUN:

Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64
solo_ploidy TARGET 2
Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0

----------------------------
Weaver_SV.pl
----------------------------
SV finding
Input:
	BAM file from BWA

Output:
	VCF file for SV

----------------------------
Weaver_pipeline.pl
----------------------------
Master program:
	1	Generate SV
	2	Generate other inputs needed for Weaver

INPUTS

DATA package:
	1000 Genomes Project Phase 1 haplotypes



----------------------------
Weaver
----------------------------
Core PGM program

INPUTS:
	1	SV

Outputs:
	1	Purity and haploid-level sequencing coverage
	2	Allele specific copy number of genomic regions
	3	Allele specific copy number of structural variations
	4	Relative timing of structural variations
	5	Cancer scaffolds
	5	Phasing of germline SNPs in CNV regions



----------------------------
Weaver_lite
----------------------------
Core PGM program, with SNP phasing disabled to speed up

INPUTS:
	1	SV
	2	reference
	3	Mappability (available for hg19)
	4	Region (available for hg19)
	5	wig (from bam)



----------------------------
Weaver PLOIDY
----------------------------

Weaver PLOIDY -f  -S  -s ../SNP_dens -g GAP_20140416_num -w  -r 1 -m  -p 16



INPUTS:

-f reference file (fasta), should match the reference used in original bam file. Especially for most TCGA datasets, the alignment was performed on //www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta, which does not have "chr" prefix  [MANDATORY]

-S SV file, with format consistent with Weaver_SV. [MANDATORY]
 
-s SNP file, with ref and alt mappings [MANDATORY]

-w wig file from bam, storing the coverage information [MANDATORY]

-r 1, if first time running (generating temp files); 0 if want to use existing temp files. [default 1]

-m mappability file, download from http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_data.tar.gz [MANDATORY]

-p number of cores [default 1]


-----------------------------
FILE FORMAT DECLARITIONS
-----------------------------

Wiggle file:

Wiggle file need to be declared with fixedStep, step 1 and span 1
fixedStep chrom=chr1 start=9994 step=1 span=1
if a chromosome has multiple declaration lines, they need to be sorted based on position:
fixedStep chrom=chr1 start=9994 step=1 span=1
X
X
X
fixedStep chrom=chr1 start=100 step=1 span=1
X
X
X
Is not allowed



Bam file:

Must be sorted and indexed.

SNP file:

NGS SNP link file


1KGP SNP link


SV:


Genome region file:

GAP regions in assembly are annotated.

###################
Output:
###################

REGION_CN_PHASE: storing phased allele specific copy number of genome

CHR	BEGIN	END	ALLELE_1_CN	ALLELE_2_CN




SV_CN_PHASE: Structural variation copy number and phasing, catagory

CHR_1	POS_1	ORI_1	ALLELE_	CHR_2   POS_2   ORI_2   ALLELE_	CN	germline/somatic_post_aneuploidy/somatic_pre_aneuploidy



###############
CONTACT
###############

Yang Li
Ma Lab
Bioengineering Dept., University of Illinois at Urbana-Champaign

yangli9@illinois.edu
https://github.com/leofountain/Weaver

About

Allele-Specific Quantification of Structural Variations in Cancer Genomes

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages