SVelter

Description

This software is designed to identify both simple and complex rearrangements from paired-end sequencing data. Specific information regarding the methodology can be found in the respective publication:

Zhao X, Emery SB, Myers B, Kidd JM, and Mills RE. Resolving complex strucural genomic rearrangements using a randomized approach. Genome Biology 2016 Jun 10;17:126

Required third-party resources

R:        https://www.r-project.org/
python:   https://www.python.org/
samtools: http://samtools.sourceforge.net/

Quick Start

Download and Install

git clone https://github.com/mills-lab/svelter.git
cd svelter
python setup.py install --user
export PATH=$PATH:$HOME/.local/bin

Setup working directory:

svelter.py Setup --reference reference.fa --workdir /working/directory/ --support ../Support/hg19/

Run svelter with its default setting:

svelter.py --sample /absolute/path/of/sample.bam --workdir /working/directory/

Required files:

Exclude.ref.bed, CN2.ref.bed and Segdup.ref.bed are available under Support for some versions of human reference genome. 'Exclude.ref.bed' specifies the genomic regions to be excluded from SV analysis; CN2.ref.bed specifies the copy neutral genomic regions where SVs are rarely reported; Segdup.ref.bed specifieds predefined segmental duplications in reference genome; will be excluded from analysis;

Customized versions could be used, as long as they are in bed format, collected in the same folder, and named in the same format.For more details, please see Support.

Pre-indexed files of certain reference genomes have been produced and kept under folder /Support/Index-ref. For specific reference, if not pre-indexed files provided, the optional parameter '--ref-index' could be omit and the indexed files would be produced through the setup step.

Attention:

reference file should have been indexed by calling samtools first: samtools faidx ref.fasta
in the Setup step, reference file should be specified by absolute path
the pre-indexed files under ./Support/ref-index/ are restored through large file resources on github, which require manual download.
working directory is required to be writable for temporary files
with large sample size (eg. >50X whole genome sequencing), it is recommended that the parameter --null-copyneutral-perc 0.01 added to your command, with small ones (eg. < 10x), --null-copyneutral-perc 0.5 is recommended. This parameter decides the number of CN2 regions extracted for building null model.

Usage

svelter.py [options] [parameters]

Options:

    Setup
    Clean
    NullModel
    BPSearch
    BPIntegrate
    SVPredict
    SVIntegrate

Parameters:

For `Setup`:

Required Parameters:

	--workdir, writable working directory.
	--reference, absolute path of reference genome. eg: .../svelter/reference/genome.fa
	--support, folder containing all supportive file including: Exclude.bed,CN2.bed,Segdup.bed

Optional Parameters:

	--ref-index, folders containin pre-indexed files, if applicable. For certain versions of human genome, the indexed files are availabel from https://github.com/mills-lab/svelter.

For other steps:

Required:

	--workdir, writable working directory.
	--sample, input alignment file in bam format

Optional:

	--null-model, specify which stat model to be fitted on each parameter. if --null-model==C / Complex, negative bimodal distribution will be fitted to insertlenth; else, normal will be used

	--null-copyneutral-length, minimum length requirement for --copyneutral regions used to build null model (default: 2000)

	--null-copyneutral-perc, percentage of regions from --copyneutral to utilize (default: 0.1)

	--null-random-length, specify the length of random regions if --copyneutral parameter not used (default: 5000)

	--null-random-num, specify the number of random regions if --copyneutral parameter not used (default: 10000)

	--num-iteration, maximum number of iterations per structure will run in SV predicting step

	--qc-map-cutoff, the minimum mapping quality required for a breakpoint to be reported (default: 0.0)

	--qc-align, minimum alignment quality required for mapped reads in bam file (default: 20)

	--qc-split, minimum alighment of clipped parts of reads considered as a soft clip (default: 20)

	--qc-structure, minimum quality score of a resolved structure to be considered as PASS and included in the output vcf file

	--split-min-len, the minumum length of clip read considered as split; (default:10% of read length)

	--prefix, output prefix for vcf and svelter files (default: input.vcf, input.svelter)

	--ploidy, limit algorithm to specific zygosity (0:heterozygous only; 1:homozygous only; 2:both; default:2)

For faster processing, svelter can by run with multiple cores:

Step1: Build null models:

svelter.py NullModel --sample /absolute/path/of/sample.bam --workdir /working/directory

Optional Parameters:

--chromosome, name of chromosome to run. should match chromosome name in bam file

--null-model, specify which stat model to be fitted on each parameter. if --null-model==C / Complex, negative bimodal distribution will be fitted to insertlenth; else, normal will be used

--null-copyneutral-length, minimum length requirement for --copyneutral regions used to build null model (default: 2000)

--null-copyneutral-perc, percentage of regions from --copyneutral to utilize (default: 0.1)

--null-random-length, specify the length of random regions if --copyneutral parameter not used (default: 5000)

--null-random-num, specify the number of random regions if --copyneutral parameter not used (default: 10000)

--qc-align, minimum alignment quality required for mapped reads in bam file (default: 20)

--qc-split, minimum alighment of clipped parts of reads considered as a soft clip (default: 20)

--split-min-len, the minumum length of clip read considered as split  (default:10% of read length)

Step2: Search for Breakpoints:

svelter.py BPSearch --sample /absolute/path/of/sample.bam --workdir /working/directory

Optional Parameters:

--chromosome, name of chromosome to run. should match chromosome name in bam file

--null-model, specify which stat model to be fitted on each parameter. if --null-model==C / Complex, negative bimodal distribution will be fitted to insertlenth; else, normal will be used

--qc-align, minimum alignment quality required for mapped reads in bam file (default: 20)

--qc-split, minimum alighment of clipped parts of reads considered as a soft clip (default: 20)

--split-min-len, the minumum length of clip read considered as split; (default:10% of read length)

--qc-map-tool, the tool extracts mappability information from a bigWig file,avaliable from: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigWigSummary

--qc-map-file, .bigWig file used to decide local genomic mappability, avaliable from: ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens/encodeDCC/wgEncodeMapability/

--qc-map-cutoff, the minimum mapping quality required for a breakpoint to be reported (default: 0.0)

Step3: Cluster Breakpoints:

svelter.py BPIntegrate --sample /absolute/path/of/sample.bam --workdir /working/directory

Optional Parameters:

--chromosome, name of chromosome to run. should match chromosome name in bam file

--batch, specify number of structures in each separate file (if 0, output files will be calssified by chromosomes; default, all BP clustered will be integrated in one txt file)

Step4: Resolve complex structural variants:

svelter.py SVPredict --sample sample.bam --workdir /working/directory --bp-file input/file/containing/clustered/bps

Optional Parameters:

--num-iteration, maximum number of iterations per structure will run

--ploidy, limit algorithm to specific zygosity (0:heterozygous only; 1:homozygous only; 2:both; default:2)

--null-model, specify which stat model to be fitted on each parameter. if --null-model==C / Complex, negative bimodal distribution will be fitted to insertlenth; else, normal will be used

--qc-align, minimum alignment quality required for mapped reads in bam file (default: 20)

Step5: Write output in vcf format:

svelter.py SVIntegrate --workdir /working/directory --prefix output  --input-path path/of/output/from/Step4

Optional Parameters:

--qc-structure, minimum quality score of a resolved structure to be considered as PASS and included in the output vcf file

Name		Name	Last commit message	Last commit date
Latest commit History 191 Commits
Callsets		Callsets
Support.Scripts		Support.Scripts
Support.records		Support.records
Support		Support
svelter_sv		svelter_sv
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SVelter

Description

Required third-party resources

Quick Start

Required files:

Attention:

Usage

Options:

Parameters:

For `Setup`:

Required Parameters:

Optional Parameters:

For other steps:

Required:

Optional:

For faster processing, svelter can by run with multiple cores:

Step1: Build null models:

Step2: Search for Breakpoints:

Step3: Cluster Breakpoints:

Step4: Resolve complex structural variants:

Step5: Write output in vcf format:

About

Releases 3

Packages

Contributors 2

Languages

License

mills-lab/svelter

Folders and files

Latest commit

History

Repository files navigation

SVelter

Description

Required third-party resources

Quick Start

Required files:

Attention:

Usage

Options:

Parameters:

For Setup:

Required Parameters:

Optional Parameters:

For other steps:

Required:

Optional:

For faster processing, svelter can by run with multiple cores:

Step1: Build null models:

Step2: Search for Breakpoints:

Step3: Cluster Breakpoints:

Step4: Resolve complex structural variants:

Step5: Write output in vcf format:

About

Resources

License

Stars

Watchers

Forks

Releases 3

Packages 0

Contributors 2

Languages

For `Setup`:

Packages