ngs-bits - Short-read sequencing tools
Binaries of ngs-bits are available via Bioconda. Alternatively, ngs-bits can be built from sources:
Changes already implemented in HEAD for next release:
- Added tools: VcfBreakMulti.
Changes in release 2018_10:
- Expecting VEP instead of SnpEff annotations in VCFs now (VariantQC, SomaticQC, RohHunter).
- Added GRCh38 support (MappingQC, SomaticQC, SampleGender, SampleSimilarity).
- Added tools: SampleAncestry, VcfCheck.
For older releases see the releases page.
Please report any issues or questions to the ngs-bits issue tracker.
Have a look at the ECCB'2018 poster.
The documentation of individual tools is linked in the tools list below.
For some tools, the documentation pages contain only the command-line help; for others, they contain more information.
ngs-bits contains many tools that are used for NGS-based diagnostics at our institute:
- SeqPurge - A highly-sensitive adapter trimmer for paired-end short-read data.
- SampleSimilarity - Calculates pairwise sample similarity metrics from VCF/BAM files.
- SampleGender - Determines sample gender based on a BAM file.
- SampleAncestry - Estimates the ancestry of a sample based on variants.
- CnvHunter - CNV detection from targeted resequencing data using non-matched control samples.
- RohHunter - ROH detection based on a variant list annotated with AF values.
- UpdHunter - UPD detection from trio variant data.
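Here is a minimal Python sketch of one possible pairwise sample-similarity metric. It is not SampleSimilarity's implementation and does not necessarily compute the same metrics. It assumes each sample's variants have already been parsed into a dictionary keyed by `(chrom, pos, ref, alt)`, with the genotype encoded as the number of alternative alleles (1 = heterozygous, 2 = homozygous). The function name `variant_similarity` is hypothetical.

```python
from math import nan, sqrt

# Hypothetical input format: (chrom, pos, ref, alt) -> alt-allele count (1 = het, 2 = hom-alt).
Variant = tuple[str, int, str, str]
Genotypes = dict[Variant, int]


def variant_similarity(a: Genotypes, b: Genotypes) -> tuple[float, float]:
    """Return (overlap in percent, Pearson correlation of genotypes on shared variants).

    Overlap is the number of shared variants divided by the number of distinct
    variants in either sample. The correlation is NaN when it is undefined
    (fewer than two shared variants, or no genotype variation in a sample).
    """
    shared = sorted(a.keys() & b.keys())
    union = a.keys() | b.keys()
    overlap = 100.0 * len(shared) / len(union) if union else 0.0

    if len(shared) < 2:
        return overlap, nan
    x = [a[v] for v in shared]
    y = [b[v] for v in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    if sd_x == 0 or sd_y == 0:
        return overlap, nan
    return overlap, cov / (sd_x * sd_y)
```

Two samples from the same individual would typically show a high overlap and a genotype correlation close to 1. Unrelated samples share fewer variants and show weaker correlation.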
The default output format of the quality control tools is qcML, an XML-based format for -omics quality control. It consists of an XML schema, which defines the overall structure of the format, and an ontology, which defines the QC metrics that can be used.
- ReadQC - Quality control tool for FASTQ files.
- MappingQC - Quality control tool for a BAM file.
- VariantQC - Quality control tool for a VCF file.
- SomaticQC - Quality control tool for tumor-normal pairs (paper and example output data).
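Because qcML is plain XML, a standard XML parser can extract the metrics from a QC report. The Python sketch below assumes that each metric is stored as a `qualityParameter` element with `name` and `value` attributes, possibly inside a default XML namespace. Please verify this against the files your ngs-bits version actually writes. `read_qc_metrics` is a hypothetical helper and is not part of ngs-bits.

```python
import xml.etree.ElementTree as ET


def read_qc_metrics(xml_text: str) -> dict[str, str]:
    """Map each qualityParameter's name to its value (values are kept as strings)."""
    root = ET.fromstring(xml_text)
    metrics: dict[str, str] = {}
    for elem in root.iter():
        # Drop a "{namespace}" prefix so the match works with or without a default namespace.
        local_name = elem.tag.rsplit("}", 1)[-1]
        # Parameters without a value attribute (e.g. table attachments) are skipped.
        if local_name == "qualityParameter" and "value" in elem.attrib:
            metrics[elem.get("name", elem.get("accession", ""))] = elem.get("value")
    return metrics
```

To use it on a report file, pass the file contents (for example `open(path).read()`) and convert the returned strings to numbers as needed.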
- BamClipOverlap - (Soft-)Clips paired-end reads that overlap.
- BamDownsample - Downsamples a BAM file to the given percentage of reads.
- BamFilter - Filters a BAM file by multiple criteria.
- BamHighCoverage - Determines high-coverage regions in a BAM file.
- BamToFastq - Converts a BAM file to FASTQ files (paired-end only).
- BedAdd - Merges regions from several BED files.
- BedAnnotateFromBed - Annotates BED file regions with information from a second BED file.
- BedAnnotateGC - Annotates the regions in a BED file with GC content.
- BedChunk - Splits regions in a BED file into chunks of a desired size.
- BedCoverage - Annotates the regions in a BED file with the average coverage in one or several BAM files.
- BedExtend - Extends the regions in a BED file by n bases.
- BedInfo - Prints summary information about a BED file.
- BedIntersect - Intersects two BED files.
- BedLowCoverage - Calculates regions of low coverage based on an input BED file and BAM file.
- BedMerge - Merges overlapping regions in a BED file.
- BedReadCount - Annotates the regions in a BED file with the read count from a BAM file.
- BedShrink - Shrinks the regions in a BED file by n bases.
- BedSort - Sorts the regions in a BED file.
- BedSubtract - Subtracts one BED file from another BED file.
- BedToFasta - Converts a BED file to a FASTA file (based on the reference genome).
- FastqAddBarcode - Adds sequences from a separate FASTQ file as barcodes to read IDs.
- FastqConvert - Converts the quality scores from Illumina 1.5 offset to Sanger/Illumina 1.8 offset.
- FastqExtract - Extracts reads from a FASTQ file according to an ID list.
- FastqExtractBarcode - Moves molecular barcodes of reads to a separate file.
- FastqExtractUMI - Moves unique molecular identifiers (UMIs) from the read sequence to the read ID.
- FastqFormat - Determines the quality score offset of a FASTQ file.
- FastqList - Lists read IDs and base counts.
- FastqMidParser - Counts the number of occurrences of each MID/index/barcode in a FASTQ file.
- FastqToFasta - Converts FASTQ to FASTA format.
- FastqTrim - Trims start/end bases from the reads in a FASTQ file.
- VcfAnnotateFromBed - Annotates the INFO column of a VCF with data from a BED file.
- VcfBreakMulti - Breaks multi-allelic variants into several lines, making sure that allele-specific INFO/SAMPLE fields are still valid.
- VcfCheck - Checks a VCF file for errors.
- VcfFilter - Filters a VCF based on the given criteria.
- VariantFilterRegions - Filters a variant list based on a target region.
- VcfLeftNormalize - Normalizes all variants and shifts indels to the left in a VCF file.
- VcfSort - Sorts variant lists according to chromosomal position.
- VcfStreamSort - Sorts entries of a VCF file according to genomic position using a stream.
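Left-normalization makes indels in repetitive sequence comparable. The same deletion can be written at several positions, and normalization picks the leftmost, shortest representation. The Python sketch below shows the textbook trim-and-extend procedure for a single biallelic variant. It illustrates the idea and is not VcfLeftNormalize's code. It assumes that the chromosome sequence is available as a plain string `seq`, that positions are 1-based as in VCF, and that alleles are literal bases (no symbolic alleles such as `<DEL>`). `left_normalize` is a hypothetical name.

```python
def left_normalize(seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim one biallelic variant.

    seq: chromosome sequence (index 0 corresponds to position 1)
    pos: 1-based VCF position of the first base of REF
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical")

    # Trim shared trailing bases; whenever an allele becomes empty,
    # prepend the preceding reference base to both alleles and move one position left.
    while True:
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
        elif not ref or not alt:
            if pos == 1:
                raise ValueError("cannot left-extend past the start of the chromosome")
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
        else:
            break

    # Trim shared leading bases, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

For example, in the sequence `GCACACAT`, a deletion of one `CA` copy written as `pos=5, REF=ACA, ALT=A` is shifted to `pos=1, REF=GCA, ALT=G`. This is the leftmost place in the repeat where the deletion can be represented.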
Some of the tools need the NGSD, a MySQL database that contains for example gene, transcript and exon data.
Installation instructions for the NGSD can be found here.
- BedAnnotateGenes - Annotates BED file regions with gene names.
- BedGeneOverlap - Calculates how much of each overlapping gene is covered.
- GenesToApproved - Replaces gene symbols by approved symbols using the HGNC database.
- GenesToBed - Converts a text file with gene names to a BED file.
- NGSDExportGenes - Lists genes from NGSD.
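Tools such as GenesToBed turn gene names into genomic regions using the gene and exon data stored in the NGSD. The Python sketch below shows the core step of such a conversion: it looks up each gene's exons and merges overlapping intervals into BED lines. The in-memory `exons` dictionary is a hypothetical stand-in for the NGSD, and `genes_to_bed` is a hypothetical name. The real tool's options (for example, which transcripts or region types are used) are not modelled here.

```python
# Hypothetical stand-in for the NGSD: gene symbol -> list of (chrom, start, end) exons,
# using 0-based, half-open coordinates as in BED files.
Exons = dict[str, list[tuple[str, int, int]]]


def genes_to_bed(genes: list[str], exons: Exons) -> list[str]:
    """Return tab-separated BED lines (chrom, start, end, gene) with each gene's exons merged."""
    records: list[tuple[str, int, int, str]] = []
    for gene in genes:
        if gene not in exons:
            raise KeyError(f"unknown gene: {gene}")
        merged: list[tuple[str, int, int]] = []
        for chrom, start, end in sorted(exons[gene]):
            # Merge with the previous interval if it overlaps or is directly adjacent.
            if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
                prev = merged[-1]
                merged[-1] = (chrom, prev[1], max(prev[2], end))
            else:
                merged.append((chrom, start, end))
        records.extend((c, s, e, gene) for c, s, e in merged)
    # Sort by chromosome name (as a string), then by start and end position.
    records.sort()
    return [f"{c}\t{s}\t{e}\t{g}" for c, s, e, g in records]
```

Merging per gene keeps the gene name on every output line. Merging across genes, as BedMerge does for a whole file, would have to combine or drop the names.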