Bioinformatics

Table of Contents

Bioinformatician

Social media

Programming skills

Bioinformatics

Bioinformatics miscellany

Talks

Online courses

Book

Comprehensive packages

  • [python] Biopython
  • [golang] Biogo
  • [golang] bio - A simple but high-performance bioinformatics package in Go

General file formats

  • zindex - Create an index on a compressed text file
  • tabix - Generic indexer for TAB-delimited genome position files
  • wormtable - Write-once-read-many table for large datasets.

bam/sam/tabix/bgzf

  • [python] hts-python - pythonic wrapper for libhts
  • [python] htseq - HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments. http://www-huber.embl.de/users/anders/HTSeq/
  • [golang] biogo/hts
  • bamtools - C++ API & command-line toolkit for working with BAM data
  • samblaster - A tool to mark duplicates and extract discordant and split reads from SAM files.
  • [python] pysamstats - A fast Python and command-line utility for extracting simple statistics against genome positions based on sequence alignments from a SAM or BAM file.
  • [python] pysam - A Python module for reading and manipulating SAM files. It's a lightweight wrapper of the samtools C-API. Pysam also includes an interface for tabix. Another SAM parser: simplesam
  • grabix - a wee tool for random access into BGZF files
  • [golang] bix - tabix file access with golang using biogo machinery
  • mergesam - Automate common sam & bam conversions
  • SAMstat - Displaying sequence statistics for next generation sequencing

Fasta/q

  • seqtk - Toolkit for processing sequences in FASTA/Q formats
  • seqkit - A cross-platform and efficient toolkit for FASTA/Q file manipulation http://bioinf.shenwei.me/seqkit
  • [python] pyfaidx - Efficient pythonic random access to fasta subsequences
  • [golang] bio - A lightweight and high-performance bioinformatics package in Go
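For orientation, the core task these FASTA/Q tools perform can be sketched in plain Python. This is only an illustrative sketch: the function name `read_fasta` is hypothetical and is not the API of any tool listed above, which are far faster and handle many more edge cases.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle.

    Hypothetical helper for illustration only.
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Use the first whitespace-delimited token as the record name.
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Made-up example input: two records, the first wrapped over two lines.
example = io.StringIO(">seq1 demo\nACGT\nACGT\n>seq2\nGGCC\n")
records = dict(read_fasta(example))
```

Holding every record in a dictionary is only practical for small files; the indexed tools above avoid loading the whole file.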

FASTA index

GFF/BED/VCF

  • bedtools2 - A powerful toolset for genome arithmetic.
  • BEDOPS - the fast, highly scalable and easily-parallelizable genome analysis toolkit
  • [python] gffutils - GFF and GTF file manipulation and interconversion
  • [python] pybedtools - Python wrapper for Aaron Quinlan's BEDTools
  • [golang] irelate - Streaming relation (overlap, distance, KNN) of (any number of) sorted genomic interval sets.
  • [golang] vcfgo - a golang library to read, write and manipulate files in the variant call format.
  • vcflib - a simple C++ library for parsing and manipulating VCF files, + many command-line utilities
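As a rough picture of the "genome arithmetic" these interval tools perform, the sketch below tests BED-style intervals for overlap. It assumes BED's 0-based, half-open coordinates; the function names and the sample data are hypothetical, and the naive pairwise loop is not how bedtools, BEDOPS or irelate work internally.

```python
def overlaps(a, b):
    """Return True if two (chrom, start, end) intervals overlap.

    Coordinates are assumed to be 0-based and half-open, as in BED,
    so intervals that merely touch (end == start) do not overlap.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(features_a, features_b):
    """Naive O(n*m) scan: keep each A feature that overlaps any B feature."""
    return [a for a in features_a if any(overlaps(a, b) for b in features_b)]


# Made-up example intervals.
genes = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
peaks = [("chr1", 150, 160), ("chr1", 200, 300)]
hits = intersect(genes, peaks)
```

The real tools typically rely on sorted input or indexes so they can stream large files instead of comparing every pair.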

Other formats

  • blast_table2xml - Convert blast m6 format to xml for blast2go
  • seqmagick - file format conversion in Biopython in a convenient way

Database API

  • pyensembl - Python interface to ensembl reference genome metadata (exons, transcripts, etc...)

Data structure

  • kvector - kvector is a small utility for converting motifs to kmer vectors to compare motifs of different lengths
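The general idea of a k-mer vector can be sketched as follows: count every possible k-mer in a sequence, so that sequences of different lengths map to vectors of the same length. This is an assumption-laden illustration, not kvector's actual implementation; `kmer_vector` is a hypothetical name and the DNA alphabet is assumed.

```python
from itertools import product


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Return k-mer counts for seq in a fixed order (hypothetical helper).

    The vector always has len(alphabet) ** k entries, whatever len(seq) is.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # ignore k-mers with characters outside the alphabet
            counts[kmer] += 1
    return [counts[kmer] for kmer in kmers]


vec_short = kmer_vector("ACGT")
vec_long = kmer_vector("ACGTACGT")
```

Both vectors have 16 entries for k=2, so they can be compared directly even though the input lengths differ.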

Models

  • pomegranate - Graphical models for Python, implemented in Cython for speed.

Scripts

  • oneliners - Useful bash one-liners for bioinformatics.
  • cgat - Computational Genomics Analysis Tools
  • bcbb - Incubator for useful bioinformatics code, primarily in Python and R http://bcbio.wordpress.com
  • jcvi - Python utility libraries on genome assembly, annotation and comparative genomics
  • picobio - Miscellaneous Bioinformatics scripts etc mostly in Python
  • pydna - Classes and code for representing double stranded DNA and functions for simulating homologous recombination and Gibson assembly.
  • BioUtils - Python scripts for miscellaneous bioinformatics stuff.
  • sesbio - Bioinformatics scripts for genome analysis
  • ngsutils - Tools for next-generation sequencing analysis http://ngsutils.org
  • ngsTools - Programs to analyse NGS data for population genetics purposes

Visualization

Circos Related

  • Circos: Perl package for circular plots, which are well suited for genomic rearrangements.
  • J-Circos: a Java application for interactive work with Circos plots.
  • ClicO FS: an interactive web-based service of Circos.
  • rCircos: R package for circular plots. [last update: 2013]
  • OmicCircos: R package for circular plots for omics data. [last update: 2015-04]

Others

Kmer

Phylogenetic tree

Taxonomy

Assembly

  • Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
  • nucleotid.es - an assembler catalogue

Alignment

  • hpg-aligner - HPG Aligner is an ultrafast and highly sensitive Next-Generation Sequencing (NGS) mapper which supports both DNA and RNA alignment
  • AliView - Software for aligning, viewing and editing DNA/amino acid sequences; intuitive, fast and lightweight. Download and website: http://www.ormbunkar.se/aliview

Multiple Alignment

Mapping

Bacterial comparative genomics

Metagenomics

16S

Classifier | removing human reads

  • taxonomer.iobio - Taxonomer is a kmer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from both clinical and environmental samples.
  • BMTagger - Best Match Tagger for removing human reads from metagenomics datasets (paper, SOP)
  • Centrifuge - Classifier for metagenomic sequences

Virome

  • viral-ngs - Viral genomics analysis pipelines

Chip-seq

Platform

  • Rabix - Portable Bioinformatics Pipelines
  • bioboxes - Standards for Interchangeable Bioinformatics Containers
  • Anvi’o - An analysis and visualization platform for ‘omics data (introduction)

PCR

  • find_differential_primers - Scripts to aid the design of differential primers for diagnostic PCR.
  • Primer3-py - Primer3-py is a Python-abstracted API for the popular Primer3 library. The intention is to provide a simple and reliable interface for automated oligo analysis and design.

HPC

  • hpcgo - Helping submit jobs to HPC cluster.
  • easy_qsub - Easily submitting PBS jobs with script template. Multiple input files supported.