A fast multi-threaded k-mer counter

Jellyfish

Overview

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. Jellyfish can count k-mers an order of magnitude faster, and with an order of magnitude less memory, than other k-mer counting packages. It does so by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
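To make the definition concrete, here is a minimal sketch of k-mer counting in plain Python. It only illustrates what is being counted; it is a naive, single-threaded version and not how Jellyfish itself is implemented. The function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every length-k substring of `sequence` (naive, single-threaded)."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 k-mers (none if n < k).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

For the sequence ACGTACGT and k = 3, the six overlapping windows are ACG, CGT, GTA, TAC, ACG and CGT, so ACG and CGT are each counted twice.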

Jellyfish is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command, or queried for specific k-mers with "jellyfish query". See the UserGuide provided on Jellyfish's home page for more details.
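Once the counts have been translated to text, they are easy to load into a script. The sketch below assumes the text is in a two-column layout, one k-mer and its count per line separated by whitespace (which, to my knowledge, is what the column option of "jellyfish dump" produces; check the UserGuide for the exact format of your version). The function name `parse_column_dump` and the sample lines are made up for illustration.

```python
def parse_column_dump(lines):
    """Parse 'KMER COUNT' lines into a dict mapping k-mer to integer count."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        kmer, count = fields
        counts[kmer] = int(count)
    return counts


# Hypothetical sample data standing in for a dumped text file.
sample = ["AAC 3", "ACG 1", ""]
print(parse_column_dump(sample))
```

In practice the `lines` argument could be an open file object, since iterating over a file yields its lines.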

If you use Jellyfish in your research, please cite:

Guillaume Marcais and Carl Kingsford, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 (first published online January 7, 2011) doi:10.1093/bioinformatics/btr011

Installation

The easiest way to compile Jellyfish is from a packaged tarball of the source code, which can be downloaded from the GitHub releases page. You need make and g++ version 4.4 or higher. To install in your home directory, run:

./configure --prefix=$HOME
make -j 4
make install

To compile from the git tree, you will also need autoconf, automake, libtool, gettext, pkg-config and yaggo. Then compile and install (in /usr/local in this example) with:

autoreconf -i
./configure
make -j 4
sudo make install

If the software is installed in system directories (hint: you needed to use sudo to install), as in the example above, then the system library cache must be updated as follows:

sudo ldconfig

Usage

Instructions for use are available in the doc directory.

Extra / Examples

The examples directory contains potentially useful extra programs to query or manipulate the output files of Jellyfish, using the shared library of Jellyfish in C++ or with scripting languages. The examples are not compiled by default. Each subdirectory of examples is independent and is compiled with a simple invocation of 'make'.

Bindings to scripting languages

Bindings to Ruby, Python and Perl are provided. These bindings allow the output file of Jellyfish to be read directly from a scripting language. Compilation of the bindings is easier from the release tarball. The development files of the target scripting language are required.

Compilation of the bindings from the git tree requires SWIG version 3 and adding the switch --enable-swig to the configure command lines shown below.

To compile all three bindings, configure and compile with:

./configure --enable-ruby-binding --enable-python-binding --enable-perl-binding
make -j 4
sudo make install

By default, Jellyfish is installed in /usr/local and the bindings are installed in the proper system location. When the --prefix switch is passed, the bindings are installed in the given directory. For example:

./configure --prefix=$HOME --enable-python-binding
make -j 4
make install

This will install the Python binding in $HOME/lib/python2.7/site-packages (adjust based on your Python version).

Then, for Python, Ruby or Perl to find the binding, an environment variable may need to be adjusted (PYTHONPATH, RUBYLIB and PERL5LIB respectively). For example:

export PYTHONPATH=$HOME/lib/python2.7/site-packages
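Because the directory name embeds the interpreter version, it can help to compute it rather than type it. The sketch below builds the path following the lib/pythonX.Y/site-packages pattern used in the example above. The helper name `site_packages_dir` is hypothetical, and the layout is an assumption: some systems use a different directory (for instance lib64 or dist-packages), so check where 'make install' actually placed the files.

```python
import os
import sys


def site_packages_dir(prefix, version_info=sys.version_info):
    """Return <prefix>/lib/pythonX.Y/site-packages for the given version."""
    major, minor = version_info[0], version_info[1]
    return os.path.join(prefix, "lib", "python%d.%d" % (major, minor), "site-packages")


# Path for the interpreter running this script, under the home directory.
print(site_packages_dir(os.path.expanduser("~")))
```

The printed path is the value to assign to PYTHONPATH when the --prefix=$HOME installation shown above was built against the same Python interpreter.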

See the swig directory for examples on how to use the bindings.