Copyright (C) 2015 Jonas Maaskola
Provided under GNU General Public License Version 3 or later.
See the file COPYING provided with this software for details of the license.
This program samples sequences without replacement from a single FASTQ file or from a pair of FASTQ files. Its purpose is saturation analysis, i.e. determining whether a sequencing run has reached saturation.
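To illustrate what saturation analysis looks for, here is a minimal Python sketch. It is not part of this software, and the function name `saturation_curve` and the toy data are assumptions made for illustration. It draws samples of increasing size without replacement and counts the distinct sequences in each sample. When the count stops growing as the sample size increases, the library is likely close to saturation.

```python
import random


def saturation_curve(sequences, sample_sizes, seed=0):
    """Return (k, distinct) pairs, where distinct is the number of
    different sequences seen in a sample of size k."""
    rng = random.Random(seed)
    curve = []
    for k in sample_sizes:
        # random.sample draws k items without replacement.
        sample = rng.sample(sequences, k)
        curve.append((k, len(set(sample))))
    return curve


if __name__ == "__main__":
    # Toy library (hypothetical data): 1000 reads over at most 50 distinct sequences.
    toy_rng = random.Random(1)
    reads = ["SEQ%d" % toy_rng.randrange(50) for _ in range(1000)]
    for k, distinct in saturation_curve(reads, [10, 100, 500, 1000]):
        print(k, distinct)
```

In practice the samples would come from samplefq runs at several values of `-k`, and the quantity counted would depend on the analysis (for example distinct sequences or detected features).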
How to use
Sequences sampled from the first (or only) FASTQ file are written to the standard output stream, while sequences sampled from the second FASTQ file are written to the standard error stream.
Besides FASTQ files, FASTA files can also be sampled. Input files may be compressed with gzip or bzip2 and are decompressed on the fly. When a pair of files is processed, two threads are used for output.
Currently, multi-line format variants of FASTA and FASTQ are not supported.
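The behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used by this software. The function names are hypothetical, compression is guessed from the file extension (the real program may detect it differently), and all records are read into memory. Both files are sampled at the same record positions so that mates stay paired.

```python
import bz2
import gzip
import random


def open_maybe_compressed(path):
    """Open a plain, gzip (.gz) or bzip2 (.bz2) file as text.
    Assumption: the compression type is chosen by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def read_fastq(path):
    """Read single-line-format FASTQ: exactly four lines per record."""
    with open_maybe_compressed(path) as handle:
        lines = [line.rstrip("\n") for line in handle]
    if len(lines) % 4 != 0:
        raise ValueError("line count is not a multiple of 4: " + path)
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def sample_pairs(path1, path2, k, seed=None):
    """Sample k record pairs without replacement from two paired files."""
    records1 = read_fastq(path1)
    records2 = read_fastq(path2)
    if len(records1) != len(records2):
        raise ValueError("paired files differ in number of records")
    rng = random.Random(seed)
    # The same indices are used for both files to keep mates together.
    indices = sorted(rng.sample(range(len(records1)), k))
    return [records1[i] for i in indices], [records2[i] for i in indices]
```

The four-lines-per-record check mirrors the restriction that multi-line format variants are not supported.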
- clone the repository
- change into the cloned directory
- create a directory in which the code will be compiled
- change into it
- invoke CMake, specifying the path where the software should be installed
- compile the code
- install it
    git clone https://github.com/maaskola/samplefq.git
    cd samplefq
    mkdir build
    cd build
    cmake .. -DCMAKE_INSTALL_PREFIX=/where/to/install
    make
    make install
If read1.fastq and read2.fastq are paired FASTQ files, the following will sample 100 sequences without replacement:
samplefq -k 100 -1 read1.fastq -2 read2.fastq > read1_sample100.fastq 2> read2_sample100.fastq
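When calling the program from a script, the same redirection can be done with Python's `subprocess` module. The sketch below is a hedged example: the helper names are hypothetical, it uses only the `-k`, `-1` and `-2` options shown above, and it assumes `samplefq` is on the `PATH`. Standard output receives the sample from the first file and standard error the sample from the second.

```python
import subprocess


def samplefq_command(k, read1, read2):
    """Build the argument list for a paired samplefq run."""
    return ["samplefq", "-k", str(k), "-1", read1, "-2", read2]


def run_paired_sample(k, read1, read2, out1, out2):
    """Run samplefq, writing stdout to out1 and stderr to out2."""
    with open(out1, "wb") as first, open(out2, "wb") as second:
        subprocess.run(
            samplefq_command(k, read1, read2),
            stdout=first,   # sample from the first FASTQ file
            stderr=second,  # sample from the second FASTQ file
            check=True,     # raise if samplefq exits with an error
        )
```

For example, `run_paired_sample(100, "read1.fastq", "read2.fastq", "read1_sample100.fastq", "read2_sample100.fastq")` corresponds to the command line above.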
To sample sequences from FASTA files instead of FASTQ files, use the samplefa binary.