Derep Seqs

Dereplicate looooooong sequences!

If you want to get rid of duplicate long sequences (i.e., contigs that are exact substrings of other contigs), derep_seqs is the tool for you!
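To make "exact substring" concrete, here is a minimal Python sketch of the idea. It is not how derep_seqs is implemented: it uses a simple quadratic scan, the function name `dereplicate` is made up for illustration, and it assumes only the forward strand is compared.

```python
def dereplicate(seqs):
    """Return (name, sequence) pairs that are not exact substrings of a kept sequence.

    `seqs` maps sequence names to sequence strings.
    """
    kept = []
    # Visit the longest sequences first: a sequence can only be contained in
    # one that is at least as long, and if its container was itself dropped,
    # the container's own container has already been kept.
    for name, seq in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        if not any(seq in other for _, other in kept):
            kept.append((name, seq))
    return kept


if __name__ == "__main__":
    contigs = {"a": "ACGTACGT", "b": "GTAC", "c": "TTTT", "d": "ACGTACGT"}
    for name, seq in dereplicate(contigs):
        print(f">{name}\n{seq}")
```

Here `b` is dropped because it occurs inside `a`, and `d` is dropped because it is identical to `a`.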

Install

Download the source code (either with git clone or by downloading a release), cd into the source directory, and then use make to build it.

git clone https://github.com/mooreryan/derep_seqs.git
cd derep_seqs
make

This builds derep_seqs and places it in the bin directory inside the source directory. You can then move derep_seqs and sort_fasta to a directory on your PATH if you'd like.

Usage

derep_seqs <num worker threads> <seqs.fasta> > seqs.derep.fa

Example

The fasta file must be sorted by increasing sequence length. The program sort_fasta (included in the bin directory) will do this for you.

$ bin/derep_seqs 10 <(bin/sort_fasta contigs.fasta) > contigs.derep.fa
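If you are not sure whether a file already meets the sorting requirement, a small check like the following Python sketch can help. It is independent of sort_fasta and is not part of this repository; the helper names `read_fasta` and `is_sorted_by_length` are hypothetical, and the parser assumes plain FASTA with `>` header lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_sorted_by_length(records):
    """Return True if sequence lengths never decrease from one record to the next."""
    lengths = [len(seq) for _, seq in records]
    return all(a <= b for a, b in zip(lengths, lengths[1:]))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        print(is_sorted_by_length(list(read_fasta(handle))))
```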

That's it!

Error codes

0: Success
1: Argument error
2: Couldn't open a file
3: Error creating thread
4: Error joining thread
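When calling derep_seqs from a script, these codes can be turned into readable errors. The following Python wrapper is only a sketch: the names `describe_exit` and `run_derep` are made up for illustration, the default executable path `bin/derep_seqs` assumes you run it from the source directory, and the input file is assumed to be sorted already.

```python
import subprocess

# Exit codes as listed in this README.
EXIT_MESSAGES = {
    0: "Success",
    1: "Argument error",
    2: "Couldn't open a file",
    3: "Error creating thread",
    4: "Error joining thread",
}


def describe_exit(code):
    """Map a derep_seqs exit code to its documented meaning."""
    return EXIT_MESSAGES.get(code, f"Unknown exit code {code}")


def run_derep(threads, sorted_fasta, out_path, exe="bin/derep_seqs"):
    """Run derep_seqs, writing its stdout to out_path; raise on a nonzero exit."""
    with open(out_path, "w") as out:
        proc = subprocess.run([exe, str(threads), sorted_fasta], stdout=out)
    if proc.returncode != 0:
        raise RuntimeError(describe_exit(proc.returncode))
    return out_path
```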

Versions

v0.1.0: First release
v0.2.0: Sort on decreasing seq length. Use greedy algorithm. Prefilter. Use hash3 instead of SSEF.
v0.3.0: Use hashing for prefiltering.
v0.4.0: Don't store hash vals...uses way less memory :) but it's slow again :(
v0.5.0: Use pthreads for multithreading!
v0.6.0: Make prefilter length a tunable option
v0.7.0: Use Rabin-Karp search for filtering
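For readers unfamiliar with the Rabin-Karp search mentioned for v0.7.0, here is a generic textbook sketch in Python. It is not taken from the derep_seqs source; the function name `rabin_karp_find` and the `base` and `mod` values are arbitrary choices made for illustration.

```python
def rabin_karp_find(pattern, text, base=256, mod=1_000_000_007):
    """Return the first index of pattern in text, or -1, using a rolling hash."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    for start in range(n - m + 1):
        # Hashes can collide, so confirm every hash match with a real comparison.
        if p_hash == t_hash and text[start:start + m] == pattern:
            return start
        if start < n - m:
            # Slide the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return -1
```

The rolling hash lets each window be checked in constant time on average, which is what makes this kind of search attractive when testing many short sequences against long ones.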
