Prot-SpaM

Note

Currently, this program supports Linux only.

Compilation

Change into the root directory (the one containing the 'Makefile') and run:

make

Run

./protspam [options] -l <filelist>

Filelist

The program takes a plain text file containing the relative paths to each input dataset. To create your 'filelist' simply type:

ls -1 path/to/input/* > filelist

This lists each file in the specified directory, one file per line.
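
For a scripted equivalent of the `ls` command above, the following is a minimal Python sketch. The function name `write_filelist` and its arguments are illustrative assumptions and not part of Prot-SpaM; it only writes one path per line, as described above, and it skips hidden files and subdirectories.

```python
import os

def write_filelist(input_dir, filelist_path="filelist"):
    """Write the path of every regular file in input_dir, one per line.

    Hypothetical helper that roughly mirrors `ls -1 path/to/input/* > filelist`.
    Hidden files and subdirectories are skipped (an assumption of this sketch).
    """
    # Collect the paths first, so the output file is never listed itself.
    paths = sorted(
        os.path.join(input_dir, name)
        for name in os.listdir(input_dir)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(input_dir, name))
    )
    with open(filelist_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

Calling `write_filelist("path/to/input")` from the directory in which you run the program keeps the paths relative, which is what the `-l` option expects according to the description above.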

Options

	-h/-?: print this help and exit
	-w <integer>: pattern weight (default: 6)
	-d <integer>: number of don't-care positions (default: 40)
	-s <integer>: the minimum score of a spaced-word match to be considered homologous (default: 0)
	-m <integer>: number of patterns used (default: 5)
	-t <integer>: number of threads (default: omp_get_max_threads())
	-o <filename>: filename for distance matrix (default: DMat)
	-l <filename>: specify a list of files to read as input
	-z: if option is set, the pattern set used will be stored in patterns.txt
	-p <filename>: filename of pattern set to load and reuse
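
To make the -w and -d options more concrete: in the spaced-word literature a pattern is commonly written as a binary string in which '1' marks a match position and '0' a don't-care position, so the default values would correspond to patterns of length 6 + 40 = 46. The Python sketch below draws one such pattern at random. The function name, the 0/1 notation, and the convention that a pattern starts and ends with a match position are assumptions made for illustration; they are not taken from the Prot-SpaM source, which may build its pattern set differently.

```python
import random

def random_pattern(weight=6, dont_care=40, seed=None):
    """Return a 0/1 string with `weight` ones and `dont_care` zeros.

    Illustrative only: the first and last positions are forced to be
    match positions ('1'), which is an assumption of this sketch.
    """
    if weight < 2:
        raise ValueError("weight must be at least 2 in this sketch")
    rng = random.Random(seed)
    length = weight + dont_care
    # Place the remaining weight - 2 match positions in the interior.
    inner = rng.sample(range(1, length - 1), weight - 2)
    ones = {0, length - 1, *inner}
    return "".join("1" if i in ones else "0" for i in range(length))

pattern = random_pattern(seed=1)
print(len(pattern), pattern.count("1"))  # prints: 46 6
```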

Sequence format:

Sequences must be in FASTA format. All protein sequences of one proteome must be contained in a single FASTA file.

Example:

>Protein1
RAKSDLKEASDKE..
>Protein2
ATSDLAGTASDKE..
>Protein3
ARNCQEFGSDSDW..
..
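
A quick way to check an input file against this format before running the program is sketched below in Python. `read_fasta` is a hypothetical helper, not part of Prot-SpaM, and it makes no claim about how the program itself parses its input.

```python
def read_fasta(path):
    """Return a list of (header, sequence) pairs read from a FASTA file.

    Hypothetical sanity-check helper; multi-line sequences are joined.
    """
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is None:
                raise ValueError("sequence data before the first '>' header")
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For a proteome file, `len(read_fasta("path/to/proteome.fasta"))` gives the number of protein records found; a `ValueError` or an empty list suggests the file is not in the expected format.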

Citation:

Scientific publications using filtered spaced word matches should cite:

Leimeister, C. A., Schellhorn, J., Schoebel, S., Gerth, M., Bleidorn, C., & Morgenstern, B. (2018).
Prot-SpaM: Fast alignment-free phylogeny reconstruction based on whole-proteome sequences.
bioRxiv, 306142.

Paper Abstract:

Word-based or "alignment-free" sequence comparison has become an active area of research in bioinformatics.
Recently, fast word-based algorithms have been proposed that are able to accurately estimate phylogenetic
distances between genomic DNA sequences without the need to calculate full sequence alignments. One of these
approaches is Filtered Spaced Word Matches. Herein, we extend this approach to estimate evolutionary distances
between species based on their complete or incomplete proteomes; our implementation is called Prot-SpaM.
We show that Prot-SpaM can accurately estimate phylogenetic distances, and that our program can be used to
calculate phylogenetic trees from whole proteomes in a matter of seconds.
For various groups of taxa, we show that trees calculated with Prot-SpaM are of high quality.
The source code of our software is available through GitHub: https://github.com/jschellh/ProtSpaM

Datasets:

You can download the datasets referenced in the paper here.

Contact:

jendrik.schellhorn@stud.uni-goettingen.de
