GitHub - sylvainforet/psytrans: Parasite Symbiont Transcriptome Separation

forked from jueshengong/psytrans

NAME
psytrans.py - Parasite & Symbiont Transcriptome Separation

SYNOPSIS
python psytrans.py [QUERIES] [-H FILE] [-S FILE] [OPTIONS]
python psytrans.py [QUERIES] [-b BLASTRESULTSFILE] [OPTIONS]

DESCRIPTION
psytrans.py separates the sequences of a host species from those of its main symbiont(s) or parasite(s) based on Support Vector Machine classification.
The program takes as input a file in fasta format with the sequences to be classified.
The program also requires a file with protein sequences from a species related to the host, and a file with protein sequences from a species related to the symbiont (or parasite).
The queries will be compared to these two files using BLASTX.
Alternatively, the user can provide the output of pre-computed BLASTX searches (in tabular format: -outfmt 6 or 7).
The classification is then carried out using the command line tools from libsvm.
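
As an illustration of the pre-computed BLASTX input mentioned above, the following is a minimal sketch of reading tabular output and keeping the best (smallest) e-value per query. It assumes the default 12-column layout of -outfmt 6 (e-value in the 11th column) and skips the comment lines that -outfmt 7 adds. The function name best_evalues and the sample records are hypothetical, and this is not necessarily how psytrans.py parses its input.

```python
import io

def best_evalues(handle):
    """Return {query_id: smallest e-value} from BLAST tabular output."""
    best = {}
    for line in handle:
        line = line.strip()
        # -outfmt 7 adds comment lines starting with '#'; -outfmt 6 has none.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumed default column order: qseqid sseqid pident length mismatch
        # gapopen qstart qend sstart send evalue bitscore
        query, evalue = fields[0], float(fields[10])
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best

# Made-up records, for illustration only.
example = io.StringIO(
    "# BLASTX comment line, as written by -outfmt 7\n"
    "contig1\tprotA\t85.0\t100\t15\t0\t1\t300\t1\t100\t1e-50\t200\n"
    "contig1\tprotB\t60.0\t90\t36\t0\t1\t270\t5\t94\t1e-10\t80\n"
    "contig2\tprotC\t70.0\t50\t15\t0\t1\t150\t1\t50\t1e-20\t95\n"
)
print(best_evalues(example))
```

Running the same function on the host and on the symbiont results gives two dictionaries that can be compared query by query.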

DEPENDENCIES
psytrans requires makeblastdb and blastx from the NCBI BLAST+ distribution, unless the user provides pre-computed BLAST results.
psytrans also requires a few command line utilities from libsvm: svm-scale, svm-train and svm-predict.
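
Before a long run it can help to confirm that the tools listed above are on the PATH. The following is a small sketch using shutil.which from the Python standard library. The function names are hypothetical, and psytrans.py may perform its own checks differently.

```python
import shutil

BLAST_TOOLS = ["makeblastdb", "blastx"]
SVM_TOOLS = ["svm-scale", "svm-train", "svm-predict"]

def missing_tools(names):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]

def check_dependencies(have_blast_results=False):
    """List missing tools; BLAST+ is skipped when results are pre-computed."""
    required = list(SVM_TOOLS)
    if not have_blast_results:
        required += BLAST_TOOLS
    return missing_tools(required)

if __name__ == "__main__":
    missing = check_dependencies()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```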

OPTIONS
Generic Program Information

-h, --help
Print a usage message briefly summarizing the command-line options.

Global options

-R, --restart
Restart the script from the last checkpoint.

-p, --nbThreads
Number of threads to use for the blast searches and for the SVM training.

-V, --verbosemode
Run the script in verbose mode.

-t, --tempDir
Specify the name of the temporary directory.

-X, --clearTemp
Clear all temporary data in the temporary directory upon completion.

-z, --stopAfter
Choices: ['db', 'runBlast', 'parseBlast', 'kmers', 'SVM']
Stop the process once the specified stage has completed.
db refers to the database creation stage;
runBlast refers to the BLAST search stage;
parseBlast refers to the stage that separates unambiguous from ambiguous sequences;
kmers refers to the stage that prepares the SVM input;
SVM refers to the SVM training and testing stage.
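
The effect of --stopAfter can be pictured with the short sketch below. It assumes the stages run in the order in which the choices are listed, and it does not model whatever steps follow the SVM stage. The function name stages_to_run is hypothetical and is not taken from psytrans.py.

```python
# Stage names as listed for --stopAfter; the execution order is an assumption.
STAGES = ["db", "runBlast", "parseBlast", "kmers", "SVM"]

def stages_to_run(stop_after=None):
    """Return the stages executed up to and including stop_after."""
    if stop_after is None:
        return list(STAGES)
    if stop_after not in STAGES:
        raise ValueError("unknown stage: %r" % stop_after)
    return STAGES[: STAGES.index(stop_after) + 1]

print(stages_to_run("parseBlast"))
```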

Preparation of training set options

-e, --maxBestEvalue
Set the maximum value for the best e-value to be used to classify unambiguous sequences.

-n, --numberOfSeq
Set the maximum number of training & testing sequences.
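
To make the two options above more concrete, here is a hedged sketch of how a training set could be drawn from best e-values. It assumes that a query is unambiguous when its best e-value passes the threshold against exactly one of the two databases, and that the sequence limit applies to each class separately. Both rules, the function name pick_unambiguous and the example threshold are assumptions for illustration; the criteria used by psytrans.py may differ.

```python
def pick_unambiguous(host_best, symb_best, max_best_evalue, max_seqs=None):
    """Split query ids into (host_ids, symb_ids) using best e-values.

    host_best and symb_best map query id -> best e-value against the
    host-related and symbiont-related databases respectively.
    """
    host_ids, symb_ids = [], []
    for query in sorted(set(host_best) | set(symb_best)):
        # A query with no hit is treated as having an infinite e-value.
        host_ok = host_best.get(query, float("inf")) <= max_best_evalue
        symb_ok = symb_best.get(query, float("inf")) <= max_best_evalue
        if host_ok and not symb_ok:
            host_ids.append(query)
        elif symb_ok and not host_ok:
            symb_ids.append(query)
        # Queries passing for both or for neither are left as ambiguous.
    if max_seqs is not None:
        # Assumed: the limit is applied per class.
        host_ids, symb_ids = host_ids[:max_seqs], symb_ids[:max_seqs]
    return host_ids, symb_ids

# Made-up e-values and an arbitrary threshold, for illustration only.
host = {"c1": 1e-50, "c2": 1e-30, "c3": 1e-5}
symb = {"c2": 1e-40, "c4": 1e-25}
print(pick_unambiguous(host, symb, max_best_evalue=1e-20))
```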

Kmer parameters

-c, --minWordSize
Set the minimum DNA word (k-mer) length.

-k, --maxWordSize
Set the maximum DNA word (k-mer) length.
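
The role of the minimum and maximum word sizes can be illustrated with the sketch below, which computes the frequency of every DNA word of each length in that range and writes one line in the sparse "label index:value" format read by the libsvm tools. The function names are hypothetical, and the exact features and normalisation used by psytrans.py are not documented here, so treat this as an assumption-laden example.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq, min_word=1, max_word=2):
    """Return word frequencies for every k from min_word to max_word."""
    seq = seq.upper()
    features = []
    for k in range(min_word, max_word + 1):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        words = ["".join(p) for p in product("ACGT", repeat=k)]
        # Words containing other letters (e.g. N) are ignored.
        total = sum(counts[w] for w in words)
        features.extend(counts[w] / total if total else 0.0 for w in words)
    return features

def to_libsvm_line(label, features):
    """Format one example as 'label 1:v1 2:v2 ...' (indices start at 1)."""
    pairs = " ".join("%d:%.6f" % (i, v) for i, v in enumerate(features, start=1))
    return "%s %s" % (label, pairs)

# Label 1 is an arbitrary choice for this illustration.
print(to_libsvm_line(1, kmer_frequencies("ACGTACGT", 1, 1)))
```

With a range of 1 to 2, each sequence yields 4 + 16 = 20 features; the feature count grows as 4^k with the word length.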

EXAMPLE
You start with an assembly containing a mixture of sequences from a host A and a symbiont (or parasite) B: host_and_symb.fasta
You also provide a file with proteins from a species related to the host: related_host_proteins.fasta
and a file with proteins from a species related to the symbiont: related_symb_proteins.fasta
You can then start the Psytrans process as follows:

python psytrans.py host_and_symb.fasta -H related_host_proteins.fasta -S related_symb_proteins.fasta

Wait a few hours (depending on the number of sequences), and in the current directory you will find a new file starting with the prefix `host_' and another file starting with the prefix `symb_', corresponding to the sequences classified as host or symbiont (or parasite) respectively.

To run the program using 8 threads and using /home/user/tmp as a temporary directory, use the following command:

python psytrans.py host_and_symb.fasta -H related_host_proteins.fasta -S related_symb_proteins.fasta -p 8 -t /home/user/tmp
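
Once a run such as the ones above has finished, a quick way to inspect the result is to count the sequences in the files carrying the host_ and symb_ prefixes. The sketch below assumes that these output files are in fasta format, where each record starts with a '>' line. The function names are hypothetical.

```python
import glob
import os

def count_fasta_records(path):
    """Count header lines (starting with '>') in a fasta file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))

def summarize_outputs(directory="."):
    """Map each host_* and symb_* file in directory to its record count."""
    summary = {}
    for prefix in ("host_", "symb_"):
        pattern = os.path.join(directory, prefix + "*")
        for path in sorted(glob.glob(pattern)):
            summary[path] = count_fasta_records(path)
    return summary

if __name__ == "__main__":
    for path, count in summarize_outputs().items():
        print("%s: %d sequences" % (path, count))
```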

AUTHOR
Written by Sylvain Forêt and Jue-Sheng Ong.

REPORTING BUGS
Report bugs to sylvain.foret@anu.edu.au
Psytrans repository <https://github.com/sylvainforet/psytrans>

psytrans is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under the terms of the GNU General Public License, version 3 or later. For more information about these matters see http://www.gnu.org/licenses/.