Parasite Symbiont Transcriptome Separation

NAME
       psytrans.py - Parasite & Symbiont Transcriptome Separation

SYNOPSIS
       python psytrans.py [QUERIES] [-H FILE] [-S FILE] [OPTIONS]
       python psytrans.py [QUERIES] [-b BLASTRESULTSFILE] [OPTIONS]

DESCRIPTION
       psytrans.py separates the sequences of a host species from those of its main symbiont(s) or parasite(s) using Support Vector Machine (SVM) classification.
       The program takes as input a FASTA file containing the sequences to be classified.
       It also requires a file of protein sequences from a species related to the host, and a file of protein sequences from a species related to the symbiont (or parasite).
       The queries are compared against these two files using BLASTX.
       Alternatively, the user can provide the output of pre-computed BLASTX searches in tabular format (-outfmt 6 or 7).
       The classification is then carried out with the libsvm command-line tools.
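
       Pre-computed BLASTX results in tabular format are plain tab-separated text. The Python sketch below is not taken from psytrans.py; it reads such a file and keeps the hit with the lowest e-value for each query. It assumes the default 12-column layout of -outfmt 6 and 7, in which the e-value is the 11th column, and skips the `#' comment lines that -outfmt 7 adds. The function name best_hits is hypothetical.

```python
import io

# Default column order of BLAST+ tabular output (-outfmt 6 and 7).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
EVALUE_COL = FIELDS.index("evalue")  # 10, i.e. the 11th column


def best_hits(handle):
    """Return {query_id: (subject_id, evalue)}, keeping the lowest e-value per query."""
    best = {}
    for line in handle:
        line = line.strip()
        # -outfmt 7 interleaves comment lines starting with '#'; skip them and blanks.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < len(FIELDS):
            continue  # not a default 12-column row
        query, subject = cols[0], cols[1]
        evalue = float(cols[EVALUE_COL])
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


if __name__ == "__main__":
    example = io.StringIO(
        "# BLASTX\n"
        "contig1\tprotA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t120\n"
        "contig1\tprotB\t60.0\t90\t36\t0\t1\t270\t5\t95\t1e-10\t60\n"
        "contig2\tprotC\t45.0\t50\t27\t1\t10\t160\t3\t52\t0.002\t35\n"
    )
    print(best_hits(example))
```

       In practice the handle would be an open results file, one per database (host-related and symbiont-related).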

DEPENDENCIES
       psytrans requires makeblastdb and blastx from the NCBI BLAST+ distribution, unless the user provides pre-computed BLAST results.
       psytrans also requires three command-line utilities from libsvm: svm-scale, svm-train and svm-predict.
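
       Before a long run it can be worth checking that these executables are on the PATH. The following Python sketch is illustrative only: missing_tools and blast_commands are hypothetical helpers, and although -in, -dbtype, -out, -query, -db, -outfmt and -num_threads are standard BLAST+ options, the exact arguments psytrans.py passes to these tools are not documented here.

```python
import shutil

BLAST_TOOLS = ["makeblastdb", "blastx"]
LIBSVM_TOOLS = ["svm-scale", "svm-train", "svm-predict"]


def missing_tools(precomputed_blast=False):
    """Return the required executables that cannot be found on the PATH.

    With pre-computed BLAST results, only the libsvm tools are needed.
    """
    tools = LIBSVM_TOOLS if precomputed_blast else BLAST_TOOLS + LIBSVM_TOOLS
    return [tool for tool in tools if shutil.which(tool) is None]


def blast_commands(proteins_fasta, db_name, queries_fasta, out_file, threads=1):
    """Build argument lists (suitable for subprocess.run) for one BLASTX search.

    Hypothetical helper: the options are standard BLAST+ ones, but they are
    not necessarily the ones psytrans.py uses.
    """
    make_db = ["makeblastdb", "-in", proteins_fasta, "-dbtype", "prot",
               "-out", db_name]
    search = ["blastx", "-query", queries_fasta, "-db", db_name,
              "-outfmt", "6", "-num_threads", str(threads), "-out", out_file]
    return make_db, search


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing executables:", ", ".join(missing) if missing else "none")
```

       Building argument lists rather than shell strings avoids quoting problems when file names contain spaces.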

OPTIONS
   Generic Program Information

       -h, --help
              Print a usage message briefly summarizing the command-line options.

   Global options

       -R, --restart
              Restart the script from the last checkpoint.

       -p, --nbThreads
              Number of threads to use for the BLAST searches and the SVM training.

       -V, --verbosemode
              Run the script in verbose mode.

       -t, --tempDir
              Specify the name of the temporary directory.

       -X, --clearTemp
              Clear all data from the temporary directory upon completion.

       -z, --stopAfter
              Choices: db, runBlast, parseBlast, kmers, SVM
              Stop the process once the specified stage has completed:
              db, the database creation stage;
              runBlast, the BLAST search stage;
              parseBlast, the stage that separates unambiguous from ambiguous sequences;
              kmers, the preparation of the SVM input;
              SVM, the SVM training and testing stage.
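
       The stages run in the order listed above. The Python sketch below is hypothetical: it shows how a stop stage, and optionally the last completed stage recorded at a checkpoint (as used by -R), could select which stages remain to run. stages_to_run is not a psytrans function, and the way psytrans records its checkpoints is not described here.

```python
# Stage names accepted by -z/--stopAfter, in execution order.
STAGES = ["db", "runBlast", "parseBlast", "kmers", "SVM"]


def stages_to_run(stop_after=None, last_completed=None):
    """Return the stages still to execute (hypothetical helper).

    stop_after:     last stage to run (None means run every stage).
    last_completed: stage recorded at the previous checkpoint
                    (None means a fresh start).
    """
    start = 0 if last_completed is None else STAGES.index(last_completed) + 1
    end = len(STAGES) if stop_after is None else STAGES.index(stop_after) + 1
    return STAGES[start:end]


if __name__ == "__main__":
    print(stages_to_run(stop_after="parseBlast"))
    # ['db', 'runBlast', 'parseBlast']
```

       A restart after a run stopped at parseBlast would then continue with kmers and SVM only.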

   Preparation of training set options

       -e, --maxBestEvalue
              Set the maximum e-value that the best BLAST hit of a sequence may have for that sequence to be classified as unambiguous.
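
       The exact rule psytrans uses to call a sequence unambiguous is not spelled out above. The Python sketch below assumes one plausible rule: a query is unambiguous when its best hit in exactly one of the two databases has an e-value at or below the -e threshold, and ambiguous otherwise. The function split_unambiguous is hypothetical.

```python
def split_unambiguous(host_best, symb_best, max_best_evalue):
    """Partition query ids using their best e-values against the two databases.

    host_best / symb_best: {query_id: best e-value} from the two BLASTX searches.
    Assumed rule: a query is unambiguous when exactly one database gives a best
    hit at or below max_best_evalue; every other query is treated as ambiguous.
    """
    host, symb, ambiguous = [], [], []
    for query in sorted(set(host_best) | set(symb_best)):
        # A missing hit counts as an infinitely bad e-value.
        host_ok = host_best.get(query, float("inf")) <= max_best_evalue
        symb_ok = symb_best.get(query, float("inf")) <= max_best_evalue
        if host_ok and not symb_ok:
            host.append(query)
        elif symb_ok and not host_ok:
            symb.append(query)
        else:
            ambiguous.append(query)
    return host, symb, ambiguous


if __name__ == "__main__":
    print(split_unambiguous({"c1": 1e-40, "c2": 1e-30},
                            {"c2": 1e-35, "c3": 1e-50}, 1e-20))
```

       Only the unambiguous sequences would then serve as labelled examples for training and testing; the ambiguous ones are left for the SVM to classify.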

       -n, --numberOfSeq
              Set the maximum number of training & testing sequences.
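
       As an illustration only, the Python sketch below caps a list of labelled sequence identifiers at -n entries and divides them into a training set and a testing set. Whether psytrans applies the cap per class or overall, and how it divides the sequences, are not stated here; the random 50/50 split, the fixed seed and the function name training_and_testing are assumptions.

```python
import random


def training_and_testing(seq_ids, number_of_seq, seed=0):
    """Randomly keep at most number_of_seq ids, then split them half/half.

    Hypothetical helper: the 50/50 split and the fixed seed are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same ids
    chosen = list(seq_ids)
    rng.shuffle(chosen)
    chosen = chosen[:number_of_seq]
    half = len(chosen) // 2
    return chosen[:half], chosen[half:]


if __name__ == "__main__":
    train, test = training_and_testing([f"seq{i}" for i in range(1000)], 10)
    print(len(train), len(test))
```

       Capping the number of sequences mainly bounds the time spent on SVM training, which grows quickly with the size of the training set.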

   Kmer parameters

       -c, --minWordSize
              Set the minimum DNA word (k-mer) length.

       -k, --maxWordSize
              Set the maximum DNA word (k-mer) length.
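
       The word-size options suggest that each sequence is described by the frequencies of its DNA words for every length from -c to -k. The Python sketch below builds such a vector in the sparse text format read by svm-scale and svm-train (label index:value ..., with increasing 1-based indices). Normalising counts to relative frequencies per word length, skipping words that contain characters other than A, C, G and T, and the function names are assumptions rather than documented psytrans behaviour.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_index(min_k, max_k):
    """Map every DNA word of length min_k..max_k to a 1-based feature index."""
    index = {}
    for k in range(min_k, max_k + 1):
        for letters in product(ALPHABET, repeat=k):
            index["".join(letters)] = len(index) + 1
    return index


def libsvm_line(seq, label, index, min_k, max_k):
    """Return 'label idx:freq ...' using relative k-mer frequencies per word size."""
    seq = seq.upper()
    features = {}
    for k in range(min_k, max_k + 1):
        total = len(seq) - k + 1  # number of k-long windows in the sequence
        if total <= 0:
            continue
        counts = {}
        for i in range(total):
            word = seq[i:i + k]
            if word in index:  # skips words containing N or other ambiguity codes
                counts[word] = counts.get(word, 0) + 1
        for word, n in counts.items():
            features[index[word]] = n / total
    cells = " ".join(f"{i}:{features[i]:.6g}" for i in sorted(features))
    return f"{label} {cells}".rstrip()


if __name__ == "__main__":
    index = kmer_index(1, 2)
    print(libsvm_line("ACGTTGCA", 1, index, 1, 2))
```

       With -c 1 and -k 2 there are 4 + 16 = 20 possible features; in general the count is the sum of 4^k for k from the minimum to the maximum word size, so it grows quickly as the maximum word size increases.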

EXAMPLE
       You start with an assembly containing a mixture of sequences from a host A and a symbiont (or parasite) B: host_and_symb.fasta
       You also provide a file with proteins from a species related to the host: related_host_proteins.fasta
       and a file with proteins from a species related to the symbiont: related_symb_proteins.fasta
       You can then start the Psytrans process as follows:

              python psytrans.py host_and_symb.fasta -H related_host_proteins.fasta -S related_symb_proteins.fasta

       After a few hours (depending on the number of sequences), the current directory will contain two new files: one whose name starts with the prefix `host_', holding the sequences classified as host, and one whose name starts with the prefix `symb_', holding the sequences classified as symbiont (or parasite).

       To run the program using 8 threads and using /home/user/tmp as a temporary directory, use the following command:

              python psytrans.py host_and_symb.fasta -H related_host_proteins.fasta -S related_symb_proteins.fasta -p 8 -t /home/user/tmp
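
       Producing the two output files amounts to routing each query sequence according to its predicted class. The Python sketch below is hypothetical: it assumes the predictions are available as a mapping from sequence identifier (the first word of the FASTA header) to `host' or `symb', and that each output name is the prefix followed by the input file name. The actual names psytrans produces may differ, and read_fasta and write_split are not psytrans functions.

```python
import os
import tempfile


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())  # join multi-line sequences
    if header is not None:
        yield header, "".join(chunks)


def write_split(path, predictions, out_dir="."):
    """Write sequences predicted 'host' / 'symb' to host_<name> / symb_<name>."""
    name = os.path.basename(path)
    paths = {label: os.path.join(out_dir, f"{label}_{name}")
             for label in ("host", "symb")}
    outputs = {label: open(p, "w") for label, p in paths.items()}
    try:
        for header, seq in read_fasta(path):
            seq_id = (header.split() or [""])[0]
            label = predictions.get(seq_id)
            if label in outputs:  # unpredicted sequences are left out
                outputs[label].write(f">{header}\n{seq}\n")
    finally:
        for handle in outputs.values():
            handle.close()
    return paths


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    demo = os.path.join(demo_dir, "host_and_symb.fasta")
    with open(demo, "w") as handle:
        handle.write(">contig1\nACGT\n>contig2\nGGCC\n")
    print(write_split(demo, {"contig1": "host", "contig2": "symb"}, out_dir=demo_dir))
```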

AUTHOR
       Written by Sylvain Forêt and Jue-Sheng Ong.

REPORTING BUGS
       Report bugs at sylvain.foret@anu.edu.au
       Psytrans repository <https://github.com/sylvainforet/psytrans>

COPYRIGHT
       Copyright © 2014 Sylvain Forêt & Jue-Sheng Ong.

       psytrans is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under the terms of the GNU General Public License
       version 3 or later. For more information about these matters, see http://www.gnu.org/licenses/.