A set of Python command-line programs getting additional statistics about microsatellites or simple sequence repeats (SSR) from MISA output. http://www.nenno.it/PySSRstat
PySSRstat
=========

Version 1.0, 2015-08-31

PySSRstat is a set of command-line programs that work on the output of MISA [1], a Perl script that must be executed first. MISA generates two output files. The file ending in "*.misa" is from now on called the "MISA-file", and the one ending in "*.statistics" the "MISA-statistics-file".

The programs of PySSRstat use these output files for further analysis of the distribution of repeats, to find the longest repeats, and to filter the MISA-file by repeat length (minimum and maximum) and optionally by border for primer selection. PySSRstat was written for and used in Galasso and Ponzoni (2015) [2].

The programs of PySSRstat were written by Mario Nenno in Python 3.4 and tested on a PC with a Core i5-4570, 8 GB RAM and MS Windows 8.1.

All programs of PySSRstat are copyright 2015 by Mario Nenno and distributed under the terms of the Revised BSD License. For more details see the file LICENSE.txt.


Set 1) Distribution of repeats and longest repeats
==================================================

Program                Description
------------------------------------------------------------------
statistics_misa.py     Extract additional statistical data from the
                       MISA-statistics-file and MISA-file
                       Input:  - MISA-file
                               - MISA-statistics-file
                               - optional parameter -rpc for SSR
                                 repeat classes (experimental)
                       Output: repeats_analysis.txt

statgetlongest.py      Find accessions of the longest repeats
                       Input:  - repeats_analysis.txt
                               - MISA-file
                       Output: longest-sequences-list.txt


Set 2) Filter MISA-file by repeat length and border for primer selection
========================================================================

Program                Description
-------------------------------------------------------------------------------
filterrepeatsmisa.py   Filter the MISA-file by minimum and maximum repeat length
                       Input:  - MISA-file
                               - <minimum>
                               - <maximum>
                       Output: filtered-repeats-sequence-list.txt

getsequences.py        Extract the previously filtered accessions from the
                       database file. Optionally filter for a border of n bp
                       up- and downstream of the microsatellite
                       Input:  - filtered-repeats-sequence-list.txt
                               - db file with original sequences in FASTA format
                               [-b nnn] (optional)
                       Output: - index.txt
                               - getsequences-info.txt
                               - repeats-sequences.fas
                                 repeats-sequences-border.fas (if with border option)
                               [- border.txt]


3) Others
=========

imperfect.py           Statistics about imperfect repeats in the MISA-file
                       (experimental)
                       Input:  - MISA-file
                       Output: - imperfect.txt

format_border.py       Format the file border.txt with spaces or tabs
                       Input:  - border file (output of getsequences.py)
                               - 'tab' or 'space' as column delimiter
                               Optional parameter -idt or --idtrunc to truncate
                               the id at the first underscore character
                       Output: - border-space.txt (if space delimited)
                               - border-tab.txt (if tab delimited)


Flow of data and programs
=========================

a) Repeat analysis, distribution, longest:

    MISA-file   MISA-statistics-file
        |            |
    [statistics_misa.py] -> repeats_analysis.txt   MISA-file
                                   |                   |
                            [statgetlongest.py] -> longest-sequences-list.txt

b) For primer selection:

    MISA-file
        |
        |
    [filterrepeatsmisa.py] -> filtered-repeats-sequence-list.txt   db file with original sequences
                                          |                                  |
                                   [getsequences.py] -> getsequences-info.txt
                                                        repeats-sequences.fas or repeats-sequences-border.fas
                                                        border.txt

Note: The file index.txt is for internal use and helps to speed up the extraction of accessions from the sequence db file.


References
==========

[1] Thiel T., Michalek W., Varshney R., Graner A. 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theoretical and Applied Genetics 106: 411-422.
    Link: http://pgrc.ipk-gatersleben.de/misa/

[2] Galasso, I. and Ponzoni, E. (2015) In Silico Exploration of Cannabis sativa L. Genome for Simple Sequence Repeats (SSRs). American Journal of Plant Sciences, 6, 3244-3250. doi: 10.4236/ajps.2015.619315
    Link: http://www.scirp.org/Journal/PaperInformation.aspx?PaperID=62020
    Fulltext: http://dx.doi.org/10.4236/ajps.2015.619315
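

Appendix: sketch of the Set 2 filtering idea
============================================

The filtering described for Set 2 (keep repeats between a minimum and a maximum length, then optionally require a border of n bp on both sides of the microsatellite) can be illustrated with a short Python sketch. This is not the code of filterrepeatsmisa.py or getsequences.py. It assumes that the MISA-file is tab-separated with a header row and the columns ID, SSR nr., SSR type, SSR, size, start, end, that "repeat length" means the size column in bp, and that start and end are 1-based and inclusive. The names Ssr, read_misa, filter_by_size, has_border and DEMO are hypothetical and exist only for this illustration.

```python
import csv
import io
from typing import Iterable, Iterator, List, NamedTuple


class Ssr(NamedTuple):
    """One microsatellite row of a MISA-file (assumed column layout)."""
    seq_id: str
    ssr_type: str
    motif: str
    size: int
    start: int
    end: int


# Hypothetical demo data in the assumed tab-separated MISA layout.
DEMO = (
    "ID\tSSR nr.\tSSR type\tSSR\tsize\tstart\tend\n"
    "seqA\t1\tp2\t(AG)6\t12\t5\t16\n"
    "seqB\t1\tp3\t(CTT)10\t30\t101\t130\n"
)


def read_misa(handle) -> Iterator[Ssr]:
    """Yield one Ssr per data row, skipping the header and short rows."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 7:
            continue  # blank or malformed line
        yield Ssr(row[0], row[2], row[3], int(row[4]), int(row[5]), int(row[6]))


def filter_by_size(ssrs: Iterable[Ssr], minimum: int, maximum: int) -> List[Ssr]:
    """Keep repeats whose size lies in [minimum, maximum], bounds included."""
    return [s for s in ssrs if minimum <= s.size <= maximum]


def has_border(ssr: Ssr, seq_length: int, border: int) -> bool:
    """True if at least `border` bp lie upstream and downstream of the repeat."""
    upstream = ssr.start - 1          # bases before the repeat
    downstream = seq_length - ssr.end  # bases after the repeat
    return upstream >= border and downstream >= border


if __name__ == "__main__":
    for ssr in filter_by_size(read_misa(io.StringIO(DEMO)), 20, 40):
        print(ssr.seq_id, ssr.motif, ssr.size, has_border(ssr, 200, 50))
```

The sequence length passed to has_border has to come from the original FASTA db file. The sketch takes it as a plain argument so that the border test stays independent of how the sequences are read.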