Gimme: A lightweight reference-guided transcript assembler.
The program is developed in the Laboratory of Genomics, Evolution and Development (GED lab), Michigan State University.
Website: http://ged.msu.edu
Author: Likit Preeyanon, email@example.com
Copyright and license
The program is copyright Michigan State University. The code is freely available for use and re-use under the GNU GPL license. See LICENSE.txt or http://www.gnu.org/licenses/.
Gimme is unpublished. A manuscript is in preparation.
Source code is available at https://github.com/ged-lab/gimme.git.
Run python setup.py install in the main directory to download and install the required packages.
Gimme should run on any platform with a Python 2.7 interpreter.
You can simply run
python ./src/gimme.py <input file>
Gimme reads input files in PSL or BED format. Use gff2bed.py in the utils directory to convert a GFF file to BED format.
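Both input formats describe each aligned transcript as a set of exon blocks. To illustrate how a BED12 record maps to exon coordinates, here is a minimal Python 3 sketch (Gimme itself targets Python 2.7). `parse_bed12` and `Transcript` are hypothetical names, not part of Gimme, and the sketch assumes tab-separated BED12 lines with 0-based, half-open coordinates.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    """One aligned transcript: chromosome, name and exon intervals."""
    chrom: str
    name: str
    exons: List[Tuple[int, int]]  # 0-based, half-open (start, end) intervals


def parse_bed12(line: str) -> Transcript:
    """Parse one BED12 line into a Transcript with absolute exon coordinates."""
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start, name = fields[0], int(fields[1]), fields[3]
    block_count = int(fields[9])
    # blockSizes and blockStarts may end with a trailing comma; skip empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # blockStarts are relative to chromStart, so shift them to genome positions.
    exons = [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]
    return Transcript(chrom, name, exons)


line = "chr1\t1000\t2500\ttx1\t0\t+\t1000\t2500\t0\t3\t100,200,300,\t0,500,1200,"
print(parse_bed12(line).exons)  # [(1000, 1100), (1500, 1700), (2200, 2500)]
```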
Note: Gimme currently ignores the strandedness of transcripts, so all predicted gene models are on the positive strand. Strandedness will be supported in the next release.
Output is written to standard output in BED format, which can be visualized in the UCSC Genome Browser or other genome browsers.
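Each BED12 line stores the exons of one transcript as blockSizes plus blockStarts relative to chromStart. The Python 3 sketch below shows that encoding; `to_bed12` is a hypothetical helper, not Gimme's own writer, and filling thickStart/thickEnd with the full transcript span is an assumption made here because no CDS information is involved.

```python
from typing import Sequence, Tuple


def to_bed12(chrom: str, name: str, exons: Sequence[Tuple[int, int]],
             strand: str = "+") -> str:
    """Format 0-based, half-open exon intervals as one BED12 line."""
    exons = sorted(exons)
    chrom_start, chrom_end = exons[0][0], exons[-1][1]
    sizes = ",".join(str(end - start) for start, end in exons)
    starts = ",".join(str(start - chrom_start) for start, _ in exons)  # relative
    fields = [chrom, chrom_start, chrom_end, name, 0, strand,
              chrom_start, chrom_end,  # thickStart/thickEnd span the whole model
              0, len(exons), sizes + ",", starts + ","]
    return "\t".join(str(f) for f in fields)


print(to_bed12("chr1", "tx1", [(1500, 1700), (1000, 1100), (2200, 2500)]))
```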
By default, gene models built by Gimme contain the minimum number of isoforms. Use --max or -x to force Gimme to report the maximum number of isoforms. You can also use a script in utils to find a minimum set of transcripts. See Utilities for more details.
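The difference between the two modes can be pictured with a splice graph whose nodes are exons and whose edges join consecutive exons of the same input transcript. The Python 3 sketch below is an illustration, not Gimme's algorithm: `all_paths` enumerates every source-to-sink path (analogous to reporting all putative isoforms), and `greedy_min_paths` uses a greedy set-cover heuristic to keep a small set of paths covering every splice junction. Greedy set cover only approximates the true minimum, and the sketch assumes exons are sorted by genomic position so the graph is acyclic. All function names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Exon = Tuple[int, int]  # 0-based, half-open genomic interval
Path = Tuple[Exon, ...]


def build_graph(transcripts: List[List[Exon]]) -> Dict[Exon, Set[Exon]]:
    """Splice graph: an edge joins consecutive exons of the same transcript."""
    graph: DefaultDict[Exon, Set[Exon]] = defaultdict(set)
    for exons in transcripts:
        for a, b in zip(exons, exons[1:]):
            graph[a].add(b)
    return dict(graph)


def all_paths(graph: Dict[Exon, Set[Exon]], transcripts: List[List[Exon]]) -> List[Path]:
    """Enumerate every path from a source exon (no incoming edge) to a sink exon."""
    nodes = {e for exons in transcripts for e in exons}
    has_incoming = {b for targets in graph.values() for b in targets}
    paths: List[Path] = []

    def walk(path: List[Exon]) -> None:
        successors = graph.get(path[-1], set())
        if not successors:
            paths.append(tuple(path))  # reached a sink exon
            return
        for nxt in sorted(successors):
            walk(path + [nxt])

    for source in sorted(nodes - has_incoming):
        walk([source])
    return paths


def junctions(path: Path) -> Set[Tuple[Exon, Exon]]:
    """Splice junctions (consecutive exon pairs) used by a path."""
    return set(zip(path, path[1:]))


def greedy_min_paths(paths: List[Path]) -> List[Path]:
    """Greedy set cover: keep the path covering the most uncovered junctions."""
    uncovered = set().union(*(junctions(p) for p in paths))
    chosen: List[Path] = []
    while uncovered:
        best = max(paths, key=lambda p: len(junctions(p) & uncovered))
        chosen.append(best)
        uncovered -= junctions(best)
    return chosen


t1 = [(0, 100), (200, 300), (400, 500)]
t2 = [(50, 100), (200, 300), (400, 450)]
paths = all_paths(build_graph([t1, t2]), [t1, t2])
print(len(paths))                    # 4
print(len(greedy_min_paths(paths)))  # 2
```

With two input transcripts that share a middle exon, the graph yields four paths, and the greedy step keeps two of them that together cover all four junctions.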
Assemble transcripts from sample data
python ./src/gimme.py sample_data/sample.psl > sample.bed
Obtain a maximum number of isoforms
python ./src/gimme.py -x sample_data/sample.psl > sample.max.bed
Run Gimme with multiple input files
python ./src/gimme.py sample1.psl sample2.psl sample3.psl > sample.all.bed
Run Gimme with user defined parameters
python ./src/gimme.py --min_utr=200 --max_intron=100000 --gap_size=15 sample.psl > sample.all.bed
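To script such runs, for example across many samples, the command line can be assembled programmatically. Below is a minimal Python 3 sketch using the standard `subprocess` module; `gimme_command` and `run_gimme` are hypothetical helpers, and the default interpreter name `python` assumes it points at a Python 2.7 installation, as Gimme requires.

```python
import subprocess
from typing import List, Sequence


def gimme_command(inputs: Sequence[str], script: str = "./src/gimme.py",
                  python: str = "python", report_all: bool = False,
                  **options: int) -> List[str]:
    """Build a Gimme command line; each keyword option becomes --name=value."""
    cmd = [python, script]
    if report_all:
        cmd.append("-x")  # report all putative isoforms
    for name, value in sorted(options.items()):
        cmd.append("--{}={}".format(name, value))
    cmd.extend(inputs)
    return cmd


def run_gimme(inputs: Sequence[str], output_bed: str, **kwargs) -> None:
    """Run Gimme and write its BED output (sent to stdout) to output_bed."""
    with open(output_bed, "w") as out:
        subprocess.run(gimme_command(inputs, **kwargs), stdout=out, check=True)


print(" ".join(gimme_command(["sample.psl"], min_utr=200, max_intron=100000, gap_size=15)))
# python ./src/gimme.py --gap_size=15 --max_intron=100000 --min_utr=200 sample.psl
```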
See the program's help
python ./src/gimme.py -h    # or --help
--gap_size=GAP_SIZE (default: 50)  Introns smaller than GAP_SIZE (bp) are filled to construct a more complete exon.
--max_intron=MAX_INTRON (default: 300000)  The maximum intron size (bp) allowed. A transcript is split into smaller parts if it contains an intron longer than MAX_INTRON.
--min_utr=MIN_UTR (default: 100)  Alternative UTRs smaller than MIN_UTR (bp) are merged into overlapping exons.
--min_transcript_len=MIN_TRANSCRIPT_LEN (default: 300)  The minimum length (bp) of a multi-exon transcript.
--min_single_exon_len=MIN_SINGLE_EXON_LEN (default: 500)  The minimum length (bp) of a single-exon gene.
--max_isoforms=MAX_ISOFORMS (default: 20)  The maximum number of isoforms allowed without the -x option. Gimme searches for a minimum number of isoforms if the number of isoforms exceeds MAX_ISOFORMS.
-x, --max  Tell Gimme to search for and report all putative isoforms.
--debug  Run Gimme with parameters set for debugging.
-v, --version  Print the version number.
-h, --help  Print a help message.
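The following Python 3 sketch illustrates one plausible reading of the GAP_SIZE, MAX_INTRON, MIN_TRANSCRIPT_LEN and MIN_SINGLE_EXON_LEN descriptions above, applied to a single transcript. It is not taken from Gimme's source: the helper names are hypothetical, exons are assumed to be sorted, non-overlapping, 0-based half-open intervals, and transcript length is assumed to be the sum of exon lengths.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open (start, end)


def fill_small_gaps(exons: List[Exon], gap_size: int = 50) -> List[Exon]:
    """Merge neighbouring exons whose intron is shorter than gap_size (GAP_SIZE)."""
    merged = [exons[0]]
    for start, end in exons[1:]:
        prev_start, prev_end = merged[-1]
        if start - prev_end < gap_size:
            merged[-1] = (prev_start, end)  # fill the small intron
        else:
            merged.append((start, end))
    return merged


def split_long_introns(exons: List[Exon], max_intron: int = 300000) -> List[List[Exon]]:
    """Split a transcript at every intron longer than max_intron (MAX_INTRON)."""
    parts = [[exons[0]]]
    for prev, cur in zip(exons, exons[1:]):
        if cur[0] - prev[1] > max_intron:
            parts.append([cur])  # start a new part after the long intron
        else:
            parts[-1].append(cur)
    return parts


def passes_length_filter(exons: List[Exon], min_transcript_len: int = 300,
                         min_single_exon_len: int = 500) -> bool:
    """Apply MIN_SINGLE_EXON_LEN to single-exon models, MIN_TRANSCRIPT_LEN otherwise."""
    length = sum(end - start for start, end in exons)
    if len(exons) == 1:
        return length >= min_single_exon_len
    return length >= min_transcript_len


exons = [(0, 100), (120, 300), (400000, 400600)]
parts = split_long_introns(fill_small_gaps(exons))
print(parts)                                      # [[(0, 300)], [(400000, 400600)]]
print([passes_length_filter(p) for p in parts])  # [False, True]
```

Here the 20 bp intron is filled, the transcript is split at the 399,700 bp intron, and only the 600 bp second part passes the single-exon length filter.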
Run nosetests in the main directory to run all tests.
Gimme includes many utilities that work with PSL, BED and SAM formats. Some are useful for building gene models; others are useful for working with reads, assembled sequences, etc.
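As an example of the kind of format conversion such utilities perform, the following Python 3 sketch converts one PSL alignment into a BED12 line. `psl_to_bed12` is a hypothetical function, not one of the scripts in utils; it assumes a nucleotide alignment (single-character strand) and a data line without the psLayout header.

```python
def psl_to_bed12(line: str) -> str:
    """Convert one PSL alignment line to a BED12 line in target coordinates."""
    f = line.rstrip("\n").split("\t")
    strand, q_name, t_name = f[8], f[9], f[13]
    if len(strand) != 1:
        raise ValueError("translated (two-character strand) alignments are not handled")
    t_start, t_end, block_count = int(f[15]), int(f[16]), int(f[17])
    # blockSizes and tStarts end with a trailing comma, so skip empty items.
    sizes = [int(x) for x in f[18].split(",") if x]
    t_starts = [int(x) for x in f[20].split(",") if x]
    if not (len(sizes) == len(t_starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/tStarts")
    # BED blockStarts are relative to chromStart; PSL tStarts are absolute.
    rel_starts = [s - t_start for s in t_starts]
    bed = [t_name, t_start, t_end, q_name, 0, strand, t_start, t_end, 0,
           block_count, ",".join(map(str, sizes)) + ",",
           ",".join(map(str, rel_starts)) + ","]
    return "\t".join(str(x) for x in bed)


psl = "\t".join([
    "600", "0", "0", "0", "0", "0", "2", "900", "+", "read1", "600", "0", "600",
    "chr1", "100000", "1000", "2500", "3", "100,200,300,", "0,100,300,",
    "1000,1500,2200,",
])
print(psl_to_bed12(psl))
```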