miRA: a microRNA identification tool
Software and source code for the paper "Conservation-independent identification of novel miRNAs" by M. Evers, A. Dueck, G. Meister and J. C. Engelmann.
How to install:
If you are running a recent 64-bit Linux, or OS X Lion or later, you can grab the miRA binary here:
You will still need gnuplot and LaTeX installed on your system. You will also need the Varna binary: Varna. Please place it in the same folder as the miRA binary.
Compiling from source
For an easy setup, simply download the latest bundled release archive: miRA-1.2.0.tar.gz
Unpack it, for example with
tar -xvf miRA-1.2.0.tar.gz
Make sure your system provides the following dependencies for miRA:
- a C compiler supporting the C99 standard
- a Java virtual machine, version 1.6+ (optional)
- a recent version of gnuplot (optional)
- a recent version of LaTeX (optional)
NOTE: miRA will work without the optional dependencies but will skip some reporting features (creating plots etc.) if they are not available.
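To see in advance which reporting features are likely to be skipped, you can check whether the optional tools are on your PATH. The following is a minimal sketch, not part of miRA; the executable names `java`, `gnuplot` and `latex` are assumptions and may differ on your system.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
OPTIONAL_TOOLS = {"java": "java", "gnuplot": "gnuplot", "latex": "latex"}


def find_optional_tools(tools=OPTIONAL_TOOLS):
    """Map each tool label to its resolved path, or None if it is not on PATH."""
    return {label: shutil.which(exe) for label, exe in tools.items()}


if __name__ == "__main__":
    for label, path in find_optional_tools().items():
        print(f"{label}: {path or 'not found, related reporting may be skipped'}")
```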
Compile it for your system with:
cd miRA-1.2.0
./configure
make
Optionally run the unit tests on your system with:
to check for correct behavior.
How to use:
The simplest and most common way to run miRA is to run the full suite with the command:
./miRA full -c <configuration file> <input SAM file> <input FASTA file> <output directory>
Batching in version 1.2.0+ (beta)
If you run into memory problems, use
./miRA batch -c <configuration file> <input SAM file> <input FASTA file> <output directory>
instead. It will split all files based on the chromosome (rname) and run miRA separately for each, only loading the essential parts into memory. This will reduce the memory footprint of miRA significantly, but will be slower.
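The idea of splitting by chromosome can be illustrated with a short sketch that groups SAM alignment lines by their RNAME column (the third tab-separated field). This only illustrates the concept under stated assumptions and is not miRA's actual implementation; the function name `split_sam_by_rname` is hypothetical, and for real data you would stream each group to its own file rather than hold everything in memory.

```python
from collections import defaultdict


def split_sam_by_rname(lines):
    """Group SAM alignment lines by reference name (column 3, RNAME).

    Header lines (starting with '@') are collected separately so they can be
    written in front of every per-chromosome group.
    """
    header = []
    by_rname = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            header.append(line)
            continue
        rname = line.split("\t")[2]
        by_rname[rname].append(line)
    return header, dict(by_rname)
```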
You can test miRA with sample data provided in ./example/:
./miRA full -c example/sample_configuration.config example/sample_reads.sam example/sample_sequence.fasta example/sample_output/
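If you want to drive miRA from a script, the `full` and `batch` calls shown above take the same arguments in the same order. A minimal sketch of a wrapper follows; the helper names `mira_command` and `run_mira` are hypothetical, and the default binary path `./miRA` assumes you run from the directory containing the binary.

```python
import subprocess


def mira_command(mode, config, sam, fasta, outdir, binary="./miRA"):
    """Build the argument list for a 'full' or 'batch' miRA run."""
    if mode not in ("full", "batch"):
        raise ValueError("mode must be 'full' or 'batch'")
    return [binary, mode, "-c", config, sam, fasta, outdir]


def run_mira(mode, config, sam, fasta, outdir, binary="./miRA"):
    """Run miRA and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(
        mira_command(mode, config, sam, fasta, outdir, binary), check=True
    )
```

For the sample data, `mira_command("full", "example/sample_configuration.config", "example/sample_reads.sam", "example/sample_sequence.fasta", "example/sample_output/")` builds the same call as the command line above.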
You can also run only parts of miRA. It is separated into 3 parts, each with its own call:
| Part | Description | Command |
|---|---|---|
| Clustering | generates a list of main expression contigs based on alignment data | cluster |
| Folding | folds RNA sequences and calculates secondary structure information | fold |
| Coverage Testing | coverage-based verification and reporting of microRNA candidates | coverage |
For additional help and usage information run:
./miRA <command> -h
where <command> is one of "cluster", "fold" or "coverage".
After running miRA all result files will be created in the specified output directory. Depending on the configuration and the available external programs the following files will be created:
- a full PDF report for every microRNA candidate (requires LaTeX)
- final_candidates.bed, a file containing the location and properties of all candidates in the BED file format.
- final_candidates.json, a file containing the location and properties of all candidates in the JSON file format.
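For downstream processing, the BED output can be read with a few lines of code. The sketch below relies only on the standard BED convention that the first three tab-separated columns are chromosome, start and end; the meaning of any further columns in final_candidates.bed is not described here, so they are kept as opaque strings. The function name `read_bed` is hypothetical.

```python
def read_bed(lines):
    """Parse BED lines into dicts with chrom, start, end and remaining fields."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        records.append(
            {
                "chrom": fields[0],
                "start": int(fields[1]),
                "end": int(fields[2]),
                "extra": fields[3:],  # miRA-specific columns, left uninterpreted
            }
        )
    return records
```

A typical use would be `with open("final_candidates.bed") as fh: candidates = read_bed(fh)`, with the path adjusted to your output directory.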
Additional comments and known issues
SAM file format
It is important to make sure that the SAM file was generated by aligning reads to the same FASTA reference genome as the one that is used within miRA. In other words, all chromosome names found in the SAM file must have a matching entry in the FASTA reference genome.
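This requirement can be checked before running miRA. The following sketch collects the reference names used by the alignments in a SAM file and reports any that have no matching FASTA entry. It assumes that a FASTA entry's name is the first whitespace-delimited token after `>`; how miRA itself matches names is not stated here, so treat that as an assumption. All function names are hypothetical.

```python
def fasta_names(fasta_lines):
    """Return the set of sequence names found in FASTA header lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])  # assumption: name is the first token
    return names


def sam_rnames(sam_lines):
    """Return the set of RNAME values (column 3) used by alignment lines."""
    names = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        rname = line.split("\t")[2]
        if rname != "*":  # '*' marks a read without a reference
            names.add(rname)
    return names


def missing_references(sam_lines, fasta_lines):
    """List chromosome names present in the SAM file but absent from the FASTA."""
    return sorted(sam_rnames(sam_lines) - fasta_names(fasta_lines))
```

An empty result means every chromosome name in the SAM file has a FASTA entry.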
miRA requires a SAM file that does not contain unmapped reads.
To position-sort a BAM file and convert it to a SAM file, run
samtools sort reads.bam sorted_reads
samtools view -h sorted_reads.bam > sorted_reads.sam
To remove unmapped reads from a BAM file and output a position-sorted BAM file, run
samtools view -b -F 4 all_reads.bam | samtools sort - > sorted_mapped_reads
To remove unmapped reads from a SAM file, and output a position-sorted SAM file, run
samtools view -hS -F 4 all_reads.sam | samtools sort - > sorted_mapped_reads
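To confirm that a SAM file really contains no unmapped reads, you can inspect the FLAG column (the second tab-separated field): a read is unmapped when bit 0x4 is set, which is the same bit that `-F 4` filters on. The following is a minimal sketch with a hypothetical function name.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit for "segment unmapped"


def count_unmapped(sam_lines):
    """Count alignment lines whose FLAG has the unmapped bit set."""
    count = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & UNMAPPED_FLAG:
            count += 1
    return count
```

A result of 0 means the file meets the requirement stated above.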
If you run into the out-of-memory issue described below, consider updating to miRA 1.2.0+ and running miRA batch instead.
miRA stores miRNA candidates that passed the folding and read coverage-based verification steps in memory until the generation of the final reports. The memory footprint of miRA therefore depends on the number of validated candidates.
Under certain conditions, miRA may crash with the error
ERROR: initialize_Lfold: argument must be greater 0
This error is almost always associated with an out-of-memory issue. Typical causes are running miRA on a desktop or notebook computer with little RAM on deep sequencing data that yields many candidates, and/or using a set of relaxed, non-stringent filtering parameters.
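In a scripted pipeline, this failure can be recognized from the error text and used as a hint to retry in batch mode. The sketch below only inspects captured log text; whether miRA writes the message to standard output or standard error is not stated here, so capturing both is assumed. The function names are hypothetical.

```python
LFOLD_ERROR = "initialize_Lfold: argument must be greater 0"


def looks_like_memory_failure(log_text):
    """Return True if the captured miRA output contains the Lfold error."""
    return LFOLD_ERROR in log_text


def next_mode(current_mode, log_text):
    """Suggest 'batch' after a failed 'full' run that shows the Lfold error.

    Returns None when no retry is suggested.
    """
    if current_mode == "full" and looks_like_memory_failure(log_text):
        return "batch"
    return None
```

Since the error is only "almost always" memory-related, a retry in batch mode is a reasonable first step rather than a guaranteed fix.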