iRAP single library

Nuno Fonseca edited this page Dec 20, 2016 · 4 revisions

irap_single_lib is a wrapper around the main iRAP script, designed to process a single RNA-seq library. It supports only a subset of the analyses available in iRAP, namely alignment and gene-, exon-, and transcript-level quantification.

Usage can be obtained by running

irap_single_lib -h

The required parameters are the FASTQ file(s) to process (.fastq.gz or .fastq) and a minimal iRAP configuration/control file containing information about the reference genome (FASTA file), the gene models (GTF file), and the methods to use (check iRAP's parameters in the Wiki).
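Before launching a run, it can help to confirm that each input has one of the two accepted extensions and is readable. The following is a minimal sketch, not part of iRAP; the helper name `is_fastq_input` is hypothetical.

```shell
# Sketch: succeed only for readable files ending in .fastq or .fastq.gz.
# is_fastq_input is a hypothetical helper, not an iRAP command.
is_fastq_input() {
    case "$1" in
        *.fastq|*.fastq.gz) [ -r "$1" ] ;;   # right extension: also require a readable file
        *) return 1 ;;                       # any other extension is rejected
    esac
}
```

For example, `is_fastq_input lib_1.fastq.gz || echo "not a usable FASTQ file"` prints the message when the file is missing or has another extension.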

The genome and gene models should be placed in a folder structure as described here and exemplified here.

Once all files are in place (the genome, the annotation, and an iRAP configuration/control file), irap_single_lib should be executed once with the -0 option to bootstrap the analysis. This step creates the genome indexes and other files that only need to be generated once, regardless of the number of libraries processed with a given control file.

irap_single_lib -c irap.conf.file -o selected_output_folder -0

Once the bootstrap is complete, the library(ies) can be processed by passing the (relative) path to the FASTQ files. For instance, for a paired-end library one could run

irap_single_lib -1 lib_1.fastq.gz -2 lib_2.fastq.gz -c irap.conf.file -o selected_output_folder

The resulting output files are placed in the directory selected_output_folder.
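When several paired-end libraries share one control file, the same call can be repeated per library after a single bootstrap. The following is a minimal sketch, not part of iRAP: the function name `process_paired_libs`, the `DRY_RUN` variable, and the `_1.fastq.gz`/`_2.fastq.gz` naming convention are assumptions made for illustration. Only the `-1`, `-2`, `-c`, and `-o` options come from the commands shown above.

```shell
# Sketch: call irap_single_lib once per paired-end library found in a directory.
# Assumes mates are named <lib>_1.fastq.gz and <lib>_2.fastq.gz (hypothetical convention).
# With DRY_RUN=1 (the default here) the commands are printed instead of executed.
# The body runs in a subshell, so its variables do not leak into the caller.
process_paired_libs() (
    fastq_dir=$1
    conf=$2
    out_dir=$3
    for r1 in "$fastq_dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue                 # unmatched glob stays literal: skip it
        r2="${r1%_1.fastq.gz}_2.fastq.gz"        # derive the mate file name
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: mate file $r2 not found" >&2
            continue
        fi
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo irap_single_lib -1 "$r1" -2 "$r2" -c "$conf" -o "$out_dir"
        else
            irap_single_lib -1 "$r1" -2 "$r2" -c "$conf" -o "$out_dir"
        fi
    done
)
```

Running `process_paired_libs fastq_dir irap.conf.file selected_output_folder` prints one command per complete pair; setting `DRY_RUN=0` executes them instead.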

Example

Start by downloading the data and creating the configuration file as described in this example.

For simplicity, assume that the control file is in the $IRAP_DIR folder. First, bootstrap by running irap_single_lib with the -0 option as follows:

cd $IRAP_DIR
irap_single_lib -c ecoli_example.conf -0 -o output_folder

To process one of the libraries downloaded in the examples, e.g., SRR933983.fastq.gz, the following command would be used:

irap_single_lib -1 data/reference/ecoli_k12/SRR933983.fastq.gz -c ecoli_example.conf -o output_folder

The output folder should contain the following files

output_folder/raw_data/ecoli_k12/SRR933983.cmd
output_folder/raw_data/ecoli_k12/SRR933983.complete
output_folder/raw_data/ecoli_k12/SRR933983.cram
output_folder/raw_data/ecoli_k12/SRR933983.cram.md5
output_folder/raw_data/ecoli_k12/SRR933983.data_info.tsv
output_folder/raw_data/ecoli_k12/SRR933983.fastq.gz.info
output_folder/raw_data/ecoli_k12/SRR933983.f.fastqc.tsv
output_folder/raw_data/ecoli_k12/SRR933983.se.genes.raw.htseq2.tsv
output_folder/raw_data/ecoli_k12/SRR933983.se.hits.bam.gene.stats
output_folder/raw_data/ecoli_k12/SRR933983.se.hits.bam.stats
output_folder/raw_data/ecoli_k12/SRR933983.se.hits.bam.stats.csv
output_folder/raw_data/ecoli_k12/SRR933983.time
output_folder/raw_data/ecoli_k12/SRR933983.versions.tsv
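
A quick way to confirm that a library was processed is to check for some of these files. The sketch below is not part of iRAP: the helper name `check_lib_outputs` is hypothetical, the suffixes are copied from the listing above, and treating the `.complete` file as a completion marker is an assumption based only on its name.

```shell
# Sketch: report which of a few expected per-library files are missing.
# Arguments: directory holding the outputs, library prefix (e.g. SRR933983).
# The body runs in a subshell, so its variables do not leak into the caller.
check_lib_outputs() (
    dir=$1
    lib=$2
    missing=0
    for suffix in complete cram cram.md5 versions.tsv; do
        if [ ! -e "$dir/$lib.$suffix" ]; then
            echo "missing: $dir/$lib.$suffix" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every checked file is present.
    [ "$missing" -eq 0 ]
)
```

For the example above, `check_lib_outputs output_folder/raw_data/ecoli_k12 SRR933983 && echo "all checked files present"` reports success only when the four checked files exist.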