SPEctRA RNAseq Pipeline:


###Getting Started:###

  1. git clone ""
  2. cd /SPEctRA/src/
  3. Edit config_template.yaml to set the paths to your home and genome directories. Using scratch space is recommended.
  4. Create an input file for the mapping run, using pipeline_start_template.yaml as a template for the required parameters.

######Pipeline Setup: Please refer to this example configuration YAML file:

  • The Environment header corresponds to the Linux shell environment where you submit pipeline-generated jobs, which can be either a cluster or a single server.

This would look as follows for a cluster:

           cluster: Minerva

OR for a local/remote server:

           server: local
  • The project_directory header specifies the absolute path to the directory where all pipeline-run tasks are saved. Each task is placed in its own subdirectory, as outlined in the Job execution file.

  • For cluster environments, the following Short-read_aligners are supported: tophat and STAR. Each of the corresponding subheadings must be set to the respective module name or path. Example:

  • tophat2: tophat/2.0.12

  • bowtie2: bowtie2/2.1.0 (the Bowtie module must be specified whenever tophat is used)

  • STAR: rna-star/2.3.0e

    To execute SPEctRA locally, please ensure that tophat2, bowtie2, and samtools are added to your PATH.

  • The genomes header outlines the paths to the genomic reference and annotation files used for mapping and QC. As long as the following subheader hierarchy is adhered to, the pipeline can support any built genome for the tophat and STAR short-read aligners. The key subheading is the organism name. For example, a mouse genome uses the following YAML structure:

           rRNApath: /scratch/purusi01/Mus_musculus/Ensembl/NCBIM37/Annotation/Genes/rRNA.bed
            gtf: /scratch/purusi01/Mus_musculus/Ensembl/NCBIM37/Annotation/Genes/genes.gtf
            index: /scratch/purusi01/Mus_musculus/Ensembl/NCBIM37/Sequence/Bowtie2Index/genome
             path: /scratch/purusi01/mm9_star
  • Please provide absolute paths to the rRNA BED file, the GTF file, the genome index (for tophat), and the STAR genome in rRNApath, gtf, index (under the tophat2 subheading), and path (under the STAR subheading), respectively. Note: mapping rates to exonic, intronic, intragenic, and intergenic features are not yet supported.
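Before submitting jobs, it can help to confirm that every genome path is absolute and, for local runs, that the required tools are on your PATH. The following is a minimal sketch, not part of SPEctRA: the helper names `relative_paths` and `missing_tools` are hypothetical, and the nesting of the `genomes` dictionary (rRNApath and gtf directly under the organism, index under tophat2, path under STAR) is an assumption inferred from the description above. Only absoluteness is checked, because the Bowtie2 index value is an index prefix rather than a single file.

```python
import posixpath
import shutil


def relative_paths(node, prefix=()):
    """Return (key_path, value) pairs for string leaves that are not absolute paths."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            problems.extend(relative_paths(value, prefix + (key,)))
    elif isinstance(node, str) and not posixpath.isabs(node):
        problems.append((".".join(prefix), node))
    return problems


def missing_tools(tools=("tophat2", "bowtie2", "samtools")):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed layout of the genomes section once the YAML has been loaded.
genomes = {
    "mouse": {
        "rRNApath": "/scratch/purusi01/Mus_musculus/Ensembl/NCBIM37/Annotation/Genes/rRNA.bed",
        "gtf": "/scratch/purusi01/Mus_musculus/Ensembl/NCBIM37/Annotation/Genes/genes.gtf",
        "tophat2": {
            "index": "/scratch/purusi01/Mus_musculus/Ensembl/NCBIM37/Sequence/Bowtie2Index/genome",
        },
        "STAR": {"path": "/scratch/purusi01/mm9_star"},
    }
}

if __name__ == "__main__":
    print("Relative paths:", relative_paths(genomes))
    print("Tools missing from PATH:", missing_tools())
```

An empty list from `relative_paths` means every configured path is absolute; the result of `missing_tools` depends on the machine where it runs.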

######Pipeline Execution: Please refer to the Job execution YAML file

  • project_Name identifies the specific analysis (for example: RNAseq_mouse_case_vs_control) and names the directory that is created within the project_directory path set in the configuration YAML file.

  • mapping sets up the pipeline for genome alignment. Please provide the following data in the subheadings only:

  • fastQ_directory_path is the directory where your fastq files are stored. Note: data provided by the sequencing core follows a strict directory layout:

    • Project_Name > Sample_Name > Sample_Name_R1.fq, Sample_Name_R2.fq
  • proc is the number of processors required (integer)

  • aligner is the short-read aligner to use. It maps back to tophat and STAR in the configuration YAML file.

  • genome refers back to the organism name in the config file, and specifically to the built genome corresponding to the short-read aligner chosen.

  • strand: leave blank for now. Paired-end support is currently being tested. Leaving strand blank defaults to "fr-unstranded" in tophat for single-end reads.

  • An example pipeline execution file is as follows:

              project_Name: minerva_test
                    fastQ_directory_path: /scratch/purusi01/test_fastq_pipeline/
                    proc: 20
                    aligner: tophat2
                    genome: mouse
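The parameters above can be sanity-checked before a run. The sketch below is illustrative only and not part of SPEctRA: `check_mapping` and `find_samples` are hypothetical helpers, the accepted aligner values (tophat2 and STAR) are assumed from the configuration keys shown earlier, and fastQ_directory_path is assumed to point at the project-level directory of the Project_Name > Sample_Name layout.

```python
from pathlib import Path

# Assumed aligner names, taken from the keys used in the configuration example.
SUPPORTED_ALIGNERS = {"tophat2", "STAR"}


def check_mapping(params):
    """Return a list of human-readable problems found in the mapping parameters."""
    problems = []
    proc = params.get("proc")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(proc, bool) or not isinstance(proc, int) or proc < 1:
        problems.append("proc must be a positive integer")
    if params.get("aligner") not in SUPPORTED_ALIGNERS:
        problems.append("aligner must be one of: " + ", ".join(sorted(SUPPORTED_ALIGNERS)))
    return problems


def find_samples(fastq_dir):
    """Map each sample directory name to the R1/R2 fastq files found inside it."""
    samples = {}
    for sample_dir in sorted(Path(fastq_dir).iterdir()):
        if not sample_dir.is_dir():
            continue
        name = sample_dir.name
        expected = [sample_dir / f"{name}_R{n}.fq" for n in (1, 2)]
        samples[name] = [path for path in expected if path.is_file()]
    return samples


if __name__ == "__main__":
    mapping = {"proc": 20, "aligner": "tophat2", "genome": "mouse"}
    print(check_mapping(mapping))
```

A sample that maps to an empty list has no files matching the expected names, which usually points to a naming or directory-layout mismatch.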

Once these parameters are specified, the pipeline is ready to run:


     python ./src/ -p {config file}.yaml


A Scalable Pipeline for RNA-seq Analysis


