These scripts process RNA-seq data using the Subread pipeline. The scripts are designed to be used with the Slurm scheduler on the RCC Midway2 cluster at the University of Chicago, but can be adapted to other computing infrastructure.
- Install the R/Bioconductor packages biomaRt and Rsubread:

      source("https://bioconductor.org/biocLite.R")
      biocLite(c("biomaRt", "Rsubread"))
- Clone this repository with git clone, or download and unzip this zip file.
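Before submitting anything, it can help to confirm that the command-line tools used in this README are on the PATH. The sketch below is a hypothetical helper and is not part of this repository: the function name check_tools is made up for illustration, and the tool list in the example comment (git, Rscript, sbatch) is inferred from the commands shown in this README.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of this repository).
# Reports every command from the argument list that is not on PATH
# and returns non-zero if at least one is missing.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call on the cluster head node:
#   check_tools git Rscript sbatch
```

The function only checks that the commands exist; it does not verify that the R packages themselves are installed.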
The scripts should be submitted in the following order. Always run them from the head node in the root of the project directory.
- Download and index the genome with download-genome.R:

      sbatch --mem=12G --partition=broadwl download-genome.R
- Download and format the exons with download-exons.R:

      sbatch --mem=2G --partition=broadwl download-exons.R
- Submit mapping jobs with submit-subread.sh, which calls run-subread.R for each fastq file:

      bash submit-subread.sh
- Submit counting jobs with submit-subread.sh, which calls run-featurecounts.R for each BAM file:

      bash submit-subread.sh
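The per-file submission described in the last two steps can be sketched as a loop over the input files. This is a hypothetical illustration, and the actual submit-subread.sh in this repository may work differently. It assumes that run-subread.R accepts the fastq path as its only argument, and it adds a DRY_RUN variable (not part of the original scripts) so the commands can be inspected before anything is sent to Slurm. No memory request is shown because the README does not state one for the mapping jobs.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a per-file submission loop; the real
# submit-subread.sh in this repository may differ.
# Assumption: run-subread.R accepts the fastq path as its only argument.
# Set DRY_RUN=1 to print the sbatch commands instead of submitting them.
submit_mapping_jobs() {
  local fastq_dir="${1:-fastq}"
  local fq
  for fq in "$fastq_dir"/*.fastq.gz; do
    # If the glob matched nothing, the pattern is left unexpanded; skip it.
    [ -e "$fq" ] || continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "sbatch --partition=broadwl run-subread.R $fq"
    else
      sbatch --partition=broadwl run-subread.R "$fq"
    fi
  done
}

# Example, run from the root of the project directory:
#   DRY_RUN=1 submit_mapping_jobs fastq
```

A counting step would follow the same pattern, looping over the BAM files in bam/ and calling run-featurecounts.R instead.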
These scripts expect the following directory structure. Only fastq/ has to be created manually. The scripts will create bam/, counts/, and genome/.
.
├── bam
├── counts
├── fastq
│   ├── YG-172S-S8-8_S125_L006_R1_001.fastq.gz
│   └── YG-172S-S8-9_S126_L006_R1_001.fastq.gz
└── genome
- fastq/ - Contains raw data in fastq.gz files.
- bam/ - Contains the mapped reads in BAM files.
- counts/ - Contains the read counts in text files.
- genome/ - Contains the indexed genome and the exon coordinates in SAF format.
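The layout above can be checked before submitting jobs. The following is a minimal sketch of such a check, not a script from this repository; the function name check_layout and its messages are hypothetical. It treats fastq/ as required, since that is the only directory that has to be created manually, and only reports on the directories that the scripts create themselves.

```shell
#!/usr/bin/env bash
# Hypothetical layout check (not part of this repository).
# Fails if fastq/ is missing or holds no fastq.gz files; the other
# directories are only reported, since the scripts create them.
check_layout() {
  local root="${1:-.}"
  local dir
  local fq
  local found=0
  if [ ! -d "$root/fastq" ]; then
    echo "error: $root/fastq does not exist" >&2
    return 1
  fi
  for fq in "$root"/fastq/*.fastq.gz; do
    if [ -e "$fq" ]; then
      found=1
      break
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "error: no fastq.gz files in $root/fastq" >&2
    return 1
  fi
  for dir in bam counts genome; do
    if [ -d "$root/$dir" ]; then
      echo "$dir/: present"
    else
      echo "$dir/: not yet created"
    fi
  done
}

# Example, run from the root of the project directory:
#   check_layout .
```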
The code is available under the CC0 license, i.e. you are free
to do whatever you like with this code, but it comes with no
guarantees. See the file LICENSE
for full details.