RNA-seq Scientific Workflow

Workflow for RNA sequencing using the Parallel Scripting Library - Parsl.

Reference: Cruz, L., Coelho, M., Terra, R., Carvalho, D., Gadelha, L., Osthoff, C., & Ocaña, K. (2021). Workflows Científicos de RNA-Seq em Ambientes Distribuídos de Alto Desempenho: Otimização de Desempenho e Análises de Dados de Expressão Diferencial de Genes. In Anais do XV Brazilian e-Science Workshop, p. 57-64. Porto Alegre: SBC. DOI: https://doi.org/10.5753/bresci.2021.15789

Requirements

Python >= 3.8.2
- Parsl == 1.0.0
- HTSeq == 0.13.5
Bowtie2 == 2.3
Samtools == 1.10
Picard == 2.18
R >= 3.5.2
- DESeq2 >= 1.22.2

How to install requirements

In order to use RNA-seq Workflow the following tools must be available:

Python (version >= 3.8.2)

ParslRNA-Seq was tested on Python, version 3.8.2.

Parsl - Parallel Scripting Library (version 1.0.0)

The recommended way to install Parsl is the approach suggested in Parsl's documentation:

python3 -m pip install parsl

Bowtie2, version 2.3

You can install Bowtie2 by downloading and extracting the prebuilt release archive:

bowtie2-2.3.5.1-linux-x86_64.zip

Or

sudo yum install bowtie2-2.3.5-linux-x86_64

Samtools, version 1.10

Samtools is a suite of programs for interacting with high-throughput sequencing data.

Picard, version 2.18

Picard is a set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.

HTSeq, version 0.13.5

HTSeq is a native Python library that follows the conventions of many Python packages. You can install it by running:

pip install HTSeq

HTSeq uses NumPy, Pysam and matplotlib. Make sure these libraries are installed.
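As a quick way to confirm that HTSeq and its dependencies can be imported, a minimal sketch like the following may help. The helper name `missing_modules` is hypothetical and not part of this repository; the import names `HTSeq`, `numpy`, `pysam` and `matplotlib` are the usual ones for these packages.

```python
from importlib.util import find_spec

# Import names for HTSeq and the libraries it depends on.
REQUIRED_MODULES = ["HTSeq", "numpy", "pysam", "matplotlib"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing Python packages:", ", ".join(missing))
else:
    print("All required Python packages were found.")
```

Any package reported as missing can then be installed with pip in the same Python environment.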

R, version 3.5.2

To use the DESeq2 script, make sure the R language is also installed. You can install it by running:

sudo apt install r-base
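Before running the workflow, it can be useful to check that the command-line tools are reachable on your PATH. The sketch below is only an illustration: the helper `missing_tools` is hypothetical, and the executable names (`bowtie2`, `samtools`, `htseq-count`, `Rscript`, and `java` for running Picard) are assumptions about how the tools are exposed on your system.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["python3", "bowtie2", "samtools", "htseq-count", "Rscript", "java"]


def missing_tools(names, which=shutil.which):
    """Return the executables from `names` that `which` cannot find on PATH."""
    return [name for name in names if which(name) is None]


not_found = missing_tools(REQUIRED_TOOLS)
if not_found:
    print("Not found on PATH:", ", ".join(not_found))
else:
    print("All required executables were found.")
```

This only checks that the executables exist; it does not verify the versions listed in the requirements.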

Running the workflow

First, create a comma-separated values (CSV) file. Its first line must be the header sampleName,fileName,condition, with no spaces between items. You can use the file "table.csv" in this repository as an example. Your CSV file will look like this:

sampleName,fileName,condition
tissue control 1,SRR5445794.merge.count,control
tissue control 2,SRR5445795.merge.count,control
tissue control 3,SRR5445796.merge.count,control
tissue wntup 1,SRR5445797.merge.count,wntup
tissue wntup 2,SRR5445798.merge.count,wntup
tissue wntup 3,SRR5445799.merge.count,wntup
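If you prefer to generate this table programmatically, the following minimal sketch uses Python's standard `csv` module. The function name `build_sample_table` is hypothetical; the header and rows are the ones from the example above.

```python
import csv
import io

HEADER = ["sampleName", "fileName", "condition"]


def build_sample_table(samples):
    """Return CSV text for an iterable of (sampleName, fileName, condition) rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(samples)
    return buffer.getvalue()


samples = [
    ("tissue control 1", "SRR5445794.merge.count", "control"),
    ("tissue control 2", "SRR5445795.merge.count", "control"),
    ("tissue control 3", "SRR5445796.merge.count", "control"),
    ("tissue wntup 1", "SRR5445797.merge.count", "wntup"),
    ("tissue wntup 2", "SRR5445798.merge.count", "wntup"),
    ("tissue wntup 3", "SRR5445799.merge.count", "wntup"),
]

table_text = build_sample_table(samples)
print(table_text, end="")
```

The resulting text can be written to a file such as table.csv. The writer places no spaces after the commas, which matches the requirement stated above.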

The command-line arguments passed to the Python script, after the script's name, must be, in this order:

The indexed genome;
The number of threads, or split files, used by the bowtie, sort, split and htseq tasks;
The path to the directory containing the input FASTQ read files;
The name of the directory where the output files must be placed;
The GTF file;
and, lastly, the DESeq script.

Make sure all the files necessary to run the workflow are in the same directory, with the FASTQ files in a dedicated folder that serves as the input directory. The command line will look like this:

python3 rna-seq.py ../mm9/mm9 24 ../inputs/ ../outputs ../Mus_musculus.NCBIM37.67.gtf ../DESeq.R
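To make the order of the six positional arguments concrete, here is a minimal sketch using `argparse`. It is not the parsing code used by rna-seq.py; the argument names (`genome_index`, `threads`, and so on) are hypothetical labels chosen for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the six positional arguments, in the documented order."""
    parser = argparse.ArgumentParser(description="Illustrative RNA-seq workflow arguments")
    parser.add_argument("genome_index", help="prefix of the indexed genome")
    parser.add_argument("threads", type=int, help="number of threads or split files")
    parser.add_argument("input_dir", help="directory containing the input read files")
    parser.add_argument("output_dir", help="directory where output files are placed")
    parser.add_argument("gtf_file", help="GTF annotation file")
    parser.add_argument("deseq_script", help="path to the DESeq script")
    return parser


# The same values as in the example command line above.
example = [
    "../mm9/mm9",
    "24",
    "../inputs/",
    "../outputs",
    "../Mus_musculus.NCBIM37.67.gtf",
    "../DESeq.R",
]
args = build_parser().parse_args(example)
print(args.genome_index, args.threads, args.input_dir)
```

Because all arguments are positional, swapping two of them (for example, the GTF file and the DESeq script) would not be detected by the parser, so the order matters.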

Remember to adjust the multithreading/multicore parameter (the number of threads) to match your computational environment. For example, if your machine has 8 cores, set the parameter to 8.
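If you are unsure how many cores your machine has, a small sketch such as the one below can suggest a value. The function `suggested_threads` and its `reserve` parameter are hypothetical; `os.cpu_count()` reports logical cores, which may differ from what a cluster scheduler actually grants you.

```python
import os


def suggested_threads(reserve=0):
    """Return the logical core count minus `reserve`, never less than 1."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, available - reserve)


print("Suggested number of threads:", suggested_threads())
```

On a shared machine, passing a non-zero `reserve` leaves some cores free for other processes.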

Running the workflow in DOCKER

ParslRNA-Seq is also available on Docker. You can pull it from DockerHub by running the following command:

docker pull lucruzz/parslrna-seq

To run it, create a directory on the host machine with the following hierarchy of directories and mount them in the container:

inputs
- input files
outputs
table.csv
gtf
- file.gtf
genomic_base
- genomic-base-files
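The directory hierarchy above could be created with a short script like this sketch. The function name `create_layout` is hypothetical, and the example runs in a temporary directory only for demonstration; in practice you would pass the host directory you intend to mount. The input files, table.csv, the GTF file and the genome index files still need to be copied in by hand.

```python
import tempfile
from pathlib import Path

# Subdirectories from the hierarchy described above.
SUBDIRS = ["inputs", "outputs", "gtf", "genomic_base"]


def create_layout(root):
    """Create the expected subdirectories under `root` and return it as a Path."""
    root = Path(root)
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    layout_root = create_layout(tmp)
    created = sorted(p.name for p in layout_root.iterdir() if p.is_dir())
    print(created)
```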

To run ParslRNA-Seq in the container, run the following command and keep monitoring the outputs directory:

sudo docker run -d -v host_machine_directory:/workdir \
  -e RNASEQ_TABLE_CSV=/workdir/table.csv \
  -e RNASEQ_GENETIC_BASE=/workdir/genomic_base/file_prefix \
  -e RNASEQ_NUM_THREADS=4 -e RNASEQ_INPUTS=/workdir/inputs/ \
  -e RNASEQ_OUTPUTS=/workdir/outputs/ -e RNASEQ_GTF=/workdir/gtf/file.gtf \
  rnaseq:1.0 /RNA-seq/rna-seq.sh
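Because the command sets several environment variables, it is easy to mistype. The sketch below assembles the same argument list in Python so it can be inspected before running. The function `docker_run_command` is hypothetical, and the host directory, genome prefix and GTF path in the example are placeholders to replace with your own; the environment variable names, image name and script path are taken from the command above.

```python
import shlex


def docker_run_command(host_dir, genome_prefix, gtf_path, threads=4, image="rnaseq:1.0"):
    """Return the docker run argument list for the container described above."""
    env = {
        "RNASEQ_TABLE_CSV": "/workdir/table.csv",
        "RNASEQ_GENETIC_BASE": genome_prefix,
        "RNASEQ_NUM_THREADS": str(threads),
        "RNASEQ_INPUTS": "/workdir/inputs/",
        "RNASEQ_OUTPUTS": "/workdir/outputs/",
        "RNASEQ_GTF": gtf_path,
    }
    command = ["docker", "run", "-d", "-v", f"{host_dir}:/workdir"]
    for key, value in env.items():
        command += ["-e", f"{key}={value}"]
    command += [image, "/RNA-seq/rna-seq.sh"]
    return command


# Placeholder paths; replace them with the ones that match your layout.
cmd = docker_run_command(
    host_dir="/path/to/host_directory",
    genome_prefix="/workdir/genomic_base/file_prefix",
    gtf_path="/workdir/gtf/file.gtf",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell (prefixed with sudo if your Docker setup requires it), or the list can be passed to `subprocess.run` directly.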
