Quality control pipeline for RamDA-seq experiments.

RamDAQ

RamDAQ is a computational pipeline for quality control (QC) of RamDA-seq experiments.

RamDA-seq is a single-cell total RNA sequencing method. The publication is here. The detailed protocol is here, and a kit is available from Toyobo.

RamDAQ executes preprocessing and analysis steps on RamDA-seq data and generates a QC report (see this example).

Requirements

Getting started

1. Installing Nextflow

  1. Make sure Java 8 or later is installed on your computer by running the command: java -version
  2. Enter the commands below in your terminal (they create a file named nextflow in ~/bin)
mkdir -p ~/bin
cd ~/bin
wget -qO- https://get.nextflow.io | bash
  3. Run the classic Hello world example by entering the following command: ~/bin/nextflow run hello
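
The Java check in the first step can also be scripted. The following is a minimal sketch, not part of RamDAQ or Nextflow: the function names java_major_version and java_is_supported are hypothetical, and the parsing assumes the usual version strings such as "1.8.0_292" or "11.0.2".

```shell
# Extract the major Java version from a version string such as
# "1.8.0_292" (Java 8) or "11.0.2" (Java 11).
java_major_version() {
    jv=$1
    case "$jv" in
        1.*) jv=${jv#1.} ;;      # legacy scheme: 1.8.x means Java 8
    esac
    echo "${jv%%[!0-9]*}"        # keep only the leading digits
}

# Succeed only if the given version string is Java 8 or later.
java_is_supported() {
    [ "$(java_major_version "$1")" -ge 8 ]
}

# Usage sketch (not run here): `java -version` writes to stderr, and the
# version is normally the quoted field on the first line.
#   java_is_supported "$(java -version 2>&1 | head -n 1 | cut -d '"' -f 2)" \
#       && echo "Java is recent enough for Nextflow"
```

If the first line of the java -version output looks different on your system, pass the version string to java_is_supported by hand instead.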

2. Installing Docker

  • For Mac and Windows users: Download installer from here.
  • For Linux users: Follow instructions for your platforms here.

3. Clone this repository

cd
git clone https://github.com/rikenbit/RamDAQ.git

4. Downloading example FASTQ files

cd $HOME/RamDAQ/examples/download_fastq_files
~/bin/nextflow run download-fastq.nf -c download-fastq.config
mkdir -p $HOME/RamDAQ_example/output_RamDA_human_NSC/human_NSC_001
mv output_download_fastq $HOME/RamDAQ_example/output_RamDA_human_NSC/human_NSC_001/01_fastq_files

For more information, see here.

5. Preparing annotation files

See Preparing human annotation files

6. Modifying config file

First, copy config file to the directory.

cd $HOME/RamDAQ_example
cp $HOME/RamDAQ/QC_SE/RamDAQ_SE_unstranded_human.config .

Then, modify RamDAQ_SE_unstranded_human.config using your favorite text editor as follows:

project_id = "RamDA_human_NSC"
run_ids = [
            ["human_NSC_001"]
	    ]
maxReadLength = 50
minReadLength = 36

7. Running RamDAQ basic pipeline

cd $HOME/RamDAQ_example

~/bin/nextflow run ~/RamDAQ/QC_SE/02_ramdaQC_SE_fastqmcf_fastQC.nf -c RamDAQ_SE_unstranded_human.config -resume -with-report log.02_ramdaQC_SE_fastqmcf_fastQC.html
~/bin/nextflow run ~/RamDAQ/QC_SE/03_ramdaQC_SE_hisat2.nf -c RamDAQ_SE_unstranded_human.config -resume -with-report log.03_ramdaQC_SE_hisat2.html
~/bin/nextflow run ~/RamDAQ/QC_SE/04_ramdaQC_SE_RSeQC.nf -c RamDAQ_SE_unstranded_human.config -resume -with-report log.04_ramdaQC_SE_RSeQC.html
~/bin/nextflow run ~/RamDAQ/QC_SE/06_ramdaQC_SE_featurecounts.nf -c RamDAQ_SE_unstranded_human.config -resume -with-report log.06_ramdaQC_SE_featurecounts.html
~/bin/nextflow run ~/RamDAQ/QC_SE/07_ramdaQC_SE_createnotebook.nf -c RamDAQ_SE_unstranded_human.config -resume -with-report log.07_ramdaQC_SE_createnotebook.html

Finally, the QC report is written in HTML format to $HOME/RamDAQ_example/output_RamDA_human_NSC/human_NSC_001/${run_id}_notebook_SE_unstranded.html.

Usage

1. Preparing annotation files

2. Preparing input FASTQ files

  • Naming convention: The extensions must be .fastq.gz.
  • FASTQ files must be located in a single directory: my_favorite_path/output_${project_id}/${run_id}/01_fastq_files (my_favorite_path can be any path!)
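
Before starting a run, it can help to confirm that the input directory follows this convention. The sketch below is only an illustration: check_fastq_dir is a hypothetical helper, not a RamDAQ script, and it only reports files, it does not rename or move anything.

```shell
# Report every regular file in a 01_fastq_files directory whose name
# does not end in .fastq.gz. Returns 0 only if no such file is found.
check_fastq_dir() {
    fq_dir=$1
    fq_bad=0
    for f in "$fq_dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and empty globs
        case "$f" in
            *.fastq.gz) ;;               # matches the naming convention
            *) echo "unexpected file name: $f"; fq_bad=1 ;;
        esac
    done
    return "$fq_bad"
}

# Usage sketch (the path is a placeholder):
#   check_fastq_dir my_favorite_path/output_RamDA_human_NSC/human_NSC_001/01_fastq_files
```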

For PE data

For PE FASTQ files, there are various file naming conventions (e.g., *_R1.fastq.gz or *.R1.fastq.gz), which makes file names difficult to parse. To avoid this difficulty, users need to prepare a TSV-formatted file called 'samplelist.txt' (in the config file for PE data, you will see fastq_filelist = 'samplelist.txt'). The header line of 'samplelist.txt' consists of three columns (Sample_ID, Fastq1, Fastq2), and each following line contains one FASTQ file pair (Read1 and Read2). See example.
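
For files that happen to follow the *_R1.fastq.gz / *_R2.fastq.gz pattern, a sample list can be drafted with a short script. This is a sketch under stated assumptions: make_samplelist is a hypothetical helper, and it assumes that the Fastq1 and Fastq2 columns hold plain file names. Compare the result with the example file before using it.

```shell
# Write a tab-separated sample list for PE data to stdout.
# ASSUMPTION: pairs are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz,
# and the Fastq1/Fastq2 columns contain file names rather than full paths.
make_samplelist() {
    sl_dir=$1
    printf 'Sample_ID\tFastq1\tFastq2\n'
    for r1 in "$sl_dir"/*_R1.fastq.gz; do
        [ -f "$r1" ] || continue
        name=$(basename "$r1")
        sample=${name%_R1.fastq.gz}      # strip the Read1 suffix
        r2="${sample}_R2.fastq.gz"
        if [ -f "$sl_dir/$r2" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$name" "$r2"
        else
            echo "warning: no Read2 file for $name" >&2
        fi
    done
}

# Usage sketch (the path is a placeholder):
#   make_samplelist path/to/01_fastq_files > samplelist.txt
```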

Converting FASTQ files from a BCL file

See tutorial on bcl2fastq.

3. Modifying config file

The following parameters in the *.config file should be changed.

  • project_id: (string) Project name, used in the output directory name output_${project_id}.
  • run_ids: (string) Run names, each used as a ${run_id} directory name under output_${project_id}.
  • maxReadLength: (int) Maximum read length to be retained after read trimming.
    • Note: For Illumina sequencer data, the last nucleotide of each sequenced read should be trimmed. In practice, if the read length of your FASTQ files is 51, 76, or 101, set maxReadLength to 50, 75, or 100, respectively.
  • minReadLength: (int) Minimum read length to be retained after read trimming. We recommend using half of the read length.
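
The two length parameters can be derived from the raw read length with simple arithmetic. The sketch below uses a hypothetical helper, suggest_read_lengths. It subtracts one nucleotide for maxReadLength (as in the 51 to 50 and 76 to 75 cases) and assumes that "half of the read length" means half of the raw read length, rounded down.

```shell
# Suggest trimming parameters from the raw read length of a FASTQ file.
# ASSUMPTION: minReadLength is half of the raw read length, rounded down.
suggest_read_lengths() {
    read_length=$1
    echo "maxReadLength = $((read_length - 1))"   # drop the last sequenced nucleotide
    echo "minReadLength = $((read_length / 2))"
}

suggest_read_lengths 51
# maxReadLength = 50
# minReadLength = 25
```

Treat the output as a starting point only: the Getting started config above sets minReadLength = 36 together with maxReadLength = 50.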

4-1. Running RamDAQ basic pipeline on single-end (SE) data

# Move to my_favorite_path, the directory that contains `output_${project_id}/${run_id}/01_fastq_files` with your FASTQ files
cd my_favorite_path

# Copy a config file (The example here is in case of SE)
cp ~/RamDAQ/QC_SE/RamDAQ_SE_unstranded_human.config .

# Rewrite RamDAQ_SE_unstranded_human.config (See instruction above)

# Run RamDAQ pipeline
~/bin/nextflow run ~/RamDAQ/QC_SE/02_ramdaQC_SE_fastqmcf_fastQC.nf -c RamDAQ_SE_unstranded_human.config -resume -with-report log.02_ramdaQC_SE_fastqmcf_fastQC.html
~/bin/nextflow run ~/RamDAQ/QC_SE/03_ramdaQC_SE_hisat2.nf -c RamDAQ_SE_unstranded_human.config -resume -with-report log.03_ramdaQC_SE_hisat2.html
~/bin/nextflow run ~/RamDAQ/QC_SE/04_ramdaQC_SE_RSeQC.nf -c RamDAQ_SE_unstranded_human.config -resume -with-report log.04_ramdaQC_SE_RSeQC.html
~/bin/nextflow run ~/RamDAQ/QC_SE/06_ramdaQC_SE_featurecounts.nf -c RamDAQ_SE_unstranded_human.config -resume -with-report log.06_ramdaQC_SE_featurecounts.html
~/bin/nextflow run ~/RamDAQ/QC_SE/07_ramdaQC_SE_createnotebook.nf -c RamDAQ_SE_unstranded_human.config -resume -with-report log.07_ramdaQC_SE_createnotebook.html

The output files are saved in my_favorite_path/output_${project_id}.
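
Because each step depends on the output of the previous one, the five commands can be wrapped in a loop that stops at the first failure. This is a minimal sketch, not a script shipped with RamDAQ: run_step, run_ramdaq_se, and the DRY_RUN variable are hypothetical names, and the paths assume the installation layout used in Getting started.

```shell
# Run a command, or only print it when DRY_RUN=1 is set.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run the SE steps in order with the given config file and
# stop as soon as one of them fails.
run_ramdaq_se() {
    se_config=$1
    for step in 02_ramdaQC_SE_fastqmcf_fastQC 03_ramdaQC_SE_hisat2 \
                04_ramdaQC_SE_RSeQC 06_ramdaQC_SE_featurecounts \
                07_ramdaQC_SE_createnotebook; do
        run_step "$HOME/bin/nextflow" run "$HOME/RamDAQ/QC_SE/$step.nf" \
            -c "$se_config" -resume -with-report "log.$step.html" || {
            echo "step $step failed; stopping" >&2
            return 1
        }
    done
}

# Usage sketch: print the commands first, then run them for real.
#   DRY_RUN=1 run_ramdaq_se RamDAQ_SE_unstranded_human.config
#   run_ramdaq_se RamDAQ_SE_unstranded_human.config
```

Since every command uses -resume, rerunning the loop after fixing a failed step should reuse the results that Nextflow has already cached.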

4-2. Running RamDAQ basic pipeline on paired-end (PE) data

# Make a directory with your favorite name
mkdir my_favorite_path

# Move into the directory
cd my_favorite_path

# Copy a config file (The example here is in case of PE)
cp ~/RamDAQ/QC_PE/ramdaQC_PE_unstranded.config .

# Modify ramdaQC_PE_unstranded.config (See instruction above)

# Run RamDAQ pipeline
~/bin/nextflow run ~/RamDAQ/QC_PE/02_ramdaQC_PE_fastqmcf_fastQC.nf -c ramdaQC_PE_unstranded.config -resume -with-report log.02_ramdaQC_PE_fastqmcf_fastQC.html
~/bin/nextflow run ~/RamDAQ/QC_PE/03_ramdaQC_PE_hisat2.nf -c ramdaQC_PE_unstranded.config -resume -with-report log.03_ramdaQC_PE_hisat2.html
~/bin/nextflow run ~/RamDAQ/QC_PE/04_ramdaQC_PE_RSeQC.nf -c ramdaQC_PE_unstranded.config -resume -with-report log.04_ramdaQC_PE_RSeQC.html
~/bin/nextflow run ~/RamDAQ/QC_PE/06_ramdaQC_PE_featurecounts.nf -c ramdaQC_PE_unstranded.config -resume -with-report log.06_ramdaQC_PE_featurecounts.html
~/bin/nextflow run ~/RamDAQ/QC_PE/07_ramdaQC_PE_createnotebook.nf -c ramdaQC_PE_unstranded.config -resume -with-report log.07_ramdaQC_PE_createnotebook.html

The output files are saved in my_favorite_path/output_${project_id}.

Exclude 'blacklist' cells/samples from notebook reports (Optional)

If you'd like to exclude cells/samples that you are not interested in (e.g., RT(-) cells or blank samples) from notebook reports, all you need to do is prepare a text file named exclude_samplelist.txt under the ${run_id}/ directory.

exclude_samplelist.txt is a list of cell/sample names to be excluded from notebook reports, where each line contains one cell/sample name/id.
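
Such a file can be written by hand or from the command line. In the sketch below, write_exclude_list is a hypothetical helper, and the sample names in the usage line are made-up placeholders. Use the cell/sample names from your own run.

```shell
# Write exclude_samplelist.txt into a run directory,
# with one cell/sample name per line.
write_exclude_list() {
    ex_dir=$1
    shift
    [ "$#" -gt 0 ] || { echo "no sample names given" >&2; return 1; }
    printf '%s\n' "$@" > "$ex_dir/exclude_samplelist.txt"
}

# Usage sketch (directory and names are placeholders):
#   write_exclude_list my_favorite_path/output_RamDA_human_NSC/human_NSC_001 \
#       RT_minus_01 blank_01
```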

What occurs in RamDAQ pipeline

RamDAQ basic pipeline

  1. Read trimming (fastq-mcf)
  2. Read mapping (HISAT2)
  3. Expression level quantification (featureCounts)
  4. Automatic reporting (Jupyter notebook)

RamDAQ optional pipelines

  • FASTQ file generation (bcl2fastq)
  • Expression level quantification (Sailfish)
  • High sensitivity rRNA quantification (HISAT2, featureCounts)
    • What is 'High sensitivity rRNA mapping'? See here.

Computational time and resource usage

  • Machine: Linux-x64 / 24 CPU / Memory 660 GB
  • Data: RamDA-seq data (n=96) on human neural stem cells (NSC)
    • 1,999,153 reads/cell, 51 nt SE, GC% 44.56
  • Computational time
    • 02_ramdaQC_SE_fastqmcf_fastQC: 5m 39s
    • 03_ramdaQC_SE_hisat2: 25m 34s
    • 04_ramdaQC_SE_RSeQC: 10h 22m 7s
      • In this example, one file takes about 3 hours. You can estimate the total time for this step from the number of files and the number of CPUs.
    • 06_ramdaQC_SE_featurecounts: 7m 49s
  • Memory usage
    • up to ~6 GB
  • Storage
    • around 300GB
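
The RSeQC estimate mentioned above can be worked out with a small calculation. This is a rough sketch only: estimate_hours is a hypothetical helper, and it assumes that each file occupies one CPU and that files are processed in parallel rounds, which may not match how the step is actually scheduled.

```shell
# Rough wall-clock estimate, in hours, for a step that handles files one by one.
# ASSUMPTION: one file per CPU, with files processed in parallel rounds.
estimate_hours() {
    n_files=$1
    hours_per_file=$2
    n_cpus=$3
    rounds=$(( (n_files + n_cpus - 1) / n_cpus ))   # number of rounds, rounded up
    echo $(( rounds * hours_per_file ))
}

estimate_hours 96 3 24   # 4 rounds of 24 files at about 3 hours each
# 12
```

For the benchmark above (96 files, about 3 hours per file, 24 CPUs) this gives 12 hours, the same order as the measured 10h 22m.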

Contact

  • Issues
  • Email: support-bit (at) riken (dot) jp

Maintainers

  • Mika Yoshimura
  • Haruka Ozaki