Nuno Fonseca edited this page Jun 1, 2018 · 12 revisions

Introduction

  • Experiment: E-GEOD-48829

  • Species: Escherichia coli

  • Assumptions: iRAP was installed and configured.

  • The full analysis should take less than two hours to run.

Set up the data

  • Create the reference and raw_data directories for the species under the data folder

    mkdir -p $IRAP_DIR/data/reference/ecoli_k12
    mkdir -p $IRAP_DIR/data/raw_data/ecoli_k12
  • Download the genome and annotation files into the species reference folder

    cd $IRAP_DIR/data/reference/ecoli_k12
    wget -c ftp://ftp.ensemblgenomes.org/pub/release-37/bacteria/fasta/bacteria_122_collection/escherichia_coli_k_12_gca_000981485/dna/Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.dna.chromosome.I.fa.gz
    wget -c ftp://ftp.ensemblgenomes.org/pub/release-37/bacteria/gtf/bacteria_122_collection/escherichia_coli_k_12_gca_000981485/Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf.gz
    gunzip -f Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf.gz
    # remove the header lines (starting with #) from the GTF
    grep -v "^#" Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf > tmp && mv tmp Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf
  • Download the FASTQ files into the species raw_data folder

    cd $IRAP_DIR/data/raw_data/ecoli_k12
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933983/SRR933983.fastq.gz
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933984/SRR933984.fastq.gz
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933985/SRR933985.fastq.gz
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933989/SRR933989.fastq.gz
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933990/SRR933990.fastq.gz
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933991/SRR933991.fastq.gz
    # the remaining runs are left commented out to keep the example small
    #wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933986/SRR933986.fastq.gz
    #wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933988/SRR933988.fastq.gz
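
A quick sanity check after the downloads can catch truncated transfers and a leftover GTF header before iRAP is run. The following is a minimal sketch, not part of iRAP: check_gz and gtf_has_no_header are hypothetical helper names, and it assumes gzip and grep are available on the PATH.

```shell
# Sketch only: check_gz and gtf_has_no_header are hypothetical helpers.

# Succeed only if every given .gz file exists and passes gzip's integrity test.
check_gz() {
    for f in "$@"; do
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt or missing: $f" >&2
            return 1
        fi
    done
    return 0
}

# Succeed only if the GTF is readable and has no lines starting with '#'.
gtf_has_no_header() {
    [ -r "$1" ] || return 2
    if grep -q '^#' "$1"; then
        return 1
    fi
    return 0
}

# Example usage (paths follow the steps above):
#   check_gz "$IRAP_DIR"/data/raw_data/ecoli_k12/*.fastq.gz
#   gtf_has_no_header "$IRAP_DIR"/data/reference/ecoli_k12/Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf
```

Both helpers only report through their exit status (plus a message on stderr for check_gz), so they can be chained with && in front of the irap command.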

Configuration/control file

  • Create iRAP’s experiment configuration/control file

    # experiment name
    name=ecoli_ex
    # species
    species=ecoli_k12
    # reference genome
    reference=Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.dna.chromosome.I.fa.gz
    # gtf file
    gtf_file=Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf
    #
    user_trans=auto
    # Enable filtering based on quality
    qual_filtering=on
    # Contamination data set used to filter out reads (no = disabled)
    cont_index=no
    # Toplevel directory with the data
    data_dir=$(IRAP_DIR)/data
    mapper=bowtie2
    # some contrasts...
    # GA=Group A
    contrasts=GAvsGB GBvsGA
    GAvsGB=GA GB
    GBvsGA=GB GA
    GA=FA FB FC
    GB=FD FE
    se=FA FB FC FD FE
    FA=SRR933983.fastq.gz
    FA_rs=50
    FA_qual=33
    FB=SRR933984.fastq.gz
    FB_rs=50
    FB_qual=33
    FC=SRR933985.fastq.gz
    FC_rs=50
    FC_qual=33
    FD=SRR933989.fastq.gz
    FD_rs=50
    FD_qual=33
    FE=SRR933990.fastq.gz
    FE_rs=50
    FE_qual=33
    FF=SRR933991.fastq.gz
    FF_rs=50
    FF_qual=33
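
Because each library listed in se= must map to a FASTQ file that exists in the raw_data folder, a small check before running iRAP can catch a misspelled name or a file that was never downloaded. This is a minimal sketch, not an iRAP feature: conf_value and check_conf_libs are hypothetical helpers, and the parsing assumes simple key=value lines like the ones above.

```shell
# Sketch only: conf_value and check_conf_libs are hypothetical helpers.

# Print the value of the first "key=value" line for the given key.
# Usage: conf_value <conf file> <key>
conf_value() {
    sed -n "s/^[[:space:]]*$2=//p" "$1" | head -n 1
}

# Print one line per problem found for the libraries listed in se=;
# return non-zero if any library has no file defined or the file is missing.
# Usage: check_conf_libs <conf file> <raw data dir>
check_conf_libs() {
    conf=$1
    raw_dir=$2
    status=0
    for lib in $(conf_value "$conf" se); do
        file=$(conf_value "$conf" "$lib")
        if [ -z "$file" ]; then
            echo "$lib: no file defined"
            status=1
        elif [ ! -f "$raw_dir/$file" ]; then
            echo "$lib: $file not found in $raw_dir"
            status=1
        fi
    done
    return $status
}

# Example usage:
#   check_conf_libs ecoli_example.conf "$IRAP_DIR/data/raw_data/ecoli_k12"
```

The sketch only looks at single-end libraries (se=); libraries that are defined but not listed there are ignored.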

Running iRAP

  • It is assumed that the configuration file is named ecoli_example.conf

  • Dry run to validate the configuration file and see all commands that will be executed (-n option)

irap conf=ecoli_example.conf mapper=hisat2 de_method=deseq max_threads=8 -n

  • Run iRAP to process the experiment

irap conf=ecoli_example.conf mapper=hisat2 de_method=deseq max_threads=8

  • Output files

    • Filtered FASTQ files

ls ecoli_ex/irap_qc/*.f.fastq.gz

    • BAM files

ls ecoli_ex/irap_qc/hisat2/*.bam

    • Gene level quantification

ls ecoli_ex/irap_qc/hisat2/htseq2/genes.raw.htseq2.tsv

    • Transcript/isoform level quantification

ls ecoli_ex/irap_qc/hisat2/htseq2/transcripts.raw.htseq2.tsv

    • Exon level quantification

ls ecoli_ex/irap_qc/hisat2/htseq2/exons.raw.htseq2.tsv

    • Differential expression

ls ecoli_ex/irap_qc/hisat2/htseq2/deseq/*.genes_de.tsv

  • Run iRAP to process the experiment with a different set of methods

irap conf=ecoli_example.conf mapper=hisat2 quant_method=htseq2 de_method=deseq2 max_threads=8

irap conf=ecoli_example.conf mapper=none quant_method=kallisto de_method=deseq2 max_threads=8
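
After a run finishes, the individual ls commands above can be folded into one check that reports which expected outputs are missing. This is a minimal sketch: missing_outputs is a hypothetical helper, the path layout is copied from the hisat2/htseq2/deseq listing above, and whether other method combinations use the same layout is an assumption. Paths containing spaces are not handled.

```shell
# Sketch only: missing_outputs is a hypothetical helper, not part of iRAP.
# Usage: missing_outputs <experiment name> <mapper> <quant method> <de method>
# Prints one "missing: ..." line per absent output; returns non-zero if any.
missing_outputs() {
    name=$1
    mapper=$2
    quant=$3
    de=$4
    base=$name/irap_qc/$mapper/$quant
    status=0

    # Quantification tables have fixed names.
    for f in \
        "$base/genes.raw.$quant.tsv" \
        "$base/transcripts.raw.$quant.tsv" \
        "$base/exons.raw.$quant.tsv"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done

    # Filtered FASTQ, BAM and DE files are matched by glob:
    # require at least one match for each pattern.
    for pattern in \
        "$name/irap_qc/*.f.fastq.gz" \
        "$name/irap_qc/$mapper/*.bam" \
        "$base/$de/*.genes_de.tsv"; do
        set -- $pattern   # unquoted on purpose so the glob expands
        if [ ! -e "$1" ]; then
            echo "missing: $pattern"
            status=1
        fi
    done
    return $status
}

# Example usage, run from the directory where irap was started:
#   missing_outputs ecoli_ex hisat2 htseq2 deseq
```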