mihaux edited this page Nov 16, 2020 · 5 revisions

NOTE: the wiki is currently under development.

Analysis steps:

  1. Quality control on raw fastq files (using FastQC)
  2. Trimming (using Trimmomatic)
  3. Quality control on trimmed fastq files (using FastQC)
  4. Read alignment (using STAR; bowtie2 may be worth trying as an alternative)
  5. BAM manipulation [indexing] (using samtools)
  6. Marking duplicates (using Picard Tools)
  7. Obtaining read counts (using Cufflinks or featureCounts)

NOTICE: Cufflinks turned out to be intended for transcript discovery rather than read counting, so featureCounts was proposed as a replacement.
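The seven steps can be sketched as the command lines a single sample would go through. This is a minimal, hypothetical Python sketch for single-end reads: it only builds the commands (it does not run them), and the file names, the trimmomatic.jar and picard.jar paths, the adapter file and the trimming parameters are illustrative assumptions, not the settings used for these analyses.

```python
"""Hypothetical sketch: build (not run) the per-sample commands for steps 1-7.

All paths, file names and parameter values are illustrative assumptions.
"""
from __future__ import annotations

from pathlib import Path


def build_commands(sample: str, fastq: Path, genome_dir: Path, gtf: Path,
                   adapters: Path, threads: int = 4) -> list[list[str]]:
    """Return the single-end command sequence for one sample, in pipeline order."""
    trimmed = Path("2_trimming") / f"{sample}_trimmed.fastq.gz"
    # STAR names its sorted BAM <prefix>Aligned.sortedByCoord.out.bam
    bam = Path("4_alignment") / f"{sample}_Aligned.sortedByCoord.out.bam"
    dedup = Path("4_alignment") / f"{sample}_dedup.bam"
    return [
        # 1. FastQC on the raw reads
        ["fastqc", "-o", "1_quality_control", str(fastq)],
        # 2. Trimmomatic in single-end mode (trimming steps are example values)
        ["java", "-jar", "trimmomatic.jar", "SE", "-threads", str(threads),
         str(fastq), str(trimmed),
         f"ILLUMINACLIP:{adapters}:2:30:10", "SLIDINGWINDOW:4:15", "MINLEN:36"],
        # 3. FastQC on the trimmed reads
        ["fastqc", "-o", "3_quality_control_trimmed", str(trimmed)],
        # 4. STAR alignment to a coordinate-sorted BAM; also writes <prefix>Log.final.out
        ["STAR", "--runThreadN", str(threads), "--genomeDir", str(genome_dir),
         "--readFilesIn", str(trimmed), "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", f"4_alignment/{sample}_"],
        # 5. samtools index on the sorted BAM
        ["samtools", "index", str(bam)],
        # 6. Picard MarkDuplicates: duplicates are marked, not removed, by default
        ["java", "-jar", "picard.jar", "MarkDuplicates",
         f"I={bam}", f"O={dedup}", f"M=4_alignment/{sample}_dup_metrics.txt"],
        # 7. featureCounts; marked duplicates are still counted unless --ignoreDup is added
        ["featureCounts", "-T", str(threads), "-a", str(gtf),
         "-o", f"5_featureCounts/{sample}_counts.txt", str(dedup)],
    ]
```

Each inner list could be passed to subprocess.run(cmd, check=True) once the tools are installed and the output folders exist. Paired-end data would need Trimmomatic's PE mode, two files for --readFilesIn and the -p flag of featureCounts.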

Running example

All the steps were run on a set of 41 samples, producing the following folder sizes:

/1_quality_control          => 145 M
/2_trimming                 => 381 G
/3_quality_control_trimmed  => 422 M
/4_alignment                => 111 G
/5_counting                 =>  15 G
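To see where the disk space goes, the sizes above can be added up. A minimal sketch, assuming the figures are du-style sizes with binary suffixes (M = MiB, G = GiB); the helper name to_bytes is hypothetical:

```python
# Sketch: total footprint of the 41-sample run and each step's share of it.
# Assumes du-style binary suffixes (M = MiB, G = GiB).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size: str) -> float:
    """Convert a size string such as '381G' or '2.7M' to bytes."""
    return float(size[:-1]) * UNITS[size[-1].upper()]


# Values copied from the listing above
folders = {
    "1_quality_control": "145M",
    "2_trimming": "381G",
    "3_quality_control_trimmed": "422M",
    "4_alignment": "111G",
    "5_counting": "15G",
}

total = sum(to_bytes(s) for s in folders.values())
print(f"total: {total / 1024**3:.1f}G")
for name, size in folders.items():
    print(f"{name:28s} {100 * to_bytes(size) / total:5.1f}%")
```

With these numbers the run totals roughly 507.6G, and the trimmed reads in /2_trimming account for about 75% of it.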

Files to be backed up (TO BE UPDATED):

=> all arc_files folders (from each directory)
=> /1_quality_control/report
=> /3_quality_control_trimmed/report
=> /4_alignment/Log.out
=> /4_alignment/[sample_name]_Log.final.out (for all samples)
=> /5_counting (all files? to be confirmed)

COPIED FROM LOCAL README FILE

Files to be transferred from Arc after each analysis:

/1_quality_control/report

/2_trimming [NO FILES TO BE TRANSFERRED]

/3_quality_control_trimmed/report

/4_alignment/bam/[sample_ID]_Log.final.out (for each sample)

/5_featureCounts/processed/ (all sample .csv files)

NOTICE: all arc_files folders should be transferred to the Arc home directory (/home/home02/ummz/arc_records/analyses/)
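The transfer list above can be gathered with a few glob patterns. This is a hypothetical Python sketch: the patterns mirror the listed paths, while the function names and the dry_run flag are illustrative. It copies within one filesystem; moving results off Arc would still need a tool such as rsync or scp, fed with the returned file list.

```python
# Hypothetical sketch: collect the files to transfer from one analysis folder
# and copy them to a destination, keeping the relative layout.
from __future__ import annotations

import shutil
from pathlib import Path

# Patterns mirroring the transfer list (relative to the analysis folder)
PATTERNS = [
    "1_quality_control/report/**/*",
    "3_quality_control_trimmed/report/**/*",
    "4_alignment/bam/*_Log.final.out",
    "5_featureCounts/processed/*.csv",
]


def collect_transfer_files(analysis_dir: Path) -> list[Path]:
    """Return matching files (not directories), relative to analysis_dir."""
    found = set()
    for pattern in PATTERNS:
        for path in analysis_dir.glob(pattern):
            if path.is_file():
                found.add(path.relative_to(analysis_dir))
    return sorted(found)


def copy_transfer_files(analysis_dir: Path, dest: Path,
                        dry_run: bool = True) -> list[Path]:
    """Copy the collected files under dest; with dry_run=True only list them."""
    files = collect_transfer_files(analysis_dir)
    for rel in files:
        if not dry_run:
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(analysis_dir / rel, target)  # keeps timestamps
    return files
```

Running with dry_run=True first shows exactly what would be copied, which is useful while the list itself is still being updated.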

Folder sizes of /nobackup/ummz/analyses/run_IV_Feb20 (TOTAL: 671G)

=> /1_quality_control: 61M

=> /2_trimming:
     /single-end: 175G
     /paired-end: 344G

=> /3_quality_control_trimmed:
     /single-end: 35M
     /paired-end: 144M

=> /4_alignment:
     /single-end: 50G
     /paired-end:
         /paired: 104G
         /unpaired: 2.7M

=> /5_featureCounts:
     /single-end: 18M
     /paired-end: 26M
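Folder sizes like these are presumably obtained with du; a standard-library alternative is sketched below (the helper names are hypothetical). It sums apparent file sizes, so its figures can differ from du, which reports allocated disk blocks. Note also that the per-folder values listed above add up to roughly 673G, slightly more than the reported 671G total, presumably because each figure is rounded.

```python
# Sketch: per-folder size listing with the standard library instead of du.
# Sums apparent file sizes (st_size); symlinked files and directories are skipped.
from __future__ import annotations

import os
from pathlib import Path


def folder_size(path: Path) -> int:
    """Total size in bytes of all regular files below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):  # does not follow symlinks
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def human(n: int) -> str:
    """Format a byte count with binary suffixes, e.g. 1536 -> '1.5K'."""
    value = float(n)
    for unit in ("B", "K", "M", "G"):
        if value < 1024:
            return f"{value:.1f}{unit}"
        value /= 1024
    return f"{value:.1f}T"


def size_report(run_dir: Path) -> dict[str, int]:
    """Sizes of each step folder and its immediate subfolders (e.g. single-end)."""
    report = {}
    for step in sorted(p for p in run_dir.iterdir() if p.is_dir()):
        report[step.name] = folder_size(step)
        for sub in sorted(p for p in step.iterdir() if p.is_dir()):
            report[f"{step.name}/{sub.name}"] = folder_size(sub)
    return report

# Example:
# for name, size in size_report(Path("/nobackup/ummz/analyses/run_IV_Feb20")).items():
#     print(f"{name}: {human(size)}")
```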
