Home
NOTE: the wiki is currently under development.
- Quality control on raw fastq files (using FastQC)
- Trimming (using Trimmomatic)
- Quality control on trimmed fastq files (using FastQC)
- Read alignment (using STAR; bowtie2 may be worth trying as an alternative)
- BAM manipulation [indexing] (using samtools)
- Marking duplicates (using Picard Tools)
- Obtain read counts (using Cufflinks or featureCounts)
NOTICE: Cufflinks turned out to be designed for transcript discovery, so featureCounts was proposed as a replacement for obtaining read counts.
All the steps were run on a set of 41 samples.
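The steps above can be sketched as a list of command lines for one single-end sample. This is only an illustration, not the scripts actually used for this run: the function name, the output file names, the `trimmomatic` and `picard` wrapper commands (as provided by some package managers) and the trimming settings `SLIDINGWINDOW:4:20 MINLEN:36` are all assumptions. The commands are built but not executed.

```python
from pathlib import Path


def build_commands(sample, fastq, genome_dir, annotation_gtf, out_dir, threads=4):
    """Return the command lines for one single-end sample, in pipeline order.

    All paths and file names here are hypothetical placeholders.
    """
    out = Path(out_dir)
    qc_raw = out / "1_quality_control"
    trimmed = out / "2_trimming" / f"{sample}_trimmed.fastq.gz"
    qc_trimmed = out / "3_quality_control_trimmed"
    star_prefix = str(out / "4_alignment" / f"{sample}_")
    # STAR appends this suffix to the prefix when writing a sorted BAM.
    bam = star_prefix + "Aligned.sortedByCoord.out.bam"
    marked_bam = star_prefix + "marked_duplicates.bam"
    metrics = star_prefix + "duplicate_metrics.txt"
    counts = out / "5_counting" / f"{sample}_counts.txt"

    return [
        # 1. Quality control on the raw fastq file
        ["fastqc", "-o", str(qc_raw), str(fastq)],
        # 2. Trimming (settings are placeholders, not the ones used here)
        ["trimmomatic", "SE", "-phred33", str(fastq), str(trimmed),
         "SLIDINGWINDOW:4:20", "MINLEN:36"],
        # 3. Quality control on the trimmed fastq file
        ["fastqc", "-o", str(qc_trimmed), str(trimmed)],
        # 4. Read alignment
        ["STAR", "--runThreadN", str(threads),
         "--genomeDir", str(genome_dir),
         "--readFilesIn", str(trimmed),
         "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", star_prefix],
        # 5. BAM indexing
        ["samtools", "index", bam],
        # 6. Marking duplicates
        ["picard", "MarkDuplicates", f"I={bam}", f"O={marked_bam}", f"M={metrics}"],
        # 7. Read counting
        ["featureCounts", "-T", str(threads), "-a", str(annotation_gtf),
         "-o", str(counts), marked_bam],
    ]
```

To actually run a step, each list could be passed to `subprocess.run(cmd, check=True)`, provided the output directories already exist.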
/1_quality_control => 145 M
/2_trimming => 381 G
/3_quality_control_trimmed => 422 M
/4_alignment => 111 G
/5_counting => 15 G
Files to be backed up (TO BE UPDATED):
=> all arc_files folders (from each directory)
=> /1_quality_control/report
=> /3_quality_control_trimmed/report
=> /4_alignment/Log.out
=> /4_alignment/[sample_name]_Log.final.out (for all samples)
=> /5_counting (all files? TO BE CONFIRMED)
Files to be transferred from Arc after each analysis:
/1_quality_control/report
/2_trimming [NO FILES TO BE TRANSFERRED]
/3_quality_control_trimmed/report
/4_alignment/bam/[sample_ID]_Log.final.out (for each sample)
/5_featureCounts/processed/ (the .csv files for all samples)
NOTICE: all arc_files should be transferred to the Arc home directory (/home/home02/ummz/arc_records/analyses/)
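The transfer list above can be turned into a small helper that collects the matching paths under a run directory. This is a sketch under assumptions: the function and constant names are hypothetical, and the glob patterns are a direct reading of the list (report folders, per-sample `_Log.final.out` files, and `.csv` files in `processed`). It only lists paths and does not copy anything.

```python
from pathlib import Path

# Patterns relative to the run directory, read off the transfer list above.
TRANSFER_PATTERNS = [
    "1_quality_control/report",
    "3_quality_control_trimmed/report",
    "4_alignment/bam/*_Log.final.out",
    "5_featureCounts/processed/*.csv",
]


def files_to_transfer(run_dir):
    """Return the sorted paths under run_dir that match the transfer list."""
    root = Path(run_dir)
    matches = []
    for pattern in TRANSFER_PATTERNS:
        # Patterns that match nothing (e.g. a step not run yet) add no paths.
        matches.extend(root.glob(pattern))
    return sorted(matches)
```

The returned paths could then be handed to whatever copy tool is used to move results off Arc.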
Folder sizes of /nobackup/ummz/analyses/run_IV_Feb20 (TOTAL: 671G)
=> /1_quality_control: 61M
=> /2_trimming: /single-end 175G, /paired-end 344G
=> /3_quality_control_trimmed: /single-end 35M, /paired-end 144M
=> /4_alignment: /single-end 50G, /paired-end/paired 104G, /paired-end/unpaired 2.7M
=> /5_featureCounts: /single-end 18M, /paired-end 26M
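Folder sizes like the ones above can be recomputed with a short script. The sketch below is an assumption about how one might do it, not how these numbers were obtained: the function names are hypothetical, and it sums apparent file sizes, so the results can differ slightly from what a disk-usage tool such as `du` reports.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def human(n_bytes):
    """Format a byte count with a B/K/M/G/T suffix, using powers of 1024."""
    size = float(n_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.0f}{unit}"
        size /= 1024
    return f"{size:.0f}T"
```

For example, `human(dir_size_bytes("4_alignment"))` would give a one-line summary for a single analysis folder.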