Hey there :)
I thought I’d give the nf-core methylseq workflow a try, partly out of curiosity.
While monitoring the run, I noticed that the file mut_1_1_val_1.fq.gz.temp.2_bismark_bt2_pe.bam is growing by only about 7.5 million entries per hour. Based on the FastQC report, each paired-end sample has roughly 300 million read pairs (around 600 million reads total per sample), so at this rate the alignment step alone seems like it will take a very long time. In this pilot run, I also already have a second sample waiting behind it.
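A rough back-of-the-envelope sketch of that estimate, under stated assumptions: the growth rate stays constant, and it is unclear whether one "entry" is a read pair or a single BAM record (two per pair). The `temp.2` in the file name suggests this file may be only one of several parallel chunks, so the hypothetical `n_chunks` parameter covers that case. The function name and its parameters are illustrative only and are not part of Bismark or nf-core/methylseq.

```python
def estimate_hours(read_pairs, entries_per_hour, records_per_pair=1, n_chunks=1):
    """Estimate hours needed to write one BAM file at a constant rate.

    read_pairs        total read pairs in the sample (from FastQC)
    entries_per_hour  observed growth of the monitored BAM file
    records_per_pair  1 if an "entry" is a pair, 2 if it is a single record
    n_chunks          assumed number of parallel chunks the input is split into
    """
    if entries_per_hour <= 0 or n_chunks <= 0:
        raise ValueError("rate and chunk count must be positive")
    entries_in_this_file = read_pairs * records_per_pair / n_chunks
    return entries_in_this_file / entries_per_hour


READ_PAIRS = 300_000_000   # roughly, per sample
RATE = 7_500_000           # observed entries per hour

# Entry = read pair, single file holds everything
print(f"{estimate_hours(READ_PAIRS, RATE):.1f} h")                      # 40.0 h
# Entry = single BAM record (two per pair)
print(f"{estimate_hours(READ_PAIRS, RATE, records_per_pair=2):.1f} h")  # 80.0 h
# Same, but assuming the file is one of 2 chunks written in parallel
print(f"{estimate_hours(READ_PAIRS, RATE, 2, n_chunks=2):.1f} h")       # 40.0 h
```

Depending on which assumption holds, the alignment of one sample would land somewhere between roughly 40 and 80 hours at the observed rate.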
So I was wondering which parameter or setting would be the most relevant one to adjust in order to speed this up.
For context, I already modified the config because my workstation has only about 60 GB of free RAM, 24 threads in total, and a little over 1 TB of free SSD space.
Would you say that ~7.5 million entries per hour for that BAM file is still within an okay-ish range for Bismark in this setup, or does that sound unusually slow?
I started the run with:
nextflow run nf-core/methylseq -r 4.2.0 \
    -profile docker \
    -c /media/chuddy/heiglthomas/nf/resource_override.config \
    --input /media/chuddy/heiglthomas/nf/samplesheet_bismark_pilot.csv \
    --aligner bismark \
    --em_seq \
    --fasta /media/chuddy/linux_990pro/bioinfo/refs/mouse_39/genome/GRCm39.noAlt.fa \
    --outdir /media/chuddy/heiglthomas/nf/results_bismark_pilot \
    -work-dir /home/chuddy/bioinformatics/nf_work/methylseq_emseq_bismark_work \
    -resume
and my config file:
process {
    withName: 'NFCORE_METHYLSEQ:FASTA_INDEX_METHYLSEQ:BISMARK_GENOMEPREPARATION_BOWTIE' {
        cpus = 12
        memory = 32.GB
        time = '24h'
    }
    withName: 'NFCORE_METHYLSEQ:METHYLSEQ:TRIMGALORE' {
        cpus = 4
        memory = 24.GB
        time = '24h'
    }
    withName: 'NFCORE_METHYLSEQ:METHYLSEQ:FASTQ_ALIGN_DEDUP_BISMARK:BISMARK_ALIGN' {
        cpus = 10
        memory = 28.GB
        time = '96h'
    }
    withName: 'NFCORE_METHYLSEQ:METHYLSEQ:FASTQ_ALIGN_DEDUP_BISMARK:BISMARK_DEDUPLICATE' {
        cpus = 4
        memory = 16.GB
        time = '48h'
    }
    withName: 'NFCORE_METHYLSEQ:METHYLSEQ:FASTQ_ALIGN_DEDUP_BISMARK:SAMTOOLS_SORT' {
        cpus = 4
        memory = 16.GB
        time = '24h'
    }
    withName: 'NFCORE_METHYLSEQ:METHYLSEQ:FASTQ_ALIGN_DEDUP_BISMARK:BISMARK_METHYLATIONEXTRACTOR' {
        cpus = 6
        memory = 20.GB
        time = '72h'
    }
}
One caveat: the FASTQ files are currently on an external SSD connected via a USB-C port, but I/O should still be fast enough.
Best,