running wf-transcriptomes on HPC using sbatch #43

cea295933 · 2023-11-10T19:06:40Z

Ask away!

Hi,
I'm trying to run wf-transcriptomes on our HPC and have questions about the best way to take advantage of multiple CPUs and multiple threads. I had been using the --ntasks-per-node option but have switched to the --cpus-per-task option. Does this make a meaningful difference, and/or should I use them together? We also have multiple nodes. Would it be helpful to request more than one node? For reference, one node has 2 sockets, 32 CPU cores per socket, and 2 threads per CPU core. That gives 128 logical CPUs per node. Can wf-transcriptomes take advantage of all of this, and if so, what is the best way to do so? I am attaching below two separate sbatch scripts. One requests 1 node and 64 cpus-per-task, whereas the other simply requests --exclusive and --mem=MaxMemPerNode.

Thanks!

sbatch script one

#!/bin/bash
#SBATCH -J Aitken_epi2me_20231110_poly
#SBATCH -o Aitken_epi2me_20231110_poly.out
#SBATCH --nodes=1
#SBATCH --cpus-per-task=64
#SBATCH -p emc

cd /work/caitken/epi2me-labs

./nextflow run epi2me-labs/wf-transcriptomes -with-trace \
--fastq /work/caitken/data/DegronNanoporeSequencing/Poly \
--ref_genome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.fa \
--ref_annotation /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.gff \
--transcriptome_source reference-guided \
--ref_transcriptome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3_coding.fa \
--de_analysis \
--sample_sheet /work/caitken/data/DegronNanoporeSequencing/BarcodesPoly.csv \
--out_dir /work/caitken/data/DegronNanoporeSequencing/outputPoly \
-c /work/caitken/data/DegronNanoporeSequencing/my_config.cfg

sbatch script two

#!/bin/bash
#SBATCH -J Aitken_epi2me_20231110_total
#SBATCH -o Aitken_epi2me_20231110_total.out
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --mem=MaxMemPerNode
#SBATCH -p emc

cd /work/caitken/epi2me-labs

./nextflow run epi2me-labs/wf-transcriptomes -with-trace \
--fastq /work/caitken/data/DegronNanoporeSequencing/Total \
--ref_genome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.fa \
--ref_annotation /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.gff \
--transcriptome_source reference-guided \
--ref_transcriptome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3_coding.fa \
--de_analysis \
--sample_sheet /work/caitken/data/DegronNanoporeSequencing/BarcodesTotal.csv \
--out_dir /work/caitken/data/DegronNanoporeSequencing/outputTotal \
-c /work/caitken/data/DegronNanoporeSequencing/my_config.cfg
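
For reference, the per-node figure quoted in the question, and the resources Slurm actually grants to a job, can be checked with a few lines of shell. This is only a minimal sketch: the three `SLURM_*` variables are standard Slurm job environment variables, but which of them are set depends on the options requested (for example, `SLURM_CPUS_PER_TASK` is only exported when `--cpus-per-task` is given), so each one falls back to `unset` here.

```shell
# Arithmetic from the question: sockets x cores per socket x threads per core.
sockets=2
cores_per_socket=32
threads_per_core=2
logical_cpus=$((sockets * cores_per_socket * threads_per_core))
echo "logical CPUs per node: ${logical_cpus}"

# Inside a running job, Slurm exports what was actually allocated.
# Outside a job (or when an option was not requested) these print "unset".
echo "cpus per task: ${SLURM_CPUS_PER_TASK:-unset}"
echo "cpus on node:  ${SLURM_CPUS_ON_NODE:-unset}"
echo "nodes in job:  ${SLURM_JOB_NUM_NODES:-unset}"
```

Placing the three `echo` lines before the `./nextflow run` command in each script would record in the job's `.out` file how the two allocations differ.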

cea295933 · 2023-11-10T21:55:00Z

Circling back: I can get this to run, but it crashes during the DE analysis. I receive an error saying that Salmon needs to be upgraded:

ERROR ~ Error executing process > 'pipeline:differential_expression:count_transcripts (1)'

Caused by:
Process pipeline:differential_expression:count_transcripts (1) terminated with an error exit status (1)

Command executed:

salmon quant --noErrorModel -p "4" -t "ammended.ref_transcriptome" -l SF -a "WTpoly1_reads_aln_sorted.bam" -o counts
mv counts/quant.sf "WTpoly1.transcript_counts.tsv"
seqkit bam "WTpoly1_reads_aln_sorted.bam" 2> "WTpoly1.seqkit.stats"

Command exit status:
1

Command output:
(empty)

Command error:
Version Info: ### PLEASE UPGRADE SALMON ###

A newer version of salmon with important bug fixes and improvements is available.

The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains new features, improvements, and bug fixes; please upgrade at your
earliest convenience.

Sign up for the salmon mailing list to hear about new versions, features and updates at:
https://oceangenomics.com/subscribe

salmon (alignment-based) v1.9.0

[ program ] => salmon

[ command ] => quant

[ noErrorModel ] => { }

[ threads ] => { 4 }

[ targets ] => { ammended.ref_transcriptome }

[ libType ] => { SF }

[ alignments ] => { WTpoly1_reads_aln_sorted.bam }

[ output ] => { counts }

Logs will be written to counts/logs
[2023-11-10 20:38:24.291] [jointLog] [info] setting maxHashResizeThreads to 4
[2023-11-10 20:38:24.291] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
Library format { type:single end, relative orientation:none, strandedness:sense }
[2023-11-10 20:38:24.293] [jointLog] [info] numQuantThreads = 2
parseThreads = 2
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "WTpoly1_reads_aln_sorted.bam", fasta = "ammended.ref_transcriptome" . . .done

cea295933 · 2023-11-14T18:18:34Z

Update: I can now get this to run if (1) I skip the DE analysis and generate a reference-guided transcriptome and (2) run the DE analysis separately using a precomputed transcriptome. So the issue appears to be using the reference-guided transcriptome in the DE analysis. I would really appreciate some guidance on getting this to work. There are no great S. cerevisiae reference transcriptomes, so I would love to use the one I generate via the workflow (or am I not understanding correctly how the pipeline works?). The underlying goals of this analysis are to (1) compare the transcriptome I observe in these samples to existing ones and (2) generate read counts for each isoform and mRNA to perform DE analysis (either via the workflow here or using DESeq2 on my own in R).

cea295933 · 2023-11-15T17:05:22Z

I think this is resolved: I was supplying a reference transcriptome while asking to run the reference-guided version. Removing the reference transcriptome seems to resolve this issue (though I am now encountering another), but I will post a separate issue for that.
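
As a sketch of the change described here, the command from "sbatch script one" with the `--ref_transcriptome` option dropped would look roughly like the following. Every option is copied from the original script; the `data` and `cmd` variables are hypothetical conveniences introduced only to shorten the paths and to print the command for review rather than run it.

```shell
# Hypothetical helper variable: the shared data directory from the original script.
data=/work/caitken/data/DegronNanoporeSequencing

# Same options as the first sbatch script, minus the reference transcriptome,
# so that only the reference-guided transcriptome source is requested.
cmd="./nextflow run epi2me-labs/wf-transcriptomes -with-trace \
--fastq ${data}/Poly \
--ref_genome ${data}/sacCer3/20110902_sacCer3.fa \
--ref_annotation ${data}/sacCer3/20110902_sacCer3.gff \
--transcriptome_source reference-guided \
--de_analysis \
--sample_sheet ${data}/BarcodesPoly.csv \
--out_dir ${data}/outputPoly \
-c ${data}/my_config.cfg"

# Print the assembled command so it can be checked before being pasted into the job script.
echo "$cmd"
```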

cjw85 · 2023-11-15T17:10:30Z

Hi @cea295933,

Please do open a new issue.

cea295933 · 2023-11-15T17:15:21Z

Just did (#45), thanks!

cea295933 added the "question" label Nov 10, 2023

cea295933 closed this as completed Nov 15, 2023
