running wf-transcriptomes on HPC using sbatch #43

Closed
cea295933 opened this issue Nov 10, 2023 · 5 comments
Labels
question Further information is requested

Comments

@cea295933

Ask away!

Hi,
I'm trying to run wf-transcriptomes on our HPC and have questions about the best way to take advantage of multiple CPUs and threads. I had been using the --ntasks-per-node option but have switched to --cpus-per-task. Does this make a meaningful difference, and should I use them together? We also have multiple nodes; would it be helpful to request more than one? For reference, one node has 2 sockets, 32 cores per socket, and 2 threads per core, which gives 128 logical CPUs per node. Can wf-transcriptomes take advantage of all of this, and if so, what is the best way to do so? Below are two separate sbatch scripts. One requests 1 node and 64 cpus-per-task, whereas the other requests 1 node with --exclusive and --mem=MaxMemPerNode.
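
As a quick sanity check of the node arithmetic above, here is a minimal sketch. The function name `logical_cpus` is made up for illustration, and it only assumes that Slurm counts each hardware thread as one CPU on this cluster, which may not hold for every site configuration.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or wf-transcriptomes):
# logical CPUs per node = sockets * cores per socket * threads per core.
logical_cpus() {
  local sockets=$1 cores_per_socket=$2 threads_per_core=$3
  echo $(( sockets * cores_per_socket * threads_per_core ))
}

# Node layout described in this post: 2 sockets, 32 cores each, 2 threads per core.
logical_cpus 2 32 2   # prints 128
```

With this layout, --cpus-per-task=64 would correspond to half of the node's logical CPUs.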

Thanks!

sbatch script one

#!/bin/bash
#SBATCH -J Aitken_epi2me_20231110_poly
#SBATCH -o Aitken_epi2me_20231110_poly.out
#SBATCH --nodes=1
#SBATCH --cpus-per-task=64
#SBATCH -p emc

cd /work/caitken/epi2me-labs

./nextflow run epi2me-labs/wf-transcriptomes -with-trace \
--fastq /work/caitken/data/DegronNanoporeSequencing/Poly \
--ref_genome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.fa \
--ref_annotation /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.gff \
--transcriptome_source reference-guided \
--ref_transcriptome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3_coding.fa \
--de_analysis \
--sample_sheet /work/caitken/data/DegronNanoporeSequencing/BarcodesPoly.csv \
--out_dir /work/caitken/data/DegronNanoporeSequencing/outputPoly \
-c /work/caitken/data/DegronNanoporeSequencing/my_config.cfg

sbatch script two

#!/bin/bash
#SBATCH -J Aitken_epi2me_20231110_total
#SBATCH -o Aitken_epi2me_20231110_total.out
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --mem=MaxMemPerNode
#SBATCH -p emc

cd /work/caitken/epi2me-labs

./nextflow run epi2me-labs/wf-transcriptomes -with-trace \
--fastq /work/caitken/data/DegronNanoporeSequencing/Total \
--ref_genome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.fa \
--ref_annotation /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.gff \
--transcriptome_source reference-guided \
--ref_transcriptome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3_coding.fa \
--de_analysis \
--sample_sheet /work/caitken/data/DegronNanoporeSequencing/BarcodesTotal.csv \
--out_dir /work/caitken/data/DegronNanoporeSequencing/outputTotal \
-c /work/caitken/data/DegronNanoporeSequencing/my_config.cfg

@cea295933 cea295933 added the question Further information is requested label Nov 10, 2023
@cea295933
Author

Circling back: I can get this to run, but it crashes during the DE analysis. I receive an error saying that Salmon needs to be upgraded:

ERROR ~ Error executing process > 'pipeline:differential_expression:count_transcripts (1)'

Caused by:
Process pipeline:differential_expression:count_transcripts (1) terminated with an error exit status (1)

Command executed:

salmon quant --noErrorModel -p "4" -t "ammended.ref_transcriptome" -l SF -a "WTpoly1_reads_aln_sorted.bam" -o counts
mv counts/quant.sf "WTpoly1.transcript_counts.tsv"
seqkit bam "WTpoly1_reads_aln_sorted.bam" 2> "WTpoly1.seqkit.stats"

Command exit status:
1

Command output:
(empty)

Command error:
Version Info: ### PLEASE UPGRADE SALMON ###

A newer version of salmon with important bug fixes and improvements is available.

The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains new features, improvements, and bug fixes; please upgrade at your
earliest convenience.

Sign up for the salmon mailing list to hear about new versions, features and updates at:
https://oceangenomics.com/subscribe

salmon (alignment-based) v1.9.0

[ program ] => salmon

[ command ] => quant

[ noErrorModel ] => { }

[ threads ] => { 4 }

[ targets ] => { ammended.ref_transcriptome }

[ libType ] => { SF }

[ alignments ] => { WTpoly1_reads_aln_sorted.bam }

[ output ] => { counts }

Logs will be written to counts/logs
[2023-11-10 20:38:24.291] [jointLog] [info] setting maxHashResizeThreads to 4
[2023-11-10 20:38:24.291] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
Library format { type:single end, relative orientation:none, strandedness:sense }
[2023-11-10 20:38:24.293] [jointLog] [info] numQuantThreads = 2
parseThreads = 2
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "WTpoly1_reads_aln_sorted.bam", fasta = "ammended.ref_transcriptome" . . .done

@cea295933
Author

Update: I can now get this to run if I (1) skip the DE analysis and generate a reference-guided transcriptome, and (2) run the DE analysis separately using a precomputed transcriptome. So the issue appears to be using the reference-guided transcriptome in the DE analysis. I would really appreciate some guidance on getting this to work. There are no great S. cerevisiae reference transcriptomes, so I would love to use the one I generate via the workflow (or am I misunderstanding how the pipeline works?). The underlying goals of this analysis are to (1) compare the transcriptome I observe in these samples to existing ones and (2) generate read counts for each isoform and mRNA to perform DE analysis (either via the workflow here or using DESeq2 on my own in R).

@cea295933
Author

I think this is resolved: I was supplying a reference transcriptome while asking the workflow to run in reference-guided mode. Removing the reference transcriptome seems to resolve this issue (though I am now encountering another, for which I will post a separate issue).
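
For anyone hitting the same thing, here is a rough sketch of a pre-flight check that could go in an sbatch script before calling nextflow. It is based only on what I observed in this thread (the combination of --transcriptome_source reference-guided and --ref_transcriptome caused trouble for me); it is not the workflow's own validation, and the function name `check_transcriptome_args` is hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of wf-transcriptomes).
# Returns 1 if --ref_transcriptome is passed together with
# --transcriptome_source reference-guided, the combination that
# failed in this thread; returns 0 otherwise.
check_transcriptome_args() {
  local source="" has_ref=0 prev="" arg
  for arg in "$@"; do
    # The value of --transcriptome_source is the argument right after the flag.
    if [ "$prev" = "--transcriptome_source" ]; then
      source="$arg"
    fi
    if [ "$arg" = "--ref_transcriptome" ]; then
      has_ref=1
    fi
    prev="$arg"
  done
  if [ "$source" = "reference-guided" ] && [ "$has_ref" -eq 1 ]; then
    echo "conflict: --ref_transcriptome given with --transcriptome_source reference-guided" >&2
    return 1
  fi
  return 0
}

# Example: the arguments I ended up with after dropping --ref_transcriptome.
check_transcriptome_args --transcriptome_source reference-guided --de_analysis \
  && echo "arguments look consistent"
```

The check only inspects the argument list; it does not know anything about what the workflow itself accepts, so treat it as a reminder of this one pitfall.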

@cjw85
Contributor

cjw85 commented Nov 15, 2023

Hi @cea295933,

Please do open a new issue.

@cea295933
Author

cea295933 commented Nov 15, 2023 via email
