-
It may be that your process expects a specific filename, and the output of the previous process doesn't match it. This is not uncommon in nf-core pipelines, and is usually solved with something like the snippet below in a configuration file:

```groovy
process {
    withName: PICARD_MARKDUPLICATES {
        ext.prefix = { "output_${meta.id}" }
    }
}
```

A first step should be to check the task dir of the failed task and see what input files are linked there, if any. If you can share a minimal reproducible example or a publicly available pipeline for me to check, I can try to reproduce the issue on my side and work on a solution 😄
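To make that first step concrete, here is a sketch of what a correctly staged task dir looks like. The directory and filenames below are mocked up for illustration (in a real run you would use the `work/<hash>` path printed by Nextflow, e.g. `work/83/d89167*` for the failed task further down this thread):

```shell
# Sketch only: mock up a Nextflow task dir to show what correct staging
# looks like. The hash-style path and the sample name are hypothetical.
mkdir -p work/83/d89167_mock
# Nextflow stages each *declared* input as a symlink into the task dir:
ln -sf /dev/null work/83/d89167_mock/sample1_aligned_reads.sam
ls work/83/d89167_mock/
# If the file your command expects is missing from this listing, it was
# never staged -- look at the process input declaration, not the tool call.
```

Alongside the staged inputs, the real task dir also contains `.command.sh` (the exact command executed) and `.command.err` (captured stderr), which are usually the fastest way to see what actually ran.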
-
Hello,
I am running a variant-calling pipeline using GATK4, written as a containerized Nextflow script for analyzing BGI sequencing data, which I submit as a Slurm job to an HPC cluster. I set my input to the reads I want analyzed; the workflow begins and completes step 1, but fails at step 2, saying that the `_aligned_reads.sam` file (the output of step 1) does not exist. The process output/error is below:
```
executor > local (2)
[7b/56c3d5] process > align (1)               [100%] 1 of 1 ✔
[83/d89167] process > markDuplicatesSpark (1) [  0%] 0 of 1
[-        ] process > getMetrics              -
[-        ] process > haplotypeCaller         -
[-        ] process > selectVariants          -
[-        ] process > filterSnps              -
[-        ] process > filterIndels            -
[-        ] process > bqsr                    -
[-        ] process > analyzeCovariates       -
[-        ] process > snpEff                  -
[-        ] process > qc                      -
Error executing process > 'markDuplicatesSpark (1)'

Caused by:
  Process `markDuplicatesSpark (1)` terminated with an error exit status (2)

Command executed:
  mkdir -p /scratch/projects/oleksyk-lab/gatk4/gatk_temp/furious_hamilton/
  gatk --java-options "-Djava.io.tmpdir=/scratch/projects/oleksyk-lab/gatk4/gatk_temp/furious_hamilton/" MarkDuplicatesSpark -I _aligned_reads.sam -M _dedup_metrics.txt -O _sorted_dedup.bam
  rm -r /scratch/projects/oleksyk-lab/gatk4/gatk_temp/furious_hamilton/

Command exit status:
  2

Command output:
  (empty)

Command error:
18:17:56.068 INFO ContextHandler - Started o.s.j.s.ServletContextHandler@51e0f2eb{/api,null,AVAILABLE,@spark}
18:17:56.069 INFO ContextHandler - Started o.s.j.s.ServletContextHandler@aa794a3{/jobs/job/kill,null,AVAILABLE,@spark}
18:17:56.069 INFO ContextHandler - Started o.s.j.s.ServletContextHandler@22cb8e5f{/stages/stage/kill,null,AVAILABLE,@spark}
18:17:56.072 INFO ContextHandler - Started o.s.j.s.ServletContextHandler@5ca8c904{/metrics/json,null,AVAILABLE,@spark}
18:17:56.076 INFO MarkDuplicatesSpark - Spark verbosity set to INFO (see --spark-verbosity argument)
18:17:56.118 INFO GoogleHadoopFileSystemBase - GHFS version: 1.9.4-hadoop3
WARNING 2023-10-09 18:17:56 SamReaderFactory Unable to detect file format from input URL or stream, assuming SAM format.
WARNING 2023-10-09 18:17:56 SamReaderFactory Unable to detect file format from input URL or stream, assuming SAM format.
18:17:56.286 INFO MemoryStore - Block broadcast_0 stored as values in memory (estimated size 1540.3 KiB, free 17.8 GiB)
18:17:56.593 INFO MemoryStore - Block broadcast_0_piece0 stored as bytes in memory (estimated size 68.4 KiB, free 17.8 GiB)
18:17:56.596 INFO BlockManagerInfo - Added broadcast_0_piece0 in memory on hpc-compute-p36.cm.cluster:44093 (size: 68.4 KiB, free: 17.8 GiB)
18:17:56.599 INFO SparkContext - Created broadcast 0 from broadcast at SamSource.java:78
18:17:56.719 INFO MemoryStore - Block broadcast_1 stored as values in memory (estimated size 188.3 KiB, free 17.8 GiB)
18:17:56.741 INFO MemoryStore - Block broadcast_1_piece0 stored as bytes in memory (estimated size 41.8 KiB, free 17.8 GiB)
18:17:56.742 INFO BlockManagerInfo - Added broadcast_1_piece0 in memory on hpc-compute-p36.cm.cluster:44093 (size: 41.8 KiB, free: 17.8 GiB)
18:17:56.742 INFO SparkContext - Created broadcast 1 from newAPIHadoopFile at SamSource.java:108
18:17:56.833 INFO BlockManagerInfo - Removed broadcast_1_piece0 on hpc-compute-p36.cm.cluster:44093 in memory (size: 41.8 KiB, free: 17.8 GiB)
18:17:56.837 INFO BlockManagerInfo - Removed broadcast_0_piece0 on hpc-compute-p36.cm.cluster:44093 in memory (size: 68.4 KiB, free: 17.8 GiB)
WARNING 2023-10-09 18:17:56 SamReaderFactory Unable to detect file format from input URL or stream, assuming SAM format.
WARNING 2023-10-09 18:17:56 SamReaderFactory Unable to detect file format from input URL or stream, assuming SAM format.
18:17:56.903 INFO MemoryStore - Block broadcast_2 stored as values in memory (estimated size 1540.3 KiB, free 17.8 GiB)
18:17:56.912 INFO MemoryStore - Block broadcast_2_piece0 stored as bytes in memory (estimated size 68.4 KiB, free 17.8 GiB)
18:17:56.913 INFO BlockManagerInfo - Added broadcast_2_piece0 in memory on hpc-compute-p36.cm.cluster:44093 (size: 68.4 KiB, free: 17.8 GiB)
18:17:56.914 INFO SparkContext - Created broadcast 2 from broadcast at SamSource.java:78
18:17:56.917 INFO MemoryStore - Block broadcast_3 stored as values in memory (estimated size 188.3 KiB, free 17.8 GiB)
18:17:56.927 INFO MemoryStore - Block broadcast_3_piece0 stored as bytes in memory (estimated size 41.8 KiB, free 17.8 GiB)
18:17:56.928 INFO BlockManagerInfo - Added broadcast_3_piece0 in memory on hpc-compute-p36.cm.cluster:44093 (size: 41.8 KiB, free: 17.8 GiB)
18:17:56.928 INFO SparkContext - Created broadcast 3 from newAPIHadoopFile at SamSource.java:108
18:17:56.974 INFO BlockManagerInfo - Removed broadcast_2_piece0 on hpc-compute-p36.cm.cluster:44093 in memory (size: 68.4 KiB, free: 17.8 GiB)
18:17:56.977 INFO BlockManagerInfo - Removed broadcast_3_piece0 on hpc-compute-p36.cm.cluster:44093 in memory (size: 41.8 KiB, free: 17.8 GiB)
18:17:56.978 INFO AbstractConnector - Stopped Spark@5cb6966{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
18:17:56.981 INFO SparkUI - Stopped Spark web UI at http://hpc-compute-p36.cm.cluster:4040
18:17:56.989 INFO MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
18:17:57.004 INFO MemoryStore - MemoryStore cleared
18:17:57.004 INFO BlockManager - BlockManager stopped
18:17:57.006 INFO BlockManagerMaster - BlockManagerMaster stopped
18:17:57.008 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
18:17:57.016 INFO SparkContext - Successfully stopped SparkContext
18:17:57.016 INFO MarkDuplicatesSpark - Shutting down engine
[October 9, 2023 at 6:17:57 PM EDT] org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=285212672
A USER ERROR has occurred: Failed to load reads from _aligned_reads.sam
Caused by: Input path does not exist: file:_aligned_reads.sam
```
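The leading underscore in `_aligned_reads.sam` hints that a sample-id variable interpolated into the filename resolved to an empty string, and the "Input path does not exist" error means Nextflow never symlinked the file into this task's work dir. Below is a hedged DSL1 sketch of how the align output is normally carried into the next process; all process, channel, and variable names here are hypothetical, since main.nf isn't shown:

```nextflow
// Hypothetical DSL1 sketch -- illustrative names, not the actual pipeline.
process align {
    input:
    tuple val(pair_id), file(reads) from read_pairs_ch

    output:
    // pair_id must be non-empty, or this literally becomes "_aligned_reads.sam"
    tuple val(pair_id), file("${pair_id}_aligned_reads.sam") into aligned_ch

    script:
    """
    bwa mem ref.fa ${reads} > ${pair_id}_aligned_reads.sam
    """
}

process markDuplicatesSpark {
    input:
    // Declaring the file here is what makes Nextflow stage (symlink) it
    // into this task's work dir before the script runs.
    tuple val(pair_id), file(sam) from aligned_ch

    script:
    """
    gatk MarkDuplicatesSpark -I ${sam} \
        -M ${pair_id}_dedup_metrics.txt -O ${pair_id}_sorted_dedup.bam
    """
}
```

If `markDuplicatesSpark` instead hard-codes the filename rather than taking it from an input declaration, the command can reference a file that was never staged, which matches the error above.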
More info on what I'm running is below:
Config File:

```groovy
// Required Parameters
params.reads = "/projects/oleksyk-lab/Kenneth/Golden_Standard/BGI/{E150016531_L01_75_1.fq.gz,E150016531_L01_75_2.fq.gz}"
params.ref = "/projects/oleksyk-lab/Kenneth/Golden_Standard/References/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta"
params.outdir = "/scratch/projects/oleksyk-lab/gatk4"
params.snpeff_db = "GRCh38.105"
params.pl = "bgi"
params.pm = "dnbseq"

// Set the Nextflow working directory
// By default this gets set to params.outdir + '/nextflow_work_dir'
workDir = params.outdir + '/nextflow_work_dir'
```
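One thing worth checking in this config: if main.nf builds its input channel with `fromFilePairs` (an assumption, since main.nf isn't shown), the pair id is derived from the `*` wildcard in the glob, and a brace-only pattern like the one above can leave it empty, which would explain filenames like `_aligned_reads.sam`. A sketch with a wildcard pattern and a quick sanity check (the channel name is hypothetical):

```nextflow
// Sketch, assuming main.nf uses fromFilePairs -- verify with .view() below.
params.reads = "/projects/oleksyk-lab/Kenneth/Golden_Standard/BGI/E150016531_L01_75_{1,2}.fq.gz"

Channel
    .fromFilePairs(params.reads, checkIfExists: true)
    .ifEmpty { error "No read pairs matched: ${params.reads}" }
    .view()    // should print a tuple like [E150016531_L01_75, [..._1.fq.gz, ..._2.fq.gz]]
    .set { read_pairs_ch }
```

If `.view()` shows an empty string where the sample id should be, that confirms where the underscore-only filenames come from.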
Slurm Script (DSL1):

```shell
module load bwa
module load GATK
export NXF_VER=22.10.7
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
source activate nf-env
nextflow run main.nf -c goldstandardnextflow.config
```
I cannot find anyone else with this error, and I'm very confused as to why I am receiving it. Any help is greatly appreciated!