URL is de-encoded during resume, leading to pipeline failures and re-staging of files #6109

@prototaxites

Bug report

This is a slightly strange one. I'm encountering a problem where I feed Nextflow a percent-encoded URL, in this case https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2%237.sub.cram, where a # is encoded as %23. The file is not available at the un-encoded form of this URL.

The pipeline runs fine the first time, but upon resuming Nextflow decodes the URL (replacing %23 with #), tries to stage the file under the decoded name, and fails. This behaviour does not happen if I remove the file() call in the initial input channel (but in my real case, I want to run checkIfExists ASAP).
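
To illustrate why the decoded form cannot be fetched, here is a minimal Python sketch of generic URL parsing. It is not Nextflow code and makes no claim about how Nextflow handles the path internally; the helper name `describe` is made up for this example. It only shows that once %23 becomes a literal #, a standard URL parser treats everything after it as a fragment, so the path being requested is cut short.

```python
from urllib.parse import urlsplit, unquote

# The encoded URL from this report.
ENCODED_URL = (
    "https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/"
    "genomic_data/baUndUnlc1/hic-arima2/41741_2%237.sub.cram"
)


def describe(url):
    """Return the (path, fragment) pair a generic URL parser sees for `url`."""
    parts = urlsplit(url)
    return parts.path, parts.fragment


# Encoded form: the whole file name stays in the path, no fragment.
encoded_path, encoded_fragment = describe(ENCODED_URL)

# Decoded form: '#' now acts as the fragment delimiter and splits the name.
decoded_path, decoded_fragment = describe(unquote(ENCODED_URL))

print(encoded_path)
print(repr(encoded_fragment))
print(decoded_path)
print(repr(decoded_fragment))
```

With the encoded URL the path ends in 41741_2%237.sub.cram and the fragment is empty; after decoding, the path ends in 41741_2 and 7.sub.cram becomes the fragment, which would explain why the decoded URL points at a different (non-existent) resource.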

Expected behavior and actual behavior

Actual behaviour: resume fails as Nextflow tries to stage the wrong remote path
Expected behaviour: resume should work and both processes should be cached.

Steps to reproduce the problem

process TEST {
	input:
	val file

	output:
	tuple val(file), val(integer), emit: output

	exec:
	integer = 1
}

process TEST2 {
	input:
	tuple path(file), val(integer)

	script:
	"""
	echo ${file.getName()} > ${integer}.txt
	"""
}

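workflow {
	// The URL is wrapped in file() here; removing the file() call
	// (passing the plain string) does not reproduce the failure on resume.
	ch_input = Channel.of(file("https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2%237.sub.cram"))

	TEST(ch_input)
	TEST2(TEST.out.output)
}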

Program output

Initial run:

> nextflow run main.nf  -ansi-log false

N E X T F L O W  ~  version 25.04.2
Launching `main.nf` [curious_elion] DSL2 - revision: a4a37a61e7
[fe/ed5463] Submitted process > TEST (1)
Staging foreign file: https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2%237.sub.cram
[d2/8204f8] Submitted process > TEST2 (1)

Resumed run:

> nextflow run main.nf  -ansi-log false -resume

N E X T F L O W  ~  version 25.04.2
Launching `main.nf` [tender_hodgkin] DSL2 - revision: a4a37a61e7
[fe/ed5463] Cached process > TEST (1)
Staging foreign file: https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2#7.sub.cram
WARN: Unable to stage foreign file: https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2#7.sub.cram (try 1 of 3) -- Cause: Unable to access path: https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2#7.sub.cram
WARN: Unable to stage foreign file: https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2#7.sub.cram (try 2 of 3) -- Cause: Unable to access path: https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2#7.sub.cram
WARN: Unable to stage foreign file: https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2#7.sub.cram (try 3 of 3) -- Cause: Unable to access path: https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2#7.sub.cram
ERROR ~ Error executing process > 'TEST2 (1)'

Caused by:
  Can't stage file https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2#7.sub.cram -- reason: Unable to access path: https://tolit.cog.sanger.ac.uk/test-data/Undibacterium_unclassified/genomic_data/baUndUnlc1/hic-arima2/41741_2#7.sub.cram


Command executed:

  echo 41741_2#7.sub.cram > 1.txt

Command exit status:
  -

Command output:
  (empty)

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

Environment

  • Nextflow version: 25.04.2.5947
  • Java version: openjdk 17.0.10 2024-01-16
  • Operating system: macOS - but can replicate on Linux

Additional context

N/A
