FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX is launched multiple times and fails #1003

Closed
JohannesKersting opened this issue Apr 21, 2023 · 6 comments
@JohannesKersting

Description of the bug

As can be seen in the screenshot below, the pipeline lists and executes the task FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX six times, which is the number of samples in the sample sheet used. The pipeline then fails at this same step. Additionally, some instances of the task are not cleared from the Slurm queue after the pipeline fails.

Running everything with -r 3.10.1 works and FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX runs only once.
[screenshot: process]

Command used and terminal output

nextflow run nf-core/rnaseq -r 3.11.1 \
-c nextflow.config -profile singularity,slurm \
-resume \
--fasta /nfs/data2/GeneSurge/GeneSurgePipeline/resources/star/genome.fa \
--gtf /nfs/data2/GeneSurge/GeneSurgePipeline/resources/star/genome.gtf \
--star_index /nfs/data2/GeneSurge/GeneSurgePipeline/resources/star/index \
--input sample_sheet.csv \
--save_unaligned \
--outdir results

Execution cancelled -- Finishing pending tasks before exit                                     
-[nf-core/rnaseq] Pipeline completed with errors-                                              
Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX (genome.transcripts.fa)'                                                                               

Caused by:                                                                                     
  Process `NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX (genome.transcripts.fa)` terminated with an error exit status (1)                                                      

Command executed:                                                                              

  grep '^>' genome.fa | cut -d ' ' -f 1 | cut -d $'\t' -f 1 > decoys.txt                       
  sed -i.bak -e 's/>//g' decoys.txt                                                            
  cat genome.transcripts.fa genome.fa > gentrome.fa                                            
                                                                                               
  salmon \                                                                                     
      index \                                                                                  
      --threads 6 \                                                                                                                                                                           
      -t gentrome.fa \                                                                         
      -d decoys.txt \                                                                          
       \                                                                                       
      -i salmon                                                                                
                                                                                               
  cat <<-END_VERSIONS > versions.yml                                                           
  "NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX":                               
      salmon: $(echo $(salmon --version) | sed -e "s/salmon //g")                              
  END_VERSIONS                                                                                 

Command exit status:                                                                           
  1                                                                                            

Command output:                                                                                
  (empty)                                                                                      

Work dir:                                                                                      
  /nfs/data2/GeneSurge/15mer_nf-core_rnaseq/work/fa/b786a33df34e7319bd22e696bdb511             

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Relevant files

relevant_files.zip

System information

  • Nextflow version: 22.10.6.5843
  • Hardware: HPC
  • Executor: slurm
  • Container engine: Singularity
  • OS: Linux Ubuntu 22.04.1 LTS
  • Version of nf-core/rnaseq: 3.11.1
@JohannesKersting JohannesKersting added the bug Something isn't working label Apr 21, 2023
@EmmanuelLabaronne

Hi,
I have a similar issue, which I think is related to this one, although I do not get exactly the same error message:

Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX (genome.transcripts.fa)'

Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX (genome.transcripts.fa)` terminated with an error exit status (1)

Command executed:

  grep '^>' genome.fa | cut -d ' ' -f 1 | cut -d $'\t' -f 1 > decoys.txt
  sed -i.bak -e 's/>//g' decoys.txt
  cat genome.transcripts.fa genome.fa > gentrome.fa
  
  salmon \
      index \
      --threads 6 \
      -t gentrome.fa \
      -d decoys.txt \
       \
      -i salmon
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX":
      salmon: $(echo $(salmon --version) | sed -e "s/salmon //g")
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Version Server Response: Not Found
  index ["salmon"] did not previously exist  . . . creating it
  [2023-04-21 08:14:02.228] [jLog] [info] building index
  out : salmon
  [2023-04-21 08:14:02.231] [puff::index::jointLog] [info] Running fixFasta
  
  [Step 1 of 4] : counting k-mers
  [2023-04-21 08:14:13.976] [puff::index::jointLog] [warning] Entry with header [rna74458], had length less than equal to the k-mer length of 31 (perhaps after poly-A clipping)
  
  [2023-04-21 08:14:14.653] [puff::index::jointLog] [warning] Removed 150 transcripts that were sequence duplicates of indexed transcripts.
  [2023-04-21 08:14:14.653] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
  [2023-04-21 08:14:14.653] [puff::index::jointLog] [critical] The decoy file contained the names of 195 decoy sequences, but 0 were matched by sequences in the reference file provided. To prevent unintentional errors downstream, please ensure that the decoy file exactly matches with the fasta file that is being indexed.
  [2023-04-21 08:14:14.690] [puff::index::jointLog] [error] The fixFasta phase failed with exit code 1

Work dir:
  /home/emmanuel/pepkon/work/3a/4f6c3e84cc6753389d3f9d1c88c136

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Here is the command that I use:

nextflow run nf-core/rnaseq -r 3.11.1 \
    --input data/samplesheet.csv \
    --outdir results \
    --genome GRCh38 \
    --trimmer fastp \
    --aligner star_rsem \
    -resume \
    -profile docker
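
The critical message in the log above says that none of the decoy names were found among the sequences being indexed. A minimal sketch of how one might check this by hand, assuming toy inputs (the sequences and the `headers.txt` file name are made up for illustration; the decoy extraction mirrors the task's command, with a portable tab delimiter in place of `$'\t'`):

```shell
# Toy inputs standing in for genome.fa and genome.transcripts.fa (made-up data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>chr1 description\nACGTACGT\n>chr2\nTTGGCCAA\n' > genome.fa
printf '>rna1\nACGT\n' > genome.transcripts.fa

# Same steps as the task: decoy names from the genome headers, then the gentrome.
tab=$(printf '\t')
grep '^>' genome.fa | cut -d ' ' -f 1 | cut -d "$tab" -f 1 > decoys.txt
sed -i.bak -e 's/>//g' decoys.txt
cat genome.transcripts.fa genome.fa > gentrome.fa

# Header names of the combined FASTA, normalised the same way (hypothetical helper file).
grep '^>' gentrome.fa | sed -e 's/^>//' | cut -d ' ' -f 1 | cut -d "$tab" -f 1 | sort -u > headers.txt

# Count how many decoy names also appear as headers in gentrome.fa.
matched=$(sort -u decoys.txt | comm -12 - headers.txt | wc -l | tr -d ' ')
total=$(wc -l < decoys.txt | tr -d ' ')
echo "decoys matched: $matched of $total"
```

Run inside the failing task's work directory on the real files (skipping the toy `printf` lines), a count of 0 would reproduce what Salmon reports. This only narrows down where the mismatch is, not why it happened.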

@lextallan

Same issue, sometimes with exit code 1 and sometimes 137. Hard to pinpoint what the error even is; the only things I see in .command.err and .command.log are what I assume to be unrelated warnings:

ln: failed to create symbolic link 'genome.fa': File exists
INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
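
For context on the 137: shells report a process terminated by a signal as 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL). On a cluster that is often the out-of-memory killer or the scheduler, but nothing in this thread confirms that as the cause here. A small sketch that reproduces the status:

```shell
# Start a child shell that kills itself with SIGKILL (signal 9),
# then capture the exit status the parent shell sees.
status=0
sh -c 'kill -9 $$' || status=$?
echo "exit status: $status"

# Recover the signal number from the status (128 + signal).
sig=$((status - 128))
echo "signal number: $sig"
```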

@robsyme robsyme mentioned this issue Apr 22, 2023
9 tasks
@robsyme
Contributor

robsyme commented Apr 22, 2023

The problem lies in the invocation of the FASTQ_SUBSAMPLE_FQ_SALMON subworkflow. It is expecting to be provided with a value channel, but instead we provide a queue channel. This is fixed with #1006 by passing the ch_genome_fasta channel through the .first() operator. This ensures that even if multiple samples are provided with auto strandedness, the index process only runs once.

The fix above is made directly to the rnaseq workflow, but that does not preclude modifications to the subworkflow to make it a little bit more defensive.
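
The queue-versus-value distinction above can be sketched with a toy workflow. This is not the nf-core/rnaseq code: the `INDEX` process, the sample names, and the way the FASTA channel ends up with one item per sample are all assumptions made for illustration. The shell snippet writes the sketch to a temporary file and runs it only if `nextflow` is on the PATH.

```shell
demo_dir=$(mktemp -d)
demo="$demo_dir/value_channel_demo.nf"

cat > "$demo" <<'EOF'
// Hypothetical stand-in for the index process: one task per item received.
process INDEX {
    input:
    val fasta

    output:
    stdout

    script:
    """
    echo "indexing ${fasta}"
    """
}

workflow {
    // Assumption: the FASTA channel holds one copy of the genome per sample.
    ch_fasta = Channel.of('s1', 's2', 's3').map { 'genome.fa' }

    // Passing ch_fasta directly (a queue channel with three items)
    // would launch INDEX three times.

    // .first() keeps only the first item and returns a value channel,
    // so INDEX is launched once.
    INDEX(ch_fasta.first())
    INDEX.out.view()
}
EOF

if command -v nextflow >/dev/null 2>&1; then
    nextflow run "$demo" || echo "nextflow run failed"
else
    echo "nextflow not found; sketch written to $demo"
fi
```

Swapping `ch_fasta.first()` for `ch_fasta` in the sketch is a quick way to see the task count change.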

@drpatelh drpatelh added this to the 3.11.2 milestone Apr 24, 2023
@drpatelh
Member

Thanks for reporting @JohannesKersting @lextallan @EmmanuelLabaronne and for the fix @robsyme!

#1006 has been merged into dev, so it would be great if you could test with the commands below to confirm everything is working. I will create a patch release in the next couple of days.

nextflow pull nf-core/rnaseq -r dev
nextflow run nf-core/rnaseq <OTHER_PARAMETERS> -r dev

@EmmanuelLabaronne

Hi,
@drpatelh: It works for me!

Thank you all!

@drpatelh
Member

Great!
