GUNZIP module is confused by repeated patterns in sample name and fasta path. #367

Closed
m3hdad opened this issue Apr 26, 2024 · 2 comments
Labels
bug Something isn't working

Comments

m3hdad commented Apr 26, 2024

Description of the bug

A couple of months ago we had a Slack thread about how NFCORE_FUNCSCAN:FUNCSCAN:GUNZIP_PYRODIGAL_FNA confuses meta.id with the gz file path when the sample name is repeated along the full path to the FASTA file.

The topic is discussed here on Slack.
Fix: Changing sample names solves the problem.

ERROR ~ Error executing process > 'NFCORE_FUNCSCAN:FUNCSCAN:GUNZIP_PYRODIGAL_FNA ([GCA_001438805.1.fna.gz, GCA_001438805.1_ASM143880v1_genomic.fna.gz])'

Caused by:
  Missing output file(s) `GCA_001438805.1.fna GCA_001438805.1_ASM143880v1_genomic.fna.gz` expected by process `NFCORE_FUNCSCAN:FUNCSCAN:GUNZIP_PYRODIGAL_FNA ([GCA_001438805.1.fna.gz, GCA_001438805.1_ASM143880v1_genomic.fna.gz])`

Command executed:

  # Not calling gunzip itself because it creates files
  # with the original group ownership rather than the
  # default one for that user / the work directory
  gzip \
      -cd \
       \
      GCA_001438805.1.fna.gz GCA_001438805.1_ASM143880v1_genomic.fna.gz \
      > GCA_001438805.1.fna GCA_001438805.1_ASM143880v1_genomic.fna.gz

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FUNCSCAN:FUNCSCAN:GUNZIP_PYRODIGAL_FNA":
      gunzip: $(echo $(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*$//')
  END_VERSIONS

Command exit status:
  0

Command output:
  (empty)

Work dir:
  /home/test/.work/e1/6e71e01659781bea8f6deb48144838

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
ERROR ~ Failed to invoke `workflow.onComplete` event handler

 -- Check script '/home/.nextflow/assets/nf-core/funcscan/./workflows/funcscan.nf' at line: 314 or see '.nextflow.log' file for more details
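The exit status of 0 combined with a "missing output" error can look contradictory. A minimal sketch of what the shell appears to do with a command of this shape is below; the file names are shortened stand-ins for the two inputs in the log, and the FASTA contents are made up for illustration. Only the first word after `>` is treated as the redirect target, so the second name is handed to gzip as one more input and no file with the combined name is ever created.

```shell
#!/usr/bin/env bash
# Sketch only: stand-in file names and contents, not the real assemblies.
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Two small gzipped inputs standing in for the two staged files.
printf '>a\nACGT\n' | gzip -c > sample.fna.gz
printf '>b\nTTGG\n' | gzip -c > sample_genomic.fna.gz

# Same shape as the failing command: stdout goes to sample.fna, and the
# trailing sample_genomic.fna.gz becomes an extra argument to gzip.
gzip -cd sample.fna.gz sample_genomic.fna.gz > sample.fna sample_genomic.fna.gz
status=$?

echo "exit status: $status"
# Lists sample.fna plus the two .gz inputs; nothing named
# "sample.fna sample_genomic.fna.gz" exists.
ls -1
```

In this sketch gzip succeeds and writes everything into `sample.fna`, which would be consistent with the log above: the command itself does not fail, but the output name the process expects is never produced.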

The input file which resulted in this error was:

sample,fasta
GCA_000184535.1,/home/test/genomes/ncbi_dataset/data/GCA_000184535.1/GCA_000184535.1_ASM18453v1_genomic.fna
GCA_000260455.1,/home/test/genomes/ncbi_dataset/data/GCA_000260455.1/GCA_000260455.1_ASM26045v1_genomic.fna
GCA_000615725.1,/home/test/genomes/ncbi_dataset/data/GCA_000615725.1/GCA_000615725.1_ASM61572v1_genomic.fna

Changing the input file to the following fixed the issue:

sample,fasta
sample-1,/home/test/genomes/ncbi_dataset/data/GCA_000184535.1/GCA_000184535.1_ASM18453v1_genomic.fna
sample-2,/home/test/genomes/ncbi_dataset/data/GCA_000260455.1/GCA_000260455.1_ASM26045v1_genomic.fna
sample-3,/home/test/genomes/ncbi_dataset/data/GCA_000615725.1/GCA_000615725.1_ASM61572v1_genomic.fna
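To see which rows of a samplesheet have the pattern described above (the sample name occurring inside its own FASTA path), something like the following could be used before renaming. This is a hypothetical helper, not part of nf-core/funcscan, and it only detects the pattern; it says nothing about the root cause. The two-row samplesheet is an illustrative mix of the failing and working inputs.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of nf-core/funcscan.
set -euo pipefail

samplesheet="$(mktemp)"
cat > "$samplesheet" <<'EOF'
sample,fasta
GCA_000184535.1,/home/test/genomes/ncbi_dataset/data/GCA_000184535.1/GCA_000184535.1_ASM18453v1_genomic.fna
sample-2,/home/test/genomes/ncbi_dataset/data/GCA_000260455.1/GCA_000260455.1_ASM26045v1_genomic.fna
EOF

# Skip the header, then print each sample name that occurs literally inside
# its fasta path. index() is a plain substring search, so the dots in the
# accession IDs are not treated as regex wildcards.
flagged="$(awk -F',' 'NR > 1 && index($2, $1) > 0 { print $1 }' "$samplesheet")"

echo "$flagged"
```

Here only the first row is reported, since `sample-2` does not appear anywhere in its path.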

Command used and terminal output

No response

Relevant files

No response

System information

No response

m3hdad added the bug label Apr 26, 2024
Member

jfy133 commented May 15, 2024

Looking again, I'm even more confused. I think the issue is coming from further upstream, but I'm not sure where at the moment.

I have a suspicion there is a faulty join function that is somehow merging the original FASTAs with some processed ones at some point. Then all downstream modules are, for some reason, receiving both FASTAs rather than one.

That said, why changing the name would affect that, I have no idea.

Member

jfy133 commented Jun 23, 2024

I just tried replicating the issue and I am unable to.

Unfortunately the Slack thread doesn't have the command you used, but to describe what I just tested:

  1. Downloaded the assemblies (manually from ENA due to laziness) and placed them in this structure:
/home/james/git/nf-core/funcscan/testing/temp_inputs/GCA_000184535.1/GCA_000184535.1_ASM18453v1_genomic.fna
/home/james/git/nf-core/funcscan/testing/temp_inputs/GCA_000260455.1/GCA_000260455.1_ASM26045v1_genomic.fna
/home/james/git/nf-core/funcscan/testing/temp_inputs/GCA_000615725.1/GCA_000615725.1_ASM61572v1_genomic.fna
  2. Created the following samplesheet:
sample,fasta
GCA_000184535.1,/home/james/git/nf-core/funcscan/testing/temp_inputs/GCA_000184535.1/GCA_000184535.1_ASM18453v1_genomic.fna
GCA_000260455.1,/home/james/git/nf-core/funcscan/testing/temp_inputs/GCA_000260455.1/GCA_000260455.1_ASM26045v1_genomic.fna
GCA_000615725.1,/home/james/git/nf-core/funcscan/testing/temp_inputs/GCA_000615725.1/GCA_000615725.1_ASM61572v1_genomic.fna
  3. Ran the following command (on my local clone, with latest -r dev) to completion with no error:
$ nextflow run ../main.nf -profile docker --outdir ./results --input gunzip-confused-bug.csv --run_amp_screening --amp_skip_amplify --amp_skip_hmmsearch --run_arg_screening false 

 N E X T F L O W   ~  version 24.04.2

Launching `../main.nf` [jolly_leavitt] DSL2 - revision: 70185db88e



------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/funcscan v1.2.0dev
------------------------------------------------------
Core Nextflow options
  runName                          : jolly_leavitt
  containerEngine                  : docker
  launchDir                        : /home/james/git/nf-core/funcscan/testing
  workDir                          : /home/james/git/nf-core/funcscan/testing/work
  projectDir                       : /home/james/git/nf-core/funcscan
  userName                         : james
  profile                          : docker
  configFiles                      : 

Input/output options
  input                            : gunzip-confused-bug.csv
  outdir                           : ./results

Screening Type Activation
  run_amp_screening                : true

AMP: AMPlify
  amp_skip_amplify                 : true

AMP: HMMSearch
  amp_skip_hmmsearch               : true

AMP: ampcombi2 parsetables
  amp_ampcombi_parsetables_dbevalue: 5

AMP: ampcombi2 cluster
  amp_ampcombi_cluster_covmode     : 0
  amp_ampcombi_cluster_mode        : 1

ARG: AMRFinderPlus
  arg_amrfinderplus_identmin       : -1

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/funcscan for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.7643099

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/funcscan/blob/master/CITATIONS.md
------------------------------------------------------
executor >  local (31)
[-        ] process > NFCORE_FUNCSCAN:FUNCSCAN:GUNZIP_INPUT_PREP                                                  -
[4f/1887d7] process > NFCORE_FUNCSCAN:FUNCSCAN:ANNOTATION:PYRODIGAL (GCA_000184535.1)                             [100%] 3 of 3 ✔
[b9/0695e8] process > NFCORE_FUNCSCAN:FUNCSCAN:ANNOTATION:GUNZIP_PYRODIGAL_FAA (GCA_000184535.1_pyrodigal.faa.gz) [100%] 3 of 3 ✔
[4a/cf8818] process > NFCORE_FUNCSCAN:FUNCSCAN:ANNOTATION:GUNZIP_PYRODIGAL_FNA (GCA_000184535.1_pyrodigal.fna.gz) [100%] 3 of 3 ✔
[a4/b01a85] process > NFCORE_FUNCSCAN:FUNCSCAN:ANNOTATION:GUNZIP_PYRODIGAL_GBK (GCA_000184535.1_pyrodigal.gbk.gz) [100%] 3 of 3 ✔
[d8/1a619d] process > NFCORE_FUNCSCAN:FUNCSCAN:AMP:MACREL_CONTIGS (GCA_000184535.1)                               [100%] 3 of 3 ✔
[47/4e6a4a] process > NFCORE_FUNCSCAN:FUNCSCAN:AMP:GUNZIP_MACREL_PRED (GCA_000184535.1.macrel.prediction.gz)      [100%] 3 of 3 ✔
[8f/8e6452] process > NFCORE_FUNCSCAN:FUNCSCAN:AMP:GUNZIP_MACREL_ORFS (GCA_000184535.1.macrel.all_orfs.faa.gz)    [100%] 3 of 3 ✔
[11/acb89c] process > NFCORE_FUNCSCAN:FUNCSCAN:AMP:AMPIR (GCA_000184535.1)                                        [100%] 3 of 3 ✔
[b9/5d5743] process > NFCORE_FUNCSCAN:FUNCSCAN:AMP:DRAMP_DOWNLOAD                                                 [100%] 1 of 1 ✔
[d7/95481b] process > NFCORE_FUNCSCAN:FUNCSCAN:AMP:AMPCOMBI2_PARSETABLES (GCA_000615725.1)                        [100%] 3 of 3 ✔
[44/3f59b0] process > NFCORE_FUNCSCAN:FUNCSCAN:AMP:AMPCOMBI2_COMPLETE (ampcombi2)                                 [100%] 1 of 1 ✔
[32/1cd22f] process > NFCORE_FUNCSCAN:FUNCSCAN:AMP:AMPCOMBI2_CLUSTER (ampcombi2)                                  [100%] 1 of 1 ✔
[e4/aced66] process > NFCORE_FUNCSCAN:FUNCSCAN:MULTIQC                                                            [100%] 1 of 1 ✔
-[nf-core/funcscan] Pipeline completed successfully-

Given we've overhauled the pipeline in a few places in how data is generated and sent around in -r dev, I'm guessing we have inadvertently fixed this issue as a side effect.

Therefore I'm going to close this for now, but if it crops up again when running the -r dev branch or on the next release, feel free to reopen.

jfy133 closed this as completed Jun 23, 2024