Remote origin did not advertise Ref for branch refs/heads/116-integron_finder_2gff-terminated-with-an-error #118

Closed
JavariaAshraf opened this issue Feb 15, 2024 · 20 comments

@JavariaAshraf

Hi Fmalmeida,
I am getting the following error as I try to resume my pipeline analysis.
Pulling fmalmeida/bacannot ... Remote origin did not advertise Ref for branch refs/heads/116-integron_finder_2gff-terminated-with-an-error. This Ref may not exist in the remote or may be hidden by permission settings.
The previous error was in circos,
`*** CIRCOS ERROR ***

    cwd:
    conf

    command: /opt/conda/bin/circos

You have asked to draw [213] ideograms, but the maximum is currently set at
[200]. To increase this number change max_ideograms in etc/housekeeping.conf.
Keep in mind that drawing that many ideograms may create an image that is too
busy and uninterpretable.`

I tried to fix it by changing 200 to 213 in the housekeeping.conf file, but the above error does not let the program resume.
Kindly help.

@fmalmeida
Owner

Hi @JavariaAshraf ,
The error is because this branch does not exist anymore.
I have merged the code in a patch release last week when we finished the other issue.

Try pointing the run to the new version instead of the missing branch, like:

nextflow run fmalmeida/bacannot -r v3.3.2 …

Cheers.

@fmalmeida
Owner

One more thing, @JavariaAshraf

Even after solving the problem of the non-existing branch, I still believe it will fail, because you cannot modify the file and run it again. When resuming, nextflow will create a new working directory for the job, and your modifications will be ignored.

Thus, because this is the very last module and having a circos with so many points is not meaningful anyway, I suggest you create the following file, called circos.config, to make the pipeline ignore errors in this module, and run the pipeline with it.

contents of circos.config file

process {
    withName: 'CIRCOS' {
        errorStrategy = 'ignore'
    }
}

And run like this:

nextflow run fmalmeida/bacannot -r v3.3.2 -c circos.config <rest of your params> -resume

Finally, because this CIRCOS module is the last one, and it is not meaningful in this case, over the weekend I will add two parameters to manage it: one to skip it, and another to easily ignore the errors it produces (as the config I shared with you does).

The difference is:

  • by skipping, the module never runs and you will not have any data related to it
  • by ignoring the error, the module still runs, and you keep the intermediate files it generated for later use, even if it failed during the pipeline.

In both cases, at least, it should avoid breaking the pipeline.
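
For reference, a rough way to see ahead of time whether a genome would hit the limit quoted in the Circos error (200 ideograms) is to count the contigs in its assembly. This is only a sketch, assuming each contig becomes one ideogram; the helper name `count_contigs` and the demo file are hypothetical.

```shell
# Hypothetical helper: count contigs (FASTA header lines) in an assembly.
count_contigs() {
    grep -c '^>' "$1"
}

# Circos limit quoted in the error message.
max_ideograms=200

# Tiny demo assembly so the sketch runs on its own; replace with a real FASTA.
demo_fasta=$(mktemp)
printf '>contig_1\nACGT\n>contig_2\nGGCC\n>contig_3\nTTAA\n' > "$demo_fasta"

n_contigs=$(count_contigs "$demo_fasta")
if [ "$n_contigs" -gt "$max_ideograms" ]; then
    echo "$n_contigs contigs: Circos would exceed max_ideograms=$max_ideograms"
else
    echo "$n_contigs contigs: within the Circos limit"
fi
```

Genomes reported as exceeding the limit are the ones where ignoring or skipping the module would matter.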

Can you give it a try, using this custom config and the correct revision as suggested and see if it helps?

Depending on the feedback I will know what to set up as an action plan.

Cheers 😄

@JavariaAshraf
Author

JavariaAshraf commented Feb 19, 2024

Hi @fmalmeida,
I started the run as suggested above and got this error.
Kindly review:

Caused by:
  Process `BACANNOT:MERGE_ANNOTATIONS (vibrio31)` terminated with an error exit status (1)

Command executed:

  # Rename gff and remove sequence entries
  # bakta has region entries
  awk '$3 == "CDS"' prokka_gff | grep "ID=" > vibrio31.gff ;
  
  ## Increment GFF with custom annotations
  ### VFDB
  if [ ! $(cat vibrio31_vfdb_blastn_onGenes.txt | wc -l) -le 1 ]
  then
    addBlast2Gff.R -i vibrio31_vfdb_blastn_onGenes.txt -g vibrio31.gff -o vibrio31.gff -d VFDB -t Virulence ;
    grep "VFDB" vibrio31.gff > virulence_vfdb.gff ;
  fi
  
  ### Victors
  if [ ! $(cat vibrio31_victors_blastp_onGenes.txt | wc -l) -le 1 ]
  then 
    addBlast2Gff.R -i vibrio31_victors_blastp_onGenes.txt -g vibrio31.gff -o vibrio31.gff -d Victors -t Virulence ;
    grep "Victors" vibrio31.gff > virulence_victors.gff ;
  fi
  
  ### KEGG Orthology
  ## Reformat KOfamscan Output
  if [ ! $(cat vibrio31_ko_forKEGGMapper.txt | wc -l) -eq 0 ]
  then
    awk \
      -F'\t' \
      -v OFS='\t' \
      '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' \
      vibrio31_ko_forKEGGMapper.txt  | \
    sed \
      -e 's/\t/,/g' \
      -e 's/,,/\t/g' | \
    awk  '$2!=""' > formated.txt ;
    addKO2Gff.R -i formated.txt -g vibrio31.gff -o vibrio31.gff -d KEGG ;
  fi
  
  ### ICEs
  if [ ! $(cat vibrio31_iceberg_blastp_onGenes.txt | wc -l) -le 1 ]
  then
    addBlast2Gff.R -i vibrio31_iceberg_blastp_onGenes.txt -g vibrio31.gff -o vibrio31.gff -d ICEberg -t ICE ;
    grep "ICEberg" vibrio31.gff > ices_iceberg.gff ;
  fi
  
  ### Prophages
  if [ ! $(cat vibrio31_phast_blastp_onGenes.txt | wc -l) -le 1 ]
  then
    addBlast2Gff.R -i vibrio31_phast_blastp_onGenes.txt -g vibrio31.gff -o vibrio31.gff -d PHAST -t Prophage ;
    grep "PHAST" vibrio31.gff > prophages_phast.gff ;
  fi
  
  ### Resistance
  #### RGI
  if [ ! $(cat RGI_vibrio31.txt | wc -l) -le 1 ]
  then
    addRGI2gff.R -g vibrio31.gff -i RGI_vibrio31.txt -o vibrio31.gff ;
    grep "CARD" vibrio31.gff > resistance_card.gff ;
  fi
  
  #### AMRFinderPlus
  if [ ! $(cat AMRFinder_resistance-only.tsv | wc -l) -le 1 ]
  then 
    addNCBIamr2Gff.R -g vibrio31.gff -i AMRFinder_resistance-only.tsv -o vibrio31.gff -t Resistance -d AMRFinderPlus ;
    grep "AMRFinderPlus" vibrio31.gff > resistance_amrfinderplus.gff ;
  fi
  
  #### Resfinder
  if [ ! $(cat results_tab.gff | wc -l) -eq 0 ]
  then
    bedtools intersect -a results_tab.gff -b vibrio31.gff -wo | sort -k19,19 -r | awk -F '\t' '!seen[$9]++' > resfinder_intersected.txt ;
    addBedtoolsIntersect.R -g vibrio31.gff -t resfinder_intersected.txt --type Resistance --source Resfinder -o vibrio31.gff ;
    grep "Resfinder" vibrio31.gff > resistance_resfinder.gff ;
    rm -f resfinder_intersected.txt ;
  fi
  
  #### Custom Blast databases
  for file in input.11 ;
  do
    if [ ! $(cat $file | wc -l) -eq 0 ]
    then
      db=${file%%_custom_db.gff} ;
      bedtools intersect -a ${file} -b vibrio31.gff -wo | sort -k19,19 -r | awk -F '\t' '!seen[$9]++' > bedtools_intersected.txt ;
      addBedtoolsIntersect.R -g vibrio31.gff -t bedtools_intersected.txt --type "CDS" --source "${db}" -o vibrio31.gff ;
      grep "${db}" vibrio31.gff > custom_database_${db}.gff ;
      rm -f bedtools_intersected.txt ;
    fi
  done
  
  ### digIS transposable elements
  touch transposable_elements_digis.gff
  if [ -s digis_gff ]
  then
    ( cat digis_gff | sed 's/id=/ID=/g' > transposable_elements_digis.gff && rm digis_gff ) ;
    cat vibrio31.gff transposable_elements_digis.gff | bedtools sort > tmp.out.gff ;
    ( cat tmp.out.gff > vibrio31.gff && rm tmp.out.gff );
  fi
  
  ### integron_finder results
  ### integrons are unique / complete elements and should not be intersected
  cat vibrio31.gff vibrio31_integrons.gff | bedtools sort > tmp.gff ;
  cat tmp.gff > vibrio31.gff
  rm tmp.gff

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error: malformed GFF entry at line 3545. Coordinate detected that is < 1. Exiting.

Work dir:
  `/home/cdc-bioinfo/Vibrio-Feb2024/work/a9/4e823fd5dee91b46737e4606324995`

Please help.

@fmalmeida
Owner

Hi @JavariaAshraf ,

Once again it seems you have 0-based annotation because something was found in the very first base.

However, this time it is not clear which one it is. Can you send me this working directory (/home/cdc-bioinfo/Vibrio-Feb2024/work/a9/4e823fd5dee91b46737e4606324995) with all the files that are available inside it?

I can take a look during the week. In the meantime I would recommend removing the “problematic” genome from the run.

Cheers.

@JavariaAshraf
Author

Please see the attachment.

4e823fd5dee91b46737e4606324995.zip

Do I have to re-run the pipeline from the start? It takes a lot of time. Is there any way to resume from the last step? The resume option doesn't work; it starts the pipeline from the first step.
Please guide.
Thanks

@fmalmeida
Owner

So better to wait for a fix.
I believe it is not resuming because the samplesheet is different (when removing the genome).

@fmalmeida
Owner

fmalmeida commented Feb 19, 2024

I think it may still be the integron finder file. Can you send me these files, which were not copied into the directory (only the links came through):

/home/cdc-bioinfo/Vibrio-Feb2024/work/e8/6a17b21b8ad6b40d2f34e3a95aebb5/vibrio31_integrons.gff
/home/cdc-bioinfo/Vibrio-Feb2024/work/ce/4fede0a759e0aef9aac4cd449b28a1/vibrio31_phast_blastp_onGenes.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/83/f663818f65e05cab793bd795019c3e/vibrio31_vfdb_blastn_onGenes.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/8e/2ae390aa3b9292f043f643e51f2b5a/vibrio31_victors_blastp_onGenes.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/79/a434755b9f49795debc139b78e9f58/KOfamscan/vibrio31_ko_forKEGGMapper.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/c3/b20059a379c86919f976bbbf494728/vibrio31_iceberg_blastp_onGenes.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/09/33e52bffb0c86717fd85cb2be8b7e8/RGI_vibrio31.txt
/home/cdc-bioinfo/Vibrio-Feb2024/work/c7/034a3019b2ba76d63ca5e9889c7710/resfinder/results_tab.gff
/home/cdc-bioinfo/Vibrio-Feb2024/work/37/e0acf6a9f5f210b6d007903bec86fa/AMRFinder_resistance-only.tsv
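
Nextflow stages task inputs as symbolic links by default, so a plain copy or archive of the work directory can end up containing only the links. A minimal sketch of a copy that follows the links instead is shown here; all paths in it are hypothetical stand-ins for the real work directory.

```shell
# Demo setup standing in for a Nextflow work directory (all paths are hypothetical).
base=$(mktemp -d)
mkdir -p "$base/other_task" "$base/task_dir"
printf 'contig\tIntegron_Finder\tintegron\t25\t12675\n' > "$base/other_task/sample_integrons.gff"
ln -s "$base/other_task/sample_integrons.gff" "$base/task_dir/sample_integrons.gff"

# -R copies recursively, -L follows symbolic links, so the copy holds real files.
cp -RL "$base/task_dir" "$base/task_dir_copy"

# Report whether the copied entry is still a link.
if [ -L "$base/task_dir_copy/sample_integrons.gff" ]; then
    echo "still a symlink"
else
    echo "regular file copied"
fi
```

The dereferenced copy can then be zipped and attached as usual.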

@JavariaAshraf
Author

Please find the files. They were write-protected and I was unable to copy them.
Now they are attached.
4e823fd5dee91b46737e4606324995copy.zip

@fmalmeida
Owner

They are still not copied. Only the links are coming, not the real files. See below:

Screenshot from 2024-02-19 10-03-47

@JavariaAshraf
Author

4e823fd5dee91b46737e4606324995copy.zip
Please see... I have renamed them.
I hope they are accessible now.

@fmalmeida
Owner

fmalmeida commented Feb 19, 2024

Hi @JavariaAshraf ,

For some reason, some of the integron finder results have negative coordinates:

13      Integron_Finder integron        69515   74987   .       +       1       ID=integron_01;integron_type=complete
24      Integron_Finder integron        25      12675   .       +       1       ID=integron_01;integron_type=CALIN
25      Integron_Finder integron        19      9958    .       +       1       ID=integron_01;integron_type=CALIN
27      Integron_Finder integron        6936    9536    .       +       1       ID=integron_01;integron_type=complete
31      Integron_Finder integron        478     4564    .       +       1       ID=integron_01;integron_type=CALIN
32      Integron_Finder integron        66      4604    .       +       1       ID=integron_01;integron_type=CALIN
33      Integron_Finder integron        117     4047    .       +       1       ID=integron_01;integron_type=CALIN
37      Integron_Finder integron        -2      3108    .       +       1       ID=integron_01;integron_type=CALIN
38      Integron_Finder integron        2       2804    .       +       1       ID=integron_01;integron_type=CALIN
44      Integron_Finder integron        70      1709    .       +       1       ID=integron_01;integron_type=CALIN
46      Integron_Finder integron        -17     1603    .       +       1       ID=integron_01;integron_type=CALIN
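
Entries like these can be located with a short awk filter on the start and end columns, since GFF coordinates are 1-based and must not be below 1. This is only a sketch: the helper name `find_bad_coords` and the demo file are hypothetical, and the demo reuses three of the records listed above.

```shell
# Hypothetical helper: print "line_number<TAB>record" for every GFF feature
# whose start (column 4) or end (column 5) is below 1.
find_bad_coords() {
    awk -F'\t' '!/^#/ && NF >= 5 && ($4 < 1 || $5 < 1) { print NR "\t" $0 }' "$1"
}

# Demo file reusing three of the records listed above.
demo_gff=$(mktemp)
printf '33\tIntegron_Finder\tintegron\t117\t4047\t.\t+\t1\tID=integron_01;integron_type=CALIN\n'  > "$demo_gff"
printf '37\tIntegron_Finder\tintegron\t-2\t3108\t.\t+\t1\tID=integron_01;integron_type=CALIN\n'  >> "$demo_gff"
printf '46\tIntegron_Finder\tintegron\t-17\t1603\t.\t+\t1\tID=integron_01;integron_type=CALIN\n' >> "$demo_gff"

# Lists the records for contigs 37 and 46, each prefixed with its line number.
find_bad_coords "$demo_gff"
```

Run against a merged GFF, the reported line numbers correspond to the ones `bedtools sort` complains about.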

I would also need you to send me the integron finder results for this genome so I can check the GFF conversion module again. It seems that the issue described in #116 is not yet fully resolved.

So, I would need the files (for this specific genome) so I can first check the tool's results and make sure they are proper and then assess whether I can use other scripts for converting it to GFF to avoid this issue.

In the meantime, I have the following alternatives:

  • if in a hurry for having the results for the genomes:
    • if you want the integron finder results, you can try removing the problematic genomes and running for the others while I fix it. However, as you saw, sometimes it may not resume, which can be either because of the new samplesheet or the fact that you are running a different pipeline version/branch.
    • if the integron finder results are not that relevant, you can try running an earlier version of the pipeline which does not have the integron finder tool, and thus will not have this problem: -r v3.2
  • or you can wait for me to fix it and provide a new branch or version for you to run for all the genomes (I cannot guarantee speed, but I can make a custom branch with the fix available this week, at least)

@JavariaAshraf
Author

Thank you for your quick replies.
I am attaching the folder for specific genome.
integron_finder_v31.zip
I will also run the older version, as I need the results.
Thank you

@fmalmeida
Owner

fmalmeida commented Feb 19, 2024

Okay,
Let me know how it goes. In the meantime I will work on the issue in the current version.

Remember to use the circos configuration to avoid the circos error when running the earlier version. Hopefully it works using that version, if not, we can investigate.

@fmalmeida
Owner

Hi @JavariaAshraf ,
It seems that the problem is in the integron finder tool itself. I would have to open an issue in the tool's github.

Can you send me the sequences of contigs 37 and 46, which are the problematic ones from these genomes, so that I can investigate the issue with the tool's developers?
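
One way to pull those contigs out of the assembly is a small awk filter, assuming the FASTA headers use the same IDs as the first GFF column (37 and 46). This is only a sketch: the helper name `extract_contigs` and the demo file are hypothetical.

```shell
# Hypothetical helper: print the FASTA records whose ID (first word of the
# header, without ">") appears in a comma-separated list.
extract_contigs() {
    awk -v ids="$2" '
        BEGIN { n = split(ids, arr, ","); for (i = 1; i <= n; i++) want[arr[i]] = 1 }
        /^>/  { id = substr($1, 2); keep = (id in want) }
        keep  { print }
    ' "$1"
}

# Tiny demo assembly so the sketch runs on its own; replace with the real FASTA.
demo_assembly=$(mktemp)
printf '>36\nAAAA\n>37 len=8\nACGT\nTTGA\n>46\nGGCC\n' > "$demo_assembly"

# Prints the records for contigs 37 and 46 only.
extract_contigs "$demo_assembly" "37,46"
```

Redirecting the output to a file gives a small FASTA that can be attached to the issue.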

@JavariaAshraf
Author

Hi @fmalmeida
You are right. I have also installed the tool separately and it is giving the following error.
integron_Finder_Error.txt
Thanks.

@fmalmeida fmalmeida self-assigned this Feb 20, 2024
@fmalmeida
Owner

Just for reference, I have opened an issue in their repository: gem-pasteur/Integron_Finder#114.
Once it is fixed, I can bring the new version to the pipeline.

The only remedy I can offer for now is a new patch release this week that allows one to skip the integron finder tool, with a param --skip_integron_finder, so that if this happens, one can still run the rest.

@JavariaAshraf
Author

This will be very helpful.
Thank you.

@fmalmeida
Owner

Hi @JavariaAshraf ,
While I wait for the real fix in the integron finder tool, I have added the option for skipping INTEGRON_FINDER and/or the CIRCOS module.

Before I make a release, could you give it a try?

I would ask you to try first using only the genome that causes the current problem, vibrio31.

You could check whether the pipeline runs successfully for this genome when these modules are skipped. If so, I can then wrap it up as a patch release.

Suggested command line

nextflow run fmalmeida/bacannot \
    -r 118-add-skip-integron-finder-param \
    -latest \
    -resume \
    --skip_integron_finder \
    --skip_circos \
    # ... the rest of your normal input params

Depending on the result, I can merge it (or not).

@JavariaAshraf
Author

Hi @fmalmeida,
I have run the three troublesome sequences with the parameters you suggested above and the run completed smoothly. Here's the screenshot.
Screenshot from 2024-02-21 11-18-11

@fmalmeida
Owner

Hi @JavariaAshraf ,

Thanks for the feedback. I am closing this issue then.
I have merged the code to the dev branch (on PR #119), so, if you need to run the pipeline with these parameters you must refer to the dev branch, with nextflow run fmalmeida/bacannot -r dev -latest.

Finally, I opened a new ticket #120 so I remember to update the docker image with the new version of the integron finder tool once the devs release the fix.

Cheers,
and thanks for reporting and using it.
