Update to BUSCO 4.0.6 #81

Closed
skrakau wants to merge 3 commits

Member

@skrakau skrakau commented Jul 30, 2020

Here is an update to BUSCO 4.0.6 (v4.1.2 conflicts with the Python version used by MAG, 3.6.7). So far it contains only the basic functionality, without the automatic lineage selection and without the automatic DB download that BUSCO has provided since version 4 (these would require, among other things, extra error handling with the current BUSCO version).

I renamed the variable ${assembly} to ${bin} within the busco processes.

Additionally I changed the busco_plot process and the channel logic, so that the process now runs on each assembler-sample combination (instead of calling the busco script generate_plot.py for each combination within the process). This also fixes the grouping of BUSCO summaries for the plot, which was wrong in case one sample name was the prefix of another one.
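To illustrate the grouping problem described above, here is a minimal Python sketch. It is not the pipeline code: the file names, the sample names and both helper functions are hypothetical. It only shows why matching summaries by name prefix goes wrong when one sample name is a prefix of another, and how grouping on an explicit assembler-sample key avoids that.

```python
from collections import defaultdict


def group_by_prefix(summary_files, group_names):
    """Error-prone approach: assign a file to every group whose name it starts with."""
    return {g: [f for f in summary_files if f.startswith(g)] for g in group_names}


def group_by_key(records):
    """Group on the explicit (assembler, sample) key carried along with each file."""
    groups = defaultdict(list)
    for assembler, sample, summary_file in records:
        groups[(assembler, sample)].append(summary_file)
    return dict(groups)


# Hypothetical records: "sample1" is a prefix of "sample10".
records = [
    ("MEGAHIT", "sample1", "MEGAHIT-sample1_001.txt"),
    ("MEGAHIT", "sample10", "MEGAHIT-sample10_001.txt"),
]
files = [summary_file for _, _, summary_file in records]

# The prefix match puts the sample10 summary into the sample1 group as well.
print(group_by_prefix(files, ["MEGAHIT-sample1", "MEGAHIT-sample10"]))

# The keyed grouping keeps the two samples apart.
print(group_by_key(records))
```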

For the final busco summary for all bins I created an extra process.

Since current versions of BUSCO's generate_plot.py (v4.0.6-4.1.2) cannot handle additional dots within the summary file names, I unfortunately also had to replace dots with underscores in the bin names, which will appear as such in the BUSCO plots.
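As a rough sketch of the renaming described above (the pipeline itself does this in the process script, not in Python): the helper name and the example bin name are hypothetical, and the file name pattern follows the short_summary.specific.${db}.${bin}.txt output used in this PR.

```python
def summary_file_name(db: str, bin_name: str) -> str:
    """Build a summary file name whose bin part contains no further dots."""
    # Dots inside the bin name are replaced by underscores, so the only dots
    # left are the ones separating the fixed parts of the file name.
    safe_bin = bin_name.replace(".", "_")
    return f"short_summary.specific.{db}.{safe_bin}.txt"


# Hypothetical bin name containing dots.
print(summary_file_name("bacteria_odb10", "MEGAHIT-sample1.001.fa"))
```

The underscores introduced here are what later show up in the plot labels.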

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Learn more about contributing: https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md

@skrakau skrakau requested a review from d4straub July 30, 2020 09:06
Member Author

skrakau commented Jul 30, 2020

And importantly it solves #77.

Collaborator

@d4straub d4straub left a comment

The changes you describe are all great additions!

Documentation requires an update I think.

assemblersample=\$(echo \"$assemblersampleunique\" | sed 's/[][]//g')
IFS=', ' read -r -a assemblersamples <<< \"\$assemblersample\"
# replace dots in bin names within summary file names with underscores
# (currently, v4.1.2, generate_plot.py does not allow further dots)
Collaborator

It's v4.0.6, isn't it?

Member Author

True (it just also holds for the newer version), will adjust.

Comment on lines +1150 to +1151
file("${bin}_buscos.faa") optional true
file("${bin}_buscos.fna") optional true
Collaborator

While we are already thinking about making the result folder smaller, why not gzip these too? These files might not be large, but there might be many bins. This is optional, obviously.

file("${assembly}_buscos.faa")
file("${assembly}_buscos.fna")
set val(assembler), val(sample), file("short_summary.specific.${db}.${bin}.txt") into (ch_busco_multiqc, ch_busco_to_summary, ch_busco_plot)
file("${bin}_busco.log")
Collaborator

At least this filename changed, probably the output.md needs an update too?

Member Author

The summary output and the plots etc. are not yet listed; I will add them. For "${assembly}_buscos.faa" only the variable name changed, not the actual file name.

Collaborator

I meant _busco_log.txt -> _busco.log, and yes, that's nit-picky.

@d4straub
Collaborator

Hm, test_hybrid failed when busco found no relevant genes (Prodigal found no genes matching ref database) and therefore short_summary wasn't produced.

and also:

WARN: Access to undefined parameter `spadeshyrid_fix_cpus` -- Initialise it to a default value eg. `params.spadeshyrid_fix_cpus = some_value`

Member Author

skrakau commented Jul 30, 2020

yeah, will have a look

Member Author

skrakau commented Jul 30, 2020

I cannot reproduce the test_hybrid_host_rm error, not even with Nextflow 19.10.0, for which the error occurred (the other tests run through). I need to dig deeper into it.

Member Author

skrakau commented Aug 10, 2020

Will close this, since there remains the issue with BUSCO (https://gitlab.com/ezlab/busco/-/issues/305). Will create a new PR with a newer BUSCO version when this is fixed.

@skrakau skrakau closed this Aug 10, 2020
@skrakau skrakau deleted the update_busco_light branch May 31, 2021 13:37