Update to BUSCO 4.0.6 #81
And, importantly, it solves #77.
The changes you describe are all great additions!
The documentation needs an update, I think.
assemblersample=\$(echo \"$assemblersampleunique\" | sed 's/[][]//g')
IFS=', ' read -r -a assemblersamples <<< \"\$assemblersample\"
# replace dots in bin names within summary file names with underscores
# (currently, v4.1.2, generate_plot.py does not allow further dots)
It's v4.0.6, isn't it?
True (it also holds for the newer version); will adjust.
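To make the two steps in the diff above concrete, here is a standalone bash sketch of them, with Nextflow's escaping (\$ and \") removed. It assumes that assemblersampleunique is a Groovy list such as ['MEGAHIT', 'sample1'], which Nextflow interpolates as the string "[MEGAHIT, sample1]". The bin name and the database name bacteria_odb10 are hypothetical placeholders, and the snippet only builds the sanitized summary file name rather than renaming real BUSCO output.

```shell
#!/usr/bin/env bash
# Assumption: Nextflow renders the Groovy list ['MEGAHIT', 'sample1'] as this string.
assemblersampleunique="[MEGAHIT, sample1]"

# Strip the square brackets; the bracket expression [][] matches ']' or '['.
assemblersample=$(echo "$assemblersampleunique" | sed 's/[][]//g')

# Split on commas and spaces into a bash array: (MEGAHIT sample1).
IFS=', ' read -r -a assemblersamples <<< "$assemblersample"
assembler="${assemblersamples[0]}"
sample="${assemblersamples[1]}"

# Hypothetical bin name and BUSCO lineage database name.
bin="${assembler}-${sample}.1"
db="bacteria_odb10"

# Replace every dot in the bin name with an underscore, so that the only dots
# left in the summary file name are the separators between its fields.
bin_name="${bin//./_}"
summary="short_summary.specific.${db}.${bin_name}.txt"
echo "$summary"
```

With these inputs the script prints short_summary.specific.bacteria_odb10.MEGAHIT-sample1_1.txt. The sanitized name is also what will show up as the bin label in the BUSCO plots.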
file("${bin}_buscos.faa") optional true
file("${bin}_buscos.fna") optional true
Since we are already thinking about making the results folder smaller, why not gzip these too? The files might not be large, but there might be many bins. This is optional, obviously.
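A minimal sketch of that suggestion, assuming the compression runs at the end of the BUSCO process script: each sequence file is gzipped only if it exists, because the outputs are declared optional and BUSCO may not write them for every bin. The bin name and the dummy .faa file are hypothetical stand-ins so the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical bin name; inside the process this would be ${bin}.
bin="MEGAHIT-sample1.1"

# Dummy file so the sketch runs standalone; the .fna file is deliberately
# missing to mimic a bin for which BUSCO wrote no nucleotide sequences.
printf '>gene1\nMKV\n' > "${bin}_buscos.faa"

for f in "${bin}_buscos.faa" "${bin}_buscos.fna"; do
    # Skip files that BUSCO did not produce; -f overwrites an existing .gz.
    if [ -e "$f" ]; then
        gzip -f "$f"
    fi
done
```

If adopted, the optional output declarations would have to refer to the compressed names (for example ${bin}_buscos.faa.gz) instead of the plain .faa/.fna files.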
file("${assembly}_buscos.faa")
file("${assembly}_buscos.fna")
set val(assembler), val(sample), file("short_summary.specific.${db}.${bin}.txt") into (ch_busco_multiqc, ch_busco_to_summary, ch_busco_plot)
file("${bin}_busco.log")
At least this file name changed, so output.md probably needs an update too?
The summary output and the plots etc. are not yet listed; I will add them. For "${assembly}_buscos.faa" only the variable name changed, not the actual file name.
I meant _busco_log.txt -> _busco.log, and yes, that's nit-picky.
Hm, test_hybrid failed when BUSCO found no relevant genes (Prodigal found no genes matching the reference database), and therefore also:
Yeah, will have a look.
I cannot reproduce the
Will close this, since the issue with BUSCO remains (https://gitlab.com/ezlab/busco/-/issues/305). Will create a new PR with a newer BUSCO version once this is fixed.
Here is an update to BUSCO 4.0.6 (v4.1.2 conflicts with the MAG Python version 3.6.7). So far it contains only the basic functionality, without the automatic lineage selection and the automatic database download that BUSCO 4 provides (these would require extra error handling, among other things, with the current BUSCO version).
I renamed the variable ${assembly} to ${bin} within the BUSCO processes.
Additionally, I changed the busco_plot process and the channel logic so that the process now runs on each assembler-sample combination (instead of calling the BUSCO script generate_plot.py for each combination within the process). This also fixes the grouping of BUSCO summaries for the plot, which was wrong when one sample name was a prefix of another.
For the final BUSCO summary of all bins, I created an extra process.
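The prefix problem can be illustrated outside Nextflow. This bash sketch uses hypothetical summary file names of the form short_summary.<assembler>-<sample>.<n>.txt (an assumption, not necessarily the pipeline's exact naming) and shows how a prefix pattern for sample1 also catches sample10, while anchoring the pattern on the following dot does not. The actual fix in this PR lives in the Nextflow channel logic (one busco_plot task per assembler-sample combination), not in a shell glob.

```shell
#!/usr/bin/env bash
# Hypothetical summary file names for two samples, sample1 and sample10.
files=(
  "short_summary.MEGAHIT-sample1.1.txt"
  "short_summary.MEGAHIT-sample1.2.txt"
  "short_summary.MEGAHIT-sample10.1.txt"
)
sample="sample1"

# Prefix matching: "sample1*" also matches sample10, i.e. the mis-grouping.
prefix_matches=0
for f in "${files[@]}"; do
    if [[ "$f" == short_summary.MEGAHIT-${sample}* ]]; then
        prefix_matches=$((prefix_matches + 1))
    fi
done

# Anchoring on the dot after the sample name keeps only sample1's summaries.
exact_matches=0
for f in "${files[@]}"; do
    if [[ "$f" == short_summary.MEGAHIT-${sample}.* ]]; then
        exact_matches=$((exact_matches + 1))
    fi
done

echo "prefix: $prefix_matches, exact: $exact_matches"
```

Running it prints prefix: 3, exact: 2. Grouping by the assembler and sample values carried in the channel avoids depending on file-name patterns at all, which is presumably why the channel-based approach is more robust.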
Since current versions of the BUSCO generate_plot.py (v4.0.6 to 4.1.2) cannot handle additional dots within the summary file names, I unfortunately also had to replace dots with underscores in the bin names, which will appear as such in the BUSCO plots.
PR checklist
- nextflow run . -profile test,docker
- nf-core lint .
- docs is updated
- CHANGELOG.md is updated
- README.md is updated

Learn more about contributing: https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md