Sub-workflow for gene statistics #121
Comments
After discussing this with the ToL Genome Notes editor, they would like to see OMark (https://omark.omabrowser.org/) included in the set of tools used to evaluate an annotation set.
To close this issue:
Added new subworkflow 'ANNOTATION_STATS' and new parameter --annotation_set to solve issue #121
The work in #135 goes most of the way to solving this issue, but the subworkflow currently outputs two files with lots of information rather than one file with just the specific data that we are interested in. The next step is to extend the subworkflow to parse the output files produced by AGAT_SPSTATISTICS and AGAT_SQSTATBASIC. What I would like to see is
If you're not sure what any of these terms mean let me know!
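As a rough illustration of the parsing step described above, here is a minimal Python sketch that reduces a text report to a single small CSV. It assumes the reports consist of lines made of a label, a run of spaces, and a number; that layout, the function names (`parse_stats`, `select_stats`, `to_csv`) and the labels and values in `EXAMPLE` are all placeholders for illustration, not the actual AGAT output or the code in #135.

```python
import csv
import io
import re

# Assumed line shape: a text label, two or more spaces, then a number.
STAT_LINE = re.compile(r"^(?P<label>\S.*?)\s{2,}(?P<value>-?\d+(?:\.\d+)?)\s*$")


def parse_stats(text):
    """Return {label: number} for every 'label   value' line in text."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            value = match.group("value")
            stats[match.group("label")] = float(value) if "." in value else int(value)
    return stats


def select_stats(stats, wanted):
    """Keep only the wanted labels, in the order given; skip missing ones."""
    return {label: stats[label] for label in wanted if label in stats}


def to_csv(stats):
    """Serialise the selected statistics as a two-column CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["statistic", "value"])
    for label, value in stats.items():
        writer.writerow([label, value])
    return buffer.getvalue()


# Invented report text, used only to exercise the functions above.
EXAMPLE = """\
Some header line that is not a statistic
Number of gene                               25
Number of transcript                         30
mean exons per transcript                    5.2
"""

if __name__ == "__main__":
    parsed = parse_stats(EXAMPLE)
    print(to_csv(select_stats(parsed, ["Number of gene", "mean exons per transcript"])))
```

The two reports could be parsed separately with `parse_stats`, merged into one dictionary, and filtered with `select_stats` so that only the agreed fields reach the final file.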
We want to be able to include some standard basic statistics on the gene/protein annotation set for an assembly in a genome note. This subworkflow should accept an annotation set and calculate some statistics. The exact values are still to be determined, but will most likely include the number of protein-coding genes, the number of non-coding genes and exons per transcript, as well as BUSCO scores.
This could be a standalone pipeline or could be added to either the genomenote pipeline or to the ensemblgenedownload pipeline, although it may not always be Ensembl that provides the annotations.
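To make the kind of statistics mentioned above concrete, here is a small Python sketch that counts protein-coding and non-coding genes and the mean number of exons per transcript from GFF3 text. It is only an illustration under stated assumptions: genes are taken to be features of type `gene` carrying a `biotype` attribute, and exons are linked to transcripts through `Parent`. Real annotation sets may follow other conventions, and the function names and example records are hypothetical.

```python
from collections import Counter


def parse_attributes(field):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def annotation_stats(gff3_text):
    """Count genes by coding status and average exons per transcript."""
    genes = Counter()
    exons_per_transcript = Counter()
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        feature_type, attrs = cols[2], parse_attributes(cols[8])
        if feature_type == "gene":
            # Assumption: coding genes are flagged with biotype=protein_coding.
            coding = attrs.get("biotype") == "protein_coding"
            genes["protein_coding" if coding else "non_coding"] += 1
        elif feature_type == "exon":
            # An exon may list several parent transcripts, comma-separated.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    exons_per_transcript[parent] += 1
    # Only transcripts with at least one exon contribute to the mean.
    n_transcripts = len(exons_per_transcript)
    mean_exons = sum(exons_per_transcript.values()) / n_transcripts if n_transcripts else 0.0
    return {
        "protein_coding_genes": genes["protein_coding"],
        "non_coding_genes": genes["non_coding"],
        "mean_exons_per_transcript": mean_exons,
    }


# Invented records, used only to exercise the function above.
EXAMPLE_ROWS = [
    ("chr1", ".", "gene", "1", "900", ".", "+", ".", "ID=g1;biotype=protein_coding"),
    ("chr1", ".", "mRNA", "1", "900", ".", "+", ".", "ID=t1;Parent=g1"),
    ("chr1", ".", "exon", "1", "300", ".", "+", ".", "ID=e1;Parent=t1"),
    ("chr1", ".", "exon", "600", "900", ".", "+", ".", "ID=e2;Parent=t1"),
    ("chr1", ".", "gene", "2000", "2500", ".", "-", ".", "ID=g2;biotype=lncRNA"),
    ("chr1", ".", "lnc_RNA", "2000", "2500", ".", "-", ".", "ID=t2;Parent=g2"),
    ("chr1", ".", "exon", "2000", "2500", ".", "-", ".", "ID=e3;Parent=t2"),
]
EXAMPLE_GFF3 = "##gff-version 3\n" + "\n".join("\t".join(row) for row in EXAMPLE_ROWS) + "\n"

if __name__ == "__main__":
    print(annotation_stats(EXAMPLE_GFF3))
```

In the pipeline itself these numbers would come from dedicated tools rather than a hand-rolled parser; the sketch only shows what the target values look like.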