Documentation: mention prokaryotic analysis settings #765

Closed
d4straub opened this issue Feb 11, 2022 · 9 comments

@d4straub
Contributor

Description of feature

First of all, we are mostly still running version 1.4.2 at our facility (I know, ancient; this will change soon).
In 1.4.2, the default for params.fc_count_type is exon. I assume this is perfect for eukaryotes, but it does not suit most prokaryotic genomes, because their transcripts are typically not spliced and their annotations (at least NCBI GFF files, in my experience) usually use CDS or transcript features instead. As a result, running the pipeline with default settings on prokaryotes reports only the few exon features that are present: in a recent case, 43 of ~2300 transcripts.
One solution could be to add a small section to the documentation and/or to add a sanity check that compares the number of counted features with the total line count of the GTF.
Having said that, I am not sure how newer versions of the pipeline handle this case. I assume that --featurecounts_feature_type has a similar function.
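
The sanity check suggested above could look roughly like the following. This is only a sketch, not code from the pipeline: the function names, the example records, and the 0.5 threshold are all made up for illustration, and it assumes a tab-separated 9-column GTF/GFF file with the feature type in column 3.

```python
from collections import Counter

def count_feature_types(lines):
    """Count the values in column 3 (feature type) of GTF/GFF lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip lines that are not 9-column records
        counts[fields[2]] += 1
    return counts

def feature_fraction(counts, feature_type):
    """Fraction of all annotation records that have the given feature type."""
    total = sum(counts.values())
    return counts[feature_type] / total if total else 0.0

# Made-up records for illustration only; in practice, pass an open file handle.
example_lines = [
    "##gff-version 3",
    "chr\tRefSeq\tgene\t1\t900\t.\t+\t.\tID=gene0",
    "chr\tRefSeq\tCDS\t1\t900\t.\t+\t.\tID=cds0;Parent=gene0",
    "chr\tRefSeq\tCDS\t1000\t1900\t.\t-\t.\tID=cds1;Parent=gene1",
    "chr\tRefSeq\texon\t2000\t2075\t.\t+\t.\tID=exon0;Parent=rna0",
]
counts = count_feature_types(example_lines)
exon_fraction = feature_fraction(counts, "exon")
if exon_fraction < 0.5:  # arbitrary threshold, chosen only for this sketch
    print(f"Only {counts['exon']} exon records; most common types: {counts.most_common(2)}")
```

A warning like this would not fix the run, but it would flag annotations where the chosen feature type covers only a small part of the file.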

Additionally, the attribute named by --featurecounts_group_type usually does not appear in prokaryotic annotations, and its absence crashes the pipeline (at least in 1.4.2). It would be nice if this could be handled more gracefully (again, this may have improved in newer versions).
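
One way to handle the missing attribute more gracefully would be to check for it before counting. The sketch below is an assumption about how such a check might look, not pipeline code: the helper names and example records are hypothetical, and the attribute parsing is deliberately simple (it splits on semicolons, so values that contain a semicolon would confuse it).

```python
import re

def attribute_keys(attr_field):
    """Return the attribute keys found in column 9.

    Handles both GTF style (key "value";) and GFF3 style (key=value;).
    """
    keys = set()
    for part in attr_field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        # The key ends at the first '=' (GFF3) or whitespace (GTF).
        keys.add(re.split(r"[=\s]", part, maxsplit=1)[0])
    return keys

def has_attribute(lines, feature_type, attribute):
    """True if any record of the given feature type carries the attribute."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        if attribute in attribute_keys(fields[8]):
            return True
    return False

# Made-up GTF records for illustration only.
gtf_lines = [
    'chr\tRefSeq\ttranscript\t1\t900\t.\t+\t.\tgene_id "g0"; transcript_id "t0";',
    'chr\tRefSeq\ttranscript\t1000\t1900\t.\t-\t.\tgene_id "g1"; transcript_id "t1";',
]
group_present = has_attribute(gtf_lines, "transcript", "gene_biotype")
if not group_present:
    print("Group attribute not found; consider skipping the biotype step.")
```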

@drpatelh
Member

drpatelh commented Feb 20, 2022

Thanks @d4straub! Yes, I agree we should improve and extend the documentation to describe how to tweak the parameters for a selection of common annotations. Maybe something we can tackle at the Hackathon? I haven't used prokaryotic annotations personally in ages, but if you are able to get the latest version of the pipeline running with these parameters and have some recommendations, that would be great.

In recent versions of the pipeline, featureCounts is only used to generate the biotype QC, as mentioned in the docs, and isn't used for any formal quantification. In most cases where you aren't using a standard annotation, it is easier to just use --skip_biotype_qc to skip this step.
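
As a rough illustration of the two options discussed in this thread, the sketch below assembles a command line that either skips the biotype QC or overrides the feature type. The helper function and all file names are hypothetical. The --input, --fasta, --gtf and --outdir parameters are assumed from recent 3.x releases, and a real run would normally also need a -profile setting, so please check the usage docs for your version.

```python
import shlex

def build_rnaseq_command(samplesheet, fasta, gtf, outdir,
                         feature_type=None, skip_biotype_qc=False):
    """Build an argument list for a pipeline run (hypothetical helper)."""
    cmd = [
        "nextflow", "run", "nf-core/rnaseq",
        "--input", samplesheet,
        "--fasta", fasta,
        "--gtf", gtf,
        "--outdir", outdir,
    ]
    if skip_biotype_qc:
        # Skip the featureCounts-based biotype QC entirely.
        cmd.append("--skip_biotype_qc")
    elif feature_type is not None:
        # Otherwise count a feature type that exists in the annotation.
        cmd += ["--featurecounts_feature_type", feature_type]
    return cmd

# Placeholder file names, for illustration only.
cmd = build_rnaseq_command(
    "samplesheet.csv", "genome.fasta", "genes.gtf", "results",
    skip_biotype_qc=True,
)
command_line = shlex.join(cmd)
print(command_line)
```

Skipping takes precedence in this sketch because the feature type only matters when the biotype QC step actually runs.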

@jenmuell
Contributor

jenmuell commented Mar 17, 2022

Hello @drpatelh, I mainly work with prokaryotes and struggled with problems similar to Daniel's, but I was able to solve them in some cases. Yesterday I ran the newest version of rnaseq, and it worked rather nicely.
As Daniel mentioned, exon is not the right value for --featurecounts_feature_type; transcript is.
Maybe we can work on the documentation together?

@d4straub
Contributor Author

Because I haven't used the newer (3.x) versions of the pipeline, most of my experience might be outdated. Why not go ahead and write down your experience with the current version? If someone has more information to add over time, then this can always be amended. Once I use a recent pipeline version I'll have a look, but it might take some time until I have my next bacterial RNA-Seq project going.

@jenmuell
Contributor

Of course I can do this, but where should I actually write it? I'm familiar with running the pipeline and with its documentation, but not with the code or where to add my suggestions. Can you help me with that?

@d4straub
Contributor Author

I thought somewhere around https://nf-co.re/rnaseq/usage#running-the-pipeline might be a good place, or perhaps a new paragraph below https://nf-co.re/rnaseq/usage#full-samplesheet?
That would mean modifying https://github.com/nf-core/rnaseq/blob/dev/docs/usage.md (dev branch, as linked here).
If you lack some basics, you could have a look at #bytesize talks, e.g. https://nf-co.re/events/2021/bytesize-4-github-contribution-basics

@jenmuell
Contributor

Thanks, that helps.

@drpatelh
Member

Yep, a new section in the usage docs would be fab. Maybe we can add a section after this one specifically for bacterial genomes? It would be great if you could post links to an example genome and annotation too, so we can see exactly what we need to change and why.

@jenmuell
Contributor

jenmuell commented Mar 17, 2022

@drpatelh Thought the same thing and added the pull request a minute ago (#790). I can add an example genome and annotation from my test run from yesterday.

drpatelh added the WIP (Work in progress) label Apr 26, 2022
drpatelh added a commit that referenced this issue May 3, 2022
Added paragraph about the usage of rnaseq with prokaryotic data based on Issue #765
@drpatelh
Member

drpatelh commented May 3, 2022

Added in #790 and #820.

drpatelh closed this as completed May 3, 2022
Status: Done

3 participants