Documentation: mention prokaryotic analysis settings #765

d4straub · 2022-02-11T08:31:35Z

Description of feature

First of all, we are mostly still running version 1.4.2 at our facility (I know, ancient; will be changed soon).
In 1.4.2, the default for params.fc_count_type is exon. I assume this is perfect for eukaryotes, but not for most prokaryotic genomes, because those are typically not spliced and their annotations (at least NCBI GFF, in my experience) usually use CDS or transcript instead. As a result, running the pipeline with default settings on prokaryotes will only count very few features (the few exon entries), in a recent case 43 of ~2300 transcripts.
The solution could be to add a small section to the documentation and/or a sanity check that compares the number of counted features to the total line count in the GTF.
Having said that, I am not sure how newer versions of the pipeline would handle this case. I assume that --featurecounts_feature_type has a similar function.

Additionally, the attribute that --featurecounts_group_type refers to usually does not appear in prokaryotic annotations, and this crashes the pipeline (at least 1.4.2). It would be nice if this could be handled more gracefully (again, maybe this is improved in newer versions).
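
To make the proposed sanity check concrete, here is a minimal sketch in Python. It is not part of the pipeline: the function names `count_feature_types` and `feature_fraction` are hypothetical, and the example records are made up. It simply tallies column 3 (the feature type) of a GTF/GFF file, so that a very low share of `exon` records becomes visible before running featureCounts.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count the values in column 3 (feature type) of GTF/GFF lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GTF/GFF record has 9 tab-separated columns; ignore anything else.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


def feature_fraction(counts: Counter, feature_type: str) -> float:
    """Share of all records that have the given feature type."""
    total = sum(counts.values())
    return counts[feature_type] / total if total else 0.0


# Made-up records for illustration only (not from a real genome).
example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tCDS\t1\t900\t.\t+\t0\tID=cds1;Parent=gene1",
    "chr1\tsrc\tgene\t1000\t1500\t.\t-\t.\tID=gene2",
    "chr1\tsrc\tCDS\t1000\t1500\t.\t-\t0\tID=cds2;Parent=gene2",
    "chr1\tsrc\texon\t2000\t2100\t.\t+\t.\tID=exon1",
]

counts = count_feature_types(example)
print(counts)
print(feature_fraction(counts, "exon"))
```

For a real annotation, the same function could be fed an open file handle, e.g. `count_feature_types(open("annotation.gtf"))`, where the file name is a placeholder. Which threshold should trigger a warning is left open here.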


drpatelh · 2022-02-20T18:35:29Z

Thanks @d4straub ! Yes, I agree we should improve and extend the documentation to include some description of how to tweak the parameters given a selection of common annotations. Maybe something we can tackle at the Hackathon? I haven't used prokaryotic annotations personally in ages but if you are able to get the latest version of the pipeline running with these parameters and have some recommendations then that would be great.

In recent versions of the pipeline featureCounts is just used to generate the biotype QC as mentioned in the docs and isn't used for any formal quantification. In most cases where you aren't using a standard annotation it is almost easier to use --skip_biotype_qc to skip this step.
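
As a rough way to decide whether the biotype QC can work for a given annotation, one could check how many records carry the attribute used for grouping. The sketch below is an assumption-laden illustration, not pipeline code: `attribute_coverage` is a hypothetical helper, the records are made up, and `gene_biotype` is used only as an example attribute name.

```python
import re
from typing import Iterable, Optional


def attribute_coverage(
    lines: Iterable[str],
    attribute: str,
    feature_type: Optional[str] = None,
) -> float:
    """Fraction of records whose column 9 contains the given attribute key."""
    # Match the key at the start of column 9 or after a ';', followed by
    # a space (GTF style: key "value";) or '=' (GFF3 style: key=value;).
    pattern = re.compile(r"(?:^|;)\s*" + re.escape(attribute) + r"[ =]")
    total = 0
    hits = 0
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        # Optionally restrict the check to one feature type (column 3).
        if feature_type is not None and fields[2] != feature_type:
            continue
        total += 1
        if pattern.search(fields[8]):
            hits += 1
    return hits / total if total else 0.0


# Made-up GTF-style records for illustration only.
records = [
    'chr1\tsrc\ttranscript\t1\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1"; gene_biotype "protein_coding";',
    'chr1\tsrc\ttranscript\t1000\t1500\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]

biotype_cov = attribute_coverage(records, "gene_biotype", feature_type="transcript")
gene_id_cov = attribute_coverage(records, "gene_id", feature_type="transcript")
print(biotype_cov, gene_id_cov)
```

If the coverage for the grouping attribute is zero or close to it, skipping the step with --skip_biotype_qc, as suggested above, is probably the simpler route.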

jenmuell · 2022-03-17T08:29:04Z

Hello @drpatelh, I mainly work with prokaryotes and struggled with problems similar to Daniel's, but I could actually solve them in some cases. Yesterday I ran the newest version of rnaseq and it worked rather nicely.
As Daniel mentioned, exon is not the right value for --featurecounts_feature_type; transcript is.
Maybe we can work on the documentation together?

d4straub · 2022-03-17T08:36:53Z

Because I haven't used the newer (3.x) versions of the pipeline, most of my experience might be outdated. Why not go ahead and write down your experience with the current version? If someone has more information to add over time, then this can always be amended. Once I use a recent pipeline version I'll have a look, but it might take some time until I have my next bacterial RNA-Seq project going.

jenmuell · 2022-03-17T08:41:16Z

Of course I can do this, but where should I actually write it? I'm familiar with the execution and the documentation, but not with the code or where to add my suggestions. Can you help me with that?

d4straub · 2022-03-17T08:52:52Z

I thought somewhere around https://nf-co.re/rnaseq/usage#running-the-pipeline might be a good place, or probably below https://nf-co.re/rnaseq/usage#full-samplesheet a new paragraph?
That would mean modifying https://github.com/nf-core/rnaseq/blob/dev/docs/usage.md (dev branch, as linked here).
If you lack some basics, you could have a look at #bytesize talks, e.g. https://nf-co.re/events/2021/bytesize-4-github-contribution-basics

jenmuell · 2022-03-17T09:13:46Z

Thanks, that helps.

drpatelh · 2022-03-17T11:51:34Z

Yep, a new section in the usage docs would be fab. Maybe we can add a section after this one specifically for Bacterial genomes? Be great if you can post links to an example genome and annotation too so we can see exactly what we need to change and why.

jenmuell · 2022-03-17T11:55:13Z

@drpatelh Thought the same thing and added the pull request a minute ago (#790). I can add an example genome and annotation from my test run from yesterday.

Added paragraph about the usage of rnaseq with prokaryotic data based on Issue #765

drpatelh · 2022-05-03T09:55:13Z

Added in #790 #820

d4straub added documentation enhancement labels Feb 11, 2022

drpatelh modified the milestones: 3.6, 3.7 Feb 14, 2022

drpatelh mentioned this issue Mar 7, 2022

Execution of featurecount incorrect #780

Closed

jenmuell self-assigned this Mar 17, 2022

drpatelh added the WIP Work in progress label Apr 26, 2022

drpatelh added a commit that referenced this issue May 3, 2022

Merge pull request #790 from jenmuell/master

f5d5707

Added paragraph about the usage of rnaseq with prokaryotic data based on Issue #765

drpatelh closed this as completed May 3, 2022
