read length should be not smaller than 50, but ribo-seq reads are ~30 bp #16

Closed
huguanjing opened this issue Jun 7, 2022 · 4 comments
Labels: enhancement (New feature or request), question (Further information is requested)

Comments

@huguanjing

-l LEN, --len LEN Sequencing read length, should be not smaller than 50.

Is this correct? Ribo-seq reads are ~ 30 bp

@huguanjing
Author

To be clear, my question is whether RiboDetector can be used to detect and remove rRNA from ribo-seq samples? Thanks!

@dawnmy
Member

dawnmy commented Jun 7, 2022

-l LEN, --len LEN Sequencing read length, should be not smaller than 50.

Is this correct? Ribo-seq reads are ~ 30 bp

Thank you for pointing this out. You can ignore the help message: the minimum of 50 is only a suggestion, and you can still use RiboDetector for reads shorter than 50 or 40 bp. Yes, you can use RiboDetector for Ribo-Seq reads; however, the accuracy will be slightly lower when the reads are short (I think this will be the same for other methods/tools). I will update the help message in the new release. Thank you!

@ARW-UBT

ARW-UBT commented Jun 9, 2022

May I add a related question: after quality filtering with Trimmomatic, the (uniform) read length of e.g. 150 bp in paired-end mode changes to a length distribution (e.g. 36 to 150 bases, depending on the settings). How should the -l LEN parameter be used in this case? What will happen to reads shorter than the -l value?

@dawnmy
Member

dawnmy commented Jun 9, 2022

You can check the mean length with seqkit stats, then use that mean length as LEN for the -l parameter. If a read is longer than LEN, only the first LEN bases will be used to capture the sequence features for classification. If a read is shorter than or equal to LEN, the whole read will be used. In either case, the output files will contain the whole read, so you don't need to worry about the variable length of your input reads.
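For readers who want to see the rule above in concrete terms, here is a minimal Python sketch. It is not RiboDetector's code: the helper names `read_lengths` and `bases_used` and the toy FASTQ data are hypothetical, and the prefix slicing only illustrates the behavior described in this comment (reads longer than LEN contribute their first LEN bases, shorter reads are used whole).

```python
import io
from statistics import mean


def read_lengths(fastq_handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            yield len(line.strip())


def bases_used(read, length):
    """Illustrative only: the part of a read used for classification under the rule above."""
    return read[:length]  # slicing leaves reads shorter than `length` unchanged


# Toy FASTQ with reads of 36, 150 and 90 bases (made-up data, stands in for trimmed reads)
fastq_text = "".join(
    f"@read{i}\n{'A' * n}\n+\n{'I' * n}\n" for i, n in enumerate([36, 150, 90])
)

lengths = list(read_lengths(io.StringIO(fastq_text)))
mean_len = round(mean(lengths))  # a candidate value for -l

for n in lengths:
    used = bases_used("A" * n, mean_len)
    print(f"read of {n} bases: {len(used)} bases used, {n} bases written to output")
```

On real data, seqkit stats reports the average length directly, so a script like this is only needed if you want to inspect the length distribution yourself.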

dawnmy added the enhancement (New feature or request) and question (Further information is requested) labels Jun 9, 2022
dawnmy added a commit that referenced this issue Aug 14, 2022
dawnmy closed this as completed Aug 14, 2022