Species "all" is not known to RepeatMasker when running -species all #241
Comments
I am not sure which versions of RepeatMasker supported the "all" synonym (which maps to NCBI taxid 1, the "root" node) as a way to search the entire database against your sequence, but newer versions (4.1.3 - 4.1.6) don't accept it, as you reported. I am conflicted about this: I am not sure it is practical, without some care, given the current size of the Dfam database and the current architecture of RepeatMasker. If you really want to try this, you can get around the error message you are seeing by using any taxon below the root. E.g.:
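The command that followed this comment is not preserved in this copy of the thread. As a rough sketch of the workaround it describes, assuming the NCBI taxon "cellular organisms" (which sits directly under the root node) is accepted by your RepeatMasker and Dfam installation, something like the following might work. The taxon choice is an assumption, not the maintainer's confirmed example; the other flags are copied from the command in the original report, with -nolow left out.

```shell
# Assumption: "cellular organisms" is a taxon directly below the NCBI root.
# Whether it resolves in your Dfam library is not confirmed in this thread.
species="cellular organisms"

# Only run if RepeatMasker is installed; the remaining flags and file names
# are taken from the original report.
if command -v RepeatMasker >/dev/null 2>&1; then
    RepeatMasker -pa 48 -a -e ncbi -dir all_mask_result \
        -species "$species" reference-genome.fna
else
    echo "RepeatMasker not found on PATH; nothing was run" >&2
fi
```

The quotes around the species name matter because it contains a space.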
There are two other things to consider. First, this will produce a tremendous number of false positives (a multiple-testing problem from using many unrelated query sequences). Second, you are using '-nolow', which doesn't simply omit simple repeats from the report; it also skips identifying them before the search against TE families. Many TE families contain stretches of tandem or low-complexity sequence and will falsely match tandem repeat sequences if this option is used.
Thanks a lot Robert for the quick reply! Also, for the
RepeatMasker removes only low-divergence simple repeats before searching against the TE libraries, and then at the end searches for the remaining higher-divergence simple repeats. In this fashion we avoid false matches against TE families that contain simple repeats in their models (consensus/pHMM), while still obtaining better alignments to the TEs when the simple repeat contributes to a larger alignment to the family. So, if you pre-mask the genome before running RepeatMasker, you should take that into account.
Appreciate the insight Robert!
Thanks for developing the awesome software. I am running the following command with the -species all option but encountered an error message. Could you please have a look?
nohup RepeatMasker -pa 48 -a -e ncbi -dir all_mask_result -nolow -species all reference-genome.fna
But I got the following messages:
Here is the software version information:
RepeatMasker version 4.1.3-p1
Search Engine: NCBI/RMBLAST [ 2.14.1+ ]
Using Master RepeatMasker Database: RepeatMasker/Libraries/RepeatMaskerLib.h5
Title : Dfam withRBRM
Version : 3.6
Date : 2022-04-12
Families : 63,852