Running with '-m' still predicts genes across regions of N #30
Comments
@johnne, can you provide the FASTA file and the parameters you used? I can check the code, as I use Prodigal a lot and want to make sure it is working correctly.
testfiles.tar.gz
-m is implemented in a really clunky way, and I don't believe it expressly forbids genes from crossing the gap (it just turns the sequence into N's). So if there is sequence that looks protein-coding on both sides in the correct frame, and its score can overcome the penalty for a stretch of N's, Prodigal can still predict a gene across the gap. The -m option was never really intended as a permanent solution to this problem. In the development version, I explicitly added a gap-handling mode where the user can specify the behavior upon seeing any stretch of N's (pass across, run into like scaffolds, or hard stop). This is described in the wiki. Unfortunately, I have no ETA for when this version will be done (I mostly work on plants these days, and it is hard to find time to come back to this and finish it; there is still a lot to do before even getting to updating the metagenomic side).
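Since 2.x can still call genes across masked stretches, one workaround is to flag those calls after the fact. The following is a minimal sketch, not part of Prodigal: `n_runs` and `genes_crossing_gaps` are hypothetical helper names, gene coordinates are assumed to be 1-based and inclusive (as in GFF output), and the 50-base minimum is taken from the report in this thread rather than from the Prodigal source.

```python
import re


def n_runs(seq, min_len=50):
    """Return 1-based inclusive (start, end) pairs for each run of N
    that is at least min_len bases long (min_len=50 is an assumption)."""
    return [
        (m.start() + 1, m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def genes_crossing_gaps(genes, runs):
    """Return the genes (1-based inclusive (start, end) pairs) that
    overlap at least one masked run."""
    return [
        (g_start, g_end)
        for g_start, g_end in genes
        if any(g_start <= r_end and r_start <= g_end for r_start, r_end in runs)
    ]


if __name__ == "__main__":
    # Toy contig: 30 bases, a 60-base masked stretch, then 30 more bases.
    contig = "ATG" * 10 + "N" * 60 + "ATG" * 10
    runs = n_runs(contig)
    predicted = [(1, 30), (10, 100), (91, 120)]  # made-up coordinates
    print(runs)
    print(genes_crossing_gaps(predicted, runs))
```

Genes reported by the second function can then be dropped or inspected by hand before downstream annotation.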
@hyattpd but metagenomics is "hot" these days. Do it for the plant microbiome! :)
Yes, the metagenomic version can be vastly improved, and I have many ideas for how to do this. I just need to find time to work on it.
Hi! This issue is still not fixed, right? I am masking my contigs at the positions where I predict an RNA, so that gene prediction does not give me a gene there. Something like this:
And I get this:
Besides the fact that it should skip the N's, I am puzzled as to how Prodigal can detect a gene there, with just two valid bases in the whole contig. Best,
I doubt I will update -m in Prodigal 2.x (as noted before, this is done in a better way in 3.0). If you are masking the sequence manually, a simple trick is to begin and end the mask with TTAATTAATTAA, which inserts stop codons in all 6 frames.
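A minimal sketch of that manual masking trick, for anyone who wants to script it. The helper names (`mask_with_stops`, `frames_with_stop`) are hypothetical, the region is assumed to be given as 0-based, half-open coordinates, and the replacement keeps the contig length unchanged so downstream coordinates still line up. The check at the end confirms that the 12-mer contains a stop codon in all three forward and all three reverse frames (it is its own reverse complement).

```python
STOP_FLANK = "TTAATTAATTAA"
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def frames_with_stop(seq):
    """Return six booleans (+0, +1, +2, -0, -1, -2): True if that
    reading frame of seq contains at least one stop codon."""
    seq = seq.upper()
    hits = []
    for strand in (seq, revcomp(seq)):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            hits.append(any(codon in STOPS for codon in codons))
    return hits


def mask_with_stops(seq, start, end):
    """Replace seq[start:end] with N's that begin and end with STOP_FLANK.

    The region must be long enough to hold both flanks (24 bases)."""
    length = end - start
    if length < 2 * len(STOP_FLANK):
        raise ValueError("region too short to hold both stop-codon flanks")
    filler = "N" * (length - 2 * len(STOP_FLANK))
    return seq[:start] + STOP_FLANK + filler + STOP_FLANK + seq[end:]


if __name__ == "__main__":
    print(frames_with_stop(STOP_FLANK))
    print(mask_with_stops("A" * 100, 20, 80))
```

Because each flank carries a stop in every frame, no open reading frame can run through the masked region, whatever the frame of the flanking sequence.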
I used to use |
I want to know why Prodigal cannot predict genes across the sequence you provided. Are there any references? Thank you so much!
I'm running prodigal (Prodigal V2.6.3: February, 2016) and have contigs with some regions masked with 'N' where infernal cmscan has predicted non-coding RNAs. However, although I run prodigal with the '-m' option to "Treat runs of N as masked sequence; don't build genes across them." the output has genes predicted across those regions with protein sequences translated into stretches of 'X'. I saw that the stretches of 'N' need to be at least 50 characters and they all are but nevertheless it doesn't seem to act as a mask.
I know the '-m' option was originally used by JGI for masking, but is it no longer implemented in Prodigal? Or am I using it wrong?
Sincerely,
John