obo file and the association file out of sync? #23
The mapping file used for GO terms is always the latest. The program will fetch the term mapping file if it is not given as an option. That said, the latest file is from January 2019, so it is possible that some terms have been deprecated since that release. Unless that specific term is important to you, I would just ignore this because it is only a warning. For the second part, about the gene names, can you show an example? That would be the best way forward so I can make sure I know what you are referring to.
Hi, thanks for the reply. Looking further, I can see there are only 349 obsolete terms. So I am wondering if there is a mismatch between the original gene names and the names with a suffix? Here is an example of the input files for getorf (used -c):
Output file =
The 'scan' file output by the 'run' program includes the gene name with the suffix =
If I run ontologizer2 using the default output and the latest GO obo file I get error messages:
In case my input files for ontologizer are at fault, here they are:
-g is the latest obo file. I realise Ontologizer isn't your problem, but if you notice anything wrong with the input that would be helpful. Thanks.
UPDATE -
Thanks for the detailed responses. This is very helpful. I'll try to recreate these issues over the next week or two and make a new release. It sounds like the GAF file being produced is no longer compatible with Ontologizer, which needs to be fixed. For the sequence name format, it is informative to know the coordinates and frame of the translation, so I do not want to omit this by default. It could be an option to remove it if it causes problems downstream, though. Perhaps this could be logged in a separate file. I'll have to experiment to understand better, so I'll leave this open until resolved.
Concerning the issues, please try the latest release (v0.18.0) and take a look at the changes in the release. I have incorporated the comments discussed here, including: keeping the original IDs in the output, discarding obsolete GO terms, updating the GAF file format to work with Ontologizer, and other new features. One of the new features is filling out the GAF fields more fully, such as the taxon ID based on the study, logging the GO DB and HMMER2GO versions used, logging the run time, adding Dbxrefs to the GAF file, and other improvements. Also, the demonstration has been updated with more context and examples. I'm going to close this issue now, but feel free to comment if you still see problems and I'll reopen it. Thanks again for the feedback.
Hi,
I have followed the tutorial and think I have the correct outputs, although the tutorial showed that the *tblout file should be used, which is not created by 'hmmer2go run'. I used *_scan instead. When I run Ontologizer I get a message that says:
Skipping association of item "NECHADRAFT_88713_8" to GO:0050662 because term is obsolete!
(Are the obo file and the association file in sync?)
I may be interpreting this incorrectly, but I assume this is because there is a discrepancy between the latest obo file and the one used by 'hmmer2go fetchmap -o pfam2go'. Is it possible to use an updated file for fetchmap, or do I need to use an old obo file?
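One way to check whether the association file and the obo file disagree is to list the terms the obo file marks as obsolete and compare them against the associations. The following is only a minimal sketch, not part of HMMER2GO or Ontologizer: the function names `obsolete_terms` and `split_associations` are hypothetical, the obo snippet is toy data (the term names are placeholders), and it assumes the standard OBO layout of `[Term]` stanzas with `id:` and `is_obsolete: true` tags.

```python
def obsolete_terms(obo_text):
    """Return the set of term IDs flagged 'is_obsolete: true' in OBO text."""
    obsolete = set()
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = (line == "[Term]")
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and line.startswith("is_obsolete:"):
            if line.split(":", 1)[1].strip() == "true" and current_id:
                obsolete.add(current_id)
    return obsolete


def split_associations(pairs, obsolete):
    """Split (gene, term) pairs into those to keep and those hitting obsolete terms."""
    kept = [(gene, term) for gene, term in pairs if term not in obsolete]
    dropped = [(gene, term) for gene, term in pairs if term in obsolete]
    return kept, dropped


# Toy OBO content for illustration only; a real run would read the obo file from disk.
TOY_OBO = """\
[Term]
id: GO:0005515
name: toy current term

[Term]
id: GO:0050662
name: toy obsolete term
is_obsolete: true
"""

# Hypothetical associations, using the gene name from the warning above.
pairs = [
    ("NECHADRAFT_88713_8", "GO:0050662"),
    ("NECHADRAFT_88713_8", "GO:0005515"),
]
kept, dropped = split_associations(pairs, obsolete_terms(TOY_OBO))
print("dropped:", dropped)
```

If the dropped list only contains obsolete terms, the warning is about the ontology versions rather than about the gene names.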
Also, the gene names in the getorf output are followed by '_N'. So when creating the files for -s and -p, does it matter that those gene names do not have the suffixes?
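To make the suffix question concrete, here is a small sketch of how the '_N' names could be mapped back to the original gene names before comparing them with the -s and -p lists. It assumes the suffix is always a single underscore followed by an integer at the end of the name; `strip_orf_suffix` and `unmatched` are hypothetical helpers, and the second gene name is made up for the example.

```python
import re

# Assumption: getorf appends "_<integer>" to the original sequence name.
ORF_SUFFIX = re.compile(r"_\d+$")


def strip_orf_suffix(name):
    """Remove one trailing '_N' from an ORF name to recover the original ID."""
    return ORF_SUFFIX.sub("", name, count=1)


def unmatched(orf_names, population):
    """Return ORF names whose suffix-stripped form is missing from the population set."""
    return sorted(n for n in orf_names if strip_orf_suffix(n) not in population)


# Hypothetical population list without suffixes, as used for -p.
population = {"NECHADRAFT_88713"}
orf_names = ["NECHADRAFT_88713_8", "NECHADRAFT_99999_2"]

print(strip_orf_suffix("NECHADRAFT_88713_8"))
print(unmatched(orf_names, population))
```

The function should only be applied to names that actually carry the getorf suffix. An original name such as NECHADRAFT_88713 also ends in an underscore plus digits, so stripping it a second time would damage the ID.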
I hope that makes sense. Thank you.