Document typecheck option in gff3validator #910
Comments
To current GitHub HEAD. Also see genometools#910
At the time, I wrote a generic OBO file parser, and it is a requirement that the ontology file be parseable by it. The parser doesn't make any assumptions about term names or ontology graphs, but changing established names or graph relationships might invalidate existing GFF3 files. I think the constraints mentioned at the top of The-Sequence-Ontology/SO-Ontologies#465 are a good start. This repository is not related to the Perl GFF3 validator. I wrote a new GFF3 parser and validator in C based on the specifications. If I recall correctly, the Perl validator didn't meet my performance requirements. The C validator has been extensively tested and at the time performed much better than any other GFF3 validator I tested.
Thanks for the info! Can you describe how you use the graph? Are there assumptions about particular relationship types (e.g. part_of)? For example, is the exon-part_of-transcript relationship used to check that exons are within the bounds of transcripts?
I parse an OBO file according to the OBO Flat File Format Specification, version 1.2 (see API and implementation). This gives me all the "Term" stanzas. From the set of "Term" stanzas I build the ontology graph, ignoring obsolete stanzas (see implementation). This gives me all valid terms and the is_a and part_of relations. This graph is then used in the GFF3 parser (if enabled) to make sure that all terms are valid and that all parent-child relationships in the GFF3 file are part_of relationships in the ontology. In some special cases I also check is_a relationships.
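To make the description above concrete, here is a minimal Python sketch of the same idea: read the "Term" stanzas, skip obsolete ones, collect is_a and part_of edges, and use them to check a GFF3 parent-child pair. It is not the genometools implementation (which is written in C). The OBO fragment, the `EX:` identifiers, and all function names are made up for illustration, and letting part_of be inherited along is_a edges is an assumption of this sketch rather than something stated in the thread.

```python
from collections import defaultdict

# Tiny made-up OBO fragment for illustration; not taken from sofa.obo.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:0000001
name: region

[Term]
id: EX:0000002
name: transcript
is_a: EX:0000001 ! region

[Term]
id: EX:0000003
name: mRNA
is_a: EX:0000002 ! transcript

[Term]
id: EX:0000004
name: exon
is_a: EX:0000001 ! region
relationship: part_of EX:0000002 ! transcript

[Term]
id: EX:0000005
name: old_feature
is_obsolete: true

[Typedef]
id: part_of
name: part_of
"""


def parse_term_stanzas(text):
    """Return one dict (tag -> list of values) per [Term] stanza."""
    stanzas, current = [], None
    for raw in text.splitlines():
        line = raw.split(" ! ")[0].strip()  # drop a trailing "! comment"
        if not line:
            continue
        if line.startswith("["):
            # Only [Term] stanzas are collected; other stanza types are skipped.
            current = defaultdict(list) if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current[tag.strip()].append(value.strip())
    return stanzas


def build_graph(stanzas):
    """Build (valid term names, is_a edges, part_of edges), ignoring obsolete terms."""
    live = [s for s in stanzas if s.get("is_obsolete") != ["true"]]
    names = {s["id"][0]: s["name"][0] for s in live}
    is_a, part_of = defaultdict(set), defaultdict(set)
    for s in live:
        child = s["name"][0]
        for target in s.get("is_a", []):
            if target in names:
                is_a[child].add(names[target])
        for rel in s.get("relationship", []):
            rel_type, target = rel.split()[:2]
            if rel_type == "part_of" and target in names:
                part_of[child].add(names[target])
    return set(names.values()), is_a, part_of


def ancestors(term, is_a):
    """Return term plus every term reachable from it via is_a edges."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(is_a.get(t, ()))
    return seen


def valid_parent(child, parent, is_a, part_of):
    """True if child (or an is_a ancestor) is part_of parent (or an is_a ancestor)."""
    parent_types = ancestors(parent, is_a)
    return any(part_of.get(c, set()) & parent_types for c in ancestors(child, is_a))


terms, is_a, part_of = build_graph(parse_term_stanzas(OBO_TEXT))
```

With this toy ontology, `valid_parent("exon", "mRNA", is_a, part_of)` holds because exon is part_of transcript and mRNA is_a transcript, while the reverse pairing does not. A feature type such as `old_feature` would be rejected earlier, since obsolete terms never enter `terms`.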
Many thanks for your explanation! I am assuming that your implementation also uses the member_of relation; it seems this way: genometools/src/extended/type_graph.c, lines 111 to 116 in 161fb4c
This is good because otherwise most feature graphs would be declared invalid, as the path in SO from a transcript to a gene involves a hop over member_of (which is a sub-relation of part_of).
You are right, I forgot to mention that.
Context: The-Sequence-Ontology/SO-Ontologies#465
The GFF3 validator uses the sofa.obo file to check GFF3 files. Is there documentation on what the expectations for this file are, regarding both the term names and the ontology graph?
I'm trying to document a kind of service level agreement on the SO side, and want to ensure that future changes to SO don't violate your expectations.
Also, is this the right repo for the canonical GFF3 validator? Last I recall it was in Perl, not C.