clean up extraneous edges in BioPAX import #14

Closed
3 tasks done
goodb opened this issue Sep 4, 2018 · 8 comments

goodb commented Sep 4, 2018

examples in the pathway: http://noctua-dev.berkeleybop.org/editor/graph/gomodel:-80976963

  • Only infer which has_input edge should become enabled_by when there’s no controller. For example, the reaction “phosphorylation of LRP5/6 cytoplasmic domain by CSNKI” currently has multiple enabled_by edges, and it should have only one, pointing to CSNKI (P78368).
  • There are extraneous “enabled_by (protein-containing complex)” triplets. Same example as 1 above. I’m guessing this comes from the reasoner. These should be removed.
  • When there’s a GO term from Reactome, you can remove the placeholder root molecular function (GO:0003674). Right now there are two function terms for the same activity in these cases; same example as 1 above.
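
A minimal sketch of the first and third rules, not the converter's actual code. It assumes that an explicit controller becomes the sole enabled_by target; `clean_reaction`, `pick_enabler`, and the input placeholders are hypothetical names, and GO:0004672 is only an illustrative specific term, not one taken from the model.

```python
ROOT_MF = "GO:0003674"  # placeholder root molecular function named in the issue


def clean_reaction(function_terms, controllers, inputs, pick_enabler):
    """Return (function_terms, enabled_by) for a single reaction.

    pick_enabler is a hypothetical callback that chooses one input to act
    as the enabler; it is consulted only when there is no controller.
    """
    # Third rule: drop the root term when a more specific term is present.
    specific = [t for t in function_terms if t != ROOT_MF]
    terms = specific if specific else [ROOT_MF]

    # First rule: infer enabled_by from the inputs only without a controller.
    if controllers:
        enabled_by = list(controllers)  # assumption: controller -> enabled_by
    elif inputs:
        enabled_by = [pick_enabler(inputs)]
    else:
        enabled_by = []
    return terms, enabled_by


terms, enabled_by = clean_reaction(
    ["GO:0003674", "GO:0004672"],  # root plus an illustrative specific term
    ["P78368"],                    # CSNKI, the controller from the example
    ["input-A", "input-B"],        # hypothetical input placeholders
    lambda candidates: candidates[0],
)
```

With a controller present, the inputs are never consulted, so the reaction ends up with a single enabled_by edge.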
@goodb goodb self-assigned this Sep 4, 2018

goodb commented Sep 5, 2018

@thomaspd The extraneous “enabled_by (protein-containing complex)” statements you see in Noctua aren't actually asserted anywhere or inferred by a reasoner; they are an artifact of how the Noctua display works and of the lack of a reference ontology for protein complexes. Noctua really needs the nodes in the models to be annotated with OWL classes, and we have no classes for most complexes.

To make this work for new complexes, the converter currently adds two rdf:types to each new complex node. It first adds the generic type 'protein-containing complex', which is not incorrect and is needed for downstream reasoning. Next it adds a kind of fake class (in that it's not represented in any ontology and not useful for inference) with a name like "WNT:FZD:p5S/T-LRP5/6:DVL:AXIN:GSK3B". The only purpose of this one at the moment is to show a label in Noctua. Noctua shows all direct types in the display, hence you see both.
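
To make the double display concrete, here is a rough model of that behavior using plain tuples as triples. It is a sketch, not the converter or Noctua code; `type_new_complex`, `direct_types`, and the node id `complex-1` are hypothetical.

```python
RDF_TYPE = "rdf:type"
GENERIC = "protein-containing complex"


def type_new_complex(node, display_name):
    """Model the two rdf:type assertions added to a new complex node."""
    return [
        (node, RDF_TYPE, GENERIC),       # real class, needed for reasoning
        (node, RDF_TYPE, display_name),  # label-only class, in no ontology
    ]


def direct_types(triples, node):
    """Model a display that lists every direct type of a node."""
    return [o for s, p, o in triples if s == node and p == RDF_TYPE]


triples = type_new_complex("complex-1", "WNT:FZD:p5S/T-LRP5/6:DVL:AXIN:GSK3B")
shown = direct_types(triples, "complex-1")
```

Because both assertions are direct types, `shown` contains the generic class and the label-only class side by side.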

For proteins, Noctua loads the Neo ontology which instantiates a class for each protein it knows about and allows the display to function. If we want to use Noctua for dealing with complexes we either need to have them represented in something like Neo in advance of going into the editor or we need a way to create the needed classes on the fly or in the models that get imported. Right now any new classes (additions to the 'tbox') in new models are ignored.

goodb commented Sep 5, 2018

Moved last point to new issue #20

goodb commented Sep 6, 2018

@cmungall what is your take on how to represent complexes in GO-CAMs? This needs a decision so that my code, curators entering new complexes, and the UI can move forward effectively. Specifically, when a new complex that has no associated class in Neo comes into a model (e.g. WNT:FZD:p5S/T-LRP5/6:DVL:AXIN:GSK3B above), do we need to generate a new class? If so, how should we approach that? If not, how should we give the entity a visible name in the editor and ensure that the system knows what it is for reasoning purposes?

For the short term, I suggest the latter approach: tag the entity with a generic 'protein complex' class, do not create a new class, have the UI use the rdfs:label to accept and display an informative name, and add the parts of the complex using the has_part relationship. That way, if and when there are logically defined complex classes in GO or elsewhere, a more specific classification can be inferred automatically. The only code required would be logic in the Noctua UI to show the label instead of 'protein complex' for the nodes in question.
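
As a sketch of what that proposal would look like in a model, again with plain tuples standing in for triples: `describe_complex`, `display_name`, the node id, and the part identifiers (taken here from the pieces of the example name) are all hypothetical and are not Noctua's API.

```python
GENERIC_CLASS = "protein complex"


def describe_complex(node, label, parts):
    """One generic type, one rdfs:label, and a has_part edge per component."""
    triples = [
        (node, "rdf:type", GENERIC_CLASS),
        (node, "rdfs:label", label),
    ]
    triples += [(node, "has_part", part) for part in parts]
    return triples


def display_name(triples, node):
    """UI-side logic: prefer the rdfs:label, fall back to the generic type."""
    labels = [o for s, p, o in triples if s == node and p == "rdfs:label"]
    if labels:
        return labels[0]
    types = [o for s, p, o in triples if s == node and p == "rdf:type"]
    return types[0] if types else node


name = "WNT:FZD:p5S/T-LRP5/6:DVL:AXIN:GSK3B"
model = describe_complex("complex-1", name, name.split(":"))
```

The node keeps a single ontology-backed type, so nothing in the model depends on a class that exists only for display.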

cmungall commented Sep 6, 2018

s/protein complex/macromolecular complex/

I agree, but I don't know if there is a need to make an rdfs:label (and this adds complexities, e.g. what if a user wants to change this). I would just do a pure post-composition approach.

I can see a use for a generic name-this-individual function (whether for complexes or any other subgraph), perhaps using DOSDPs. But is the immediate need not somewhat subsumed by having the visuals fold/compact these? (which may not always work, but that is something to be fixed separately)

goodb commented Sep 6, 2018

There is still a need to refer to the complex individual in the folded view. See for example, the reaction/function 'Autoglucosylation of GYG2 complexed with GYS2-b' http://noctua-dev.berkeleybop.org/editor/graph/gomodel:-670700788

It's more informative and natural for a user to see enabled_by "GYG2:GYS2-b tetramer" than to see enabled_by 'protein complex'.

(But yes, this also brings up the problem of folding complexes with parts into single function nodes, which I have added to geneontology/noctua#581.)

goodb commented Sep 6, 2018

I guess if you really want to avoid the label (which the UI already supports to some extent via geneontology/noctua#536), UI work could be moved over to the folding/expansion problem. Users could either see the generic complex node and click to expand it to see what it is made of, or there could be a way to dynamically compose a name to show based on the names of the parts.
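
The second option could be sketched roughly as follows. This is only an illustration: `compose_complex_name`, the part identifiers, and the choice of ":" as a separator (mirroring the complex names quoted in this thread) are assumptions.

```python
def compose_complex_name(triples, node, part_labels, fallback="protein complex"):
    """Build a display name for a complex from the labels of its parts."""
    parts = [o for s, p, o in triples if s == node and p == "has_part"]
    names = [part_labels[part] for part in parts if part in part_labels]
    # Without any named parts, show the generic class name instead.
    return ":".join(names) if names else fallback


triples = [
    ("complex-1", "rdf:type", "protein complex"),
    ("complex-1", "has_part", "part-1"),
    ("complex-1", "has_part", "part-2"),
]
part_labels = {"part-1": "GYG2", "part-2": "GYS2-b"}  # illustrative labels
composed = compose_complex_name(triples, "complex-1", part_labels)
```

A name built this way carries no stoichiometry, so it would not reproduce a suffix such as "tetramer" unless that were modeled separately.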

goodb commented Sep 10, 2018

Switching off the UI optimizations for Noctua 1.0 flag addresses: "There are extraneous “enabled_by (protein-containing complex)” triplets". There will now be only one type shown for these complexes, but it's going to be the generic but ontologically extant 'protein-containing complex'. The linguistically meaningful but logically useless fake classes are out, per the discussion above with @cmungall. Showing a name for these new unidentified complexes is now going to be a UI problem (which can be approached by inspecting their parts, which do have names).

goodb commented Sep 10, 2018

Closing - data issues resolved, UI issues e.g. geneontology/noctua#581 (comment) remain.

@goodb goodb closed this as completed Sep 10, 2018
2 participants