clarify model for with/from gpad to OWL #21

Open
goodb opened this issue Jan 30, 2019 · 23 comments

@goodb
Contributor

goodb commented Jan 30, 2019

No description provided.

@goodb goodb created this issue from a note in Make a one-model-per-gene converter. (To do) Jan 30, 2019
@goodb goodb changed the title clarify model for with/from gpad to OWL @ukemi @goodb clarify model for with/from gpad to OWL Feb 1, 2019
@vanaukenk
Contributor

@goodb - Do you need any additional examples or input from @ukemi or me on this?

@goodb
Contributor Author

goodb commented Feb 5, 2019

@vanaukenk yes, we never saw a fully worked example. We need examples of gpad rows that use this structure and that express the more challenging cases discussed at the meeting with a link to a demo go-cam that fits the structure.

@ukemi

ukemi commented Feb 5, 2019

I'll provide some examples. I'm wading through the MGI GPAD file.

@ukemi

ukemi commented Feb 5, 2019

Should I make the example as a development model on the production site or on the development site?

@goodb
Contributor Author

goodb commented Feb 5, 2019

@ukemi I think you should use the production site with the development status tag. That will make things more stable over time. Generally speaking, I think you should always use production if you intend to have your models stay around for any length of time. (I know @kltm concurs on that.)

@ukemi

ukemi commented Feb 6, 2019

Here is an example of a pipe-separated 'with' field and a comma-separated 'with' field (field 5; MGI:MGI:4127851|MGI:MGI:4440463). Pinging @vanaukenk for additional comments. The model is 5c4605cc00000457.
The left-hand side shows an annotation that has a pipe-separated 'with' field that is broken up into two evidence statements. The two right-hand models show a comma-separated 'with' field in which all of the alleles support a single piece of evidence. There are two graphs for the latter annotation line because there is a pipe-separated annotation extension describing the role of Ctnnb1 in the specification of two different cell types.

  1. pipe-separated values:
    MGI | MGI:2442827 | acts_upstream_of_or_within | GO:0061512 | MGI:MGI:4439200|PMID:20159594 | ECO:0000315 | MGI:MGI:4127851|MGI:MGI:4440463 |   | 20131001 | MGI | transports_or_maintains_localization_of(MGI:MGI:95728)
    -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --

  2. comma-separated values
    MGI | MGI:88276 | acts_upstream_of_or_within | GO:0001708 | MGI:MGI:3578464|PMID:15866163 | ECO:0000315 | MGI:MGI:2148591,MGI:MGI:2148594,MGI:MGI:2450929 |   | 20050728 | MGI | results_in_specification_of(CL:0000138)|results_in_specification_of(CL:0000062)
    -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
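
A minimal sketch of how the two 'with' patterns above could be split, assuming pipes separate independent evidence statements and commas join identifiers that together support a single piece of evidence. The function name `split_with_field` is hypothetical and is not taken from any existing converter.

```python
from typing import List


def split_with_field(with_field: str) -> List[List[str]]:
    """Split a GPAD With/From value into evidence groups.

    Each pipe-separated chunk becomes its own group (one evidence
    statement); comma-separated identifiers stay together in one group.
    """
    groups = []
    for chunk in with_field.split("|"):
        ids = [part.strip() for part in chunk.split(",") if part.strip()]
        if ids:  # skip empty chunks, e.g. an empty With/From column
            groups.append(ids)
    return groups


# The two 'with' values from the example rows above.
pipe_example = "MGI:MGI:4127851|MGI:MGI:4440463"
comma_example = "MGI:MGI:2148591,MGI:MGI:2148594,MGI:MGI:2450929"

print(split_with_field(pipe_example))   # two groups, one identifier each
print(split_with_field(comma_example))  # one group, three identifiers
```

Under this reading, the first row yields two evidence statements and the second row yields one.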

@goodb
Contributor Author

goodb commented Feb 6, 2019

Thanks @ukemi we can do this. One question/thought. Right now, the pattern for comma-separated values produces an OWL statement like this:
lego:evidence-with "MGI:MGI:2148591,MGI:MGI:2148594,MGI:MGI:2450929"

It is simply a text field with the identifiers concatenated together just as they are in the gpad. The data is there, but is not accessible to the OWL reasoning system at all in this form. In the future, assuming the 'with' pattern is kept, we might want to generate semantically meaningful statements instead. e.g. something along the lines of lego:evidence-with OWL:Intersection[MGI:MGI:2148591 & MGI:MGI:2148594 & MGI:MGI:2450929].

@ukemi

ukemi commented Feb 6, 2019

Or lego:evidence-with OWL:Union[MGI:MGI:2148591 & MGI:MGI:2148594 & MGI:MGI:2450929]? This statement means that all of the identifiers need to be considered as qualifying the evidence. I'm never really certain how to handle intersection versus union in cases like this. I agree, the string including the commas is undesirable. @balhoff, @cmungall, @kltm, I think we talked a bit about this at the hackathon.

@goodb
Contributor Author

goodb commented Feb 6, 2019

If I understand the model, I don't think you would use union there as all of them are required for the statement to be true, so Intersection.

If we went down this path, we could model the pipe-separated case as one union block e.g. OWL:Union(MGI:MGI:4127851 | MGI:MGI:4440463) instead of creating multiple evidence statements.
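
As a rough illustration of that idea, the sketch below renders a 'with' value as a single expression in which commas become `and` (intersection) and pipes become `or` (union). The function name `with_to_expression` is hypothetical, the output is only a Manchester-style string rather than a real OWL serialization, and it assumes every identifier can be treated as an OWL class.

```python
def with_to_expression(with_field: str) -> str:
    """Render a GPAD With/From value as a union of intersections.

    Commas are read as AND (all identifiers are required together);
    pipes are read as OR (alternative lines of evidence).
    """
    disjuncts = []
    for chunk in with_field.split("|"):
        ids = [part.strip() for part in chunk.split(",") if part.strip()]
        if not ids:
            continue
        conjunction = " and ".join(ids)
        # Parenthesize only when the group has more than one member.
        disjuncts.append(f"({conjunction})" if len(ids) > 1 else conjunction)
    return " or ".join(disjuncts)


print(with_to_expression("MGI:MGI:4127851|MGI:MGI:4440463"))
print(with_to_expression("MGI:MGI:2148591,MGI:MGI:2148594,MGI:MGI:2450929"))
```

A value that mixes both separators would come out as an `or` over parenthesized `and` groups, which is one way to keep a single evidence statement per annotation.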

@goodb
Contributor Author

goodb commented Feb 6, 2019

This is a bit like what I am proposing for modeling complexes. Basically replacing a bunch of linked OWL individuals with logical statements in OWL. See geneontology/pathways2GO#34 (comment) for the Intersection model of a complex.

@ukemi

ukemi commented Feb 6, 2019

So in this case, what's the difference between the '|' and the '&'? I think you will need to explain this to us on a call.

@goodb
Contributor Author

goodb commented Feb 6, 2019

I was using | to mean OR and & to mean AND

@cmungall
Member

cmungall commented Feb 6, 2019

Makes sense to use OWL constructs here but remember that the arguments must be classes, fine for genes

Also this has to work all the way up the stack, in the GE and in ART and in Form

@goodb
Contributor Author

goodb commented Feb 6, 2019

@cmungall is this a change that should be made now, before the migration, or something that should be punted and done as a project later on? My intuition is the former because it involves changes in Noctua and Minerva code - and, as you allude to, the problem that we are not confident that all referred to entities will be classes. Nothing insurmountable but a fair chunk of work. Let us know what you think.

@cmungall
Member

cmungall commented Feb 7, 2019 via email

@goodb
Contributor Author

goodb commented Feb 7, 2019

This would also be an opportunity to materialize an ontology to capture the lego relationships. (And perhaps do away with "With" in favor of something a little more meaningful to less informed users and their computers?)

I suggest pushing back after initial MOD conversion given timelines as I understand them. Could be a nice smallish hackathon group project at some point downstream.

@cmungall
Member

cmungall commented Feb 7, 2019 via email

@goodb
Contributor Author

goodb commented Feb 7, 2019

That figure doesn't line up with what is currently coming out in the OWL export. For example, in http://noctua.geneontology.org/download/gomodel:5c4605cc00000457/owl I see http://geneontology.org/lego/evidence-with where it ought to be RO:0002614 according to that model, and http://geneontology.org/lego/evidence where it should be RO:0002612, etc. I guess the first step here would be to update the stack to reflect the model there.
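
A small sketch of what that predicate update might look like, limited to the two relations named above. The dictionary `LEGO_TO_RO` and the function `rewrite_predicate` are hypothetical names, and expanding the RO CURIEs to `http://purl.obolibrary.org/obo/RO_...` IRIs is an assumption based on the usual OBO PURL pattern, not something taken from the export.

```python
# Hypothetical mapping from the lego predicates seen in the OWL export
# to the RO relations the model calls for. Only the two pairs mentioned
# in this thread are included.
LEGO_TO_RO = {
    "http://geneontology.org/lego/evidence-with":
        "http://purl.obolibrary.org/obo/RO_0002614",
    "http://geneontology.org/lego/evidence":
        "http://purl.obolibrary.org/obo/RO_0002612",
}


def rewrite_predicate(iri: str) -> str:
    """Return the RO IRI for a known lego predicate, else the input unchanged."""
    return LEGO_TO_RO.get(iri, iri)


print(rewrite_predicate("http://geneontology.org/lego/evidence-with"))
```

Predicates that are not in the table pass through untouched, so a pass like this could be applied to an export without affecting other relations.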

@balhoff
Member

balhoff commented Feb 7, 2019

Actually this is more complicated than I thought as we are using string literals for with as well as for reference.

I would like to better understand the issues which keep us using strings for contributor IDs and references, etc. As far as I know it leads to a display problem, which maybe we could work around with some "hide" annotations on those nodes. But I bet this has been discussed before.

@cmungall
Member

cmungall commented Feb 7, 2019

Ah yes, it was in my imagination we had transitioned the lego relation to RO

Yes, this should be easier when we start sending back inferred types. Should be easy to hide. Would also be good to change some assumptions in GE display and make it more activity-centric, cc @kltm

@goodb
Contributor Author

goodb commented Feb 11, 2019

@cmungall @ukemi we need a decision here so that @dustine32 can finish his conversion script. I think I over-complicated this in the comment above where I mentioned that it's just a bunch of strings. Is it okay to leave them like that, as @ukemi describes above, for the purposes of the batch MOD conversion project? This won't make them any less useful than they currently are. The conversion to a more useful OWL representation could be done as part of a separate project that can happen later.

@cmungall
Member

cmungall commented Mar 1, 2019

Yes, keep as strings. We know we need to change the OWL representation anyway (e.g. use IRIs rather than string literals for publications)

@vanaukenk
Contributor

For testing purposes, check the mec-3 (WBGene00003167) GO-CAM import.

There are currently three BP annotations, all with WBVariations as pipe-separated With/From values, that are not imported at all.


5 participants