Define what qualifies a GPAD line as a distinct assertion individual in GO-CAM #45

Open
dustine32 opened this issue Apr 26, 2019 · 19 comments

Comments

@dustine32
Collaborator

dustine32 commented Apr 26, 2019

So far we've been "collapsing/consolidating" multiple GPAD lines into distinct assertion individuals, each carrying multiple pieces of evidence, based on certain criteria (e.g. GP + term + extensions are the same). I'd like to clarify and document these criteria here first; then we can move them to the wiki page.
In my head, this is basically a header vs line situation so I'll present it like so:

Header:

  1. GP ID
  2. qualifiers aka relations
  3. primary term
  4. With/From (if primary term is GO:0005515 or descendant)
  5. Annotation Extensions

Line:

  1. Reference
  2. Evidence Code
  3. With/From (if primary term is not GO:0005515 or descendant)
  4. Source line
  5. Date
  6. Assigned by
  7. Annotation Properties

@vanaukenk @ukemi @thomaspd This mainly came about recently when trying to figure out how to group lines by the with/from field, hence the question mark. Do multiple GPAD lines that differ only in with/from values represent the same assertion individual in GO-CAM or multiple?

@ukemi

ukemi commented Apr 26, 2019

The with/from field qualifies (or was originally intended to qualify) the evidence, so the individuals should be collapsed, but each ref/evidence_code/with is a separate piece of evidence.

The one caveat is that I thought we had decided to express binding annotations in a formally correct way, with the bound entity being an input. Then we would spit them back out as they currently are in the with field. @vanaukenk is that still the plan?

@vanaukenk
Contributor

@ukemi
Yes, that is still the plan for protein binding annotations. If we hadn't done this for protein binding, then not only would we have been inconsistent with GO-CAM, but we would not have been able to collapse the evidence for this particular term.

@dustine32
Collaborator Author

OK, I think we can still work with that protein binding caveat. I now see the protein binding section on the wiki.

So basically, DO split out distinct with/from values into multiple assertions (translated with different has_input entities) IF the primary term is protein binding (and descendants or just GO:0005515?), ELSE collapse these different with/from values onto the same assertion individual.

@ukemi @vanaukenk Sound correct?
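
The split/collapse rule proposed above can be sketched as follows. This is only an illustration under stated assumptions, not the importer's actual code: `GpadLine`, `header_key`, `collapse`, and the IDs `EX:gene1`, `EX:partnerA`, `EX:partnerB`, `EX:ref1`, `EX:ref2`, and `GO:9999999` are made up for the example, and the binding check matches only GO:0005515 itself because the question about descendants is still open at this point in the thread.

```python
from collections import defaultdict
from typing import NamedTuple

PROTEIN_BINDING = "GO:0005515"


class GpadLine(NamedTuple):
    """Hypothetical, simplified view of one GPAD line."""
    gp_id: str
    qualifier: str
    term: str
    with_from: str
    extensions: str
    reference: str
    evidence_code: str


def header_key(line, binding):
    """Fields that identify one assertion individual."""
    key = (line.gp_id, line.qualifier, line.term, line.extensions)
    # With/From joins the header only for protein binding annotations.
    return key + (line.with_from,) if binding else key


def collapse(lines):
    """Group lines into assertions; each value is that assertion's evidence list."""
    assertions = defaultdict(list)
    for line in lines:
        binding = line.term == PROTEIN_BINDING  # assumption: exact match only
        evidence = (line.reference, line.evidence_code)
        if not binding:
            # Otherwise With/From stays attached to the evidence.
            evidence += (line.with_from,)
        assertions[header_key(line, binding)].append(evidence)
    return dict(assertions)


# Made-up lines that differ only in With/From and reference.
binding_lines = [
    GpadLine("EX:gene1", "enables", PROTEIN_BINDING, "EX:partnerA", "", "EX:ref1", "ECO:0000353"),
    GpadLine("EX:gene1", "enables", PROTEIN_BINDING, "EX:partnerB", "", "EX:ref2", "ECO:0000353"),
]
other_lines = [line._replace(term="GO:9999999") for line in binding_lines]

print(len(collapse(binding_lines)))  # protein binding: split by With/From
print(len(collapse(other_lines)))    # any other term: collapsed
```

With this grouping, the two protein binding lines become two assertions, while the same two lines on the placeholder term collapse into one assertion carrying two pieces of evidence.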

@ukemi

ukemi commented Apr 26, 2019

Yes. That sounds correct.

@dustine32
Collaborator Author

Cool, thanks! I updated the header/line thingy in my first comment to clarify how with/from is being handled.

@vanaukenk
Contributor

@dustine32

We will want to split out distinct With/From values into multiple assertions for GO:0005515 and also its children, as we may have annotations to terms like 'protein kinase binding' GO:0019901 that still refer to different entities in the With/From field.

I'll update the protein binding section of the wiki to make it clearer which of the options we chose.
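
A minimal sketch of the "GO:0005515 and its children" check described above. The `TOY_PARENTS` map and the function name `is_protein_binding` are assumptions made for illustration; the map is a hand-written fragment of is_a links, and a real import would take these relationships from the ontology itself.

```python
PROTEIN_BINDING = "GO:0005515"

# Hand-written is_a fragment for illustration only; not loaded from the ontology.
TOY_PARENTS = {
    "GO:0019901": ["GO:0019900"],  # protein kinase binding -> kinase binding
    "GO:0019900": ["GO:0019899"],  # kinase binding -> enzyme binding
    "GO:0019899": ["GO:0005515"],  # enzyme binding -> protein binding
}


def is_protein_binding(term, parents=TOY_PARENTS):
    """Return True if term is GO:0005515 or reaches it through the parent map."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == PROTEIN_BINDING:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


print(is_protein_binding("GO:0019901"))  # descendant term from the comment above
print(is_protein_binding("GO:0005737"))  # a term absent from the toy map
```

Using a predicate like this in place of an exact ID match means annotations to terms such as GO:0019901 would also be split by their With/From values.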

dustine32 self-assigned this Apr 26, 2019

@vanaukenk
Contributor

@dustine32

I've updated the import rules section for protein binding. Please let me know if anything is unclear or doesn't look right to you:

http://wiki.geneontology.org/index.php/Noctua_MOD_Imports#Protein_Binding_Annotations

Thx.

@dustine32
Collaborator Author

Thanks @vanaukenk! That is definitely more straightforward. I just wanted to make sure I understand the last point:

Evidence will not be combined for annotations to protein binding or its children

So multiple GPAD lines with same GP-term-with/from-etc (the header fields above) values won't be collapsed into the same assertion if their evidence code-references (line fields) vary? More simply, each protein binding GPAD line will have its own assertion individual in GO-CAM?

@ukemi

ukemi commented Apr 29, 2019

Is this true even for annotation lines that have the same value in the 'with' field?
Ex:

MGI MGI:1340046 enables GO:0005515 MGI:MGI:4845793|PMID:21068328 ECO:0000353 UniProtKB:Q8K1S1 20130311 MGI
MGI MGI:1340046 enables GO:0005515 MGI:MGI:4441002|PMID:20220021 ECO:0000353 UniProtKB:Q8K1S1 20130311 MGI

@vanaukenk
Contributor

@dustine32
Yes, we can combine protein binding GPAD lines if they have the same GP-term-with/from but different references. @ukemi - is that what you meant to illustrate above?
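
As a rough sketch of that answer, the two MGI lines quoted above can be grouped on GP, relation, term, and With/From. The function name `collapse_binding_lines` and the dictionary layout are assumptions for illustration. Splitting on whitespace works here only because the lines as pasted have nine populated fields; a real GPAD file is tab-delimited and has additional columns.

```python
from collections import defaultdict

# The two example lines exactly as quoted in the thread.
RAW = """\
MGI MGI:1340046 enables GO:0005515 MGI:MGI:4845793|PMID:21068328 ECO:0000353 UniProtKB:Q8K1S1 20130311 MGI
MGI MGI:1340046 enables GO:0005515 MGI:MGI:4441002|PMID:20220021 ECO:0000353 UniProtKB:Q8K1S1 20130311 MGI
"""


def collapse_binding_lines(raw):
    """Group protein binding lines by (GP, relation, term, With/From)."""
    assertions = defaultdict(list)
    for row in raw.splitlines():
        if not row.strip():
            continue
        # Assumption: exactly nine whitespace-separated fields, as in the pasted text.
        db, obj_id, qualifier, term, refs, eco, with_from, date, assigned_by = row.split()
        key = (f"{db}:{obj_id}", qualifier, term, with_from)
        assertions[key].append({
            "references": refs.split("|"),
            "evidence_code": eco,
            "date": date,
            "assigned_by": assigned_by,
        })
    return dict(assertions)


collapsed = collapse_binding_lines(RAW)
for key, evidence in collapsed.items():
    print(key, len(evidence))
```

Both lines share the same With/From value (UniProtKB:Q8K1S1), so they land under one key and differ only in the references stored with each piece of evidence.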

@vanaukenk
Contributor

image
Another potential illustration.
These three annotations could be combined as evidence for a single Noctua instance since they refer to the same binding (EGL-1 binds CED-9) but just cite different references (and have different annotation dates).

@vanaukenk
Contributor

Whereas in this example (just looking at the WB annotations, ignore the SWIE one), we would have two separate protein binding annotations:

image

One combines three pieces of evidence and has WB:WBGene00000418 in the With/From field; the other has a single piece of evidence and WB:WBGene00001170 in the With/From field.

@vanaukenk
Contributor

vanaukenk commented Apr 29, 2019

Here's an example where it looks like we could combine evidence for a cellular component annotation, but haven't yet:

image

image

ced-9 (WB:WBGene00000423)

lat-1 (WB:WBGene00002251) is another example of two CC annotations whose evidence could be merged into a single individual.

@dustine32
Collaborator Author

@vanaukenk Oh ok, that's what I was thinking too. Differing references alone shouldn't require multiple assertion individuals. I also forgot that all protein binding and descendant term annotations should be using the same IPI evidence code, so that removes one variable.

Thanks so much for clarifying!

@dustine32
Collaborator Author

@vanaukenk I think those two examples are now collapsing correctly. Here they are on my dev server:

@vanaukenk
Contributor

@dustine32 - the two CC examples above are indeed now collapsing correctly. Thanks!

@ukemi

ukemi commented May 7, 2019

@vanaukenk have you set up a formal testing document? If not, I will have a shot at it.

@vanaukenk
Contributor

Yes, I started with this spreadsheet here:

https://docs.google.com/spreadsheets/d/1XFuD6LOyFKXNk94jIK8zv1TrESfwCJo-RnrXQ3tzmJg/edit

@dustine32
Collaborator Author

@vanaukenk @ukemi The latest iterations of the WB and MGI models are now up on noctua-dev, so this can be tested there. They won't include the fix for the comma-separated with/from snafu that @ukemi pointed out here, but I've since fixed it on my USC server.

Here are some stats from the import attached to the PR.
