F-P GAF has redundant annotations #576
Are they duplicates because they were assigned by different sources?
Well, yes. PomBase made them from the paper, and the inference pipeline assigned the duplicates from our annotation! @rachhuntley made the same point about a slightly different type of redundancy from the inference pipeline here: geneontology/go-annotation#1487. However, I do not understand this comment about the source of the annotation: "The annotation that is created in this instance is odd; it states that P46153 is inferred to be involved in …"
If I understood correctly, …
Another example of the redundancy: lots of annotations to "cell" when we already have more specific annotations (via occurs_at links).
Also, what about the term GO:0005622 intracellular: should we block this one too? I don't think it leads to useful inferences anyway.
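If uninformative terms such as GO:0005622 were blocked, a consumer could do the same on their side. The sketch below assumes the standard tab-separated GAF 2.x column order (column 2 DB Object ID, 5 GO ID, 6 DB:Reference, 7 evidence code, 15 Assigned By). `BLOCKED_TERMS`, `Annotation`, `parse_gaf` and `drop_blocked` are hypothetical names for illustration, not part of any GO pipeline, and only GO:0005622 is taken from this thread.

```python
from typing import Iterable, List, NamedTuple

# Hypothetical blocklist; only GO:0005622 (intracellular) is named in this thread.
BLOCKED_TERMS = {"GO:0005622"}


class Annotation(NamedTuple):
    gene: str         # GAF column 2: DB Object ID
    term: str         # GAF column 5: GO ID
    reference: str    # GAF column 6: DB:Reference
    evidence: str     # GAF column 7: evidence code
    assigned_by: str  # GAF column 15: Assigned By


def parse_gaf(lines: Iterable[str]) -> List[Annotation]:
    """Parse tab-separated GAF 2.x lines, skipping blank and '!' header lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        records.append(Annotation(cols[1], cols[4], cols[5], cols[6], cols[14]))
    return records


def drop_blocked(annotations: Iterable[Annotation],
                 blocked=BLOCKED_TERMS) -> List[Annotation]:
    """Remove annotations whose GO ID is on the blocklist."""
    return [a for a in annotations if a.term not in blocked]
```

Filtering on the consumer side works, but every consumer then has to maintain the same blocklist, which is the argument made later in this thread for not generating such annotations in the first place.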
We just need a single example of where this is not behaving as expected: just a gene ID and an inferred term ID. @yy20716 will investigate this. Let me try to restate: the PomBase source has a direct annotation of ste11 to GO:0045944.
The prediction file includes an inference of the same gene to this term:
In this case it does not matter where the inference comes from. If we intend to filter out redundant annotations, this is a bug. If we are leaving it to the consumer to filter these out, this needs to be documented. See my comments on where this should be documented: #2226
Here is a single example (number 2 above): vht1. We made the following process annotation from this paper, PMID:12557275:
vht1 | GO:1905135 | biotin import across plasma membrane | IMP
Then we get the following less specific inferences from the same source, PMID:12557275:
They are quite annoying when we make great efforts to present non-redundant annotation ;) We filter redundant non-experimental annotations from any source, but these show up because they are experimental. I guess we could change our filtering so that we remove duplicates assigned by GOC, but I think it would be better for everyone if the pipeline did not generate duplicate annotations in the first place, so I haven't actioned this yet.
Your example for ste11 is slightly different from the examples I am using. It comes from two independent sources, and it has a different evidence code.
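The distinction matters for any automated filter, because what counts as a "duplicate" depends on which fields are compared. A minimal sketch, assuming annotations are reduced to hypothetical `(gene, term, reference, evidence)` tuples: the thi1 case from the issue description (same term, evidence code and paper) matches on the strict key, whereas a ste11-style pair (two independent sources, different evidence codes) only matches on the looser `(gene, term)` key. The ste11 reference and evidence values below are placeholders, not the real annotations, and gene symbols stand in for gene IDs.

```python
from typing import Iterable, List, Tuple

# (gene, GO ID, reference, evidence code)
Row = Tuple[str, str, str, str]


def strict_key(row: Row) -> Row:
    """Exact duplicate: same gene, term, reference and evidence code."""
    return row


def loose_key(row: Row) -> Tuple[str, str]:
    """Looser notion: same gene and term, regardless of source or evidence."""
    return row[0], row[1]


def find_duplicates(existing: Iterable[Row], inferred: Iterable[Row],
                    key=strict_key) -> List[Row]:
    """Return inferred rows whose key already occurs among the existing rows."""
    seen = {key(r) for r in existing}
    return [r for r in inferred if key(r) in seen]


# thi1: the citation from the issue stands in for the paper's PMID.
existing = [("thi1", "GO:0045944", "Tang CS et al. (1994)", "IMP"),
            # ste11: hypothetical placeholder reference and evidence code
            ("ste11", "GO:0045944", "REF_A", "EV_A")]
inferred = [("thi1", "GO:0045944", "Tang CS et al. (1994)", "IMP"),
            ("ste11", "GO:0045944", "REF_B", "EV_B")]
```

With the strict key only the thi1 row is flagged; with `key=loose_key` both rows are, which is why the two kinds of redundancy may call for different rules.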
Responding to Val's comment ("The annotation that is created in this instance is odd; it states that P46153 is inferred to be involved in …"): it looks like this inference is no longer made.
The closed duplicate has more examples.
@cmungall, you can probably close this one?
We will develop guidelines: geneontology/go-annotation#2060
There is a lot of redundancy with existing annotations in the inferred GAF.
Here are some examples:
thi1: two redundant annotations:
i) redundant and identical (we made this annotation from this paper)
GO:0045944 | positive regulation of transcription by RNA polymerase II | IMP | Tang CS et al. (1994)
ii) redundant and less specific
GO:0006366 | transcription by RNA polymerase II | IMP | Tang CS et al. (1994)
vht1: lots of redundant, less specific annotations:
GO:0042886 | amide transport | IMP | Stolz J (2003)
GO:1905039 | carboxylic acid transmembrane transport | IMP | Stolz J (2003)
GO:0015718 | monocarboxylic acid transport | IMP | Stolz J (2003)
GO:0006611 | protein export from nucleus | IMP | Takeda K et al. (2010)
So, can we filter:
i) exact duplicates
ii) annotations less specific than the existing annotations?
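Both filters can be expressed as one check per gene: drop an inferred term if it equals, or is an ancestor of, a term the gene is already annotated to. The sketch below assumes annotations reduced to hypothetical `(gene, GO ID)` pairs and a hypothetical `parents` map from each term to its direct parents (in practice this would come from the ontology, for example over is_a and part_of). `PARENTS` is a toy map, not the real GO graph: it links GO:1905135 straight to the three inferred vht1 terms listed above and omits the intermediate terms.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# (gene, GO ID); reference and evidence omitted for brevity.
Pair = Tuple[str, str]


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable from `term` by following parent links (excluding `term`)."""
    found: Set[str] = set()
    queue = deque(parents.get(term, ()))
    while queue:
        t = queue.popleft()
        if t not in found:
            found.add(t)
            queue.extend(parents.get(t, ()))
    return found


def filter_redundant(existing: Iterable[Pair], inferred: Iterable[Pair],
                     parents: Dict[str, Set[str]]) -> List[Pair]:
    """Keep inferred pairs that are neither (i) exact duplicates nor
    (ii) ancestors of a term already annotated to the same gene."""
    covered: Dict[str, Set[str]] = {}
    for gene, term in existing:
        covered.setdefault(gene, set()).update({term} | ancestors(term, parents))
    return [(g, t) for g, t in inferred if t not in covered.get(g, set())]


# Toy parent map, NOT the real GO structure (intermediate terms skipped).
PARENTS = {"GO:1905135": {"GO:0042886", "GO:1905039", "GO:0015718"}}

pombase_annotations = [("vht1", "GO:1905135"), ("thi1", "GO:0045944")]
inferred_annotations = [("vht1", "GO:0042886"), ("vht1", "GO:1905039"),
                        ("vht1", "GO:0015718"), ("thi1", "GO:0045944")]
```

Here `filter_redundant(pombase_annotations, inferred_annotations, PARENTS)` removes all four inferred pairs: the vht1 terms as less specific, the thi1 term as an exact duplicate. Note that the thi1 GO:0006366 case relies on a regulates relationship rather than is_a/part_of, so which relations the ancestor walk follows is itself a design decision.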
Non-experimental annotations are already filtered by our pipeline, but we have a rule not to filter any experimental annotation.
Having duplicate annotations isn't a showstopper, but it isn't useful, and it looks sloppy and confusing to users.