New rules for annotating Protein Complexes #1056

Open
pgaudet opened this issue Apr 11, 2019 · 4 comments

pgaudet commented Apr 11, 2019

From the 2019-04 GOC meeting in Cambridge, these are new proposed rules for annotation to Protein Complex binding:
(NOTE: CP = Complex Portal)

  1. If we know GP1 binds Cpx1 (e.g. the complex is identified as a whole or binding is via a composite binding site):
  • GP1 | protein-containing complex binding | IPI | with/from "CP AC for Cpx1" has_direct_input "CP AC of Cpx1"
  2. If we know GP1 binds GP2 where GP2 is part of Cpx1 (e.g. binary link has been shown but complex evidence is also in the experiment):
  • GP1 | protein-containing complex binding | IPI | with/from "UniProt/MOD AC of GP2" has_direct_input "CP AC of Cpx1"
  • With/from could contain a list of binding partners.
    ** NB: The complex AC goes in the AE because the AE extends the term and contains the physiological partner of the GP [optional].
    ** NB: “complex” can be replaced with other types of GPs or list of GPs
    ** CP will provide “rapid curation” ACs in future → request via YELLOW BUTTON on CP homepage (manually change resource from IntAct to CP)

@bmeldal and @pgaudet to formalize these rules

bmeldal commented Apr 16, 2019

pgaudet commented Apr 16, 2019

Sure, please do! It dates from 2015.

bmeldal commented Apr 23, 2019

New annotation rules and annotation reviews required:

More details in mtg minutes:

General notes:

  • CP AC = Complex Portal accession number.
  • CP ACs can be requested via YELLOW BUTTON on CP homepage (manually change resource in drop-down menu from IntAct to Complex Portal)
  • When annotating binding to a complex we always assume GP1 is NOT part of that complex but a binding partner in some process.

General actions:

  • New Rules: we probably need a warning in the syntax check for historic annotations, as we won’t update them all immediately, especially where with/from fields and AEs are missing.
  • @alexsign : implement new rules in P2GO for GO:0044877 protein-containing complex binding (with/from) and GO:0065003 protein-containing complex assembly (AE has output)
  • @alexsign : Add Warning to Reports for GO:0044877 protein-containing complex binding (with/from) and GO:0065003 protein-containing complex assembly (AE has output) (as there will still be 100s of historic annotations)
  • develop strategy to replace historic annotations (@sylvainpoux @pgaudet @pgarmiri ?)

1. Guidelines for annotation: GP1 part_of complex:

  • @bmeldal to prepare a simplified version of the CP annotation guidelines for GO
  • do we need New Guidelines?: If GP1 is a component of Cpx1 annotate directly to the Complex:
    GP1 | part_of | GO complex term | IDA (or IPI?).
  • What evidence code should be used, IDA or IPI ? Note that CP requires IPI evidence to annotate a complex.
  • Request new GO complex term if no appropriate term is available

2. [GP1] | “x complex binding”

Discussion

  • do not use any more
  • instead, use GO:0044877 protein-containing complex binding with partner in with/from field, see further rules below.

Usage of GO:0044877 protein-containing complex binding (as of 18/4/19) – incl all children:

  • Total no of annotations to GO:0044877 protein-containing complex binding: 337,563
  • About 70 children of GO:0044877 protein-containing complex binding
  • Total no of manual annotations: 16,436 (incl 11,442 IBA)
  • Total no of manual experimental annotations: 2603 (IPI, IDA, IMP, IGI)
  • Total no of manual annotations with AE: 196
  • Total no of ISS annotations: 2199 (1182x ISO, 1005x ISS, 12x ISA)

Annotation review:

  • obsolete all terms of this pattern and replace with with/from annotations as per new rules below
  • Request new GO complex term if no appropriate term is available
  • @pgaudet to raise annotation review ticket

3. colocalizes_with:

  • Discussion/Review

  • We decided that this qualifier should no longer be used with protein-containing complex

  • accept New Rule GORULE:0000035 for: "Colocalizes_with qualifier not allowed with protein-containing complex (GO:0032991) and children."
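As a rough illustration, the proposed GORULE:0000035 check could be sketched as below. The `COMPLEX_TERMS` set is a hypothetical stand-in for GO:0032991 and its children (a real check would compute that set from the ontology), and the function name is invented for this example.

```python
# Hypothetical stand-in for GO:0032991 plus its children; a real check would
# derive this closure from the ontology rather than hard-code it.
COMPLEX_TERMS = {
    "GO:0032991",  # protein-containing complex
    "GO:0000502",  # proteasome complex (illustrative child term)
}


def violates_gorule_0000035(qualifier: str, go_id: str) -> bool:
    """True if colocalizes_with is used with a protein-containing complex term."""
    return qualifier == "colocalizes_with" and go_id in COMPLEX_TERMS
```

Under these assumptions, a `colocalizes_with` annotation to GO:0000502 would be flagged, while the same qualifier on a non-complex term such as GO:0005634 (nucleus) would pass.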

Usage of colocalizes_with GO:0032991 protein-containing complex (as of 18/4/19) – incl children:

  • Total no of annotations: 4284
  • Total no of manual annotations: 4284 (incl 2752 IBA)
  • Total no of manual experimental annotations: 561 (531 IDA, 23x IPI (NB: this is direct binding!), 6x IMP, 1x IGI)
  • Total no of ISS annotations: 957 (367x ISO, 590x ISS)

Annotation review:

PROTEIN-CONTAINING COMPLEX BINDING:

Discussion

Usage of GO:0044877 protein-containing complex binding (as of 18/4/19) – exact term:

  • Total no of annotations to GO:0044877 protein-containing complex binding: 12,428
  • Total no of manual annotations: 1512 (incl 235 IBA)
  • Total no of manual experimental annotations: 672 (IPI, IDA, IMP)
  • Total no of manual annotations with AE: 139
  • Total no of ISS annotations: 177

4. IPI: GP1-Cpx1 or Cpx2-Cpx3:
If we know GP1 binds Cpx1 or Cpx2 binds Cpx3 (e.g. complex is identified as a whole or binding is via a composite binding site):

GP1 | protein-containing complex binding | IPI | with/from "CP AC for Cpx1" has_direct_input "CP AC of Cpx1"
or
CP AC of Cpx2 | protein-containing complex binding | IPI | with/from "CP AC of Cpx3" has_direct_input "CP AC of Cpx3"

  • The complex AC goes in the AE because the AE extends the term and contains the physiological partner of the GP (AE is optional but longterm should be systematically added).

  • New Rule 1: When annotating a GP or complex to protein-containing complex binding by IPI the CP AC of the complex that is being bound to MUST be added in the with/from field. Optionally, the CP AC can additionally be added to the AE with qualifier has_direct_input.
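A minimal sketch of how New Rule 1 might be checked for the whole-complex case. It assumes CP ACs are written as `ComplexPortal:CPX-<digits>` in with/from, and that the caller already knows the annotation is to GO:0044877 with a whole complex as partner; the function name and message text are illustrative only.

```python
import re

# Assumption: CP ACs appear as "ComplexPortal:CPX-<digits>" in with/from.
CP_AC = re.compile(r"ComplexPortal:CPX-\d+")


def check_new_rule_1(evidence: str, with_from: list[str]) -> list[str]:
    """Check a GO:0044877 annotation whose binding partner is a whole complex.

    Returns a list of problem messages (empty if the annotation passes).
    """
    if evidence != "IPI":
        return []  # this sketch only covers the IPI rule
    if any(CP_AC.fullmatch(entry) for entry in with_from):
        return []
    return ["IPI to GO:0044877: with/from must contain the CP AC of the bound complex"]
```

Returning messages instead of raising would let the same check feed a warning report for historic annotations rather than block them outright.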

5. IPI: GP1-[GP2 part_of Cpx1]:
If we know GP1 binds GP2 where GP2 is part of Cpx1 (e.g. complex component(s) is/are identified or binary link has been shown but complex evidence is also in the experiment):

GP1 | protein-containing complex binding | IPI | with/from "UniProt/MOD AC of GP2" has_direct_input "CP AC of Cpx1"

  • with/from could also contain (a list of) binding partners from various sources, incl. ChEBI or CP ACs if binding a subcomplex.

  • with/from should be as specific to the immediate binding partner(s) as possible.

  • The complex AC goes in the AE because the AE extends the term and contains the physiological partner of the GP (AE is optional but longterm should be systematically added).

  • New Rule 2: If a GP binds to another GP (GP2 or a list of GPs) that has been identified as part of a complex, the UniProt/MOD/ChEBI AC/CP AC of GP2 (or the list) must be entered in the with/from field. GP2(s) must also be directly annotated to Cpx1 (see New Rule above). Optionally, the CP AC can additionally be added to the AE with qualifier has_direct_input.
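The second requirement of New Rule 2 (each GP2 must itself be annotated to Cpx1) could be cross-checked roughly as follows. `members_by_complex` is a hypothetical lookup from a complex AC to the gene products annotated as part of it; how that lookup is built is outside this sketch, and all identifiers used with it here are placeholders.

```python
def partners_missing_complex_annotation(
    with_from: list[str],
    complex_ac: str,
    members_by_complex: dict[str, set[str]],
) -> list[str]:
    """Return the with/from partners that are not annotated as part of complex_ac."""
    members = members_by_complex.get(complex_ac, set())
    return [partner for partner in with_from if partner not in members]
```

For example, with `{"ComplexPortal:CPX-1": {"UniProtKB:P11111"}}` as the lookup, a with/from list of `["UniProtKB:P11111", "UniProtKB:P22222"]` would report only the second entry as needing a direct annotation to the complex. Non-protein partners such as ChEBI entries would need to be filtered out first.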

Annotation review (pts 4 & 5):
368 IPI annotations, only 20 with AEs:

  • update GO complex terms in with/from and AE to CP ACs (other types of AEs will remain as is).

  • add CP ACs in AE where missing. (too much work?)

  • review 1 annotation (without AE) between 2 complexes: curiously it’s between the same complex (BHF-UCL).

  • review if the binding partners (GP2s) are directly annotated to the known complex and add annotations if missing.

  • @pgaudet to raise review ticket

  • Note: for the 20 annotations with AEs, with/from either contains UniProt or CP AC; the others have a variety of ACs

  • Note: AEs have a mix of has_(direct)_input “isoform AC/GO complex term/CP AC”, occurs_in “cell type or GO CC[location]”, part_of “GO BP”, happens_during “GO BP”

6. ISS/ISO:

GP1(sp2) | protein-containing complex binding | ISS/ISO | with/from "GP1(sp1)" has_direct_input "CP AC Cpx1(sp2)"
or
Cpx2(sp2) | protein-containing complex binding | ISS/ISO | with/from "Cpx2(sp1)" has_direct_input "CP AC Cpx3(sp2)"

  • The complex AC in the annotation extension relates to the annotation object as GP1(sp2)/Cpx2(sp2) is the one binding Cpx1(sp2)/Cpx3(sp2).

  • The complex AC goes in the AE because the AE extends the term and contains the physiological partner of the GP [AE is optional but longterm should be systematically added].

  • New Rule 3: When inferring protein-containing complex binding by ISS/ISO the CP AC of the complex that is being bound to MUST be added in the annotation extension. The species of the annotated GP/Cpx and its complex binding partner must be the same.
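New Rule 3 has two parts (a CP AC in the AE, and matching species), which might be checked along these lines. The sketch assumes the AE uses the `relation(ID)` form with a `ComplexPortal:CPX-<digits>` identifier, and `taxon_of_complex` is a hypothetical lookup from CP AC to taxon ID; neither is specified in the rules above.

```python
import re

# Assumption: the AE is written as relation(ID), e.g. has_direct_input(ComplexPortal:CPX-1).
CP_INPUT = re.compile(r"has_direct_input\((ComplexPortal:CPX-\d+)\)")


def check_new_rule_3(
    evidence: str,
    annotated_taxon: str,
    extension: str,
    taxon_of_complex: dict[str, str],
) -> list[str]:
    """Check an ISS/ISO annotation to GO:0044877; returns problem messages."""
    if evidence not in {"ISS", "ISO"}:
        return []  # this rule only covers ISS/ISO
    match = CP_INPUT.search(extension)
    if match is None:
        return ["ISS/ISO to GO:0044877: AE must contain the CP AC of the bound complex"]
    if taxon_of_complex.get(match.group(1)) != annotated_taxon:
        return ["ISS/ISO to GO:0044877: annotated entity and bound complex differ in species"]
    return []
```

A complex missing from the lookup is treated as a species mismatch here, which errs on the side of flagging the annotation for review.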

Annotation review:
177 ISS annotations (42 with AEs) and 426 ISO annotations (no AEs):

  • update GO complex terms in AE to CP ACs.
  • add CP ACs in AE where missing or remove annotations? ISS/ISO annotations without binding partner make no sense.
  • @pgaudet to raise review ticket

Note:

  • For the 42 ISS annotations with AEs, with/from contains UniProt; the others have a variety of ACs
  • AEs have a mixture of has_(direct)_input “GO complex term/CP AC/isoform AC” and part_of “GO BP”

Other evidence codes used but not yet discussed:

7. 291 IDA annotations:

  • none with value in with/from!
  • 70 annotations with binding partner in AEs: has_(direct)_input “GO complex term/CP AC”
  • other AEs: occurs_in “cell type”, part_of “GO BP”, has_regulation_target “GO complex term”
  • Questions: shouldn’t they be IPI with partner in with/from? Why use IDA if binding observed?

If IDA use is valid:

  • New Rule 4: When annotating a GP (or complex) to protein-containing complex binding by IDA the CP AC of the complex that is being bound to MUST be added in the with/from field. Optionally, the CP AC can additionally be added to the AE with qualifier has_direct_input.

Annotation review:

  • update GO complex terms in AE to CP ACs.
  • copy CP AC from AE to with/from
  • add CP ACs in with/from where missing or remove annotations? Does IDA without binding partner make any sense? Seems common-ish practice. Could they be misannotated and be of type “GP part_of GO complex”, in which case it should be annotated that way (see pt 1)?
  • If IDA use is not valid, update all tickets to match rules in pts 4 & 5.
  • @pgaudet to raise review ticket
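The "copy CP AC from AE to with/from" review step could be scripted roughly as below. This is only a sketch: it assumes AEs are written as `relation(ID)` with `ComplexPortal:CPX-<digits>` identifiers, and it handles CP ACs only, so GO complex terms in the AE would first need mapping to CP ACs by hand.

```python
import re

# Assumption: AE entries look like has_input(...) or has_direct_input(...).
CP_IN_AE = re.compile(r"has_(?:direct_)?input\((ComplexPortal:CPX-\d+)\)")


def copy_cp_ac_to_with_from(with_from: list[str], extension: str) -> list[str]:
    """Return with/from extended by any CP ACs found in the AE, without duplicates."""
    updated = list(with_from)  # leave the caller's list untouched
    for cp_ac in CP_IN_AE.findall(extension):
        if cp_ac not in updated:
            updated.append(cp_ac)
    return updated
```

Because the function returns a new list, the original with/from value stays available for comparison when the changes are reviewed.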

8. 13 IMP annotations:

  • none with value in with/from!
  • 5 annotations with binding partner in AEs: has_(direct)_input GO complex term

Annotation review:

  • same as for IDA?
  • @pgaudet to raise review ticket

9. 1 IC annotation:

  • with/from “GO:0005515 protein binding”
  • AE has_direct_input GO complex term

Annotation review:

10. 1 TAS annotation:

  • no value in with/from!
  • AE has_input GO complex term

Annotation review:

I hope that's all. Please discuss!

Birgit

@kltm kltm added the revisit label Nov 19, 2019

kltm commented Nov 19, 2019

Noting from conversation with @pgaudet and @dougli1sqrd: we are currently holding on further action here.

@pgaudet pgaudet moved this from TODO to To spec out & prioritize in GORULES (low-hanging fruit) Oct 11, 2023