Expand entity support: complexes & families #130

Closed
jvwong opened this issue Mar 24, 2023 · 2 comments

jvwong (Member) commented Mar 24, 2023

Summary

Goal: Extend grounding support for complexes & families.
Why: Complexes and families are the second-largest class of Biofactoid entity grounding errors. They are ubiquitous in the literature (protein/gene: 56%; family/complex: 17.7%), and support for them has been tabled for years.
How: Re-use FamPlex, a curated resource for disambiguation of (human) complexes and families.

Background

It is common for researchers to refer to complexes and to members of a family. These authors may not be concerned with, or even aware of, the precise component(s) of a complex or member(s) of a family to which they refer; rather, they wish to convey information about a general class of function or structure. As a result, authors name entities using these broader terms, with the implicit assumption that there are individual components or members.

Example: NF-κB

Nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) is a protein complex that controls transcription of DNA, cytokine production and cell survival.

There are five proteins in the mammalian NF-κB family:

| Class | Protein | Aliases | Gene |
|---|---|---|---|
| I | NF-κB1 | p105 → p50 | NFKB1 |
| I | NF-κB2 | p100 → p52 | NFKB2 |
| II | RelA | p65 | RELA |
| II | RelB | | RELB |
| II | c-Rel | | REL |

Various NF-κB complexes

Fig. 1. A general model by which different NF-κB dimers contribute to selectivity of the transcriptional response to an NF-κB-inducing stimulus. The model shown is based on published studies of the selective functions of different NF-κB dimers, as discussed in the text. [Smale, S. T. Dimer-specific regulatory mechanisms within the NF-κB family of transcription factors. Immunol Rev 246, 193–204 (2012).]


In Biofactoid

Complexes and families represent the second-largest class of errors in entity grounding. See PathwayCommons/factoid#1003 (comment).

Example


NF-κB-p62-NRF2 survival signaling is associated with high ROR1 expression in chronic lymphocytic leukemia. Sanchez-Lopez et al. Cell Death Differ. 2020 Jul;27(7):2206-2216


Phosphorylated RB Promotes Cancer Immunity by Inhibiting NF-κB Activation and PD-L1 Expression. Mol Cell. 2019 Jan 3;73(1):22-35.e6.

Implementation

FamPlex is a resource that helps improve named entity recognition, grounding, and relationship resolution. The repository provides several comma-separated files that can be used to populate our grounding resource (Table I).

Table I. Relationship between FamPlex data and ground-search fields

| FamPlex file | Description | Count | ground-search field(s) |
|---|---|---|---|
| entities.csv | FamPlex namespaced entities | 754 | name/id |
| descriptions.csv | Description text | 431 | summary |
| grounding_map.csv | Synonyms | 2163 | (FamPlex) synonyms |
| equivalences.csv | Mappings | 2489 | (FamPlex) xrefs |
| relations.csv | Components, members | 4711 | ?type? |
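The mapping in Table I can be sketched as a loader that folds four of the files into one record per FamPlex entity. This is a minimal Python sketch, not grounding-search's actual importer: the CSV column layouts, the function names `read_rows` and `build_records`, and the record shape (including the `"fplx"` namespace label) are assumptions to check against the FamPlex repository and the existing ground-search schema. relations.csv is left out because its target field is still an open question.

```python
import csv
import io

# ASSUMED column layouts (verify against the FamPlex repository):
#   entities.csv      -> FamPlex ID
#   descriptions.csv  -> FamPlex ID, ..., description text (last column)
#   grounding_map.csv -> text, then (namespace, ID) pairs
#   equivalences.csv  -> external namespace, external ID, FamPlex ID


def read_rows(text):
    """Parse CSV text into rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def build_records(entities, descriptions, grounding_map, equivalences):
    """Build one hypothetical ground-search record per FamPlex entity."""
    records = {}
    for row in read_rows(entities):
        fplx_id = row[0]
        records[fplx_id] = {
            "namespace": "fplx",  # hypothetical namespace label
            "id": fplx_id,
            "name": fplx_id,      # Table I maps entities.csv to both name and id
            "summary": "",
            "synonyms": [],
            "xrefs": [],
        }
    for row in read_rows(descriptions):
        fplx_id, text = row[0], row[-1]
        if fplx_id in records:
            records[fplx_id]["summary"] = text
    for row in read_rows(grounding_map):
        text, pairs = row[0], row[1:]
        # Keep only synonyms grounded to a known FamPlex entity
        for ns, gid in zip(pairs[0::2], pairs[1::2]):
            record = records.get(gid) if ns == "FPLX" else None
            if record is not None and text not in record["synonyms"]:
                record["synonyms"].append(text)
    for row in read_rows(equivalences):
        if len(row) < 3:
            continue  # skip malformed rows
        ns, ext_id, fplx_id = row[0], row[1], row[2]
        if fplx_id in records:
            records[fplx_id]["xrefs"].append({"db": ns, "id": ext_id})
    return records
```

Restricting synonyms to pairs grounded to FPLX keeps a text string that grounding_map.csv maps to, say, an HGNC gene from being attached to a family record.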

Top entities referenced

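| Rank | Name | Count |
|---|---|---|
| 1 | ERK | 6301 |
| 2 | AKT | 5839 |
| 3 | NFkappaB | 5768 |
| 4 | TGFB | 2877 |
| 5 | PI3K | 2486 |
| 6 | JNK | 2401 |
| 7 | p38 | 2345 |
| 8 | VEGF | 2326 |
| 9 | Cyclin | 2087 |
| 10 | Wnt | 1622 |
| 11 | Integrins | 1498 |
| 12 | RAS | 1402 |
| 13 | Actin | 1299 |
| 14 | PKC | 1234 |
| 15 | PKA | 1058 |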

Caveats

  • Human bias: coverage is mostly, if not exclusively, human
  • Categorization: "type" is continuous rather than discrete: (a) family [ERK], (b) family of complexes [NFkappaB], (c) unique complex [IL23: IL12B + IL23A]; in practice, more entities fall toward the family end
  • Includes other entity types that should be excluded (processes, etc.)


jvwong (Member, Author) commented May 1, 2023

Entity type

When it comes to integrating with the factoid model, it's important to assign some sort of 'type' to a FamPlex entity. To attempt this, we will use the FamPlex-provided relations.csv, which contains:

  • entities (describing complexes and families)
  • (directed) relationships ('isa' and 'partof') between entities

A first attempt is to follow a simple heuristic: a complex (namedComplex) is an entity that has some other entity that is partof it. Otherwise, it is a family.
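The heuristic can be sketched in a few lines. This minimal Python sketch assumes relations.csv rows are laid out as namespace1, ID1, relation, namespace2, ID2 and read as "ID1 relation ID2" (e.g. a subunit that is partof a complex). The function name `classify_entities` is hypothetical, and the AMPK-style rows in the usage lines are illustrative, not copied from FamPlex.

```python
import csv
import io


def classify_entities(relations_csv, fplx_ids):
    """Label each FamPlex ID 'namedComplex' or 'family'.

    Heuristic: an entity is a complex if some other entity is 'partof' it;
    otherwise it is a family.
    ASSUMED row layout: namespace1, id1, relation, namespace2, id2.
    """
    has_parts = set()
    for row in csv.reader(io.StringIO(relations_csv)):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        _ns1, _id1, relation, ns2, id2 = (field.strip() for field in row[:5])
        if relation == "partof" and ns2 == "FPLX":
            has_parts.add(id2)
    return {
        fplx_id: "namedComplex" if fplx_id in has_parts else "family"
        for fplx_id in fplx_ids
    }


# Illustrative rows (hypothetical, modeled on the AMPK example)
rows = "\n".join([
    "FPLX,AMPK_alpha,partof,FPLX,AMPK",
    "HGNC,PRKAA1,isa,FPLX,AMPK_alpha",
    "HGNC,PRKAA2,isa,FPLX,AMPK_alpha",
])
print(classify_entities(rows, ["AMPK", "AMPK_alpha"]))
# {'AMPK': 'namedComplex', 'AMPK_alpha': 'family'}
```

Because the rule is binary, it cannot express the intermediate "family of complexes" case listed under Caveats; such entities are labeled by whether anything is partof them.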


Example below for "AMPK":

[Image: AMPK example]

jvwong (Member, Author) commented May 11, 2023

Turning this into an itemized issue, to stage the changes and reduce risk (in particular, changes in factoid):

  • Add FamPlex as a grounding-search datasource (including tests), but do not expose it in aggregate search
  • Add FamPlex entities of type family
  • Add FamPlex entities of type namedComplex
