Enrichment network structure

Chris Churas edited this page Aug 19, 2024 · 7 revisions

This document describes the node attributes that this tool inspects when extracting proteins/genes from a network. The actual parsing logic can be found starting in downloadNetworks.

Genes/proteins can be extracted in two ways: via node attributes or via configuration mapping. Both are explained below.

Via Node Attributes (Preferred method)

This is the classic way of denoting nodes that are proteins/genes. In this approach the code looks for specific node attributes. Protein/gene symbols are assumed to match the regex (^[A-Z][A-Z0-9-]*$)|(^C[0-9]+orf[0-9]+$) and can optionally have the prefix hgnc.symbol: (including the colon).
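As a rough illustration of the symbol check described above, here is a minimal Python sketch. The regex is copied from the text; the helper name normalize_symbol and the choice to strip the prefix with an exact, case-sensitive match are assumptions made for illustration, not the tool's actual implementation.

```python
import re

# Regex copied verbatim from the description above.
SYMBOL_RE = re.compile(r"(^[A-Z][A-Z0-9-]*$)|(^C[0-9]+orf[0-9]+$)")

# Optional prefix; exact (case-sensitive) matching is an assumption.
PREFIX = "hgnc.symbol:"


def normalize_symbol(value):
    """Hypothetical helper: return the bare symbol, or None if it does not match."""
    if value.startswith(PREFIX):
        value = value[len(PREFIX):]
    return value if SYMBOL_RE.match(value) else None


if __name__ == "__main__":
    for candidate in ["hgnc.symbol:HRAS", "KRAS", "C1orf112", "RAS family"]:
        print(candidate, "->", normalize_symbol(candidate))
```

Values such as "RAS family" or "uniprot:P32391" do not match the regex, so under these assumptions they would not be treated as gene/protein symbols.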

Type node attribute

To denote that a node contains one or more proteins/genes, a node attribute of type string named type (case is ignored) must be added to the node with one of the following values (case is ignored):

  • gene or protein or geneproduct - Denotes that the node name contains a valid gene/protein symbol matching the regex noted above

    Example CX2 attributeDeclarations from ErbB1 downstream signaling (v2.0). The type node attribute has a default value of protein set in attributeDeclarations:

     "attributeDeclarations": [
       {
         "nodes": {
           "represents": {
             "a": "r",
             "d": "string"
           },
           "name": {
             "a": "n",
             "d": "string"
           },
           "member": {
             "d": "list_of_string"
           },
           "alias": {
             "d": "list_of_string"
           },
           "type": {
             "v": "protein",
             "d": "string"
           }
         }, ...

    Example CX2 node from ErbB1 downstream signaling (v2.0). Because attributeDeclarations in this network sets the default value of type to protein, the node does not need its own type value unless it overrides the default:

       {
         "id": 2,
         "x": 4340,
         "y": 4200,
         "v": {
           "n": "ACTR3",
           "r": "uniprot:P32391",
           "alias": [
             "uniprot:P61158",
             "uniprot:Q53QM2"
           ]
         }
       },
  • complex or proteinfamily or compartment - Denotes that the node represents a group of proteins/genes. The members are stored in another node attribute of type list_of_string named member, which contains one or more genes/proteins that match the regex noted above

       {
         "id": 55,
         "x": 3990,
         "y": 3430,
         "v": {
           "n": "RAS family",
           "r": "RAS family",
           "member": [
             "hgnc.symbol:HRAS",
             "KRAS",
             "hgnc.symbol:NRAS"
           ],
           "type": "proteinfamily"
         }
       },
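The rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the tool's parsing code: the helper names (_symbol, _lookup, genes_for_node) are hypothetical, the default value is read from the "v" field of the attribute declaration as in the example, and case-insensitive attribute name matching is applied to name and member as a simplification even though the text only states it for type.

```python
import re

SYMBOL_RE = re.compile(r"(^[A-Z][A-Z0-9-]*$)|(^C[0-9]+orf[0-9]+$)")
PREFIX = "hgnc.symbol:"
SINGLE_TYPES = {"gene", "protein", "geneproduct"}
GROUP_TYPES = {"complex", "proteinfamily", "compartment"}


def _symbol(value):
    """Strip the optional prefix and return the symbol if it matches the regex."""
    if value.startswith(PREFIX):
        value = value[len(PREFIX):]
    return value if SYMBOL_RE.match(value) else None


def _lookup(attrs, decls, wanted):
    """Find an attribute by lowercase name, honoring its alias ("a") and default ("v")."""
    for name, decl in decls.items():
        if name.lower() == wanted:
            key = decl.get("a", name)
            return attrs.get(key, decl.get("v"))
    return None


def genes_for_node(node, node_decls):
    """Hypothetical helper: list the gene/protein symbols for one CX2 node."""
    attrs = node.get("v", {})
    node_type = _lookup(attrs, node_decls, "type")
    if not isinstance(node_type, str):
        return []
    node_type = node_type.lower()
    if node_type in SINGLE_TYPES:
        # Single gene/protein: the symbol is taken from the node name.
        candidates = [_lookup(attrs, node_decls, "name")]
    elif node_type in GROUP_TYPES:
        # Group node: the symbols are taken from the member list.
        candidates = _lookup(attrs, node_decls, "member") or []
    else:
        return []
    symbols = (_symbol(c) for c in candidates if isinstance(c, str))
    return [s for s in symbols if s]
```

With the declarations shown earlier, the ACTR3 node falls back to the default type of protein and yields its name, while the "RAS family" node yields the members with the hgnc.symbol: prefix removed.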

Configuration Mapping

In this approach, a mapping of protein/gene name to node ids for each network is provided under xx in the JSON file passed to --dbresults when the tool is run in --createdb mode to create the IQuery/Enrichment database.

Format:

"networkToGeneToNodeMap": {
    "UUID of network on NDEx": {
       "GENE/PROTEIN NAME NO hgnc.symbol: prefix": [ LIST OF NODE IDS ]
    }
}

Example:

"networkToGeneToNodeMap": {
    "0f066d06-5d8e-11ea-bfdc-0ac135e8bacf": {
      "P2RY1": [
        8577
      ],
      "GPR84": [
        8628, 8690
      ]
    }
}
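For networks with many nodes, a fragment like the one above could be generated rather than written by hand. The following Python sketch builds the structure from a node id to gene names mapping. The function name build_gene_to_node_map and its input shape are hypothetical, and where the fragment belongs inside the --dbresults file is not covered here.

```python
import json
from collections import defaultdict

PREFIX = "hgnc.symbol:"


def build_gene_to_node_map(network_uuid, node_genes):
    """Hypothetical helper.

    node_genes maps a node id to an iterable of gene/protein names.
    Returns a dict shaped like the networkToGeneToNodeMap fragment.
    """
    gene_to_nodes = defaultdict(list)
    for node_id in sorted(node_genes):
        for gene in node_genes[node_id]:
            # Keys must not carry the hgnc.symbol: prefix.
            if gene.startswith(PREFIX):
                gene = gene[len(PREFIX):]
            if node_id not in gene_to_nodes[gene]:
                gene_to_nodes[gene].append(node_id)
    return {"networkToGeneToNodeMap": {network_uuid: dict(gene_to_nodes)}}


if __name__ == "__main__":
    fragment = build_gene_to_node_map(
        "0f066d06-5d8e-11ea-bfdc-0ac135e8bacf",
        {8577: ["P2RY1"], 8628: ["GPR84"], 8690: ["hgnc.symbol:GPR84"]},
    )
    print(json.dumps(fragment, indent=2))
```

Serializing with json.dumps avoids hand-editing mistakes such as trailing commas, which strict JSON parsers reject.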