Enrichment network structure
This document describes the node attributes that this tool examines when extracting proteins/genes from a network. The actual parsing logic can be found starting in downloadNetworks.
Genes/proteins can be extracted in two ways: via Node Attributes or via Configuration Mapping. Both are explained below.
The Node Attributes approach is the classic way of denoting nodes that are proteins/genes. In this approach the code looks for specific node attributes. Proteins/genes are assumed to have a format matching this regex: (^[A-Z][A-Z0-9-]*$)|(^C[0-9]+orf[0-9]+$) and can optionally have the prefix hgnc.symbol: (for example, hgnc.symbol:HRAS).
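The symbol check described above can be sketched in Python. This is only an illustration under assumptions: the function name `bare_symbol` is hypothetical, the prefix is treated as case-sensitive, and no whitespace trimming is done; the tool's own parsing logic may differ on these points.

```python
import re

# Regex quoted verbatim from the text above.
GENE_SYMBOL = re.compile(r"(^[A-Z][A-Z0-9-]*$)|(^C[0-9]+orf[0-9]+$)")

# Optional prefix mentioned in the text.
PREFIX = "hgnc.symbol:"


def bare_symbol(value):
    """Return the symbol without its prefix if it matches the regex, else None.

    Hypothetical helper: prefix handling here is an assumption.
    """
    if value.startswith(PREFIX):
        value = value[len(PREFIX):]
    return value if GENE_SYMBOL.match(value) else None


if __name__ == "__main__":
    for candidate in ["ACTR3", "hgnc.symbol:HRAS", "C1orf112", "RAS family"]:
        print(candidate, "->", bare_symbol(candidate))
```

Names such as "RAS family" fail the check because they contain a space and lowercase letters, which is why such nodes list their genes in a separate attribute instead.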
To denote that a node contains one or more proteins/genes, a node attribute of type string named type (case is ignored) must be added to the node with one of the following values (case is ignored):
- gene or protein or geneproduct - Denotes that the node name contains a valid gene/protein symbol matching the regex noted above.

  Example CX2 attributeDeclarations from ErbB1 downstream signaling (v2.0). The type node attribute has a default value of protein set in attributeDeclarations:

  "attributeDeclarations": [
    {
      "nodes": {
        "represents": { "a": "r", "d": "string" },
        "name": { "a": "n", "d": "string" },
        "member": { "d": "list_of_string" },
        "alias": { "d": "list_of_string" },
        "type": { "v": "protein", "d": "string" }
      },
      ...

  Example CX2 node from ErbB1 downstream signaling (v2.0). Because attributeDeclarations in this network sets the default value of type to protein, the node only needs its own type value when overriding that default:

  {
    "id": 2,
    "x": 4340,
    "y": 4200,
    "v": {
      "n": "ACTR3",
      "r": "uniprot:P32391",
      "alias": [ "uniprot:P61158", "uniprot:Q53QM2" ]
    }
  },

- complex or proteinfamily or compartment - Denotes that the node is a complex containing more than one protein/gene. The members are stored in another node attribute of type list_of_string named member, which contains one or more genes/proteins that match the regex noted above.

  {
    "id": 55,
    "x": 3990,
    "y": 3430,
    "v": {
      "n": "RAS family",
      "r": "RAS family",
      "member": [ "hgnc.symbol:HRAS", "KRAS", "hgnc.symbol:NRAS" ],
      "type": "proteinfamily"
    }
  },
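To make the rules above concrete, here is a rough Python sketch of how genes might be pulled from a single CX2 node. It is not the tool's actual code: the function names are hypothetical, and it assumes that in attributeDeclarations "a" is an alias for the attribute name and "v" is its default value, as the examples above suggest.

```python
import re

GENE_SYMBOL = re.compile(r"(^[A-Z][A-Z0-9-]*$)|(^C[0-9]+orf[0-9]+$)")
PREFIX = "hgnc.symbol:"
SINGLE_TYPES = {"gene", "protein", "geneproduct"}
GROUP_TYPES = {"complex", "proteinfamily", "compartment"}


def bare_symbol(value):
    """Strip the optional prefix and return the symbol if it matches the regex."""
    if value.startswith(PREFIX):
        value = value[len(PREFIX):]
    return value if GENE_SYMBOL.match(value) else None


def genes_for_node(node, node_declarations):
    """Return the gene/protein symbols a CX2 node denotes (hypothetical helper)."""
    # Start from declared defaults, e.g. "type": {"v": "protein"}.
    attrs = {name: decl["v"] for name, decl in node_declarations.items() if "v" in decl}
    # Map aliases such as "n" back to full attribute names such as "name".
    alias_to_name = {decl["a"]: name for name, decl in node_declarations.items() if "a" in decl}
    for key, value in node.get("v", {}).items():
        attrs[alias_to_name.get(key, key)] = value
    # The attribute name "type" and its value are both compared ignoring case.
    lowered = {key.lower(): value for key, value in attrs.items()}
    node_type = str(lowered.get("type", "")).lower()
    if node_type in SINGLE_TYPES:
        candidates = [attrs.get("name", "")]
    elif node_type in GROUP_TYPES:
        candidates = attrs.get("member", [])
    else:
        candidates = []
    return [s for s in map(bare_symbol, candidates) if s is not None]


DECLARATIONS = {
    "represents": {"a": "r", "d": "string"},
    "name": {"a": "n", "d": "string"},
    "member": {"d": "list_of_string"},
    "alias": {"d": "list_of_string"},
    "type": {"v": "protein", "d": "string"},
}
ACTR3 = {"id": 2, "v": {"n": "ACTR3", "r": "uniprot:P32391"}}
RAS = {"id": 55, "v": {"n": "RAS family", "r": "RAS family",
                       "member": ["hgnc.symbol:HRAS", "KRAS", "hgnc.symbol:NRAS"],
                       "type": "proteinfamily"}}

if __name__ == "__main__":
    print(genes_for_node(ACTR3, DECLARATIONS))
    print(genes_for_node(RAS, DECLARATIONS))
```

The ACTR3 node has no type of its own, so the sketch falls back to the declared default of protein and reads the node name; the RAS family node overrides the type and its genes come from member.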
In the Configuration Mapping approach, a mapping of protein/gene name to node IDs for each network is provided under xx in the JSON file passed to --dbresults when the tool is run in --createdb mode to create the IQuery/Enrichment database.
Format:
"networkToGeneToNodeMap": {
"UUID of network on NDEx": {
"GENE/PROTEIN NAME NO hgnc.symbol: prefix": [ LIST OF NODE IDS ]
}
}
Example:
"networkToGeneToNodeMap": {
"0f066d06-5d8e-11ea-bfdc-0ac135e8bacf": {
"P2RY1": [
8577
],
"GPR84": [
8628, 8690
]
}
}
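A mapping in this format could be generated with a short script. The sketch below is only illustrative: `build_gene_to_node_map` and the `node_genes` input are hypothetical, and it emits just the networkToGeneToNodeMap fragment, not the rest of the file passed to --dbresults.

```python
import json


def build_gene_to_node_map(node_genes):
    """Invert a node id -> gene names mapping into gene name -> sorted node ids.

    Hypothetical helper; gene names are expected without the hgnc.symbol: prefix.
    """
    gene_to_nodes = {}
    for node_id, genes in node_genes.items():
        for gene in genes:
            gene_to_nodes.setdefault(gene, []).append(node_id)
    return {gene: sorted(ids) for gene, ids in gene_to_nodes.items()}


# Network UUID and node ids taken from the example above.
network_uuid = "0f066d06-5d8e-11ea-bfdc-0ac135e8bacf"
node_genes = {8577: ["P2RY1"], 8628: ["GPR84"], 8690: ["GPR84"]}

fragment = {
    "networkToGeneToNodeMap": {
        network_uuid: build_gene_to_node_map(node_genes),
    }
}

if __name__ == "__main__":
    print(json.dumps(fragment, indent=2))
```

A gene that appears on several nodes, such as GPR84 here, ends up with every matching node id in its list.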