## Normalization of Node Identifiers 

This notebook demonstrates the normalization of the node identifiers in a set of networks. A set of example networks is cloned to your account and then each network is updated in turn. The normalization of the identifiers is performed using the mygene.info resource. The example networks are copies of selected NCI-PID networks, made in 2017.

The tutorial demonstrates:
* Using Network Sets
* Cloning networks
* Using the mygene.info resource
* Updating networks
* Using the @context aspect of a network

#### Updates to Each Node:

Identifiers are formatted using standard prefixes and the namespaces used are defined in the @context aspect of the network. 
* Node "name" = HGNC gene symbol (without prefix), e.g. **TP53**
* Node "represents" = NCBI gene id, e.g. **ncbigene:7157**
* The *former* node name is added to the values of the node attribute "alias" (i.e. the aliases)
* The HGNC gene symbol *with* prefix is added to the aliases
* UniProt identifiers in the aliases are updated to use the standard "uniprot" prefix
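
The rules above can be sketched as a small pure function that needs no network access. This is only an illustration, not the notebook's implementation: the function name `normalize_node`, the `hgnc.symbol` prefix, and the URIs in the example `@context` are assumptions. The notebook itself states only that the symbol carries a prefix and that UniProt aliases use `uniprot`.

```python
# Hypothetical @context: maps each prefix to a base URI (values are illustrative).
EXAMPLE_CONTEXT = {
    "ncbigene": "http://identifiers.org/ncbigene/",
    "hgnc.symbol": "http://identifiers.org/hgnc.symbol/",
    "uniprot": "http://identifiers.org/uniprot/",
}

def normalize_node(name, aliases, symbol, entrez_id):
    """Return (new_name, represents, new_aliases) following the rules above."""
    new_aliases = []
    for alias in aliases:
        prefix, sep, local_id = alias.partition(":")
        # Rewrite any UniProt-like prefix to the standard "uniprot" prefix.
        if sep and prefix.strip().lower().startswith("uniprot"):
            alias = "uniprot:" + local_id.strip()
        if alias not in new_aliases:
            new_aliases.append(alias)
    # Keep the former node name and the prefixed symbol as aliases.
    for extra in (name, "hgnc.symbol:" + symbol):
        if extra not in new_aliases:
            new_aliases.append(extra)
    return symbol, "ncbigene:" + str(entrez_id), new_aliases

new_name, represents, new_aliases = normalize_node(
    "p53", ["UniProt Knowledgebase:P04637"], "TP53", 7157)
print(new_name, represents, new_aliases)
```

Every prefix that appears in the returned identifiers (`ncbigene`, `hgnc.symbol`, `uniprot`) would need a matching entry in the network's @context so that the identifiers can be expanded to URIs.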


### Import Packages

In [None]:
import ndex2
import json
import requests
from os.path import isfile, expanduser


### NDEx Credentials

The username and password for your NDEx account are read from ndex_tutorial_config.json in your home directory. This file should have the following structure:

    {
      "username" : "<my_username>",
      "password" : "<my_password>"
    }


In [None]:
config_file = expanduser("~/ndex_tutorial_config.json")
my_username = None
my_password = None
my_server = 'public.ndexbio.org'

if isfile(config_file):
    # the context manager closes the file even if parsing fails
    with open(config_file, "r") as file:
        data = json.load(file)
    if data.get("password") and data.get("username"):
        my_username = data.get("username")
        my_password = data.get("password")
    else:
        print("Error: " + config_file + " does not define username and password")
else:
    print("Error: " + config_file + " was not found")
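
The same check can be wrapped in a function so that a missing or incomplete file is handled in one place. The following is a sketch only: `load_credentials` is a hypothetical helper, not part of ndex2, and the demonstration writes a throwaway file instead of touching a real account.

```python
import json
import os
import tempfile

def load_credentials(path):
    """Return (username, password) from a JSON config file, or (None, None)."""
    if not os.path.isfile(path):
        print("Error: " + path + " was not found")
        return None, None
    with open(path, "r") as handle:
        data = json.load(handle)
    username, password = data.get("username"), data.get("password")
    if not (username and password):
        print("Error: " + path + " does not define username and password")
        return None, None
    return username, password

# Demonstrate with a temporary file that uses the structure shown above.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "ndex_tutorial_config.json")
    with open(demo_path, "w") as handle:
        json.dump({"username": "<my_username>", "password": "<my_password>"}, handle)
    demo_credentials = load_credentials(demo_path)
    missing_credentials = load_credentials(os.path.join(tmp_dir, "absent.json"))
print(demo_credentials)
```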

### Get the Example Network Set by UUID

In [None]:
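import ndex2.client

set_uuid = None  # UUID of a network set; if given, its networks replace the list below

uuids = ["6e798e11-6186-11e5-8ac5-06603eb7f303"]  # one or more individually specified networks

if set_uuid:
    # get uuids from set; the "networks" key is assumed to list the member network UUIDs
    client = ndex2.client.Ndex2(my_server, my_username, my_password)
    network_set = client.get_network_set(set_uuid)
    uuids = network_set.get("networks", uuids)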

    

### Functions to Access mygene.info and Update Nodes

In [None]:
def query_mygene_x(q, tax_id='9606', entrezonly=True):
    """Query mygene.info for a single term; return the top hit, or False if there is none."""
    params = {'q': q, 'species': tax_id}
    if entrezonly:
        params['entrezonly'] = 'true'
    # passing params lets requests URL-encode the query term
    r = requests.get('http://mygene.info/v3/query', params=params)
    result = r.json()
    hits = result.get("hits")
    if hits and len(hits) > 0:
        return hits[0]
    return False

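def query_batch(query_string, tax_id='9606', scopes="symbol,entrezgene,alias,uniprot", fields="symbol,entrezgene"):
    """POST one or more comma-separated terms to mygene.info and return the parsed JSON."""
    data = {'species': tax_id,
            'scopes': scopes,
            'fields': fields,
            'q': query_string}
    r = requests.post('http://mygene.info/v3/query', data=data)
    # return directly rather than binding a local named "json", which shadows the module
    return r.json()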

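def query_mygene(q):
    """Return (symbol, entrezgene) for the first usable hit, or None."""
    hits = query_batch(q)
    # an error response is a dict rather than a list of hits
    if not isinstance(hits, list):
        return None
    for hit in hits:
        symbol = hit.get('symbol')
        gene_id = hit.get('entrezgene')
        if symbol and gene_id:
            return (symbol, gene_id)
    return None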

# per node update method
def update_node(node, nicecx):
    print("\nnode %s" % node.get_name())
    aliases = nicecx.get_node_attribute(node, "alias")
    #print("aliases: %s" % aliases)
    # if aliases:
        # aliases.append(name)
    # else:
        # aliases = [name]

    # try the node name first, then fall back to each alias in turn
    hit = query_mygene(node.get_name())
    if hit:
        print("hit: %s" % json.dumps(hit, indent=4))
    else:
        succeed = False
        # aliases may be None when the node has no "alias" attribute
        for alias in aliases or []:
            # assume uniprot: keep only the accession after the prefix
            alias_id = alias.split(':')[-1]
            hit = query_mygene(alias_id)
            if hit:
                print("hit: %s" % json.dumps(hit, indent=4))
                succeed = True
                break
        if not succeed:
            print("no gene hit for node %s " % node.get_name())
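
Because the mygene.info batch endpoint accepts several comma-separated terms in one POST, the names of all nodes in a network could be resolved with a single request instead of one request per node. The following is a minimal sketch of how such a response might be indexed, assuming each element of the response carries a `query` field echoing the term and that unmatched terms are flagged with `notfound`. The helper name `index_hits` and the sample response are made up for illustration; the sample is hand-written, not real server output.

```python
def index_hits(batch_response):
    """Map each query term to (symbol, entrezgene) from a batch response list."""
    lookup = {}
    for hit in batch_response:
        term = hit.get("query")
        symbol = hit.get("symbol")
        gene_id = hit.get("entrezgene")
        # Skip unmatched terms and hits that lack a symbol or gene id.
        if hit.get("notfound") or not (symbol and gene_id):
            continue
        # Keep only the first usable hit for each term.
        lookup.setdefault(term, (symbol, str(gene_id)))
    return lookup

# Hand-written sample in the assumed response shape.
sample_response = [
    {"query": "TP53", "symbol": "TP53", "entrezgene": "7157"},
    {"query": "P04637", "symbol": "TP53", "entrezgene": "7157"},
    {"query": "not_a_gene", "notfound": True},
]
lookup = index_hits(sample_response)
print(lookup)
```

In the notebook, the input would come from a call such as `query_batch(",".join(names))`, where `names` is the list of node names and alias accessions to resolve.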

In [None]:
# TBD: create output network set

# iteration over networks
for network_uuid in uuids:
    # load network in NiceCX
    ncx = ndex2.create_nice_cx_from_server(server=my_server, uuid=network_uuid)
    # node_id avoids shadowing the built-in id()
    for node_id, node in ncx.get_nodes():
        update_node(node, ncx)
    # output network (TBD: in output set)
    #print("writing %s " % ncx.get_name())
    #ncx.upload_to(my_server, my_username, my_password)
    