## NCI-PID Node Identifier Reorganization

This notebook modifies the nodes in a network (or networks) in a set of the NCI-PID pathway networks as they were structured in November 2017

This task was motivated by the needs of the CRAVAT/MuPIT application and their copy of the NCI-PID pathways.

(But it is a general improvement for these networks and probably for other PathwayCommons EBS derived networks)

The changes for each node representing:

* Node name = HUGO gene symbol
* represents attribute = NCBI gene id
* Make the former name be an alias


<hr>
Imports, username, password

In [1]:
import ndex2
import json
import requests
import ndex2.client as nc

my_username = "drh"
my_password = "drh"
my_server = 'public.ndexbio.org'

Network or NetworkSet UUID

In [4]:
# TODO: get UUIDs from a network set 

# network set --> 71cde621-deb7-11e7-adc1-0ac135e8bacf
# 26e3478c-deb7-11e7-adc1-0ac135e8bacf

username = 'scratch'
password = 'scratch'
server = 'http://public.ndexbio.org'
my_network_set = '71cde621-deb7-11e7-adc1-0ac135e8bacf'

ndex2_client = nc.Ndex2(host=server, username=username, password=password, debug=True)
set_response = ndex2_client.get_network_set(my_network_set)
uuids = set_response.get('networks') # for one or more individually specified networks
    

GET route: http://public.ndexbio.org/v2/networkset/71cde621-deb7-11e7-adc1-0ac135e8bacf
status code: 200


mygene.info access function

In [10]:
def query_mygene_x(q, tax_id='9606', entrezonly=True):
    if entrezonly:
        r = requests.get('http://mygene.info/v3/query?q='+q+'&species='+tax_id+'&entrezonly=true')
    else:
        r = requests.get('http://mygene.info/v3/query?q='+q+'&species='+tax_id)
    result = r.json()
    hits = result.get("hits")
    if hits and len(hits) > 0:
        return hits[0]
    return False

def query_batch(query_string, tax_id='9606', scopes="symbol, entrezgene, alias, uniprot", fields="symbol, entrezgene"):
    data = {'species': tax_id,
            'scopes': scopes,
            'fields': fields,
            'q': query_string}
    r = requests.post('http://mygene.info/v3/query', data)
    json = r.json()
    return json

def query_mygene(q):
    hits = query_batch(q)
    for hit in hits:
        symbol = hit.get('symbol')
        id = hit.get('entrezgene')
        if symbol and id:
            return (symbol, id)
    return None

# per node update method
def update_node (node, nicecx):
    print("\nnode %s" % node.get_name())
    aliases = nicecx.get_node_attribute(node, "alias")
    print("aliases: %s" % aliases)
    # if aliases:
        # aliases.push(name)
    # else:
        # aliases = [name]
    
    hit = query_mygene(node.get_name())
    if hit:
        print("hit: %s" % json.dumps(hit, indent=4))
    else:
        succeed = False
        for alias in aliases:
            # assume uniprot
            id = alias.split(':')[-1]
            symbol, gene_id = query_mygene(id)
            if symbol:
                print("hit: %s" % json.dumps(hit, indent=4))
                succeed = True
                node.set_node_represents(symbol)
                break
        if not succeed:
            print("no gene hit for node %s " % node.get_name())

In [11]:
# TBD: create output network set
# HUGO example: hgnc.symbol:tp53
# Entrez NCBI example: ncbigene:7157
# iteration over networks
for network_uuid in uuids:
    # load network in NiceCX
    ncx = ndex2.create_nice_cx_from_server(server=my_server, uuid=network_uuid)
    for id, node in ncx.get_nodes():
        update_node(node, ncx)
    # output network (TBD: in output set)
    #print("writing %s " % ncx.get_name())
    #ncx.upload_to(my_server, my_username, my_password)
print(ncx.to_cx())
    


node GTP
aliases: [u'cas:86-01-1']
hit: [
    "MTG1", 
    92170
]

node RHOA
aliases: [u'uniprot knowledgebase:P06749', u'uniprot knowledgebase:P61586', u'uniprot knowledgebase:Q53HM4', u'uniprot knowledgebase:Q5U024', u'uniprot knowledgebase:Q9UDJ0', u'uniprot knowledgebase:Q9UEJ4']
hit: [
    "RHOA", 
    387
]

node VCAM1
aliases: [u'uniprot knowledgebase:A8K6R7', u'uniprot knowledgebase:B4DKS4', u'uniprot knowledgebase:E9PDD1', u'uniprot knowledgebase:P19320', u'uniprot knowledgebase:Q6NUP8']
hit: [
    "VCAM1", 
    7412
]

node FAK1
aliases: [u'uniprot knowledgebase:B4E2N6', u'uniprot knowledgebase:F5H4S4', u'uniprot knowledgebase:J3QT16', u'uniprot knowledgebase:Q05397', u'uniprot knowledgebase:Q14291', u'uniprot knowledgebase:Q8IYN9', u'uniprot knowledgebase:Q9UD85']
hit: [
    "PTK2", 
    5747
]

node ITB1
aliases: [u'uniprot knowledgebase:A8K6N2', u'uniprot knowledgebase:D3DRX9', u'uniprot knowledgebase:D3DRY3', u'uniprot knowledgebase:D3DRY4', u'uniprot knowledgebase:D3DR

TypeError: 'NoneType' object is not iterable