# Load an edge table of gene similarities into NDEx

This tutorial shows how to convert a tab-separated edge table of gene similarity weights with the following format:

```text
SOURCE TARGET WEIGHT
GENE1  GENE2  0.123
GENE1  GENE3  0.144
.
.
```

to an adjacency matrix as well as how to upload this data to NDEx. 
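
To try the workflow without real data, a tiny edge table in this format can be written with pandas. This is a sketch: the output path `example_edges.tsv` and the gene names are hypothetical placeholders, and the file is tab-separated because the loading step reads it with `sep='\t'`.

```python
import pandas as pd

# Hypothetical example data: gene pairs and their similarity weights
edges = pd.DataFrame({
    'SOURCE': ['GENE1', 'GENE1', 'GENE2'],
    'TARGET': ['GENE2', 'GENE3', 'GENE3'],
    'WEIGHT': [0.123, 0.144, 0.310],
})

# Write a tab-separated file whose first line is the SOURCE TARGET WEIGHT header
edges.to_csv('example_edges.tsv', sep='\t', index=False)
```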

**WARNING:** Large tables require a lot of RAM. For example, running this workflow on an edge table with 180 million rows covering about 19,000 genes will consume **20-25** gigabytes of RAM and take 10-20 minutes.
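
The dense adjacency matrix alone needs roughly n² × 8 bytes for n genes when stored as 64-bit floats. The helper below (its name is hypothetical) is a back-of-the-envelope sketch: it ignores pandas overhead and the memory held by the edge table itself, so actual peak usage will be higher.

```python
def dense_matrix_bytes(n_genes, bytes_per_value=8):
    """Approximate size of an n_genes x n_genes matrix (float64 by default)."""
    return n_genes * n_genes * bytes_per_value

# About 19,000 genes, as in the warning above
size = dense_matrix_bytes(19000)
print('%.2f GB' % (size / 1e9))  # 2.89 GB
```

Storing the weights as 32-bit floats (`bytes_per_value=4`) would halve this figure, at the cost of precision.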

This tutorial requires the following packages:

* ddot
* ndex2
* simplejson
* pandas
* numpy


# Import needed modules

```python
import os
import sys
import getpass

import ddot
import ndex2
import pandas as pd
import numpy as np
```

# Load edge table via Pandas

Enter the path to the edge table file. The file is assumed to be tab-separated, with the header line `SOURCE TARGET WEIGHT`.

Example: `/tmp/foo.tsv`

```python
# Python 2 uses raw_input() and Python 3 uses input(), so check the major version
sys.stdout.write('Enter path to edge table file:\n')
if sys.version_info[0] >= 3:
    edgetable = os.path.abspath(input())
else:
    edgetable = os.path.abspath(raw_input())
```


```python
sys.stdout.write('Edge table file set to: ' + edgetable + '\n')
```
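
Because the next steps assume a tab-separated file whose first line is `SOURCE TARGET WEIGHT`, it can save time to check the header before loading a large file. A minimal sketch with a hypothetical helper name that reads only the first line:

```python
def has_expected_header(path, expected=('SOURCE', 'TARGET', 'WEIGHT')):
    """Return True if the first line of a tab-separated file matches expected."""
    with open(path) as fh:
        header = fh.readline().rstrip('\r\n').split('\t')
    return tuple(header) == tuple(expected)
```

For example, `has_expected_header(edgetable)` should return `True` before moving on.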

```python
# Load the table into pandas; for a table of 180 million rows this takes a minute or two
# and requires ~6 GB of RAM
df = pd.read_csv(edgetable, sep='\t', dtype={'WEIGHT': np.float64})

df.head()
```
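
If memory is tight, one option (not used in this workflow) is to load SOURCE and TARGET as pandas categoricals and WEIGHT as 32-bit floats, since each distinct gene name is then stored once rather than once per row. This sketch uses a small in-memory table in place of the real file; whether the later steps behave identically with categorical columns should be verified on your own data.

```python
import io

import numpy as np
import pandas as pd

# Small stand-in for the edge table file: 100 genes, 10,000 unique pairs
genes = ['GENE%d' % i for i in range(100)]
rows = ['SOURCE\tTARGET\tWEIGHT']
for i in range(10000):
    rows.append('%s\t%s\t%.3f' % (genes[i % 100], genes[(i // 100) % 100], 0.5))
text = '\n'.join(rows) + '\n'

default_df = pd.read_csv(io.StringIO(text), sep='\t', dtype={'WEIGHT': np.float64})
compact_df = pd.read_csv(io.StringIO(text), sep='\t',
                         dtype={'SOURCE': 'category', 'TARGET': 'category',
                                'WEIGHT': np.float32})

# deep=True includes the memory used by the gene-name strings
print(default_df.memory_usage(deep=True).sum())
print(compact_df.memory_usage(deep=True).sum())
```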

```python
# Use pandas' pivot to convert the data into an adjacency matrix
# (SOURCE genes as rows, TARGET genes as columns); this can easily double memory usage
pivotdf = df.pivot(index='SOURCE', columns='TARGET', values='WEIGHT')

# Delete the original data frame since it is no longer needed
del df

pivotdf.head()
```
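
To see what `pivot` does, here is a toy table with three edges. Each unique SOURCE becomes a row, each unique TARGET becomes a column, and any pair missing from the table becomes `NaN`, so the row and column gene sets need not be identical. `pivot` also raises a `ValueError` if the same (SOURCE, TARGET) pair appears more than once.

```python
import pandas as pd

toy = pd.DataFrame({
    'SOURCE': ['GENE1', 'GENE1', 'GENE2'],
    'TARGET': ['GENE2', 'GENE3', 'GENE3'],
    'WEIGHT': [0.123, 0.144, 0.310],
})

matrix = toy.pivot(index='SOURCE', columns='TARGET', values='WEIGHT')
print(matrix)
# Rows are GENE1 and GENE2, columns are GENE2 and GENE3;
# the GENE2 -> GENE2 cell is NaN because that pair is not in the table
```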

# Extract gene names

To upload the matrix to NDEx, the gene names for the rows and the columns must be extracted into separate lists.

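```python
# Row labels come from the index (SOURCE genes), column labels from the columns (TARGET genes).
# Note: DataFrame.keys() returns the column labels, not the row labels.
gene_rows = list(pivotdf.index.values)
gene_cols = list(pivotdf.columns.values)
print('# genes in rows: ' + str(len(gene_rows)))
print('# genes in columns: ' + str(len(gene_cols)))
```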
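
A DataFrame's `keys()` method returns its column labels, while `index` holds the row labels. This check on a toy pivot table makes the distinction concrete:

```python
import pandas as pd

toy = pd.DataFrame({
    'SOURCE': ['GENE1', 'GENE1', 'GENE2'],
    'TARGET': ['GENE2', 'GENE3', 'GENE3'],
    'WEIGHT': [0.123, 0.144, 0.310],
})
matrix = toy.pivot(index='SOURCE', columns='TARGET', values='WEIGHT')

row_labels = list(matrix.index)     # SOURCE genes, one per row
col_labels = list(matrix.columns)   # TARGET genes, one per column
print(row_labels)            # ['GENE1', 'GENE2']
print(col_labels)            # ['GENE2', 'GENE3']
print(list(matrix.keys()))   # same as col_labels
```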

# Create NDEx NiceCXNetwork object

NDEx stores networks in the CX format. The next cell converts the matrix data into this format.

```python
# Convert the matrix and its gene labels into a NiceCXNetwork object
network = ddot.utils.create_edgeMatrix(pivotdf.values, gene_cols, gene_rows, verbose=True, ndex2=True)

# Set the name of the network
network.set_name('test similarity network')
network
```
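
Because the conversion takes the matrix values and the label lists as separate arguments, a mismatch between them would go unnoticed. The hypothetical helper below, a sketch rather than part of ddot, checks that the shapes agree and reports how many cells are empty (`NaN`) because the pair was missing from the edge table.

```python
import numpy as np

def check_matrix(values, row_labels, col_labels):
    """Raise if the matrix shape disagrees with the labels; return the NaN count."""
    expected = (len(row_labels), len(col_labels))
    if values.shape != expected:
        raise ValueError('matrix shape %s does not match labels %s'
                         % (values.shape, expected))
    return int(np.isnan(values).sum())
```

For example: `check_matrix(pivotdf.values, list(pivotdf.index), list(pivotdf.columns))`.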

# Get NDEx credentials

```python
sys.stdout.write('Enter NDEx username:\n')
if sys.version_info[0] >= 3:
    user = input()
else:
    user = raw_input()

sys.stdout.write('Enter NDEx password:\n')
password = getpass.getpass()
```
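
For non-interactive runs (for example, a scheduled script), prompting is inconvenient. One Python 3 sketch is to read the credentials from environment variables and prompt only when they are unset. The variable names `NDEX_USER` and `NDEX_PASSWORD` are hypothetical choices, not something ndex2 reads on its own.

```python
import getpass
import os

def get_ndex_credentials(user_var='NDEX_USER', password_var='NDEX_PASSWORD'):
    """Return (user, password) from the environment, prompting for any missing value."""
    user = os.environ.get(user_var) or input('Enter NDEx username: ')
    password = os.environ.get(password_var) or getpass.getpass('Enter NDEx password: ')
    return user, password
```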


```python
# NDEx test server; for production use, set this to 'public.ndexbio.org'
server_url = 'test.ndexbio.org'

res = network.upload_to(server_url, user, password)

sys.stdout.write('If the upload succeeded, the value below is the low-level URL of the new network\n')
sys.stdout.write('The network is private by default, so to view it\n')
sys.stdout.write('visit http://' + server_url + ' and log in with the account entered earlier\n')

res
```
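
The returned URL is expected to end with the network's UUID, which identifies the network in later NDEx API calls. Assuming that format, this sketch (hypothetical helper name, made-up example URL) extracts the UUID and validates it with the standard `uuid` module.

```python
import uuid

def network_id_from_url(url):
    """Return the trailing UUID of a network URL; raise ValueError otherwise."""
    candidate = url.rstrip('/').rsplit('/', 1)[-1]
    return str(uuid.UUID(candidate))

# Made-up example URL in the assumed format
example = 'http://test.ndexbio.org/v2/network/123e4567-e89b-12d3-a456-426614174000'
print(network_id_from_url(example))  # 123e4567-e89b-12d3-a456-426614174000
```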

```python
print('Tutorial complete. Have a nice day.')
```