In this notebook we cluster the FULL version of the gene network and save the clusters for further analysis. The same steps can be applied to other tissues.

In [4]:
import networkx as nx
import pandas as pd
import numpy as np
import community  # python-louvain package (pip install python-louvain)

    The input file must describe an undirected, weighted graph as a list of edges
    in the following format:
        3 tab-separated columns and no header line: the first two are node IDs,
        the third is the edge weight.
        Node IDs should be Entrez gene IDs; weights are floats
        (functional interaction between genes).
        
    Example rows (column labels shown for reference only; they are not part of the file):
        
    Gene1 Gene2 Weight
    9976  9987  0.134438
    998   9986  0.158842
    
    Network used in this notebook: GIANT network, Top Edges version
    (Troyanskaya Lab, Princeton / Flatiron Institute):
    https://hb.flatironinstitute.org/download
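As a minimal sketch of the expected format, the snippet below parses header-less, tab-separated rows like the two examples above with Python's standard `csv` module and fails loudly on malformed lines. The function name `read_edge_list` is hypothetical; the notebook itself reads the file with `pandas.read_csv`.

```python
import csv
import io


def read_edge_list(handle):
    """Parse a tab-separated, header-less edge list into (gene1, gene2, weight) tuples.

    Assumes each line is: <Entrez ID> TAB <Entrez ID> TAB <float weight>.
    """
    edges = []
    for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(fields)}")
        g1, g2, w = fields
        edges.append((int(g1), int(g2), float(w)))
    return edges


# The two example rows from above, as they would appear in the file
sample = "9976\t9987\t0.134438\n998\t9986\t0.158842\n"
edges = read_edge_list(io.StringIO(sample))
print(edges)  # [(9976, 9987, 0.134438), (998, 9986, 0.158842)]
```

A header line such as `Gene1 Gene2 Weight` would make `int()` raise, which is a quick way to catch a file that does not match the assumed layout.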

In [None]:
# Read GIANT Network for brain tissue - Top Edges
df_giant = pd.read_csv("Data/brain_top", sep='\t', names = ["g1", "g2", "w"])

In [None]:
# build networkx graph
G = nx.Graph()
for row in df_giant.itertuples(index=False):
    # columns g1, g2 are the gene IDs, w is the interaction weight
    G.add_edge(row.g1, row.g2, weight=row.w)
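The loop above can also be written with `Graph.add_weighted_edges_from`, which stores each third value under the `weight` attribute that `community.best_partition` reads by default. The toy rows below are hypothetical; they also show that in an undirected `nx.Graph` a reversed duplicate row collapses into the same edge and the later weight overwrites the earlier one (the `add_edge` loop behaves the same way).

```python
import networkx as nx

# Hypothetical rows in the same (g1, g2, w) layout as df_giant
rows = [
    (9976, 9987, 0.134438),
    (998, 9986, 0.158842),
    (9987, 9976, 0.2),  # same undirected edge as the first row, reversed
]

G = nx.Graph()
# Each weight is stored under the "weight" edge attribute
G.add_weighted_edges_from(rows)

print(G.number_of_edges())      # 2: the reversed pair collapses into one edge
print(G[9976][9987]["weight"])  # 0.2: the later row overwrote the earlier weight
```

On the real data the equivalent call would be `G.add_weighted_edges_from(df_giant.itertuples(index=False))`, since each row unpacks into `(g1, g2, w)`.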

In [None]:
# quick look at the graph
print("connected:", nx.is_connected(G))
print("edges:", G.number_of_edges())
print("nodes:", G.number_of_nodes())
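If `nx.is_connected(G)` reports `False`, it can help to check how the nodes split across connected components before clustering; genes in different components share no edge, so Louvain should not put them in the same community. A small sketch on a hypothetical toy graph:

```python
import networkx as nx

# Hypothetical toy graph with two separate components
G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 0.5), (2, 3, 0.4), (10, 11, 0.9)])

print(nx.is_connected(G))  # False

# Component sizes, largest first
sizes = sorted((len(c) for c in nx.connected_components(G)), reverse=True)
print(sizes)  # [3, 2]

# Independent copy of the subgraph induced by the largest component
largest = G.subgraph(max(nx.connected_components(G), key=len)).copy()
print(largest.number_of_nodes())  # 3
```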

In [None]:
# Louvain community detection - https://python-louvain.readthedocs.io/en/latest/
partition = community.best_partition(G)
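Louvain is randomized, so repeated runs of `best_partition` can return different partitions. One way to get reproducible communities plus a quality score is NetworkX's own Louvain implementation, `nx.community.louvain_communities` (available since NetworkX 2.8), swapped in here instead of python-louvain, together with `nx.community.modularity`. The two-triangle graph is a hypothetical toy example; the last step converts the result into the same `{node: community_id}` dict that `best_partition` returns, so the saving loop below would still apply.

```python
import networkx as nx

# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3)
G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
])

# NetworkX's Louvain; a fixed seed makes the node-visiting order reproducible
communities = nx.community.louvain_communities(G, weight="weight", seed=42)

# Convert the list of node sets into a {node: community_id} dict
partition = {node: cid for cid, nodes in enumerate(communities) for node in nodes}

# Modularity of the partition (higher means denser within-community edges)
q = nx.community.modularity(G, communities, weight="weight")
print(sorted(sorted(c) for c in communities))  # [[0, 1, 2], [3, 4, 5]]
print(round(q, 4))  # 0.3571
```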

In [None]:
for com in sorted(set(partition.values())):
    # Entrez IDs of the genes assigned to this community
    list_nodes = [node for node, c in partition.items() if c == com]

    outstr = "full_giant_louvain/giant-" + str(com) + ".txt"

    # uncomment to save the gene list to a file (the folder must already exist)
    # np.savetxt(outstr, list_nodes, "%i")

    # print the size of each partition
    print(str(com) + ": " + str(len(list_nodes)))
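The loop above scans the whole `partition` dict once per community. A sketch that inverts the dict in a single pass and writes one gene list per community, creating the output folder first since `np.savetxt` will not create it, could look like this. The helper names and the toy partition are hypothetical; the file naming mirrors the `giant-<community>.txt` pattern used above.

```python
import os
import tempfile
from collections import defaultdict


def group_by_community(partition):
    """Invert a {node: community_id} dict into {community_id: [nodes]} in one pass."""
    groups = defaultdict(list)
    for node, com in partition.items():
        groups[com].append(node)
    return dict(groups)


def save_communities(groups, out_dir, prefix="giant-"):
    """Write one file per community, one Entrez ID per line; return the file paths."""
    os.makedirs(out_dir, exist_ok=True)  # create the folder if it is missing
    paths = []
    for com, nodes in sorted(groups.items()):
        path = os.path.join(out_dir, f"{prefix}{com}.txt")
        with open(path, "w") as fh:
            fh.writelines(f"{node}\n" for node in sorted(nodes))
        paths.append(path)
    return paths


# Hypothetical partition in the format best_partition returns
partition = {9976: 0, 9987: 0, 998: 1, 9986: 1, 1017: 0}
groups = group_by_community(partition)
for com, nodes in sorted(groups.items()):
    print(f"{com}: {len(nodes)}")  # prints "0: 3" then "1: 2"

with tempfile.TemporaryDirectory() as tmp:
    paths = save_communities(groups, os.path.join(tmp, "full_giant_louvain"))
    print([os.path.basename(p) for p in paths])  # ['giant-0.txt', 'giant-1.txt']
```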

