# Objective: Compute the degree of each community.

If A is a symmetric matrix, where A[i,j] represents the number of edges between community i and community j, and we ignore edges within the same community, then 

Degree of Community i = (The sum of the ith row of A) - A[i,i]
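As a quick illustration of this definition, here is a minimal sketch on a small dense matrix. The 3x3 matrix `A_toy` is made up for the example and is not part of the data set used below.

```python
import numpy as np

# Toy symmetric matrix for 3 hypothetical communities.
# A_toy[i, j] = number of edges between community i and community j.
A_toy = np.array([[5, 2, 1],
                  [2, 0, 4],
                  [1, 4, 7]])

# Degree of community i = (sum of row i) - A[i, i]
toy_degrees = A_toy.sum(axis=1) - A_toy.diagonal()
print(toy_degrees.tolist())  # [3, 6, 5]
```

The diagonal entries (5, 0, 7) count edges inside a community, so they are subtracted from the row sums (8, 6, 12).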

In [1]:
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix

def load_sparse_csr(filename):
    loader = np.load(filename)
    return csr_matrix((  loader['data'], loader['indices'], loader['indptr']),
                         shape = loader['shape'])

A=load_sparse_csr("sparse_matrix.npz")
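The loader above expects an `.npz` archive with the keys `data`, `indices`, `indptr`, and `shape`. How `sparse_matrix.npz` was actually written is not shown here; the sketch below assumes it came from a matching `np.savez` call. `save_sparse_csr`, the toy matrix, and the temporary file name are hypothetical and only illustrate the round trip.

```python
import os
import tempfile
import numpy as np
from scipy.sparse import csr_matrix

def save_sparse_csr(filename, matrix):
    # Hypothetical counterpart to load_sparse_csr: store the three CSR
    # arrays and the shape under the keys that the loader reads.
    np.savez(filename, data=matrix.data, indices=matrix.indices,
             indptr=matrix.indptr, shape=matrix.shape)

def load_sparse_csr(filename):
    # Same loader as in the notebook, repeated so the sketch is self-contained.
    loader = np.load(filename)
    return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                      shape=loader['shape'])

# Round trip on a small upper triangular toy matrix.
toy = csr_matrix(np.array([[0, 3, 1],
                           [0, 2, 5],
                           [0, 0, 0]]))
path = os.path.join(tempfile.mkdtemp(), "toy_matrix.npz")
save_sparse_csr(path, toy)
restored = load_sparse_csr(path)
print(np.array_equal(restored.toarray(), toy.toarray()))
```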

In [2]:
A

<664414x664414 sparse matrix of type '<type 'numpy.int32'>'
	with 270236207 stored elements in Compressed Sparse Row format>

In [3]:
A1 = A          # A1 is A in CSR form : Compressed Sparse Row matrix, efficient row slicing
A2 = A.tocsc()  # A2 is A in CSC form : Compressed Sparse Column matrix, efficient column slicing

In [4]:
A1

<664414x664414 sparse matrix of type '<type 'numpy.int32'>'
	with 270236207 stored elements in Compressed Sparse Row format>

In [5]:
A2

<664414x664414 sparse matrix of type '<type 'numpy.int32'>'
	with 270236207 stored elements in Compressed Sparse Column format>

##  Our A=load_sparse_csr("sparse_matrix.npz") is an upper triangular matrix. Instead of constructing a symmetric matrix from A, we will use the following formula to compute the sum of each row.

# For a symmetric matrix A, the sum of each row (excluding the diagonal element) is

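Sum of elements in the ith row - A[i,i] = $$\sum_{j\neq i} A_{ij}= \sum_{j>i} A_{ij}+ \sum_{j<i} A_{ij} \\= \sum_{j>i} A_{ij}+ \sum_{j<i} A_{ji}$$

The last step uses symmetry, $A_{ij} = A_{ji}$: the first sum reads row i to the right of the diagonal and the second reads column i above the diagonal, so only entries stored in the upper triangle are needed.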
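To check the identity on something small, the sketch below builds a made-up 4x4 symmetric matrix `S`, keeps only its upper triangle `U` (diagonal included, which is an assumption about how the real file is stored), and compares the two ways of computing the off-diagonal row sums.

```python
import numpy as np

# Hypothetical symmetric matrix, used only to verify the identity.
S = np.array([[0, 3, 0, 1],
              [3, 2, 5, 0],
              [0, 5, 0, 2],
              [1, 0, 2, 9]])
U = np.triu(S)  # upper triangular storage, diagonal included

n = U.shape[0]
# Row i to the right of the diagonal + column i above the diagonal.
from_upper = [int(U[i, i+1:].sum() + U[:i, i].sum()) for i in range(n)]
# Direct definition on the full symmetric matrix.
direct = (S.sum(axis=1) - S.diagonal()).tolist()

print(from_upper)  # [4, 8, 7, 3]
print(direct)      # [4, 8, 7, 3]
```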

In [6]:
import time
done = 0
degrees = np.zeros((A.shape[0]))
t0=time.time()

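for row in xrange(A.shape[0]):
    # A1 (CSR) gives the entries right of the diagonal in this row;
    # A2 (CSC) gives the entries above the diagonal in this column.
    rowSum = (A1[row,row+1:].sum(1)+A2[:row,row].sum(0)).tolist()  # the sum is a 1x1 numpy.matrixlib.defmatrix.matrix, so tolist() gives [[value]]
    degrees[row] = rowSum[0][0]  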
    done +=1
    if done in [10**3, 10**4, 10**5, 2*10**5, 3*10**5, 4*10**5, 5*10**5]:
        print "rows done:", done, "{0:.2f}".format(time.time()-t0), "secs"
        
        
print "rows done:", done, "{0:.2f}".format(time.time()-t0), "secs"

rows done: 1000 0.28 secs
rows done: 10000 2.84 secs
rows done: 100000 28.38 secs
rows done: 200000 56.97 secs
rows done: 300000 85.12 secs
rows done: 400000 113.38 secs
rows done: 500000 141.68 secs
rows done: 664414 187.99 secs
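The same quantity can also be written without a Python loop: for an upper triangular matrix U with the diagonal stored, the row sum of row i plus the column sum of column i counts U[i,i] twice, so subtracting 2*U[i,i] leaves exactly the two sums in the formula. The sketch below is an alternative to the loop above, not the method used in this notebook. `degrees_from_upper` is a hypothetical helper, and it has not been timed on the real matrix, although it would likely avoid the per-row slicing overhead.

```python
import numpy as np
from scipy.sparse import csr_matrix

def degrees_from_upper(U):
    """Off-diagonal row sums of the symmetric matrix whose upper
    triangle (diagonal included) is stored in the sparse matrix U."""
    row_sums = np.asarray(U.sum(axis=1)).ravel()  # U[i,i] + entries right of the diagonal
    col_sums = np.asarray(U.sum(axis=0)).ravel()  # U[i,i] + entries above the diagonal
    diag = U.diagonal()
    return row_sums + col_sums - 2 * diag

# Small made-up example: upper triangle of a 4x4 symmetric matrix.
U_toy = csr_matrix(np.array([[0, 3, 0, 1],
                             [0, 2, 5, 0],
                             [0, 0, 0, 2],
                             [0, 0, 0, 9]]))
toy_result = degrees_from_upper(U_toy)
print(toy_result.tolist())  # [4, 8, 7, 3]
```

On the real data this would be called as `degrees_from_upper(A)`, assuming `A` is stored as described above.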


In [7]:
degrees
sort_comm = list(degrees.argsort())
sort_comm.reverse()

print "Top 30 communities with the highest degree" 
print sort_comm[0:30]   

print "\n Top 5 communities:"
print "community id, degree"
for i in range(5):
    comm_id = sort_comm[i]
    print comm_id, degrees[comm_id]


Top 30 communities with the highest degree
[664006, 661354, 653762, 652794, 650822, 650830, 650282, 659019, 648906, 661646, 656578, 649876, 660448, 643217, 649277, 646467, 646601, 646457, 655297, 656293, 662541, 647361, 654692, 646443, 644803, 645070, 651333, 655187, 646614, 641906]

 Top 5 communities:
community id, degree
664006 108161320.0
661354 12756652.0
653762 12741425.0
652794 11824516.0
650822 10304786.0
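The ranking step can also stay in NumPy: reversing the output of `argsort` with a slice gives the indices in descending order without converting to a list first. This is a minimal sketch on made-up values; `toy_degrees` and `top_k` are illustrative names, not variables from the cells above.

```python
import numpy as np

# Hypothetical degree values for 5 communities.
toy_degrees = np.array([3.0, 12.0, 7.0, 12.5, 0.0])
top_k = 3

# argsort is ascending, so reverse it and keep the first top_k indices.
order = np.argsort(toy_degrees)[::-1][:top_k]

print("community id, degree")
for comm_id in order:
    print("{0} {1}".format(comm_id, toy_degrees[comm_id]))
```

Here `order` holds the community ids 3, 1, 2, which are the three largest toy values in descending order.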
