### This file
- Imports two dictionaries:
    - page_likes
    - page_categories
- It orders them and checks whether they have the same keys in the same order.
- Iterates through the pages and pairs each page with every other page. For each pair it counts the common likes (users who liked both pages), appends the count to a list, and records the pair's position in the matrix in two lists of row and column indices.
- It uses the three lists to construct a COO sparse matrix.

In [2]:
import pickle
import numpy as np
from scipy.sparse import coo_matrix
%matplotlib inline
import matplotlib.pyplot as plt

<br>Import '*page_likes*' dictionary. Sort by keys.

In [3]:
f = open("page_likes.pkl", "rb")
temp = pickle.load(f)
f.close()

In [3]:
d = sorted(temp.keys())

In [4]:
# rebuild the dictionary with its keys in sorted order
# (dicts keep insertion order in Python 3.7+)
page_likes = {}
for key in d:
    page_likes[key] = temp[key]
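
Because dictionaries keep insertion order (guaranteed since Python 3.7), the loop above can also be written as a dict comprehension over the sorted keys. A minimal sketch on made-up data; the page ids and user ids below are hypothetical:

```python
# toy stand-in for the unpickled dictionary: page id -> list of user ids (hypothetical data)
temp = {"p3": [7, 9], "p1": [1, 2, 3], "p2": [2, 9]}

# rebuild the dictionary with its keys in sorted order;
# dicts preserve insertion order in Python 3.7+
page_likes = {key: temp[key] for key in sorted(temp)}

print(list(page_likes))
```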

<br>Import '*page_categories*' dictionary. Sort by keys.

In [5]:
g = open("page_categories.pkl", "rb")
temp = pickle.load(g)
g.close()

In [6]:
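e = sorted(temp.keys())
# keep the categories in the same sorted key order as page_likes
page_categories = {key: temp[key] for key in e}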

<br>Compare the keys. This matters later: row *i* of the sparse matrix and the *i*-th category label must refer to the same page.

In [7]:
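# the key lists must have the same length; then count positions where they differ
assert len(d) == len(e), "the dictionaries have different numbers of keys"
diff = sum(1 for key_d, key_e in zip(d, e) if key_d != key_e)
diff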

0

The two dictionaries have the same keys in the same order.
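
The same check can be written without an index loop: list equality tests for the same keys in the same order, and set differences show which pages are missing from either dictionary if the check fails. A sketch with hypothetical key lists standing in for `d` and `e`:

```python
# hypothetical sorted key lists standing in for d (page_likes) and e (page_categories)
d = ["p1", "p2", "p3"]
e = ["p1", "p2", "p3"]

same_order = d == e                      # True only if same keys in the same order
missing_in_e = sorted(set(d) - set(e))   # pages with likes but no category
missing_in_d = sorted(set(e) - set(d))   # pages with a category but no likes

print(same_order, missing_in_e, missing_in_d)
```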

<br> 
### Building the sparse matrix

We iterate through the dictionary, pairing each key with every other key; for 8743 pages there are 8743 x 8743 possible pairs. For each pair we find the set of user_ids common to both pages and check its size. If the size is larger than zero (the set is not empty), we append the size to the list of sparse matrix elements and record its location (row and column index) in the lists *row_indices* and *column_indices*. The row and column indices are the positions of the two page_ids in the sorted list of page_ids.
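
For a single pair of pages, the matrix element is the size of the intersection of their sets of user ids. A toy example with made-up user ids:

```python
# hypothetical likes: user ids who liked page A and page B
likes_a = [1, 2, 3, 4]
likes_b = [3, 4, 5]

common_set = set(likes_a) & set(likes_b)   # users who liked both pages
common_set_size = len(common_set)          # the matrix element for (A, B)

print(sorted(common_set), common_set_size)
```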

Since the matrix is symmetric (pages A and B share as many likes as pages B and A), only the upper triangle, including the diagonal, is computed in the pairing step, which roughly halves the computation time. The lists of row indices, column indices and matrix elements are then extended with the mirrored entries to give the inputs for the complete sparse matrix.
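
The saving is roughly a factor of two: with n pages the full matrix has n² entries, while the upper triangle including the diagonal has n(n+1)/2. A quick check of the pair counts for n = 8743:

```python
n = 8743                          # number of pages in the data set

full_pairs = n * n                # every ordered pair (r, c)
upper_pairs = n * (n + 1) // 2    # pairs with c >= r: upper triangle incl. diagonal

print(full_pairs, upper_pairs)
```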

In [None]:
keys = list(page_likes.keys())                   # page ids in sorted order
# sets make each intersection fast; this assumes a page's list holds each user once
like_sets = [set(page_likes[k]) for k in keys]
row_indices = []
column_indices = []
matrix_elements = []

for r in range(len(keys)):           # r is the row index of the matrix
    x = like_sets[r]
    for c in range(r, len(keys)):    # c >= r: upper triangle, diagonal included
        # number of users who liked both pages
        common_set_size = len(x & like_sets[c])
        if common_set_size > 0:
            row_indices.append(r)
            column_indices.append(c)
            matrix_elements.append(common_set_size)
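
The double loop above visits every pair in the upper triangle. A different technique, not the one used in this notebook, builds a sparse user-by-page incidence matrix B (B[u, p] = 1 if user u liked page p); then entry (p, q) of B.T @ B is the number of users who liked both pages, and the diagonal holds each page's like count. A minimal sketch on hypothetical data, assuming each user likes a page at most once (the `set()` call enforces that):

```python
import numpy as np
from scipy.sparse import coo_matrix

# hypothetical stand-in for page_likes: page id -> list of user ids
page_likes = {"p1": [10, 11, 12], "p2": [11, 12], "p3": [13]}

pages = list(page_likes)           # column p of B corresponds to pages[p]
user_pos = {}                      # user id -> row index of B

rows, cols = [], []
for p, page in enumerate(pages):
    for u in set(page_likes[page]):                  # count each user once per page
        rows.append(user_pos.setdefault(u, len(user_pos)))
        cols.append(p)

# incidence matrix: B[u, p] = 1 if user u liked page p
B = coo_matrix((np.ones(len(rows), dtype=np.int64), (rows, cols)),
               shape=(len(user_pos), len(pages))).tocsr()

# entry (p, q) of B.T @ B = number of users who liked both pages
co_likes = (B.T @ B).tocoo()

print(co_likes.toarray())
```

The product already contains both triangles, so this route needs no separate mirroring step.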

Saving the lists for further use.

In [None]:
np.savetxt("sparse_mx_row_indices.csv", row_indices, delimiter=',')
np.savetxt("sparse_mx_column_indices.csv", column_indices, delimiter=',')
np.savetxt("sparse_mx_matrix_elements.csv", matrix_elements, delimiter=',')

The resulting lists row_indices, column_indices and matrix_elements describe only the upper triangle of the matrix (including the diagonal). To get the inputs for the complete sparse matrix, every off-diagonal element has to be mirrored across the diagonal.

In [None]:
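# copy the lists; plain assignment would only alias them, and the appends
# below would also change the original upper-triangle lists
row_indices_2 = list(row_indices)
column_indices_2 = list(column_indices)
matrix_elements_2 = list(matrix_elements)

# mirror every off-diagonal element (r, c) to (c, r); diagonal entries are kept once
for r, c, v in zip(row_indices, column_indices, matrix_elements):
    if r != c:
        row_indices_2.append(c)
        column_indices_2.append(r)
        matrix_elements_2.append(v)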
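
The mirroring can also be done on a sparse matrix directly: if U holds the upper triangle including the diagonal, the full symmetric matrix is U + U.T minus the diagonal, which would otherwise be counted twice. A sketch with SciPy on a small made-up triangle:

```python
import numpy as np
from scipy.sparse import coo_matrix, diags

# hypothetical upper-triangle entries (row, column, value), diagonal included
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 1, 1, 2])
vals = np.array([3, 2, 2, 1])

U = coo_matrix((vals, (rows, cols)), shape=(3, 3))

# add the transpose, then remove the diagonal that was added twice
full = U + U.T - diags(U.diagonal())

print(full.toarray())
```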

Build the complete sparse matrix in COO format.

In [18]:
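row2 = np.array(row_indices_2)
col2 = np.array(column_indices_2)
val2 = np.array(matrix_elements_2)
n = len(d)  # number of pages (8743 here)
# coo_matrix takes the data and the (row, col) index arrays as one tuple
mx = coo_matrix((val2, (row2, col2)), shape=(n, n))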
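
Two cheap checks on a finished matrix: it should be symmetric, and it can be stored and reloaded with SciPy's `save_npz` / `load_npz` instead of three CSV files. A sketch on a toy 3 x 3 matrix with made-up values; the file name `page_co_likes.npz` is hypothetical:

```python
import os
import tempfile
import numpy as np
from scipy.sparse import coo_matrix, load_npz, save_npz

# toy symmetric matrix standing in for mx (values made up)
rows = np.array([0, 0, 1, 1, 2])
cols = np.array([0, 1, 0, 1, 2])
vals = np.array([3, 2, 2, 2, 1])
mx = coo_matrix((vals, (rows, cols)), shape=(3, 3))

# symmetric <=> mx - mx.T has no nonzero entries
is_symmetric = (mx - mx.T).count_nonzero() == 0

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page_co_likes.npz")   # hypothetical file name
    save_npz(path, mx)                               # stores the sparse arrays in one file
    loaded = load_npz(path)
    round_trip_ok = np.array_equal(loaded.toarray(), mx.toarray())

print(is_symmetric, round_trip_ok)
```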