## Multilevel Clustering Technique (MCT)
The following sections describe the implementation of the *multilevel clustering technique* for detecting *microcosms* on Twitter. The focus is on answering the question: what is the probability of detecting ***microcosms*** given a collection of network data $\mathcal{D}$, which is expressed as

$$p(\text{structural equivalence}, \text{textual equivalence} \mid \mathcal{D}) = p(\mathcal{S}, \mathcal{T} \mid \mathcal{D})$$

The implementation is carried out in two parts: identifying **Structural clusters** and **Textual clusters**.

### I. Structurally-related clusters

Consider a finite collection of network data expressed as a window of tweets $w_z^k$, where $k$ is the index of the window and $z$ is the size of the window, e.g. 300 network objects. The window of tweets is a subset of $\mathcal{D}$, i.e. $w_z^k \subset \mathcal{D}$.
#### Implementation:

    returns the likelihood of reciprocity between pairs

 - ***initialisation:*** $\mathcal{S}_r \gets \{\};\ \mathcal{S}_u \gets \{\}$
 - ***input:*** a finite collection of network data $\mathcal{D} \neq \emptyset$
 - $\forall v_i, v_j \in \mathcal{D}$, compute $p(R_{v_i,v_j}) \hspace{28mm} \triangleright~ v_i\neq v_j$ 
 - if $p(R_{v_i,v_j}) \geq \tau \hspace{58mm} \triangleright ~\tau$, *a predefined threshold*
     - $\mathcal{S}_r \gets (v_i,v_j) \hspace{54mm} \triangleright$ *structurally-related*
 - else:
     - $\mathcal{S}_u\gets (v_i,v_j) \hspace{54mm} \triangleright$ *structurally-unrelated*
 - ***output:***
     - $\mathcal{S}_r,\mathcal{S}_u, \mathcal{M}_{sa}^{n\times n}\hspace{52mm}\triangleright~$ *adjacency matrix* 
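
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the notebook's implementation: `partition_pairs`, `toy` and `toy_prob` are hypothetical names, the pairwise probability is assumed to be supplied as a callable, and the adjacency matrix is made symmetric for simplicity.

```python
from itertools import combinations

def partition_pairs(nodes, prob, tau):
    """Split all unordered node pairs into related / unrelated sets.

    nodes: iterable of hashable node ids
    prob:  callable (vi, vj) -> probability of a reciprocal tie (assumed given)
    tau:   predefined threshold
    """
    nodes = list(nodes)
    pos = {v: i for i, v in enumerate(nodes)}
    s_r, s_u = set(), set()
    adj = [[0] * len(nodes) for _ in nodes]
    for vi, vj in combinations(nodes, 2):  # every pair with vi != vj
        if prob(vi, vj) >= tau:
            s_r.add((vi, vj))              # structurally-related
            adj[pos[vi]][pos[vj]] = adj[pos[vj]][pos[vi]] = 1
        else:
            s_u.add((vi, vj))              # structurally-unrelated
    return s_r, s_u, adj

# toy example: hypothetical probabilities for three nodes
toy = {("a", "b"): 0.6, ("a", "c"): 0.2, ("b", "c"): 0.5}
toy_prob = lambda vi, vj: toy[(vi, vj)]
s_r, s_u, adj = partition_pairs(["a", "b", "c"], toy_prob, tau=0.45)
print(s_r, s_u, adj)
```

With the threshold set to 0.45, only the pair `("a", "c")` falls into the unrelated set.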

**1: Load Relevant Packages**

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.ticker
import seaborn as sns
import pandas as pd
import numpy as np
import itertools
from itertools import zip_longest as zipper
from collections import Counter,defaultdict
import tweepy
from mlxtend.plotting import ecdf
import time, json, re, math
from scipy.stats import logistic
import scipy.sparse
#suppress warnings:
import warnings
warnings.filterwarnings('ignore')

**2: Datasets and Data Prep:** network dataset $\mathcal{D}$ consisting of all relevant features denoted by $\mathcal{A}_f$, and subset features by $\chi_f$

    data prep required only if the data is not in the proper format for the analysis

In [49]:
D = pd.read_csv('data/data_for_computing_structural_equivalence.csv',low_memory=False)
X = D.dropna() # drop rows with missing values; X is the working copy used below
len(X), D.columns

(437521, Index(['UserID', 'ScreenName', 'RawTweets', 'Indegree', 'Outdegree',
        'Category'],
       dtype='object'))

    Relevant Stats: The dataset consists of 438044 nodes. Each node in the dataset is compared with the other nodes to identify those with a high degree of structural similarity.

**Load preprocessed data for the analysis**

**Some Helper Functions**

In [2]:
# a simple function that binarises the ratio of corresponding features: 1 if the ratio lies in [0.75, 1.25], else 0
def get_abs_values(x): # this function applies to the indegree and outdegree ratios only ... 
    if x >= 0.75 and x <= 1.25:
        return 1
    else:
        return 0
# function to compute the Jaccard similarity:
def jaccard_sim(l):
    a = [i for i in l if i>0]
    b = [i for i in l if i<1]
    return (len(a)/(len(a)+len(b)))
# bin communities into a high-level structure:
comm_bands = np.linspace(0.75, 1.25) # create evenly spaced community bands (50 points by default)
# account for outliers/very high values:
comm_bands = np.append(comm_bands, [5,10,50,100,150,300,700,1000])
# return the closest band to each computed reciprocity value:
#dk['CommBand']=dk.RecipComm.apply(lambda x: min(comm_bands, key=lambda v: abs(v-x)))
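
As a worked example of the two helpers above, the sketch below binarises the feature ratios of a single pair and computes the resulting similarity. The names `binarise_ratio` and `fraction_matching` are illustrative stand-ins for `get_abs_values` and `jaccard_sim`, and the ratio values are hypothetical.

```python
def binarise_ratio(x, low=0.75, high=1.25):
    """1 if the ratio lies within [low, high], else 0 (mirrors get_abs_values)."""
    return 1 if low <= x <= high else 0

def fraction_matching(flags):
    """Share of binary flags that are set (mirrors jaccard_sim for 0/1 inputs)."""
    return sum(1 for f in flags if f > 0) / len(flags)

# feature ratios of one pair (hypothetical values)
ind_ratio, out_ratio, cat_flag = 9.4, 1.12, 1
flags = [binarise_ratio(ind_ratio), binarise_ratio(out_ratio), cat_flag]
similarity = fraction_matching(flags)
print(flags, round(similarity, 6))
```

Because only three binary features are used, the similarity can take just four values: 0, 1/3, 2/3 and 1.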

**extract structurally-related and unrelated nodes**

    takes chunks of data and returns probable dyads

In [72]:
# code segment to return the probability of a reciprocal tie between pairs of nodes based on structural similarity

SP = {'Va_ID':[],'Va_Name':[],'Vi_ID':[],'Vi_Name':[],'JSim_Vai':[],'Prob_Tie':[],'Dyads':[],'NetSize':[],\
      'RecipComm':[],'Ind_Vai':[],'Out_Vai':[],'Cat_Vai':[],'AbsInd_Vai':[],'AbsOut_Vai':[],'AbsCat_Vai':[]}
               
# a loop for the anchor node is needed such that after full iteration, another node is selected as the anchor node:
for r in range(len(X)-2):
    va_uid,va_name,va_ind = X.iloc[r].UserID, X.iloc[r].ScreenName, int(X.iloc[r].Indegree)
    va_out,va_cat = int(X.iloc[r].Outdegree), bool(X.iloc[r].Category)
    netsize = (va_ind+va_out)
    try:
        for i in range(r+1, len(X)): # compare the anchor with every subsequent node until the end ... 

                vi_uid, vi_name, vi_ind = X.iloc[i].UserID, X.iloc[i].ScreenName, int(X.iloc[i].Indegree)
                vi_out, vi_cat = int(X.iloc[i].Outdegree), bool(X.iloc[i].Category)
        
                # compute the ratios of features components, see text for details:
                Ind_Vai = va_ind/vi_ind
                Out_Vai = va_out/vi_out
                # evaluate the boolean categories and assign a numeric value: 1 if either is True, else 0
                if va_cat or vi_cat:
                    Cat_Vai = 1
                else:
                    Cat_Vai = 0  
                # extract subset of features to compute the final similarity based on Jaccard index:
                AbsInd_Vai = get_abs_values(Ind_Vai)
                AbsOut_Vai = get_abs_values(Out_Vai)
                AbsCat_Vai = Cat_Vai
                # a high-level reciprocal communities
                recip_comm = (Ind_Vai + Out_Vai)/2#+Cat_Vai)/3 
                # compute the J-sim and assign to the dataframe:
                JSim_Vai = jaccard_sim([AbsInd_Vai,AbsOut_Vai,AbsCat_Vai])
                
                # CONSTANT ERROR AND PROBABLE RECIPROCITY: expressed as a function of the similarity value:
                Error_Vai = round((1/(1+np.exp(1+np.log(JSim_Vai + 0.3)*(0.3)))),3)
                
                # compute phi, i.e. the related terms ... 
                SimAndError_Vai = round(-np.log(Error_Vai + JSim_Vai) * (Error_Vai + JSim_Vai),3)
                Prob_Tie =  1/(1+np.exp(SimAndError_Vai))
                # denote pairs as dyads (1) or otherwise (0) based on a threshold, 0.45 in this instance
                if Prob_Tie > 0.45:
                    Dyads = 1
                else:
                    Dyads = 0
                netsize = va_ind+va_out
                
                #update data structure:
                SP['Va_ID'].append(va_uid), SP['Va_Name'].append(va_name),SP['Vi_ID'].append(vi_uid)
                SP['Vi_Name'].append(vi_name), SP['JSim_Vai'].append(JSim_Vai), SP['Prob_Tie'].append(Prob_Tie)
                SP['Dyads'].append(Dyads)
                SP['NetSize'].append(netsize)
                SP['RecipComm'].append(recip_comm)
                SP['Ind_Vai'].append(Ind_Vai)
                SP['Out_Vai'].append(Out_Vai)
                SP['Cat_Vai'].append(Cat_Vai)
                SP['AbsInd_Vai'].append(AbsInd_Vai)
                SP['AbsOut_Vai'].append(AbsOut_Vai)
                SP['AbsCat_Vai'].append(AbsCat_Vai)
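    except Exception: # e.g. ZeroDivisionError when a degree is 0; the remaining pairs for this anchor are skipped
        continue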
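
To make the mapping from similarity to tie probability easier to follow, the sketch below isolates the three formulas used in the loop above. `tie_probability` is a hypothetical helper name and `math` is used instead of NumPy for brevity; the constants (0.3, the rounding to 3 decimals, the 0.45 threshold) are taken from the cell above.

```python
import math

def tie_probability(jsim):
    """Map a similarity value to a tie probability, following the loop above."""
    # constant error term, a logistic function of the log-similarity
    error = round(1 / (1 + math.exp(1 + math.log(jsim + 0.3) * 0.3)), 3)
    # phi: entropy-like term of (error + similarity)
    phi = round(-math.log(error + jsim) * (error + jsim), 3)
    # logistic squashing of phi
    return 1 / (1 + math.exp(phi))

for jsim in (1/3, 2/3, 1.0):
    p = tie_probability(jsim)
    print(f"JSim={jsim:.3f}  Prob_Tie={p:.6f}  dyad={int(p > 0.45)}")
```

For similarities of 2/3 and 1.0 this reproduces the `Prob_Tie` values 0.485004 and 0.570527 that appear in the sample rows shown further below.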

In [73]:
# convert output to a dataframe object:
dk = pd.DataFrame(SP)
len(dk), len(set(dk.Va_Name))

(664877, 9830)

**Bin users according to network size**

In [74]:
# assign the length of the network size for binning:
s=set(dk.NetSize.values)
n_bins = len(s)
# reduce the bins ... 
n_bins = n_bins-1000 # previous reduction
#create the network bins:
net_bins = np.linspace(dk['NetSize'].min(), dk['NetSize'].max(), n_bins, dtype=np.int64)
net_bin_index = np.digitize(dk['NetSize'], net_bins) # get the bin index of each item in the unbinned data
dk['NetBinIndex'] = net_bin_index # add the column containing the network-size bin of each data item
# associate each network bin with its corresponding network size (the upper edge of the bin):
# np.digitize returns len(net_bins) for values >= the last edge, so clip to the last valid position
dk['NetBand'] = net_bins[np.clip(net_bin_index, 0, len(net_bins)-1)]
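
The relationship between `np.digitize` indices and bin edges can be checked on a toy input. The sketch below uses hypothetical sizes and edges; clipping the index is one possible way (an assumption, not necessarily the author's choice) to keep values at or above the last edge inside the valid range.

```python
import numpy as np

sizes = np.array([10, 25, 99, 100])              # hypothetical network sizes
edges = np.linspace(0, 100, 5, dtype=np.int64)   # bin edges: [0, 25, 50, 75, 100]
idx = np.digitize(sizes, edges)                  # i such that edges[i-1] <= x < edges[i]
# values >= the last edge get index len(edges), which is not a valid position
safe_idx = np.clip(idx, 0, len(edges) - 1)
bands = edges[safe_idx]                          # upper edge of each item's bin
print(idx, bands)
```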

**Bin users according to high-level communities**

In [75]:
dk['CommBand']=dk.RecipComm.apply(lambda x: round(min(comm_bands, key=lambda v: abs(v-x)),2))
#or: dk['CommBand']=dk.CommBand.apply(lambda x: round(x,2))
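
The nearest-band lookup above can also be written in vectorised form. The following is a sketch only: `nearest_band` is a hypothetical helper, and it materialises a `len(values) x len(bands)` array, so very large inputs may need to be processed in chunks.

```python
import numpy as np

# same bands as in the helper cell: 50 evenly spaced values plus outlier bands
comm_bands = np.append(np.linspace(0.75, 1.25), [5, 10, 50, 100, 150, 300, 700, 1000])

def nearest_band(values, bands):
    """Return, for each value, the band with the smallest absolute distance."""
    values = np.asarray(values, dtype=float)
    idx = np.abs(values[:, None] - bands[None, :]).argmin(axis=1)
    return np.round(bands[idx], 2)

print(nearest_band([0.556829, 2.65284, 5.261233], comm_bands))
```

The three inputs are `RecipComm` values from the sample rows shown below, and they map to the bands 0.75, 1.25 and 5.0 respectively.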

**Data Categorisation and Storage for further analysis ...**

In [76]:
#convert to dataframe and store for further analysis ...
#structurally-related nodes:
sr  = dk[dk.Dyads>0]
# structurally-unrelated nodes:
su = dk[dk.Dyads<1]
# sizes 
len(sr),len(su),len(set(sr.Va_Name)),len(set(su.Vi_Name))

(96511, 568366, 8643, 9823)

In [77]:
# save files:
#all data - structurally similar and dissimilar nodes
dk.to_csv('data/mct_structurally_related_nodes03.csv', index_label=False)
#structurally similar:
sr.to_csv('data/mct_structurally_similar_nodes03.csv', index_label=False)
#structurally-dissimilar:
su.to_csv('data/mct_structurally_dissimilar_nodes03.csv', index_label=False)

**dyads for constructing the adjacency matrix**

In [None]:
#load dataframe of related and unrelated nodes for spectral analysis ... 
dsr = pd.read_csv('data/mct_structurally_related_nodes.csv')
# drop duplicates in the structurally-related nodes:
df = dsr.drop_duplicates(subset='Va_Name')
len(df),len(set(df.Va_Name)), len(set(df.Vi_Name))

    # first and last 5 samples showing relevant features and values ... 

In [3]:
df.head() #first 5 samples:

| Unnamed: 0 | Va_ID | Va_Name | Vi_ID | Vi_Name | JSim_Vai | Prob_Tie | Dyads | NetSize | RecipComm | Ind_Vai | Out_Vai | Cat_Vai | AbsInd_Vai | AbsOut_Vai | AbsCat_Vai | NetBinIndex | NetBand | CommBand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 59757778 | Emirati_Sheikha | 104576154 | wufandmew | 0.666667 | 0.485004 | 1 | 17852 | 5.261233 | 9.39759 | 1.124875 | 1 | 0 | 1 | 1 | 13 | 18304 | 5.0 |
| 35 | 59757778 | Emirati_Sheikha | 88635165 | Intersymbol | 0.666667 | 0.485004 | 1 | 17852 | 3.292146 | 5.794948 | 0.789345 | 1 | 0 | 1 | 1 | 13 | 18304 | 5.0 |
| 61 | 59757778 | Emirati_Sheikha | 735872026968088576 | nativetexan27 | 0.666667 | 0.485004 | 1 | 17852 | 2.65284 | 4.542807 | 0.762873 | 1 | 0 | 1 | 1 | 13 | 18304 | 1.25 |
| 106 | 2307161366 | Fred_Mattera | 834612585337077760 | 3192_717 | 0.666667 | 0.485004 | 1 | 48 | 0.556829 | 0.76 | 0.353659 | 1 | 1 | 0 | 1 | 1 | 1408 | 0.75 |
| 132 | 721336278411821056 | cakey_taylor | 758472230565216256 | shannon120469 | 0.666667 | 0.485004 | 1 | 595 | 3.622097 | 6.264706 | 0.979487 | 1 | 0 | 1 | 1 | 1 | 1408 | 5.0 |


In [4]:
df.tail() #last 5 samples: 

| Unnamed: 0 | Va_ID | Va_Name | Vi_ID | Vi_Name | JSim_Vai | Prob_Tie | Dyads | NetSize | RecipComm | Ind_Vai | Out_Vai | Cat_Vai | AbsInd_Vai | AbsOut_Vai | AbsCat_Vai | NetBinIndex | NetBand | CommBand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14811 | 4003432061 | HeartOMfilm | 612770276 | ADHRB | 0.666667 | 0.485004 | 1 | 661 | 0.422138 | 0.017034 | 0.827243 | 1 | 0 | 1 | 1 | 1 | 1408 | 0.75 |
| 14812 | 4003432061 | HeartOMfilm | 563992851 | GoodGovSeminar | 0.666667 | 0.485004 | 1 | 661 | 1.173375 | 1.01875 | 1.328 | 1 | 1 | 0 | 1 | 1 | 1408 | 1.17 |
| 14821 | 25502169 | TinaGhelli | 19496043 | PshrinkEmeritus | 1.0 | 0.570527 | 1 | 2812 | 1.023669 | 1.083217 | 0.964122 | 1 | 1 | 1 | 1 | 2 | 2816 | 1.03 |
| 14823 | 25502169 | TinaGhelli | 2746240046 | IntJewCon | 0.666667 | 0.485004 | 1 | 2812 | 1.191275 | 0.953818 | 1.428733 | 1 | 1 | 0 | 1 | 2 | 2816 | 1.19 |
| 14824 | 25502169 | TinaGhelli | 2808451328 | stump54jumper | 1.0 | 0.570527 | 1 | 2812 | 0.934321 | 0.874153 | 0.994488 | 1 | 1 | 1 | 1 | 2 | 2816 | 0.93 |


**Construction of Matrices:** *adjacency, degree, and Laplacian*

    # include information about the degree matrix in the dataframe
    # get count of edges of each anchor node in both structurally-related and unrelated data:

***PD: Nodes adjacency matrix:*** $\mathcal{M}_{sc_{va}}^{n\times n}$

In [None]:
# create a dataframe sized by the number of users to enable formation of an adjacency matrix ....
# empty dataframe ... with columns and index based on the union of the users ...
columns = set.union(set(df.Va_Name),set(df.Vi_Name))
index = set.union(set(df.Va_Name),set(df.Vi_Name))
sr_amt =pd.DataFrame(np.zeros(shape=(len(index),len(columns))),columns=columns, index=index)
# update with relevant values ... 
for v1, v2, r in zip(df.Va_Name, df.Vi_Name,df.Dyads):
    if r>0:
        sr_amt.at[v1,v2]=r
    else:
        sr_amt.at[v1,v2]=0
# get the users with reciprocal ties only
# check entries: sr_amt[sr_amt.any()>0]
k=sr_amt[sr_amt.values>0] 
sr_amt.shape, k.shape

In [137]:
# examples ... to np ndarray: np.array(sr_amt)
sr_amt.head()

| Unnamed: 0 | ItaliaCamp | LuluLAngeles | SugarAndMusk | GorillaCapitlst | NewsGrit | PshrinkEmeritus | CorbettTracie | oidptg | NietzscheAndDan | ColombianRefug | ... | 6549lmartin | ilaeornom | Pairsonnalites | MariaLCX | fo77owme | cesarshrimp100 | silaas3005 | DebbyRevere | HarunMaruf | kgerstman |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ItaliaCamp | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| LuluLAngeles | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| SugarAndMusk | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| GorillaCapitlst | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| NewsGrit | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |


***Nodes degree [diagonal] matrix of edges:*** $\mathcal{M}_{c_{vd}}^{n \times n}$

    # Essentially, this is the degree matrix of the nodes. Some nodes have a higher number of reciprocal ties than others, e.g. in the output below the first node has 35 ties while the third has 1.

In [226]:
# diagonal matrix of a structurally-related set of nodes:
sr_dmt = np.diag(sr_amt.sum(axis=1))
sr_amt.shape, sr_dmt.shape

((346, 346), (346, 346))

In [227]:
sr_dmt

array([[35.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0., 53.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ...,
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])

***Nodes Laplacian matrix:*** $\mathcal{M}_{c_{vl}}^{n\times n}$

    # this matrix encodes two things: the degree of each node (on the diagonal) and the presence/absence of edges among nodes (off the diagonal). A negative off-diagonal entry signifies an edge while a zero entry signifies no connection.

In [228]:
#the Laplacian matrix, lm:
sr_lmt = sr_dmt - np.array(sr_amt)
sr_lmt, sr_lmt.shape

(array([[35.,  0.,  0., ...,  0.,  0.,  0.],
        [-1., 53.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  1., ...,  0.,  0.,  0.],
        ...,
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]]), (346, 346))
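
The chain from dyads to adjacency, degree and Laplacian matrices can be traced on a three-node toy graph. The node names and pairs below are hypothetical; as in the cells above, only the anchor-to-other direction of each dyad is recorded, so the adjacency matrix need not be symmetric.

```python
import numpy as np

nodes = ["a", "b", "c"]
pairs = [("a", "b"), ("a", "c")]          # hypothetical dyads (anchor, other)
pos = {v: i for i, v in enumerate(nodes)}

A = np.zeros((len(nodes), len(nodes)))
for v1, v2 in pairs:
    A[pos[v1], pos[v2]] = 1               # adjacency: 1 where a tie was detected

Dg = np.diag(A.sum(axis=1))               # degree matrix: row sums on the diagonal
L = Dg - A                                # Laplacian: degree minus adjacency
print(L)
```

Each row of `L` sums to zero by construction, which is a quick sanity check for the larger matrices.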

In [150]:
# view first 5 samples, if using dataframe: sr_lmt.head()

***Nodes and reciprocal-communities:*** $\mathcal{M}_{c_{vr}}^{n\times k}$

    # a matrix of nodes according to their degree of affiliation to communities. For instance, a given node can be related to many nodes but with different intensity/magnitude in the reciprocal community: (v1:v2 = 0.75, v1:v3 = 0.95, v1:v5 = 1.0). This also enables us to examine which community is more prevalent among the users. The first 5 samples (shown below) show that community 0.75 is the most dominant, followed by community 1.25 (basically, the extrema of the range).

In [151]:
# a matrix of nodes vs. communities:
columns = set(df.CommBand)
index = set.union(set(df.Va_Name),set(df.Vi_Name))
sr_vrcom =pd.DataFrame(np.zeros(shape=(len(index),len(columns))),columns=columns, index=index) # empty df
# update with relevant values ... 
for v1, v2, r in zip(df.Va_Name, df.CommBand,df.Dyads):
    if r>0:
        sr_vrcom.at[v1,v2]=r
    else:
        sr_vrcom.at[v1,v2]=0
# get the users with reciprocal ties only
p=sr_vrcom[sr_vrcom.values>0] 
sr_vrcom.shape, p.shape

((346, 58), (921, 58))

In [2]:
#sr_vrcom.head()

In [155]:
sr_vrcom = np.array(sr_vrcom) # nodes vs. communities
sr_vrcom

array([[1., 1., 0., ..., 0., 0., 0.],
       [1., 1., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

***Reciprocal-communities adjacency:*** $\mathcal{M}_{c_{ra}}^{k\times k}$

In [None]:
# a matrix of communities vs. communities:
columns = set(df.CommBand)
index = set(df.CommBand)
sr_rcom_adj = pd.DataFrame(np.zeros(shape=(len(index),len(columns))),columns=columns, index=index) # empty df
# update with relevant values ... 
for v1, v2, r in zip(df.CommBand, df.CommBand,df.Dyads):
    if r>0:
        sr_rcom_adj.at[v1,v2]=r
    else:
        sr_rcom_adj.at[v1,v2]=0
# get the users with reciprocal ties only
p=sr_rcom_adj[sr_rcom_adj.values>0] 
sr_rcom_adj.shape, p.shape

In [3]:
#sr_rcom_adj.head()

In [158]:
sr_rcom_adj =  np.array(sr_rcom_adj) # community adjacency
sr_rcom_adj

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])

In [214]:
sr_rcom_adj = np.array(sr_rcom_adj)
sr_rcom_adj

array([[0.6 , 0.  , 0.  , ..., 0.  , 0.  , 0.  ],
       [0.  , 0.65, 0.  , ..., 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.65, ..., 0.  , 0.  , 0.  ],
       ...,
       [0.  , 0.  , 0.  , ..., 0.6 , 0.  , 0.  ],
       [0.  , 0.  , 0.  , ..., 0.  , 1.  , 0.  ],
       [0.  , 0.  , 0.  , ..., 0.  , 0.  , 0.6 ]])

**all shapes:**

In [None]:
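# sr_rcom_dmt is not defined in the cells shown here; ASSUMPTION: built by analogy with sr_dmt as the communities degree matrix
sr_rcom_dmt = np.diag(np.array(sr_rcom_adj).sum(axis=1))
sr_amt.shape, sr_dmt.shape, sr_lmt.shape, sr_vrcom.shape, sr_rcom_adj.shape, sr_rcom_dmt.shape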

**Offset matrix:** $Z_{i,j} = 1$ if $(\mathcal{M}_{c_{vl}})_{i,j} \geq 0$ and $Z_{i,j} = -1$ if $(\mathcal{M}_{c_{vl}})_{i,j} < 0$

    #to prevent negative entries in the matrix

In [233]:
# set the offset and create the new Laplacian matrix via element-wise multiplication with the offset Z:
sr_lmt2 = np.multiply(sr_lmt, np.where(sr_lmt<0, -1,1))
sr_lmt, sr_lmt2

(array([[35.,  0.,  0., ...,  0.,  0.,  0.],
        [-1., 53.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  1., ...,  0.,  0.,  0.],
        ...,
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]]),
 array([[35.,  0.,  0., ...,  0.,  0.,  0.],
        [ 1., 53.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  1., ...,  0.,  0.,  0.],
        ...,
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]]))
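
A small check of the offset step on a toy Laplacian (hypothetical values): multiplying element-wise by $Z$ flips the sign of negative entries only, which gives the same result as taking the element-wise absolute value.

```python
import numpy as np

L = np.array([[ 2., -1., -1.],
              [-1.,  1.,  0.],
              [-1.,  0.,  1.]])           # toy Laplacian
Z = np.where(L < 0, -1, 1)                # offset matrix: -1 for negative entries, else 1
L_offset = np.multiply(L, Z)              # element-wise product, as in the cell above
print(np.array_equal(L_offset, np.abs(L)))
```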

**$\mathcal{S}_r$ Model:**

#### I.I Structurally-related clusters: Model and Training

Having obtained all the relevant input data, the next step is the implementation. This study utilises *PyTorch* for the implementation of the following algorithm:
 - ***initialisation:*** $\mathcal{S}_r \gets \{\};\ \mathcal{S}_u \gets \{\}$
 - ***input:*** a finite collection of network data $\mathcal{D} \neq \emptyset$
 - $\forall v_i, v_j \in \mathcal{D}$, compute $p(R_{v_i,v_j}) \hspace{28mm} \triangleright~ v_i\neq v_j$ 
 - if $p(R_{v_i,v_j}) \geq \tau \hspace{58mm} \triangleright ~\tau$, *a predefined threshold*
     - $\mathcal{S}_r \gets (v_i,v_j) \hspace{54mm} \triangleright$ *structurally-related*
 - else:
     - $\mathcal{S}_u\gets (v_i,v_j) \hspace{54mm} \triangleright$ *structurally-unrelated*
 - ***output:***
     - $\mathcal{S}_r,\mathcal{S}_u, \mathcal{M}_{sa}^{n\times n}\hspace{52mm}\triangleright~$ *adjacency matrix* 

**Matrix Factorisation:** 

***Initialisations:*** Matrices and corresponding dimensions:
 - $\mathcal{M}_{sc_{va}} \mapsto n \times n \hspace{5mm}:$ *adjacency matrix of structurally-related nodes*
 - $\mathcal{M}_{c_{vd}} \mapsto n \times n \hspace{5mm}:$ *nodes diagonal matrix*
 - $\mathcal{M}_{c_{vl}} \mapsto n \times n \hspace{5mm}:$ *nodes Laplacian matrix*
 - $\mathcal{M}_{c_{vr}} \mapsto n \times k \hspace{3mm}:$ *a collection of nodes according to reciprocal-communities*
 - $\mathcal{M}_{c_{ra}} \mapsto k \times k \hspace{3mm}:$ *reciprocal-communities adjacency matrix*
 - $\mathcal{M}_{c_{rd}} \mapsto k \times k \hspace{3mm}:$ *reciprocal-communities diagonal matrix*

For simplicity, the following notations are used:

$\mathcal{M}_{c_{vl}} \mapsto D; \hspace{3mm} \mathcal{M}_{c_{vr}} \mapsto P=[p_{ij}]; \hspace{3mm} \mathcal{M}_{c_{ra}} \mapsto Q=[q_{ij}]$

In [248]:
# initialise the factor matrices with random values:
D = sr_lmt2 # nodes Laplacian (after the offset)
P = np.random.rand(D.shape[0], sr_vrcom.shape[1]) # nodes and communities, the basis
Q = np.random.rand(sr_rcom_adj.shape[0], sr_rcom_adj.shape[1]) # communities, the coefficients

In [355]:
D.shape

(346, 346)

In [356]:
D

array([[35.,  0.,  0., ...,  0.,  0.,  0.],
       [ 1., 53.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       ...,
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])

In [357]:
P, Q # matrices to use in learning D

(array([[0.62343883, 0.49745284, 0.93557885, ..., 0.94973768, 0.30951235,
         0.21644183],
        [0.35973188, 0.78482078, 0.99396444, ..., 0.28427653, 0.06707378,
         0.1379731 ],
        [0.78404398, 0.77323096, 0.73186601, ..., 0.77355178, 0.0201696 ,
         0.47191441],
        ...,
        [0.67413265, 0.53575594, 0.56811568, ..., 0.60683693, 0.13955843,
         0.04703396],
        [0.99082355, 0.66730111, 0.45232921, ..., 0.72140445, 0.74832162,
         0.15618759],
        [0.01171263, 0.74254346, 0.28524965, ..., 0.45281675, 0.67962215,
         0.90577343]]),
 array([[0.93597051, 0.19046834, 0.82319459, ..., 0.88318124, 0.6699693 ,
         0.9615813 ],
        [0.04296863, 0.34943073, 0.58855389, ..., 0.9576318 , 0.47145496,
         0.50039365],
        [0.22315847, 0.57978463, 0.2119457 , ..., 0.4057317 , 0.43701906,
         0.80107925],
        ...,
        [0.14551776, 0.77703052, 0.18273403, ..., 0.50910067, 0.50908546,
         0.972487  ],
        [0.7

In [358]:
D.shape, P.shape, Q.shape, P.T.shape

((346, 346), (346, 58), (58, 58), (58, 346))
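
The shapes printed above are consistent with a factorisation of the form $D \approx P Q P^{\top}$, although the training step itself is not shown in this section. The sketch below is only one possible way to fit such factors and should not be read as the study's procedure: it assumes a squared Frobenius loss and plain gradient descent in NumPy (rather than PyTorch), and the toy sizes, the stand-in matrix `D`, the learning rate `lr` and the step count are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2                                # toy sizes (the notebook has n=346, k=58)
D = np.abs(rng.normal(size=(n, n)))        # stand-in for the offset Laplacian
P = rng.random((n, k))                     # nodes x communities
Q = rng.random((k, k))                     # communities x communities

def loss(D, P, Q):
    """Squared Frobenius norm of the residual P Q P^T - D."""
    return np.sum((P @ Q @ P.T - D) ** 2)

lr = 1e-3                                  # assumed step size
history = [loss(D, P, Q)]
for _ in range(200):
    R = P @ Q @ P.T - D                    # residual at the current iterate
    grad_P = 2 * (R @ P @ Q.T + R.T @ P @ Q)
    grad_Q = 2 * (P.T @ R @ P)
    P -= lr * grad_P
    Q -= lr * grad_Q
    history.append(loss(D, P, Q))
print(history[0], history[-1])
```

The gradients follow from differentiating $\lVert P Q P^{\top} - D \rVert_F^2$ with respect to $P$ and $Q$; an autograd framework such as PyTorch would compute them automatically.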