# Case study : Classification of graph entities - nodes and edges

##### License: Apache 2.0


This notebook shows how the diffusion modules can be used to classify the nodes and edges of several generated graphs according to their structural similarity. The idea is studied in the paper $\href{https://arxiv.org/abs/1710.10321}{Learning Structural Node Embeddings Via Diffusion Wavelets}$, where the authors propose a method called GraphWave, which exploits heat diffusion processes on graphs to embed nodes into a multidimensional Euclidean space. The procedure below is a modified and extended version of GraphWave that also embeds, and then classifies, higher-order graph structures such as edges by exploiting concepts from Topological Data Analysis.

In [None]:
import numpy as np 
import networkx as nx

from giotto.graphs.create_clique_complex import CreateCliqueComplex, CreateLaplacianMatrices, CreateBoundaryMatrices
from giotto.graphs.heat_diffusion import HeatDiffusion
from giotto.graphs.graph_entropy import GraphEntropy
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans


# Plotting Functions #

In [None]:
import matplotlib
import matplotlib.pyplot as plt
from plotting import plot_network_diffusion, plot_entropies

# Example 1 : The Barbell Graph #

The first graph analyzed is the barbell graph, which is also studied in the GraphWave paper. We generate the graph and plot it here. It consists of two fully connected clusters linked by a chain of nodes.

In [None]:
g = nx.barbell_graph(9,7)
#Layout for the plotted graph
pos = nx.shell_layout(g)
plt.figure(figsize=(20,8))
plot_network_diffusion(g, pos)
_ = plt.axis('off')
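
As a quick sanity check on the structure described above, the following sketch inspects the same barbell graph using only standard `networkx` calls. The names `g_check` and `degree_counts` are illustrative and not part of the original notebook. Node degree alone separates only three groups of nodes, which hints at why a richer structural embedding is useful here.

```python
from collections import Counter

import networkx as nx

# Same parameters as above: two cliques of 9 nodes joined by a path of 7 nodes.
g_check = nx.barbell_graph(9, 7)

n_nodes = g_check.number_of_nodes()
n_edges = g_check.number_of_edges()

# Count how many nodes share each degree value.
degree_counts = Counter(degree for _, degree in g_check.degree())

print(n_nodes, n_edges)
print(dict(degree_counts))
```

Clique nodes, the two clique nodes attached to the chain, and the chain nodes each have their own degree, but degree cannot tell apart chain nodes at different distances from the cliques.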

# Create Complex and Laplacians #

In this phase the clique complex of the graph is computed and returned as a dictionary. Once the dictionary $cd$ is computed, the corresponding 0- and 1-Laplacian matrices are computed using $CreateLaplacianMatrices$.

In [None]:
cc = CreateCliqueComplex(g)
cd = cc.create_complex_from_graph()
laplacians = CreateLaplacianMatrices().fit_transform(cd, (0,1))
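
To make the two Laplacians more concrete, here is a minimal sketch of the standard Hodge Laplacian construction, $L_0 = B_1 B_1^T$ and $L_1 = B_1^T B_1 + B_2 B_2^T$, where $B_1$ and $B_2$ are the boundary matrices of the clique complex. It is an assumption that `CreateLaplacianMatrices` follows the same definition and orientation convention; the function `hodge_laplacians` and the toy graph are hypothetical and only use `numpy`. Nodes are assumed to be labeled `0, ..., n_nodes - 1`.

```python
from itertools import combinations

import numpy as np


def hodge_laplacians(n_nodes, edge_list):
    """Return (L0, L1) for the clique complex of a graph, using cliques up to triangles."""
    edges = sorted({tuple(sorted(e)) for e in edge_list})
    edge_index = {e: k for k, e in enumerate(edges)}
    # Triangles are the 3-cliques: triples of nodes whose three sides are all edges.
    triangles = [
        t for t in combinations(range(n_nodes), 3)
        if all(pair in edge_index for pair in combinations(t, 2))
    ]

    # B1 maps edges to nodes: the edge (u, v) with u < v is oriented from u to v.
    b1 = np.zeros((n_nodes, len(edges)))
    for k, (u, v) in enumerate(edges):
        b1[u, k], b1[v, k] = -1.0, 1.0

    # B2 maps triangles to edges: the boundary of (u, v, w) is (v, w) - (u, w) + (u, v).
    b2 = np.zeros((len(edges), len(triangles)))
    for k, (u, v, w) in enumerate(triangles):
        b2[edge_index[(v, w)], k] = 1.0
        b2[edge_index[(u, w)], k] = -1.0
        b2[edge_index[(u, v)], k] = 1.0

    l0 = b1 @ b1.T               # acts on nodes (the usual graph Laplacian D - A)
    l1 = b1.T @ b1 + b2 @ b2.T   # acts on edges
    return l0, l1


# Toy graph: a filled triangle (0, 1, 2) with a pendant edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
l0, l1 = hodge_laplacians(4, toy_edges)
print(l0)
print(l1)
```

For a `networkx` graph with integer labels, the edge list could be obtained with `list(g.edges)`.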

# Find Embedding for nodes and edges #

First of all, it is necessary to set the hyperparameters of the algorithm, which in this case are the points in time at which the node and edge diffusions are sampled.

In [None]:
#Hyperparameters 

#Temporal instants to take for node diffusions
taus_n = np.linspace(0, 2, 40)

#Temporal instants to take for edge diffusions
taus_e = np.linspace(0, 2, 40)

The heat diffusion process is computed here for nodes and edges. Since the parameter $\textit{initial_condition}$ is not passed to the $\textit{HeatDiffusion}$ object, the identity matrix is used. This means that for each node (edge) there is a heat diffusion process whose initial condition is a delta on that node (edge).

In [None]:
heat_vectors_n = HeatDiffusion().fit_transform(laplacians[0], taus=taus_n)
heat_vectors_e = HeatDiffusion().fit_transform(laplacians[1], taus=taus_e)
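
The idea behind this step can be sketched with plain `numpy`: for a symmetric Laplacian $L$, the heat equation $\dot{u} = -L u$ has solution $u(\tau) = e^{-\tau L} u(0)$, so an identity initial condition yields the full heat kernel $e^{-\tau L}$. The function `heat_kernels` below is a hypothetical illustration; whether `HeatDiffusion` uses the same sign convention, normalization, or array layout is an assumption.

```python
import numpy as np


def heat_kernels(laplacian, taus):
    """Return an array of shape (len(taus), n, n) holding exp(-tau * L) for each tau."""
    # For a symmetric matrix, exp(-tau * L) = V diag(exp(-tau * lambda)) V^T.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return np.stack(
        [(eigvecs * np.exp(-tau * eigvals)) @ eigvecs.T for tau in taus]
    )


# Graph Laplacian of a path on 4 nodes (0 - 1 - 2 - 3), written out by hand.
path_laplacian = np.array([
    [1.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 1.0],
])
taus = np.linspace(0, 2, 5)
kernels = heat_kernels(path_laplacian, taus)

# Column j of kernels[i] is the heat profile at time taus[i] for a delta placed on node j.
print(kernels[-1][:, 0])
```

At $\tau = 0$ the kernel is the identity (each delta is untouched), and for a node Laplacian the total heat in each column stays equal to one at all times.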

Now that we have sampled the heat processes, for a fixed simplex (either a node or an edge) and a fixed point in time, we treat the heat values as a probability distribution over the graph and compute its entropy. The entropy values computed at different points in time are the structural features of the simplex.

In [None]:
mh_n = GraphEntropy().fit_transform(heat_vectors_n).T
mh_e = GraphEntropy().fit_transform(heat_vectors_e).T
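
As a rough illustration of the entropy step, the sketch below normalizes a vector of heat values into a probability distribution and computes its Shannon entropy. The function `heat_entropy` is hypothetical, and taking absolute values before normalizing is an assumption made here so that signed edge heat values can be handled; `GraphEntropy` may normalize differently.

```python
import numpy as np


def heat_entropy(heat_values, eps=1e-12):
    """Shannon entropy of a vector of heat values after normalising it to sum to one."""
    weights = np.abs(np.asarray(heat_values, dtype=float))
    probabilities = weights / weights.sum()
    # eps avoids log(0) for entries that carry no heat.
    return float(-np.sum(probabilities * np.log(probabilities + eps)))


concentrated = heat_entropy([1.0, 0.0, 0.0, 0.0])    # heat still on a single node
spread_out = heat_entropy([0.25, 0.25, 0.25, 0.25])  # heat fully equalised
print(concentrated, spread_out)
```

Entropy is close to zero while the heat is concentrated on one simplex and grows as the heat spreads, so the entropy curve over time records how quickly the neighborhood of a simplex absorbs the heat.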

# Clusters of nodes (0-simplices) based on structural embedding #

In this step we cluster the points related to each node by using the KMeans algorithm. We can then represent each node with the centroid of its corresponding class.

In [None]:
#Simple K-means
kmeans = KMeans(n_clusters=6)
kmeans.fit(np.transpose(mh_n))
y_kmeans_n = kmeans.predict(np.transpose(mh_n))

#Just for Visualization, plot 2D PCA embedding with points colored by classes
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(np.transpose(mh_n))
#Colors for classes (at most 8)
col = ['blue', 'yellow', 'black', 'grey', 'green', 'brown', 'red', 'orange']
barbell_node_cols = [col[e] for e in y_kmeans_n]
_ = plt.scatter(np.ravel(principalComponents[:,0]),np.ravel(principalComponents[:,1]),  c=barbell_node_cols, s=100)
_ = plt.title("2D-PCA representation of nodes in structural space")
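
The remark that each node can be represented by the centroid of its class can be sketched as follows. The feature matrix here is synthetic (two well-separated groups standing in for entropy features), and fixing `random_state` is an addition for reproducibility that the notebook cells above do not use.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the entropy features: 10 points in 3 dimensions, two tight groups.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(5, 3)),
    rng.normal(loc=5.0, scale=0.1, size=(5, 3)),
])

kmeans_demo = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans_demo.fit_predict(features)

# Replace every point by the centroid of the cluster it was assigned to.
centroid_repr = kmeans_demo.cluster_centers_[labels]
print(labels)
print(centroid_repr.shape)
```

Cluster indices returned by K-means are arbitrary, so the colors assigned to classes in the plots can change between runs unless the random seed is fixed.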

We now plot the graph, coloring the nodes according to their classes. Nodes that are structurally different, that is, nodes whose neighborhoods have different structures, receive different colors.

In [None]:
#Plot the graph with each node colored according to its cluster.
plt.figure(figsize=(20,8))
plot_network_diffusion(g, pos ,node_vector=barbell_node_cols)
_ = plt.axis('off')

# Cluster the 1-simplices based on the embedding space #

In addition to the node classification, we can provide a similar edge analysis thanks to the topological properties of higher-order Laplacians and of heat diffusion processes defined over higher-order structures.

In [None]:
#Simple K-means
kmeans = KMeans(n_clusters=6)
kmeans.fit(np.transpose(mh_e))
y_kmeans_e = kmeans.predict(np.transpose(mh_e))

#Just for Visualization
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(np.transpose(mh_e))
#Associate a color to each edge w.r.t. its class
barbell_edge_cols = [col[e] for e in y_kmeans_e]
_ = plt.scatter(np.ravel(principalComponents[:,0]),np.ravel(principalComponents[:,1]),  c=barbell_edge_cols, s=80)
_ = plt.title("2D-PCA representation of edges in structural space")

In [None]:
#Plot the graph with each edge colored according to its cluster.
plt.figure(figsize=(20,8))
plot_network_diffusion(g, pos, edge_vector=barbell_edge_cols)
_ = plt.axis('off')

# Example 2: Torus #

We run exactly the same experiment with a graph which is a triangulation of a torus.

In [None]:
g = nx.triangular_lattice_graph(10, 10, periodic=True)
# nx.triangular_lattice_graph labels nodes with (i, j) tuples; relabel them with consecutive integers
mapping = dict(zip(list(g.nodes), range(0, nx.number_of_nodes(g))))
g = nx.relabel_nodes(g, mapping)

In [None]:
plt.figure(figsize=(20,8))
pos = nx.spring_layout(g, iterations=1000)
plot_network_diffusion(g, pos)
_ = plt.axis('off')

With this experiment we want to highlight the fact that, from the point of view of neighborhood structural similarity, all nodes of a torus are equivalent. Our method is able to capture this fact.

In [None]:
cc = CreateCliqueComplex(g)
cd = cc.create_complex_from_graph()
laplacians = CreateLaplacianMatrices().fit_transform(cd, (0,1))

heat_vectors_n = HeatDiffusion().fit_transform(laplacians[0], taus=taus_n)
heat_vectors_e = HeatDiffusion().fit_transform(laplacians[1], taus=taus_e)
mh_n = GraphEntropy().fit_transform(heat_vectors_n).T
mh_e = GraphEntropy().fit_transform(heat_vectors_e).T

#Simple K-means
kmeans = KMeans(n_clusters=1)
kmeans.fit(np.transpose(mh_n))
y_kmeans_n = kmeans.predict(np.transpose(mh_n))

#Just for Visualization, plot 2D PCA embedding with points colored by classes
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(np.transpose(mh_n))
torus_node_cols = [col[e] for e in y_kmeans_n]
_ = plt.scatter(np.ravel(principalComponents[:,0]),np.ravel(principalComponents[:,1]),  c=torus_node_cols, s=80)
_ = plt.title("2D-PCA representation of nodes in structural space")

All nodes have the same color given that they represent the same structural class.

In [None]:
#Plot the graph with each node colored according to its cluster.
plt.figure(figsize=(20,8))
plot_network_diffusion(g,pos,node_vector=torus_node_cols)
_ = plt.axis('off')


Again, it is interesting to see that, in the edge space, the 1-simplices can be grouped into two different classes corresponding to the directions of the two representative vectors of the 1-homology classes. Indeed, in this specific triangulation of the torus there are exactly two 1-dimensional holes.

In [None]:
#Simple K-means
kmeans = KMeans(n_clusters=2)
kmeans.fit(np.transpose(mh_e))
y_kmeans_e = kmeans.predict(np.transpose(mh_e))

#Just for Visualization
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(np.transpose(mh_e))
#Associate a color to each edge w.r.t. its class
torus_edge_cols = [col[e] for e in y_kmeans_e]
_ = plt.scatter(np.ravel(principalComponents[:,0]),np.ravel(principalComponents[:,1]),  c=torus_edge_cols)
_ = plt.title("2D-PCA representation of edges in structural space")

In [None]:
#Plot the graph with each edge colored according to its cluster.
plt.figure(figsize=(20,8))
plot_network_diffusion(g, pos, edge_vector=torus_edge_cols)
_ = plt.axis('off')
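
The statement that a triangulated torus has two 1-dimensional holes can be checked on a small example. The sketch below builds its own periodic triangulated grid by hand (an assumption made for transparency; it is not the graph returned by `nx.triangular_lattice_graph`) and computes the first Betti number as $\beta_1 = |E| - \mathrm{rank}(B_1) - \mathrm{rank}(B_2)$, which equals the dimension of the kernel of the 1-Laplacian. All names are illustrative.

```python
from itertools import combinations

import numpy as np

n = 4  # grid size, large enough that wrap-around creates no extra triangles


def node(i, j):
    """Integer label of grid point (i, j) with periodic boundary conditions."""
    return (i % n) * n + (j % n)


# Each grid point is joined to its right, upper, and upper-right neighbours.
edges = sorted(
    {
        tuple(sorted((node(i, j), node(i + di, j + dj))))
        for i in range(n)
        for j in range(n)
        for di, dj in [(1, 0), (0, 1), (1, 1)]
    }
)
edge_index = {e: k for k, e in enumerate(edges)}

# Triangles of the clique complex: triples of nodes whose three sides are all edges.
triangles = [
    t for t in combinations(range(n * n), 3)
    if all(pair in edge_index for pair in combinations(t, 2))
]

# Boundary matrix from edges to nodes.
b1 = np.zeros((n * n, len(edges)))
for k, (u, v) in enumerate(edges):
    b1[u, k], b1[v, k] = -1.0, 1.0

# Boundary matrix from triangles to edges.
b2 = np.zeros((len(edges), len(triangles)))
for k, (u, v, w) in enumerate(triangles):
    b2[edge_index[(v, w)], k] = 1.0
    b2[edge_index[(u, w)], k] = -1.0
    b2[edge_index[(u, v)], k] = 1.0

betti_1 = len(edges) - np.linalg.matrix_rank(b1) - np.linalg.matrix_rank(b2)
print(len(edges), len(triangles), betti_1)
```

The two independent holes correspond to the two ways of going around the torus, which is consistent with the two edge classes found above.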

# Node Heat Diffusion Process #

Here we give a glimpse of the behaviour of the heat diffusion process on graph nodes. The following plots show 3 snapshots of the process taken at 3 different points in time.

In [None]:
# Plotting
plt.figure(figsize=(20,6))
for i in range(3):
    plt.subplot(1,3,i+1)
    plot_network_diffusion(g, pos, node_vector=heat_vectors_n[:,15,i*5])
    plt.axis('off')

# Example 3 : 2-D grid graph #

In [None]:
g = nx.triangular_lattice_graph(10,10, periodic=False)
mapping = dict(zip(list(g.nodes), range(0, nx.number_of_nodes(g))))
g = nx.relabel_nodes(g, mapping)
pos = nx.spring_layout(g, iterations=2000)

In [None]:
cc = CreateCliqueComplex(g)
cd = cc.create_complex_from_graph()
laplacians = CreateLaplacianMatrices().fit_transform(cd, (0,1))

In [None]:
heat_vectors_n = HeatDiffusion().fit_transform(laplacians[0], taus=taus_n)
heat_vectors_e = HeatDiffusion().fit_transform(laplacians[1], taus=taus_e)
mh_n = GraphEntropy().fit_transform(heat_vectors_n).T
mh_e = GraphEntropy().fit_transform(heat_vectors_e).T

In [None]:
#Simple K-means
kmeans = KMeans(n_clusters=6)
kmeans.fit(np.transpose(mh_n))
y_kmeans_n = kmeans.predict(np.transpose(mh_n))

#Just for Visualization, plot 2D PCA embedding with points colored by classes
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(np.transpose(mh_n))

grid_nodes_cols = [col[e] for e in y_kmeans_n]
_ = plt.scatter(np.ravel(principalComponents[:,0]),np.ravel(principalComponents[:,1]),  c=grid_nodes_cols)
_ = plt.title("2D-PCA representation of nodes in structural space")

In [None]:
#Plot the graph with each node colored according to its cluster.
plt.figure(figsize=(20,8))
plot_network_diffusion(g, pos, node_vector=grid_nodes_cols)
_ = plt.axis('off')

In [None]:
#Simple K-means
kmeans = KMeans(n_clusters=5)
kmeans.fit(np.transpose(mh_e))
y_kmeans_e = kmeans.predict(np.transpose(mh_e))

#Just for Visualization
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(np.transpose(mh_e))
#Associate a color to each edge w.r.t. its class
grid_edge_cols = [col[e] for e in y_kmeans_e]
_ = plt.scatter(np.ravel(principalComponents[:,0]),np.ravel(principalComponents[:,1]),  c=grid_edge_cols)
_ = plt.title("2D-PCA representation of edges in structural space")

In [None]:
#Plot the graph with each edge colored according to its cluster.
plt.figure(figsize=(20,8))
plot_network_diffusion(g, pos, edge_vector=grid_edge_cols)
_ = plt.axis('off')