<a href="https://colab.research.google.com/github/kicasta/Modeling_WUGS_WSBM/blob/master/example/example_modeling_wugs_wsbm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Prepare everything

We first install every dependency needed to run this notebook in Colab.

In [None]:
!pip install pymc3 --upgrade

!echo "deb http://downloads.skewed.de/apt bionic main" >> /etc/apt/sources.list
!apt-key adv --keyserver keys.openpgp.org --recv-key 612DEFB798507F25
!apt-get update
!apt-get install python3-graph-tool python3-cairo python3-matplotlib

!apt install libgraphviz-dev
!pip install pygraphviz

!pip install pyvis
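
Once the installation cells have finished, it can help to confirm that the packages are importable before going further. The following is a minimal sketch using only the standard library. The helper `missing_modules` is hypothetical and not part of the repository, and the import names in `required` are assumptions based on the packages installed above (graph-tool is imported as `graph_tool`).

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed top-level import names for the packages installed above.
required = ["pymc3", "graph_tool", "pygraphviz", "pyvis"]
print("Missing modules:", missing_modules(required))
```

An empty list means every module was found. Anything listed usually points to a failed install cell or a runtime that was restarted after installation.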

We also need to clone the repository into the `/content` directory so that every module is accessible from this notebook. Note that Colab purges `/content` every time the environment is restarted.

You might need to refresh the file browser to see the cloned repository.

In [None]:
!git clone https://github.com/kicasta/Modeling_WUGS_WSBM.git

We now add the repository's `src` directory to the system path so that the modules can be imported easily.

In [None]:
import sys
sys.path.insert(0,'/content/Modeling_WUGS_WSBM/src/')

Import everything 

In [None]:
import wsbm
import plot_utils as pltutil

import pickle
from collections import Counter

In the next cell we load a wug (word usage graph) and find the distribution that best fits it, together with the corresponding partition.

In [None]:
g_name = "zersetzen"
g_path = "/content/Modeling_WUGS_WSBM/example/wug_example/" + g_name

In [None]:
# load the graph
graph, s_gt, pos = wsbm.open_graph(g_path)

# find the best distribution and the wug partition with respect to that distribution
dist, state = wsbm.find_best_distribution(s_gt)
b = wsbm.get_blocks(state)

# compute measures with respect to the best partition
mri, purity, acc = wsbm.partition_and_stat_gt(graph, s_gt, b, verbose=False)

print("DISTRIBUTION ", dist)
print("MRI", mri)
print("PURITY", purity)
print("ACCURACY", acc)
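
To make the purity score more concrete, here is a minimal sketch of the textbook definition: each inferred block is credited with its most frequent gold label, and the credited items are divided by the total number of items. This is an assumption about what the measure means, not a copy of `wsbm.partition_and_stat_gt`, which may compute it differently. The function `cluster_purity` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(predicted, gold):
    """Textbook purity: share of items carrying the majority gold label of their block."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    labels_by_block = defaultdict(list)
    for block, label in zip(predicted, gold):
        labels_by_block[block].append(label)
    # For every block, count the items that carry its most common gold label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in labels_by_block.values()
    )
    return majority_total / len(predicted)


# Hypothetical toy data: six usages, two inferred blocks, two gold senses.
predicted_blocks = [0, 0, 0, 1, 1, 1]
gold_senses = ["a", "a", "b", "b", "b", "b"]
print("purity:", cluster_purity(predicted_blocks, gold_senses))
```

In the toy data, block 0 contributes two correctly grouped usages and block 1 contributes three, so five of the six usages count towards purity.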

With the above partition and distribution we can then visualize the inferred parameters based on the fitted ones. We show a joint picture considering all the communities/blocks in the same plot, but also a breakdown inside each community and between each of them in the detailed picture. 

In [None]:
# create a directory to save all the images
!mkdir -p best_fit
best_fit = "/content/best_fit/"

# get the edges and vertices of the wug
edges = s_gt.get_edges([s_gt.ep.orig_weight])
vertices = s_gt.get_vertices()

# get weights of all the edges between communities
outside_edges = [item for item in edges if b[vertices[int(item[0])]] != b[vertices[int(item[1])]]]
outside_weights = [item[2] for item in outside_edges]

# get weights of all the edges inside every community
inside_edges = [item for item in edges if b[vertices[int(item[0])]] == b[vertices[int(item[1])]]]
inside_weights = [item[2] for item in inside_edges]

# infer the parameters of the specific distribution for the weights both inside and between communities
inferred_p_inside = wsbm.infer_p(inside_weights, distribution=dist)
inferred_p_outside = wsbm.infer_p(outside_weights, distribution=dist)

# get the communities labels
c = Counter(b.a)
communities = list(c.values())

# generate the joint plot
pltutil.plot_values(inside_weights, outside_weights, inferred_p_inside, inferred_p_outside, "joint_" + g_name + ".png", distribution=dist, xticks_shifted=False, path=best_fit)

# compute the wug partition with labels (s_gt is the graph loaded above)
partition = wsbm.compute_partition(s_gt, b)
inferred_ps = dict()

# infer the distribution parameters for/between each community
for k,v in partition.items():
  inferred_ps[k] = wsbm.infer_p(v, distribution=dist)

blocks = list(b.a)

# generate the single plots for each community or between all community pairs
for k in inferred_ps.keys():
  in_community = len(k) == 1

  if in_community:
    title = "Block '" + k + "' - Vertex Count: " + str(blocks.count(int(k))) + " - " + "Edge Count: " + str(len(partition[k]))
  else:
    title = "Between Blocks '" + k + "' - Edge Count: " + str(len(partition[k]))
  pltutil.plot_values_oneside(partition[k], inferred_ps[k], in_community, title, fig_title=k, xticks_shifted=False, path=best_fit, distribution=dist)

# generate a detailed plot showing all the single plots previously generated        
pltutil.combine_community_plots(list(partition.keys()), len(communities), "detailed_distribution_" + g_name + ".png", path=best_fit)
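
The split into inside and outside weights used in the cell above can be sketched without graph-tool. The version below walks the edge list once and sorts each weight by whether both endpoints share a block. The function `split_weights_by_block`, the toy edges, and the block assignment are all hypothetical; in the notebook the edges come from `s_gt.get_edges` and the blocks from `b`.

```python
def split_weights_by_block(edges, block_of):
    """Split edge weights into within-block and between-block lists.

    edges: iterable of (source, target, weight) tuples.
    block_of: mapping from vertex id to block label.
    """
    inside, outside = [], []
    for source, target, weight in edges:
        if block_of[source] == block_of[target]:
            inside.append(weight)
        else:
            outside.append(weight)
    return inside, outside


# Hypothetical toy graph: vertices 0 and 1 in block 0, vertices 2 and 3 in block 1.
toy_edges = [(0, 1, 4.0), (2, 3, 3.0), (0, 2, 1.0), (1, 3, 2.0)]
toy_blocks = {0: 0, 1: 0, 2: 1, 3: 1}
inside_w, outside_w = split_weights_by_block(toy_edges, toy_blocks)
print("inside:", inside_w)
print("outside:", outside_w)
```

A single pass keeps the two lists consistent with each other, since every edge lands in exactly one of them.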
    

We have seen how to find the best distribution for each graph and compute some relevant measures with respect to the generated partition. All this data is saved in two dictionaries for later reuse, because there is no guarantee that the best partition is the same across runs, especially for complex graphs.

In the next sections we work with precomputed values, saved in the data directory in the repository.

In [None]:
# Load the dictionaries
output_path = "/content/Modeling_WUGS_WSBM/data/best_fit/"

with open(output_path + "g_dist_states", 'rb') as f:
  states = pickle.load(f)

with open(output_path + "g_accuracies", 'rb') as f:
  accuracies = pickle.load(f)
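
For reference, the save side of this round trip might look like the sketch below. It uses a temporary directory and a made-up dictionary, so the file name `g_accuracies_demo`, the helpers `save_pickle` and `load_pickle`, and the values are all hypothetical and not the repository's data.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialize obj to path in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Load a pickled object from path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for the accuracies dictionary (graph name -> accuracy).
example_accuracies = {"zersetzen": 0.9, "another_graph": 0.75}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "g_accuracies_demo")
    save_pickle(example_accuracies, file_path)
    restored = load_pickle(file_path)

print("round trip ok:", restored == example_accuracies)
```

Note that pickle can only restore objects whose classes are importable, so a dictionary holding graph-tool states has to be loaded in an environment where graph-tool is installed.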

From those dictionaries we can then plot a lot of useful statistics.

In [None]:
# Plot the number of graphs best fitted by each distribution
# Corresponds to plot_distributions.py
dists = [d for d,s in states.values()]
dist_count = {d:dists.count(d) for d in set(dists)}

dist_labels = dist_count.keys()
dist_values = [dist_count[d] for d in dist_labels]
dist_labels = [d.split("-")[1] for d in dist_labels]

pltutil.plot_dist_dist(dist_labels, dist_values)
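
The counting step above can also be written with `collections.Counter`, which tallies all distributions in one pass. The sketch below assumes, as the cell above does, that each value in `states` is a `(distribution name, state)` pair and that names contain a hyphen. The toy entries and the helper `count_distributions` are hypothetical.

```python
from collections import Counter


def count_distributions(states):
    """Count how many graphs were best fitted by each distribution.

    states: mapping from graph name to a (distribution name, fitted state) pair.
    """
    return Counter(dist for dist, _state in states.values())


# Hypothetical stand-in data; the fitted states are replaced by None.
toy_states = {
    "graph_a": ("discrete-binomial", None),
    "graph_b": ("discrete-binomial", None),
    "graph_c": ("real-normal", None),
}
counts = count_distributions(toy_states)
labels = [name.split("-")[1] for name in counts]
values = [counts[name] for name in counts]
print(labels, values)
```

The resulting `labels` and `values` lists are aligned, so they could be passed to a plotting helper in the same way as `dist_labels` and `dist_values`.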

In [None]:
# Plot the number of blocks of the best fit found for each graph
# Corresponds to plot_number_of_blocks.py
block_counts = {}
for g,v in states.items():
  state = v[1]
  block_counts[g] = len(set(wsbm.get_blocks(state)))

blocks_y = list(block_counts.values())
pltutil.plot_single_stat(block_counts.keys(), blocks_y, "Number of Blocks", limy=False)

In [None]:
# Plot the accuracy of the best fits 
# Corresponds to plot_accuracies.py
acc_y = list(accuracies.values())
pltutil.plot_single_stat(accuracies.keys(), acc_y, "Accuracy")