First, we use networkx to read the data into a network that Gephi can open. We create a node for each user and each video and connect them with edges for each entry in the table (each entry forms a 'triangle' of two users and one video). We do not want any duplicates or self-loops: an nx.Graph stores at most one edge per pair of nodes, and self-loops are removed explicitly at the end. We can then use networkx to write the data to a GEXF file, which Gephi can read.

In [1]:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

df = pd.read_csv('pairwise_52seconds_share.csv')

G = nx.Graph()

for _, row in df.iterrows():
    user_1 = f"user_{row['userID_1']}"
    user_2 = f"user_{row['userID_2']}"
    video_node = f"video_{row['videoID']}"
    
    if not G.has_node(user_1):
        G.add_node(user_1, type='user', color='blue')
    
    if not G.has_node(user_2):
        G.add_node(user_2, type='user', color='blue')
        
    if not G.has_node(video_node):
        G.add_node(video_node, type='video', color='green')
        
    G.add_edge(user_1, video_node, timestamp=row['timestamp_1'])
    G.add_edge(user_2, video_node, timestamp=row['timestamp_2'])
    G.add_edge(user_2, user_1, timestamp=row['timestamp_1'])
    
G.remove_edges_from(nx.selfloop_edges(G))

In [2]:
nx.write_gexf(G, "blue_helm.gexf")
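
The handling of duplicates and self-loops can be checked on a small example. The sketch below uses a hypothetical three-row table (the values are made up for illustration; only the column names match the CSV) containing one repeated row and one row where both users are the same.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy rows; only the column names are taken from the CSV above.
toy = pd.DataFrame({
    "userID_1": [1, 1, 2],
    "userID_2": [2, 2, 2],   # last row pairs user 2 with itself
    "videoID": [10, 10, 11],
})

T = nx.Graph()
for row in toy.itertuples(index=False):
    u1 = f"user_{row.userID_1}"
    u2 = f"user_{row.userID_2}"
    vid = f"video_{row.videoID}"
    # Adding an edge that already exists does not create a second edge.
    T.add_edges_from([(u1, vid), (u2, vid), (u2, u1)])

# Materialise the self-loops before removing them from the graph.
self_loops = list(nx.selfloop_edges(T))
T.remove_edges_from(self_loops)

print(T.number_of_nodes(), T.number_of_edges(), self_loops)
```

The repeated row adds nothing new, and the single self-loop on user_2 is dropped, leaving four nodes and four edges.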


Using the number of nodes and edges of the base network, we can then generate other network types to compare against it. We choose a Watts-Strogatz model and a Barabási-Albert model, as both simulate the social media structure that we expect from the base network and in which we want to find irregularities. The parameters are easily derived from the base network: k (for Watts-Strogatz) is the average number of connections per node in the base network (2m/n, truncated to an integer), and the rewiring probability p is set to 0.1. The attachment parameter of the Barabási-Albert model is computed from the same expression.

In [3]:
n = G.number_of_nodes()
m = G.number_of_edges()

k = int((2 * m) / n)

H = nx.watts_strogatz_graph(n= n, k = k, p = 0.1)

nx.write_gexf(H, "watts_strogatz_random.gexf")

In [4]:

m_par = int((2*m)/n)

K = nx.barabasi_albert_graph(n = n, m = m_par)

nx.write_gexf(K, "barabasi_albert.gexf")
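
It can help to check how these parameters translate into edge counts in the generated graphs. The sketch below uses placeholder sizes (not the values of the base network) and fixed seeds, both of which are assumptions for illustration. In networkx, watts_strogatz_graph joins each node to k // 2 neighbours on each side, so an odd k is effectively rounded down, while barabasi_albert_graph attaches m edges per new node, which gives an average degree of roughly 2m.

```python
import networkx as nx

# Placeholder sizes for illustration only, not the base network's values.
n_nodes, n_edges = 200, 384

k = int((2 * n_edges) / n_nodes)  # 3 for these placeholder sizes

ws = nx.watts_strogatz_graph(n=n_nodes, k=k, p=0.1, seed=0)
ba = nx.barabasi_albert_graph(n=n_nodes, m=k, seed=0)

def average_degree(net):
    # Each edge contributes to the degree of two nodes.
    return 2 * net.number_of_edges() / net.number_of_nodes()

print("target average degree:", 2 * n_edges / n_nodes)
print("Watts-Strogatz:", ws.number_of_edges(), average_degree(ws))
print("Barabasi-Albert:", ba.number_of_edges(), average_degree(ba))
```

Comparing the printed average degrees against the target is a quick sanity check that the baselines are of comparable density before any metrics are computed.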

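We can then define the metrics that we would like to assess and apply them to each network for comparison. Metrics that require a connected graph (rich club, diameter, average shortest path) are computed on the largest connected component.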

In [5]:
def transitivity(net):
    return nx.transitivity(net)


def rich_club(net):
    largest_cc = max(nx.connected_components(net), key=len)
    net_sub = net.subgraph(largest_cc)
    return nx.rich_club_coefficient(net_sub)

def average_degree(net):
    return sum(dict(net.degree()).values()) / net.number_of_nodes()

def deg_assortativity(net):
    return nx.degree_assortativity_coefficient(net)

def diameter(net):
    largest_cc = max(nx.connected_components(net), key=len)
    net_sub = net.subgraph(largest_cc)
    return nx.diameter(net_sub)

def avg_shortest_path(net):
    largest_cc = max(nx.connected_components(net), key=len)
    net_sub = net.subgraph(largest_cc)
    return nx.average_shortest_path_length(net_sub)

In [6]:
nx.is_connected(G)

False
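
Because the base network is not connected, the path-based metrics have to be restricted to one component. The following sketch, on a small hand-made graph (an assumption for illustration, unrelated to the data), shows that nx.diameter raises an error on a disconnected graph and works once the largest connected component is extracted.

```python
import networkx as nx

# Hand-made example: a 4-node path plus a separate 2-node component.
D = nx.Graph()
D.add_edges_from([(0, 1), (1, 2), (2, 3)])
D.add_edge(10, 11)

try:
    nx.diameter(D)
    raised = False
except nx.NetworkXError:
    # Path lengths are infinite between different components.
    raised = True

# Restrict to the largest connected component, as the metric functions do.
largest_cc = max(nx.connected_components(D), key=len)
sub = D.subgraph(largest_cc)

print(raised, nx.diameter(sub), nx.average_shortest_path_length(sub))
```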

In [10]:
def compare_networks(net, base_networks):
    metrics = [transitivity, rich_club, average_degree, deg_assortativity, diameter, avg_shortest_path]
    metric_names = [metric.__name__ for metric in metrics]
    
    results = {metric_name: [] for metric_name in metric_names}
    networks = [('blue_helm', net)] + [(f'baseline{i+1}', base_net) for i, base_net in enumerate(base_networks)]
    
    for name, network in networks:
        for metric in metrics:
            results[metric.__name__].append(metric(network))
            
    return results
    

In [None]:
result = compare_networks(G, [H, K])

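We can show the results by simply printing this dictionary. Each list holds one value per network, in the order the networks were passed to compare_networks: the base network first, then the Watts-Strogatz graph, then the Barabási-Albert graph.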

In [9]:
result

{'transitivity': [0.010092078570901202, 0.3554355065492489, 0],
 'average_degree': [3.8392857142857144, 4.0, 1.9995421245421245],
 'deg_assortativity': [-0.14743884690490577,
  -0.025173702539830305,
  -0.05714008686882524],
 'diameter': [17, 21, 22],
 'avg_shortest_path': [4.771102215238985,
  11.179890533479954,
  7.947287703899795]}
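
A dictionary of this shape can also be turned into a labelled table, which makes it easier to see which value belongs to which network. The sketch below is one possible way to do this with pandas; the toy graphs and their labels are assumptions that stand in for the three networks, and only scalar metrics are included.

```python
import networkx as nx
import pandas as pd

# Toy graphs standing in for the three networks; labels are hypothetical.
graphs = {
    "path": nx.path_graph(5),
    "cycle": nx.cycle_graph(5),
    "complete": nx.complete_graph(4),
}

# Same layout as the result dictionary: metric name -> one value per network.
toy_result = {
    "transitivity": [nx.transitivity(g) for g in graphs.values()],
    "diameter": [nx.diameter(g) for g in graphs.values()],
}

# Rows are metrics, columns are networks.
table = pd.DataFrame(toy_result, index=list(graphs)).T
print(table)
```

With the real results, the column labels would be the base network and the two baselines, in the order they were passed to compare_networks.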