# Static Community Detection Using CDLIB, NetworkX and iGraph - JUST

After calculating similarities and updating the edge list with those values, we run two well-known community detection algorithms: Louvain, using NetworkX and CDlib, and Leiden, using iGraph.

## Importing Edge List w/ Weights to NetworkX

NetworkX's read_weighted_edgelist function expects a plain text file with one <node1> <node2> <weight> triple per line and no header row. Since our data is a CSV file with a header, we load it with Pandas instead and build the graph from the resulting DataFrame.

In [1]:
import pandas as pd

edge_list_df = pd.read_csv('Input/JUST_edge_list_with_similarity.csv')

print(edge_list_df)

            source    target    weight
0        u74717431  t7748381  0.957671
1       u127821914  t3529910  0.426984
2       u174194590  t5762915  0.369877
3       u141847381  t6987845  0.745354
4        u87215499  t4082536  0.946200
...            ...       ...       ...
641279     ci20717       co3 -0.007842
641280     ci20718       co9  0.022677
641281     ci20719       co5 -0.050816
641282     ci20720       co3 -0.178938
641283     ci20721       co4  0.128145

[641284 rows x 3 columns]
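For comparison, the plain-text route described above could look roughly like the sketch below. The small `toy_df` DataFrame and the temporary file name are hypothetical stand-ins, not part of the original notebook, and the approach assumes node IDs contain no whitespace.

```python
import os
import tempfile

import networkx as nx
import pandas as pd

# Hypothetical stand-in for the real edge list (same column layout).
toy_df = pd.DataFrame(
    {
        "source": ["u1", "u2", "ci1"],
        "target": ["t1", "t1", "co1"],
        "weight": [0.9, 0.4, 0.1],
    }
)

# Write "<node1> <node2> <weight>" lines: space-separated, no header, no index.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "edges.txt")
    toy_df.to_csv(path, sep=" ", header=False, index=False)
    # read_weighted_edgelist returns an undirected nx.Graph by default.
    toy_graph = nx.read_weighted_edgelist(path)

print(toy_graph.number_of_nodes(), toy_graph.number_of_edges())
print(toy_graph["u1"]["t1"]["weight"])
```

The notebook itself skips this round trip and works from the DataFrame directly, which also makes it easier to inspect and clean the weights first.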


In [2]:
negative_weights = edge_list_df[edge_list_df['weight'] < 0]
print(f"Number of edges with negative weights: {len(negative_weights)}")

Number of edges with negative weights: 102041


Since Louvain is not designed to handle negative edge weights, we rescale the weights from the range [-1, 1] to [0, 1] using w' = (w + 1) / 2. After rescaling, 0 represents perfect dissimilarity, 0.5 represents orthogonality, and 1 represents perfect similarity.

In [3]:
edge_list_df['weight'] = (edge_list_df['weight'] + 1) / 2

print(edge_list_df)

            source    target    weight
0        u74717431  t7748381  0.978835
1       u127821914  t3529910  0.713492
2       u174194590  t5762915  0.684939
3       u141847381  t6987845  0.872677
4        u87215499  t4082536  0.973100
...            ...       ...       ...
641279     ci20717       co3  0.496079
641280     ci20718       co9  0.511339
641281     ci20719       co5  0.474592
641282     ci20720       co3  0.410531
641283     ci20721       co4  0.564072

[641284 rows x 3 columns]
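As a quick sanity check of the rescaling, the following minimal sketch applies the same linear map to the three reference points. The function name `rescale_similarity` is only illustrative and does not appear in the notebook.

```python
def rescale_similarity(weight: float) -> float:
    """Map a similarity in [-1, 1] linearly onto [0, 1]."""
    return (weight + 1.0) / 2.0


# Reference points: perfect dissimilarity, orthogonality, perfect similarity.
for w in (-1.0, 0.0, 1.0):
    print(w, "->", rescale_similarity(w))
```

Because the map is strictly increasing, the ranking of edges by weight is preserved, but ratios between weights are not. Note also that edges which originally had a negative weight now contribute a positive weight below 0.5.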


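Before creating the graph, we confirm that the edge list describes a simple undirected, weighted graph: no self-loops and no repeated edges. In an undirected graph, A-B and B-A are the same edge, so a reversed pair counts as a repeat. Note that the duplicated check below is order-sensitive: it only flags rows that repeat the same (source, target) pair in the same order.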

In [4]:
duplicate_edges = edge_list_df.duplicated(subset=['source', 'target'], keep=False)
print(f"Number of duplicate edges: {duplicate_edges.sum()}")

self_loops = edge_list_df[edge_list_df['source'] == edge_list_df['target']]
print(f"Number of self-loops: {len(self_loops)}")

print(edge_list_df.isnull().sum())

Number of duplicate edges: 0
Number of self-loops: 0
source    0
target    0
weight    0
dtype: int64


In [5]:
# Find duplicate edges (ignoring the weight column)
duplicate_edges = edge_list_df.duplicated(subset=['source', 'target'], keep=False)

# Filter to get only the duplicate edges
parallel_edges_df = edge_list_df[duplicate_edges]

# Sort to better visualize parallel edges
parallel_edges_sorted = parallel_edges_df.sort_values(by=['source', 'target'])

print(parallel_edges_sorted)

Empty DataFrame
Columns: [source, target, weight]
Index: []
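Since `DataFrame.duplicated` compares `source` and `target` in column order, a reversed pair such as (a, b) and (b, a) is not flagged by the checks above. A minimal sketch of an order-insensitive check follows; the helper name `undirected_duplicates` and the `toy` data are hypothetical, for illustration only.

```python
import pandas as pd


def undirected_duplicates(df: pd.DataFrame) -> pd.Series:
    """Flag rows whose endpoints repeat another row, ignoring direction."""
    # Sort each endpoint pair so (a, b) and (b, a) produce the same key.
    keys = pd.Series(
        [tuple(sorted(pair)) for pair in zip(df["source"], df["target"])],
        index=df.index,
    )
    return keys.duplicated(keep=False)


toy = pd.DataFrame(
    {
        "source": ["a", "b", "c"],
        "target": ["b", "a", "d"],
        "weight": [0.1, 0.2, 0.3],
    }
)

ordered = toy.duplicated(subset=["source", "target"], keep=False)
unordered = undirected_duplicates(toy)
print("Order-sensitive duplicates:", int(ordered.sum()))
print("Order-insensitive duplicates:", int(unordered.sum()))
```

On the real data this would be called as `undirected_duplicates(edge_list_df)`; building one tuple per row is slow but adequate for a one-off check.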


We also check the weight column for non-numeric values, since these would not be valid edge weights once loaded into a graph object.

In [6]:
# Check for any non-numeric values in the 'weight' column
non_numeric_weights = edge_list_df[pd.to_numeric(edge_list_df['weight'], errors='coerce').isna()]

# Display rows with non-numeric or NaN weights
print(non_numeric_weights)

Empty DataFrame
Columns: [source, target, weight]
Index: []


In [7]:
# Convert the 'weight' column to floating point values
edge_list_df['weight'] = edge_list_df['weight'].astype(float)

# Check the data type of the column to confirm the conversion
print(edge_list_df.dtypes)

source     object
target     object
weight    float64
dtype: object


In [8]:
display(edge_list_df)

Unnamed: 0,source,target,weight
0,u74717431,t7748381,0.978835
1,u127821914,t3529910,0.713492
2,u174194590,t5762915,0.684939
3,u141847381,t6987845,0.872677
4,u87215499,t4082536,0.973100
...,...,...,...
641279,ci20717,co3,0.496079
641280,ci20718,co9,0.511339
641281,ci20719,co5,0.474592
641282,ci20720,co3,0.410531


## Creating Undirected Weighted NX Graph

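We iterate over the rows of the edge list DataFrame and add each edge, together with its weight, to a new NetworkX graph. Note that the cell below creates an nx.MultiGraph, which is undirected but can hold parallel edges, so every row becomes its own edge.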


In [9]:
import networkx as nx

def get_graph_info(graph):
    print("Number of nodes:", graph.number_of_nodes())
    print("Number of edges:", graph.number_of_edges())
    
    # Checking the graph type to provide appropriate information
    if isinstance(graph, nx.DiGraph):
        print("Graph is Directed")
    else:
        print("Graph is Undirected")


In [10]:
# Initialize a new graph
G = nx.MultiGraph()

# Add edges and weights
for index, row in edge_list_df.iterrows():
    source = row['source']
    target = row['target']
    weight = row['weight']
    
    # Add the edge with weight
    G.add_edge(source, target, weight=weight)

In [11]:
get_graph_info(G)

Number of nodes: 245760
Number of edges: 641284
Graph is Undirected
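Row-by-row iteration is easy to follow but slow on several hundred thousand edges. As an alternative sketch, NetworkX can build the graph from the DataFrame in one call with `from_pandas_edgelist`. The example below uses a hypothetical `toy_df` and, unlike the cell above, a plain `nx.Graph`, which is an assumption: in a plain `Graph`, repeated node pairs collapse into a single edge instead of being kept as parallel edges.

```python
import networkx as nx
import pandas as pd

# Hypothetical stand-in for edge_list_df.
toy_df = pd.DataFrame(
    {
        "source": ["u1", "u2", "ci1"],
        "target": ["t1", "t1", "co1"],
        "weight": [0.9, 0.4, 0.1],
    }
)

# Build an undirected weighted graph directly from the DataFrame columns.
toy_graph = nx.from_pandas_edgelist(
    toy_df,
    source="source",
    target="target",
    edge_attr="weight",
    create_using=nx.Graph,
)

print(toy_graph.number_of_nodes(), toy_graph.number_of_edges())
```

If the node and edge counts match those reported by the loop-based construction, the two graphs describe the same edge set.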


## Running Louvain Using CDLIB + NX

CDlib (Community Discovery Library) is designed for community detection and analysis, providing easy access to various algorithms, including Louvain and Leiden, and tools for evaluating and visualizing the results.

In [12]:
from cdlib import algorithms


Note: to be able to use all crisp methods, you need to install some additional packages:  {'graph_tool', 'wurlitzer', 'infomap', 'bayanpy'}
Note: to be able to use all crisp methods, you need to install some additional packages:  {'pyclustering', 'ASLPAw'}
Note: to be able to use all crisp methods, you need to install some additional packages:  {'wurlitzer', 'infomap'}


In [13]:
communities_louvain = algorithms.louvain(G)

# Accessing the number of communities/partitions
num_partitions = len(communities_louvain.communities)
print(f"Number of partitions: {num_partitions}")

# Accessing modularity
modularity = communities_louvain.newman_girvan_modularity().score
print(f"Modularity: {modularity}")

Number of partitions: 47135
Modularity: 0.6535567993402418
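A partition count this large is easier to interpret alongside the distribution of community sizes. The sketch below summarizes a list of communities given as lists of node IDs, which is the shape of `communities_louvain.communities` as used above. The helper name `summarize_community_sizes` and the toy input are hypothetical.

```python
from collections import Counter


def summarize_community_sizes(communities):
    """Return basic size statistics for a list of communities."""
    sizes = [len(community) for community in communities]
    size_counts = Counter(sizes)
    return {
        "n_communities": len(sizes),
        "largest": max(sizes, default=0),
        "singletons": size_counts.get(1, 0),
        "pairs": size_counts.get(2, 0),
    }


# Toy input: one community of three, one pair, two singletons.
toy_communities = [["a", "b", "c"], ["d", "e"], ["f"], ["g"]]
summary = summarize_community_sizes(toy_communities)
print(summary)
```

On the real result this would be called as `summarize_community_sizes(communities_louvain.communities)` to see how many of the partitions are very small.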


## Creating Undirected Weighted iGraph for Leiden

Because running Leiden on the MusicMicro dataset was problematic, we decided to isolate the issue and use iGraph directly, as suggested by Leiden's authors.

We convert the edge list DataFrame rows into a list of (source, target, weight) tuples and pass it to iGraph, which stores the third element of each tuple as the weight edge attribute of a new graph.

In [14]:
import igraph as ig

# Build (source, target, weight) tuples from the edge list DataFrame
edges_with_weights = list(
    edge_list_df[['source', 'target', 'weight']].itertuples(index=False, name=None)
)

# Create the undirected igraph Graph; the third tuple element becomes the 'weight' edge attribute
g = ig.Graph.TupleList(edges_with_weights, directed=False, edge_attrs=['weight'])

# Now, check again if the weights have been correctly assigned
print(g.es['weight'][:5])
print(g.summary())

[0.9788354573755722, 0.7134917524156085, 0.6849385653326384, 0.8726768455516055, 0.9730999967361242]
IGRAPH UNW- 245760 641284 -- 
+ attr: name (v), weight (e)


## Running Leiden Using iGraph + leidenalg

In [15]:
import leidenalg

# Run the Leiden algorithm
partition = leidenalg.find_partition(g, partition_type=leidenalg.ModularityVertexPartition, weights='weight')

# Extract the number of communities
num_communities = len(partition)

# Calculate the modularity score
modularity = partition.modularity

print(f"Number of communities: {num_communities}")
print(f"Modularity score: {modularity}")

Number of communities: 595
Modularity score: 0.5911447964957371
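Both runs report a modularity score, so it may help to have a small reference implementation to check scores against on toy inputs. The sketch below computes weighted Newman-Girvan modularity for an undirected edge list as the sum over communities of (internal weight / total weight) minus (community strength / (2 × total weight))². The function name and toy data are hypothetical, and the outputs above do not show whether each library call applies the edge weights in the same way, so the two reported scores may not be strictly comparable.

```python
from collections import defaultdict


def weighted_modularity(edges, membership):
    """Weighted modularity of an undirected graph without parallel edges.

    edges: iterable of (u, v, weight); membership: dict node -> community ID.
    """
    total = 0.0                      # total edge weight m
    internal = defaultdict(float)    # weight of edges inside each community
    strength = defaultdict(float)    # sum of weighted degrees per community
    for u, v, w in edges:
        total += w
        strength[membership[u]] += w
        strength[membership[v]] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w
    return sum(
        internal[c] / total - (strength[c] / (2.0 * total)) ** 2
        for c in strength
    )


# Two triangles joined by a single bridge edge, all weights 1.
toy_edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
toy_membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = weighted_modularity(toy_edges, toy_membership)
print(round(q, 4))
```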


## Saving Node List w/ Community Assignments

To visualize the partitions, we iterate through each community and assign its ID to every node in it. This lets us color-code the nodes when visualizing, to see which nodes were grouped together.

### Load Node ID List

The node ID list was copied from the Similar+Weights folder.

In [16]:
# Load node IDs into a DataFrame
node_df = pd.read_csv('Input/JUST_node_indexes.txt', header=None, names=['nodeID'])

display(node_df)

Unnamed: 0,nodeID
0,u174194590
1,u26432623
2,t141574
3,t1479214
4,t141567
...,...
245755,ci15519
245756,ci4030
245757,u464842393
245758,ci11319


In [17]:
def get_type(node_id):
    if node_id.startswith('t'):
        return 'track'
    elif node_id.startswith('a'):
        return 'artist'
    elif node_id.startswith('u'):
        return 'user'
    elif node_id.startswith('ci'):
        return 'city'
    elif node_id.startswith('co'):
        return 'country'
    else:
        return 'unknown'

node_df['type'] = node_df['nodeID'].apply(get_type)

display(node_df)

Unnamed: 0,nodeID,type
0,u174194590,user
1,u26432623,user
2,t141574,track
3,t1479214,track
4,t141567,track
...,...,...
245755,ci15519,city
245756,ci4030,city
245757,u464842393,user
245758,ci11319,city


In [18]:
import matplotlib.cm as cm
import matplotlib

### For Louvain

In [19]:
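# Number of Louvain communities, used to size the colormap
n_communities = len(communities_louvain.communities)
# cm.get_cmap was removed in Matplotlib 3.9; the colormap registry (Matplotlib >= 3.6) is used instead
colors = matplotlib.colormaps['viridis'].resampled(n_communities)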

# Initialize the mapping dictionary
node_community_color_map = {}

for community_id, community_nodes in enumerate(communities_louvain.communities):
    color = colors(community_id / n_communities)  # Get a color from the colormap
    color_hex = matplotlib.colors.rgb2hex(color)  # Convert the color to hex format
    
    for node in community_nodes:
        node_community_color_map[str(node)] = {"communityID": community_id, "color": color_hex}

In [20]:
node_df_louvain = node_df.copy()

# Add community ID and color to the DataFrame (the type column is already present)
node_df_louvain['communityID'] = node_df['nodeID'].apply(lambda x: node_community_color_map[x]['communityID'] if x in node_community_color_map else -1)
node_df_louvain['color'] = node_df['nodeID'].apply(lambda x: node_community_color_map[x]['color'] if x in node_community_color_map else '#000000')

display(node_df_louvain)

Unnamed: 0,nodeID,type,communityID,color
0,u174194590,user,107,#440154
1,u26432623,user,39,#440154
2,t141574,track,0,#440154
3,t1479214,track,0,#440154
4,t141567,track,0,#440154
...,...,...,...,...
245755,ci15519,city,39819,#98d83e
245756,ci4030,city,11992,#3a538b
245757,u464842393,user,368,#440256
245758,ci11319,city,19643,#287c8e
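The two `apply` calls above perform one dictionary lookup per row for each column. A possible alternative, sketched below on hypothetical toy data, turns the mapping dictionary into a lookup table and joins it onto the node list in one step, filling in the same defaults (-1 and #000000) for nodes without a community.

```python
import pandas as pd

# Hypothetical stand-ins for node_df and node_community_color_map.
toy_nodes = pd.DataFrame({"nodeID": ["u1", "t1", "ci1"]})
toy_map = {
    "u1": {"communityID": 0, "color": "#440154"},
    "t1": {"communityID": 1, "color": "#fde725"},
}

# One row per node ID, with communityID and color as columns.
lookup = pd.DataFrame.from_dict(toy_map, orient="index")

# Left join on nodeID; nodes missing from the map get NaN, then defaults.
merged = toy_nodes.join(lookup, on="nodeID")
merged["communityID"] = merged["communityID"].fillna(-1).astype(int)
merged["color"] = merged["color"].fillna("#000000")

print(merged)
```

The same pattern would apply unchanged to the Leiden mapping, since both dictionaries share the same structure.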


In [21]:
# Select relevant columns if necessary and export to CSV
node_df_louvain[['nodeID', 'communityID', 'color', 'type']].to_csv('Output/node_metadata_JUST_Louvain.csv', index=False, sep=';')

### For Leiden

In [22]:
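# Number of Leiden communities, used to size the colormap
n_communities = len(partition)
# cm.get_cmap was removed in Matplotlib 3.9; the colormap registry (Matplotlib >= 3.6) is used instead
colors = matplotlib.colormaps['viridis'].resampled(n_communities)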

# Initialize the mapping dictionary for Leiden communities
node_community_color_map = {}

# Mapping nodes to communities and colors
for community_id, community_nodes in enumerate(partition):
    color = colors(community_id / n_communities)  # Get a color from the colormap
    color_hex = matplotlib.colors.rgb2hex(color)  # Convert the color to hex format
    
    for node in community_nodes:
        node_community_color_map[str(g.vs[node]['name'])] = {"communityID": community_id, "color": color_hex}

In [23]:
node_df_leiden = node_df.copy()

# Add community ID and color to the DataFrame based on Leiden results
node_df_leiden['communityID'] = node_df['nodeID'].apply(lambda x: node_community_color_map[x]['communityID'] if x in node_community_color_map else -1)
node_df_leiden['color'] = node_df['nodeID'].apply(lambda x: node_community_color_map[x]['color'] if x in node_community_color_map else '#000000')

# Display the updated DataFrame
display(node_df_leiden)

Unnamed: 0,nodeID,type,communityID,color
0,u174194590,user,3,#440256
1,u26432623,user,10,#46075a
2,t141574,track,2,#440154
3,t1479214,track,1,#440154
4,t141567,track,1,#440154
...,...,...,...,...
245755,ci15519,city,1,#440154
245756,ci4030,city,5,#450457
245757,u464842393,user,22,#470e61
245758,ci11319,city,12,#46085c


In [24]:
# Select relevant columns if necessary and export to CSV
node_df_leiden[['nodeID', 'communityID', 'color', 'type']].to_csv('Output/node_metadata_JUST_Leiden.csv', index=False, sep=';')