## Telegram Network Preparation

This step loads the raw Telegram interaction networks (in GML format) and prepares them for backbone extraction. For each dataset (`1ROUN`, `2ROUN`, `RIOTS`), the following operations are performed:

- Convert the original directed graph to an edge list
- Prepare two weight dimensions:
  - **`weight`**: number of interactions between users
  - **`deltasmed`**: median time between messages, converted to minutes, rounded up, and **inverted** (so shorter times represent stronger ties)
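
As a rough illustration of the `deltasmed` transformation described above, the sketch below applies the same three steps (convert to minutes, round up, invert) to a handful of made-up values. The input numbers are hypothetical, and it assumes the raw values are in seconds, as the division by 60 in the notebook code suggests.

```python
import numpy as np
import pandas as pd

# Hypothetical median gaps between messages, assumed to be in seconds
deltas_seconds = pd.Series([30, 60, 90, 600, 3600])

# Step 1 and 2: convert to minutes and round up to whole minutes
minutes = np.ceil(deltas_seconds / 60)

# Step 3: invert so that the shortest gap receives the largest weight
inverted = minutes.max() + 1 - minutes

print(pd.DataFrame({
    "seconds": deltas_seconds,
    "minutes": minutes,
    "inverted": inverted,
}))
```

With this inversion the smallest possible weight is 1 (for the longest gap), so every edge keeps a strictly positive weight.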

> ⚠️ The **Polya Urn backbone extraction** is not performed here. It must be executed externally using the official **MATLAB implementation**:
>
> - [MATLAB Code](https://www.mathworks.com/matlabcentral/fileexchange/69501-pf)  
> - [Related Paper](https://www.nature.com/articles/s41467-019-08667-3)

The edge lists with transformed weights are saved in the `edgelist/` folder and are ready for filtering.
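
The graph-to-edge-list conversion can be tried on a toy graph before running it on the real GML files. The sketch below is only illustrative: the two edges and their attribute values are invented, and only the attribute names (`weight`, `deltasmed`) and the column names (`src`, `trg`) follow the notebook.

```python
import networkx as nx

# Toy directed graph with the two edge attributes used in this notebook
# (node ids and attribute values are made up for illustration)
g = nx.DiGraph()
g.add_edge(0, 1, weight=5, deltasmed=120.0)
g.add_edge(1, 2, weight=2, deltasmed=45.0)

# One row per edge; edge attributes become columns
edges = nx.to_pandas_edgelist(g, source="src", target="trg")
print(edges)
```

Each edge attribute stored in the graph becomes its own column, which is why the later cells can address `weight` and `deltasmed` directly on the DataFrame.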


In [4]:
%%time

import pandas as pd
import numpy as np
import networkx as nx
import os

# List of Telegram datasets (one per context/event)
datasets = ['1ROUN', '2ROUN', 'RIOTS']

for dataset_name in datasets:
    print(f"Processing: {dataset_name}")

    # Load the directed interaction graph from GML format
    graph = nx.read_gml(f'dataset/2time_{dataset_name}_graph.gml', label='id')

    # Convert to DataFrame (edge list format)
    df = nx.to_pandas_edgelist(graph, source='src', target='trg')
    df = df.drop_duplicates(subset=['src', 'trg'])

    # -------------------------------
    # Count-based backbone (weight)
    # -------------------------------
    weight_col = 'weight'
    df[weight_col] = df[weight_col].astype(int)

    # ⚠️ NOTE: The actual backbone extraction must be done using the MATLAB implementation of the Polya Filter:
    # - https://www.mathworks.com/matlabcentral/fileexchange/69501-pf
    # - https://www.nature.com/articles/s41467-019-08667-3

    # -------------------------------
    # Time-based backbone (deltasmed)
    # -------------------------------
    weight_col = 'deltasmed'

    # Convert and normalize deltasmed weights
    df[weight_col] = df[weight_col] / 60            # Convert to minutes
    df[weight_col] = df[weight_col].apply(np.ceil)  # Round up
    max_value = df[weight_col].max()
    df[weight_col] = max_value + 1 - df[weight_col] # Invert: shorter = stronger

    # Backbone extraction on this column is also done in MATLAB (see the note above).

    # Save the edge list with the transformed weights, ready for external filtering
    os.makedirs("edgelist", exist_ok=True)
    df.to_csv(f"edgelist/{dataset_name}-Backbone-{weight_col}.edgelist", index=False)


Processing backbone classification for: 1ROUN
Edge class distribution for 1ROUN:
edge_class
1    67.051553
3    32.550840
4     0.354523
2     0.043084 

Processing backbone classification for: 2ROUN
Edge class distribution for 2ROUN:
edge_class
1    66.440084
3    33.298427
4     0.213158
2     0.048332 

Processing backbone classification for: RIOTS
Edge class distribution for RIOTS:
edge_class
1    98.968274
2     0.601019
3     0.426484
4     0.004223 



## CDFs (Telegram)

This section presents the **Empirical Cumulative Distribution Functions (ECDFs)** of two edge weight dimensions for Telegram networks:

- **`nij_c`**: number of messages exchanged between users (interaction count)
- **`nij_t`**: median time between messages (in minutes, inverted and normalized)

ECDFs are plotted **by edge class**, allowing comparison of weight distributions across structural categories.
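
For reference, an ECDF can also be computed by hand. The sketch below uses a hypothetical helper named `ecdf` (not part of the notebook) to show the quantity that `sns.ecdfplot` draws for each class: the fraction of observations less than or equal to each sorted value.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the fraction of observations <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Toy sample (illustrative values only)
x, y = ecdf([3, 1, 2, 2])
for xi, yi in zip(x, y):
    print(f"P(X <= {xi:g}) reaches {yi:.2f}")
```

Tied values appear as consecutive steps here; in a plot they collapse into a single larger jump at that value.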


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

# Reset Seaborn style
sns.reset_defaults()

# Create output folder if it doesn't exist
os.makedirs("figs", exist_ok=True)

# Datasets to process
datasets = ['1ROUN', '2ROUN', 'RIOTS']

# ECDF plotting function
def plot_ecdf_for_classes(df, variable, save_path, x_axis_label, log_x_axis=False):
    """
    Plot ECDF of a given variable, grouped by edge class.

    Parameters:
    - df (pd.DataFrame): DataFrame containing 'edge_class' and the variable
    - variable (str): Column to plot ('nij_c' or 'nij_t')
    - save_path (str): Path to save the figure
    - x_axis_label (str): Label for the x-axis
    - log_x_axis (bool): Whether to apply log scale to x-axis
    """
    plt.figure(figsize=(4, 3))

    # Plot ECDF by class
    for cls in sorted(df['edge_class'].unique()):
        subset = df[df['edge_class'] == cls]
        sns.ecdfplot(data=subset, x=variable, label=f'Class {cls}')

    if log_x_axis:
        plt.xscale('log')

    plt.xlabel(x_axis_label)
    plt.ylabel('P(X ≤ x)')
    plt.legend(title='Edge Class', loc='lower right')
    plt.tight_layout()
    plt.savefig(save_path)
    plt.close()

# ----------------------------------------
# Generate ECDFs for each dataset
# ----------------------------------------
for dataset_name in datasets:
    print(f"Plotting ECDFs for: {dataset_name}")

    df = pd.read_csv(f'edgelist/{dataset_name}-Backbone-Final.csv')

    # Apply small transformation for time (if needed)
    df['nij_t'] = df['nij_t'] + 1

    # Plot temporal ECDF
    plot_ecdf_for_classes(
        df=df,
        variable='nij_t',
        save_path=f'figs/{dataset_name}-deltasmed.pdf',
        x_axis_label='Avg. time between messages (minutes)',
        log_x_axis=False
    )

    # Plot interaction count ECDF
    plot_ecdf_for_classes(
        df=df,
        variable='nij_c',
        save_path=f'figs/{dataset_name}-weight.pdf',
        x_axis_label='# of messages',
        log_x_axis=True
    )


## Edge Class Diversity per User (Telegram)

This step analyzes the **diversity of structural roles** played by users in the Telegram backbone networks. For each dataset (`1ROUN`, `2ROUN`, `RIOTS`):

- The script counts how many **distinct edge classes** (from 1 to 4) each user is involved in.
- It then plots the **fraction of users** connected through 1, 2, 3, or 4 edge classes.

This reveals the extent to which users engage in **multiple types of interactions** within the network.
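
The counting logic can be checked on a small example. The sketch below uses an invented four-edge table with the same column names as the backbone files (`src`, `trg`, `edge_class`); the users and classes are hypothetical.

```python
import pandas as pd

# Toy classified edge list (made-up users and classes)
edges = pd.DataFrame({
    "src": ["a", "a", "b", "c"],
    "trg": ["b", "c", "c", "d"],
    "edge_class": [1, 3, 1, 1],
})

# Stack sources and targets so each row is one (user, edge_class) incidence
per_user = edges.melt(id_vars="edge_class", value_vars=["src", "trg"],
                      value_name="user")

# Number of distinct edge classes each user takes part in
n_classes = per_user.groupby("user")["edge_class"].nunique()

# Fraction of users by number of distinct classes
fractions = n_classes.value_counts(normalize=True).sort_index()
print(n_classes)
print(fractions)
```

A user counts toward a class whether they appear as the source or the target of an edge, which is why both columns are melted into a single `user` column.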

The bar plots are saved in the `figs/` folder with the pattern `figs/{dataset_name}_User_EdgeClass_Distribution.pdf`.



In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import os

# List of Telegram dataset identifiers
datasets = ['1ROUN', '2ROUN', 'RIOTS']

# Maximum Y-axis limit for plot
ylim_fraction = 1

# Ensure output folder exists
os.makedirs('figs', exist_ok=True)

for dataset_name in datasets:
    file_path = f"edgelist/{dataset_name}-Backbone-Final.csv"

    try:
        # Load classified backbone data
        df = pd.read_csv(file_path)

        # Melt source and target nodes into a single 'user' column with role label
        edges_per_user = df.melt(
            id_vars='edge_class',
            value_vars=['src', 'trg'],
            var_name='role',
            value_name='user'
        )

        # Count how many distinct edge classes each user participates in
        user_class_counts = edges_per_user.groupby('user')['edge_class'].nunique().reset_index()

        # Aggregate: fraction of users by number of distinct edge classes
        distribution = user_class_counts['edge_class'].value_counts(normalize=True).sort_index()

        # Plot bar chart
        plt.figure(figsize=(4, 3))
        plt.bar(distribution.index, distribution.values, color='skyblue')

        # Labeling and styling
        plt.xlabel('# Distinct Edge Classes')
        plt.ylabel('Fraction of Users')
        plt.xticks(range(1, 5))  # a user can be involved in 1 to 4 distinct classes
        plt.ylim(0, ylim_fraction)
        plt.tight_layout()

        # Save plot
        output_path = f"figs/{dataset_name}_User_EdgeClass_Distribution.pdf"
        plt.savefig(output_path)
        plt.close()

    except FileNotFoundError:
        print(f"File not found: {file_path}")
