# Generating and Updating Networks, and Polarization Experiments

## The Imports

In [1]:
from src.classes.network import RandomNetwork, ScaleFreeNetwork
from src.experimentation import generate_networks, read_and_load_networks, multiple_correlations_par
from src.viusalization import plot_cascade_animation, statistics_cascades, plot_cascades_gamma, plot_cascade_dist_average, plot_cascade_power_law
from src.assortativity_exp import run_assortativity_experiment, assortativity_significance
from src.social_ties_exp import run_social_ties_experiment
from collections import defaultdict
import numpy as np

#### Important Note
Currently, many of the directories in the experimentation files are set to a dummy directory to prevent overwriting important data. The directories used for reading in data for experimentation, however, point to the proper locations. To experiment with self-generated networks, set the directories in experimentation.py to the correct ones.

## Global Values

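- correlations: values of the news correlation ($\gamma$), from -1 to 1 in steps of 0.2
- initial_seeds: 11 starting random seeds for network generation (one for each correlation value)
- num_runs: number of networks generated for each value of the news correlation
- num_nodes: total number of nodes in the network
- update_fraction: the fraction of nodes that sample the news directly
- average_degree: target average degree of the networks
- starting_distribution: fraction of nodes with identity L
- p: probability of creating an edge in a random network, chosen so that the expected degree equals average_degree
- updates: number of update rounds performed on each network
- m: number of edges per node in a scale-free network
- what_net: the network type used, "random" or "scale_free"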
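
The parameters p and m both target the same mean degree. In a G(N, p) random network, each node has N - 1 potential neighbors, each present with probability p, so the expected degree is p(N - 1); with num_nodes = 200 and average_degree = 8, p = 8/199 ≈ 0.0402. Assuming ScaleFreeNetwork follows a Barabási-Albert-type construction, every new node brings m edges and each edge adds to the degree of two nodes, so the mean degree approaches 2m = 8. The sketch below checks both numbers with generic textbook constructions; erdos_renyi_edges, barabasi_albert_edges and mean_degree are hypothetical helpers, not the project's RandomNetwork or ScaleFreeNetwork, whose details (e.g. the seed graph) may differ.

```python
import random
from itertools import combinations


def erdos_renyi_edges(n, p, seed):
    """G(n, p): each of the n(n-1)/2 node pairs is linked independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def barabasi_albert_edges(n, m, seed):
    """Preferential attachment: each new node links to m distinct existing nodes, chosen proportional to degree."""
    rng = random.Random(seed)
    edges = list(combinations(range(m + 1), 2))  # seed graph: complete graph on m + 1 nodes (an assumption)
    # each node appears once per incident edge, so uniform sampling from this list is degree-proportional
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


def mean_degree(n, edges):
    return 2 * len(edges) / n  # every edge contributes to the degree of two nodes


num_nodes, average_degree, m = 200, 8, 4
p = average_degree / (num_nodes - 1)  # expected degree p * (N - 1) equals average_degree
er_mean = mean_degree(num_nodes, erdos_renyi_edges(num_nodes, p, seed=13))
ba_mean = mean_degree(num_nodes, barabasi_albert_edges(num_nodes, m, seed=13))
print(f"p = {p:.4f}, random: <k> = {er_mean:.2f}, scale-free: <k> = {ba_mean:.2f}")
```

With a complete graph on m + 1 = 5 nodes as seed graph, the scale-free sketch always has 10 + 195 · 4 = 790 edges, so its mean degree is 7.9, slightly below 2m because the seed graph contributes fewer than m edges per node.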

In [2]:
correlations = np.linspace(-1, 1, 11)
correlations = np.round(correlations, 1)
initial_seeds = np.linspace(13, 1600, 11)
num_runs = 30
num_nodes = 200
update_fraction = 0.1
average_degree = 8
starting_distribution = 0.5
p = average_degree/(num_nodes-1) 
updates = 300000
m = 4

# vary between random and scale_free
# what_net = "random"
what_net = "scale_free"

## Example Network Generation

Generating and updating ScaleFreeNetwork

In [3]:
# # Use plot=True to see a plot of the initial distribution
# network = ScaleFreeNetwork(m=m, plot=True)

# for _ in range(10000):
#     network.update_round()

# # Use this function to plot the distribution at the end
# network.verify_scale_free_distribution(plot=True)

Generating and updating RandomNetwork

In [4]:
# Instead of calling Network(), you now call RandomNetwork() or ScaleFreeNetwork()
# # You can pass the same arguments as for Network(), such as p=0.1, k=8. If left empty, default values are used.

# random_network = RandomNetwork()
# for _ in range(10000):
#     random_network.update_round()

## Developing and saving network
This function generates a network (scale-free or random), performs the specified number of updates and writes the result to a .txt file for easy further experimentation. The generation is parallelized, but it can still take two to three hours.
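
The exact file layout written by generate_networks is not shown in this notebook. As a minimal sketch, assuming a hypothetical plain-text format with a seed header line followed by one "u v" pair of node IDs per edge, saving and reloading a network could look like this (save_network and load_network are illustrative helpers, not the project's functions):

```python
import os
import tempfile


def save_network(path, seed, edges):
    """Write the seed on the first line, then one 'u v' pair of node IDs per line (hypothetical format)."""
    with open(path, "w") as f:
        f.write(f"seed {seed}\n")
        for u, v in edges:
            f.write(f"{u} {v}\n")


def load_network(path):
    """Read back what save_network wrote: (seed, list of (u, v) tuples)."""
    with open(path) as f:
        seed = int(f.readline().split()[1])
        edges = [tuple(int(x) for x in line.split()) for line in f if line.strip()]
    return seed, edges


edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
path = os.path.join(tempfile.mkdtemp(), "example_network.txt")
save_network(path, seed=13, edges=edges)
seed, loaded = load_network(path)
print(seed, loaded == edges)  # 13 True
```

Storing the seed next to the edges is what makes a saved network checkable later: the same generation can be replayed and compared with the file.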

In [None]:
# # dummy values
# num_runs = 10
# updates=1000
# # network type is set by what_net
# generate_networks(correlations, initial_seeds, num_nodes=num_nodes, iterations=updates, how_many=num_runs, update_fraction=update_fraction, starting_distribution=starting_distribution, p=p, network_sort=what_net, m=m)

## Reading in and generating networks

These functions read in the networks from the .txt files in which they were saved. As each network is fully seeded and thus reproducible, it can be resimulated with the correct seed and connections.

As a check on the validity of the networks, the test boolean below can be set to True. This check takes about 5 minutes.

This method of reading in the networks is later used for the social ties and assortativity experiments, as it allows for efficient experimentation. For the cascade experiments, the networks are read in dynamically, since the cascades are run in parallel.
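
The reproducibility argument rests on one seeded random generator driving every random choice, so replaying generation and updates with the same seed yields the same network. The toy sketch below illustrates this with a deliberately simplified, hypothetical update rule (drop one random edge, add one random edge); simulate is an illustrative helper, not the project's update_round. Like the consistency test below, it compares networks as sets of edges, which makes the comparison independent of storage order.

```python
import random
from itertools import combinations


def simulate(n, p, rounds, seed):
    """Toy 'generate, then update' run; the update rule is NOT the project's."""
    rng = random.Random(seed)  # a single seeded generator drives every random choice
    edges = {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}
    for _ in range(rounds):
        # hypothetical update: remove a random existing edge, then add a random node pair
        edges.discard(rng.choice(sorted(edges)))  # sorted() gives a well-defined order to choose from
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return edges


first = simulate(200, 8 / 199, rounds=1000, seed=13)
replay = simulate(200, 8 / 199, rounds=1000, seed=13)
print(first == replay)  # True: same seed, same sequence of random choices
```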

In [6]:
# # # dummy values
# # num_runs = 5
# # updates=1000

# this test only works if the read in network is exactly the same as the generated networks, so check the paths before running!!
test=False


# Read in the networks and store them in a data structure keyed by (correlation, run)
# whichtype must match what_net, otherwise the consistency test below compares different network types
all_networks = read_and_load_networks(num_runs, num_nodes, update_fraction, average_degree, starting_distribution, correlations, whichtype=what_net)


def edge_ids(network):
    # set of (node ID, node ID) pairs, so two networks can be compared independently of edge order
    return {(conn[0].ID, conn[1].ID) for conn in network.connections}


# test for consistency of the saved network
if test:
    used_seed = int(initial_seeds[0])
    if what_net == "scale_free":
        test_network = ScaleFreeNetwork(num_nodes=num_nodes, m=m, mean=0, correlation=-1.0, update_fraction=update_fraction, starting_distribution=starting_distribution, seed=used_seed)
    else:
        test_network = RandomNetwork(num_nodes=num_nodes, mean=0, correlation=-1.0, update_fraction=update_fraction, starting_distribution=starting_distribution, seed=used_seed, p=p)
    number_of_alterations = 0

    # index 0: the saved network before updating
    assert edge_ids(all_networks[(-1.0, 0)][0]) == edge_ids(test_network), "The networks that are generated should be the same at the start"

    # replay the updates with the same seed
    for _ in range(updates):
        test_network.update_round()
        number_of_alterations += test_network.alterations
        test_network.clean_network()

    # index 1: the saved network after all updates
    assert edge_ids(all_networks[(-1.0, 0)][1]) == edge_ids(test_network), "The networks that are generated should be the same at the end"



# Experimentation (Cascades, Assortativity, Social Ties)

### Cascades (Parallelized Implementation)

The process begins by reading in the network and organizing data into structures based on cascade size and correlation value.

Cascades are run while keeping the network structure fixed, measuring both cascade sizes and polarization. These measurements are used to create distributions. A cascade forms when activated nodes sequentially trigger their neighbors, and cascades merge if they share one or more common nodes. The polarization of a cascade (how imbalanced the proportion of political identities is within it) serves as a metric for overall network polarization.
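
How cascades are detected is not spelled out here, but the two steps described above, merging cascades that share nodes and scoring their imbalance, can be sketched as follows. The polarization formula |n_L - n_R| / (n_L + n_R) is an assumption (0 for an even mix of identities, 1 for a single identity); the project's metric may be defined differently. merge_cascades and cascade_polarization are hypothetical helpers.

```python
def merge_cascades(cascades):
    """Merge cascades (sets of node IDs) that share at least one node, transitively."""
    merged = []  # invariant: the sets in merged are pairwise disjoint
    for cascade in cascades:
        current = set(cascade)
        remaining = []
        for other in merged:
            if current & other:
                current |= other  # overlapping cascade: absorb it
            else:
                remaining.append(other)
        remaining.append(current)
        merged = remaining
    return merged


def cascade_polarization(cascade, identity):
    """Assumed metric |n_L - n_R| / (n_L + n_R)."""
    n_left = sum(1 for node in cascade if identity[node] == "L")
    return abs(2 * n_left - len(cascade)) / len(cascade)  # n_R = size - n_L


identity = {1: "L", 2: "L", 3: "L", 4: "R", 5: "R"}
cascades = merge_cascades([{1, 2}, {3}, {2, 4}, {5, 3}])
for c in sorted(cascades, key=min):
    print(sorted(c), round(cascade_polarization(c, identity), 3))
# [1, 2, 4] 0.333
# [3, 5] 0.0
```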

For each correlation value, 30 different networks are analyzed, and 10,000 cascades are run per network. The polarization and prevalence of cascades are then averaged across all 30 runs to reduce the influence of any single network realization.

Cascades are tested both before and after network updates to assess how polarization emerges over time. Once distributions are calculated, results are summarized by averaging polarization values and cascade sizes per correlation value. This allows for meaningful comparisons between networks before vs. after updates and between different network topologies (random vs. scale-free).
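
Averaging over runs can be sketched as below, assuming a hypothetical per-network layout of cascade size mapped to the list of polarizations of the cascades with that size. A size missing from a run counts as zero cascades there, because the count is divided by the total number of runs. The mean polarization is pooled over all cascades of a size (count-weighted), which is one design choice; averaging per-run means would weight sparse runs more heavily. summarize_runs is an illustrative helper, not the project's code.

```python
from collections import defaultdict


def summarize_runs(runs):
    """Combine per-network results into one distribution.

    runs: one dict per network, mapping cascade size -> list of polarizations
    of the cascades of that size (hypothetical layout).
    Returns size -> (average number of cascades per network, pooled mean polarization).
    """
    pooled = defaultdict(list)
    for run in runs:
        for size, polarizations in run.items():
            pooled[size].extend(polarizations)
    return {
        size: (len(pols) / len(runs), sum(pols) / len(pols))
        for size, pols in sorted(pooled.items())
        if pols
    }


runs = [{1: [0.0, 1.0], 3: [1 / 3]}, {1: [1.0]}]
print(summarize_runs(runs))
```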

In [None]:

# datastructures for scale-free
cascades_before = defaultdict(lambda: defaultdict(list))
cascades_after = defaultdict(lambda: defaultdict(list))
cascades_before_averaged_sf = defaultdict(lambda: defaultdict(list))
cascades_after_averaged_sf = defaultdict(lambda: defaultdict(list))
save=True
sizes = defaultdict()
sizes_averaged = defaultdict()

# data structures for the random network
cascades_before_averaged_rand = defaultdict(lambda: defaultdict(list))
cascades_after_averaged_rand = defaultdict(lambda: defaultdict(list))
sizes_averaged_rand = defaultdict()

# run the cascades for different correlations (for both the initial and updated network), saving the cascade polarizations and cascade sizes in a dictionary
# scale-free (network type set by what_net)
for corr in correlations: 
    print(f"starting experimentation for correlation: {corr} ({what_net})")
    print("-----------------------------------------------")

    # reads in the scale-free networks (30 networks per correlation value) and runs 10,000 cascades per network
    (before_after, before_after_averaged, largest_sizes) = multiple_correlations_par(corr, num_runs, num_nodes, update_fraction, average_degree, starting_distribution,what_net)
    (collection_of_all_before, collection_of_all_after) = before_after
    (coll_of_all_before_averaged, coll_of_all_after_averaged) = before_after_averaged
    (largest_size_of_all, largest_size_of_all_averaged) = largest_sizes

    # each dictionary has the correlation as key and a dictionary as value,
    # mapping each cascade size to the number of times it is observed (averaged over 30 runs);
    # as the metric, the average cascade size per sampled individual and the average polarization of that cascade are saved
    sizes_averaged[corr] = largest_size_of_all_averaged
    cascades_before_averaged_sf[corr] = coll_of_all_before_averaged
    cascades_after_averaged_sf[corr] = coll_of_all_after_averaged

# repeat the experiments for the random network
for corr in correlations: 
    print(f"starting experimentation for correlation: {corr} (random)")
    print("-----------------------------------------------")

    (before_after, before_after_averaged, largest_sizes) = multiple_correlations_par(corr, num_runs, num_nodes, update_fraction, average_degree, starting_distribution,"random")
    (collection_of_all_before, collection_of_all_after) = before_after
    (coll_of_all_before_averaged, coll_of_all_after_averaged) = before_after_averaged
    (largest_size_of_all, largest_size_of_all_averaged) = largest_sizes
    
    # each dictionary has the correlation as key and a dictionary as value,
    # mapping each cascade size to the number of times it is observed (averaged over 30 runs);
    # as the metric, the average cascade size per sampled individual and the average polarization of that cascade are saved
    sizes_averaged_rand[corr] = largest_size_of_all_averaged
    cascades_before_averaged_rand[corr] = coll_of_all_before_averaged
    cascades_after_averaged_rand[corr] = coll_of_all_after_averaged



#### Animation of the cascade size distribution with average polarization
Uses the averaged cascade size per sampled node and animates the distribution across the correlation values, for both the scale-free and the random network.

In [None]:

# making animations for both random and scale free
plot_cascade_animation(cascades_before_averaged_sf, cascades_after_averaged_sf, list(reversed(correlations)), sizes_averaged, num_runs, what_net, save=True, averaged=True)
plot_cascade_animation(cascades_before_averaged_rand, cascades_after_averaged_rand, list(reversed(correlations)), sizes_averaged_rand, num_runs, "random", save=True, averaged=True)


##### Summarizing all cascade information in one plot
Compares the random and the scale-free network before vs. after updating, and the random vs. the scale-free network after updating.

In [None]:
# comparing the distributions in one plot: for before and after updating for scale free and random, and after updating for scale free vs random
plot_cascades_gamma((cascades_before_averaged_sf, cascades_after_averaged_sf), num_runs, what_net)
plot_cascades_gamma((cascades_after_averaged_rand, cascades_after_averaged_sf), num_runs, "both")
plot_cascades_gamma((cascades_before_averaged_rand, cascades_after_averaged_rand), num_runs, "random")

#### Phase transition at correlation 0.8

Visualization of the phase transition and a power-law fit: first the full distribution at correlation 0.8 is plotted, then we zoom in on cascade sizes $\geq 2$ and fit a power law.
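
The fitting method inside plot_cascade_power_law is not shown here. One standard option is the maximum-likelihood estimator of the exponent of $p(x) \propto x^{-\alpha}$ for $x \geq x_{\min}$: $\hat{\alpha} = 1 + n \left[\sum_i \ln(x_i / x_{\min})\right]^{-1}$, with $x_{\min}$ replaced by $x_{\min} - 1/2$ for discrete sizes (the approximation of Clauset, Shalizi and Newman, 2009). The sketch below accepts a size to count mapping and treats averaged, fractional counts as weights; fit_power_law_exponent is a hypothetical helper, and the notebook's dictionaries may be laid out differently.

```python
import math
import random
from collections import Counter


def fit_power_law_exponent(size_counts, x_min, discrete=True):
    """Maximum-likelihood estimate of alpha for p(x) ~ x^(-alpha), x >= x_min.

    size_counts maps size -> (possibly averaged, fractional) count.
    For discrete sizes, the x_min - 1/2 shift is an approximation;
    for continuous data the estimator is exact.
    """
    scale = x_min - 0.5 if discrete else x_min
    tail = [(x, c) for x, c in size_counts.items() if x >= x_min and c > 0]
    total = sum(c for _, c in tail)
    log_sum = sum(c * math.log(x / scale) for x, c in tail)
    return 1.0 + total / log_sum


# sanity check on synthetic continuous data:
# paretovariate(1.5) has density ~ x^-(1.5 + 1), so the true alpha is 2.5
rng = random.Random(0)
samples = Counter(rng.paretovariate(1.5) for _ in range(20000))
alpha_hat = fit_power_law_exponent(samples, x_min=1.0, discrete=False)
print(round(alpha_hat, 2))
```

For cascade sizes, the call would be fit_power_law_exponent(size_counts, x_min=2), matching the zoom on sizes $\geq 2$.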

In [None]:
# plotting raw distribution at transition point
plot_cascade_dist_average(cascades_after_averaged_rand[np.float64(0.8)], "after", "random", sizes_averaged_rand[np.float64(0.8)], num_runs, save, np.float64(0.8))
plot_cascade_dist_average(cascades_after_averaged_sf[np.float64(0.8)], "after", "scale_free", sizes_averaged[np.float64(0.8)], num_runs, save, np.float64(0.8))

#plotting zoomed in powerlaw
plot_cascade_power_law(cascades_after_averaged_rand[np.float64(0.8)], "after", "random", sizes_averaged_rand[np.float64(0.8)], num_runs, save, np.float64(0.8))
plot_cascade_power_law(cascades_after_averaged_sf[np.float64(0.8)], "after", "scale_free", sizes_averaged[np.float64(0.8)], num_runs, save, np.float64(0.8))



#### Statistical testing 
Calculates the significance of the differences between the random and the scale-free network (after updating), the random network before vs. after updating, and the scale-free network before vs. after updating, for each correlation value. The results are saved in the folder designated for statistical testing.

In [12]:

cas_sf = (cascades_before_averaged_sf, cascades_after_averaged_sf)
cas_rand = (cascades_before_averaged_rand, cascades_after_averaged_rand)

statistics_cascades(cas_sf, cas_rand, num_runs)

# Calculating the assortativity coefficient

The assortativity coefficient describes the tendency of nodes to connect to other nodes with the same characteristics; here, that characteristic is political identity. A coefficient of 0 means that nodes are no more likely to connect to same-identity nodes than expected by chance (with the 50/50 starting distribution, a node with identity L then has about as many L as R connections on average). A coefficient greater than 0 means that nodes have more same-identity connections than expected by chance, e.g. a node with identity L has more L connections on average. With this in mind, we can use this coefficient as a measure of polarization.
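
For a categorical attribute on an undirected graph, Newman's assortativity coefficient is $r = \frac{\sum_k e_{kk} - \sum_k a_k^2}{1 - \sum_k a_k^2}$, where $e_{jk}$ is the fraction of edge ends linking identity $j$ to identity $k$ and $a_k$ is the fraction of edge ends attached to identity $k$. The sketch below computes it from an edge list; identity_assortativity is a hypothetical helper. NetworkX provides the same quantity as networkx.attribute_assortativity_coefficient, but whether run_assortativity_experiment uses it is not shown here.

```python
from collections import defaultdict


def identity_assortativity(edges, identity):
    """Newman's assortativity coefficient r for a categorical node attribute (undirected graph)."""
    e = defaultdict(float)
    for u, v in edges:
        # count each undirected edge in both directions so the mixing matrix is symmetric
        e[(identity[u], identity[v])] += 1
        e[(identity[v], identity[u])] += 1
    total = sum(e.values())
    labels = {label for pair in e for label in pair}
    trace = sum(e.get((k, k), 0.0) for k in labels) / total
    # a_k: fraction of edge ends attached to nodes with identity k
    a = {k: sum(e.get((k, j), 0.0) for j in labels) / total for k in labels}
    expected = sum(a_k ** 2 for a_k in a.values())  # same-identity fraction expected by chance
    return (trace - expected) / (1 - expected)  # undefined if all nodes share one identity


identity = {1: "L", 2: "L", 3: "L", 4: "R", 5: "R", 6: "R"}
segregated = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]
mixed_only = [(1, 4), (2, 5), (3, 6), (1, 5)]
print(identity_assortativity(segregated, identity), identity_assortativity(mixed_only, identity))  # 1.0 -1.0
```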

For each value of $\gamma$, 30 networks are generated, and the average is plotted together with its 95% confidence interval.
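
A mean with a 95% confidence interval over independent runs can be sketched as below, using the normal approximation (critical value about 1.96). This is an assumption about the method: with 30 runs, a Student t critical value (about 2.045 for 29 degrees of freedom) gives a slightly wider interval, and the notebook's plotting code may use either. mean_with_ci is a hypothetical helper.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_ci(values, level=0.95):
    """Sample mean with a normal-approximation confidence interval (sketch)."""
    center = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * stdev(values) / math.sqrt(len(values))  # z times the standard error of the mean
    return center, center - half_width, center + half_width


print(mean_with_ci([0.12, 0.18, 0.15, 0.20, 0.10]))
```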

We expect the assortativity to rise as the news sources diverge, i.e., as $\gamma \rightarrow -1$. This experiment is done for both the random and the scale-free network.

First we load the generated networks:

In [None]:
all_random_networks = read_and_load_networks(num_runs=30, num_nodes=200, update_fraction=0.1, average_degree=8, 
                                             starting_distribution=0.5, correlations=correlations, whichtype='random')
all_scalefree_networks = read_and_load_networks(num_runs=30, num_nodes=200, update_fraction=0.1, average_degree=8, 
                                                starting_distribution=0.5, correlations=correlations, whichtype='scale_free')

Using these generated networks, we calculate the assortativity coefficient for both network types and plot the results:

In [None]:
run_assortativity_experiment(all_random_networks, 'random', 30, False, True)
run_assortativity_experiment(all_scalefree_networks, 'scale_free', 30, False, True)

The plots show that the assortativity coefficient rises as the news sources diverge, which is in line with our expectations.

To determine whether there is a significant difference between the assortativity of the two network types, we perform Welch's t-test with the following null hypothesis:

$H_0$: There is no difference between the assortativity coefficient of the random and scale-free network.
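
Welch's t-test compares two means without assuming equal variances: $t = (\bar{x}_a - \bar{x}_b) / \sqrt{s_a^2/n_a + s_b^2/n_b}$, with degrees of freedom from the Welch-Satterthwaite equation. The stdlib sketch below computes both; welch_t is a hypothetical helper, and assortativity_significance may implement the test differently. If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) returns the same statistic together with a p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors (sample variances)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))
```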

By comparing the two networks at all the values for $\gamma$, we find the following:

In [None]:
assortativity_significance(False)

This indicates that for news correlations above 0 there is no significant difference between the random and the scale-free network, whereas for correlations below 0 the difference between the network types is statistically significant. There is thus less polarization in the scale-free network in this range.

# Net Change in Social Ties

For this experiment, we compare the initial network and the final network and determine if social ties (connections between nodes) are lost/gained depending on the political identity. Again, for each value of $\gamma$, 30 networks are used and the average and confidence intervals are calculated. 

We expect that as the news sources diverge, i.e., $\gamma \rightarrow -1$, nodes will gain social ties with nodes of the same ideology and lose ties with nodes of the opposing ideology.
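
Comparing the initial and final networks amounts to set differences on their edges. The sketch below counts the net number of same-identity and cross-identity ties gained (positive) or lost (negative); net_tie_change is a hypothetical helper. Edges are stored as frozensets so that (u, v) and (v, u) count as the same undirected tie, which is a design choice assumed here; dividing by the number of nodes would turn the totals into per-node averages.

```python
def net_tie_change(initial_edges, final_edges, identity):
    """Net number of same-identity and cross-identity ties gained (positive) or lost (negative)."""
    # frozensets make (u, v) and (v, u) the same undirected tie
    before = {frozenset(edge) for edge in initial_edges}
    after = {frozenset(edge) for edge in final_edges}

    def kind(tie):
        u, v = tuple(tie)  # assumes no self-loops
        return "same" if identity[u] == identity[v] else "cross"

    change = {"same": 0, "cross": 0}
    for tie in after - before:  # ties that were created
        change[kind(tie)] += 1
    for tie in before - after:  # ties that were removed
        change[kind(tie)] -= 1
    return change


identity = {1: "L", 2: "L", 3: "R", 4: "R"}
initial = [(1, 2), (1, 3), (2, 4)]
final = [(2, 1), (3, 4), (1, 3)]
print(net_tie_change(initial, final, identity))  # {'same': 1, 'cross': -1}
```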

Since the same networks are used, we can run the experiment immediately:

In [None]:
run_social_ties_experiment(all_random_networks, 'random', 30, False)
run_social_ties_experiment(all_scalefree_networks, 'scale_free', 30, False)

We find that as the news sources diverge, nodes on average gain social ties with nodes of the same ideology and lose ties with nodes of the opposing ideology. This effect weakens as the news becomes more correlated. The difference between the random and the scale-free network is clearly visible: the scale-free network has lower average values, meaning that there is less polarization.