# Exercise 3: Mitigating biased node rankings
## Overview
In this exercise, we will continue to extend `netin` to create custom models.
We will explore an extended use case to study the effects of network growth interventions.
While we focus on synthetic data for now, `netin` provides interfaces to load real networks and run the simulation on top of those networks.

## Key Concepts
**Custom Models**: Define and simulate custom models to test your own modelling ideas.
In an extended use case, we re-create a model to analyze how interventions in the network's growth process can affect the visibility of the minority group.

## Task
1. Extend a custom class that assigns two group attributes (instead of just one) to each node and models homophilic interactions on both. Then add preferential attachment to the mix of link formation mechanisms.
2. Create a custom model that is defined by a pre- and a post-intervention phase, where the intervention may change the model parameters. Analyze and visualize how various parameter changes affect the visibility of the minority.

## Model extensions (continued)
Many analyses require modifying the existing models.
For instance, you may want to create custom models that only slightly change the pre-defined simulation logic or retrieve additional analytics about the simulation process.
The package is highly modular and provides interfaces for these use cases.

## Dependencies

In [None]:
### If running this on Google Colab, run the following lines:

import os
!pip install netin==2.0.0a1
# Restart the runtime so that the newly installed version is loaded
os.kill(os.getpid(), 9)

In [33]:
from typing import Union, Optional, Tuple, List
import pandas as pd
from collections import defaultdict
from itertools import product

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

from netin.models import\
    PAHModel,\
    UndirectedModel
from netin.graphs import\
    Graph,\
    BinaryClassNodeVector
from netin.utils import MINORITY_LABEL, MAJORITY_LABEL, Event
from netin.link_formation_mechanisms import\
    TwoClassHomophily, PreferentialAttachment
import netin.utils.constants as const

### Custom models
The code is structured to be extendable.
You can create custom models by combining existing link formation mechanisms or by creating new ones.

Let's create a model in which each node has two group attributes and the overall homophily is the product of the homophily probabilities of the two attributes.
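
As a rough illustration of this combination rule, the NumPy sketch below computes the target probabilities for a single source node by multiplying two homophily weight vectors and normalizing. It assumes the common two-class convention (weight `h` for a target in the source's group, `1 - h` otherwise); the helper names are hypothetical and this is not `netin`'s internal implementation.

```python
import numpy as np


def homophily_weights(classes: np.ndarray, source: int, h: float) -> np.ndarray:
    """Weight h for targets in the source's group, 1 - h otherwise (assumed convention)."""
    return np.where(classes == classes[source], h, 1.0 - h)


def two_attribute_target_probs(attr0: np.ndarray, attr1: np.ndarray,
                               source: int, h0: float, h1: float) -> np.ndarray:
    """Hypothetical helper: multiply per-attribute homophily weights, then normalize."""
    weights = homophily_weights(attr0, source, h0) * homophily_weights(attr1, source, h1)
    weights[source] = 0.0  # exclude self-loops
    return weights / weights.sum()


# Four toy nodes covering all attribute combinations; node 0 is the source
attr0 = np.array([0, 0, 1, 1])
attr1 = np.array([0, 1, 0, 1])
probs = two_attribute_target_probs(attr0, attr1, source=0, h0=0.5, h1=0.8)
print(probs.round(3))
```

With `h0 = 0.5` the first attribute gives every target the same weight, so in this sketch only `attribute_1` shapes the probabilities: node 2, which shares `attribute_1` with the source, ends up four times as likely as nodes 1 and 3.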

In [34]:
class TwoGroupHomophilyModel(UndirectedModel):
    SHORT = "TGH" # Short name of the model

    def __init__(self, N: int, m: int,
                 f_m: float,
                 h0: float, h1: float,
                 seed: Optional[Union[int, np.random.Generator]] = None):
        """Constructor for the TwoGroupHomophilyModel class.

        Parameters
        ----------
        N : int
            Number of total nodes
        m : int
            Number of edges to attach from a new node to existing nodes
        f_m : float
            Fraction of minority nodes
        h0 : float
            Homophily for the first attribute
        h1 : float
            Homophily for the second attribute
        seed : Union[int, np.random.Generator], optional
            Random seed or generator, by default None
        """
        super().__init__(N=N, m=m, seed=seed)
        # Handle remaining parameters
        self.f_m = f_m
        self.h0 = h0
        self.h1 = h1

    def _initialize_node_classes(self):
        """Initializes two node classes for each node.
        """
        self.graph.set_node_class(
            "attribute_0", # Attribute name
            # Values drawn randomly from the minority fraction
            BinaryClassNodeVector.from_fraction(
                N=self.N,
                f_m=self.f_m,
                class_labels=[MAJORITY_LABEL, MINORITY_LABEL],
                rng=self._rng)
        )
        self.graph.set_node_class(
            "attribute_1",
            BinaryClassNodeVector.from_fraction(
                N=self.N,
                f_m=self.f_m,
                class_labels=[MAJORITY_LABEL, MINORITY_LABEL],
                rng=self._rng)
        )

    def _initialize_lfms(self):
        """Initializes the two homophily link formation mechanisms.
        One for each attribute.
        """
        self.lfm_h0 = TwoClassHomophily.from_two_class_homophily(
            node_class_values=self.graph.get_node_class("attribute_0"),
            homophily=(self.h0, self.h0))
        self.lfm_h1 = TwoClassHomophily.from_two_class_homophily(
            node_class_values=self.graph.get_node_class("attribute_1"),
            homophily=(self.h1, self.h1))

    def compute_target_probabilities(self, source: int) -> np.ndarray:
        """Compute the target probabilities for a given source node.
        This function is being called in each simulation iteration to
        compute the target probabilities for all potential target nodes.
        Here, we simply multiply the homophily values based on the
        two node attributes of the source node and target nodes.

        Parameters
        ----------
        source : int
            Source node

        Returns
        -------
        ndarray
            Target probabilities for the source node
        """
        # Calling super() applies the pre-defined filters to avoid
        # self-loops and multiple edges
        return super().compute_target_probabilities(source)\
            * self.lfm_h0.get_target_probabilities(source)\
            * self.lfm_h1.get_target_probabilities(source)


Because most of the simulation logic is contained in the parent class, we can run the simulation as we used to.

In [35]:
N = 200
m = 3
f_m = 0.3

tgh_model = TwoGroupHomophilyModel(N=N, m=m, f_m=f_m, h0=0.5, h1=0.8, seed=123)
tgh_graph = tgh_model.simulate()

To visualize the network, we export it to NetworkX.

In [36]:
def draw_tgh_model(graph: Graph):
    tgh_nx_graph = graph.to_nxgraph()
    pos = nx.kamada_kawai_layout(tgh_nx_graph)

    colors = []
    # Assign one out of four node colors based on the two attribute combinations
    for _, data in tgh_nx_graph.nodes(data=True):
        if data["attribute_0"] == 0 and data["attribute_1"] == 0:
            colors.append("red")
        elif data["attribute_0"] == 1 and data["attribute_1"] == 1:
            colors.append("green")
        elif data["attribute_0"] == 0 and data["attribute_1"] == 1:
            colors.append("blue")
        elif data["attribute_0"] == 1 and data["attribute_1"] == 0:
            colors.append("yellow")

    nx.draw_networkx_nodes(
        tgh_nx_graph,
        pos=pos,
        node_color=colors,
        node_size=[15*deg for deg in dict(tgh_nx_graph.degree()).values()])
    nx.draw_networkx_edges(tgh_nx_graph, pos=pos)
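
To know roughly what color mix to expect in the plot, the short calculation below multiplies the per-attribute fractions. It assumes that the two attributes are drawn independently and that the value `1` marks the minority (both assumptions of this sketch); `f_minority` and `share` are illustrative names only.

```python
from itertools import product

f_minority = 0.3  # minority fraction per attribute, as in the simulation above
# Same color assignment as draw_tgh_model: (attribute_0, attribute_1) -> color
color_of = {(0, 0): "red", (1, 1): "green", (0, 1): "blue", (1, 0): "yellow"}


def share(value: int) -> float:
    """Fraction of nodes with the given attribute value (1 = minority, assumed)."""
    return f_minority if value == 1 else 1 - f_minority


# Independence assumption: joint share = product of the marginal shares
expected_share = {
    color_of[(a0, a1)]: share(a0) * share(a1)
    for a0, a1 in product((0, 1), repeat=2)
}
for color, frac in expected_share.items():
    print(f"{color}: {frac:.2f}")
```

Under these assumptions about half of the nodes should be red and fewer than one in ten green; node size in the plot reflects degree, not group size.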


In [None]:
draw_tgh_model(tgh_graph)

**Tasks**:
1. Extend the class to include preferential attachment as a link formation mechanism.
Following the original PAH model, the probability that a new node connects to an existing target should be the product of the homophily and `PreferentialAttachment` target probabilities.
For the former, you can reuse `TwoGroupHomophilyModel.compute_target_probabilities()` and multiply it with `PreferentialAttachment.get_target_probabilities()`.
   - `PreferentialAttachment` needs the graph at initialization so that it can track degree changes. The graph is stored as a parent class attribute (accessible via `self.graph`).
   - You can extend `TwoGroupHomophilyModel` and override only `_initialize_lfms` and `compute_target_probabilities`. Make sure to call `super().compute_target_probabilities()` to include the target probabilities computed by `TwoGroupHomophilyModel`.
2. Plot the result using `draw_tgh_model()`.
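
For intuition about what this product computes, here is a NumPy sketch of a PAH-style rule for a single source node: each candidate target $j$ gets the weight $h_{c_s c_j}(k_j + 1)$, where $k_j$ is its degree, and the weights are normalized. The `+1` offset (so that zero-degree nodes can still be chosen), the `h` / `1 - h` homophily convention and the function name are assumptions of this sketch, not `netin`'s exact implementation.

```python
import numpy as np


def pah_target_probs(degrees: np.ndarray, classes: np.ndarray,
                     source: int, h: float) -> np.ndarray:
    """Sketch of PA x homophily: weight_j = (k_j + 1) * h_{c_s c_j}, normalized."""
    pa = degrees.astype(float) + 1.0  # assumed +1 offset so zero-degree nodes stay reachable
    hom = np.where(classes == classes[source], h, 1.0 - h)  # assumed h / 1 - h convention
    weights = pa * hom
    weights[source] = 0.0  # exclude self-loops
    return weights / weights.sum()


degrees = np.array([0, 3, 1, 5])
classes = np.array([1, 0, 1, 0])  # node 0 (the source) shares its group with node 2
probs = pah_target_probs(degrees, classes, source=0, h=0.8)
print(probs.round(3))
```

Node 3 has the highest degree, but because it belongs to the other group, node 2 ends up as the most likely target; this interplay is what multiplying the two mechanisms captures.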

### Solution

In [38]:
class PATGHModel(TwoGroupHomophilyModel):
    SHORT = "PATGH"

    def _initialize_lfms(self):
        super()._initialize_lfms()
        self.pa = PreferentialAttachment(N=self.N, graph=self.graph)

    def compute_target_probabilities(self, source: int) -> np.ndarray:
        return super().compute_target_probabilities(source) * self.pa.get_target_probabilities(source)

In [39]:
patgh_model = PATGHModel(N=N, m=m, f_m=f_m, h0=0.5, h1=0.8, seed=123)
patgh_graph = patgh_model.simulate()

In [None]:
draw_tgh_model(patgh_graph)

## Use case: testing network growth interventions
As a showcase and advanced exercise, we use the `netin` package to study the effects of interventions in the network's growth process.
Based on a recent work by Neuhäuser et al. [[1]](#references), we try to understand how growing the minority group or changing the connection behavior can affect the visibility of the minority group.

The model is defined by the two stages of pre- and post-intervention.
Both stages implement a `PAHModel` that adds `N // 2` nodes, with the second stage preloading the graph simulated in the first stage.
Moreover, the two stages can have different parameters.
We interpret changes in the homophily parameter and in the minority fraction as behavioral and group-size interventions, respectively.
Such a `GrowthInterventionModel` can be defined by the transition $PAH(f_{pre}, h_{pre}) \rightarrow PAH(f_{post}, h_{post})$ with
- minority fraction $f$
- homophily parameter $h$
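
To make the two-stage transition concrete without depending on `netin` internals, here is a self-contained NumPy sketch of $PAH(f_{pre}, h_{pre}) \rightarrow PAH(f_{post}, h_{post})$ growth. Its assumptions are not taken from `netin`: growth starts from `m` unconnected seed nodes, each new node links to `m` distinct existing nodes with probability proportional to $(k_j + 1)$ times `h` for same-group and `1 - h` for cross-group targets, and the first stage stops once the graph has `N // 2` nodes. All function names are hypothetical.

```python
from typing import List, Tuple

import numpy as np

Edge = Tuple[int, int]


def grow_pah(classes: np.ndarray, degrees: np.ndarray, edges: List[Edge],
             n_new: int, m: int, f_m: float, h: float,
             rng: np.random.Generator) -> Tuple[np.ndarray, np.ndarray, List[Edge]]:
    """Add n_new nodes one by one with a PAH-style rule (sketch, assumes 0 < h < 1)."""
    for _ in range(n_new):
        new_class = int(rng.random() < f_m)  # 1 = minority
        # Weight: (degree + 1) * (h if same group else 1 - h)
        homophily = np.where(classes == new_class, h, 1.0 - h)
        weights = (degrees + 1.0) * homophily
        k = min(m, len(classes))
        targets = rng.choice(len(classes), size=k, replace=False,
                             p=weights / weights.sum())
        new_id = len(classes)
        for t in targets:
            edges.append((new_id, int(t)))
            degrees[t] += 1
        classes = np.append(classes, new_class)
        degrees = np.append(degrees, float(k))
    return classes, degrees, edges


def growth_intervention(N: int, m: int, f_pre: float, f_post: float,
                        h_pre: float, h_post: float, seed: int = 0):
    """Hypothetical two-stage growth: PAH(f_pre, h_pre) -> PAH(f_post, h_post)."""
    rng = np.random.default_rng(seed)
    # Seed graph (assumption): m unconnected nodes with classes drawn from f_pre
    classes = (rng.random(m) < f_pre).astype(int)
    degrees = np.zeros(m)
    edges: List[Edge] = []
    # Pre-intervention stage: grow to N // 2 nodes
    classes, degrees, edges = grow_pah(classes, degrees, edges, N // 2 - m,
                                       m, f_pre, h_pre, rng)
    pre = (classes.copy(), degrees.copy())  # snapshot at the intervention
    # Post-intervention stage: add the remaining N - N // 2 nodes
    classes, degrees, edges = grow_pah(classes, degrees, edges, N - N // 2,
                                       m, f_post, h_post, rng)
    return pre, (classes, degrees), edges


pre, post, edges = growth_intervention(N=200, m=2, f_pre=0.1, f_post=0.1,
                                       h_pre=0.2, h_post=0.8)
print(len(pre[0]), len(post[0]), len(edges))
```

The snapshot taken between the two stages plays the role of the graph passed to the intervention event in the `netin` version, which additionally handles events, node-class vectors and graph bookkeeping.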
 
**Task**:
1. Create a custom `GrowthInterventionModel` by completing the code snippet below. It should fulfill the following requirements:
   - It should inherit from `UndirectedModel` (which provides the initialization logic and event handling).
   - It should take the additional parameters `h_pre`, `h_post`, `f_m_pre` and `f_m_post`.
   - It should override the `simulate()` method to
     - create two `PAHModel` instances, parameterized by the pre- and post-intervention `h` and `f_m`,
     - simulate the first model and trigger the predefined `EVENT_GROWTH_INTERVENTION` event,
     - preload the resulting `Graph` into the second model and return its simulation result (_Hint_: use `Model.preload_graph`).


In [41]:
EVENT_GROWTH_INTERVENTION = "GROWTH_INTERVENTION"

class GrowthInterventionModel(UndirectedModel):
    SHORT = "GI" # Short name of the model

    # Add the custom event to the list of events triggered by this class
    EVENTS = UndirectedModel.EVENTS + [EVENT_GROWTH_INTERVENTION]

    h_pre: float
    h_post: float

    f_m_pre: float
    f_m_post: float

    def __init__(
            self, *args,
            N: int,
            m: int,
            f_m_pre: float, f_m_post: float,
            h_pre: float, h_post: float,
            seed:  Optional[Union[int, np.random.Generator]] = None,
            **kwargs):
        # Set the parameters and forward the remaining arguments to the super class
        ################
        ## YOUR CODE HERE
        ################
        pass

    def simulate(self) -> Graph:
        # Create the first half of the graph by simulating a ``PAHModel``.
        model_pre = PAHModel(
            ################
            ## YOUR CODE HERE
            ################
        )
        graph_pre = model_pre.simulate()

        # Remove the default event handler for adding links
        graph_pre.remove_event_handler(Event.LINK_ADD_AFTER)

        # Trigger the custom event to allow for modifications
        ################
        ## YOUR CODE HERE
        ################

        # Create the second half of the graph by simulating another ``PAHModel``.
        ################
        ## YOUR CODE HERE
        ################
        model_post = PAHModel(
            ################
            ## YOUR CODE HERE
            ################
        )

        # Preload the graph from the first half and return the simulation result
        ################
        ## YOUR CODE HERE
        ################

        return model_post.simulate()

    def _initialize_lfms(self):
        # Needed to avoid errors when calling the super class
        pass

    def _initialize_node_classes(self):
        # Needed to avoid errors when calling the super class
        pass


### Solution

In [42]:
EVENT_GROWTH_INTERVENTION = "GROWTH_INTERVENTION"

class GrowthInterventionModel(UndirectedModel):
    SHORT = "GI" # Short name of the model

    # Add the custom event to the list of events triggered by this class
    EVENTS = UndirectedModel.EVENTS + [EVENT_GROWTH_INTERVENTION]

    h_pre: float
    h_post: float

    f_m_pre: float
    f_m_post: float

    def __init__(
            self, *args,
            N: int,
            m: int,
            f_m_pre: float, f_m_post: float,
            h_pre: float, h_post: float,
            seed:  Optional[Union[int, np.random.Generator]] = None,
            **kwargs):
        # Set the parameters and forward the remaining arguments to the super class
        super().__init__(
            *args, N=N, m=m, seed=seed, **kwargs)
        self.f_m_pre = f_m_pre
        self.f_m_post = f_m_post
        self.h_pre = h_pre
        self.h_post = h_post

    def simulate(self) -> Graph:
        # Create the first half of the graph by simulating a ``PAHModel``.
        model_pre = PAHModel(
            N=self.N // 2,
            f_m=self.f_m_pre,
            m=self.m,
            h_m=self.h_pre,
            h_M=self.h_pre,
            seed=self._rng)
        graph_pre = model_pre.simulate()

        # Remove the default event handler for adding links
        graph_pre.remove_event_handler(Event.LINK_ADD_AFTER)

        # Trigger the custom event to allow for modifications
        self.trigger_event(event=EVENT_GROWTH_INTERVENTION, graph=graph_pre)

        # Create the second half of the graph by simulating another ``PAHModel``.
        model_post = PAHModel(
            N=self.N // 2,
            f_m=self.f_m_post,
            m=self.m,
            h_m=self.h_post,
            h_M=self.h_post,
            seed=self._rng)

        # Preload the graph from the first half and return the simulation result
        model_post.preload_graph(graph=graph_pre)
        return model_post.simulate()

    def _initialize_lfms(self):
        # Needed to avoid errors when calling the super class
        pass

    def _initialize_node_classes(self):
        # Needed to avoid errors when calling the super class
        pass


Following the original article, we assess the impact of growth interventions by measuring the representation of minority nodes among the highest-degree nodes before and after the intervention.

**Tasks**
1. Write a function `get_top_p_minority_fraction` that takes a `Graph` and a `top_p` parameter (the fraction of highest-degree nodes to consider, e.g. `0.1` for the top 10%) and returns the fraction of minority nodes among the nodes whose degree is at or above the corresponding degree cutoff.
   - _Hint_: Use `numpy.quantile` (or `numpy.percentile`, which expects values between 0 and 100), `Graph.degrees()` and `Graph.get_node_class(const.CLASS_ATTRIBUTE)`. Recall that a `NodeVector` can be used like a numpy array.
2. Report the impact of the following two interventions $PAH(f_{pre}, h_{pre}) \rightarrow PAH(f_{post}, h_{post})$ by printing the result of `get_top_p_minority_fraction` before and after the intervention:
   1. Behavioral: $f_{pre} = f_{post} = 0.1$, $h_{pre}=0.2$, $h_{post} = 0.8$
   2. Group size: $f_{pre} = 0.1, f_{post} = 0.5$, $h_{pre} = h_{post} = 0.8$
   - _Hint_: Use `Model.register_event_handler` with the `EVENT_GROWTH_INTERVENTION` event to compute the pre-intervention fraction.
   - _Bonus points_: Run the simulation a couple of times and report the averages to account for random fluctuations.
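
If the event mechanism is unfamiliar, the sketch below shows the general observer pattern behind it: handlers are registered per event name, and triggering an event forwards its keyword arguments to every handler. The `EventSource` class is a hypothetical minimal stand-in that mirrors the method names used in this notebook (`register_event_handler`, `trigger_event`); it is not `netin`'s implementation.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventSource:
    """Hypothetical minimal event registry (not netin's implementation)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable]] = defaultdict(list)

    def register_event_handler(self, event: str, function: Callable) -> None:
        # Handlers are stored per event name and called in registration order
        self._handlers[event].append(function)

    def trigger_event(self, event: str, **kwargs) -> None:
        # Keyword arguments are forwarded unchanged to every handler
        for function in self._handlers[event]:
            function(**kwargs)


pre_values: List[int] = []
event_source = EventSource()
# The handler's parameter name must match the keyword passed to trigger_event
event_source.register_event_handler(
    "GROWTH_INTERVENTION", lambda graph: pre_values.append(len(graph)))
event_source.trigger_event("GROWTH_INTERVENTION", graph=[0, 1, 2])
print(pre_values)
```

In this sketch the handler only sees what the trigger passes by keyword, which is why a handler that records the pre-intervention value should accept a parameter with the same name as the keyword used when the event is triggered (`graph` in the solution below).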

### Solution

In [43]:
TOP_P = 0.1

N = 1000
m = 2

In [44]:
def get_top_p_minority_fraction(
        graph: Graph,
        top_p: float = TOP_P) -> float:
    """Gets the minority fraction among nodes whose degree is at or above the (1 - top_p) degree quantile.

    Parameters
    ----------
    graph : Graph
        The graph to analyze.
    top_p : float
        Fraction of highest-degree nodes to consider (e.g., 0.1 for the top 10%).

    Returns
    -------
    float
        The fraction of minority nodes among the top-degree nodes.
    """

    # Get the node classes (1 = minority) and degrees
    node_classes = graph.get_node_class(
        const.CLASS_ATTRIBUTE)
    degrees = graph.degrees()

    # Degree cutoff: the (1 - top_p) quantile of the degree distribution
    d_cutoff = np.quantile(degrees, 1 - top_p)

    # Averaging the binary class values yields the minority fraction
    return np.mean(node_classes[degrees >= d_cutoff])
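
To see how the quantile cutoff behaves, here is the same logic applied to a small toy degree sequence. The degrees and class labels below are made up for illustration and assume, as above, that `1` marks a minority node.

```python
import numpy as np

# Toy data (made up for illustration): ten nodes with degrees 1..10
degrees = np.arange(1, 11)
classes = np.array([0, 0, 1, 0, 0, 0, 0, 0, 1, 0])  # 1 = minority
top_p = 0.2

# Same logic as get_top_p_minority_fraction: cutoff at the (1 - top_p) quantile
cutoff = np.quantile(degrees, 1 - top_p)  # linear interpolation, about 8.2 here
top_mask = degrees >= cutoff
print(cutoff, top_mask.sum(), classes[top_mask].mean())
```

Here the two highest-degree nodes are selected, one of which is a minority node. Because the comparison is `>=`, all nodes tied at the cutoff are included, so on real degree sequences with many ties the selected set can be somewhat larger than `top_p * N` nodes.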


In [45]:
def run_model(f_m_pre: float, f_m_post: float, h_pre: float, h_post: float, n_iter: int = 10)\
    -> Tuple[List[float], List[float]]:
    # Keep track of the pre- and post-intervention top p minority fractions
    # for each simulation run
    f_m_top_pre = []
    f_m_top_post = []

    # Define the model
    model = GrowthInterventionModel(
        N=N, m=m,
        f_m_pre=f_m_pre, f_m_post=f_m_post,
        h_pre=h_pre, h_post=h_post
    )

    # Define a function to keep track of the top p minority fraction
    # prior to the intervention
    def add_intervention_top_p(graph: Graph):
        f_m_top_pre.append(
            get_top_p_minority_fraction(graph=graph))

    # Register the function as an event handler
    model.register_event_handler(
        event=EVENT_GROWTH_INTERVENTION,
        function=add_intervention_top_p)

    # Run the simulation n_iter times
    for _ in range(n_iter):
        model._initialize_graph()
        graph = model.simulate()

        # Store the post-intervention minority fraction
        f_m_top_post.append(
            get_top_p_minority_fraction(graph=graph))

    print((f"Minority fraction in top {TOP_P:.0%} degree nodes for "
           f"f_m_pre={f_m_pre}, f_m_post={f_m_post} and h_pre={h_pre}, h_post={h_post}:"))
    print(f"\tAfter first phase ({len(f_m_top_pre)} iterations): "
          f"{np.mean(f_m_top_pre):.3f} ± {np.std(f_m_top_pre):.3f}")
    print(f"\tAfter second phase ({len(f_m_top_post)} iterations): "
          f"{np.mean(f_m_top_post):.3f} ± {np.std(f_m_top_post):.3f}")

    return f_m_top_pre, f_m_top_post

In [None]:
f_m = 0.1
h_pre, h_post = 0.2, 0.8
f_m_top_pre, f_m_top_post = run_model(f_m_pre=f_m, f_m_post=f_m, h_pre=h_pre, h_post=h_post)

In [None]:
f_m_pre, f_m_post = 0.1, 0.5
h = 0.8
f_m_top_pre, f_m_top_post = run_model(f_m_pre=f_m_pre, f_m_post=f_m_post, h_pre=h, h_post=h)

Both the initial parameterization and the type of intervention appear to have a strong effect on the outcome.
Let's visualize these effects more systematically.

**Tasks**:
1. Repeat the experiment for all combinations of $h_{pre}, h_{post} \in \{0.2, 0.5, 0.8\}$, storing the pre- and post-intervention fraction of minority nodes among the top $10\%$ degree percentile. Use at least ten iterations per configuration.
2. Plot the results, focusing on the difference in the minority fraction among the top ranks before and after the intervention. You're free to choose your preferred visualization. For inspiration, you can follow Fig. 2a of [[1]](#references): encode $h_{pre}$ as the line color, put $h_{post}$ on the x-axis and the computed minority fractions on the y-axis, and distinguish pre- from post-intervention values by marker.
   - _Hint_: You can use `pandas.DataFrame` to store and `matplotlib.pyplot.errorbar` to visualize the results.
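
To focus on the change rather than on the raw levels, one option is to compute the per-run difference between the post- and pre-intervention fractions before aggregating. The pandas sketch below does this on a handful of made-up placeholder values (not simulation results); the column names match the ones used in the solution, while `toy_df` and `delta` are illustrative names.

```python
import pandas as pd

# Placeholder values for illustration only, not simulation results
toy_df = pd.DataFrame({
    "h_pre": [0.2, 0.2, 0.8, 0.8],
    "h_post": [0.8, 0.8, 0.2, 0.2],
    "f_top_pre": [0.10, 0.20, 0.05, 0.05],
    "f_top_post": [0.15, 0.25, 0.15, 0.25],
})

# Per-run change in the minority fraction among the top-degree nodes
toy_df["delta"] = toy_df["f_top_post"] - toy_df["f_top_pre"]

# Mean and standard deviation of the change per (h_pre, h_post) configuration
toy_summary = toy_df.groupby(["h_pre", "h_post"])["delta"].agg(["mean", "std"])
print(toy_summary)
```

Computing the difference per run before grouping keeps each pre/post pair together, so the spread reflects variation in the change itself rather than in the two levels separately.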

### Solutions
1. Computing the impact of interventions

In [None]:
H_VALS = (0.2, 0.5, 0.8)
f_m = 0.1
N_ITER = 20  # Simulation runs per (h_pre, h_post) configuration

df_data = defaultdict(list)

def add_intervention_top_p(graph: Graph):
    # Store the pre-intervention minority fraction among the top-degree nodes
    df_data["f_top_pre"].append(get_top_p_minority_fraction(graph=graph))

for h_pre, h_post in product(H_VALS, repeat=2):
    for _ in range(N_ITER):
        model = GrowthInterventionModel(
            N=N, m=m,
            f_m_pre=f_m, f_m_post=f_m,
            h_pre=h_pre, h_post=h_post
        )
        model.register_event_handler(
            event=EVENT_GROWTH_INTERVENTION, function=add_intervention_top_p)
        graph = model.simulate()
        f_top = get_top_p_minority_fraction(graph=graph)

        df_data["h_pre"].append(h_pre)
        df_data["h_post"].append(h_post)
        df_data["f_top_post"].append(f_top)
df_data = pd.DataFrame(df_data)
df_data


In [None]:
df_data = df_data\
    .groupby(["h_pre", "h_post"])\
    .agg(["mean", "std"])\
    .sort_index()
df_data

2. Plotting the impact

In [None]:
norm = plt.Normalize(vmin=min(H_VALS), vmax=max(H_VALS))
cmap = plt.colormaps["coolwarm"]

plt.axhline(y=f_m, color="black", linestyle="dashed")  # Overall minority fraction

for x_off, h_pre in zip([-.05, 0, .05], H_VALS):
    df_data_h = df_data.loc[h_pre]
    color = cmap(norm(h_pre)) if h_pre != H_VALS[1] else "black"

    plt.axvline(x=h_pre, color="lightgray", linestyle="dotted")

    plt.errorbar(
        df_data_h.index + x_off, # h_post
        df_data_h[("f_top_pre", "mean")],
        yerr=df_data_h[("f_top_pre", "std")],
        fmt="x",
        label=f"$f_{{pre}}$, $h_{{pre}}={h_pre}$",
        color=color,
        linestyle="dotted",
        capsize=5
    )

    plt.errorbar(
        df_data_h.index + x_off, # h_post
        df_data_h[("f_top_post", "mean")],
        yerr=df_data_h[("f_top_post", "std")],
        fmt="o",
        label=f"$f_{{post}}$, $h_{{pre}}={h_pre}$",
        color=color,
        linestyle="solid",
        capsize=5
    )

legend_elements = [
    plt.Line2D(
        [0], [0],
        marker='x', color='black', markerfacecolor='black',
        markersize=10,
        label='Pre-intervention',
        linestyle='dotted'),
    plt.Line2D(
        [0], [0],
        marker='o', color='black', markerfacecolor='black',
        markersize=10,
        label='Post-intervention',
        linestyle="solid"),
    plt.Line2D([0], [0], color=cmap(norm(H_VALS[0])), lw=2, label=f'$h_{{pre}}={H_VALS[0]}$'),
    plt.Line2D([0], [0], color="black", lw=2, label=f'$h_{{pre}}={H_VALS[1]}$'),
    plt.Line2D([0], [0], color=cmap(norm(H_VALS[2])), lw=2, label=f'$h_{{pre}}={H_VALS[2]}$')
]
plt.legend(handles=legend_elements, bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(H_VALS)
plt.xlabel("Post-intervention homophily $h_{post}$")
plt.ylabel(f"Minority fraction in top {TOP_P:.0%} degree nodes")
plt.show()

## References
1. Neuhäuser, L., Karimi, F., Bachmann, J., Strohmaier, M. & Schaub, M. T. Improving the visibility of minorities through network growth interventions. [Commun Phys 6, 1–13 (2023)](https://www.nature.com/articles/s42005-023-01218-9).
