# Exercise #2: Auditing Node Rankings in Directed Networks

## Overview

In this exercise, we will explore how network structure, particularly the mechanisms of edge formation, impacts node ranking algorithms. Node rankings help determine the importance or relevance of nodes in a network, with applications ranging from social networks to citation networks. We will specifically focus on **PageRank**, a widely used algorithm for ranking nodes based on their centrality.

Our goal is to audit how **majority** and **minority** groups are represented in the top-k rankings of PageRank. A real-world example of this issue is the ranking of scholars based on citation or collaboration networks. For instance, how do men and women rank in the top-k of a PageRank algorithm, and how does this compare to their overall representation in the population?

### Key Concepts:
1. **Node Ranking**: Ranking nodes based on their importance using algorithms like degree centrality or PageRank.
2. **Disparity**: The relationship between inequality (distribution of rankings) and inequity (representation of minority nodes in the top-k rankings).
   - **Inequality**: Measured by the Gini coefficient of the PageRank distribution.
   - **Inequity**: The representation of minority nodes in the top-k.
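
To make the inequality measure concrete, here is a minimal sketch of a Gini coefficient in plain Python. The function name `gini` and the rank-weighted formula for sorted data are choices made for this illustration; they are not taken from `netin` or from the paper's code.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    # Rank-weighted formula on sorted data:
    # G = sum_i (2*i - n - 1) * x_i / (n * sum(x)), with i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy score distributions (illustrative values only)
equal_scores = [0.25, 0.25, 0.25, 0.25]
concentrated_scores = [0.0, 0.0, 0.0, 1.0]
print(gini(equal_scores), gini(concentrated_scores))
```

With this formula, a distribution where one of `n` nodes holds all the score yields `(n - 1) / n` rather than exactly 1.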

We will use the **DPAHModel** to generate multiple synthetic directed networks and calculate **disparity scores** (inequality and inequity) to understand how these networks treat minority nodes in comparison to majority nodes.

This approach was published in [Espín-Noboa et al. (2022)](https://www.nature.com/articles/s41598-022-05434-1) in *Scientific Reports*.

## Task

1. **Generate Synthetic Networks**: Use the `DPAHModel` to create multiple synthetic directed networks with varying parameters.
2. **Compute centrality metrics**: Rank the nodes in each network using a centrality metric, e.g., PageRank.
3. **Compute Disparity Scores**:
   - Calculate the **Gini coefficient** of the PageRank distribution to measure **inequality**.
   - Analyze the **representation** of minority nodes in the top-k PageRank rankings to measure **inequity**.
4. **Plot and Compare**: Visualize the disparity scores across the networks to see how inequality and inequity vary based on network structure.
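
For intuition about the ranking step, the following is a minimal power-iteration sketch of PageRank on a toy directed graph. It is not the implementation used by `netin`; the function name `pagerank`, the adjacency-dict input, and the defaults (`alpha=0.85`, uniform redistribution of dangling-node mass) are assumptions made for this example. In practice, a library routine such as `networkx.pagerank` would normally be used.

```python
def pagerank(out_edges, alpha=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration.

    out_edges: dict mapping every node to the list of nodes it points to.
    """
    nodes = list(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by nodes without out-links is spread uniformly
        dangling = sum(rank[v] for v in nodes if not out_edges[v])
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        # Each node passes its score equally to the nodes it links to
        for v in nodes:
            targets = out_edges[v]
            for t in targets:
                new[t] += alpha * rank[v] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy graph: nodes 0, 1 and 3 all point to node 2; node 2 points to node 0
toy = {0: [2], 1: [2], 2: [0], 3: [2]}
scores = pagerank(toy)
print(sorted(scores, key=scores.get, reverse=True))
```

Node 2 receives the most incoming links and ends up with the highest score, while node 0 benefits from being the only node that node 2 links to.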

## Instructions

1. Use the provided function to generate networks using the `DPAHModel`.
2. Compute PageRank for each network.
3. Write a function to compute the Gini coefficient of the PageRank distribution.
4. Write another function to compute the inequity score, based on the proportion of minority nodes in the top-k PageRank.
5. Plot the disparity scores (inequality and inequity) for comparison.
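
One possible shape for the inequity function in step 4 is sketched below. It returns the fraction of minority nodes among the top-k% of nodes by score, which can then be compared with the minority's share of the whole population. The function name, the dict-based inputs, and the tie handling (plain sorting, no special treatment of equal scores) are assumptions for this sketch, not the definition used in the paper.

```python
import math


def minority_share_in_top_k(scores, is_minority, k_percent):
    """Fraction of minority nodes among the top k% of nodes by score.

    scores: dict node -> score (e.g., PageRank)
    is_minority: dict node -> bool
    k_percent: value in (0, 100]
    """
    n = len(scores)
    k = max(1, math.ceil(n * k_percent / 100))
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for v in top if is_minority[v]) / k


# Toy example with hypothetical scores and group labels
scores = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
is_minority = {"a": False, "b": True, "c": False, "d": False}

share_top = minority_share_in_top_k(scores, is_minority, k_percent=50)
share_population = sum(is_minority.values()) / len(is_minority)
print(share_top, share_population, share_top - share_population)
```

A positive difference indicates that the minority is over-represented in the top-k% relative to its population share; a negative one indicates under-representation.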

## Expected Outcome

By the end of this exercise, you will have a deeper understanding of how different network structures influence node rankings, and how inequality and inequity manifest in these rankings. You will also learn to audit algorithmic outcomes in the context of network science.

___

In [None]:
### If running this on Google Colab, run the following lines:
# import os
# !pip install netin==2.0.0a1
# !mkdir plots
# os.kill(os.getpid(), 9)

## Dependencies

In [None]:
## Network models
from netin.models import DPAModel
from netin.models import DHModel
from netin.models import DPAHModel

In [None]:
## Utils
import helper
from netin import viz
from netin.utils import io
from netin.stats import networks as utils_network

## Constants

In [None]:
PLOTS = '../plots/'
io.validate_dir(PLOTS)

## Task 1. Generate Synthetic Directed Graphs

In [None]:
### Fix some parameters of the networks

N = 1000     # number of nodes
d = 0.003    # edge density (fraction of all possible directed edges)
             # Hint: Remember that the final number of edges will be: e = d * N * (N - 1)
f_m = 0.1    # fraction of minority group
plo_M = 2.1  # powerlaw out_degree exponent of the majority group (activity)
plo_m = 2.1  # powerlaw out_degree exponent of the minority group (activity)
seed = 12345 # random seed (reproducibility)

model_gen = DPAHModel # Model generator: the DPAH model generates networks with
                      # Directed edges, Preferential Attachment, and Homophily
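
As a quick sanity check on these parameters, the hint can be turned into a small calculation. This sketch assumes that `d` is the density of a directed graph without self-loops, so that the expected number of edges is `d * N * (N - 1)`; the helper names are hypothetical and the edge count of a simulated graph may differ slightly.

```python
def expected_edges(n_nodes, density):
    """Expected number of directed edges, assuming e = d * N * (N - 1)."""
    return round(density * n_nodes * (n_nodes - 1))


def expected_minority_nodes(n_nodes, fraction_minority):
    """Expected number of minority nodes for a given minority fraction."""
    return round(fraction_minority * n_nodes)


# Same values as in the parameter cell above
print(expected_edges(1000, 0.003))         # 2997
print(expected_minority_nodes(1000, 0.1))  # 100
```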

In [None]:
# DPAH graphs:
# Homophilic h > 0.5
# Neutral h = 0.5
# Heterophilic h < 0.5

homophily_values = [0.1, 0.5, 0.9]
graph_models = []

for h_M in homophily_values: # homophily within majority nodes
    for h_m in homophily_values: # homophily within minority nodes

        # generating graph
        m = model_gen(N=N, d=d, f_m=f_m, plo_M=plo_M, plo_m=plo_m, h_M=h_M, h_m=h_m, seed=seed)
        m.simulate()

        # storing the simulated model
        graph_models.append(m)

## Task 2. Compute Centrality metrics

In [None]:
# generating node metadata dataframe
metadata = []
for m in graph_models:
    df = utils_network.get_node_metadata_as_dataframe(m.graph)
    df.name = model_gen.SHORT
    df.name = helper.get_title(df, m.f_m, m.h_M, m.h_m)
    metadata.append(df)

## Task 3. Getting to know the data

In [None]:
### Setting the look & feel
viz.reset_style()
viz.set_paper_style()

In [None]:
### Plotting all graphs at once
### Showing 3 graphs per row

viz.plot_graph(graph_models,
               nc = 3,
               cell_size = 2.0,
               wspace = 0.1,
               ignore_singletons=True,
               fn = PLOTS + '4_all_graphs.pdf')