# **Breaking the Dyadic Barrier: Rethinking Fairness in Link Prediction Beyond Demographic Parity**

## **Preparing data**

### **Loading datasets**
This section loads each dataset and builds its splits once as an example, so that the researcher can verify the behaviour. During training, the splits are regenerated deterministically on every run from the seed.

This example uses a different seed so that it does not interfere with the splits used for training or with our reported results.

In [1]:
from pathlib import Path

from helpers.utils import get_dataset, print_graph_statistics
from helpers.metrics import calculate_edge_homophily

datasets = ["facebook", "german", "nba", "pokec_n", "pokec_z", "credit", "chameleon", "airtraffic"]
split_path = "../data/splits"
output_folder = "../data/output"
csv_out = f"{output_folder}/MORAL_edge_distributions.csv"


In [2]:
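for dataset in datasets:
    print(f"Processing dataset: {dataset}")
    # Load the graph and build its splits; this seed is used for the example only.
    (adj, features, train_idx, val_idx, test_idx,
     labels, sens, sens_idx, data, splits) = get_dataset(dataset, Path(split_path), seed=42)
    # Edge homophily with respect to the sensitive attribute; the return value is discarded.
    _ = calculate_edge_homophily(adj, sens)
    print(f"Features shape: {features.shape}")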

Processing dataset: facebook
Edge homophily: 0.5757
Random baseline: 0.5502
Excess homophily Δh = 0.0255
→ Homophilic relative to random mixing
Features shape: torch.Size([1045, 574])
Processing dataset: german
Edge homophily: 0.8048
Random baseline: 0.5722
Excess homophily Δh = 0.2326
→ Homophilic relative to random mixing
Features shape: torch.Size([1000, 27])
Processing dataset: nba
Edge homophily: 0.7237
Random baseline: 0.6100
Excess homophily Δh = 0.1137
→ Homophilic relative to random mixing
Features shape: torch.Size([403, 96])
Processing dataset: pokec_n
Edge homophily: 0.4560
Random baseline: 0.5003
Excess homophily Δh = -0.0443
→ Heterophilic relative to random mixing
Features shape: torch.Size([66569, 266])
Processing dataset: pokec_z
Edge homophily: 0.4507
Random baseline: 0.5001
Excess homophily Δh = -0.0494
→ Heterophilic relative to random mixing
Features shape: torch.Size([67796, 277])
Processing dataset: credit
Edge homophily: 0.8790
Random baseline: 0.8370
Excess hom
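
To make the three quantities printed above easier to interpret, here is a minimal sketch of one common way to compute them. It assumes that edge homophily is the fraction of edges whose endpoints share the sensitive attribute, and that the random baseline is the sum of squared group proportions. The function name `edge_homophily_stats` and the toy graph are hypothetical; `calculate_edge_homophily` in `helpers.metrics` may define the baseline differently (for example, degree-weighted).

```python
import numpy as np

def edge_homophily_stats(edges, sens):
    """Hypothetical helper: edges is an (E, 2) int array, sens an (N,) group array."""
    edges = np.asarray(edges)
    sens = np.asarray(sens)
    # Fraction of edges whose two endpoints belong to the same group.
    homophily = float((sens[edges[:, 0]] == sens[edges[:, 1]]).mean())
    # Assumed baseline: probability that two independently drawn nodes share a group.
    _, counts = np.unique(sens, return_counts=True)
    proportions = counts / counts.sum()
    baseline = float((proportions ** 2).sum())
    return homophily, baseline, homophily - baseline

# Toy path graph 0-1-2-3 in which only node 3 belongs to group 1.
toy_edges = np.array([[0, 1], [1, 2], [2, 3]])
toy_sens = np.array([0, 0, 0, 1])
h, baseline, delta_h = edge_homophily_stats(toy_edges, toy_sens)
print(f"Edge homophily: {h:.4f}")
print(f"Random baseline: {baseline:.4f}")
print(f"Excess homophily Δh = {delta_h:.4f}")
```

Under this definition, a positive Δh means that same-group edges are more frequent than random mixing would produce, and a negative Δh means they are less frequent.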

### **Check Split Distributions**
During training, once the splits have been generated, the distribution of edges across the train/validation/test splits is saved for each dataset. We follow the original implementation and use a 70/10/20 split. The sensitive-attribute distribution is preserved across all splits: positive edges are stratified by pair type, (0,0), (0,1) or (1,1), split within each group, and then recombined.
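
The stratification described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used by `get_dataset`: the function name `stratified_edge_split`, the rounding of group sizes, and the toy graph are all hypothetical, and a binary sensitive attribute is assumed.

```python
import numpy as np

def stratified_edge_split(edges, sens, ratios=(0.7, 0.1, 0.2), seed=0):
    """Hypothetical sketch: split positive edges 70/10/20 within each pair type."""
    rng = np.random.default_rng(seed)
    edges = np.asarray(edges)
    sens = np.asarray(sens)
    # Pair type as the sum of endpoint attributes: 0 -> (0,0), 1 -> (0,1), 2 -> (1,1).
    pair_type = sens[edges[:, 0]] + sens[edges[:, 1]]
    parts = {"train": [], "valid": [], "test": []}
    for t in (0, 1, 2):
        idx = rng.permutation(np.flatnonzero(pair_type == t))
        n_train = int(round(ratios[0] * len(idx)))
        n_valid = int(round(ratios[1] * len(idx)))
        parts["train"].append(idx[:n_train])
        parts["valid"].append(idx[n_train:n_train + n_valid])
        parts["test"].append(idx[n_train + n_valid:])  # remainder goes to test
    # Recombine the per-type chunks into one edge array per split.
    return {name: edges[np.concatenate(chunks)] for name, chunks in parts.items()}

# Toy graph: nodes 0-9 are in group 0, nodes 10-19 in group 1.
sens_demo = np.array([0] * 10 + [1] * 10)
edges_demo = np.array(
    [(i, j) for i in range(5) for j in range(i + 1, 5)]           # 10 (0,0) edges
    + [(i, j) for i in range(10, 15) for j in range(i + 1, 15)]   # 10 (1,1) edges
    + [(i, 10 + j) for i in range(5) for j in range(4)]           # 20 (0,1) edges
)
splits_demo = stratified_edge_split(edges_demo, sens_demo, seed=0)
for name, split_edges in splits_demo.items():
    type_counts = np.bincount(
        sens_demo[split_edges[:, 0]] + sens_demo[split_edges[:, 1]], minlength=3
    )
    print(name, type_counts)
```

Because each pair type is split separately, every split in the toy example keeps the 1:2:1 ratio of (0,0), (0,1) and (1,1) edges found in the full edge set.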

The output is saved to a CSV file, which is generated during training.

Notes on choices:
- Negative sampling: the paper excerpt does not fully specify the negative sampling scheme. To avoid fairness evaluation artifacts, we sample negatives per split in the same pair-type proportions as the positives of that split.
- Randomness control: the paper fixes seeds and runs each experiment 3 times. We do the same when training the model.
- Global edge distribution: it is saved here for computing the NDKL later. This value is constant and is not altered by splitting the data.
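
As an illustration of the negative sampling note above, the sketch below draws non-edges by rejection sampling so that the negatives of a split have the same pair-type counts as its positives. The function name `sample_negatives_like`, the retry limit, and the toy graph are hypothetical, and a binary sensitive attribute on a reasonably sparse graph is assumed; the training code may implement this differently.

```python
import numpy as np

def sample_negatives_like(pos_edges, sens, existing_edges, seed=0):
    """Hypothetical sketch: sample non-edges matching the pair-type counts of pos_edges."""
    rng = np.random.default_rng(seed)
    sens = np.asarray(sens)
    pos_edges = np.asarray(pos_edges)
    # Undirected edges that must never be returned as negatives.
    forbidden = {(min(u, v), max(u, v)) for u, v in np.asarray(existing_edges).tolist()}
    groups = [np.flatnonzero(sens == 0), np.flatnonzero(sens == 1)]
    pos_type = sens[pos_edges[:, 0]] + sens[pos_edges[:, 1]]
    negatives = []
    for t, (a, b) in {0: (0, 0), 1: (0, 1), 2: (1, 1)}.items():
        needed = int((pos_type == t).sum())
        chosen, tries = set(), 0
        while len(chosen) < needed:
            tries += 1
            if tries > 100 * needed:  # guard against graphs that are too dense
                raise RuntimeError(f"not enough non-edges of pair type {t}")
            u, v = int(rng.choice(groups[a])), int(rng.choice(groups[b]))
            pair = (min(u, v), max(u, v))
            if u != v and pair not in forbidden:
                chosen.add(pair)
        negatives.extend(sorted(chosen))
    return np.array(negatives, dtype=int).reshape(-1, 2)

# Toy graph: nodes 0-9 are in group 0, nodes 10-19 in group 1.
sens_demo = np.array([0] * 10 + [1] * 10)
edges_demo = np.array(
    [(i, j) for i in range(5) for j in range(i + 1, 5)]           # 10 (0,0) edges
    + [(i, j) for i in range(10, 15) for j in range(i + 1, 15)]   # 10 (1,1) edges
    + [(i, 10 + j) for i in range(5) for j in range(4)]           # 20 (0,1) edges
)
neg_demo = sample_negatives_like(edges_demo, sens_demo, existing_edges=edges_demo)
print(np.bincount(sens_demo[neg_demo[:, 0]] + sens_demo[neg_demo[:, 1]], minlength=3))
```

Matching the pair-type counts keeps the positive-to-negative ratio identical for every pair type, so a fairness metric computed on the split is not skewed by the sampling itself. Passing every known edge as `existing_edges` also keeps the pos/neg overlap at zero.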

In [3]:
print_graph_statistics(csv_out)

Dataset    | Nodes | Avg Degree         | Gradients/Epoch | Global π counts (00) | (01)   | (11)   | Global π dist. (00)  | (01)                | (11)                 | Train π counts (00) | (01)   | (11)  | Train π dist. (00)  | (01)                | (11)                 | Valid π counts (00) | (01)  | (11)  | Valid π dist. (00)   | (01)                | (11)                 | Test π counts (00) | (01)  | (11)  | Test π dist. (00)   | (01)                | (11)                 | Train pos/neg overlap | Valid pos/neg overlap | Test pos/neg overlap
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

## **Training the model**
Training the model takes a significant amount of time and compute, so we advise retraining only when necessary. The final ranking results from MORAL are stored in data/output.

In [None]:
# ! python main.py --fair_model moral --model gae --dataset credit --device cuda:0 --epochs 500
# ! python main.py --fair_model moral --model gae --dataset german --device cuda:0 --epochs 500
# ! python main.py --fair_model moral --model gae --dataset nba --device cuda:0 --epochs 500
# ! python main.py --fair_model moral --model gae --dataset facebook --device cuda:0 --epochs 500
# ! python main.py --fair_model moral --model gae --dataset pokec_n --device cuda:0 --epochs 500
# ! python main.py --fair_model moral --model gae --dataset pokec_z --device cuda:0 --epochs 500
# ! python main.py --fair_model moral --model gae --dataset chameleon --device cuda:0 --epochs 500
# ! python main.py --fair_model moral --model gae --dataset airtraffic --device cuda:0 --epochs 500


## **Replicating Results**
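
The NDKL@k columns reported in this section compare the pair-type distribution of every top-i prefix of a ranking with the global edge distribution π. The sketch below follows the standard definition, a KL divergence per prefix, weighted by 1/log2(i + 1) and normalised by the sum of the weights. It is an assumption-laden illustration rather than the code in `helpers.metrics`: the function name `ndkl_at_k`, the use of the natural logarithm, and the `eps` smoothing term are choices made here.

```python
import numpy as np

def ndkl_at_k(ranked_pair_types, reference_dist, k, eps=1e-12):
    """Hypothetical sketch of NDKL@k for a ranking given as pair-type ids (0, 1 or 2)."""
    ranked = np.asarray(ranked_pair_types)[:k]
    ref = np.asarray(reference_dist, dtype=float)
    counts = np.zeros(len(ref))
    total, norm = 0.0, 0.0
    for i, t in enumerate(ranked, start=1):
        counts[t] += 1
        top_i = counts / i                      # pair-type distribution of the top-i prefix
        mask = top_i > 0                        # 0 * log(0) is treated as 0
        kl = float(np.sum(top_i[mask] * np.log(top_i[mask] / (ref[mask] + eps))))
        weight = 1.0 / np.log2(i + 1)           # position discount
        total += weight * kl
        norm += weight
    return total / norm

reference = [0.25, 0.5, 0.25]                   # toy global distribution over (00), (01), (11)
skewed = [0] * 8                                # ranking that contains only (0,0) pairs
balanced = [1, 0, 1, 2] * 2                     # ranking that tracks the reference
ndkl_skewed = ndkl_at_k(skewed, reference, k=8)
ndkl_balanced = ndkl_at_k(balanced, reference, k=8)
print(f"skewed: {ndkl_skewed:.4f}, balanced: {ndkl_balanced:.4f}")
```

Lower values indicate that the prefixes of the ranking stay closer to the reference distribution, which is why the skewed toy ranking scores higher than the balanced one.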

In [5]:
from helpers.metrics import get_results, print_results

In [6]:
datasets_results = ["facebook", "german", "nba", "credit", "pokec_n", "pokec_z", "chameleon", "airtraffic"]
output_folder = "../data/output"
k = 1000

results = get_results(
    datasets=datasets_results,
    folder=output_folder,
    graph_stats_csv=csv_out,
    splits_dir=split_path,
    k=k,
)
print_results(results, k)



Dataset    | Raw NDKL@1000   | Reranked NDKL@1000 | Raw AWRF@1000   | Reranked AWRF@1000 | Raw DP          | Reranked DP     | Precision@1000  | NDCG@1000      
----------------------------------------------------------------------------------------------------------------------------------------------------------------
facebook   | 0.0277 ± 0.0037 | 0.0084 ± 0.0000    | 0.0325 ± 0.0196 | 0.0032 ± 0.0000    | 0.0074 ± 0.0041 | 0.0004 ± 0.0001 | 0.9740 ± 0.0000 | 0.9960 ± 0.0004
german     | 0.0131 ± 0.0027 | 0.0068 ± 0.0000    | 0.0378 ± 0.0094 | 0.0021 ± 0.0000    | 0.0134 ± 0.0043 | 0.0110 ± 0.0014 | 0.9533 ± 0.0048 | 0.9942 ± 0.0012
nba        | 0.0210 ± 0.0061 | 0.0059 ± 0.0000    | 0.0207 ± 0.0047 | 0.0020 ± 0.0000    | 0.0009 ± 0.0007 | 0.0173 ± 0.0055 | 0.7513 ± 0.0053 | 0.9598 ± 0.0035
credit     | 0.0090 ± 0.0027 | 0.0038 ± 0.0000    | 0.0194 ± 0.0102 | 0.0021 ± 0.0000    | 0.0474 ± 0.0026 | 0.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000
pokec_n    | 0.0215 ± 0.0012 | 0.0