## FairAC Experiments on the NBA dataset
This notebook can be used to run FairAC experiments on the NBA dataset.

It is currently configured to run a full training run on three different seeds.

In [1]:
# Imports needed for the experiment runner
import sys
import os
import torch
sys.path.append(os.path.abspath("../"))
from base_experiment import ExperimentRunner

### Set up the experiment runner
First we create an experiment runner, which is used to set the random seeds and provide params/logging directories to the different runs.

In [2]:
# Set up the experiment runner with all the seeds and params we want
experiment = ExperimentRunner(
    experiment_name="nba_fair_ac_main",
    seeds=[40, 41, 42],
    data_path="dataset/NBA",
    log_dir="experiments/fair_ac/logs/nba",
    device=2,
    params=[{"lambda1": 1.0, "lambda2": 1.0}],
)

# After we set up the experiment, we can import the rest
from dataset import NBA
from models.gnn import WrappedGNNConfig
from models.fair.ac import FairAC, Trainer

FairAC wraps a GNN with a sensitive classifier for the downstream task, so we first configure that GNN and sensitive classifier combination.


In [3]:
gnn_config = WrappedGNNConfig(
    hidden_dim=128,
    kind="GCN",
    lr=1e-3,
    weight_decay=1e-5,
    kwargs={"dropout": 0.5},
)

### Run the experiments
Now we're ready to run the experiments!

We can do this by iterating over the `ExperimentRunner.runs()` method. This method returns a generator that yields the seed, logging directory, device and params for the current experiment run.
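
To make the shape of that generator concrete, here is a minimal sketch of a runner that yields one `(seed, log_dir, device, params)` tuple per seed and parameter set. `SketchRunner` is a hypothetical stand-in, not the project's `ExperimentRunner`; the directory naming is inferred from the log paths printed further down, and a real runner would presumably seed NumPy and PyTorch as well.

```python
import itertools
import random
from pathlib import Path


class SketchRunner:
    """Hypothetical stand-in for ExperimentRunner; not the project's code."""

    def __init__(self, experiment_name, seeds, log_dir, device, params):
        self.experiment_name = experiment_name
        self.seeds = seeds
        self.log_dir = Path(log_dir)
        self.device = device
        self.params = params

    def runs(self):
        # One run per (seed, params) combination, seeds in the outer loop.
        for seed, params in itertools.product(self.seeds, self.params):
            # Only Python's RNG is seeded in this sketch.
            random.seed(seed)
            # Build a name such as nba_fair_ac_main_40_lambda1_1.0_lambda2_1.0
            suffix = "_".join(f"{key}_{value}" for key, value in params.items())
            run_dir = self.log_dir / f"{self.experiment_name}_{seed}_{suffix}"
            yield seed, run_dir, self.device, params


runner = SketchRunner(
    experiment_name="nba_fair_ac_main",
    seeds=[40, 41, 42],
    log_dir="experiments/fair_ac/logs/nba",
    device="cpu",  # assumption: CPU is enough for the sketch
    params=[{"lambda1": 1.0, "lambda2": 1.0}],
)
run_names = [run_dir.name for _, run_dir, _, _ in runner.runs()]
print(run_names)
```

Because `runs()` is a generator, each seed is set just before its run starts, so a run cannot be affected by random draws made during an earlier one.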


For each experiment run we first:
1. Load in the dataset
2. Create the FairAC model instance
3. Create the FairAC trainer


Once everything is initialised, we can run the pretraining using `Trainer.pretrain()`. This trains the AE and sensitive classifier.

Then we run the main training loop, which trains the full FairAC model for the remaining epochs.
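
The validation rounds in the main loop log a `best fairness` value, and each run ends with `parity` and `equality` figures. The sketch below assumes these are the statistical parity gap and the equal opportunity gap for a binary sensitive attribute, and that `min_acc` and `min_roc` act as floors a model must clear before it can be picked as the fairest. The helper names and the toy predictions are hypothetical, and this is not the `Trainer` implementation. Ranking by `parity + equality` is also an assumption, although the logged values are consistent with it (for seed 40, 0.0006 + 0.0015 is close to the logged 0.002036).

```python
def statistical_parity_gap(preds, sens):
    """|P(pred=1 | s=0) - P(pred=1 | s=1)| for binary predictions."""
    group0 = [p for p, s in zip(preds, sens) if s == 0]
    group1 = [p for p, s in zip(preds, sens) if s == 1]
    return abs(sum(group0) / len(group0) - sum(group1) / len(group1))


def equal_opportunity_gap(preds, labels, sens):
    """The same gap, restricted to samples whose true label is 1."""
    pos0 = [p for p, y, s in zip(preds, labels, sens) if y == 1 and s == 0]
    pos1 = [p for p, y, s in zip(preds, labels, sens) if y == 1 and s == 1]
    return abs(sum(pos0) / len(pos0) - sum(pos1) / len(pos1))


def pick_fairest(candidates, min_acc, min_roc):
    """Lowest parity + equality among candidates meeting both floors."""
    eligible = [
        c for c in candidates if c["acc"] >= min_acc and c["roc"] >= min_roc
    ]
    return min(eligible, key=lambda c: c["parity"] + c["equality"], default=None)


# Toy predictions, invented purely for illustration.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 1, 0, 1, 1, 0, 1]
sens = [0, 0, 0, 0, 1, 1, 1, 1]
parity = statistical_parity_gap(preds, sens)
equality = equal_opportunity_gap(preds, labels, sens)
print(f"parity={parity:.4f} equality={equality:.4f}")

# Seed 40 summary numbers from this notebook, reused only as example inputs.
candidates = [
    {"name": "fair", "acc": 0.6526, "roc": 0.7342, "parity": 0.0006, "equality": 0.0015},
    {"name": "acc", "acc": 0.7371, "roc": 0.7589, "parity": 0.0865, "equality": 0.1446},
    {"name": "auc", "acc": 0.5962, "roc": 0.7691, "parity": 0.0254, "equality": 0.0351},
]
best = pick_fairest(candidates, min_acc=0.65, min_roc=0.69)
print(best["name"])
```

Under these assumptions the `auc` candidate is filtered out by the accuracy floor even though its fairness gaps are smaller than those of the `acc` candidate, which is the kind of trade-off the two thresholds appear designed to control.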

In [4]:
for (seed, log_dir, device, params) in experiment.runs():
    print("===========================")
    print(f"Running {experiment.experiment_name} using seed {seed}")
    print(f"Log directory: {log_dir}")
    print(f"Params: {params}")
    print("===========================")
    
    # Load in the dataset
    dataset = NBA(
        nodes_path=experiment.data_path / "nba.csv",
        edges_path=experiment.data_path / "nba_relationship.txt",
        embedding_path=experiment.data_path / "nba_embedding10.npy",
        feat_drop_rate=0.3,
        device=experiment.device
    )

    print(f"Loaded dataset with {dataset.graph.num_nodes()} nodes and {dataset.graph.num_edges()} edges")
    print(f"Using feat_drop_rate: {dataset.feat_drop_rate}")

    # Create FairAC model
    fair_ac = FairAC(
        feature_dim=dataset.features.shape[1],
        transformed_feature_dim=128,
        emb_dim=dataset.embeddings.shape[1],
        attn_vec_dim=128,
        attn_num_heads=1,
        dropout=0.5,
        num_sensitive_classes=1,
    ).to(experiment.device)
    print(f"Created FairAC model with {1} sensitive class")
        
    # Create FairAC trainer
    trainer = Trainer(
        ac_model=fair_ac,
        lambda1=params["lambda1"],
        lambda2=params["lambda2"],
        dataset=dataset,
        device=experiment.device,
        gnn_config=gnn_config,
        log_dir=log_dir,
        min_acc=0.65,
        min_roc=0.69,
    )
    print(f"Created trainer with {'GCN'} model, using log_dir: {log_dir}")

    print("Starting pre-training phase")
    # Run pre-training
    trainer.pretrain(epochs=200)
    print("Finished pretraining")
    
    # Main training loop, with GNN validation
    print("Starting main training...")
    trainer.train(val_start_epoch=800, val_epoch_interval=200, epochs=2800)


Running nba_fair_ac_main using seed 40
Log directory: /home/fact21/fact_refactor/experiments/fair_ac/logs/nba/nba_fair_ac_main_40_lambda1_1.0_lambda2_1.0
Params: {'lambda1': 1.0, 'lambda2': 1.0}
Loaded dataset with 403 nodes and 21645 edges
Using feat_drop_rate: 0.3
Created FairAC model with 1 sensitive class
Created trainer with GCN model, using log_dir: /home/fact21/fact_refactor/experiments/fair_ac/logs/nba/nba_fair_ac_main_40_lambda1_1.0_lambda2_1.0
Starting pre-training phase



Finished pretraining
Starting main training...



[1000:303] Found new best fairness of 0.220351
[1000:334] Found new best fairness of 0.195034
[1000:340] Found new best fairness of 0.088076
[1000:389] Found new best fairness of 0.069895
[1000:450] Found new best fairness of 0.069089
[1000:600] Found new best fairness of 0.038081
[1000:638] Found new best fairness of 0.012764
[1000:733] Found new best fairness of 0.006435



[1800:732] Found new best fairness of 0.005311



[2200:741] Found new best fairness of 0.002036



Finished training!
Best epoch: 2200

Best fair model:
	acc: 0.6526
	roc: 0.7342
	parity: 0.0006
	equality: 0.0015
	consistency: 0.0264

Best acc model:
	acc: 0.7371
	roc: 0.7589
	parity: 0.0865
	equality: 0.1446

Best auc model:
	acc: 0.5962
	roc: 0.7691
	parity: 0.0254
	equality: 0.0351

Running nba_fair_ac_main using seed 41
Log directory: /home/fact21/fact_refactor/experiments/fair_ac/logs/nba/nba_fair_ac_main_41_lambda1_1.0_lambda2_1.0
Params: {'lambda1': 1.0, 'lambda2': 1.0}
Loaded dataset with 403 nodes and 21645 edges
Using feat_drop_rate: 0.3
Created FairAC model with 1 sensitive class
Created trainer with GCN model, using log_dir: /home/fact21/fact_refactor/experiments/fair_ac/logs/nba/nba_fair_ac_main_41_lambda1_1.0_lambda2_1.0
Starting pre-training phase



Finished pretraining
Starting main training...



[1000:200] Found new best fairness of 0.125246
[1000:210] Found new best fairness of 0.094724
[1000:221] Found new best fairness of 0.088395
[1000:266] Found new best fairness of 0.073807
[1000:279] Found new best fairness of 0.055944
[1000:308] Found new best fairness of 0.050420
[1000:324] Found new best fairness of 0.043286
[1000:329] Found new best fairness of 0.036957



[1200:135] Found new best fairness of 0.030415
[1200:230] Found new best fairness of 0.024086
[1200:261] Found new best fairness of 0.017297
[1200:309] Found new best fairness of 0.002036



Finished training!
Best epoch: 1200

Best fair model:
	acc: 0.6526
	roc: 0.7357
	parity: 0.0006
	equality: 0.0015
	consistency: 0.0264

Best acc model:
	acc: 0.7512
	roc: 0.7799
	parity: 0.0281
	equality: 0.1183

Best auc model:
	acc: 0.5587
	roc: 0.7857
	parity: 0.0268
	equality: 0.0365

Running nba_fair_ac_main using seed 42
Log directory: /home/fact21/fact_refactor/experiments/fair_ac/logs/nba/nba_fair_ac_main_42_lambda1_1.0_lambda2_1.0
Params: {'lambda1': 1.0, 'lambda2': 1.0}
Loaded dataset with 403 nodes and 21645 edges
Using feat_drop_rate: 0.3
Created FairAC model with 1 sensitive class
Created trainer with GCN model, using log_dir: /home/fact21/fact_refactor/experiments/fair_ac/logs/nba/nba_fair_ac_main_42_lambda1_1.0_lambda2_1.0
Starting pre-training phase



Finished pretraining
Starting main training...



[1000:431] Found new best fairness of 0.429601
[1000:471] Found new best fairness of 0.413570
[1000:487] Found new best fairness of 0.398761
[1000:538] Found new best fairness of 0.151049
[1000:605] Found new best fairness of 0.138709
[1000:657] Found new best fairness of 0.132380
[1000:679] Found new best fairness of 0.113393
[1000:752] Found new best fairness of 0.101053
[1000:765] Found new best fairness of 0.074294
[1000:823] Found new best fairness of 0.073055
[1000:892] Found new best fairness of 0.067646



[1200:877] Found new best fairness of 0.042250
[1200:886] Found new best fairness of 0.017651



Finished training!
Best epoch: 1200

Best fair model:
	acc: 0.6573
	roc: 0.7335
	parity: 0.0021
	equality: 0.0156
	consistency: 0.0264

Best acc model:
	acc: 0.7136
	roc: 0.7438
	parity: 0.1166
	equality: 0.2473

Best auc model:
	acc: 0.5822
	roc: 0.7662
	parity: 0.0056
	equality: 0.0740
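
Since the notebook runs three seeds, the per-seed summaries are easier to compare once aggregated. The sketch below copies the "Best fair model" metrics from the three runs above and reports the mean and sample standard deviation per metric. The `best_fair` dictionary is typed in by hand for illustration; the notebook itself does not produce this structure.

```python
from statistics import mean, stdev

# "Best fair model" metrics copied from the runs above (seeds 40, 41, 42).
best_fair = {
    "acc": [0.6526, 0.6526, 0.6573],
    "roc": [0.7342, 0.7357, 0.7335],
    "parity": [0.0006, 0.0006, 0.0021],
    "equality": [0.0015, 0.0015, 0.0156],
}

# stdev() is the sample standard deviation (n - 1 in the denominator).
summary = {name: (mean(vals), stdev(vals)) for name, vals in best_fair.items()}
for name, (avg, std) in summary.items():
    print(f"{name}: {avg:.4f} ± {std:.4f}")
```

With only three seeds the standard deviation is a rough indication of spread rather than a reliable estimate; `consistency` is omitted because it is identical (0.0264) in all three runs.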
