## FairAC Experiments on the Pokec-N dataset
This notebook runs FairAC experiments on the Pokec-N dataset.

It is currently configured for a full training run on each of three seeds (40, 41, and 42).

In [1]:
import sys
import os
import torch
sys.path.append(os.path.abspath("../"))
from base_experiment import ExperimentRunner

### Set up the experiment runner
First, we create an experiment runner, which sets the random seeds and provides the parameters and logging directory for each run.

In [2]:
# Set up the experiment runner with all the seeds and params we want
experiment = ExperimentRunner(
    experiment_name="pokec_n_fair_ac_main",
    seeds=[40, 41, 42],
    data_path="dataset/pokec",
    log_dir="experiments/fair_ac/logs/pokec_n",
    device=1,
    params=[{"lambda1": 1.0, "lambda2": 0.5}],  # for Pokec-N the original work used lambda2 = 0.5
)

# After we set up the experiment, we can import the rest
from dataset import PokecN
from models.gnn import WrappedGNNConfig
from models.fair.ac import FairAC, Trainer
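
For readers without access to `base_experiment`, here is a rough sketch of what a runner like this might do for each seed and parameter set. The function `iter_runs` is hypothetical and is not the actual `ExperimentRunner` implementation; the directory naming is only inferred from the log directories printed in the run output, and a real runner would presumably also seed NumPy and PyTorch.

```python
import itertools
import random
from pathlib import Path


def iter_runs(experiment_name, seeds, params, log_dir, device):
    """Yield (seed, run_log_dir, device, params) for every seed/params pair.

    Hypothetical stand-in for ExperimentRunner.runs(), for illustration only.
    """
    for seed, run_params in itertools.product(seeds, params):
        # Seed Python's RNG; seeding NumPy and PyTorch is left out of this sketch.
        random.seed(seed)
        # Assumed naming scheme: <name>_<seed>_<key>_<value>_<key>_<value>...
        suffix = "_".join(f"{key}_{value}" for key, value in run_params.items())
        run_dir = Path(log_dir) / f"{experiment_name}_{seed}_{suffix}"
        yield seed, run_dir, device, run_params


runs = list(
    iter_runs(
        "pokec_n_fair_ac_main",
        [40, 41, 42],
        [{"lambda1": 1.0, "lambda2": 0.5}],
        "experiments/fair_ac/logs/pokec_n",
        device=1,
    )
)
print(runs[0][1].name)
```

Because the function is a generator, iterating over it directly in a `for` loop would seed the RNG just before each run starts, rather than all at once as `list(...)` does in this demo.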

FairAC pairs a GNN with a sensitive classifier for the downstream task, so we configure that GNN + sensitive classifier combination here.


In [3]:
gnn_config = WrappedGNNConfig(
    hidden_dim=128,
    kind="GCN",
    lr=1e-3,
    weight_decay=1e-5,
    kwargs={"dropout": 0.5},
)

In [4]:
for (seed, log_dir, device, params) in experiment.runs():
    print("===========================")
    print(f"Running {experiment.experiment_name} using seed {seed}")
    print(f"Log directory: {log_dir}")
    print(f"Params: {params}")
    print("===========================")
    
    # Load in the dataset
    dataset = PokecN(
        nodes_path=experiment.data_path / "region_job_2.csv",
        edges_path=experiment.data_path / "region_job_2_relationship.txt",
        embedding_path=experiment.data_path / "pokec_n_embedding10.npy",
        feat_drop_rate=0.3,
        device=device
    )

    print(f"Loaded dataset with {dataset.graph.num_nodes()} nodes and {dataset.graph.num_edges()} edges")
    print(f"Using feat_drop_rate: {dataset.feat_drop_rate}")

    # Create FairAC model
    fair_ac = FairAC(
        feature_dim=dataset.features.shape[1],
        transformed_feature_dim=128,
        emb_dim=dataset.embeddings.shape[1],
        attn_vec_dim=128,
        attn_num_heads=1,
        dropout=0.5,
        num_sensitive_classes=1,
    ).to(device)
    print(f"Created FairAC model with {1} sensitive class")
        
    # Create FairAC trainer
    trainer = Trainer(
        ac_model=fair_ac,
        lambda1=params["lambda1"],
        lambda2=params["lambda2"],
        dataset=dataset,
        device=device,
        gnn_config=gnn_config,
        log_dir=log_dir,
        min_acc=0.65,
        min_roc=0.69,
    )
    print(f"Created trainer with {'GCN'} model, using log_dir: {log_dir}")

    print("Starting pre-training phase")
    # Run pre-training
    trainer.pretrain(epochs=200)
    print("Finished pretraining")
    
    # Main training loop, with GNN validation
    print("Starting main training...")
    trainer.train(val_start_epoch=800, val_epoch_interval=200, epochs=2800)
    # The entire dataset is allocated on the GPU, so we free it before starting the next run.
    del dataset
    del trainer
    del fair_ac
    torch.cuda.empty_cache()
    print("Cleared cuda cache")


Running pokec_n_fair_ac_main using seed 40
Log directory: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_n/pokec_n_fair_ac_main_40_lambda1_1.0_lambda2_0.5
Params: {'lambda1': 1.0, 'lambda2': 0.5}
Loaded dataset with 66569 nodes and 1100663 edges
Using feat_drop_rate: 0.3
Created FairAC model with 1 sensitive class
Created trainer with GCN model, using log_dir: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_n/pokec_n_fair_ac_main_40_lambda1_1.0_lambda2_0.5
Starting pre-training phase


  0%|          | 0/200 [00:00<?, ?it/s]

Finished pretraining
Starting main training...


  0%|          | 0/2800 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

[1000:156] Found new best fairness of 0.187510
[1000:161] Found new best fairness of 0.186373
[1000:170] Found new best fairness of 0.184871
[1000:178] Found new best fairness of 0.166819
[1000:209] Found new best fairness of 0.166105
[1000:216] Found new best fairness of 0.157949
[1000:275] Found new best fairness of 0.155898
[1000:312] Found new best fairness of 0.155405
[1000:413] Found new best fairness of 0.147429
[1000:484] Found new best fairness of 0.141295
[1000:565] Found new best fairness of 0.125009
[1000:773] Found new best fairness of 0.124847
[1000:782] Found new best fairness of 0.122104
[1000:838] Found new best fairness of 0.117389
[1000:853] Found new best fairness of 0.116170
[1000:883] Found new best fairness of 0.104272
[1000:896] Found new best fairness of 0.084939


  0%|          | 0/1000 [00:00<?, ?it/s]

[1200:541] Found new best fairness of 0.083888
[1200:544] Found new best fairness of 0.047388


  0%|          | 0/1000 [00:00<?, ?it/s]

[1400:709] Found new best fairness of 0.029625


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

[2200:363] Found new best fairness of 0.025481
[2200:419] Found new best fairness of 0.008727
[2200:669] Found new best fairness of 0.007528
[2200:672] Found new best fairness of 0.006944


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

Finished training!
Best epoch: 2200

Best fair model:
	acc: 0.6618
	roc: 0.7162
	parity: 0.0043
	equality: 0.0026
	consistency: 0.4593

Best acc model:
	acc: 0.6977
	roc: 0.7316
	parity: 0.0667
	equality: 0.0942

Best auc model:
	acc: 0.6832
	roc: 0.7361
	parity: 0.0712
	equality: 0.1107
Cleared cuda cache
Running pokec_n_fair_ac_main using seed 41
Log directory: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_n/pokec_n_fair_ac_main_41_lambda1_1.0_lambda2_0.5
Params: {'lambda1': 1.0, 'lambda2': 0.5}
Loaded dataset with 66569 nodes and 1100663 edges
Using feat_drop_rate: 0.3
Created FairAC model with 1 sensitive class
Created trainer with GCN model, using log_dir: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_n/pokec_n_fair_ac_main_41_lambda1_1.0_lambda2_0.5
Starting pre-training phase


  0%|          | 0/200 [00:00<?, ?it/s]

Finished pretraining
Starting main training...


  0%|          | 0/2800 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

[1000:104] Found new best fairness of 0.194767
[1000:105] Found new best fairness of 0.182643
[1000:115] Found new best fairness of 0.181709
[1000:117] Found new best fairness of 0.179067
[1000:121] Found new best fairness of 0.172877
[1000:146] Found new best fairness of 0.170068
[1000:162] Found new best fairness of 0.169331
[1000:174] Found new best fairness of 0.168558
[1000:177] Found new best fairness of 0.164694
[1000:186] Found new best fairness of 0.157964
[1000:203] Found new best fairness of 0.152437
[1000:222] Found new best fairness of 0.149413
[1000:226] Found new best fairness of 0.138994
[1000:271] Found new best fairness of 0.137103
[1000:279] Found new best fairness of 0.134989
[1000:288] Found new best fairness of 0.125270
[1000:299] Found new best fairness of 0.124319
[1000:301] Found new best fairness of 0.110730
[1000:334] Found new best fairness of 0.102231
[1000:384] Found new best fairness of 0.094607


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

[2600:287] Found new best fairness of 0.086994


  0%|          | 0/1000 [00:00<?, ?it/s]

Finished training!
Best epoch: 2600

Best fair model:
	acc: 0.6586
	roc: 0.7240
	parity: 0.0403
	equality: 0.0467
	consistency: 0.4593

Best acc model:
	acc: 0.6977
	roc: 0.7328
	parity: 0.0830
	equality: 0.1124

Best auc model:
	acc: 0.6909
	roc: 0.7374
	parity: 0.0585
	equality: 0.0954
Cleared cuda cache
Running pokec_n_fair_ac_main using seed 42
Log directory: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_n/pokec_n_fair_ac_main_42_lambda1_1.0_lambda2_0.5
Params: {'lambda1': 1.0, 'lambda2': 0.5}
Loaded dataset with 66569 nodes and 1100663 edges
Using feat_drop_rate: 0.3
Created FairAC model with 1 sensitive class
Created trainer with GCN model, using log_dir: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_n/pokec_n_fair_ac_main_42_lambda1_1.0_lambda2_0.5
Starting pre-training phase


  0%|          | 0/200 [00:00<?, ?it/s]

Finished pretraining
Starting main training...


  0%|          | 0/2800 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

[1000:200] Found new best fairness of 0.237771
[1000:201] Found new best fairness of 0.212839
[1000:203] Found new best fairness of 0.197786
[1000:204] Found new best fairness of 0.187332
[1000:232] Found new best fairness of 0.183688
[1000:239] Found new best fairness of 0.182195
[1000:262] Found new best fairness of 0.180586
[1000:265] Found new best fairness of 0.179946
[1000:268] Found new best fairness of 0.175511
[1000:273] Found new best fairness of 0.172731
[1000:278] Found new best fairness of 0.171874
[1000:319] Found new best fairness of 0.161692
[1000:350] Found new best fairness of 0.159740
[1000:384] Found new best fairness of 0.151291
[1000:405] Found new best fairness of 0.145501
[1000:429] Found new best fairness of 0.136262
[1000:517] Found new best fairness of 0.132372
[1000:541] Found new best fairness of 0.131416
[1000:617] Found new best fairness of 0.130102
[1000:692] Found new best fairness of 0.120615
[1000:719] Found new best fairness of 0.117288
[1000:735] Fo

  0%|          | 0/1000 [00:00<?, ?it/s]

[1200:769] Found new best fairness of 0.099709
[1200:801] Found new best fairness of 0.064314
[1200:974] Found new best fairness of 0.057174
[1200:976] Found new best fairness of 0.045659


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

[1600:556] Found new best fairness of 0.036606
[1600:558] Found new best fairness of 0.032252
[1600:958] Found new best fairness of 0.010643


  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

  0%|          | 0/1000 [00:00<?, ?it/s]

Finished training!
Best epoch: 1600

Best fair model:
	acc: 0.6532
	roc: 0.7101
	parity: 0.0007
	equality: 0.0100
	consistency: 0.4597

Best acc model:
	acc: 0.6932
	roc: 0.7155
	parity: 0.0594
	equality: 0.0645

Best auc model:
	acc: 0.6855
	roc: 0.7199
	parity: 0.0450
	equality: 0.0419
Cleared cuda cache
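
To compare against reported numbers, it can help to summarize the three seeds as a mean and standard deviation. The sketch below does this for the "Best fair model" metrics, with the values copied by hand from the output above; the dictionary layout and variable names are illustrative only and are not part of the experiment code.

```python
from statistics import mean, stdev

# "Best fair model" metrics copied from the three runs above (seeds 40, 41, 42).
best_fair = {
    "acc": [0.6618, 0.6586, 0.6532],
    "roc": [0.7162, 0.7240, 0.7101],
    "parity": [0.0043, 0.0403, 0.0007],
    "equality": [0.0026, 0.0467, 0.0100],
    "consistency": [0.4593, 0.4593, 0.4597],
}

# Sample standard deviation (n - 1 in the denominator) over the three seeds.
summary = {name: (mean(values), stdev(values)) for name, values in best_fair.items()}

for name, (avg, std) in summary.items():
    print(f"{name:>12}: {avg:.4f} +/- {std:.4f}")
```

With only three seeds the standard deviation is a rough indicator at best; the parity values in particular differ by more than an order of magnitude between runs.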

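
The `parity` and `equality` values in the output are not defined in this notebook. Assuming they follow the usual definitions from the fair graph learning literature (statistical parity difference and equal opportunity difference for a binary sensitive attribute), they could be computed roughly as follows. The helper names and the toy data are made up for illustration and are not taken from the FairAC code.

```python
def _positive_rate(preds, mask):
    """Fraction of positive predictions among the entries selected by mask."""
    selected = [p for p, keep in zip(preds, mask) if keep]
    return sum(selected) / len(selected)


def parity_gap(preds, sens):
    """Assumed 'parity': |P(pred=1 | s=0) - P(pred=1 | s=1)|."""
    rate_0 = _positive_rate(preds, [s == 0 for s in sens])
    rate_1 = _positive_rate(preds, [s == 1 for s in sens])
    return abs(rate_0 - rate_1)


def equality_gap(preds, labels, sens):
    """Assumed 'equality': |P(pred=1 | y=1, s=0) - P(pred=1 | y=1, s=1)|."""
    rate_0 = _positive_rate(preds, [s == 0 and y == 1 for s, y in zip(sens, labels)])
    rate_1 = _positive_rate(preds, [s == 1 and y == 1 for s, y in zip(sens, labels)])
    return abs(rate_0 - rate_1)


# Toy data: binary predictions, ground-truth labels, and sensitive attribute.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 1, 0, 1, 1, 0, 1]
sens = [0, 0, 0, 0, 1, 1, 1, 1]

print(f"parity:   {parity_gap(preds, sens):.4f}")
print(f"equality: {equality_gap(preds, labels, sens):.4f}")
```

Under these definitions, lower is better for both metrics, which is consistent with the "Best fair model" rows showing the smallest values.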