## FairAC Experiments on the Pokec-Z dataset
This notebook can be used to run FairAC experiments on the Pokec-Z dataset.

It is currently configured to perform a full training run for each of three seeds (40, 41 and 42).

In [1]:
import sys
import os
import torch
sys.path.append(os.path.abspath("../"))
from base_experiment import ExperimentRunner

### Set up the experiment runner
First, we create an experiment runner, which sets the random seeds and provides the parameters and logging directory for each run.

In [2]:
# Set up the experiment runner with all the seeds and params we want
experiment = ExperimentRunner(
    experiment_name="pokec_z_fair_ac_main",
    seeds=[40, 41, 42],
    data_path="dataset/pokec",
    log_dir="experiments/fair_ac/logs/pokec_z",
    device=3,
    params=[{"lambda1": 1.0, "lambda2": 1.0}],
)

# after we set up the experiment, we can import the rest
from dataset import PokecZ
from models.gnn import WrappedGNNConfig
from models.fair.ac import FairAC, Trainer

FairAC pairs a GNN with a sensitive classifier for the downstream task, so we configure that GNN and sensitive classifier combination here.


In [3]:
gnn_config = WrappedGNNConfig(
    hidden_dim=128,
    kind="GCN",
    lr=1e-3,
    weight_decay=1e-5,
    kwargs={"dropout": 0.5},
)

### Run the experiments
Now we're ready to run the experiments!

We can do this by iterating over the `ExperimentRunner.runs()` method. This method returns a generator that yields the seed, logging directory, device and params for the current experiment run.
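To make the shape of those yielded tuples concrete, here is a minimal sketch of such a generator. It is not the actual `ExperimentRunner` implementation: `sketch_runs` is a hypothetical stand-in, the seed-outer iteration order is an assumption, and the directory naming simply mirrors the log paths printed in the output further down. A real runner would presumably also seed NumPy and PyTorch.

```python
import itertools
import random
from pathlib import Path
from typing import Any, Dict, Iterator, List, Tuple


def sketch_runs(
    experiment_name: str,
    seeds: List[int],
    params: List[Dict[str, Any]],
    log_dir: str,
    device: int,
) -> Iterator[Tuple[int, Path, int, Dict[str, Any]]]:
    """Hypothetical stand-in for ExperimentRunner.runs()."""
    # One run per (seed, parameter set) combination
    for seed, run_params in itertools.product(seeds, params):
        # Seed the stdlib RNG only; seeding NumPy/PyTorch is left out (assumption)
        random.seed(seed)
        # Build a per-run directory name such as "<name>_40_lambda1_1.0_lambda2_1.0"
        suffix = "_".join(f"{key}_{value}" for key, value in run_params.items())
        run_dir = Path(log_dir) / f"{experiment_name}_{seed}_{suffix}"
        yield seed, run_dir, device, run_params


for seed, run_dir, device, run_params in sketch_runs(
    "pokec_z_fair_ac_main",
    seeds=[40, 41, 42],
    params=[{"lambda1": 1.0, "lambda2": 1.0}],
    log_dir="experiments/fair_ac/logs/pokec_z",
    device=3,
):
    print(seed, run_dir.name, device, run_params)
```

Because the generator is lazy, each seed is only set at the moment its run starts, so state left over from the previous run does not affect the next one.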


For each experiment run we first:
1. Load in the dataset
2. Create the FairAC model instance
3. Create the FairAC trainer


Once everything is initialised, we can run the pretraining using `Trainer.pretrain()`. This trains the autoencoder (AE) and the sensitive classifier.

Then we run the main training loop, which trains the full FairAC model for the remaining epochs.

In [4]:
for (seed, log_dir, device, params) in experiment.runs():
    print("===========================")
    print(f"Running {experiment.experiment_name} using seed {seed}")
    print(f"Log directory: {log_dir}")
    print(f"Params: {params}")
    print("===========================")
    
    # Load in the dataset
    dataset = PokecZ(
        nodes_path=experiment.data_path / "region_job.csv",
        edges_path=experiment.data_path / "region_job_relationship.txt",
        embedding_path=experiment.data_path / "pokec_z_embedding10.npy",
        feat_drop_rate=0.3,
        device=device
    )

    print(f"Loaded dataset with {dataset.graph.num_nodes()} nodes and {dataset.graph.num_edges()} edges")
    print(f"Using feat_drop_rate: {dataset.feat_drop_rate}")

    # Create FairAC model
    fair_ac = FairAC(
        feature_dim=dataset.features.shape[1],
        transformed_feature_dim=128,
        emb_dim=dataset.embeddings.shape[1],
        attn_vec_dim=128,
        attn_num_heads=1,
        dropout=0.5,
        num_sensitive_classes=1,
    ).to(device)
    print("Created FairAC model with 1 sensitive class")
        
    # Create FairAC trainer
    trainer = Trainer(
        ac_model=fair_ac,
        lambda1=params["lambda1"],
        lambda2=params["lambda2"],
        dataset=dataset,
        device=device,
        gnn_config=gnn_config,
        log_dir=log_dir,
        min_acc=0.65,
        min_roc=0.69,
    )
    print(f"Created trainer with GCN model, using log_dir: {log_dir}")

    print("Starting pre-training phase")
    # Run pre-training
    trainer.pretrain(epochs=200)
    print("Finished pretraining")
    
    # Main training loop, with GNN validation
    print("Starting main training...")
    trainer.train(val_start_epoch=800, val_epoch_interval=200, epochs=2800)

    # The entire dataset is allocated on the GPU, so we free it before starting the next run.
    del dataset
    del trainer
    del fair_ac
    torch.cuda.empty_cache()
    print("Cleared cuda cache")


Running pokec_z_fair_ac_main using seed 40
Log directory: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_z/pokec_z_fair_ac_main_40_lambda1_1.0_lambda2_1.0
Params: {'lambda1': 1.0, 'lambda2': 1.0}
Loaded dataset with 67796 nodes and 1303712 edges
Using feat_drop_rate: 0.3
Created FairAC model with 1 sensitive class
Created trainer with GCN model, using log_dir: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_z/pokec_z_fair_ac_main_40_lambda1_1.0_lambda2_1.0
Starting pre-training phase
Finished pretraining
Starting main training...
[1000:102] Found new best fairness of 0.045772
[1000:103] Found new best fairness of 0.043710
[1000:107] Found new best fairness of 0.029879
[1000:118] Found new best fairness of 0.020475
[1000:119] Found new best fairness of 0.007495
[1000:132] Found new best fairness of 0.007074
[1000:135] Found new best fairness of 0.005992
[1000:144] Found new best fairness of 0.005371
[1000:165] Found new best fairness of 0.004866
[1000:185] Found new best fairness of 0.003466
[1000:204] Found new best fairness of 0.002981
[1000:207] Found new best fairness of 0.002808
[1000:221] Found new best fairness of 0.002123
[1000:232] Found new best fairness of 0.001088
[1000:264] Found new best fairness of 0.001005
[1200:311] Found new best fairness of 0.000464
[1200:388] Found new best fairness of 0.000358
[2000:232] Found new best fairness of 0.000307
Finished training!
Best epoch: 2000

Best fair model:
	acc: 0.6582
	roc: 0.7155
	parity: 0.0000
	equality: 0.0003
	consistency: 0.4133

Best acc model:
	acc: 0.6875
	roc: 0.7408
	parity: 0.0809
	equality: 0.0883

Best auc model:
	acc: 0.6754
	roc: 0.7452
	parity: 0.0545
	equality: 0.0626
Cleared cuda cache
Running pokec_z_fair_ac_main using seed 41
Log directory: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_z/pokec_z_fair_ac_main_41_lambda1_1.0_lambda2_1.0
Params: {'lambda1': 1.0, 'lambda2': 1.0}
Loaded dataset with 67796 nodes and 1303712 edges
Using feat_drop_rate: 0.3
Created FairAC model with 1 sensitive class
Created trainer with GCN model, using log_dir: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_z/pokec_z_fair_ac_main_41_lambda1_1.0_lambda2_1.0
Starting pre-training phase
Finished pretraining
Starting main training...
[1000:200] Found new best fairness of 0.056814
[1000:201] Found new best fairness of 0.050999
[1000:214] Found new best fairness of 0.050602
[1000:215] Found new best fairness of 0.047681
[1000:218] Found new best fairness of 0.045858
[1000:225] Found new best fairness of 0.043577
[1000:239] Found new best fairness of 0.042133
[1000:382] Found new best fairness of 0.030558
[1200:240] Found new best fairness of 0.020588
[1400:222] Found new best fairness of 0.009150
[1800:194] Found new best fairness of 0.005190
[1800:271] Found new best fairness of 0.003399
[1800:291] Found new best fairness of 0.002094
Finished training!
Best epoch: 1800

Best fair model:
	acc: 0.6543
	roc: 0.7055
	parity: 0.0021
	equality: 0.0000
	consistency: 0.4133

Best acc model:
	acc: 0.6789
	roc: 0.7339
	parity: 0.0592
	equality: 0.0517

Best auc model:
	acc: 0.6594
	roc: 0.7443
	parity: 0.0270
	equality: 0.0020
Cleared cuda cache
Running pokec_z_fair_ac_main using seed 42
Log directory: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_z/pokec_z_fair_ac_main_42_lambda1_1.0_lambda2_1.0
Params: {'lambda1': 1.0, 'lambda2': 1.0}
Loaded dataset with 67796 nodes and 1303712 edges
Using feat_drop_rate: 0.3
Created FairAC model with 1 sensitive class
Created trainer with GCN model, using log_dir: /home/fact21/fact_refactor/experiments/fair_ac/logs/pokec_z/pokec_z_fair_ac_main_42_lambda1_1.0_lambda2_1.0
Starting pre-training phase
Finished pretraining
Starting main training...
[1000:113] Found new best fairness of 0.029277
[1000:120] Found new best fairness of 0.013569
[1000:270] Found new best fairness of 0.002787
[1200:445] Found new best fairness of 0.001988
[2799:141] Found new best fairness of 0.001149
Finished training!
Best epoch: 2799

Best fair model:
	acc: 0.6500
	roc: 0.6920
	parity: 0.0003
	equality: 0.0009
	consistency: 0.4133

Best acc model:
	acc: 0.6711
	roc: 0.7278
	parity: 0.0418
	equality: 0.0443

Best auc model:
	acc: 0.6321
	roc: 0.7391
	parity: 0.0004
	equality: 0.0266
Cleared cuda cache
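
Since the three runs differ only in their seed, it can be useful to summarise them as a mean and spread per metric. The snippet below is a small sketch of that arithmetic for the "Best fair model" numbers, copied by hand from the output above. Using the sample standard deviation (`statistics.stdev`) is an assumption on our part; other reporting conventions exist.

```python
from statistics import mean, stdev

# "Best fair model" metrics copied from the three runs above (seeds 40, 41, 42)
best_fair = {
    "acc": [0.6582, 0.6543, 0.6500],
    "roc": [0.7155, 0.7055, 0.6920],
    "parity": [0.0000, 0.0021, 0.0003],
    "equality": [0.0003, 0.0000, 0.0009],
}

# Mean and sample standard deviation per metric across the seeds
summary = {name: (mean(vals), stdev(vals)) for name, vals in best_fair.items()}

for name, (mu, sigma) in summary.items():
    print(f"{name:>9}: {mu:.4f} +/- {sigma:.4f}")
```

The consistency metric is left out because it is 0.4133 in all three runs.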
