## FairAC Experiments on the NBA dataset
This notebook can be used to run FairAC experiments on the NBA dataset.

It is currently configured to run a full training run for each of three seeds (40, 41 and 42).

In [None]:
# Imports for our experiment runner
import sys
import os
import torch
sys.path.append(os.path.abspath("../"))
from base_experiment import ExperimentRunner

### Set up the experiment runner
First we create an experiment runner, which is used to set the random seeds and provide params/logging directories to the different runs.
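
As a rough illustration of what "setting the random seeds" usually involves, here is a minimal sketch. The helper names `set_seed` and `draw` are hypothetical and are not taken from `base_experiment`; how `ExperimentRunner` actually seeds its runs may differ. NumPy and PyTorch are seeded only if they are installed, so the sketch also runs without them.

```python
import random


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed every RNG a run is likely to depend on."""
    random.seed(seed)
    try:
        import numpy as np  # optional, seeded only when available
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional, seeded only when available
        torch.manual_seed(seed)
    except ImportError:
        pass


def draw(seed: int) -> list:
    """Seed the RNGs, then draw three numbers from Python's RNG."""
    set_seed(seed)
    return [random.random() for _ in range(3)]


# The same seed reproduces the same draws; a different seed does not.
same = draw(40) == draw(40)
different = draw(40) != draw(41)
```

Seeding at the start of each run is what makes the three seeds comparable: re-running with the same seed should repeat that run's random draws, although GPU kernels can still introduce nondeterminism.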

In [None]:
# Set up the experiment runner with all the seeds and params we want
experiment = ExperimentRunner(
    experiment_name="nba_fair_ac_main",
    seeds=[40, 41, 42],
    data_path="dataset/NBA",
    log_dir="experiments/fair_ac/logs/nba",
    device=2,
    params=[{"lambda1": 1.0, "lambda2": 1.0}],
)

# After we set up the experiment, we can import the rest
from dataset import NBA
from models.gnn import WrappedGNNConfig
from models.fair.ac import FairAC, Trainer

FairAC wraps a GNN with a sensitive classifier for the downstream task, so we configure this GNN and sensitive classifier combination next.


In [None]:
gnn_config = WrappedGNNConfig(
    hidden_dim=128,
    kind="GCN",
    lr=1e-3,
    weight_decay=1e-5,
    kwargs={"dropout": 0.5},
)

### Run the experiments
Now we're ready to run the experiments!

We can do this by iterating over the `ExperimentRunner.runs()` method. This method returns a generator that yields the seed, logging directory, device and params for the current experiment run.
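
To make the shape of those yielded tuples concrete, here is a minimal sketch of such a generator. It is a hypothetical stand-in, not the real `ExperimentRunner.runs()`: the function signature and the `params_{i}/seed_{seed}` directory layout are assumptions made for illustration.

```python
from pathlib import Path
from typing import Any, Dict, Iterator, List, Tuple


def runs(
    seeds: List[int],
    params: List[Dict[str, Any]],
    log_dir: str,
    device: int,
) -> Iterator[Tuple[int, Path, int, Dict[str, Any]]]:
    """Hypothetical stand-in: yield one tuple per (params, seed) pair."""
    for i, param_set in enumerate(params):
        for seed in seeds:
            # Assumed layout: one subdirectory per parameter set and seed.
            run_dir = Path(log_dir) / f"params_{i}" / f"seed_{seed}"
            yield seed, run_dir, device, param_set


all_runs = list(
    runs(
        seeds=[40, 41, 42],
        params=[{"lambda1": 1.0, "lambda2": 1.0}],
        log_dir="experiments/fair_ac/logs/nba",
        device=2,
    )
)
for seed, run_dir, device, param_set in all_runs:
    print(seed, run_dir, device, param_set)
```

With one parameter set and three seeds this yields three runs. Adding a second dictionary to `params` would double that, which is the usual reason for driving runs from a generator rather than hand-written loops.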


For each experiment run we first:
1. Load in the dataset
2. Create the FairAC model instance
3. Create the FairAC trainer


Once everything is initialised, we can run the pretraining using `Trainer.pretrain()`. This trains the autoencoder (AE) and the sensitive classifier.

Then we run the main training loop, which trains the full FairAC model for the remaining epochs.
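
The main loop in this notebook is called with `val_start_epoch=800`, `val_epoch_interval=200` and `epochs=2800`. The sketch below shows one plausible reading of those arguments: GNN validation first runs at `val_start_epoch` and then every `val_epoch_interval` epochs. The zero-based epoch counter and the helper `validation_epochs` are assumptions; the real `Trainer.train()` may count epochs differently.

```python
def validation_epochs(
    epochs: int, val_start_epoch: int, val_epoch_interval: int
) -> list:
    """Hypothetical helper: epochs at which validation would run.

    Assumes a zero-based counter over range(epochs) and that validation
    runs at val_start_epoch and every val_epoch_interval epochs after it.
    """
    return [
        epoch
        for epoch in range(epochs)
        if epoch >= val_start_epoch
        and (epoch - val_start_epoch) % val_epoch_interval == 0
    ]


schedule = validation_epochs(
    epochs=2800, val_start_epoch=800, val_epoch_interval=200
)
print(schedule)
```

Under these assumptions the downstream GNN is validated ten times, at epochs 800, 1000, and so on up to 2600.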

In [None]:
for (seed, log_dir, device, params) in experiment.runs():
    print("===========================")
    print(f"Running {experiment.experiment_name} using seed {seed}")
    print(f"Log directory: {log_dir}")
    print(f"Params: {params}")
    print("===========================")
    
    # Load in the dataset
    dataset = NBA(
        nodes_path=experiment.data_path / "nba.csv",
        edges_path=experiment.data_path / "nba_relationship.txt",
        embedding_path=experiment.data_path / "nba_embedding10.npy",
        feat_drop_rate=0.3,
        device=experiment.device
    )

    print(f"Loaded dataset with {dataset.graph.num_nodes()} nodes and {dataset.graph.num_edges()} edges")
    print(f"Using feat_drop_rate: {dataset.feat_drop_rate}")

    # Create FairAC model
    fair_ac = FairAC(
        feature_dim=dataset.features.shape[1],
        transformed_feature_dim=128,
        emb_dim=dataset.embeddings.shape[1],
        attn_vec_dim=128,
        attn_num_heads=1,
        dropout=0.5,
        num_sensitive_classes=1,
    ).to(experiment.device)
    print("Created FairAC model with 1 sensitive class")
        
    # Create FairAC trainer
    trainer = Trainer(
        ac_model=fair_ac,
        lambda1=params["lambda1"],
        lambda2=params["lambda2"],
        dataset=dataset,
        device=experiment.device,
        gnn_config=gnn_config,
        log_dir=log_dir,
        min_acc=0.65,
        min_roc=0.69,
    )
    print(f"Created trainer with GCN model, using log_dir: {log_dir}")

    print("Starting pre-training phase")
    # Run pre-training
    trainer.pretrain(epochs=200)
    print("Finished pretraining")
    
    # Main training loop, with GNN validation
    print("Starting main training...")
    trainer.train(val_start_epoch=800, val_epoch_interval=200, epochs=2800)
