## Approach 2: MC Dropout
This second approach to active learning changes how model uncertainty is estimated. A single model is trained; then, to estimate uncertainty, dropout is kept active at inference time and the model is run over the training pool multiple times. The variance across these predictions for each sequence is used to guide selection of the next batch.

In [1]:
from scripts.data_utils import train_val_test_split
from scripts.config import (
    DATA_PATH, 
    SEQUENCE_COL, 
    SCORE_COL, 
    TOK_MODEL, 
    VAL_SPLIT,
    TEST_SPLIT,
    BATCH_SIZE,
    RANDOM_SEED,
)

training_pool, val_dataloader, test_dataloader = train_val_test_split(
    DATA_PATH,
    SEQUENCE_COL,
    SCORE_COL,
    TOK_MODEL,
    VAL_SPLIT,
    TEST_SPLIT,
    BATCH_SIZE,
    RANDOM_SEED
)

## Step-by-step
1. Select initial random training set
2. Fine-tune a single model
3. Set the model to eval mode, but manually re-enable the dropout layers
4. Evaluate the training pool k times
5. Calculate the variance of the k predictions for each sequence (acquisition score)
6. Select the N samples with the highest acquisition scores
7. Repeat steps 2-6
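
Step 3 relies on `enable_dropout` from `scripts.models`, whose implementation is not shown in this notebook. A minimal sketch of what such a helper might look like, assuming the model's dropout is implemented with `torch.nn.Dropout` modules (the name `enable_dropout_sketch` and the toy model are hypothetical, used only for illustration):

```python
from typing import Optional

import torch
from torch import nn


def enable_dropout_sketch(model: nn.Module, dropout_prob: Optional[float] = None) -> int:
    """Switch only nn.Dropout layers back to train mode.

    Hypothetical stand-in for scripts.models.enable_dropout; returns the
    number of dropout layers that were re-enabled.
    """
    n_enabled = 0
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            if dropout_prob is not None:
                module.p = dropout_prob  # optionally override the dropout rate
            module.train()  # masks are sampled again on every forward pass
            n_enabled += 1
    return n_enabled


# Toy model (an assumption), only to show that repeated passes now differ.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.1), nn.Linear(16, 1))
toy_model.eval()  # the model as a whole stays in eval mode
n_enabled = enable_dropout_sketch(toy_model, dropout_prob=0.5)

x = torch.randn(32, 8)
with torch.inference_mode():
    pass_a = toy_model(x)
    pass_b = toy_model(x)  # different dropout mask, so different outputs
```

Layers that apply dropout functionally (through `torch.nn.functional.dropout` keyed on their own `training` flag) would not be affected by this sketch.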

In [2]:
from scripts.models import enable_dropout
import torch
from tqdm import tqdm

def get_ensemble_predictions(model, dropout_prob, pool_dataloader, k_pred):
    # automatically detect what device the model is on
    device = next(model.parameters()).device

    # set model to eval mode, but keep dropout on
    model.eval()
    enable_dropout(model, dropout_prob)

    all_preds = []

    for _ in tqdm(range(k_pred), desc="MC Dropout Passes"):
        current_pass_preds = []
        with torch.inference_mode():
            for inputs, _ in pool_dataloader:
                # get model predictions for this batch and collect them on the CPU
                inputs = {k: v.to(device) for k, v in inputs.items()}
                outputs = model(**inputs)
                preds = outputs.logits
                current_pass_preds.append(preds.cpu())
        
        # once all pool has been surveyed, concatenate into one tensor
        current_pass_preds = torch.cat(current_pass_preds, dim=0)
        
        # append full pass tensor to all predictions list
        all_preds.append(current_pass_preds)

    # convert list to tensor with shape (k_pred, num_sequences, num_labels)
    all_preds = torch.stack(all_preds)
    return all_preds
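
The tensor returned above has one slice per stochastic pass, so the acquisition score in step 5 is a reduction over its first axis. A minimal sketch of steps 5 and 6, assuming the unbiased sample variance and an average over labels when there is more than one output; `mc_dropout_variance` and `select_top_n` are hypothetical names and are not the `get_variances` and `acquire_new_batch` functions from `scripts.acquisition`:

```python
import numpy as np


def mc_dropout_variance(all_preds) -> np.ndarray:
    """Per-sequence variance across MC dropout passes.

    all_preds: array-like of shape (k_pred, num_sequences, num_labels).
    Returns an array of shape (num_sequences,), averaged over labels.
    """
    preds = np.asarray(all_preds, dtype=np.float64)
    per_label_var = preds.var(axis=0, ddof=1)  # (num_sequences, num_labels)
    return per_label_var.mean(axis=-1)         # (num_sequences,)


def select_top_n(scores: np.ndarray, n: int) -> np.ndarray:
    """Positions of the n highest scores, most uncertain first."""
    order = np.argsort(scores)[::-1]
    return order[:n]


# Three passes over four sequences with one label each (made-up numbers).
toy_preds = np.array([
    [[0.10], [0.50], [0.90], [0.30]],
    [[0.10], [0.70], [0.80], [0.30]],
    [[0.10], [0.30], [1.00], [0.30]],
])
scores = mc_dropout_variance(toy_preds)
picked = select_top_n(scores, n=2)
```

The returned positions refer to the order in which the pool was evaluated, so they would still need to be mapped back to the original pool indices, assuming the pool dataloader is not shuffled.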

In [3]:
from pathlib import Path
import pandas as pd
import numpy as np
from scripts.acquisition import acquire_new_batch, get_variances
from scripts.training import initialize_and_train_new_model, test_model
from scripts.campaigns import run_standard_finetuning

def get_learning_curves(
        n_samples,
        initial_n_samples,
        n_samples_per_batch,
        model_name,
        approach, 
        learning_rate, 
        weight_decay, 
        epochs, 
        training_pool,
        train_dataloader_batch_size,
        pool_dataloader_batch_size, 
        val_dataloader, 
        test_dataloader,
        patience=5,
        k_pred=24,
        dropout_prob=0.1,
        results_path="active_vs_standard_learning_curves.csv"
):
    results_path = Path(results_path)
    results_dir = results_path.parent
    results_dir.mkdir(parents=True, exist_ok=True)

    # Load existing results if the file exists, otherwise start with a fresh DataFrame.
    if results_path.exists():
        all_results_df = pd.read_csv(results_path)
    else:
        all_results_df = pd.DataFrame()
    
    total_pool_size = len(training_pool)
    unlabeled_indices = np.arange(total_pool_size)
    labeled_indices = np.array([], dtype=np.int64)

    ensemble_predictions = None
    current_cycle = 1
    total_cycles = int(np.ceil((n_samples-initial_n_samples)/n_samples_per_batch)) + 1
    
    while len(labeled_indices) < n_samples and len(unlabeled_indices) > 0:
        print(f"\nCycle {current_cycle}/{total_cycles}\n-------------------------------------------------")

        # on the first cycle, choose random samples of initial_n_samples size
        if ensemble_predictions is None:
            print(f"Choosing initial {initial_n_samples} samples randomly...")
            train_dataloader, pool_dataloader, labeled_indices, unlabeled_indices = acquire_new_batch(
                training_pool, train_dataloader_batch_size, pool_dataloader_batch_size, initial_n_samples, n_samples_per_batch, labeled_indices, unlabeled_indices, acquisition_scores=None
            )
        # each other time, use the n_samples_per_batch with acquisition scores to select
        else:
            scores = get_variances(ensemble_predictions, f"results/02_mc_dropout/variances{current_cycle}.csv")
            print("Selecting new data points...")
            train_dataloader, pool_dataloader, labeled_indices, unlabeled_indices = acquire_new_batch(
                training_pool, train_dataloader_batch_size, pool_dataloader_batch_size, initial_n_samples, n_samples_per_batch, labeled_indices, unlabeled_indices, acquisition_scores=scores
            )
        
        # give message when loop ends
        if len(unlabeled_indices) == 0:
            print("Unlabeled pool is empty. Proceeding to final model training.")
            break
        
        # evaluate active vs standard
        final_results = []

        # active
        print(f"\nTraining and evaluating model using {len(labeled_indices)} actively selected samples...")
        model_active = initialize_and_train_new_model(approach, model_name, learning_rate, weight_decay, epochs, train_dataloader, val_dataloader, patience, return_history=False)
        results_active = test_model(model_active, test_dataloader, return_results=True)
        results_active = {
            'changing_var': 'n_samples',
            'local_exp_idx': current_cycle-1,
            'value': len(labeled_indices),
            'training_method': 'active',
            **results_active
        }
        final_results.append(results_active)

        # standard
        print(f"\nTraining and evaluating model using {len(labeled_indices)} randomly selected samples...")
        model_standard, _ = run_standard_finetuning(len(labeled_indices), approach, model_name, train_dataloader_batch_size, learning_rate, weight_decay, epochs, training_pool, val_dataloader, patience)
        results_standard = test_model(model_standard, test_dataloader, return_results=True)
        results_standard = {
            'changing_var': 'n_samples',
            'local_exp_idx': current_cycle-1,
            'value': len(labeled_indices),
            'training_method': 'standard',
            **results_standard
        }
        final_results.append(results_standard)

        # save to disk each time to save progress
        results_df = pd.DataFrame(final_results)
        all_results_df = pd.concat([all_results_df, results_df], ignore_index=True)
        all_results_df.to_csv(results_path, index=False)
        print(f"Progress for experiment {current_cycle-1} appended to {results_path}")

        # if it's the last cycle, skip ensemble predictions
        if (current_cycle == total_cycles):
            print("Experiments complete.")
            break

        print("Starting ensemble training and pool evaluation...")
        ensemble_predictions = get_ensemble_predictions(model_active, dropout_prob, pool_dataloader, k_pred)

        current_cycle += 1
    return all_results_df

In [4]:
from scripts.config import (
    MODEL_NAME,
    APPROACH,
    LEARNING_RATE,
    WEIGHT_DECAY,
    EPOCHS,
    POOL_BATCH_SIZE,
    PATIENCE,
    K_PREDS
)

get_learning_curves(
    n_samples=256,
    initial_n_samples=16,
    n_samples_per_batch=16,
    model_name=MODEL_NAME,
    approach=APPROACH,
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    epochs=EPOCHS,
    training_pool=training_pool,
    train_dataloader_batch_size=BATCH_SIZE,
    pool_dataloader_batch_size=POOL_BATCH_SIZE,
    val_dataloader=val_dataloader,
    test_dataloader=test_dataloader,
    patience=PATIENCE,
    k_pred=K_PREDS,
    results_path='results/02_mc_dropout/active_vs_standard_learning_curve.csv'
)


Cycle 1/16
-------------------------------------------------
Choosing initial 16 samples randomly...

Training and evaluating model using 16 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:22<00:00,  2.26it/s]


Train Loss: 0.0090 | Val Loss: 0.2287 | SpearmanR: 0.2684


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.43it/s]



Training and evaluating model using 16 randomly selected samples...


[Training]:  38%|███▊      | 19/50 [00:08<00:13,  2.34it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0491 | Val Loss: 0.2053 | SpearmanR: 0.0516


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.59it/s]


Progress for experiment 0 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:52<00:00,  2.18s/it]



Cycle 2/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances2.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 32 actively selected samples...


[Training]:  68%|██████▊   | 34/50 [00:15<00:07,  2.19it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0452 | Val Loss: 0.2067 | SpearmanR: 0.3450


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 71.59it/s]



Training and evaluating model using 32 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:23<00:00,  2.14it/s]


Train Loss: 0.0131 | Val Loss: 0.1990 | SpearmanR: 0.3544


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 71.91it/s]


Progress for experiment 1 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:52<00:00,  2.18s/it]



Cycle 3/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances3.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 48 actively selected samples...


[Training]:  62%|██████▏   | 31/50 [00:15<00:09,  2.06it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0631 | Val Loss: 0.1961 | SpearmanR: 0.4970


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.91it/s]



Training and evaluating model using 48 randomly selected samples...


[Training]:  70%|███████   | 35/50 [00:17<00:07,  2.04it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0109 | Val Loss: 0.1482 | SpearmanR: 0.5257


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.32it/s]


Progress for experiment 2 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:51<00:00,  2.15s/it]



Cycle 4/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances4.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 64 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:25<00:00,  1.98it/s]


Train Loss: 0.0279 | Val Loss: 0.1790 | SpearmanR: 0.4773


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 71.97it/s]



Training and evaluating model using 64 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:25<00:00,  1.94it/s]


Train Loss: 0.0553 | Val Loss: 0.1688 | SpearmanR: 0.4878


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.44it/s]


Progress for experiment 3 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:51<00:00,  2.14s/it]



Cycle 5/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances5.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 80 actively selected samples...


[Training]:  66%|██████▌   | 33/50 [00:18<00:09,  1.81it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0313 | Val Loss: 0.1769 | SpearmanR: 0.5490


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 73.04it/s]



Training and evaluating model using 80 randomly selected samples...


[Training]:  60%|██████    | 30/50 [00:16<00:11,  1.78it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0411 | Val Loss: 0.1942 | SpearmanR: 0.4401


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 73.06it/s]


Progress for experiment 4 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:51<00:00,  2.14s/it]



Cycle 6/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances6.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 96 actively selected samples...


[Training]:  88%|████████▊ | 44/50 [00:25<00:03,  1.69it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0175 | Val Loss: 0.1745 | SpearmanR: 0.5485


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.17it/s]



Training and evaluating model using 96 randomly selected samples...


[Training]:  24%|██▍       | 12/50 [00:07<00:24,  1.53it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.1334 | Val Loss: 0.1503 | SpearmanR: 0.4549


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.79it/s]


Progress for experiment 5 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:50<00:00,  2.09s/it]



Cycle 7/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances7.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 112 actively selected samples...


[Training]:  46%|████▌     | 23/50 [00:16<00:18,  1.43it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0641 | Val Loss: 0.1563 | SpearmanR: 0.5351


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 74.04it/s]



Training and evaluating model using 112 randomly selected samples...


[Training]:  72%|███████▏  | 36/50 [00:26<00:10,  1.35it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0154 | Val Loss: 0.1417 | SpearmanR: 0.5573


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 73.97it/s]


Progress for experiment 6 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:49<00:00,  2.08s/it]



Cycle 8/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances8.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 128 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:36<00:00,  1.36it/s]


Train Loss: 0.0224 | Val Loss: 0.1439 | SpearmanR: 0.6152


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 74.39it/s]



Training and evaluating model using 128 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:41<00:00,  1.21it/s]


Train Loss: 0.0206 | Val Loss: 0.1617 | SpearmanR: 0.5830


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 74.07it/s]


Progress for experiment 7 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:49<00:00,  2.06s/it]



Cycle 9/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances9.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 144 actively selected samples...


[Training]:  96%|█████████▌| 48/50 [00:41<00:01,  1.16it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0276 | Val Loss: 0.1649 | SpearmanR: 0.5648


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 74.03it/s]



Training and evaluating model using 144 randomly selected samples...


[Training]:  72%|███████▏  | 36/50 [00:28<00:10,  1.29it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0505 | Val Loss: 0.1197 | SpearmanR: 0.6391


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 73.68it/s]


Progress for experiment 8 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:49<00:00,  2.07s/it]



Cycle 10/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances10.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 160 actively selected samples...


[Training]:  86%|████████▌ | 43/50 [00:31<00:05,  1.37it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0312 | Val Loss: 0.1337 | SpearmanR: 0.5957


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 73.43it/s]



Training and evaluating model using 160 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:35<00:00,  1.40it/s]


Train Loss: 0.0351 | Val Loss: 0.1425 | SpearmanR: 0.6527


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 73.47it/s]


Progress for experiment 9 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:49<00:00,  2.06s/it]



Cycle 11/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances11.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 176 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:36<00:00,  1.35it/s]


Train Loss: 0.0425 | Val Loss: 0.1459 | SpearmanR: 0.6037


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 73.27it/s]



Training and evaluating model using 176 randomly selected samples...


[Training]:  96%|█████████▌| 48/50 [00:36<00:01,  1.31it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0194 | Val Loss: 0.1579 | SpearmanR: 0.6019


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 73.67it/s]


Progress for experiment 10 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [08:44<00:00, 21.87s/it] 



Cycle 12/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances12.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 192 actively selected samples...


[Training]:  88%|████████▊ | 44/50 [00:35<00:04,  1.24it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0480 | Val Loss: 0.1434 | SpearmanR: 0.6488


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.18it/s]



Training and evaluating model using 192 randomly selected samples...


[Training]:  70%|███████   | 35/50 [00:28<00:12,  1.21it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0469 | Val Loss: 0.1227 | SpearmanR: 0.6404


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.37it/s]


Progress for experiment 11 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:49<00:00,  2.06s/it]



Cycle 13/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances13.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 208 actively selected samples...


[Training]:  84%|████████▍ | 42/50 [00:34<00:06,  1.21it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0350 | Val Loss: 0.1390 | SpearmanR: 0.6288


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.60it/s]



Training and evaluating model using 208 randomly selected samples...


[Training]:  86%|████████▌ | 43/50 [00:36<00:05,  1.18it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0168 | Val Loss: 0.1222 | SpearmanR: 0.6614


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 73.25it/s]


Progress for experiment 12 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:49<00:00,  2.04s/it]



Cycle 14/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances14.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 224 actively selected samples...


[Training]:  74%|███████▍  | 37/50 [00:32<00:11,  1.12it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0309 | Val Loss: 0.1252 | SpearmanR: 0.6366


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.41it/s]



Training and evaluating model using 224 randomly selected samples...


[Training]:  96%|█████████▌| 48/50 [00:41<00:01,  1.16it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0144 | Val Loss: 0.1310 | SpearmanR: 0.6507


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.62it/s]


Progress for experiment 13 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:48<00:00,  2.02s/it]



Cycle 15/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances15.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 240 actively selected samples...


[Training]:  94%|█████████▍| 47/50 [00:43<00:02,  1.09it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0175 | Val Loss: 0.1109 | SpearmanR: 0.6796


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.85it/s]



Training and evaluating model using 240 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:44<00:00,  1.13it/s]


Train Loss: 0.0221 | Val Loss: 0.0970 | SpearmanR: 0.7059


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.26it/s]


Progress for experiment 14 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...


MC Dropout Passes: 100%|██████████| 24/24 [00:48<00:00,  2.03s/it]



Cycle 16/16
-------------------------------------------------
Saving variance distribution to results/02_mc_dropout/variances16.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 256 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:45<00:00,  1.11it/s]


Train Loss: 0.0136 | Val Loss: 0.1171 | SpearmanR: 0.6647


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.09it/s]



Training and evaluating model using 256 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:46<00:00,  1.08it/s]


Train Loss: 0.0124 | Val Loss: 0.0938 | SpearmanR: 0.7219


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 72.21it/s]

Progress for experiment 15 appended to results/02_mc_dropout/active_vs_standard_learning_curve.csv
Experiments complete.





| | changing_var | local_exp_idx | value | training_method | avg_test_loss | spearmanr | pearsonr | final_mse |
|---|---|---|---|---|---|---|---|---|
| 0 | n_samples | 0 | 16 | active | 0.201972 | 0.342964 | 0.267876 | 0.202025 |
| 1 | n_samples | 0 | 16 | standard | 0.255758 | 0.378769 | 0.378958 | 0.256003 |
| 2 | n_samples | 1 | 32 | active | 0.161405 | 0.477551 | 0.449355 | 0.161907 |
| 3 | n_samples | 1 | 32 | standard | 0.204779 | 0.291629 | 0.253262 | 0.204862 |
| 4 | n_samples | 2 | 48 | active | 0.200616 | 0.158285 | 0.223110 | 0.200577 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 61 | n_samples | 13 | 224 | standard | 0.112046 | 0.699359 | 0.689481 | 0.111939 |
| 62 | n_samples | 14 | 240 | active | 0.120508 | 0.659675 | 0.661392 | 0.120622 |
| 63 | n_samples | 14 | 240 | standard | 0.120112 | 0.704371 | 0.695376 | 0.120664 |
| 64 | n_samples | 15 | 256 | active | 0.116768 | 0.684547 | 0.683357 | 0.117253 |


In [4]:
import pandas as pd

lc_df = pd.read_csv("results/02_mc_dropout/active_vs_standard_learning_curve.csv")
lc_df.head()

| | changing_var | local_exp_idx | value | training_method | avg_test_loss | spearmanr | pearsonr | final_mse |
|---|---|---|---|---|---|---|---|---|
| 0 | n_samples | 0 | 16 | active | 0.195808 | 0.359393 | 0.347008 | 0.196497 |
| 1 | n_samples | 0 | 16 | standard | 0.208022 | 0.10839 | 0.151105 | 0.20796 |
| 2 | n_samples | 1 | 32 | active | 0.26461 | 0.456056 | 0.480326 | 0.265571 |
| 3 | n_samples | 1 | 32 | standard | 0.203834 | 0.324084 | 0.331453 | 0.204373 |
| 4 | n_samples | 2 | 48 | active | 0.160523 | 0.489545 | 0.512429 | 0.161426 |


In [5]:
import plotly.express as px

fig = px.line(lc_df, 'value', 'spearmanr',color="training_method")

fig.update_layout(
    xaxis_title="Number of Training Samples",
    yaxis_title="Spearman Correlation Coefficient"
)

fig.show()

Rather than the consistent underperformance seen last time, active selection now performs at least comparably to random selection. This suggests to me that the large number of MC dropout passes is giving a better estimate of the true predictive variance.

In [6]:
import pandas as pd
import glob
from pathlib import Path

file_pattern = "results/02_mc_dropout/variances*.csv"
variance_files = glob.glob(file_pattern)
variance_files.sort()

all_variances_dfs = []

for filepath in variance_files:
    temp_df = pd.read_csv(filepath)
    column_name = Path(filepath).stem
    temp_df = temp_df.rename(columns={'variance': column_name})
    all_variances_dfs.append(temp_df)

final_df = pd.concat(all_variances_dfs, axis=1)

final_df

| | variances10 | variances11 | variances12 | variances13 | variances14 | variances15 | variances16 | variances2 | variances3 | variances4 | variances5 | variances6 | variances7 | variances8 | variances9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.011277 | 0.008445 | 0.008688 | 0.010080 | 0.005996 | 0.025556 | 0.008448 | 0.004034 | 0.007592 | 0.010373 | 0.007311 | 0.006803 | 0.008743 | 0.004441 | 0.004548 |
| 1 | 0.005809 | 0.006113 | 0.007618 | 0.011967 | 0.010542 | 0.015099 | 0.007711 | 0.005939 | 0.004571 | 0.011246 | 0.013604 | 0.009452 | 0.013490 | 0.004208 | 0.008405 |
| 2 | 0.008899 | 0.008993 | 0.017221 | 0.005632 | 0.013606 | 0.012596 | 0.005104 | 0.004966 | 0.002734 | 0.009145 | 0.005954 | 0.002862 | 0.006797 | 0.005591 | 0.020041 |
| 3 | 0.008668 | 0.011705 | 0.009755 | 0.003782 | 0.005567 | 0.017404 | 0.007514 | 0.001682 | 0.004713 | 0.011360 | 0.011702 | 0.008669 | 0.008822 | 0.003421 | 0.003745 |
| 4 | 0.008318 | 0.007697 | 0.018293 | 0.004900 | 0.010415 | 0.022672 | 0.015724 | 0.002391 | 0.004811 | 0.010415 | 0.008917 | 0.005406 | 0.005036 | 0.005022 | 0.007761 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3147 | | | | | | | | 0.004960 | | | | | | | |
| 3148 | | | | | | | | 0.002326 | | | | | | | |
| 3149 | | | | | | | | 0.005587 | | | | | | | |
| 3150 | | | | | | | | 0.005495 | | | | | | | |


In [7]:
columns_to_plot = ['variances2', 'variances5', 'variances8', 'variances11', 'variances14']

# "Melt" the selected columns into a long-format DataFrame
melted_df = final_df[columns_to_plot].melt(
    var_name='Cycle',      # New column for the original column names
    value_name='Variance'  # New column for the variance values
)

fig = px.histogram(
    data_frame=melted_df,
    x='Variance',                                  
    color='Cycle',                                 
    barmode='overlay',                             
    opacity=0.65,                                  
    histnorm='probability density',                
    title='Distribution of Variances Across Active Learning Cycles'
)

fig.show()

This is a pretty telling finding. As the model sees more data, the variance of its predictions over the pool goes up rather than down: it is getting more confused about the data it is seeing, not less. This suggests that the uncertainty estimates are miscalibrated. Ideally, model uncertainty would be correlated with the absolute error between the model's prediction and the observed value, so it would be a good idea to create a function that lets me monitor this correlation.

One issue that may be causing this is that by simply selecting the samples with the highest variance, I could be consistently choosing outliers that lie outside the training distribution. The batches are small, so if these outliers happen to be particularly noisy data points, they would inhibit the model's learning.

Another potential problem is that the acquisition step might be selecting the sequences with the lowest read counts in the original dataset, and therefore the lowest confidence and highest noise. This would reduce the information actually available in each selected sample for the model to pick up on.

Additionally, there's the issue of sample diversity that I mentioned before. If the actively selected batches are substantially less diverse than randomly selected batches, this could also impair learning.

One more issue I can think of is that a full fine-tune is simply too unstable in the early cycles, when the training set contains only tens of samples.