# Approach 2.5 / 3.5

Each of the approaches tested so far, despite being used previously in the literature (at least to the extent that I understand them) have failed to out perform just randomly selecting samples. I suspect this could be due to one and/or another issue with the current acquisition strategy. Each of these approaches use some method to find the variance of model predictions to find the most uncertain sequences for the next batch. However, those uncertain sequences might somehow correspond to noisy measurements (measurements with lower read counts, especially if they tend to be the sequences with more mutations), and they might not be sufficiently diverse.

Given these hypotheses for what's going wrong, I may be able to solve them by ensuring the next sample is uncertain as well as diverse. I'd be interested in testing when its just diverse and when its uncertain and diverse compared to the random baseline.

But how do we ensure diversity? Well, since we are conveniently using a language model, we should be able to just use the language model embeddings. So, all I need to do is figure out how to get the embeddings, measure their distance, and integrate some amount of distance optimization into my acquistion function.

Note: Below, I'm going to extract the embeddings from the last hidden state of the CLS token, since that's the standard model I've been working with setting up the structure of active learning. However, other fine-tuning approaches should make use of different embeddings. For example, if you're using only the mutant residues as the final state input to your output layer, you should probably use those embeddings for diversity. Another decision point is whether to use the embeddings of your fine-tuned model or a fresh ESM2 model. I'm going to work with just the fine-tuned model here.

Here's the algorithm I'll employ:
1. Train ensemble on random initial set.
2. Get variances from model predictions.
3. Filter samples with top k% of variances.
4. Retrieve embeddings for those samples.
5. For each sample in the unlabeled pool, calculate its distance from each of the samples in the current batch.
6. Find the minimum distance from those for each candidate.
7. Add candidate c whose minimum distance from the current batch is the greatest and remove it from the unlabeled pool.
8. Calculate the distance of just-added candidate c from each of the remaining unlabeled pool. Add these distances to the distances list for each sample.
9. Repeat 6-8 until n samples are added to the labeled samples.

Actually, now that I spell out the algorithm, I'm realizing that I should also test just a random choice from the top k% of variances to make sure that the embeddings even add anything.

In [1]:
from scripts.data_utils import train_val_test_split
from scripts.config import (
    DATA_PATH, 
    SEQUENCE_COL, 
    SCORE_COL, 
    TOK_MODEL, 
    VAL_SPLIT,
    TEST_SPLIT,
    BATCH_SIZE,
    RANDOM_SEED,
)

training_pool, val_dataloader, test_dataloader = train_val_test_split(
    DATA_PATH,
    SEQUENCE_COL,
    SCORE_COL,
    TOK_MODEL,
    VAL_SPLIT,
    TEST_SPLIT,
    BATCH_SIZE,
    RANDOM_SEED
)

In [2]:
import torch
from torch.utils.data import Subset, DataLoader
import numpy as np

from scripts.training import initialize_and_train_new_model
from scripts.acquisition import get_pool_predictions

def get_bootstrap_sample(labeled_indices, pool_dataset, train_dataloader_batch_size):
    bootstrap_indices = np.random.choice(labeled_indices, size=int(0.9*len(labeled_indices)),replace=True)
    bootstrap_subset = Subset(pool_dataset, bootstrap_indices)
    bootstrap_dataloader = DataLoader(bootstrap_subset, batch_size=train_dataloader_batch_size, shuffle=True)
    return bootstrap_dataloader

def train_ensemble(
        n_models, 
        model_name, 
        approach,
        learning_rate,
        weight_decay,
        epochs,
        labeled_indices,
        train_dataloader_batch_size,
        pool_dataset, 
        pool_dataloader, 
        val_dataloader,
        patience
        ):
    
    # define list to store predictions as each model is trained then evaluated
    ensemble_predictions = []
    
    for i in range(n_models):
        print(f"\nTraining Model {i+1}...")
        # set a changing manual seed
        torch.manual_seed(i)
        torch.cuda.manual_seed(i)

        # get bootstrap sample from labeled dataset
        bootstrap_dataloader = get_bootstrap_sample(labeled_indices, pool_dataset, train_dataloader_batch_size)

        # initialize and train a new model
        model = initialize_and_train_new_model(approach, model_name, learning_rate, weight_decay, epochs, bootstrap_dataloader, val_dataloader, patience)
        
        # get model predictions on pool dataloader, append to ensemble predictions list
        pool_preds = get_pool_predictions(model, pool_dataloader)
        ensemble_predictions.append(pool_preds)

    # stack ensemble predictions to create tensor of shape (n_models, n_unlabeled_samples)
    ensemble_predictions = torch.stack(ensemble_predictions, dim=0)
    print("Ensemble training complete, submitting predictions for next cycle.")
    # return list of ensemble predictions
    return ensemble_predictions

### Random choice from top % variances  

In [9]:
import numpy as np
from torch.utils.data import Subset, DataLoader

# acquire new batch, randomly if no scores given, top "batch_size_to_acquire" if given
def acquire_new_batch(
    dataset, 
    train_dataloader_batch_size, 
    pool_dataloader_batch_size, 
    initial_batch_size,
    top_score_fraction, 
    batch_size_to_acquire, 
    labeled_indices, 
    unlabeled_indices, 
    acquisition_scores=None
    ):

    # if initial batch, when there are no acquisition scores, select randomly
    if acquisition_scores is None:
        initial_batch_size = min(initial_batch_size, len(unlabeled_indices))
        indices_to_acquire = np.random.choice(unlabeled_indices, size=initial_batch_size, replace=False)
    
    # else select based on top acquisition scores
    else:
        # make sure we don't overshoot samples to acquire if on the final batch
        batch_size_to_acquire = min(batch_size_to_acquire, len(acquisition_scores))
        # detmine the number of top scorers to select from
        num_top_scorers = int(top_score_fraction * len(unlabeled_indices))
        # get the indicies of the top acquisition scores (num of samples)
        top_indices = acquisition_scores.topk(num_top_scorers).indices
        # choose a random set of indices from these top scorers
        top_k_indices = np.random.choice(top_indices.cpu().numpy(), size=batch_size_to_acquire, replace=False)
        # use these to find the indices that map back to the original dataset
        indices_to_acquire = unlabeled_indices[top_k_indices]
    
    # update the indices lists
    labeled_indices = np.concatenate([labeled_indices, indices_to_acquire])
    unlabeled_indices = np.setdiff1d(unlabeled_indices, indices_to_acquire, assume_unique=True)
    
    # create new subsets and dataloaders
    train_subset = Subset(dataset, labeled_indices.tolist())
    pool_subset = Subset(dataset, unlabeled_indices.tolist())
    train_dataloader = DataLoader(train_subset, batch_size=train_dataloader_batch_size, shuffle=True)
    pool_dataloader = DataLoader(pool_subset, batch_size=pool_dataloader_batch_size, shuffle=False)
    
    return train_dataloader, pool_dataloader, labeled_indices, unlabeled_indices

In [10]:
from pathlib import Path
import pandas as pd

from scripts.acquisition import get_variances
from scripts.training import initialize_and_train_new_model, test_model
from scripts.campaigns import run_standard_finetuning


def get_learning_curves(
        n_samples,
        initial_n_samples,
        top_score_fraction,
        n_samples_per_batch,
        model_name, 
        approach,
        learning_rate, 
        weight_decay, 
        epochs, 
        training_pool, 
        train_dataloader_batch_size,
        pool_dataloader_batch_size,
        val_dataloader, 
        test_dataloader,
        patience=5,
        n_models=5,
        results_path="active_vs_standard_learning_curves.csv"
):
    results_path = Path(results_path)
    results_dir = results_path.parent
    results_dir.mkdir(parents=True, exist_ok=True)

    # Load existing results if the file exists, otherwise start with a fresh DataFrame.
    if results_path.exists():
        all_results_df = pd.read_csv(results_path)
    else:
        all_results_df = pd.DataFrame()
    
    total_pool_size = len(training_pool)
    unlabeled_indices = np.arange(total_pool_size)
    labeled_indices = np.array([], dtype=np.int64)

    ensemble_predictions = None
    current_cycle = 1
    total_cycles = int(np.ceil((n_samples-initial_n_samples)/n_samples_per_batch)) + 1
    
    while len(labeled_indices) < n_samples and len(unlabeled_indices) > 0:
        print(f"\nCycle {current_cycle}/{total_cycles}\n-------------------------------------------------")

        # on the first cycle, choose random samples of initial_n_samples size
        if ensemble_predictions is None:
            print(f"Choosing initial {initial_n_samples} samples randomly...")
            train_dataloader, pool_dataloader, labeled_indices, unlabeled_indices = acquire_new_batch(
                training_pool, train_dataloader_batch_size, pool_dataloader_batch_size, initial_n_samples, top_score_fraction, n_samples_per_batch, labeled_indices, unlabeled_indices, acquisition_scores=None
            )
        # each other time, use the n_samples_per_batch with acquisition scores to select
        else:
            scores = get_variances(ensemble_predictions, f"{results_dir}/variances{current_cycle}.csv")
            print(f"Selecting new data points...")
            train_dataloader, pool_dataloader, labeled_indices, unlabeled_indices = acquire_new_batch(
                training_pool, train_dataloader_batch_size, pool_dataloader_batch_size, initial_n_samples, top_score_fraction, n_samples_per_batch, labeled_indices, unlabeled_indices, acquisition_scores=scores
            )
        
        # give message when loop ends
        if len(unlabeled_indices) == 0:
            print("Unlabeled pool is empty. Proceeding to final model training.")
            break
        
        # evaluate active vs standard
        final_results = []

        # active
        print(f"\nTraining and evaluating model using {len(labeled_indices)} actively selected samples...")
        model_active = initialize_and_train_new_model(approach, model_name, learning_rate, weight_decay, epochs, train_dataloader, val_dataloader, patience, return_history=False)
        results_active = test_model(model_active, test_dataloader, return_results=True)
        results_active = {
            'changing_var': 'n_samples',
            'local_exp_idx': current_cycle-1,
            'value': len(labeled_indices),
            'training_method': 'active',
            **results_active
        }
        final_results.append(results_active)

        # standard
        print(f"\nTraining and evaluating model using {len(labeled_indices)} randomly selected samples...")
        model_standard, _ = run_standard_finetuning(len(labeled_indices), approach, model_name, train_dataloader_batch_size, learning_rate, weight_decay, epochs, training_pool, val_dataloader, patience)
        results_standard = test_model(model_standard, test_dataloader, return_results=True)
        results_standard = {
            'changing_var': 'n_samples',
            'local_exp_idx': current_cycle-1,
            'value': len(labeled_indices),
            'training_method': 'standard',
            **results_standard
        }
        final_results.append(results_standard)
        # save to disk each time to save progress
        results_df = pd.DataFrame(final_results)
        all_results_df = pd.concat([all_results_df, results_df], ignore_index=True)
        all_results_df.to_csv(results_path, index=False)
        print(f"Progress for experiment {current_cycle-1} appended to {results_path}")

        # if it's the last cycle, skip ensemble predictions
        if (current_cycle == total_cycles):
            print("Experiments complete.")
            break

        print("Starting ensemble training and pool evaluation...")
        ensemble_predictions = train_ensemble(n_models, model_name, approach, learning_rate, weight_decay, epochs, labeled_indices, train_dataloader_batch_size, training_pool, pool_dataloader, val_dataloader, patience)
    
        current_cycle += 1
    return all_results_df

In [11]:
from scripts.config import (
    MODEL_NAME,
    APPROACH,
    LEARNING_RATE,
    WEIGHT_DECAY,
    EPOCHS,
    POOL_BATCH_SIZE,
    PATIENCE,
    N_MODELS,
    TOP_SCORE_FRACTION,
)

get_learning_curves(
    n_samples=256,
    initial_n_samples=16,
    top_score_fraction=TOP_SCORE_FRACTION,
    n_samples_per_batch=16,
    model_name=MODEL_NAME,
    approach=APPROACH,
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    epochs=EPOCHS,
    training_pool=training_pool,
    train_dataloader_batch_size=BATCH_SIZE,
    pool_dataloader_batch_size=POOL_BATCH_SIZE,
    val_dataloader=val_dataloader,
    test_dataloader=test_dataloader,
    patience=PATIENCE,
    n_models=N_MODELS,
    results_path='results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv'
)


Cycle 1/16
-------------------------------------------------
Choosing initial 16 samples randomly...

Training and evaluating model using 16 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:35<00:00,  1.42it/s]


Train Loss: 0.0336 | Val Loss: 0.1788 | SpearmanR: 0.4060


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.40it/s]



Training and evaluating model using 16 randomly selected samples...


[Training]:  20%|██        | 10/50 [00:03<00:13,  2.88it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.2326 | Val Loss: 0.2467 | SpearmanR: -0.0998


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.63it/s]


Progress for experiment 0 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  96%|█████████▌| 48/50 [00:27<00:01,  1.77it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0013 | Val Loss: 0.2925 | SpearmanR: 0.2855


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.78it/s]



Training Model 2...


[Training]:  84%|████████▍ | 42/50 [00:24<00:04,  1.73it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0291 | Val Loss: 0.3151 | SpearmanR: 0.1252


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.76it/s]



Training Model 3...


[Training]: 100%|██████████| 50/50 [00:32<00:00,  1.55it/s]


Train Loss: 0.0047 | Val Loss: 0.1990 | SpearmanR: 0.1898


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.92it/s]



Training Model 4...


[Training]:  20%|██        | 10/50 [00:02<00:11,  3.51it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.2623 | Val Loss: 0.1999 | SpearmanR: -0.0290


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.81it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:32<00:00,  1.54it/s]


Train Loss: 0.0059 | Val Loss: 0.1985 | SpearmanR: 0.3307


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.00it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 2/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances2.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 32 actively selected samples...


[Training]:  90%|█████████ | 45/50 [00:18<00:02,  2.38it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0235 | Val Loss: 0.1893 | SpearmanR: 0.3408


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 131.91it/s]



Training and evaluating model using 32 randomly selected samples...


[Training]:  26%|██▌       | 13/50 [00:07<00:20,  1.81it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.1520 | Val Loss: 0.2177 | SpearmanR: 0.1649


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 131.04it/s]


Progress for experiment 1 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:17<00:00,  2.83it/s]


Train Loss: 0.0019 | Val Loss: 0.2308 | SpearmanR: 0.3456


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.81it/s]



Training Model 2...


[Training]:  88%|████████▊ | 44/50 [00:13<00:01,  3.18it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0084 | Val Loss: 0.1676 | SpearmanR: 0.4401


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.75it/s]



Training Model 3...


[Training]:  70%|███████   | 35/50 [00:11<00:04,  3.05it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0448 | Val Loss: 0.2103 | SpearmanR: 0.1413


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.74it/s]



Training Model 4...


[Training]:  80%|████████  | 40/50 [00:12<00:03,  3.20it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0181 | Val Loss: 0.2064 | SpearmanR: 0.3939


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.74it/s]



Training Model 5...


[Training]:  90%|█████████ | 45/50 [00:14<00:01,  3.05it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0050 | Val Loss: 0.2154 | SpearmanR: 0.3789


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.76it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 3/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances3.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 48 actively selected samples...


[Training]:  82%|████████▏ | 41/50 [00:14<00:03,  2.85it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0309 | Val Loss: 0.1424 | SpearmanR: 0.5392


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.74it/s]



Training and evaluating model using 48 randomly selected samples...


[Training]:  42%|████▏     | 21/50 [00:07<00:09,  2.99it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0978 | Val Loss: 0.1651 | SpearmanR: 0.3981


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.23it/s]


Progress for experiment 2 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  62%|██████▏   | 31/50 [00:11<00:06,  2.80it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0081 | Val Loss: 0.1883 | SpearmanR: 0.3627


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.74it/s]



Training Model 2...


[Training]: 100%|██████████| 50/50 [00:18<00:00,  2.76it/s]


Train Loss: 0.0061 | Val Loss: 0.1606 | SpearmanR: 0.4655


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.76it/s]



Training Model 3...


[Training]:  76%|███████▌  | 38/50 [00:13<00:04,  2.77it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0019 | Val Loss: 0.1933 | SpearmanR: 0.4052


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.86it/s]



Training Model 4...


[Training]: 100%|██████████| 50/50 [00:18<00:00,  2.64it/s]


Train Loss: 0.0057 | Val Loss: 0.1433 | SpearmanR: 0.5033


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.87it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:18<00:00,  2.76it/s]


Train Loss: 0.0155 | Val Loss: 0.1741 | SpearmanR: 0.4640


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.64it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 4/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances4.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 64 actively selected samples...


[Training]:  56%|█████▌    | 28/50 [00:10<00:08,  2.56it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0719 | Val Loss: 0.1567 | SpearmanR: 0.4937


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 125.16it/s]



Training and evaluating model using 64 randomly selected samples...


[Training]:  92%|█████████▏| 46/50 [00:15<00:01,  2.89it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0271 | Val Loss: 0.1510 | SpearmanR: 0.4731


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 130.01it/s]


Progress for experiment 3 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  86%|████████▌ | 43/50 [00:14<00:02,  2.89it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0081 | Val Loss: 0.1741 | SpearmanR: 0.3866


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.67it/s]



Training Model 2...


[Training]:  86%|████████▌ | 43/50 [00:14<00:02,  3.05it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0154 | Val Loss: 0.1542 | SpearmanR: 0.5206


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.75it/s]



Training Model 3...


[Training]: 100%|██████████| 50/50 [00:17<00:00,  2.79it/s]


Train Loss: 0.0139 | Val Loss: 0.1796 | SpearmanR: 0.4037


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.51it/s]



Training Model 4...


[Training]:  72%|███████▏  | 36/50 [00:12<00:05,  2.77it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0047 | Val Loss: 0.1641 | SpearmanR: 0.4576


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.71it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:18<00:00,  2.72it/s]


Train Loss: 0.0052 | Val Loss: 0.1634 | SpearmanR: 0.5089


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.60it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 5/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances5.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 80 actively selected samples...


[Training]:  86%|████████▌ | 43/50 [00:15<00:02,  2.85it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0236 | Val Loss: 0.1710 | SpearmanR: 0.4967


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.71it/s]



Training and evaluating model using 80 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:17<00:00,  2.84it/s]


Train Loss: 0.0198 | Val Loss: 0.1620 | SpearmanR: 0.5035


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 130.55it/s]


Progress for experiment 4 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  72%|███████▏  | 36/50 [00:22<00:08,  1.64it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0088 | Val Loss: 0.1681 | SpearmanR: 0.4383


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.13it/s]



Training Model 2...


[Training]:  54%|█████▍    | 27/50 [00:16<00:14,  1.59it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0117 | Val Loss: 0.1650 | SpearmanR: 0.4218


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.00it/s]



Training Model 3...


[Training]:  54%|█████▍    | 27/50 [00:18<00:15,  1.48it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0130 | Val Loss: 0.1816 | SpearmanR: 0.3911


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.26it/s]



Training Model 4...


[Training]: 100%|██████████| 50/50 [00:28<00:00,  1.76it/s]


Train Loss: 0.0008 | Val Loss: 0.1386 | SpearmanR: 0.5346


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.12it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:34<00:00,  1.44it/s]


Train Loss: 0.0020 | Val Loss: 0.1901 | SpearmanR: 0.4033


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.27it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 6/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances6.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 96 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:31<00:00,  1.57it/s]


Train Loss: 0.0138 | Val Loss: 0.1468 | SpearmanR: 0.5769


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.91it/s]



Training and evaluating model using 96 randomly selected samples...


[Training]:  68%|██████▊   | 34/50 [00:21<00:10,  1.56it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0285 | Val Loss: 0.1320 | SpearmanR: 0.5998


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.37it/s]


Progress for experiment 5 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  90%|█████████ | 45/50 [00:31<00:03,  1.41it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0030 | Val Loss: 0.1603 | SpearmanR: 0.5065


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.61it/s]



Training Model 2...


[Training]:  58%|█████▊    | 29/50 [00:20<00:14,  1.40it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0181 | Val Loss: 0.1552 | SpearmanR: 0.5059


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.59it/s]



Training Model 3...


[Training]:  72%|███████▏  | 36/50 [00:19<00:07,  1.83it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0068 | Val Loss: 0.1569 | SpearmanR: 0.4809


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.48it/s]



Training Model 4...


[Training]:  80%|████████  | 40/50 [00:27<00:06,  1.43it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0074 | Val Loss: 0.1516 | SpearmanR: 0.5078


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.42it/s]



Training Model 5...


[Training]:  88%|████████▊ | 44/50 [00:29<00:04,  1.49it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0101 | Val Loss: 0.1569 | SpearmanR: 0.5020


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.52it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 7/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances7.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 112 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:34<00:00,  1.45it/s]


Train Loss: 0.0096 | Val Loss: 0.1576 | SpearmanR: 0.5423


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.10it/s]



Training and evaluating model using 112 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:33<00:00,  1.48it/s]


Train Loss: 0.0174 | Val Loss: 0.1533 | SpearmanR: 0.5817


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.01it/s]


Progress for experiment 6 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:32<00:00,  1.52it/s]


Train Loss: 0.0036 | Val Loss: 0.1561 | SpearmanR: 0.5177


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.62it/s]



Training Model 2...


[Training]:  34%|███▍      | 17/50 [00:11<00:22,  1.45it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0222 | Val Loss: 0.1765 | SpearmanR: 0.4338


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.67it/s]



Training Model 3...


[Training]:  66%|██████▌   | 33/50 [00:20<00:10,  1.59it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0446 | Val Loss: 0.1629 | SpearmanR: 0.4387


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.53it/s]



Training Model 4...


[Training]:  86%|████████▌ | 43/50 [00:30<00:04,  1.41it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0159 | Val Loss: 0.1547 | SpearmanR: 0.5035


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.51it/s]



Training Model 5...


[Training]:  54%|█████▍    | 27/50 [00:18<00:15,  1.46it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0162 | Val Loss: 0.1602 | SpearmanR: 0.4933


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.51it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 8/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances8.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 128 actively selected samples...


[Training]:  80%|████████  | 40/50 [00:25<00:06,  1.55it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0151 | Val Loss: 0.1450 | SpearmanR: 0.5699


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.95it/s]



Training and evaluating model using 128 randomly selected samples...


[Training]:  94%|█████████▍| 47/50 [00:33<00:02,  1.42it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0358 | Val Loss: 0.1352 | SpearmanR: 0.5908


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.69it/s]


Progress for experiment 7 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  38%|███▊      | 19/50 [00:13<00:21,  1.45it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.1477 | Val Loss: 0.2029 | SpearmanR: 0.5158


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.62it/s]



Training Model 2...


[Training]: 100%|██████████| 50/50 [00:37<00:00,  1.32it/s]


Train Loss: 0.0045 | Val Loss: 0.1711 | SpearmanR: 0.4564


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.70it/s]



Training Model 3...


[Training]: 100%|██████████| 50/50 [00:24<00:00,  2.05it/s]


Train Loss: 0.0178 | Val Loss: 0.1521 | SpearmanR: 0.5003


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.73it/s]



Training Model 4...


[Training]:  80%|████████  | 40/50 [00:29<00:07,  1.37it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0133 | Val Loss: 0.1453 | SpearmanR: 0.5624


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.67it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:40<00:00,  1.24it/s]


Train Loss: 0.0394 | Val Loss: 0.1550 | SpearmanR: 0.5098


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.62it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 9/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances9.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 144 actively selected samples...


[Training]:  68%|██████▊   | 34/50 [00:30<00:14,  1.11it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0296 | Val Loss: 0.1379 | SpearmanR: 0.5825


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.54it/s]



Training and evaluating model using 144 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:33<00:00,  1.49it/s]


Train Loss: 0.0233 | Val Loss: 0.1500 | SpearmanR: 0.5689


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 126.49it/s]


Progress for experiment 8 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:23<00:00,  2.15it/s]


Train Loss: 0.0168 | Val Loss: 0.1379 | SpearmanR: 0.5476


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.61it/s]



Training Model 2...


[Training]:  92%|█████████▏| 46/50 [00:19<00:01,  2.39it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0120 | Val Loss: 0.1674 | SpearmanR: 0.5490


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.62it/s]



Training Model 3...


[Training]:  62%|██████▏   | 31/50 [00:14<00:08,  2.20it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0259 | Val Loss: 0.1938 | SpearmanR: 0.4978


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.72it/s]



Training Model 4...


[Training]: 100%|██████████| 50/50 [00:19<00:00,  2.50it/s]


Train Loss: 0.0217 | Val Loss: 0.1418 | SpearmanR: 0.5246


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.68it/s]



Training Model 5...


[Training]:  58%|█████▊    | 29/50 [00:14<00:10,  1.95it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0204 | Val Loss: 0.1893 | SpearmanR: 0.4517


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.62it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 10/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances10.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 160 actively selected samples...


[Training]:  94%|█████████▍| 47/50 [00:21<00:01,  2.14it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0205 | Val Loss: 0.1256 | SpearmanR: 0.5914


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 125.13it/s]



Training and evaluating model using 160 randomly selected samples...


[Training]:  98%|█████████▊| 49/50 [00:25<00:00,  1.94it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0090 | Val Loss: 0.1355 | SpearmanR: 0.6327


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 125.51it/s]


Progress for experiment 9 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  92%|█████████▏| 46/50 [00:22<00:01,  2.03it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0044 | Val Loss: 0.1577 | SpearmanR: 0.5298


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.71it/s]



Training Model 2...


[Training]:  82%|████████▏ | 41/50 [00:19<00:04,  2.15it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0117 | Val Loss: 0.1464 | SpearmanR: 0.5633


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.75it/s]



Training Model 3...


[Training]:  96%|█████████▌| 48/50 [00:23<00:00,  2.05it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0016 | Val Loss: 0.1418 | SpearmanR: 0.5635


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.84it/s]



Training Model 4...


[Training]:  98%|█████████▊| 49/50 [00:22<00:00,  2.17it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0056 | Val Loss: 0.1467 | SpearmanR: 0.5560


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.81it/s]



Training Model 5...


[Training]:  80%|████████  | 40/50 [00:17<00:04,  2.29it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0061 | Val Loss: 0.1338 | SpearmanR: 0.5726


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.60it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 11/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances11.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 176 actively selected samples...


[Training]:  82%|████████▏ | 41/50 [00:21<00:04,  1.95it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0332 | Val Loss: 0.1579 | SpearmanR: 0.6268


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.50it/s]



Training and evaluating model using 176 randomly selected samples...


[Training]:  80%|████████  | 40/50 [00:20<00:05,  1.97it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0515 | Val Loss: 0.1163 | SpearmanR: 0.6516


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.68it/s]


Progress for experiment 10 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:33<00:00,  1.50it/s]


Train Loss: 0.0158 | Val Loss: 0.1611 | SpearmanR: 0.5337


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.05it/s]



Training Model 2...


[Training]:  84%|████████▍ | 42/50 [00:30<00:05,  1.38it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0073 | Val Loss: 0.1462 | SpearmanR: 0.5419


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.92it/s]



Training Model 3...


[Training]:  80%|████████  | 40/50 [00:30<00:07,  1.32it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0485 | Val Loss: 0.1494 | SpearmanR: 0.5668


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.08it/s]



Training Model 4...


[Training]:  72%|███████▏  | 36/50 [00:18<00:07,  1.93it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0086 | Val Loss: 0.1380 | SpearmanR: 0.5817


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.77it/s]



Training Model 5...


[Training]:  78%|███████▊  | 39/50 [00:21<00:05,  1.84it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0098 | Val Loss: 0.1325 | SpearmanR: 0.5677


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.69it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 12/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances12.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 192 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:38<00:00,  1.31it/s]


Train Loss: 0.0164 | Val Loss: 0.1415 | SpearmanR: 0.6202


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.06it/s]



Training and evaluating model using 192 randomly selected samples...


[Training]:  86%|████████▌ | 43/50 [00:33<00:05,  1.27it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0154 | Val Loss: 0.1386 | SpearmanR: 0.6167


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 130.31it/s]


Progress for experiment 11 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:33<00:00,  1.51it/s]


Train Loss: 0.0027 | Val Loss: 0.1411 | SpearmanR: 0.5689


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.91it/s]



Training Model 2...


[Training]:  56%|█████▌    | 28/50 [00:23<00:18,  1.17it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0091 | Val Loss: 0.1352 | SpearmanR: 0.5561


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.98it/s]



Training Model 3...


[Training]:  98%|█████████▊| 49/50 [00:37<00:00,  1.32it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0109 | Val Loss: 0.1192 | SpearmanR: 0.6487


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.83it/s]



Training Model 4...


[Training]:  70%|███████   | 35/50 [00:25<00:11,  1.35it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0143 | Val Loss: 0.1797 | SpearmanR: 0.5343


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.86it/s]



Training Model 5...


[Training]:  92%|█████████▏| 46/50 [00:36<00:03,  1.28it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0026 | Val Loss: 0.1287 | SpearmanR: 0.6145


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.86it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 13/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances13.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 208 actively selected samples...


[Training]:  70%|███████   | 35/50 [00:29<00:12,  1.20it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0254 | Val Loss: 0.1389 | SpearmanR: 0.6247


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.79it/s]



Training and evaluating model using 208 randomly selected samples...


[Training]:  82%|████████▏ | 41/50 [00:31<00:06,  1.30it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0177 | Val Loss: 0.1152 | SpearmanR: 0.6718


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 131.18it/s]


Progress for experiment 12 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:41<00:00,  1.19it/s]


Train Loss: 0.0057 | Val Loss: 0.1622 | SpearmanR: 0.5660


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.28it/s]



Training Model 2...


[Training]:  94%|█████████▍| 47/50 [00:40<00:02,  1.17it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0056 | Val Loss: 0.1614 | SpearmanR: 0.5567


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.32it/s]



Training Model 3...


[Training]:  74%|███████▍  | 37/50 [00:24<00:08,  1.51it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0121 | Val Loss: 0.1210 | SpearmanR: 0.6124


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.12it/s]



Training Model 4...


[Training]:  64%|██████▍   | 32/50 [00:28<00:15,  1.14it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0200 | Val Loss: 0.1585 | SpearmanR: 0.5745


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.24it/s]



Training Model 5...


[Training]:  82%|████████▏ | 41/50 [00:23<00:05,  1.74it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0144 | Val Loss: 0.1526 | SpearmanR: 0.5292


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.32it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 14/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances14.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 224 actively selected samples...


[Training]:  90%|█████████ | 45/50 [00:40<00:04,  1.11it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0105 | Val Loss: 0.1191 | SpearmanR: 0.6370


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 130.91it/s]



Training and evaluating model using 224 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:34<00:00,  1.45it/s]


Train Loss: 0.0079 | Val Loss: 0.1006 | SpearmanR: 0.7193


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.94it/s]


Progress for experiment 13 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:38<00:00,  1.31it/s]


Train Loss: 0.0050 | Val Loss: 0.1355 | SpearmanR: 0.6122


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.03it/s]



Training Model 2...


[Training]:  64%|██████▍   | 32/50 [00:26<00:14,  1.20it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0537 | Val Loss: 0.1421 | SpearmanR: 0.5954


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.51it/s]



Training Model 3...


[Training]:  74%|███████▍  | 37/50 [00:30<00:10,  1.22it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0063 | Val Loss: 0.1405 | SpearmanR: 0.5971


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.14it/s]



Training Model 4...


[Training]:  80%|████████  | 40/50 [00:32<00:08,  1.21it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0073 | Val Loss: 0.1189 | SpearmanR: 0.6433


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.38it/s]



Training Model 5...


[Training]:  96%|█████████▌| 48/50 [00:35<00:01,  1.37it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0039 | Val Loss: 0.1455 | SpearmanR: 0.6149


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.40it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 15/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances15.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 240 actively selected samples...


[Training]:  98%|█████████▊| 49/50 [00:40<00:00,  1.21it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0241 | Val Loss: 0.1112 | SpearmanR: 0.6497


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.20it/s]



Training and evaluating model using 240 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:45<00:00,  1.09it/s]


Train Loss: 0.0316 | Val Loss: 0.1277 | SpearmanR: 0.6740


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 126.06it/s]


Progress for experiment 14 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  68%|██████▊   | 34/50 [00:19<00:09,  1.75it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0059 | Val Loss: 0.1392 | SpearmanR: 0.5750


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.38it/s]



Training Model 2...


[Training]:  84%|████████▍ | 42/50 [00:23<00:04,  1.77it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0054 | Val Loss: 0.1156 | SpearmanR: 0.6557


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 15.99it/s]



Training Model 3...


[Training]:  84%|████████▍ | 42/50 [00:22<00:04,  1.83it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0041 | Val Loss: 0.1406 | SpearmanR: 0.5626


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.00it/s]



Training Model 4...


[Training]: 100%|██████████| 50/50 [00:26<00:00,  1.87it/s]


Train Loss: 0.0044 | Val Loss: 0.1362 | SpearmanR: 0.5776


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 15.82it/s]



Training Model 5...


[Training]:  68%|██████▊   | 34/50 [00:18<00:08,  1.83it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0261 | Val Loss: 0.1444 | SpearmanR: 0.6235


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 15.97it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 16/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/random_from_top_fraction/variances16.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 256 actively selected samples...


[Training]:  84%|████████▍ | 42/50 [00:25<00:04,  1.64it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0209 | Val Loss: 0.1147 | SpearmanR: 0.6732


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 126.66it/s]



Training and evaluating model using 256 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:30<00:00,  1.62it/s]


Train Loss: 0.0382 | Val Loss: 0.1242 | SpearmanR: 0.6988


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 118.27it/s]

Progress for experiment 15 appended to results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv
Experiments complete.





Unnamed: 0,changing_var,local_exp_idx,value,training_method,avg_test_loss,spearmanr,pearsonr,final_mse
0,n_samples,0,16,active,0.232308,0.276791,0.255876,0.23217
1,n_samples,0,16,standard,0.208022,0.10839,0.151105,0.20796
2,n_samples,0,16,active,0.225947,0.288939,0.327616,0.226419
3,n_samples,0,16,standard,0.223339,-0.1404,-0.167712,0.223238
4,n_samples,1,32,active,0.18496,0.453173,0.476212,0.185616
5,n_samples,1,32,standard,0.213183,0.243198,0.218171,0.213262
6,n_samples,2,48,active,0.164185,0.492976,0.524374,0.164829
7,n_samples,2,48,standard,0.218176,0.484964,0.40869,0.218421
8,n_samples,3,64,active,0.15205,0.509348,0.531544,0.152574
9,n_samples,3,64,standard,0.14424,0.547185,0.551789,0.14395


In [12]:
lc_df = pd.read_csv("results/04_adding_diversity/random_from_top_fraction/active_vs_standard_learning_curve.csv")
lc_df.head()

Unnamed: 0,changing_var,local_exp_idx,value,training_method,avg_test_loss,spearmanr,pearsonr,final_mse
0,n_samples,0,16,active,0.225947,0.288939,0.327616,0.226419
1,n_samples,0,16,standard,0.223339,-0.1404,-0.167712,0.223238
2,n_samples,1,32,active,0.18496,0.453173,0.476212,0.185616
3,n_samples,1,32,standard,0.213183,0.243198,0.218171,0.213262
4,n_samples,2,48,active,0.164185,0.492976,0.524374,0.164829


In [13]:
import plotly.express as px

fig = px.line(lc_df, 'value', 'spearmanr',color="training_method")

fig.update_layout(
    xaxis_title="Number of Training Samples",
    yaxis_title="Spearman Correlation Coefficient"
)

fig.show()

In [14]:
import pandas as pd
import glob
from pathlib import Path

file_pattern = "results/04_adding_diversity/random_from_top_fraction/variances*.csv"
variance_files = glob.glob(file_pattern)
variance_files.sort()

all_variances_dfs = []

for filepath in variance_files:
    temp_df = pd.read_csv(filepath)
    column_name = Path(filepath).stem
    temp_df = temp_df.rename(columns={'variance': column_name})
    all_variances_dfs.append(temp_df)

final_df = pd.concat(all_variances_dfs, axis=1)

final_df

Unnamed: 0,variances10,variances11,variances12,variances13,variances14,variances15,variances16,variances2,variances3,variances4,variances5,variances6,variances7,variances8,variances9
0,0.017557,0.017802,0.021863,0.012007,0.007614,0.003172,0.006799,0.021065,0.034405,0.087627,0.007492,0.010298,0.001138,0.001611,0.073645
1,0.030268,0.013252,0.017392,0.014582,0.000798,0.013131,0.012300,0.032668,0.025599,0.013012,0.006858,0.012764,0.006510,0.019254,0.046434
2,0.062635,0.009125,0.005722,0.016405,0.020386,0.018690,0.008005,0.180021,0.043308,0.125256,0.021468,0.057188,0.073069,0.013695,0.049970
3,0.006008,0.047597,0.165927,0.025047,0.017256,0.031446,0.033810,0.034345,0.010135,0.010568,0.011698,0.011238,0.002768,0.009420,0.062151
4,0.108935,0.037727,0.017766,0.017103,0.021608,0.025763,0.023030,0.034272,0.086913,0.054073,0.031779,0.078399,0.056023,0.042748,0.117974
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3147,,,,,,,,0.045600,,,,,,,
3148,,,,,,,,0.023604,,,,,,,
3149,,,,,,,,0.064340,,,,,,,
3150,,,,,,,,0.022443,,,,,,,


In [17]:
columns_to_plot = ['variances2', 'variances5', 'variances8', 'variances11', 'variances14']

# "Melt" the selected columns into a long-format DataFrame
melted_df = final_df[columns_to_plot].melt(
    var_name='Cycle',      # New column for the original column names
    value_name='Variance'  # New column for the variance values
)

fig = px.histogram(
    data_frame=melted_df,
    x='Variance',                                  
    color='Cycle',                                 
    barmode='overlay',                             
    opacity=0.65,                                  
    histnorm='probability density',                
    title='Distribution of Variances Across Active Learning Cycles'
)

fig.show()

### With embeddings

#### Variance + Embedding Diversity

In [3]:
def get_embeddings(dataset, indices, pool_dataloader_batch_size, model):
    device = 'cuda' if torch.cuda.is_available else 'cpu'
    
    # Use input indices to make a subset and dataloader from the dataset.
    candidate_subset = Subset(dataset, indices)
    candidate_dataloader = DataLoader(candidate_subset, batch_size=pool_dataloader_batch_size, shuffle=False)
    # initialize a list to store embeddings
    embeddings = []
    # Loop through the dataloader
    model.eval()
    with torch.inference_mode():
        for inputs, _ in candidate_dataloader:
            # forward pass on the batch
            inputs = {k: v.to(device) for k, v in inputs.items()}
            outputs = model(**inputs, output_hidden_states=True)
            # get last hidden states
            last_hidden_states = outputs.hidden_states[-1]
            # get the embeddings of the first token (CLS token)
            embedding = last_hidden_states[:,0,:]
            # append to embedding list
            embeddings.append(embedding)
    # return embeddings, converted to tensor, without batch dimension
    embeddings = torch.cat(embeddings, dim=0)
    return embeddings

def select_diverse_batch(labeled_embeddings, candidate_embeddings, n_samples_per_batch):
    # ensure we don't try to get more candidates than we have available
    num_candidates = candidate_embeddings.shape[0]
    n_samples_in_batch = min(n_samples_per_batch, num_candidates)

    # initialize list to store selected indices
    selected_indices = []

    # get initial euclidean distances, final shape: # candidate samples x # labeled samples
    dist_matrix = torch.cdist(candidate_embeddings, labeled_embeddings, p=2)
    
    # find the minimum distance each candidate sample is from any labeled sample
    min_distances, _ = torch.min(dist_matrix, dim=1) 

    # for each sample in new batch
    for _ in range(n_samples_in_batch):
        # find the index of the sample with the greatest distance
        farthest_idx = torch.argmax(min_distances).item()
        # add that index to the selected indices list
        selected_indices.append(farthest_idx)
        # get the embedding of the newly selected sample
        newly_selected_embedding = candidate_embeddings[farthest_idx].unsqueeze(0) # unsqueeze to add batch dimension for cdist to function
        # calculate the new distances from the newly selected sample
        dist_to_new = torch.cdist(candidate_embeddings, newly_selected_embedding, p=2).squeeze()
        # update minimum distances for the next iteration
        min_distances = torch.minimum(min_distances, dist_to_new)
    
    return selected_indices

def acquire_new_batch(
        dataset, 
        train_dataloader_batch_size, 
        pool_dataloader_batch_size, 
        initial_batch_size, 
        top_score_fraction,
        batch_size_to_acquire, 
        labeled_indices, 
        unlabeled_indices,
        model=None, 
        variances=None,
        ):

    # if initial batch, when there are no variances, select randomly
    if variances is None:
        initial_batch_size = min(initial_batch_size, len(unlabeled_indices))
        indices_to_acquire = np.random.choice(unlabeled_indices, size=initial_batch_size, replace=False)
    
    # else select based on top variances scores
    else:
        # make sure we don't overshoot samples to acquire if on the final batch
        batch_size_to_acquire = min(batch_size_to_acquire, len(variances))
        # determine the number of top scorers to select from
        num_top_scorers = int(top_score_fraction * len(unlabeled_indices))
        # get the indicies of the top acquisition scores (num of samples)
        candidate_indices = variances.topk(num_top_scorers).indices
        
        # get embeddings for labeled and candidate sets
        labeled_embeds = get_embeddings(dataset, labeled_indices, pool_dataloader_batch_size, model)
        candidate_embeds = get_embeddings(dataset, candidate_indices, pool_dataloader_batch_size, model)

        # use embeddings to determine candidate indices that are diverse
        diverse_indices_in_candidates = select_diverse_batch(labeled_embeds, candidate_embeds, batch_size_to_acquire)

        # use these to find the indices that map back to the original dataset
        indices_to_acquire = candidate_indices[diverse_indices_in_candidates]
    
    # update the indices lists
    labeled_indices = np.concatenate([labeled_indices, indices_to_acquire])
    unlabeled_indices = np.setdiff1d(unlabeled_indices, indices_to_acquire, assume_unique=True)
    
    # create new subsets and dataloaders
    train_subset = Subset(dataset, labeled_indices.tolist())
    pool_subset = Subset(dataset, unlabeled_indices.tolist())
    train_dataloader = DataLoader(train_subset, batch_size=train_dataloader_batch_size, shuffle=True)
    pool_dataloader = DataLoader(pool_subset, batch_size=pool_dataloader_batch_size, shuffle=False)
    
    return train_dataloader, pool_dataloader, labeled_indices, unlabeled_indices

In [4]:
from pathlib import Path
import pandas as pd

from scripts.acquisition import get_variances
from scripts.training import initialize_and_train_new_model, test_model
from scripts.campaigns import run_standard_finetuning


def get_learning_curves(
        n_samples,
        initial_n_samples,
        top_score_fraction,
        n_samples_per_batch,
        model_name, 
        approach,
        learning_rate, 
        weight_decay, 
        epochs, 
        training_pool, 
        train_dataloader_batch_size,
        pool_dataloader_batch_size,
        val_dataloader, 
        test_dataloader,
        patience=5,
        n_models=5,
        results_path="active_vs_standard_learning_curves.csv"
):
    results_path = Path(results_path)
    results_dir = results_path.parent
    results_dir.mkdir(parents=True, exist_ok=True)

    # Load existing results if the file exists, otherwise start with a fresh DataFrame.
    if results_path.exists():
        all_results_df = pd.read_csv(results_path)
    else:
        all_results_df = pd.DataFrame()
    
    total_pool_size = len(training_pool)
    unlabeled_indices = np.arange(total_pool_size)
    labeled_indices = np.array([], dtype=np.int64)

    ensemble_predictions = None
    current_cycle = 1
    total_cycles = int(np.ceil((n_samples-initial_n_samples)/n_samples_per_batch)) + 1
    
    while len(labeled_indices) < n_samples and len(unlabeled_indices) > 0:
        print(f"\nCycle {current_cycle}/{total_cycles}\n-------------------------------------------------")

        # on the first cycle, choose random samples of initial_n_samples size
        if ensemble_predictions is None:
            print(f"Choosing initial {initial_n_samples} samples randomly...")
            train_dataloader, pool_dataloader, labeled_indices, unlabeled_indices = acquire_new_batch(
                training_pool, 
                train_dataloader_batch_size, 
                pool_dataloader_batch_size, 
                initial_n_samples, 
                top_score_fraction, 
                n_samples_per_batch, 
                labeled_indices, 
                unlabeled_indices, 
                variances=None
            )
        # each other time, use the n_samples_per_batch with acquisition scores to select
        else:
            variances = get_variances(ensemble_predictions, f"{results_dir}/variances{current_cycle}.csv")
            print(f"Selecting new data points...")
            train_dataloader, pool_dataloader, labeled_indices, unlabeled_indices = acquire_new_batch(
                training_pool, 
                train_dataloader_batch_size, 
                pool_dataloader_batch_size, 
                initial_n_samples, 
                top_score_fraction, 
                n_samples_per_batch, 
                labeled_indices, 
                unlabeled_indices,
                model=model_active,
                variances=variances
            )
        
        # give message when loop ends
        if len(unlabeled_indices) == 0:
            print("Unlabeled pool is empty. Proceeding to final model training.")
            break
        
        # evaluate active vs standard
        final_results = []

        # active
        print(f"\nTraining and evaluating model using {len(labeled_indices)} actively selected samples...")
        model_active = initialize_and_train_new_model(
            approach, 
            model_name, 
            learning_rate, 
            weight_decay, 
            epochs, 
            train_dataloader, 
            val_dataloader, 
            patience, 
            return_history=False
            )
        results_active = test_model(model_active, test_dataloader, return_results=True)
        results_active = {
            'changing_var': 'n_samples',
            'local_exp_idx': current_cycle-1,
            'value': len(labeled_indices),
            'training_method': 'active',
            **results_active
        }
        final_results.append(results_active)

        # standard
        print(f"\nTraining and evaluating model using {len(labeled_indices)} randomly selected samples...")
        model_standard, _ = run_standard_finetuning(
            len(labeled_indices), 
            approach, 
            model_name, 
            train_dataloader_batch_size, 
            learning_rate, 
            weight_decay, 
            epochs, 
            training_pool, 
            val_dataloader, 
            patience
            )
        results_standard = test_model(model_standard, test_dataloader, return_results=True)
        results_standard = {
            'changing_var': 'n_samples',
            'local_exp_idx': current_cycle-1,
            'value': len(labeled_indices),
            'training_method': 'standard',
            **results_standard
        }
        final_results.append(results_standard)
        # save to disk each time to save progress
        results_df = pd.DataFrame(final_results)
        all_results_df = pd.concat([all_results_df, results_df], ignore_index=True)
        all_results_df.to_csv(results_path, index=False)
        print(f"Progress for experiment {current_cycle-1} appended to {results_path}")

        # if it's the last cycle, skip ensemble predictions
        if (current_cycle == total_cycles):
            print("Experiments complete.")
            break

        print("Starting ensemble training and pool evaluation...")
        ensemble_predictions = train_ensemble(
            n_models, 
            model_name, 
            approach, 
            learning_rate, 
            weight_decay, 
            epochs, 
            labeled_indices, 
            train_dataloader_batch_size, 
            training_pool, 
            pool_dataloader, 
            val_dataloader, 
            patience
            )
    
        current_cycle += 1
    return all_results_df

In [None]:
from scripts.config import (
    MODEL_NAME,
    APPROACH,
    LEARNING_RATE,
    WEIGHT_DECAY,
    EPOCHS,
    POOL_BATCH_SIZE,
    PATIENCE,
    N_MODELS,
    TOP_SCORE_FRACTION, # 0.5 in this trial
)

get_learning_curves(
    n_samples=256,
    initial_n_samples=16,
    top_score_fraction=TOP_SCORE_FRACTION,
    n_samples_per_batch=16,
    model_name=MODEL_NAME,
    approach=APPROACH,
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    epochs=EPOCHS,
    training_pool=training_pool,
    train_dataloader_batch_size=BATCH_SIZE,
    pool_dataloader_batch_size=POOL_BATCH_SIZE,
    val_dataloader=val_dataloader,
    test_dataloader=test_dataloader,
    patience=PATIENCE,
    n_models=N_MODELS,
    results_path='results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv'
)


Cycle 1/16
-------------------------------------------------
Choosing initial 16 samples randomly...

Training and evaluating model using 16 actively selected samples...


[Training]:  44%|████▍     | 22/50 [00:06<00:08,  3.47it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0805 | Val Loss: 0.2110 | SpearmanR: 0.0976


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 130.48it/s]



Training and evaluating model using 16 randomly selected samples...


[Training]:  92%|█████████▏| 46/50 [00:14<00:01,  3.10it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0172 | Val Loss: 0.2118 | SpearmanR: 0.3338


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.33it/s]


Progress for experiment 0 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  76%|███████▌  | 38/50 [00:12<00:04,  3.00it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0730 | Val Loss: 0.2148 | SpearmanR: 0.0377


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.75it/s]



Training Model 2...


[Training]:  48%|████▊     | 24/50 [00:09<00:10,  2.55it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0532 | Val Loss: 0.2253 | SpearmanR: 0.0492


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.78it/s]



Training Model 3...


[Training]:  44%|████▍     | 22/50 [00:05<00:07,  3.74it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0446 | Val Loss: 0.2029 | SpearmanR: 0.1759


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.68it/s]



Training Model 4...


[Training]:  20%|██        | 10/50 [00:02<00:09,  4.16it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.2298 | Val Loss: 0.2215 | SpearmanR: -0.1174


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.77it/s]



Training Model 5...


[Training]:  76%|███████▌  | 38/50 [00:11<00:03,  3.20it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0307 | Val Loss: 0.1956 | SpearmanR: 0.1844


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.50it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 2/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances2.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 32 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:14<00:00,  3.37it/s]


Train Loss: 0.0069 | Val Loss: 0.1960 | SpearmanR: 0.4261


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.87it/s]



Training and evaluating model using 32 randomly selected samples...


[Training]:  34%|███▍      | 17/50 [00:06<00:12,  2.56it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.1132 | Val Loss: 0.1975 | SpearmanR: 0.1019


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.37it/s]


Progress for experiment 1 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  60%|██████    | 30/50 [00:09<00:06,  3.18it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0429 | Val Loss: 0.2217 | SpearmanR: 0.0706


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.68it/s]



Training Model 2...


[Training]:  88%|████████▊ | 44/50 [00:15<00:02,  2.77it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0036 | Val Loss: 0.2017 | SpearmanR: 0.1478


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.73it/s]



Training Model 3...


[Training]:  74%|███████▍  | 37/50 [00:10<00:03,  3.67it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0267 | Val Loss: 0.2333 | SpearmanR: 0.3195


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.79it/s]



Training Model 4...


[Training]:  68%|██████▊   | 34/50 [00:09<00:04,  3.63it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0383 | Val Loss: 0.2445 | SpearmanR: 0.3152


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.63it/s]



Training Model 5...


[Training]:  68%|██████▊   | 34/50 [00:11<00:05,  2.95it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0425 | Val Loss: 0.1945 | SpearmanR: 0.2705


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.70it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 3/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances3.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 48 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:15<00:00,  3.15it/s]


Train Loss: 0.0062 | Val Loss: 0.1492 | SpearmanR: 0.5383


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 125.83it/s]



Training and evaluating model using 48 randomly selected samples...


[Training]:  64%|██████▍   | 32/50 [00:12<00:07,  2.52it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0481 | Val Loss: 0.1699 | SpearmanR: 0.3657


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.49it/s]


Progress for experiment 2 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:17<00:00,  2.85it/s]


Train Loss: 0.0008 | Val Loss: 0.1760 | SpearmanR: 0.4020


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.00it/s]



Training Model 2...


[Training]:  66%|██████▌   | 33/50 [00:11<00:05,  2.99it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0069 | Val Loss: 0.1641 | SpearmanR: 0.4359


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.63it/s]



Training Model 3...


[Training]:  90%|█████████ | 45/50 [00:16<00:01,  2.75it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0030 | Val Loss: 0.1568 | SpearmanR: 0.4368


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.93it/s]



Training Model 4...


[Training]: 100%|██████████| 50/50 [00:15<00:00,  3.18it/s]


Train Loss: 0.0019 | Val Loss: 0.2006 | SpearmanR: 0.3986


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.95it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:17<00:00,  2.85it/s]


Train Loss: 0.0042 | Val Loss: 0.1908 | SpearmanR: 0.3072


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.05it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 4/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances4.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 64 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:20<00:00,  2.50it/s]


Train Loss: 0.0122 | Val Loss: 0.2037 | SpearmanR: 0.4972


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 124.14it/s]



Training and evaluating model using 64 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:18<00:00,  2.75it/s]


Train Loss: 0.0103 | Val Loss: 0.1652 | SpearmanR: 0.5121


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 126.78it/s]


Progress for experiment 3 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:19<00:00,  2.61it/s]


Train Loss: 0.0036 | Val Loss: 0.1700 | SpearmanR: 0.4505


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.87it/s]



Training Model 2...


[Training]:  76%|███████▌  | 38/50 [00:11<00:03,  3.17it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0049 | Val Loss: 0.1624 | SpearmanR: 0.4183


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.00it/s]



Training Model 3...


[Training]: 100%|██████████| 50/50 [00:17<00:00,  2.80it/s]


Train Loss: 0.0002 | Val Loss: 0.2095 | SpearmanR: 0.3998


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.81it/s]



Training Model 4...


[Training]:  72%|███████▏  | 36/50 [00:11<00:04,  3.12it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0086 | Val Loss: 0.2267 | SpearmanR: 0.4563


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.00it/s]



Training Model 5...


[Training]:  86%|████████▌ | 43/50 [00:16<00:02,  2.54it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0042 | Val Loss: 0.1827 | SpearmanR: 0.4562


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.86it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 5/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances5.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 80 actively selected samples...


[Training]:  86%|████████▌ | 43/50 [00:14<00:02,  2.97it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0132 | Val Loss: 0.1556 | SpearmanR: 0.5160


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.03it/s]



Training and evaluating model using 80 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:21<00:00,  2.32it/s]


Train Loss: 0.0095 | Val Loss: 0.1574 | SpearmanR: 0.5081


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 126.98it/s]


Progress for experiment 4 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  54%|█████▍    | 27/50 [00:12<00:10,  2.19it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0091 | Val Loss: 0.1727 | SpearmanR: 0.3854


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.04it/s]



Training Model 2...


[Training]:  56%|█████▌    | 28/50 [00:10<00:08,  2.65it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0099 | Val Loss: 0.1565 | SpearmanR: 0.4810


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.72it/s]



Training Model 3...


[Training]:  92%|█████████▏| 46/50 [00:18<00:01,  2.49it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0026 | Val Loss: 0.1749 | SpearmanR: 0.4355


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.10it/s]



Training Model 4...


[Training]:  80%|████████  | 40/50 [00:14<00:03,  2.78it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0082 | Val Loss: 0.1859 | SpearmanR: 0.4023


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.02it/s]



Training Model 5...


[Training]:  52%|█████▏    | 26/50 [00:10<00:09,  2.52it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0158 | Val Loss: 0.1814 | SpearmanR: 0.3463


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.86it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 6/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances6.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 96 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:19<00:00,  2.61it/s]


Train Loss: 0.0223 | Val Loss: 0.1695 | SpearmanR: 0.5648


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 126.73it/s]



Training and evaluating model using 96 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:20<00:00,  2.41it/s]


Train Loss: 0.0135 | Val Loss: 0.1556 | SpearmanR: 0.5507


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.28it/s]


Progress for experiment 5 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  78%|███████▊  | 39/50 [00:13<00:03,  2.81it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0031 | Val Loss: 0.1658 | SpearmanR: 0.4167


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.48it/s]



Training Model 2...


[Training]: 100%|██████████| 50/50 [00:21<00:00,  2.33it/s]


Train Loss: 0.0010 | Val Loss: 0.1530 | SpearmanR: 0.4964


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.55it/s]



Training Model 3...


[Training]:  66%|██████▌   | 33/50 [00:11<00:05,  2.87it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0131 | Val Loss: 0.1445 | SpearmanR: 0.5190


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.48it/s]



Training Model 4...


[Training]:  82%|████████▏ | 41/50 [00:16<00:03,  2.44it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0012 | Val Loss: 0.1555 | SpearmanR: 0.5112


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.51it/s]



Training Model 5...


[Training]:  78%|███████▊  | 39/50 [00:15<00:04,  2.50it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0010 | Val Loss: 0.1704 | SpearmanR: 0.4752


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.46it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 7/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances7.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 112 actively selected samples...


[Training]:  74%|███████▍  | 37/50 [00:16<00:05,  2.18it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0325 | Val Loss: 0.1489 | SpearmanR: 0.5301


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.64it/s]



Training and evaluating model using 112 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:20<00:00,  2.42it/s]


Train Loss: 0.0159 | Val Loss: 0.1317 | SpearmanR: 0.6201


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.40it/s]


Progress for experiment 6 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  76%|███████▌  | 38/50 [00:16<00:05,  2.34it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0059 | Val Loss: 0.1576 | SpearmanR: 0.5044


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.62it/s]



Training Model 2...


[Training]:  82%|████████▏ | 41/50 [00:15<00:03,  2.61it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0122 | Val Loss: 0.1649 | SpearmanR: 0.4643


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.45it/s]



Training Model 3...


[Training]:  68%|██████▊   | 34/50 [00:14<00:07,  2.27it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0029 | Val Loss: 0.1788 | SpearmanR: 0.4472


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.49it/s]



Training Model 4...


[Training]:  54%|█████▍    | 27/50 [00:10<00:08,  2.62it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0206 | Val Loss: 0.2048 | SpearmanR: 0.4054


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.63it/s]



Training Model 5...


[Training]:  78%|███████▊  | 39/50 [00:15<00:04,  2.47it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0035 | Val Loss: 0.2015 | SpearmanR: 0.4193


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.66it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 8/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances8.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 128 actively selected samples...


[Training]:  64%|██████▍   | 32/50 [00:13<00:07,  2.33it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0305 | Val Loss: 0.1507 | SpearmanR: 0.5312


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.48it/s]



Training and evaluating model using 128 randomly selected samples...


[Training]:  98%|█████████▊| 49/50 [00:23<00:00,  2.11it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0104 | Val Loss: 0.1263 | SpearmanR: 0.6238


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.92it/s]


Progress for experiment 7 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  96%|█████████▌| 48/50 [00:18<00:00,  2.57it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0016 | Val Loss: 0.1557 | SpearmanR: 0.5210


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.72it/s]



Training Model 2...


[Training]:  52%|█████▏    | 26/50 [00:13<00:12,  1.89it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0135 | Val Loss: 0.1542 | SpearmanR: 0.5185


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.67it/s]



Training Model 3...


[Training]: 100%|██████████| 50/50 [00:18<00:00,  2.65it/s]


Train Loss: 0.0010 | Val Loss: 0.1713 | SpearmanR: 0.5285


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.71it/s]



Training Model 4...


[Training]:  88%|████████▊ | 44/50 [00:20<00:02,  2.14it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0062 | Val Loss: 0.1533 | SpearmanR: 0.5068


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.76it/s]



Training Model 5...


[Training]:  48%|████▊     | 24/50 [00:09<00:10,  2.48it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0572 | Val Loss: 0.1449 | SpearmanR: 0.5026


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.73it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 9/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances9.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 144 actively selected samples...


[Training]:  72%|███████▏  | 36/50 [00:17<00:06,  2.02it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0067 | Val Loss: 0.1326 | SpearmanR: 0.6049


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 126.14it/s]



Training and evaluating model using 144 randomly selected samples...


[Training]:  66%|██████▌   | 33/50 [00:13<00:06,  2.44it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0238 | Val Loss: 0.1380 | SpearmanR: 0.6169


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.78it/s]


Progress for experiment 8 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  80%|████████  | 40/50 [00:17<00:04,  2.25it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0211 | Val Loss: 0.1537 | SpearmanR: 0.5040


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.63it/s]



Training Model 2...


[Training]:  66%|██████▌   | 33/50 [00:14<00:07,  2.24it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0493 | Val Loss: 0.2319 | SpearmanR: 0.4605


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.65it/s]



Training Model 3...


[Training]:  74%|███████▍  | 37/50 [00:15<00:05,  2.41it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0457 | Val Loss: 0.1507 | SpearmanR: 0.5762


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.71it/s]



Training Model 4...


[Training]: 100%|██████████| 50/50 [00:22<00:00,  2.23it/s]


Train Loss: 0.0302 | Val Loss: 0.1486 | SpearmanR: 0.5295


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.81it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:21<00:00,  2.29it/s]


Train Loss: 0.0215 | Val Loss: 0.1346 | SpearmanR: 0.5912


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.77it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 10/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances10.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 160 actively selected samples...


[Training]:  84%|████████▍ | 42/50 [00:19<00:03,  2.15it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0069 | Val Loss: 0.1366 | SpearmanR: 0.5895


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 126.83it/s]



Training and evaluating model using 160 randomly selected samples...


[Training]:  64%|██████▍   | 32/50 [00:16<00:09,  1.93it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0342 | Val Loss: 0.1288 | SpearmanR: 0.6104


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 126.52it/s]


Progress for experiment 9 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  98%|█████████▊| 49/50 [00:20<00:00,  2.34it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0018 | Val Loss: 0.1395 | SpearmanR: 0.5677


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.84it/s]



Training Model 2...


[Training]:  54%|█████▍    | 27/50 [00:14<00:11,  1.92it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0039 | Val Loss: 0.1534 | SpearmanR: 0.5032


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.76it/s]



Training Model 3...


[Training]:  68%|██████▊   | 34/50 [00:14<00:07,  2.27it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0082 | Val Loss: 0.1673 | SpearmanR: 0.5400


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.76it/s]



Training Model 4...


[Training]:  68%|██████▊   | 34/50 [00:15<00:07,  2.15it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0043 | Val Loss: 0.1376 | SpearmanR: 0.5617


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.60it/s]



Training Model 5...


[Training]:  54%|█████▍    | 27/50 [00:12<00:10,  2.16it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0131 | Val Loss: 0.1533 | SpearmanR: 0.5284


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.82it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 11/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances11.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 176 actively selected samples...


[Training]:  58%|█████▊    | 29/50 [00:13<00:09,  2.16it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0233 | Val Loss: 0.1292 | SpearmanR: 0.6184


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.82it/s]



Training and evaluating model using 176 randomly selected samples...


[Training]:  66%|██████▌   | 33/50 [00:18<00:09,  1.78it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0227 | Val Loss: 0.1710 | SpearmanR: 0.6431


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 124.40it/s]


Progress for experiment 10 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  56%|█████▌    | 28/50 [00:12<00:09,  2.22it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0169 | Val Loss: 0.1477 | SpearmanR: 0.5144


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.70it/s]



Training Model 2...


[Training]:  76%|███████▌  | 38/50 [00:19<00:06,  1.94it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0037 | Val Loss: 0.1507 | SpearmanR: 0.5184


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.73it/s]



Training Model 3...


[Training]:  66%|██████▌   | 33/50 [00:14<00:07,  2.28it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0070 | Val Loss: 0.1333 | SpearmanR: 0.6013


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.81it/s]



Training Model 4...


[Training]:  80%|████████  | 40/50 [00:20<00:05,  1.98it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0019 | Val Loss: 0.1590 | SpearmanR: 0.4978


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.81it/s]



Training Model 5...


[Training]:  86%|████████▌ | 43/50 [00:18<00:03,  2.33it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0039 | Val Loss: 0.1919 | SpearmanR: 0.4831


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.77it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 12/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances12.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 192 actively selected samples...


[Training]:  56%|█████▌    | 28/50 [00:18<00:14,  1.52it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0179 | Val Loss: 0.1277 | SpearmanR: 0.6191


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 131.80it/s]



Training and evaluating model using 192 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:37<00:00,  1.33it/s]


Train Loss: 0.0036 | Val Loss: 0.1144 | SpearmanR: 0.6586


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 120.92it/s]


Progress for experiment 11 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  62%|██████▏   | 31/50 [00:26<00:16,  1.17it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0069 | Val Loss: 0.1369 | SpearmanR: 0.5733


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.06it/s]



Training Model 2...


[Training]:  64%|██████▍   | 32/50 [00:23<00:13,  1.38it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0043 | Val Loss: 0.1537 | SpearmanR: 0.5931


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.11it/s]



Training Model 3...


[Training]:  58%|█████▊    | 29/50 [00:22<00:16,  1.27it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0042 | Val Loss: 0.1467 | SpearmanR: 0.5230


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.12it/s]



Training Model 4...


[Training]:  68%|██████▊   | 34/50 [00:26<00:12,  1.29it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0028 | Val Loss: 0.1522 | SpearmanR: 0.5108


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.27it/s]



Training Model 5...


[Training]:  52%|█████▏    | 26/50 [00:14<00:13,  1.76it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0175 | Val Loss: 0.1632 | SpearmanR: 0.5499


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.96it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 13/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances13.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 208 actively selected samples...


[Training]:  78%|███████▊  | 39/50 [00:34<00:09,  1.13it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0146 | Val Loss: 0.1327 | SpearmanR: 0.6171


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.10it/s]



Training and evaluating model using 208 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:34<00:00,  1.45it/s]


Train Loss: 0.0122 | Val Loss: 0.1130 | SpearmanR: 0.6955


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.38it/s]


Progress for experiment 12 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  84%|████████▍ | 42/50 [00:33<00:06,  1.24it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0024 | Val Loss: 0.1533 | SpearmanR: 0.5143


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.21it/s]



Training Model 2...


[Training]:  62%|██████▏   | 31/50 [00:24<00:15,  1.27it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0092 | Val Loss: 0.1577 | SpearmanR: 0.5756


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.27it/s]



Training Model 3...


[Training]:  48%|████▊     | 24/50 [00:19<00:20,  1.26it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0191 | Val Loss: 0.1524 | SpearmanR: 0.5787


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.28it/s]



Training Model 4...


[Training]:  56%|█████▌    | 28/50 [00:18<00:14,  1.50it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0097 | Val Loss: 0.1361 | SpearmanR: 0.5688


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.99it/s]



Training Model 5...


[Training]:  64%|██████▍   | 32/50 [00:21<00:11,  1.52it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0137 | Val Loss: 0.1661 | SpearmanR: 0.5094


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.97it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 14/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances14.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 224 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:36<00:00,  1.38it/s]


Train Loss: 0.0094 | Val Loss: 0.1175 | SpearmanR: 0.6473


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 123.90it/s]



Training and evaluating model using 224 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:39<00:00,  1.27it/s]


Train Loss: 0.0146 | Val Loss: 0.1148 | SpearmanR: 0.7151


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 125.27it/s]


Progress for experiment 13 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  60%|██████    | 30/50 [00:23<00:15,  1.25it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0020 | Val Loss: 0.1595 | SpearmanR: 0.5091


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.30it/s]



Training Model 2...


[Training]: 100%|██████████| 50/50 [00:35<00:00,  1.39it/s]


Train Loss: 0.0003 | Val Loss: 0.1342 | SpearmanR: 0.5791


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.44it/s]



Training Model 3...


[Training]: 100%|██████████| 50/50 [00:35<00:00,  1.40it/s]


Train Loss: 0.0037 | Val Loss: 0.1318 | SpearmanR: 0.5927


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.49it/s]



Training Model 4...


[Training]:  74%|███████▍  | 37/50 [00:27<00:09,  1.35it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0025 | Val Loss: 0.1500 | SpearmanR: 0.5761


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.43it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:28<00:00,  1.77it/s]


Train Loss: 0.0011 | Val Loss: 0.1607 | SpearmanR: 0.5233


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.35it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 15/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances15.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 240 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:27<00:00,  1.81it/s]


Train Loss: 0.0036 | Val Loss: 0.1284 | SpearmanR: 0.6391


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 126.78it/s]



Training and evaluating model using 240 randomly selected samples...


[Training]:  80%|████████  | 40/50 [00:22<00:05,  1.81it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0153 | Val Loss: 0.1071 | SpearmanR: 0.7217


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.43it/s]


Progress for experiment 14 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:26<00:00,  1.86it/s]


Train Loss: 0.0028 | Val Loss: 0.1294 | SpearmanR: 0.6201


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.34it/s]



Training Model 2...


[Training]:  42%|████▏     | 21/50 [00:12<00:17,  1.69it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0191 | Val Loss: 0.1677 | SpearmanR: 0.5726


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.46it/s]



Training Model 3...


[Training]:  88%|████████▊ | 44/50 [00:23<00:03,  1.86it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0009 | Val Loss: 0.1510 | SpearmanR: 0.5515


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.46it/s]



Training Model 4...


[Training]:  96%|█████████▌| 48/50 [00:24<00:01,  1.98it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0005 | Val Loss: 0.1307 | SpearmanR: 0.6058


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.48it/s]



Training Model 5...


[Training]:  76%|███████▌  | 38/50 [00:19<00:06,  1.98it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0056 | Val Loss: 0.1328 | SpearmanR: 0.5900


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.43it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 16/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_from_top_fraction/variances16.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 256 actively selected samples...


[Training]:  46%|████▌     | 23/50 [00:14<00:16,  1.63it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0474 | Val Loss: 0.1224 | SpearmanR: 0.6479


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 125.02it/s]



Training and evaluating model using 256 randomly selected samples...


[Training]:  76%|███████▌  | 38/50 [00:22<00:07,  1.65it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0165 | Val Loss: 0.1162 | SpearmanR: 0.7121


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.05it/s]


Progress for experiment 15 appended to results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv
Experiments complete.


Unnamed: 0,changing_var,local_exp_idx,value,training_method,avg_test_loss,spearmanr,pearsonr,final_mse
0,n_samples,0,16,active,0.19407,0.198219,0.211677,0.194043
1,n_samples,0,16,standard,0.202006,0.302308,0.319394,0.202516
2,n_samples,1,32,active,0.135687,0.523961,0.562829,0.135807
3,n_samples,1,32,standard,0.202622,0.103285,0.10465,0.202585
4,n_samples,2,48,active,0.140236,0.532741,0.55824,0.140727
5,n_samples,2,48,standard,0.165474,0.458487,0.462352,0.165602
6,n_samples,3,64,active,0.178242,0.520566,0.559668,0.17879
7,n_samples,3,64,standard,0.16922,0.516508,0.53265,0.170034
8,n_samples,4,80,active,0.156842,0.559239,0.563365,0.157612
9,n_samples,4,80,standard,0.184357,0.431315,0.472125,0.184751


In [10]:
lc_df = pd.read_csv("results/04_adding_diversity/diverse_from_top_fraction/active_vs_standard_learning_curve.csv")
lc_df.head()

Unnamed: 0,changing_var,local_exp_idx,value,training_method,avg_test_loss,spearmanr,pearsonr,final_mse
0,n_samples,0,16,active,0.19407,0.198219,0.211677,0.194043
1,n_samples,0,16,standard,0.202006,0.302308,0.319394,0.202516
2,n_samples,1,32,active,0.135687,0.523961,0.562829,0.135807
3,n_samples,1,32,standard,0.202622,0.103285,0.10465,0.202585
4,n_samples,2,48,active,0.140236,0.532741,0.55824,0.140727


In [11]:
import plotly.express as px

fig = px.line(lc_df, 'value', 'spearmanr',color="training_method")

fig.update_layout(
    xaxis_title="Number of Training Samples",
    yaxis_title="Spearman Correlation Coefficient"
)

fig.show()

In [12]:
import pandas as pd
import glob
from pathlib import Path

file_pattern = "results/04_adding_diversity/diverse_from_top_fraction/variances*.csv"
variance_files = glob.glob(file_pattern)
variance_files.sort()

all_variances_dfs = []

for filepath in variance_files:
    temp_df = pd.read_csv(filepath)
    column_name = Path(filepath).stem
    temp_df = temp_df.rename(columns={'variance': column_name})
    all_variances_dfs.append(temp_df)

final_df = pd.concat(all_variances_dfs, axis=1)

final_df

Unnamed: 0,variances10,variances11,variances12,variances13,variances14,variances15,variances16,variances2,variances3,variances4,variances5,variances6,variances7,variances8,variances9
0,0.017610,0.046133,0.016313,0.003964,0.070412,0.015990,0.066768,0.007045,0.018246,0.082885,0.042747,0.028918,0.031012,0.010689,0.084596
1,0.033090,0.054998,0.037995,0.011245,0.034509,0.015215,0.023972,0.008230,0.022088,0.017902,0.042081,0.055104,0.023352,0.011264,0.017188
2,0.028040,0.005810,0.015404,0.010296,0.007105,0.007289,0.010908,0.041063,0.052175,0.007564,0.010557,0.017585,0.011127,0.012925,0.015223
3,0.049954,0.107769,0.030878,0.018527,0.015323,0.047191,0.021950,0.009003,0.017396,0.031625,0.030911,0.019158,0.011881,0.026429,0.075614
4,0.053315,0.015243,0.061432,0.058543,0.019040,0.016268,0.030274,0.009676,0.024557,0.061230,0.054883,0.063969,0.025561,0.097359,0.017862
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3147,,,,,,,,0.013406,,,,,,,
3148,,,,,,,,0.042970,,,,,,,
3149,,,,,,,,0.013507,,,,,,,
3150,,,,,,,,0.007618,,,,,,,


In [13]:
columns_to_plot = ['variances2', 'variances5', 'variances8', 'variances11', 'variances14']

# "Melt" the selected columns into a long-format DataFrame
melted_df = final_df[columns_to_plot].melt(
    var_name='Cycle',      # New column for the original column names
    value_name='Variance'  # New column for the variance values
)

fig = px.histogram(
    data_frame=melted_df,
    x='Variance',                                  
    color='Cycle',                                 
    barmode='overlay',                             
    opacity=0.65,                                  
    histnorm='probability density',                
    title='Distribution of Variances Across Active Learning Cycles'
)

fig.show()

Seems like introducing additional diversity in the selected samples didn't quite cut it. It seemed to perform better in this one run at the beginning, but the random selection caught up to it eventually and continued to perform better. This means that diversity wasn't really the core issue. The most promising next thing I can try is to see in what way the actively selected samples might be biased. I'm quite curious to see if the samples they happen to be selecting are ones that have lower read counts.

Before that, however, I am interested to test whether my diversity algorithm alone can improve performance if it's choosing from the whole dataset.

#### Embedding Diversity only

In [5]:
from scripts.config import (
    MODEL_NAME,
    APPROACH,
    LEARNING_RATE,
    WEIGHT_DECAY,
    EPOCHS,
    POOL_BATCH_SIZE,
    PATIENCE,
    N_MODELS,
    TOP_SCORE_FRACTION, # 1.0 in this trial to eliminate variance based selection
)

get_learning_curves(
    n_samples=256,
    initial_n_samples=16,
    top_score_fraction=TOP_SCORE_FRACTION,
    n_samples_per_batch=16,
    model_name=MODEL_NAME,
    approach=APPROACH,
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    epochs=EPOCHS,
    training_pool=training_pool,
    train_dataloader_batch_size=BATCH_SIZE,
    pool_dataloader_batch_size=POOL_BATCH_SIZE,
    val_dataloader=val_dataloader,
    test_dataloader=test_dataloader,
    patience=PATIENCE,
    n_models=N_MODELS,
    results_path='results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv'
)


Cycle 1/16
-------------------------------------------------
Choosing initial 16 samples randomly...

Training and evaluating model using 16 actively selected samples...


[Training]:  48%|████▊     | 24/50 [00:09<00:10,  2.50it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0351 | Val Loss: 0.2232 | SpearmanR: 0.3042


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.19it/s]



Training and evaluating model using 16 randomly selected samples...


[Training]:  20%|██        | 10/50 [00:02<00:09,  4.15it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.2152 | Val Loss: 0.2015 | SpearmanR: 0.1003


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.84it/s]


Progress for experiment 0 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  76%|███████▌  | 38/50 [00:10<00:03,  3.67it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0243 | Val Loss: 0.3458 | SpearmanR: 0.2432


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.96it/s]



Training Model 2...


[Training]:  48%|████▊     | 24/50 [00:06<00:07,  3.62it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0092 | Val Loss: 0.1985 | SpearmanR: 0.1873


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.86it/s]



Training Model 3...


[Training]:  38%|███▊      | 19/50 [00:05<00:09,  3.21it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0178 | Val Loss: 0.2085 | SpearmanR: -0.1387


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.80it/s]



Training Model 4...


[Training]:  40%|████      | 20/50 [00:05<00:08,  3.59it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0489 | Val Loss: 0.2086 | SpearmanR: 0.1900


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.60it/s]



Training Model 5...


[Training]:  46%|████▌     | 23/50 [00:07<00:08,  3.16it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0215 | Val Loss: 0.2189 | SpearmanR: 0.1231


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.69it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 2/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances2.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 32 actively selected samples...


[Training]:  42%|████▏     | 21/50 [00:06<00:08,  3.41it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0842 | Val Loss: 0.2068 | SpearmanR: 0.3468


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 130.45it/s]



Training and evaluating model using 32 randomly selected samples...


[Training]:  70%|███████   | 35/50 [00:09<00:04,  3.67it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0641 | Val Loss: 0.1959 | SpearmanR: 0.3763


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.64it/s]


Progress for experiment 1 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  56%|█████▌    | 28/50 [00:09<00:07,  2.98it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0364 | Val Loss: 0.1828 | SpearmanR: 0.3034


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.87it/s]



Training Model 2...


[Training]: 100%|██████████| 50/50 [00:14<00:00,  3.36it/s]


Train Loss: 0.0003 | Val Loss: 0.2124 | SpearmanR: 0.1548


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.95it/s]



Training Model 3...


[Training]:  68%|██████▊   | 34/50 [00:10<00:04,  3.26it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0265 | Val Loss: 0.2256 | SpearmanR: 0.3623


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.73it/s]



Training Model 4...


[Training]:  44%|████▍     | 22/50 [00:07<00:09,  2.88it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0600 | Val Loss: 0.1796 | SpearmanR: 0.3636


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.01it/s]



Training Model 5...


[Training]:  52%|█████▏    | 26/50 [00:08<00:07,  3.06it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0309 | Val Loss: 0.2052 | SpearmanR: 0.2791


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.82it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 3/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances3.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 48 actively selected samples...


[Training]:  80%|████████  | 40/50 [00:13<00:03,  2.96it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0421 | Val Loss: 0.1918 | SpearmanR: 0.3403


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.22it/s]



Training and evaluating model using 48 randomly selected samples...


[Training]:  76%|███████▌  | 38/50 [00:11<00:03,  3.22it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0417 | Val Loss: 0.1534 | SpearmanR: 0.4986


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.42it/s]


Progress for experiment 2 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  60%|██████    | 30/50 [00:09<00:06,  3.09it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0319 | Val Loss: 0.2288 | SpearmanR: 0.3240


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.88it/s]



Training Model 2...


[Training]:  56%|█████▌    | 28/50 [00:08<00:06,  3.15it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0101 | Val Loss: 0.1750 | SpearmanR: 0.3721


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.69it/s]



Training Model 3...


[Training]:  68%|██████▊   | 34/50 [00:10<00:05,  3.13it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0135 | Val Loss: 0.2028 | SpearmanR: 0.3685


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.87it/s]



Training Model 4...


[Training]:  42%|████▏     | 21/50 [00:06<00:09,  3.22it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0711 | Val Loss: 0.2955 | SpearmanR: 0.3132


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.89it/s]



Training Model 5...


[Training]:  56%|█████▌    | 28/50 [00:08<00:06,  3.20it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0166 | Val Loss: 0.2053 | SpearmanR: 0.3534


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.65it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 4/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances4.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 64 actively selected samples...


[Training]: 100%|██████████| 50/50 [00:16<00:00,  3.12it/s]


Train Loss: 0.0173 | Val Loss: 0.1661 | SpearmanR: 0.4605


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 130.00it/s]



Training and evaluating model using 64 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:16<00:00,  3.02it/s]


Train Loss: 0.0101 | Val Loss: 0.1717 | SpearmanR: 0.4912


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.53it/s]


Progress for experiment 3 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  82%|████████▏ | 41/50 [00:13<00:03,  2.96it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0045 | Val Loss: 0.1586 | SpearmanR: 0.4618


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.99it/s]



Training Model 2...


[Training]:  70%|███████   | 35/50 [00:12<00:05,  2.80it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0024 | Val Loss: 0.1811 | SpearmanR: 0.3823


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.88it/s]



Training Model 3...


[Training]:  68%|██████▊   | 34/50 [00:10<00:04,  3.31it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0283 | Val Loss: 0.1992 | SpearmanR: 0.2727


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.62it/s]



Training Model 4...


[Training]:  66%|██████▌   | 33/50 [00:11<00:05,  2.93it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0038 | Val Loss: 0.2950 | SpearmanR: 0.3648


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.96it/s]



Training Model 5...


[Training]:  42%|████▏     | 21/50 [00:07<00:10,  2.76it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0245 | Val Loss: 0.2687 | SpearmanR: 0.3903


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.98it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 5/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances5.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 80 actively selected samples...


[Training]:  82%|████████▏ | 41/50 [00:13<00:02,  3.04it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0067 | Val Loss: 0.1720 | SpearmanR: 0.4108


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.08it/s]



Training and evaluating model using 80 randomly selected samples...


[Training]:  32%|███▏      | 16/50 [00:06<00:14,  2.34it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.1210 | Val Loss: 0.1728 | SpearmanR: 0.4658


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 130.03it/s]


Progress for experiment 4 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  48%|████▊     | 24/50 [00:09<00:10,  2.49it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0125 | Val Loss: 0.1779 | SpearmanR: 0.3199


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 16.92it/s]



Training Model 2...


[Training]:  42%|████▏     | 21/50 [00:07<00:09,  2.94it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0202 | Val Loss: 0.1736 | SpearmanR: 0.3057


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.21it/s]



Training Model 3...


[Training]:  52%|█████▏    | 26/50 [00:08<00:08,  2.95it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0236 | Val Loss: 0.1843 | SpearmanR: 0.3315


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.27it/s]



Training Model 4...


[Training]:  46%|████▌     | 23/50 [00:09<00:10,  2.54it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0388 | Val Loss: 0.1981 | SpearmanR: 0.2933


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.12it/s]



Training Model 5...


[Training]:  56%|█████▌    | 28/50 [00:09<00:07,  2.88it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0180 | Val Loss: 0.1687 | SpearmanR: 0.3696


[Surveying]: 100%|██████████| 25/25 [00:01<00:00, 17.15it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 6/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances6.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 96 actively selected samples...


[Training]:  62%|██████▏   | 31/50 [00:11<00:06,  2.77it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0294 | Val Loss: 0.1714 | SpearmanR: 0.4332


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.68it/s]



Training and evaluating model using 96 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:20<00:00,  2.44it/s]


Train Loss: 0.0074 | Val Loss: 0.1348 | SpearmanR: 0.6172


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.17it/s]


Progress for experiment 5 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  68%|██████▊   | 34/50 [00:14<00:06,  2.35it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0091 | Val Loss: 0.2051 | SpearmanR: 0.4081


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.77it/s]



Training Model 2...


[Training]:  42%|████▏     | 21/50 [00:07<00:10,  2.79it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0280 | Val Loss: 0.1511 | SpearmanR: 0.4746


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.27it/s]



Training Model 3...


[Training]:  56%|█████▌    | 28/50 [00:09<00:07,  2.84it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0158 | Val Loss: 0.1745 | SpearmanR: 0.3652


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.35it/s]



Training Model 4...


[Training]:  86%|████████▌ | 43/50 [00:17<00:02,  2.52it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0023 | Val Loss: 0.1605 | SpearmanR: 0.4070


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.45it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:17<00:00,  2.88it/s]


Train Loss: 0.0006 | Val Loss: 0.1671 | SpearmanR: 0.3828


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.13it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 7/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances7.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 112 actively selected samples...


[Training]:  56%|█████▌    | 28/50 [00:12<00:09,  2.23it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0240 | Val Loss: 0.1504 | SpearmanR: 0.4886


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.49it/s]



Training and evaluating model using 112 randomly selected samples...


[Training]:  70%|███████   | 35/50 [00:14<00:06,  2.46it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0618 | Val Loss: 0.1489 | SpearmanR: 0.5566


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.35it/s]


Progress for experiment 6 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  92%|█████████▏| 46/50 [00:18<00:01,  2.43it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0020 | Val Loss: 0.1621 | SpearmanR: 0.3986


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.48it/s]



Training Model 2...


[Training]: 100%|██████████| 50/50 [00:19<00:00,  2.59it/s]


Train Loss: 0.0023 | Val Loss: 0.1552 | SpearmanR: 0.5082


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.48it/s]



Training Model 3...


[Training]:  44%|████▍     | 22/50 [00:08<00:10,  2.69it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0179 | Val Loss: 0.1942 | SpearmanR: 0.3738


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.65it/s]



Training Model 4...


[Training]:  52%|█████▏    | 26/50 [00:10<00:09,  2.50it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0358 | Val Loss: 0.1643 | SpearmanR: 0.4229


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.38it/s]



Training Model 5...


[Training]:  56%|█████▌    | 28/50 [00:10<00:08,  2.73it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0254 | Val Loss: 0.2029 | SpearmanR: 0.3761


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.55it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 8/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances8.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 128 actively selected samples...


[Training]:  68%|██████▊   | 34/50 [00:15<00:07,  2.20it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0093 | Val Loss: 0.1296 | SpearmanR: 0.5843


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 130.08it/s]



Training and evaluating model using 128 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:20<00:00,  2.42it/s]


Train Loss: 0.0193 | Val Loss: 0.1370 | SpearmanR: 0.5996


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 124.98it/s]


Progress for experiment 7 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:22<00:00,  2.18it/s]


Train Loss: 0.0031 | Val Loss: 0.1594 | SpearmanR: 0.4707


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.35it/s]



Training Model 2...


[Training]:  54%|█████▍    | 27/50 [00:12<00:10,  2.25it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0123 | Val Loss: 0.1488 | SpearmanR: 0.5235


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.54it/s]



Training Model 3...


[Training]: 100%|██████████| 50/50 [00:20<00:00,  2.44it/s]


Train Loss: 0.0061 | Val Loss: 0.1585 | SpearmanR: 0.5280


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.49it/s]



Training Model 4...


[Training]: 100%|██████████| 50/50 [00:20<00:00,  2.42it/s]


Train Loss: 0.0011 | Val Loss: 0.2058 | SpearmanR: 0.4691


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.53it/s]



Training Model 5...


[Training]:  56%|█████▌    | 28/50 [00:10<00:08,  2.57it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0170 | Val Loss: 0.1625 | SpearmanR: 0.4722


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.47it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 9/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances9.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 144 actively selected samples...


[Training]:  76%|███████▌  | 38/50 [00:17<00:05,  2.19it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0046 | Val Loss: 0.1509 | SpearmanR: 0.5456


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.20it/s]



Training and evaluating model using 144 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:20<00:00,  2.43it/s]


Train Loss: 0.0198 | Val Loss: 0.1423 | SpearmanR: 0.6064


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.28it/s]


Progress for experiment 8 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  56%|█████▌    | 28/50 [00:12<00:09,  2.24it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0572 | Val Loss: 0.1464 | SpearmanR: 0.5126


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.74it/s]



Training Model 2...


[Training]:  64%|██████▍   | 32/50 [00:14<00:07,  2.28it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0130 | Val Loss: 0.1693 | SpearmanR: 0.5660


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.67it/s]



Training Model 3...


[Training]:  80%|████████  | 40/50 [00:17<00:04,  2.27it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0062 | Val Loss: 0.1420 | SpearmanR: 0.5307


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.93it/s]



Training Model 4...


[Training]: 100%|██████████| 50/50 [00:20<00:00,  2.48it/s]


Train Loss: 0.0081 | Val Loss: 0.1485 | SpearmanR: 0.5384


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.57it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:20<00:00,  2.45it/s]


Train Loss: 0.0089 | Val Loss: 0.2038 | SpearmanR: 0.4208


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.77it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 10/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances10.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 160 actively selected samples...


[Training]:  64%|██████▍   | 32/50 [00:15<00:08,  2.13it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0137 | Val Loss: 0.1363 | SpearmanR: 0.6084


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.91it/s]



Training and evaluating model using 160 randomly selected samples...


[Training]:  76%|███████▌  | 38/50 [00:18<00:05,  2.01it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0180 | Val Loss: 0.1340 | SpearmanR: 0.6473


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.10it/s]


Progress for experiment 9 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  62%|██████▏   | 31/50 [00:13<00:08,  2.25it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0043 | Val Loss: 0.1579 | SpearmanR: 0.4884


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.99it/s]



Training Model 2...


[Training]:  58%|█████▊    | 29/50 [00:12<00:09,  2.28it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0055 | Val Loss: 0.1576 | SpearmanR: 0.5700


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.74it/s]



Training Model 3...


[Training]:  70%|███████   | 35/50 [00:15<00:06,  2.28it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0063 | Val Loss: 0.1403 | SpearmanR: 0.5553


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.71it/s]



Training Model 4...


[Training]:  82%|████████▏ | 41/50 [00:18<00:04,  2.24it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0037 | Val Loss: 0.1704 | SpearmanR: 0.4902


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.85it/s]



Training Model 5...


[Training]:  72%|███████▏  | 36/50 [00:16<00:06,  2.19it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0101 | Val Loss: 0.1679 | SpearmanR: 0.4753


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.87it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 11/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances11.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 176 actively selected samples...


[Training]:  54%|█████▍    | 27/50 [00:12<00:10,  2.16it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0271 | Val Loss: 0.1405 | SpearmanR: 0.5726


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.35it/s]



Training and evaluating model using 176 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:24<00:00,  2.07it/s]


Train Loss: 0.0135 | Val Loss: 0.1057 | SpearmanR: 0.6835


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 130.54it/s]


Progress for experiment 10 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]: 100%|██████████| 50/50 [00:22<00:00,  2.23it/s]


Train Loss: 0.0022 | Val Loss: 0.1488 | SpearmanR: 0.5658


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.87it/s]



Training Model 2...


[Training]:  58%|█████▊    | 29/50 [00:14<00:10,  2.02it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0130 | Val Loss: 0.1775 | SpearmanR: 0.5058


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.97it/s]



Training Model 3...


[Training]:  70%|███████   | 35/50 [00:16<00:06,  2.15it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0008 | Val Loss: 0.1513 | SpearmanR: 0.5119


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.87it/s]



Training Model 4...


[Training]:  52%|█████▏    | 26/50 [00:12<00:11,  2.16it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0065 | Val Loss: 0.1636 | SpearmanR: 0.4019


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.08it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:21<00:00,  2.33it/s]


Train Loss: 0.0021 | Val Loss: 0.1832 | SpearmanR: 0.4565


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.46it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 12/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances12.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 192 actively selected samples...


[Training]:  86%|████████▌ | 43/50 [00:20<00:03,  2.08it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0076 | Val Loss: 0.1407 | SpearmanR: 0.5907


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 128.95it/s]



Training and evaluating model using 192 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:24<00:00,  2.04it/s]


Train Loss: 0.0615 | Val Loss: 0.1162 | SpearmanR: 0.6272


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.97it/s]


Progress for experiment 11 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  76%|███████▌  | 38/50 [00:19<00:06,  1.94it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0030 | Val Loss: 0.1357 | SpearmanR: 0.5968


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.23it/s]



Training Model 2...


[Training]:  56%|█████▌    | 28/50 [00:13<00:10,  2.07it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0110 | Val Loss: 0.1567 | SpearmanR: 0.5199


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.90it/s]



Training Model 3...


[Training]:  60%|██████    | 30/50 [00:14<00:09,  2.07it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0079 | Val Loss: 0.1640 | SpearmanR: 0.5577


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.86it/s]



Training Model 4...


[Training]:  84%|████████▍ | 42/50 [00:20<00:03,  2.05it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0014 | Val Loss: 0.1713 | SpearmanR: 0.5399


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.84it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:23<00:00,  2.15it/s]


Train Loss: 0.0004 | Val Loss: 0.1663 | SpearmanR: 0.5194


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 16.78it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 13/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances13.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 208 actively selected samples...


[Training]:  68%|██████▊   | 34/50 [00:18<00:08,  1.88it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0417 | Val Loss: 0.1581 | SpearmanR: 0.5420


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.63it/s]



Training and evaluating model using 208 randomly selected samples...


[Training]:  86%|████████▌ | 43/50 [00:23<00:03,  1.86it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0302 | Val Loss: 0.1225 | SpearmanR: 0.6597


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.61it/s]


Progress for experiment 12 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  64%|██████▍   | 32/50 [00:15<00:08,  2.01it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0063 | Val Loss: 0.1709 | SpearmanR: 0.5008


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.10it/s]



Training Model 2...


[Training]:  74%|███████▍  | 37/50 [00:18<00:06,  1.98it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0047 | Val Loss: 0.1396 | SpearmanR: 0.5658


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.11it/s]



Training Model 3...


[Training]:  76%|███████▌  | 38/50 [00:17<00:05,  2.17it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0082 | Val Loss: 0.1493 | SpearmanR: 0.5300


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.26it/s]



Training Model 4...


[Training]:  96%|█████████▌| 48/50 [1:39:17<04:08, 124.11s/it]   


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0018 | Val Loss: 0.1552 | SpearmanR: 0.4971


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.52it/s]



Training Model 5...


[Training]:  78%|███████▊  | 39/50 [00:20<00:05,  1.93it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0025 | Val Loss: 0.1342 | SpearmanR: 0.5895


[Surveying]: 100%|██████████| 24/24 [00:01<00:00, 17.37it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 14/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances14.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 224 actively selected samples...


[Training]:  58%|█████▊    | 29/50 [00:15<00:11,  1.84it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0313 | Val Loss: 0.1365 | SpearmanR: 0.5795


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.00it/s]



Training and evaluating model using 224 randomly selected samples...


[Training]:  86%|████████▌ | 43/50 [00:23<00:03,  1.86it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0223 | Val Loss: 0.1231 | SpearmanR: 0.6848


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.23it/s]


Progress for experiment 13 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  74%|███████▍  | 37/50 [00:18<00:06,  1.96it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0009 | Val Loss: 0.1654 | SpearmanR: 0.4875


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.56it/s]



Training Model 2...


[Training]:  56%|█████▌    | 28/50 [00:14<00:11,  1.91it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0032 | Val Loss: 0.1433 | SpearmanR: 0.5046


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.44it/s]



Training Model 3...


[Training]:  62%|██████▏   | 31/50 [00:15<00:09,  1.99it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0230 | Val Loss: 0.2157 | SpearmanR: 0.5053


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.33it/s]



Training Model 4...


[Training]:  72%|███████▏  | 36/50 [00:17<00:06,  2.03it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0015 | Val Loss: 0.1622 | SpearmanR: 0.5513


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.29it/s]



Training Model 5...


[Training]: 100%|██████████| 50/50 [00:25<00:00,  2.00it/s]


Train Loss: 0.0007 | Val Loss: 0.1570 | SpearmanR: 0.5616


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.46it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 15/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances15.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 240 actively selected samples...


[Training]:  64%|██████▍   | 32/50 [00:18<00:10,  1.76it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0344 | Val Loss: 0.1393 | SpearmanR: 0.6037


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.74it/s]



Training and evaluating model using 240 randomly selected samples...


[Training]:  88%|████████▊ | 44/50 [00:25<00:03,  1.72it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0146 | Val Loss: 0.1166 | SpearmanR: 0.7048


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.01it/s]


Progress for experiment 14 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Starting ensemble training and pool evaluation...

Training Model 1...


[Training]:  78%|███████▊  | 39/50 [00:21<00:05,  1.85it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0017 | Val Loss: 0.1398 | SpearmanR: 0.5669


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.53it/s]



Training Model 2...


[Training]:  76%|███████▌  | 38/50 [00:21<00:06,  1.78it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0034 | Val Loss: 0.1419 | SpearmanR: 0.5822


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.52it/s]



Training Model 3...


[Training]:  78%|███████▊  | 39/50 [00:19<00:05,  1.97it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0030 | Val Loss: 0.1582 | SpearmanR: 0.5366


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.60it/s]



Training Model 4...


[Training]: 100%|██████████| 50/50 [00:25<00:00,  1.97it/s]


Train Loss: 0.0010 | Val Loss: 0.1506 | SpearmanR: 0.5491


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.44it/s]



Training Model 5...


[Training]:  72%|███████▏  | 36/50 [00:19<00:07,  1.84it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0017 | Val Loss: 0.1603 | SpearmanR: 0.4780


[Surveying]: 100%|██████████| 23/23 [00:01<00:00, 16.55it/s]


Ensemble training complete, submitting predictions for next cycle.

Cycle 16/16
-------------------------------------------------
Saving variance distribution to results/04_adding_diversity/diverse_only/variances16.csv...
Save complete.
Selecting new data points...

Training and evaluating model using 256 actively selected samples...


[Training]:  52%|█████▏    | 26/50 [00:14<00:13,  1.77it/s]


Early stopping triggered after 10 epochs with no improvement.
Train Loss: 0.0237 | Val Loss: 0.1317 | SpearmanR: 0.5919


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 127.70it/s]



Training and evaluating model using 256 randomly selected samples...


[Training]: 100%|██████████| 50/50 [00:29<00:00,  1.68it/s]


Train Loss: 0.0142 | Val Loss: 0.1036 | SpearmanR: 0.7047


[Testing]: 100%|██████████| 25/25 [00:00<00:00, 129.88it/s]


Progress for experiment 15 appended to results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv
Experiments complete.


Unnamed: 0,changing_var,local_exp_idx,value,training_method,avg_test_loss,spearmanr,pearsonr,final_mse
0,n_samples,0,16,active,0.211525,0.401156,0.401702,0.211477
1,n_samples,0,16,standard,0.204896,0.114329,0.102567,0.204831
2,n_samples,1,32,active,0.185638,0.395251,0.416651,0.185663
3,n_samples,1,32,standard,0.197912,0.416348,0.353456,0.197689
4,n_samples,2,48,active,0.172172,0.394972,0.418391,0.172755
5,n_samples,2,48,standard,0.160614,0.498919,0.50252,0.160458
6,n_samples,3,64,active,0.154525,0.449333,0.484009,0.154607
7,n_samples,3,64,standard,0.166977,0.538453,0.553851,0.167662
8,n_samples,4,80,active,0.161583,0.443069,0.486969,0.161741
9,n_samples,4,80,standard,0.19255,0.422612,0.397876,0.192571


In [6]:
lc_df = pd.read_csv("results/04_adding_diversity/diverse_only/active_vs_standard_learning_curve.csv")
lc_df.head()

Unnamed: 0,changing_var,local_exp_idx,value,training_method,avg_test_loss,spearmanr,pearsonr,final_mse
0,n_samples,0,16,active,0.211525,0.401156,0.401702,0.211477
1,n_samples,0,16,standard,0.204896,0.114329,0.102567,0.204831
2,n_samples,1,32,active,0.185638,0.395251,0.416651,0.185663
3,n_samples,1,32,standard,0.197912,0.416348,0.353456,0.197689
4,n_samples,2,48,active,0.172172,0.394972,0.418391,0.172755


In [7]:
import plotly.express as px

fig = px.line(lc_df, 'value', 'spearmanr',color="training_method")

fig.update_layout(
    xaxis_title="Number of Training Samples",
    yaxis_title="Spearman Correlation Coefficient"
)

fig.show()

In [10]:
import pandas as pd
import glob
from pathlib import Path

file_pattern = "results/04_adding_diversity/diverse_only/variances*.csv"
variance_files = glob.glob(file_pattern)
variance_files.sort()

all_variances_dfs = []

for filepath in variance_files:
    temp_df = pd.read_csv(filepath)
    column_name = Path(filepath).stem
    temp_df = temp_df.rename(columns={'variance': column_name})
    all_variances_dfs.append(temp_df)

final_df = pd.concat(all_variances_dfs, axis=1)

final_df

Unnamed: 0,variances10,variances11,variances12,variances13,variances14,variances15,variances16,variances2,variances3,variances4,variances5,variances6,variances7,variances8,variances9
0,0.014805,0.028934,0.034272,0.012778,0.003969,0.010099,0.017359,0.023679,0.162896,0.027817,0.015012,0.009623,0.035736,0.065515,0.071914
1,0.005018,0.028932,0.006239,0.016855,0.017220,0.032834,0.037370,0.020564,0.066002,0.022790,0.007513,0.011224,0.119791,0.043675,0.046227
2,0.045094,0.026108,0.011489,0.001493,0.018163,0.016263,0.020630,0.011918,0.069417,0.013939,0.017774,0.008544,0.113017,0.012292,0.073253
3,0.049536,0.014517,0.024133,0.020011,0.068018,0.013868,0.034904,0.031653,0.054439,0.012073,0.011087,0.025785,0.059592,0.019269,0.019120
4,0.096622,0.040313,0.027719,0.007303,0.019241,0.076452,0.008248,0.034136,0.063569,0.039882,0.011267,0.054363,0.181382,0.031879,0.055138
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3147,,,,,,,,0.020339,,,,,,,
3148,,,,,,,,0.005586,,,,,,,
3149,,,,,,,,0.026194,,,,,,,
3150,,,,,,,,0.028586,,,,,,,


In [11]:
columns_to_plot = ['variances2', 'variances5', 'variances8', 'variances11', 'variances14']

# "Melt" the selected columns into a long-format DataFrame
melted_df = final_df[columns_to_plot].melt(
    var_name='Cycle',      # New column for the original column names
    value_name='Variance'  # New column for the variance values
)

fig = px.histogram(
    data_frame=melted_df,
    x='Variance',                                  
    color='Cycle',                                 
    barmode='overlay',                             
    opacity=0.65,                                  
    histnorm='probability density',                
    title='Distribution of Variances Across Active Learning Cycles'
)

fig.show()

Much of the same conclusions here. Even when you are only choosing based on diversity, there's either a reduction or comparable performance compared to a standard run, suggesting a bias away from the most informative samples, rather than a bias toward the most informative samples. As mentioned before, the next step is to investigate the impact of read counts. Perhaps each of these schemes are biasing toward samples that have lower read counts. 

There's two ways I can get at this question that I can think of right now. One is to map the chosen samples back to the read counts and see if there's 1) a correlation between variance and read count, and 2) if there's a final bias in the samples chosen in each of the schemes toward samples with lower read counts. The second approach is probably a bit simpler but doesn't explicitly diagnose the issue in my current context. That is to test the impact of setting ever higher thresholds for read counts and testing how well the model performs with a given, small number of samples.