# Train Low Test Accuracy Models

This streamlined notebook generates minima with low test accuracy in three ways:
- Dataset Poisoning
- Adding Noise to Data
- Decreasing Dataset Sizes
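
The three approaches listed above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the code in `minima_volume.dataset_funcs`: the helper names `poison_labels`, `add_gaussian_noise`, and `subsample`, the choice of uniformly random label replacement, and the Gaussian noise model are all hypothetical.

```python
import torch


def poison_labels(y, fraction, num_classes, generator):
    """Overwrite a random fraction of labels with uniformly random classes.

    A redrawn label can coincide with the original, so the number of
    labels that actually change is at most fraction * len(y).
    """
    y = y.clone()
    n_poison = int(fraction * len(y))
    idx = torch.randperm(len(y), generator=generator)[:n_poison]
    y[idx] = torch.randint(0, num_classes, (n_poison,), generator=generator)
    return y


def add_gaussian_noise(x, std, generator):
    """Add zero-mean Gaussian noise with standard deviation `std`."""
    return x + std * torch.randn(x.shape, generator=generator)


def subsample(x, y, n, generator):
    """Keep a random subset of n samples."""
    idx = torch.randperm(len(x), generator=generator)[:n]
    return x[idx], y[idx]


# Synthetic stand-in for CIFAR-10-shaped data (100 images, 10 classes).
g = torch.Generator().manual_seed(0)
x = torch.rand(100, 3, 32, 32, generator=g)
y = torch.randint(0, 10, (100,), generator=g)

y_poisoned = poison_labels(y, fraction=0.5, num_classes=10, generator=g)
x_noisy = add_gaussian_noise(x, std=0.1, generator=g)
x_small, y_small = subsample(x, y, n=20, generator=g)
```

Each helper takes an explicit `torch.Generator`, so a degraded dataset can be reproduced from a seed.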

## Imports

In [1]:
# Standard library
import copy
import os
import sys
import time

# Third-party
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# Local package imports
from minima_volume.dataset_funcs import (
    prepare_datasets,
    save_dataset,
    save_model,
)
from minima_volume.train_funcs import evaluate, train

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Input Parameters

In [2]:

# ==============================
# Base Input Parameters
# ==============================
# --- Seeds ---
data_seed = 11   # passed to prepare_datasets
model_seed = 1   # passed to get_model

# --- Training configuration ---
epochs = 200     # epochs per trained model

# --- Dataset configuration ---
base_data_size = 50    # samples in the base training set
dataset_type = "data"  # passed to prepare_datasets
# Additional samples added on top of the base set, one model per entry
dataset_quantities = [0, 500 - 50, 2000 - 50, 5000 - 50, 20000 - 50, 50000 - 50]

# --- Output configuration ---
base_output_dir = ""
save_generated_dataset = True
save_generated_models = True
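
The entries of `dataset_quantities` are written as `N - 50` so that, once the base set is included, each run trains on a round total. A short check of that arithmetic, assuming each entry counts samples added on top of `base_data_size` (which matches the "additional samples" wording in the training logs):

```python
base_data_size = 50
dataset_quantities = [0, 500 - 50, 2000 - 50, 5000 - 50, 20000 - 50, 50000 - 50]

# Total training-set size for each run: base samples plus additional samples.
total_sizes = [base_data_size + q for q in dataset_quantities]
print(total_sizes)  # [50, 500, 2000, 5000, 20000, 50000]
```

The largest run therefore uses 50,000 samples, which is the full CIFAR-10 training set.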


## Model + Dataset Specific Code

This cell holds the code specific to the model and dataset: here, a CNN trained on CIFAR-10.

In [3]:
# Model module for this experiment (swap this import to use another model/dataset)
from minima_volume.models import CIFAR10_CNN_model_data as model_module

# Load the dataset
x_base, y_base, x_test, y_test = model_module.get_dataset(
    device=device
)

# CIFAR-10 CNN initialization parameters (adjust as desired)
conv_channels = [32, 64, 128]
fc_dims = [512, 256]

# Build the model template from the CNN-specific parameters
model_template = model_module.get_model(
    conv_channels=conv_channels,
    fc_dims=fc_dims,
    device=device,
    seed=model_seed,
)

# Loss function and additional metrics provided by the model module
loss_fn = model_module.get_loss_fn()
other_metrics = model_module.get_additional_metrics()
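
To make the roles of `conv_channels` and `fc_dims` concrete, here is a minimal sketch of how such lists might parameterize a CNN. The class `SketchCNN`, its 3x3 convolutions, and the max-pooling after every conv layer are assumptions for illustration; the actual architecture behind `model_module.get_model` is not shown in this notebook and may differ.

```python
import torch
import torch.nn as nn


class SketchCNN(nn.Module):
    """Hypothetical CNN: one conv block per entry in conv_channels,
    then one fully connected layer per entry in fc_dims."""

    def __init__(self, conv_channels, fc_dims, in_channels=3,
                 image_size=32, num_classes=10):
        super().__init__()
        conv_layers = []
        prev, size = in_channels, image_size
        for ch in conv_channels:
            conv_layers += [
                nn.Conv2d(prev, ch, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),  # halves the spatial resolution
            ]
            prev, size = ch, size // 2
        self.features = nn.Sequential(*conv_layers)

        fc_layers = []
        prev = prev * size * size  # flattened feature count
        for dim in fc_dims:
            fc_layers += [nn.Linear(prev, dim), nn.ReLU()]
            prev = dim
        fc_layers.append(nn.Linear(prev, num_classes))
        self.classifier = nn.Sequential(*fc_layers)

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), start_dim=1))


torch.manual_seed(1)  # fixes the initialization, in the spirit of model_seed
sketch_model = SketchCNN(conv_channels=[32, 64, 128], fc_dims=[512, 256])
logits = sketch_model(torch.zeros(4, 3, 32, 32))  # batch of 4 CIFAR-sized images
```

With three pooling stages a 32x32 input shrinks to 4x4, so the first fully connected layer in this sketch sees 128 * 4 * 4 = 2048 features.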



## Training

This section first builds the datasets, then trains one model per entry in `dataset_quantities`, recording the losses and the data each model was trained on.

In [4]:
# ==============================
# Prepare datasets
# ==============================
x_base_train, y_base_train, x_additional, y_additional = prepare_datasets(
    x_base=x_base,
    y_base=y_base,
    dataset_type=dataset_type,
    dataset_quantities=dataset_quantities,
    base_data_size=base_data_size,
    data_seed=data_seed,
    seed_1=None,
    seed_2=None,
)

x_base_train = x_base_train.to(device)
y_base_train = y_base_train.to(device)
x_additional = x_additional.to(device)
y_additional = y_additional.to(device)
x_test = x_test.to(device)
y_test = y_test.to(device)
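
The logs below report one training run per entry in `dataset_quantities`. A minimal sketch of such a loop follows, assuming each run trains a fresh copy of the model template on the base set plus the first `n` additional samples. The helper `run_one`, the Adam optimizer, the full-batch updates, and the small synthetic tensors are illustrative assumptions; this is not the `train`/`evaluate` API from `minima_volume.train_funcs`, whose signatures are not shown here.

```python
import copy

import torch
import torch.nn as nn


def run_one(model, x_train, y_train, x_test, y_test, loss_fn, epochs, lr=1e-2):
    """Train with full-batch Adam, then return final train loss and test accuracy."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(x_train), y_train)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        train_loss = loss_fn(model(x_train), y_train).item()
        test_acc = (model(x_test).argmax(dim=1) == y_test).float().mean().item()
    return train_loss, test_acc


torch.manual_seed(0)
# Synthetic stand-ins for the base, additional, and test splits (3 classes).
x_base_demo, y_base_demo = torch.randn(8, 20), torch.randint(0, 3, (8,))
x_extra_demo, y_extra_demo = torch.randn(64, 20), torch.randint(0, 3, (64,))
x_test_demo, y_test_demo = torch.randn(32, 20), torch.randint(0, 3, (32,))

template = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3))
loss_fn_demo = nn.CrossEntropyLoss()

results = []
for n_extra in [0, 16, 64]:
    model = copy.deepcopy(template)  # every run starts from the same initialization
    x_train = torch.cat([x_base_demo, x_extra_demo[:n_extra]])
    y_train = torch.cat([y_base_demo, y_extra_demo[:n_extra]])
    train_loss, test_acc = run_one(
        model, x_train, y_train, x_test_demo, y_test_demo, loss_fn_demo, epochs=20
    )
    results.append(
        {"additional": n_extra, "train_loss": train_loss, "test_acc": test_acc}
    )
    print(f"Completed training with {n_extra} additional samples of data")
```

Copying the template with `copy.deepcopy` keeps the initial weights identical across runs, so differences between the resulting minima come from the training data rather than from the initialization.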

Epoch 50/200: Train Loss 0.0008 | Test Loss 11.5330 | accs Train 1.0000 Test 0.1771
Epoch 100/200: Train Loss 0.0000 | Test Loss 13.2112 | accs Train 1.0000 Test 0.1780
Epoch 150/200: Train Loss 0.0000 | Test Loss 13.3234 | accs Train 1.0000 Test 0.1798
Epoch 200/200: Train Loss 0.0000 | Test Loss 13.3981 | accs Train 1.0000 Test 0.1786
Completed training with 0 additional samples of data

Epoch 1/200: Train Loss 2.3020 | Test Loss 2.2952 | accs Train 0.0860 Test 0.1018
Epoch 50/200: Train Loss 0.9710 | Test Loss 2.0526 | accs Train 0.6500 Test 0.3573
Epoch 100/200: Train Loss 0.0772 | Test Loss 3.3093 | accs Train 0.9940 Test 0.3733
Epoch 150/200: Train Loss 0.0042 | Test Loss 4.8513 | accs Train 1.0000 Test 0.3695
Epoch 200/200: Train Loss 0.0018 | Test Loss 5.3104 | accs Train 1.0000 Test 0.3667
Completed training with 450 additional samples of data

Epoch 1/200: Train Loss 2.2981 | Test Loss 2.2759 | accs Train 0.0850 Test 0.1243
Epoch 50/200: Train Loss 0.9983 | Test Loss 1.6517 | accs Train 0.6335 Test 0.4438
Epoch 100/200: Train Loss 0.0710 | Test Loss 2.6197 | accs Train 0.9895 Test 0.4711
Epoch 150/200: Train Loss 0.0048 | Test Loss 3.5711 | accs Train 1.0000 Test 0.4911
Epoch 200/200: Train Loss 0.0020 | Test Loss 3.9787 | accs Train 1.0000 Test 0.4912
Completed training with 1950 additional samples of data

Epoch 1/200: Train Loss 2.2707 | Test Loss 2.1512 | accs Train 0.1280 Test 0.2174
Epoch 50/200: Train Loss 0.4201 | Test Loss 1.5876 | accs Train 0.8590 Test 0.5527
Epoch 100/200: Train Loss 0.0044 | Test Loss 2.9040 | accs Train 1.0000 Test 0.5749
Epoch 150/200: Train Loss 0.0011 | Test Loss 3.4204 | accs Train 1.0000 Test 0.5729
Epoch 200/200: Train Loss 0.0005 | Test Loss 3.7057 | accs Train 1.0000 Test 0.5731
Completed training with 4950 additional samples of data

Epoch 1/200: Train Loss 2.0622 | Test Loss 1.8130 | accs Train 0.2411 Test 0.3413
Epoch 50/200: Train Loss 0.0033 | Test Loss 2.0971 | accs Train 1.0000 Test 0.6835
Epoch 100/200: Train Loss 0.0003 | Test Loss 2.7634 | accs Train 1.0000 Test 0.6799
Epoch 150/200: Train Loss 0.0001 | Test Loss 3.1053 | accs Train 1.0000 Test 0.6805
Epoch 200/200: Train Loss 0.0000 | Test Loss 3.3576 | accs Train 1.0000 Test 0.6808
Completed training with 19950 additional samples of data

Epoch 1/200: Train Loss 1.8049 | Test Loss 1.5424 | accs Train 0.3425 Test 0.4472
Epoch 50/200: Train Loss 0.0005 | Test Loss 1.8694 | accs Train 1.0000 Test 0.7610
Epoch 100/200: Train Loss 0.0001 | Test Loss 2.2149 | accs Train 1.0000 Test 0.7588
Epoch 150/200: Train Loss 0.0000 | Test Loss 2.5994 | accs Train 1.0000 Test 0.7571
Epoch 200/200: Train Loss 0.0000 | Test Loss 2.9431 | accs Train 1.0000 Test 0.7546
Completed training with 49950 additional samples of data


## Training Summary
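
One way to tabulate the final metrics of each run is to parse the printed training logs. The sketch below assumes the log line format shown in the training output; the helper `summarize` and the regular expressions are hypothetical and not part of `minima_volume`. The sample text reuses two final-epoch lines from the logs above.

```python
import re

# Matches lines such as:
# "Epoch 200/200: Train Loss 0.0018 | Test Loss 5.3104 | accs Train 1.0000 Test 0.3667"
EPOCH_PATTERN = re.compile(
    r"Epoch (\d+)/(\d+): Train Loss ([\d.]+) \| Test Loss ([\d.]+) "
    r"\| accs Train ([\d.]+) Test ([\d.]+)"
)
DONE_PATTERN = re.compile(r"Completed training with (\d+) additional samples")


def summarize(log_text):
    """Return one row per run, built from the last epoch line before each
    'Completed training' line."""
    rows, last_epoch = [], None
    for line in log_text.splitlines():
        epoch_match = EPOCH_PATTERN.search(line)
        if epoch_match:
            last_epoch = epoch_match
            continue
        done_match = DONE_PATTERN.search(line)
        if done_match and last_epoch is not None:
            rows.append({
                "additional": int(done_match.group(1)),
                "train_loss": float(last_epoch.group(3)),
                "test_loss": float(last_epoch.group(4)),
                "train_acc": float(last_epoch.group(5)),
                "test_acc": float(last_epoch.group(6)),
            })
            last_epoch = None
    return rows


sample_log = """Epoch 200/200: Train Loss 0.0000 | Test Loss 13.3981 | accs Train 1.0000 Test 0.1786
Completed training with 0 additional samples of data
Epoch 200/200: Train Loss 0.0018 | Test Loss 5.3104 | accs Train 1.0000 Test 0.3667
Completed training with 450 additional samples of data"""

summary_rows = summarize(sample_log)
for row in summary_rows:
    print(row)
```

In these logs every run reaches a training accuracy of 1.0000, while the final test accuracy rises from 0.1786 with no additional samples to 0.7546 with 49,950.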

## Model Saving

In [5]:
# ====================================
# Save Datasets and Models
# ====================================
output_folder = "models_and_data"
# Save the dataset (optional; controlled by save_generated_dataset)
if save_generated_dataset:
    save_dataset(
        folder=output_folder,
        filename="dataset.pt",
        x_base_train=x_base_train,
        y_base_train=y_base_train,
        x_additional=x_additional,
        y_additional=y_additional,
        x_test=x_test,
        y_test=y_test,
        dataset_quantities=dataset_quantities,
        dataset_type=dataset_type,
    )
    print(f"Saved dataset to {output_folder}/dataset.pt")

✅ Dataset saved to models_and_data\dataset.pt
Saved dataset to models_and_data/dataset.pt
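
Since `save_generated_models` is enabled, the trained models would be written alongside the dataset. A minimal sketch of one way to do this with `torch.save` follows. The helper `save_state_dicts` and the file naming scheme are hypothetical; this is not the `save_model` function imported from `minima_volume.dataset_funcs`, whose signature is not shown here.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_state_dicts(models_by_quantity, folder):
    """Save one state_dict per model, keyed by its number of additional samples."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for n_extra, model in models_by_quantity.items():
        path = os.path.join(folder, f"model_additional_{n_extra}.pt")
        torch.save(model.state_dict(), path)
        paths.append(path)
    return paths


# Demo with small stand-in models, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_models = {0: nn.Linear(4, 2), 450: nn.Linear(4, 2)}
    saved_paths = save_state_dicts(demo_models, tmp_dir)
    saved_names = [os.path.basename(p) for p in saved_paths]

    # Reload the first checkpoint into a fresh model to confirm the round trip.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(saved_paths[0]))
    round_trip_ok = torch.equal(restored.weight, demo_models[0].weight)
```

Saving the `state_dict` rather than the whole module keeps the checkpoint independent of the class definition's import path, at the cost of having to rebuild the model with the same `conv_channels` and `fc_dims` before loading.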
