# 2022 Flatiron Machine Learning x Science Summer School

## Step 6: Train DSN with $L_1$ regularization on latent features using GhostAdam

Based on [SR3](https://arxiv.org/abs/1807.05411)

This approach minimizes the prediction error and the regularization term separately.
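For reference, the relaxed objective from the SR3 paper is shown first. The second line is only an assumed mapping onto this notebook, not something stated by the `ghost` package: $\theta$ stands for the live parameters, $\phi$ for the ghost parameters, $\mathcal{L}$ for the prediction loss, and $\kappa$ presumably plays the role of `ghost_coeff`.

```latex
\min_{x,\,w}\; \tfrac{1}{2}\lVert Ax - b \rVert_2^2 + \lambda R(w) + \tfrac{\kappa}{2}\lVert Cx - w \rVert_2^2

\min_{\theta,\,\phi}\; \mathcal{L}(\theta) + \lambda \lVert \phi \rVert_1 + \tfrac{\kappa}{2}\lVert \theta - \phi \rVert_2^2
```

The data term only sees $x$ (or $\theta$), the regularizer only sees $w$ (or $\phi$), and the quadratic coupling term keeps the two copies close.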

With the DSN, do we still have $a_1$ and $a_2$ combined?

Let's explore this with a simple sweep

In [None]:
from ghost import GhostWrapper, GhostAdam

model = GhostWrapper(model)

# model.live is the normal model,
# model.ghost is the model to use for regularization.

opt = GhostAdam(model, ghost_coeff=1e-4)  # (or, can do `GhostAdam(model.live.parameters(), model.ghost.parameters())`)

pred_y = model(X)
loss = (y - pred_y).pow(2).mean()

#################################################
# Compute any regularization, *using the ghost model*:
# (This example is L1)
regularization = 1e-5 * sum(p.norm(p=1) for p in model.ghost.parameters())
#################################################

loss += regularization

# Can view the ghost loss used inside the optimizer, which indicates
# how far model.live and model.ghost are from each other.
print(loss, opt.ghost_loss)
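To make the live/ghost split concrete, here is a minimal pure-Python sketch of an SR3-style alternating update on a toy linear regression. It is not how `GhostAdam` is implemented: it uses plain gradient descent for the live weights and a closed-form soft-thresholding step for the ghost weights. The names `soft_threshold` and `fit_live_and_ghost`, the toy data, and all coefficients are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |x|: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_live_and_ghost(xs, ys, l1=0.05, coupling=1.0, lr=0.05, steps=500):
    """Fit y ~ w[0]*x0 + w[1]*x1; the L1 penalty acts only on the ghost copy."""
    live = [0.0, 0.0]
    ghost = [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to the live weights.
        grad = [0.0, 0.0]
        for x, y in zip(xs, ys):
            err = live[0] * x[0] + live[1] * x[1] - y
            for j in range(2):
                grad[j] += 2.0 * err * x[j] / n
        # Live update: data gradient plus the pull toward the ghost weights.
        for j in range(2):
            live[j] -= lr * (grad[j] + coupling * (live[j] - ghost[j]))
        # Ghost update: exact minimizer of l1*|g| + coupling/2 * (w - g)**2.
        ghost = [soft_threshold(w, l1 / coupling) for w in live]
    return live, ghost


# Toy data: the second feature carries only a very small true coefficient.
xs = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
ys = [2.0 * x0 + 0.03 * x1 for x0, x1 in xs]
live, ghost = fit_live_and_ghost(xs, ys)
print("live :", live)
print("ghost:", ghost)
```

In this toy setting the live weight of the weak feature stays small but nonzero, while its ghost copy is exactly zero. The gap between the two copies is the quantity that a diagnostic like `opt.ghost_loss` would be expected to reflect.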

Stuff is not working

ghost_loss_backward:

* get_param_differences_and_scales:

    * get param_groups: dict per live/ghost
    
    * betas (Tuple[float, float], optional): coefficients used for computing running averages of the gradient and its square (default: (0.9, 0.999))
    
    * go through parameters
    
    * 
    
    
    
Sanity check: set `requires_grad` to `False`.

Do we make sure that unregularized parameters are not affected by our trick?




We need to make sure that ghost parameters change with live parameters:
    Set differences to 0 and set p.ghost to p.live?
    
Is this term necessary?
all_p.append(p.live.sum() * 0.0 + p.ghost.sum() * 0.0)


Is the hack slowing us down?

ghost L1 activation:
13:28<00:00, 12.36it/s, train_loss=2.09e-06, val_loss=1.20e-03

vs. regular L1 loss:
09:43<00:00, 17.15it/s, train_loss=3.34e-04, val_loss=1.83e-03

vs. no ghost L1 activation:
09:08<00:00, 18.23it/s, train_loss=1.90e-03, val_loss=1.92e-03

vs. ghost L1 activation via 0 loss:
13:12<00:00, 12.63it/s, train_loss=2.09e-06, val_loss=1.20e-03
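The throughput figures above can be turned into relative slowdowns with a few lines of arithmetic. This sketch treats the "no ghost L1 activation" run as the reference, which is an assumption; the dictionary keys are shortened labels for the four runs.

```python
# Iterations per second, copied from the progress bars above.
throughput = {
    "ghost L1": 12.36,
    "regular L1": 17.15,
    "no ghost L1": 18.23,
    "ghost L1 via 0 loss": 12.63,
}

# Time per iteration relative to the fastest run (assumed reference).
baseline = throughput["no ghost L1"]
slowdown = {name: baseline / its for name, its in throughput.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.2f}x the time per iteration of the reference")
```

Both ghost variants take roughly 1.4 to 1.5 times as long per iteration as the reference, while regular L1 costs only a few percent.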

In [None]:
# NOTE: the exp_avg_sq does matter
v_t = state["exp_avg_sq"]  # In Adam paper, this is v_t
hat_v_t = v_t / bias_correction2

combined_hat_v_t = hat_v_t.live + hat_v_t.ghost
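As a reminder of what `exp_avg_sq` and `bias_correction2` stand for, the following scalar sketch reproduces Adam's second-moment estimate and its bias correction. The helper name `bias_corrected_second_moment` and the example gradient values are hypothetical, and summing the live and ghost estimates simply mirrors the line above rather than documenting `GhostAdam` internals.

```python
def bias_corrected_second_moment(grads, beta2=0.999):
    """Return hat_v_t after processing a sequence of scalar gradients."""
    v_t = 0.0               # running average of squared gradients (exp_avg_sq)
    bias_correction2 = 1.0  # placeholder, overwritten on the first step
    for step, g in enumerate(grads, start=1):
        v_t = beta2 * v_t + (1.0 - beta2) * g * g
        bias_correction2 = 1.0 - beta2 ** step
    return v_t / bias_correction2


# With a constant gradient g, the corrected estimate recovers g**2.
live_hat_v = bias_corrected_second_moment([0.5] * 10)
ghost_hat_v = bias_corrected_second_moment([0.1] * 10)
combined_hat_v = live_hat_v + ghost_hat_v
print(live_hat_v, ghost_hat_v, combined_hat_v)
```

Without the division by `bias_correction2`, the estimate after only a few steps would be far too small, because `v_t` starts at zero and `beta2` is close to one.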

## 6.1 L1

In [1]:
%matplotlib widget
%load_ext autoreload
%autoreload 2

import os
import numpy as np
import matplotlib.pyplot as plt
import joblib

import torch
import wandb

from srnet import SRNet, SRData
import srnet_utils as ut

In [2]:
# set wandb project
wandb_project = "61-l1-gc-study-F00"

In [3]:
# hyperparams = {
#     "arch": {
#         "in_size": train_data.in_data.shape[1],
#         "out_size": train_data.target_data.shape[1],
#         "hid_num": (2,0),
#         "hid_size": 32, 
#         "hid_type": ("DSN", "MLP"),
#         "lat_size": 16,
#         },
#     "epochs": 10000,
#     "runtime": None,
#     "batch_size": 64,
#     "lr": 1e-4,
#     "wd": 1e-4,
#     "l1": 0.0,
#     "a1": 0.0,
#     "a2": 0.0,
#     "gc": 0.0,
#     "shuffle": True,
# }

In [4]:
# define hyperparameter study
hp_study = {
    "method": "grid",  # alternatives: "random", "bayes"
    #"metric": {
    #    "name": "val_loss",
    #    "goal": "minimize",
    #},
    "parameters": {
        "l1": {
            "values": [1e-4, 1e-3, 1e-2]
        },
        "gc": {
            "values": [1e-5, 1e-4, 1e-3]
        }
    }
}
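A grid sweep runs every combination of the listed values. W&B does this enumeration itself; the sketch below only reproduces it locally to check how many runs the study will launch. The helper name `expand_grid` and the variable `grid_parameters` are hypothetical.

```python
from itertools import product


def expand_grid(parameters):
    """List every combination of the "values" entries in a sweep config."""
    names = list(parameters)
    value_lists = [parameters[name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Same parameter block as in the study definition above.
grid_parameters = {
    "l1": {"values": [1e-4, 1e-3, 1e-2]},
    "gc": {"values": [1e-5, 1e-4, 1e-3]},
}

configs = expand_grid(grid_parameters)
print(len(configs))
for config in configs:
    print(config)
```

Three values for `l1` and three for `gc` give nine runs in total.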

In [5]:
# create sweep
sweep_id = wandb.sweep(hp_study, project=wandb_project)

Create sweep with ID: ozbk555b
Sweep URL: https://wandb.ai/fabxy/61-l1-gc-study-F00/sweeps/ozbk555b


In [5]:
# download data from wandb
file_ext = ".pkl"

api = wandb.Api()

runs = api.runs(wandb_project)
for run in runs:
    for f in run.files():
        if f.name.endswith(file_ext) and not os.path.isfile(f.name):
            print(f"Downloading {os.path.basename(f.name)}.")
            run.file(f.name).download()

Downloading srnet_model_F00_a1_1e-05_a2_1e-03_max.pkl.
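The downloaded file name appears to encode the hyperparameters of the run. Assuming the `_<name>_<value>` pattern seen in the single file name above holds in general, a small parser can recover them. The helper `parse_hyperparams` and the set of recognized keys are hypothetical.

```python
import re

# Matches fragments such as "_a1_1e-05"; the key set is an assumption.
HP_PATTERN = re.compile(r"_(a1|a2|l1|gc)_([0-9.]+(?:e[+-]?[0-9]+)?)")


def parse_hyperparams(file_name):
    """Extract hyperparameter values encoded in a model file name."""
    return {key: float(value) for key, value in HP_PATTERN.findall(file_name)}


params = parse_hyperparams("srnet_model_F00_a1_1e-05_a2_1e-03_max.pkl")
print(params)
```

This makes it possible to group downloaded models by their sweep settings without querying the W&B API again.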


## 6.2 DSN

In [6]:
# set wandb project
wandb_project = "62-a1-a2-gc-study-F00"

In [7]:
# define hyperparameters
# hyperparams = {
#     "arch": {
#         "in_size": train_data.in_data.shape[1],
#         "out_size": train_data.target_data.shape[1],
#         "hid_num": (2,0),
#         "hid_size": 32, 
#         "hid_type": ("DSN", "MLP"),
#         "lat_size": 16,
#         },
#     "epochs": 10000,
#     "runtime": None,
#     "batch_size": 64,
#     "lr": 1e-4,
#     "wd": 1e-4,
#     "l1": 0.0,
#     "a1": 0.0,
#     "a2": 0.0,
#     "gc": 0.0,
#     "shuffle": True,
# }

In [8]:
# define hyperparameter study
hp_study = {
    "method": "grid",  # alternatives: "random", "bayes"
    #"metric": {
    #    "name": "val_loss",
    #    "goal": "minimize",
    #},
    "parameters": {
        "a1": {
            "values": [1e-5, 1e-3]
        },
        "a2": {
            "values": [1e-5, 1e-3]
        },
        "gc": {
            "values": [1e-5, 1e-4, 1e-3]
        }
    }
}

In [9]:
# create sweep
sweep_id = wandb.sweep(hp_study, project=wandb_project)

Create sweep with ID: 634o4yk1
Sweep URL: https://wandb.ai/fabxy/62-a1-a2-gc-study-F00/sweeps/634o4yk1


In [5]:
# download data from wandb
file_ext = ".pkl"

api = wandb.Api()

runs = api.runs(wandb_project)
for run in runs:
    for f in run.files():
        if f.name.endswith(file_ext) and not os.path.isfile(f.name):
            print(f"Downloading {os.path.basename(f.name)}.")
            run.file(f.name).download()

Downloading srnet_model_F00_a1_1e-05_a2_1e-03_max.pkl.
