# Inspecting final linear projection

I'd like to get a better sense of "how much of the space" each linear projection accesses when mapping to each of the two outputs. Essentially, I'd like the equivalent of counting how many neurons are used for each task, but in a more general form that also covers rotated output spaces. I'm not sure this is even a sensible question to ask, but I'd still like to look.

I'd expect:
- both projections to access a similar "amount" of space
- the space "left" after accounting for the two projection vectors to tell me more about any remaining redundancy in the network
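
As a rough illustration of what "amount of space" could mean for a single projection vector, here is a minimal sketch. It uses a random NumPy stand-in for one `(2, 16)` final-layer weight matrix rather than the trained weights, and the helpers `participation_ratio` and `cosine_similarity` are hypothetical names, not part of the notebook. The participation ratio counts neurons in the standard basis only, so it does not answer the rotated version of the question by itself.

```python
import numpy as np


def participation_ratio(w):
    """Effective number of entries of w carrying weight (between 1 and len(w))."""
    w2 = np.square(w)
    return float(w2.sum() ** 2 / np.square(w2).sum())


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Assumption: a random stand-in for one (2, 16) final-layer weight matrix
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))

for i, w in enumerate(W):
    print(f"output {i}: norm={np.linalg.norm(w):.3f}, PR={participation_ratio(w):.2f}")
print(f"cosine between the two projections: {cosine_similarity(W[0], W[1]):.3f}")
```

A participation ratio near 16 would mean the weight is spread over all hidden neurons, a value near 1 that a single neuron dominates. The cosine indicates how much the two projections overlap in direction.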

In [1]:
import torch
from torch import nn
from torch.optim import Adam
from matplotlib import pyplot as plt
from matplotlib import cm
import pandas as pd
from tqdm.notebook import tqdm
import numpy as np
from pathlib import Path
import pickle

from physics_mi.utils import set_all_seeds

RESULTS = Path("results")

In [2]:
seed = np.random.randint(1, 2**32 - 1)
# seed = 689334534
set_all_seeds(seed)
print(seed)

2542658879


## Model

In [3]:
class LinearLayer(nn.Module):
    def __init__(self, in_feats, out_feats, use_act=True, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.linear = nn.Linear(in_feats, out_feats)
        if use_act:
            self.act = nn.ReLU()
        self.use_act = use_act

    def forward(self, x):
        x = self.linear(x)
        if self.use_act:
            x = self.act(x)
        return x


class Net(nn.Module):
    def __init__(self, input_dim=4, hidden_dim=16, output_dim=2, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.layers = nn.Sequential(
            LinearLayer(input_dim, hidden_dim, use_act=True),
            LinearLayer(hidden_dim, output_dim, use_act=False),
        )

    def forward(self, x):
        return self.layers(x)

## Results

In [4]:
fps = list(RESULTS.glob("*.pkl"))

rows = []
for fp in tqdm(fps):
    with open(fp, "rb") as f:
        data = pickle.load(f)
    rows.append(data)

df = pd.DataFrame(rows)

df["valid_loss"] = df["valid_loss"].map(float)

In [5]:
df.head()

| | valid_loss | outputs | model | seed |
|---|---|---|---|---|
| 0 | 0.0004 | {'layers.0.linear.weight': [[tensor(0.), tenso... | {'layers.0.linear.weight': [[tensor(-0.1858), ... | 1080586112 |
| 1 | 0.000664 | {'layers.0.linear.weight': [[tensor(0.), tenso... | {'layers.0.linear.weight': [[tensor(0.0037), t... | 825201060 |
| 2 | 0.000276 | {'layers.0.linear.weight': [[tensor(0.), tenso... | {'layers.0.linear.weight': [[tensor(-0.4611), ... | 500382378 |
| 3 | 0.000242 | {'layers.0.linear.weight': [[tensor(0.), tenso... | {'layers.0.linear.weight': [[tensor(-0.2236), ... | 340316710 |
| 4 | 0.000163 | {'layers.0.linear.weight': [[tensor(0.), tenso... | {'layers.0.linear.weight': [[tensor(0.2780), t... | 337361766 |


## Analysis

In [17]:
lws = [row["model"]["layers.1.linear.weight"] for _, row in df.iterrows()]
lws = torch.stack(lws)
lws.shape

torch.Size([142, 2, 16])

In [131]:
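# SVD of each transposed final-layer weight matrix, shape (16, 2)
svds = [torch.svd(lw.T) for lw in lws]
# singular values per model, shape (142, 2)
svs = torch.stack([svd.S for svd in svds])
# left singular vectors: an orthonormal basis for the 2D subspace of hidden space read by the outputs
basis_vectors = torch.stack([svd.U[:, :2] for svd in svds])
svs[:10], basis_vectors[:1]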

(tensor([[1.2728, 1.0186],
         [1.3493, 1.2235],
         [1.1754, 1.0827],
         [1.2027, 0.9973],
         [1.3525, 1.1315],
         [1.2881, 1.1579],
         [1.2033, 1.0430],
         [1.3475, 1.0578],
         [1.3293, 1.0682],
         [1.3438, 1.2283]]),
 tensor([[[-1.1063e-01,  2.3505e-01],
          [-7.6833e-02, -5.0185e-01],
          [-6.9651e-02, -3.0779e-01],
          [ 7.0652e-02, -3.9195e-02],
          [ 2.2668e-02, -9.5302e-02],
          [-7.3147e-02, -1.6435e-01],
          [-4.3449e-01, -3.8562e-01],
          [ 1.7947e-01,  4.3112e-02],
          [-5.3740e-02, -4.9188e-05],
          [ 5.1346e-01, -1.3242e-01],
          [ 7.2959e-02,  4.6588e-01],
          [ 4.8564e-01, -6.9546e-02],
          [ 2.6752e-01, -3.9593e-01],
          [-3.9039e-01,  1.0043e-01],
          [ 1.1560e-01, -2.4645e-02],
          [-1.1111e-02,  5.5729e-02]]]))
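
The two basis vectors span the only part of the 16-dimensional hidden space that the final layer can read, which leaves a 14-dimensional orthogonal complement. One way to put a number on how much is "left" is to measure what fraction of the hidden activations' squared norm lies inside that 2D subspace. The sketch below does this in NumPy under loud assumptions: `readout_fraction` is a hypothetical helper, and both the weights `W` and the activations `H` are random stand-ins, not values from the trained models.

```python
import numpy as np


def readout_fraction(H, W):
    """Fraction of the squared norm of the rows of H lying in the row space of W."""
    # Orthonormal basis for the row space of W, shape (hidden_dim, rank)
    U, S, _ = np.linalg.svd(W.T, full_matrices=False)
    U = U[:, S > 1e-10]  # drop directions with (numerically) zero singular value
    coords = H @ U  # coordinates of each activation in the readout subspace
    return float(np.square(coords).sum() / np.square(H).sum())


# Assumption: random stand-ins for a (2, 16) weight matrix and 1000 hidden activations
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))
H = np.maximum(rng.normal(size=(1000, 16)), 0.0)  # ReLU-like, non-negative activations

print(f"fraction of activation norm read by the outputs: {readout_fraction(H, W):.3f}")
```

A fraction close to 1 would mean the hidden layer spends almost all of its activity in directions the outputs read. A small fraction would point to activity that the final layer ignores.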

In [99]:
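# Baseline: singular values of freshly initialised (untrained) final layers, one per trained model
init_svs = torch.stack([torch.svd(Net().state_dict()["layers.1.linear.weight"]).S for _ in lws])
init_svs[:10]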

tensor([[0.7760, 0.5603],
        [0.6498, 0.4818],
        [0.5857, 0.4428],
        [0.6485, 0.4690],
        [0.6013, 0.4003],
        [0.6506, 0.5790],
        [0.7648, 0.4794],
        [0.6458, 0.5749],
        [0.6447, 0.4706],
        [0.5500, 0.5343]])

I'm not sure how much more I can tell from this, beyond the fact that the singular values are higher than those of a randomly initialised weight matrix and are generally quite similar across the two dimensions. That suggests the relevant features are being scaled roughly equally (as one might expect from independent tasks) and are useful for the task. I realise I don't know whether this is telling me much.
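
One way to make the singular values more interpretable: for a matrix with two rows $w_1$ and $w_2$, the Gram matrix $WW^\top$ gives $s_1^2 + s_2^2 = \|w_1\|^2 + \|w_2\|^2$ and $s_1 s_2 = \|w_1\| \|w_2\| |\sin\theta|$, where $\theta$ is the angle between the rows. The two singular values are therefore exactly equal only when the two projection vectors are orthogonal and have equal norm. The sketch below checks these identities numerically in NumPy on a random stand-in matrix, not the trained weights, and `row_geometry` is a hypothetical helper name.

```python
import numpy as np


def row_geometry(W):
    """Return singular values, row norms and |sin| of the angle between the two rows of W."""
    s = np.linalg.svd(W, compute_uv=False)
    n1, n2 = np.linalg.norm(W, axis=1)
    cos = (W[0] @ W[1]) / (n1 * n2)
    sin = np.sqrt(max(0.0, 1.0 - cos**2))  # clamp guards against rounding slightly above 1
    return s, (n1, n2), sin


# Assumption: a random stand-in for one (2, 16) final-layer weight matrix
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))
s, (n1, n2), sin = row_geometry(W)

# Sum of squared singular values equals sum of squared row norms
print(np.isclose((s**2).sum(), n1**2 + n2**2))
# Product of singular values equals product of row norms times |sin(theta)|
print(np.isclose(s.prod(), n1 * n2 * sin))
```

Applied to the trained weights, comparing the row norms and the angle directly would show whether the similar singular values come from near-orthogonal projections of similar size.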