In [226]:
import torch
import torchvision
from torch import nn
from torch.utils import data
from torchvision import transforms
import numpy as np

## CNNs

ConvNets exploit the natural structure of images: by aggregating local information they reduce the data to a smaller latent space, without the computationally intractable cost of relating every pixel to every other pixel.

**Spatial Invariance** - sweep across the image patch by patch, learn a representation of each subpatch, then aggregate these into a representation of the whole image (an aggregation of subspaces).


### 2 Major Criteria That Make This Suitable for Computer Vision:


**Translation Invariance** - Each region should initially be treated indiscriminately (we attempt to learn the same thing about every part of the image to begin with).

**Locality** - We focus on subregions of the image to begin with, without concern for how these subregions will affect other distant regions until later layers when the information is aggregated.

### Criteria Application in ConvNets

In a ConvNet, we take a weighted function of the pixels within a locality and place the result in the corresponding position of the output. For example, the aggregate of a corner region stays in the corner of the output, which preserves the spatial layout of the image.

Translation invariance is maintained by keeping the weight filter constant: no matter where we are in the image, we ask the same "question", i.e. we learn the same thing about each part of the image initially.
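One way to see the effect of a constant filter is to shift the input and check that the response shifts with it (strictly speaking, this property is usually called translation equivariance). The sketch below is an illustration rather than part of the notebook's model: it assumes PyTorch's `torch.nn.functional.conv2d`, and the 8x8 image, the random 3x3 kernel, and all variable names are made up for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical 8x8 image with a single bright pixel at (2, 2)
image = torch.zeros(1, 1, 8, 8)
image[0, 0, 2, 2] = 1.0
kernel = torch.randn(1, 1, 3, 3)  # one shared filter

# Move the bright pixel by 2 rows and 3 columns, to (4, 5)
shifted = torch.roll(image, shifts=(2, 3), dims=(2, 3))

out = F.conv2d(image, kernel)            # shape (1, 1, 6, 6)
out_shifted = F.conv2d(shifted, kernel)

# The same filter gives the same response pattern, moved by the same shift
shift_matches = torch.allclose(
    torch.roll(out, shifts=(2, 3), dims=(2, 3)), out_shifted
)
print(shift_matches)
```

Because the filter weights do not depend on position, the response to the bright pixel is identical wherever the pixel sits; only its location in the output changes.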

Locality is maintained by using a convolutional filter that is smaller than the image, so that information is aggregated over local subregions. If v is the **kernel** / **filter**, then v is 0 for every offset beyond some cutoff i: far-away features receive no weight at all, because they do not affect our understanding of the local subimage. This is an abstraction of the kernel seen in Kernel Density Estimation (KDE), a weighting function that values nearby points / features by their distance; a neural kernel instead values them by their placement within the window, and it is cut off just as the KDE window is capped by its bandwidth lambda.

Locality allows us to shrink our filters below the size of the image and learn local filters. Translation invariance allows us to apply one such filter uniformly and indiscriminately across the image. Together, these mean we can learn a homogeneous feature about each locality, and by aggregating the answers to such homogeneous questions we can learn a valuable latent representation that can be used to classify an image. This methodology is also extremely parameter efficient: we only need to learn one weighting function per layer, which reduces the combinatorial parameterization of our models to a few hundred weights.
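To put rough numbers on the parameter saving, the sketch below compares a fully connected mapping with a single shared filter. The helper names are hypothetical, and the sizes (a 28x28 input, a 14x14 output, a 15x15 filter) are assumptions chosen to match the first layer built later in this notebook; biases are ignored.

```python
def dense_weight_count(in_h: int, in_w: int, out_h: int, out_w: int) -> int:
    """Weights needed when every output unit is connected to every input pixel."""
    return in_h * in_w * out_h * out_w


def shared_filter_weight_count(k_h: int, k_w: int) -> int:
    """Weights needed when one filter is reused at every position."""
    return k_h * k_w


dense = dense_weight_count(28, 28, 14, 14)   # 153664
shared = shared_filter_weight_count(15, 15)  # 225
print(dense, shared)
```

Both layers map a 28x28 image to a 14x14 output, but the shared filter does so with a few hundred weights instead of over a hundred thousand.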

### Why is it called a Convolution?

The convolution of two functions f and g is $(f * g)(z) = \int f(x)\, g(z - x)\, dx$. In probability, this is how the distribution of a sum of two independent variables is obtained: the probability of the total z is accumulated over every split a = x and b = z - x, across all x. In the discrete sense this is equivalent to a sliding window across x. The ConvNet operation does not itself compute a joint probability, but it weighs possibilities in the same convolutional fashion. In 2 dimensions, summing x(a,b) * y(z-a, w-b) over all (a,b) slides a window across the entirety of the [0,z]x[0,w] possibility space, attempting all combinations discretely. That is a rough discrete mapping to a convolution, where we aggregate all joint possibilities to understand the probability of an outcome of two variables. Each term uses one function to scale the other (an AND clause between the two events), just as is done in the multiplication inside a convolution.
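The probabilistic reading can be checked on a small example: the distribution of the total of two fair dice is the discrete convolution of the two single-die distributions. This is a minimal sketch assuming NumPy's `np.convolve`; the dice setup and variable names are illustrative and not part of the original notes.

```python
import numpy as np

# P(face) for a fair die with faces 1..6
die = np.full(6, 1 / 6)

# Convolving the two distributions sums P(a) * P(total - a) over every split a,
# giving P(total) for totals 2..12
sum_dist = np.convolve(die, die)
totals = np.arange(2, 13)

most_likely_total = int(totals[np.argmax(sum_dist)])
print(most_likely_total, sum_dist[totals == 7][0])
```

The total 7 is the most likely outcome because it has the most splits (1+6, 2+5, ..., 6+1), each contributing one product term to the sum.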

In practice the sliding filter is actually 3-dimensional, to account for a third dimension: the color channels of the RGB space.
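As a hedged illustration of the color dimension, the sketch below assumes the common convention that a 3-channel filter cross-correlates each channel with its own 2D slice and sums the results into one output map. It relies on PyTorch's `torch.nn.functional.conv2d`; the tensor sizes and names are invented for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
rgb_image = torch.randn(3, 6, 6)   # (channels, height, width)
rgb_kernel = torch.randn(3, 2, 2)  # one 2x2 slice per color channel

# conv2d expects (batch, channels, H, W) inputs and
# (out_channels, in_channels, kH, kW) weights
combined = F.conv2d(rgb_image.unsqueeze(0), rgb_kernel.unsqueeze(0))[0, 0]

# Same result built by hand: filter each channel separately, then add the maps
per_channel = sum(
    F.conv2d(rgb_image[c].reshape(1, 1, 6, 6), rgb_kernel[c].reshape(1, 1, 2, 2))[0, 0]
    for c in range(3)
)

channels_agree = torch.allclose(combined, per_channel, atol=1e-5)
print(combined.shape, channels_agree)
```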

Fundamental Operation: We convolve a weighting filter across an image to obtain an aggregation of several localities

## Convolutions on Images

The operation is actually far closer to cross-correlation (a sliding dot product), since it aggregates how strongly our valuable features are present. If a certain position is given a higher weight and the image is more intense there, the output is larger, encoding that a feature valuable for classification is present; later layers can map ranges of these values to classes (e.g. 37+ = presence of features => cat). This cross-correlation is implemented below.
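The difference between the two operations is only a flip of the kernel. The sketch below assumes PyTorch's `torch.nn.functional.conv2d`, which computes a cross-correlation despite its name; flipping the kernel along both spatial axes before calling it yields the mathematical convolution. The 4x4 input and 2x2 kernel are made-up values chosen so the arithmetic is easy to follow.

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.0).reshape(1, 1, 4, 4)               # rows: 0..3, 4..7, ...
k = torch.tensor([[1.0, 2.0], [3.0, 4.0]]).reshape(1, 1, 2, 2)

# Cross-correlation: the kernel is applied as written
xcorr = F.conv2d(x, k)

# Mathematical convolution: the kernel is flipped in both spatial dimensions
true_conv = F.conv2d(x, torch.flip(k, dims=(2, 3)))

# Top-left output of each:
#   cross-correlation: 0*1 + 1*2 + 4*3 + 5*4 = 34
#   convolution:       0*4 + 1*3 + 4*2 + 5*1 = 16
print(xcorr[0, 0, 0, 0].item(), true_conv[0, 0, 0, 0].item())
```

Since the kernel weights are learned, the network can absorb the flip, which is why the simpler cross-correlation is what gets implemented.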

In [253]:
def cross_cor(t1: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Cross-correlates a 2D kernel with a 2D tensor (stride 1, no padding), producing an aggregated latent representation"""
    k_h, k_w = kernel.shape[0], kernel.shape[1]
    # The filter cannot exceed the boundaries of the image, so each output dimension is (input size - kernel size + 1)
    new_shape = t1.shape[0] - k_h + 1, t1.shape[1] - k_w + 1
    # A latent representation of aggregated localities; every entry is overwritten below
    latent_representation = torch.zeros(new_shape)
    # Iterating and cross correlating
    for i in range(new_shape[0]):
        for j in range(new_shape[1]):
            # Obtaining the locality at (i, j): a kernel-sized window, moved with a stride of 1
            inter_tensor = t1[i:i + k_h, j:j + k_w]
            # Element-wise product, then sum: the kernel weighs the information in the locality
            latent_representation[i, j] = (inter_tensor * kernel).sum()
    return latent_representation
            
    

In [254]:
cross_cor(torch.randn(4,3), torch.randn(2,2)) # Latent representation maintains aggregated features in localities

tensor([[ 0.9023,  0.9184],
        [ 0.3546,  1.3842],
        [-0.4495, -1.7546]])
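The double loop is easy to read but slow in Python. As a possible alternative, the sketch below extracts every window at once with `Tensor.unfold` and checks the result against `torch.nn.functional.conv2d`. The function name `cross_cor_unfold` is hypothetical and not part of the notebook; it assumes a 2D input and a 2D kernel, with stride 1 and no padding.

```python
import torch
import torch.nn.functional as F


def cross_cor_unfold(t1: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Cross-correlation of a 2D tensor with a 2D kernel, without Python loops."""
    k_h, k_w = kernel.shape
    # windows[i, j] is the (k_h, k_w) patch whose top-left corner is at (i, j)
    windows = t1.unfold(0, k_h, 1).unfold(1, k_w, 1)
    # Weigh each patch by the kernel and aggregate over the patch dimensions
    return (windows * kernel).sum(dim=(-2, -1))


torch.manual_seed(0)
image = torch.randn(4, 3)
kernel = torch.randn(2, 2)

result = cross_cor_unfold(image, kernel)
# Reference: PyTorch's built-in cross-correlation on the same data
reference = F.conv2d(image.reshape(1, 1, 4, 3), kernel.reshape(1, 1, 2, 2))[0, 0]
print(result.shape, torch.allclose(result, reference, atol=1e-6))
```

A 4x3 input with a 2x2 kernel gives a 3x2 output, the same shape as the loop-based result above.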

In [255]:
def cross_cor_4d(t1: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Outline: slides a 4D kernel across a 4D tensor along all four dimensions (stride 1, no padding)"""
    # Each output dimension is (input size - kernel size + 1), as in the 2D case
    new_shape = tuple(t - k + 1 for t, k in zip(t1.shape, kernel.shape))
    k0, k1, k2, k3 = kernel.shape
    # A latent representation of aggregated localities; every entry is overwritten below
    latent_representation = torch.zeros(new_shape)
    # Iterating and cross correlating over every valid position
    for i in range(new_shape[0]):
        for j in range(new_shape[1]):
            for k in range(new_shape[2]):
                for l in range(new_shape[3]):
                    # Obtaining the kernel-sized locality at (i, j, k, l), using a stride of 1
                    inter_tensor = t1[i:i + k0, j:j + k1, k:k + k2, l:l + k3]
                    # The kernel weighs the information in the locality; the sum aggregates it
                    latent_representation[i, j, k, l] = (inter_tensor * kernel).sum()
    return latent_representation

In [312]:
# Defining convolutional layer
class Conv2D(torch.nn.Module):
    def __init__(self, kernel_size: tuple):
        super().__init__()
        # These will be learnable parameters. The NN will learn what it should consider important
        self.weights = torch.nn.Parameter(torch.empty(kernel_size))
        self.bias = torch.nn.Parameter(torch.zeros(1))
        # Xavier initialization is applied in place, so no reassignment is needed
        torch.nn.init.xavier_uniform_(self.weights)
        
    def forward(self, X: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.relu(cross_cor(X, self.weights) + self.bias)

    
class Flatten(torch.nn.Module):
    """Non-Sequential flatten"""
    
    def __init__(self):
        super().__init__()
    
    def forward(self, inp: torch.Tensor)->torch.Tensor:
        # Flatten with a tensor operation so the autograd graph is preserved;
        # detaching to NumPy would stop gradients from reaching the convolutional layers
        return inp.flatten()
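Whether a flatten step keeps gradients flowing can be checked directly. The sketch below is illustrative only, with made-up tensors and names: it compares flattening with a tensor operation against a round trip through NumPy, which detaches the result from the autograd graph.

```python
import torch

weights = torch.ones(2, 3, requires_grad=True)
activations = weights * 2.0  # stands in for the output of a convolutional layer

# Tensor-level flatten: still connected to `weights`
flat_attached = activations.flatten()

# NumPy round trip: values are copied, the graph is left behind
flat_detached = torch.tensor(activations.detach().numpy().flatten())

print(flat_attached.requires_grad, flat_detached.requires_grad)

# Gradients reach `weights` only through the attached version
flat_attached.sum().backward()
print(weights.grad)
```

Setting `requires_grad` on the detached copy afterwards would make it a new leaf tensor; it would not reconnect it to the layers that produced the values.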
        
    
    

In [321]:
class BasicConvNet(torch.nn.Module):
    def __init__(self, *kernel_sizes):
        super().__init__()
        # Attribute assignment registers each submodule under the same names as before
        self.layer_1 = Conv2D(kernel_sizes[0])
        self.layer_2 = Conv2D(kernel_sizes[1])
        self.layer_3 = Flatten()
        # 100 input features assumes a 28x28 image with 15x15 and 5x5 kernels: 28 -> 14 -> 10, and 10 * 10 = 100
        self.layer_4 = torch.nn.Linear(100, 10)
    
    def forward(self, X: torch.Tensor)-> torch.Tensor:
        # Applying both convolutions, then flattening and classifying
        X = self.layer_1(X)
        X = self.layer_2(X)
        X = self.layer_3(X)
        X = self.layer_4(X)
        return X
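The input size of the final linear layer follows from the output-size rule of the cross-correlation (input size - kernel size + 1 per dimension). The small sketch below works through that arithmetic; the helper name `valid_output_size` is hypothetical, and the 28x28 input with 15x15 and 5x5 kernels is the configuration assumed elsewhere in this notebook.

```python
def valid_output_size(input_size: int, kernel_size: int) -> int:
    """Output length of a stride-1, unpadded cross-correlation along one dimension."""
    return input_size - kernel_size + 1


side = 28  # FashionMNIST images are 28x28
for kernel_side in (15, 5):
    side = valid_output_size(side, kernel_side)  # 28 -> 14 -> 10

flattened_features = side * side  # 10 * 10 = 100
print(side, flattened_features)
```

Changing either kernel size changes the flattened length, so the first argument of the linear layer would need to change with it.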

In [323]:
# Function from previous notebook
# Converting to function for future use, default num_workers is 4 bc CPU threads
def load_fashion_mnist(batch_size: int = 512, num_workers: int = 4):
    data_transform = transforms.ToTensor() # Obtaining data to tensor converter
    
    # Downloading data
    mnist_train = torchvision.datasets.FashionMNIST(root = "../data", train = True, transform = data_transform, download= True)  # Defining fashion MNIST train from torch datasets
    mnist_test = torchvision.datasets.FashionMNIST(root = "../data", train = False, transform = data_transform, download = True)
    
    # Loading data onto an iterator (the test set does not need shuffling)
    train_data_loader = data.DataLoader(mnist_train, batch_size, shuffle = True, num_workers = num_workers)
    test_data_loader = data.DataLoader(mnist_test, batch_size, shuffle = False, num_workers = num_workers)
    
    # Returning iterator
    return train_data_loader, test_data_loader 
    

In [324]:
train_loader, test_loader = load_fashion_mnist(256, 4)

In [325]:
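# Building the model together with its optimizer and loss
def build_model():
    # cross_cor slides a 2D kernel over a single 2D image, so each kernel size is (height, width).
    # Trailing singleton dimensions such as (15,15,1,1) would broadcast against the image patch instead of weighting it element-wise
    model = BasicConvNet((15,15), (5,5))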
    trainer = torch.optim.Adam(model.parameters(), lr = 0.05, weight_decay=0.05)
    loss = torch.nn.CrossEntropyLoss()
    return model, trainer, loss

In [326]:
model, trainer, loss = build_model()

In [327]:
# Note: 4D code is merely an outline

In [328]:
model(torch.randn(28,28)) # 2 dimensional ConvNet Functional

tensor([-0.0810, -0.0288, -0.0281, -0.0854,  0.0934,  0.0305,  0.0277, -0.0560,
         0.0600, -0.0218], grad_fn=<AddBackward0>)
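For completeness, a single optimization step might look like the sketch below. It is not the notebook's training code: it assumes a stand-in model built from PyTorch's own `nn.Conv2d` and `nn.Flatten` layers with the same kernel sizes, a random batch in place of FashionMNIST, and the optimizer settings from `build_model`. All names are illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in with the same layer sizes: 28 -> 14 -> 10, then 10 * 10 = 100 features
stand_in = nn.Sequential(
    nn.Conv2d(1, 1, kernel_size=15), nn.ReLU(),
    nn.Conv2d(1, 1, kernel_size=5), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(100, 10),
)
optimizer = torch.optim.Adam(stand_in.parameters(), lr=0.05, weight_decay=0.05)
criterion = nn.CrossEntropyLoss()

# Random batch standing in for 8 grayscale 28x28 images and their class labels
images = torch.randn(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))

optimizer.zero_grad()                   # clear gradients from any previous step
logits = stand_in(images)               # shape (8, 10): one score per class
loss_value = criterion(logits, labels)
loss_value.backward()                   # gradients for every parameter
optimizer.step()                        # update the weights

print(logits.shape, loss_value.item())
```

The built-in layers process a whole batch at once, whereas the loop-based `Conv2D` above handles one 2D image per call.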