In [1]:
import torch

Remark: each layer is a transformation applied to the previous layer's outputs, and its parameters are updated with respect to their effect on the output (the gradient).

## Blocks

Architectures often contain a lot of repetition, but we do not want to reimplement the repeated parts each time. The fundamental repeated unit of an architecture, consisting of several layers, is called a block, and we can scale a NN in depth by composing several blocks. In terms of object composition, a NN is an object that contains blocks, which in turn contain layers (the units of computation, since they hold the multiplicative weights and additive biases).
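
The composition described above can be sketched as follows. The helper name make_block and the layer widths are hypothetical, chosen only for illustration; each block here is assumed to be a Linear layer followed by a ReLU.

```python
import torch
from torch import nn


def make_block(in_features: int, out_features: int) -> nn.Sequential:
    """Hypothetical helper: one block = a Linear layer followed by a ReLU."""
    return nn.Sequential(nn.Linear(in_features, out_features), nn.ReLU())


# Depth comes from composing blocks; these widths are arbitrary.
widths = [10, 32, 32, 32]
blocks = [make_block(a, b) for a, b in zip(widths[:-1], widths[1:])]

# The network contains blocks, which in turn contain layers.
net = nn.Sequential(*blocks, nn.Linear(widths[-1], 1))

out = net(torch.rand(4, 10))
num_blocks = len(blocks)
out_shape = tuple(out.shape)
```

Adding depth then only means extending the widths list, not writing another layer by hand.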

Blocks implement their forward-propagating transformation; back propagation is handled by automatic differentiation, which computes the gradients that the optimizer then uses to adjust the weights.
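
A small sketch of that division of labour, assuming a single Linear layer and plain SGD (both picked only for illustration): the forward pass is written explicitly, backward() fills in the gradients, and the optimizer step is what changes the weights.

```python
import torch
from torch import nn

layer = nn.Linear(3, 1)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
x = torch.rand(5, 3)

# Before backward(), no gradient has been computed for the weights.
grad_before = layer.weight.grad

loss = layer(x).pow(2).mean()  # forward pass builds the autograd graph
loss.backward()                # autograd fills .grad for every parameter
grad_shape = tuple(layer.weight.grad.shape)

# backward() alone does not touch the weights; the optimizer step does.
weights_before = layer.weight.detach().clone()
optimizer.step()
weights_changed = not torch.equal(weights_before, layer.weight.detach())
```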

PyTorch's nn.Sequential defines a special kind of Module (the fundamental block unit in PyTorch). An instance is callable (calling it with an input automatically invokes forward propagation), and its forward propagation simply calls its layers one after another.
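
To make the "one layer after another" behaviour concrete, the sketch below calls an nn.Sequential instance and then repeats the same computation by hand. The layer sizes are arbitrary.

```python
import torch
from torch import nn

seq = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))
x = torch.rand(15, 10)

# Calling the instance runs forward() through Module.__call__.
out_call = seq(x)

# The same computation, applying each child module in order by hand.
out_manual = x
for layer in seq:
    out_manual = layer(out_manual)

same_result = torch.allclose(out_call, out_manual)
```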

A **Linear** layer is also a type of module: it defines a forward input-to-output computation and has gradients that backpropagation uses. It creates more complex functions when composed with other layers. In a sense, any sub-function in the mapping g(f(b(l(x)))) could be its own separate block/module.
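
As a quick check of what a Linear layer computes, the sketch below reproduces its output by hand as x W^T + b. The sizes are arbitrary.

```python
import torch
from torch import nn

linear = nn.Linear(4, 2)
x = torch.rand(3, 4)

# weight has shape (out_features, in_features); bias has shape (out_features,)
manual = x @ linear.weight.T + linear.bias

matches = torch.allclose(linear(x), manual, atol=1e-6)
is_module = isinstance(linear, nn.Module)
```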

In [106]:
# Creating a module here. torch.nn.Module can be subclassed to create various NN components. Attempting an MLP.
class MLP(torch.nn.Module):  # Defines a reusable torch component, to be instantiated as a callable
    """A multilayer perceptron module implementation"""
    def __init__(self, input_size: int = 10, output_size: int = 2):
        """Initializing useful components for Module"""
        # The parent constructor must run first; submodules cannot be assigned before it
        super().__init__()
        self.first_layer = torch.nn.Linear(input_size, 20)
        self.non_linearity = torch.nn.functional.relu  # Stateless functional ReLU; a torch.nn.ReLU() module would work equally well
        self.next_layer = torch.nn.Linear(20, 10)
        self.final_layer = torch.nn.Linear(10, output_size)

    def forward(self, inp):  # Backward propagation is handled automatically by autograd
        """Forward propagation implementation"""
        x = self.non_linearity(self.first_layer(inp))  # composing each hidden layer with the non-linearity
        x = self.non_linearity(self.next_layer(x))
        return self.final_layer(x)  # Returning the raw output of the final layer (no activation applied)

In [107]:
model = MLP(output_size = 1)

In [98]:
model(torch.rand((15,10))) # Raw outputs for a batch of 15 examples: 10 features mapped to 1 value each (no activation, so these are not probabilities)

tensor([[-0.3187],
        [-0.3240],
        [-0.3110],
        [-0.3133],
        [-0.3270],
        [-0.3065],
        [-0.3403],
        [-0.3508],
        [-0.3167],
        [-0.3240],
        [-0.3394],
        [-0.3187],
        [-0.3373],
        [-0.3457],
        [-0.3317]], grad_fn=<AddmmBackward>)

In [108]:
# Attempt to make this useful
X = torch.randn(size = (1000,10)) + 7

In [109]:
labels = X @ torch.Tensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])

In [110]:
optimizer = torch.optim.Adam(model.parameters(), lr = 0.5)

In [111]:
loss = torch.nn.MSELoss()
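
One thing worth checking before training is that predictions and targets have the same shape. The sketch below uses made-up tensors to show what MSELoss does when a (N, 1) prediction meets a (N,) target: the two are broadcast to an (N, N) grid, so even perfect predictions get a non-zero loss. Recent PyTorch versions are expected to emit a UserWarning in this case.

```python
import torch

target = torch.tensor([1.0, 2.0, 3.0, 4.0])  # shape (4,), like labels
pred = target.unsqueeze(1)                   # shape (4, 1), a perfect prediction

loss_fn = torch.nn.MSELoss()

# Shapes (4, 1) and (4,) broadcast to (4, 4): every prediction is
# compared against every target, not just its own.
broadcast_loss = loss_fn(pred, target)

# Squeezing the trailing dimension compares element-wise.
matched_loss = loss_fn(pred.squeeze(-1), target)
```

Here matched_loss is 0 while broadcast_loss is 2.5, which is twice the variance of the targets.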

In [112]:
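# Training loop
for epoch in range(2000):
    optimizer.zero_grad()
    # model(X) is (1000, 1) but labels is (1000,); squeeze so MSELoss compares element-wise
    # instead of broadcasting to (1000, 1000). The costs logged below were recorded without this squeeze.
    cost = loss(model(X).squeeze(-1), labels)
    print("cost: ", cost)
    cost.backward()
    optimizer.step()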

cost:  tensor(1403.3651, grad_fn=<MseLossBackward>)
cost:  tensor(244880.3750, grad_fn=<MseLossBackward>)
cost:  tensor(70.8658, grad_fn=<MseLossBackward>)
cost:  tensor(1486.8285, grad_fn=<MseLossBackward>)
cost:  tensor(1499.9949, grad_fn=<MseLossBackward>)
cost:  tensor(1510.8574, grad_fn=<MseLossBackward>)
cost:  tensor(1518.6420, grad_fn=<MseLossBackward>)
cost:  tensor(1523.9003, grad_fn=<MseLossBackward>)
cost:  tensor(1527.0261, grad_fn=<MseLossBackward>)
cost:  tensor(1528.3142, grad_fn=<MseLossBackward>)
cost:  tensor(1527.9967, grad_fn=<MseLossBackward>)
cost:  tensor(1526.2600, grad_fn=<MseLossBackward>)
cost:  tensor(1523.2594, grad_fn=<MseLossBackward>)
cost:  tensor(1519.1248, grad_fn=<MseLossBackward>)
cost:  tensor(1513.9686, grad_fn=<MseLossBackward>)
cost:  tensor(1507.8888, grad_fn=<MseLossBackward>)
cost:  tensor(1500.9703, grad_fn=<MseLossBackward>)
cost:  tensor(1493.2903, grad_fn=<MseLossBackward>)
cost:  tensor(1484.9164, grad_fn=<MseLossBackward>)
cost:  tenso

cost:  tensor(27.4114, grad_fn=<MseLossBackward>)
cost:  tensor(26.7143, grad_fn=<MseLossBackward>)
cost:  tensor(26.0359, grad_fn=<MseLossBackward>)
cost:  tensor(25.3756, grad_fn=<MseLossBackward>)
cost:  tensor(24.7330, grad_fn=<MseLossBackward>)
cost:  tensor(24.1078, grad_fn=<MseLossBackward>)
cost:  tensor(23.4994, grad_fn=<MseLossBackward>)
cost:  tensor(22.9076, grad_fn=<MseLossBackward>)
cost:  tensor(22.3319, grad_fn=<MseLossBackward>)
cost:  tensor(21.7719, grad_fn=<MseLossBackward>)
cost:  tensor(21.2273, grad_fn=<MseLossBackward>)
cost:  tensor(20.6976, grad_fn=<MseLossBackward>)
cost:  tensor(20.1826, grad_fn=<MseLossBackward>)
cost:  tensor(19.6818, grad_fn=<MseLossBackward>)
cost:  tensor(19.1950, grad_fn=<MseLossBackward>)
cost:  tensor(18.7217, grad_fn=<MseLossBackward>)
cost:  tensor(18.2617, grad_fn=<MseLossBackward>)
cost:  tensor(17.8145, grad_fn=<MseLossBackward>)
cost:  tensor(17.3800, grad_fn=<MseLossBackward>)
cost:  tensor(16.9578, grad_fn=<MseLossBackward>)


cost:  tensor(3.7693, grad_fn=<MseLossBackward>)
cost:  tensor(3.7692, grad_fn=<MseLossBackward>)
cost:  tensor(3.7691, grad_fn=<MseLossBackward>)
cost:  tensor(3.7690, grad_fn=<MseLossBackward>)
cost:  tensor(3.7689, grad_fn=<MseLossBackward>)
cost:  tensor(3.7688, grad_fn=<MseLossBackward>)
cost:  tensor(3.7687, grad_fn=<MseLossBackward>)
cost:  tensor(3.7686, grad_fn=<MseLossBackward>)
cost:  tensor(3.7685, grad_fn=<MseLossBackward>)
cost:  tensor(3.7684, grad_fn=<MseLossBackward>)
cost:  tensor(3.7684, grad_fn=<MseLossBackward>)
cost:  tensor(3.7683, grad_fn=<MseLossBackward>)
cost:  tensor(3.7682, grad_fn=<MseLossBackward>)
cost:  tensor(3.7682, grad_fn=<MseLossBackward>)
cost:  tensor(3.7681, grad_fn=<MseLossBackward>)
cost:  tensor(3.7681, grad_fn=<MseLossBackward>)
cost:  tensor(3.7680, grad_fn=<MseLossBackward>)
cost:  tensor(3.7680, grad_fn=<MseLossBackward>)
cost:  tensor(3.7679, grad_fn=<MseLossBackward>)
cost:  tensor(3.7679, grad_fn=<MseLossBackward>)
cost:  tensor(3.7678

cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670, grad_fn=<MseLossBackward>)
cost:  tensor(3.7670

In [113]:
model(torch.Tensor([1,2,3,4,5,6,7,8,9, 10])) # This is a logical conclusion!

tensor([38.4954], grad_fn=<AddBackward0>)

In [117]:
torch.Tensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]).dot((torch.Tensor([1,2,3,4,5,6,7,8,9, 10]))) # This is the correct answer

tensor(38.5000)

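The MLP's prediction (38.4954) is extremely close to the true value (38.5). This single point is weak evidence, though: 38.5 is also the mean of the labels (each feature has mean 7, and 7 × 5.5 = 38.5), and the cost plateau of 3.7670 is close to the label variance (the squared weights sum to 3.85). In the recorded run the (1000, 1) predictions were compared against (1000,) labels, which MSELoss broadcasts to a (1000, 1000) grid, so the model may simply have learned to output the mean.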
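
A single test point says little about whether the network learned the mapping. A minimal sketch of a held-out check follows, assuming targets reshaped to (N, 1) so that they match the network output. The seed, the learning rate of 0.01, the 500 steps and the nn.Sequential stand-in for the MLP above are all assumptions, not the settings used in this notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)
w_true = torch.arange(1, 11, dtype=torch.float32) / 10  # 0.1, 0.2, ..., 1.0

X_train = torch.randn(1000, 10) + 7
y_train = (X_train @ w_true).unsqueeze(1)  # (1000, 1), matches the output shape
X_test = torch.randn(200, 10) + 7
y_test = (X_test @ w_true).unsqueeze(1)

# Same layer sizes as the MLP above, written as a Sequential for brevity.
net = nn.Sequential(
    nn.Linear(10, 20), nn.ReLU(),
    nn.Linear(20, 10), nn.ReLU(),
    nn.Linear(10, 1),
)
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)  # assumed learning rate
loss_fn = nn.MSELoss()

initial_loss = loss_fn(net(X_train), y_train).item()
for _ in range(500):
    optimizer.zero_grad()
    loss = loss_fn(net(X_train), y_train)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    test_loss = loss_fn(net(X_test), y_test).item()
    # Baseline: always predict the mean of the training labels.
    baseline = loss_fn(y_train.mean().expand_as(y_test), y_test).item()

print(f"initial {initial_loss:.2f}  test {test_loss:.4f}  mean-baseline {baseline:.4f}")
```

Comparing test_loss with the mean-predicting baseline is what separates a model that learned the linear function from one that only learned the average label.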

In [142]:
class VersatileNNBuilder(torch.nn.Module):
    """Class to be expanded into a versatile neural network creator"""
    def __init__(self, model_type: str, *args):
        super().__init__()  # Call to parent constructor as required by Module
        if model_type == "Sequential":
            for idx, module in enumerate(args):
                # Registering under a string name stores the layer in self._modules,
                # so its parameters are visible to parameters() and state_dict()
                self.add_module(str(idx), module)

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        for module in self.children():  # children() yields registered modules in insertion order
            inp = module(inp)
        return inp

In [143]:
# The network ends in a single output, and a softmax over one element is always 1
VersatileNNBuilder("Sequential", torch.nn.Linear(2,1), torch.nn.ReLU(), torch.nn.Linear(1,1), torch.nn.Softmax(dim=0))(torch.Tensor([1,2]))

tensor([1.], grad_fn=<SoftmaxBackward>)
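
Why registration matters can be sketched with two hypothetical classes, PlainList and Registered, that differ only in how they store a layer. A layer kept in a plain Python list is invisible to parameters(), so an optimizer would never update it, whereas add_module places it in the module registry. The last line also confirms that a softmax over one element is 1 regardless of the input value.

```python
import torch
from torch import nn


class PlainList(nn.Module):
    """Hypothetical example: layer hidden inside a plain Python list."""
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(2, 1)]  # not registered as a submodule


class Registered(nn.Module):
    """Hypothetical example: layer registered under a string name."""
    def __init__(self):
        super().__init__()
        self.add_module("0", nn.Linear(2, 1))  # stored in the module registry


n_plain = len(list(PlainList().parameters()))        # no parameters found
n_registered = len(list(Registered().parameters()))  # weight and bias

# Softmax over a single element is always 1, whatever the value.
single = torch.softmax(torch.tensor([-3.7]), dim=0)
```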