optimization is the process of finding the set of parameters that minimizes or maximizes a function. in deep learning, the goal is to minimize a loss function that measures how far the model's predictions are from the true values. it is the process by which a model adjusts its parameters in order to learn. a well-optimized model has lower error and better performance, and an efficient optimization algorithm makes training feasible for large or complex models.

key components:
    - objective or loss function: the function that needs to be minimized
    - parameters: variables (weights and biases) that are adjusted to minimize the loss
    - gradient: vector of partial derivatives of the loss function with respect to the parameters; it points in the direction of steepest increase, so stepping against it decreases the loss
    - update rule: p_new = p_old - n * dL(p_old), where p is the parameter vector, n is the learning rate (step size), and dL(p_old) is the gradient of the loss at p_old
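
the update rule above can be tried on a one-dimensional function before moving to more parameters. the sketch below is an illustration only: the function f(x) = x^2, the helper name `descend`, the starting point, and the two learning rates are all assumptions chosen for the example. it suggests how a small learning rate shrinks x toward the minimum at 0, while a learning rate that is too large makes every step overshoot.

```python
def descend(x0, learning_rate, steps):
    """apply x_new = x_old - learning_rate * f'(x_old) for f(x) = x**2 (hypothetical helper)"""
    x = x0
    for _ in range(steps):
        grad = 2 * x                   # derivative of x**2
        x = x - learning_rate * grad   # step against the gradient
    return x


small_step = descend(x0=5.0, learning_rate=0.1, steps=50)
large_step = descend(x0=5.0, learning_rate=1.1, steps=50)

print(abs(small_step) < 1e-3)   # small learning rate: x ends up close to the minimum at 0
print(abs(large_step) > 5.0)    # large learning rate: x moves away from the minimum
```

for this particular function each step multiplies x by (1 - 2 * learning_rate), so the iteration only converges when that factor has magnitude below 1, i.e. for 0 < learning_rate < 1.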

In [1]:
# example implementation of optimization with the function f(x, y) = (x-2)^2 + (y+3)^2, which reaches its minimum at (2, -3)
import numpy as np

def loss_function(params):
    # computes the value of the loss function
    x, y = params
    return (x-2)**2 + (y+3)**2

# gradient of the loss function: (df/dx, df/dy) = (2(x-2), 2(y+3))
def gradient(params):
    x, y = params
    grad_x, grad_y = 2 * (x - 2), 2 * (y + 3)
    return np.array([grad_x, grad_y])

# gradient descent
def gradient_descent(initial_params, learning_rate, iterations):

    # float copy, so the in-place update below also works for lists and integer arrays
    params = np.array(initial_params, dtype=float)
    param_history = [params.copy()]
    loss_history = [loss_function(params)]

    for i in range(iterations):
        grad = gradient(params)
        params -= learning_rate * grad

        param_history.append(params.copy())
        loss_history.append(loss_function(params))

    # final parameters plus the recorded trajectory and loss values
    return params, param_history, loss_history


perceptron: one of the essential concepts of deep learning. it is the simplest form of an artificial neuron and a fundamental building block of neural networks. it takes input values, multiplies them by weights, adds a bias, and passes the result through an activation function to produce an output
- inputs: set of features that describe the data, typically represented as a vector (one sample) or a matrix (one sample per row), X = [x1, x2, ..., xn], where each xi is a scalar feature value if X is a vector, or a row (one sample) if X is a matrix
- weights: each input feature is multiplied by a corresponding weight that reflects its importance in the prediction, w = [w1, w2, ..., wn], where the number of weights equals the number of features (the columns of the input matrix, or the length of the input vector)
- bias: an extra learnable term that shifts the decision boundary away from the origin; it can equivalently be treated as a weight attached to a constant input of 1
- weighted sum or summation: the perceptron calculates the weighted sum of the inputs plus the bias, z = (X . w) + b; if X is a matrix this is a matrix-vector multiplication that gives one z per row, and if X is a single vector it is a dot product that gives a scalar
- activation function: a step function that converts the weighted sum into a binary output; if z is a scalar it returns 0 or 1, and if z is a vector it is applied element-wise and returns a vector of binary values
activation_func(z) = { 1 if z >= n; 0 otherwise }, where n is the threshold
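
the weighted sum and the step activation described above can be sketched in plain python, without any library. the function names, the weights, and the bias below are hypothetical values picked by hand for the illustration, and the threshold is assumed to be 0.

```python
def step(z, threshold=0.0):
    """step activation: 1 if z >= threshold, otherwise 0"""
    return 1 if z >= threshold else 0


def perceptron_output(x, w, b):
    """forward pass for one sample: weighted sum of the inputs plus bias, then step"""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return step(z)


def perceptron_batch(X, w, b):
    """forward pass for a matrix of samples (one sample per row)"""
    return [perceptron_output(row, w, b) for row in X]


# hypothetical weights and bias, not learned from data
w = [1.0, -2.0]
b = 0.5

single = perceptron_output([1.0, 0.5], w, b)               # z = 1.0 - 1.0 + 0.5 = 0.5
batch = perceptron_batch([[1.0, 0.5], [0.0, 1.0]], w, b)   # second row: z = -2.0 + 0.5 = -1.5
print(single, batch)   # prints: 1 [1, 0]
```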

learning/training process:
- prediction: for each input, the perceptron calculates an output
- error calculation: the output is compared with the true label (the expected output, i.e. the actual value the model is trying to predict)
- weight update: the weights and bias are adjusted to reduce the error. in the classic perceptron learning rule, w = w + lr * (y - y_pred) * x and b = b + lr * (y - y_pred), so nothing changes when the prediction is correct and the decision boundary moves toward a misclassified sample otherwise
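
the three steps above can be sketched as a small training loop in plain python. this is a minimal illustration of the classic perceptron learning rule, not necessarily the exact procedure used later in these notes; the function names, the zero initialization, the learning rate of 1.0, and the toy AND dataset are assumptions made for the example.

```python
def predict(x, w, b):
    """step activation on the weighted sum: 1 if z >= 0, otherwise 0"""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if z >= 0 else 0


def train(samples, labels, learning_rate=1.0, max_epochs=50):
    """perceptron learning rule; returns weights, bias, and the number of epochs run"""
    w = [0.0] * len(samples[0])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(x, w, b)   # -1, 0, or +1
            if error != 0:
                mistakes += 1
                w = [wi + learning_rate * error * xi for wi, xi in zip(w, x)]
                b += learning_rate * error
        if mistakes == 0:                  # a full pass with no mistakes: converged
            return w, b, epoch
    return w, b, max_epochs


# toy data (assumption for the example): logical AND of two binary inputs
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]

w, b, epochs = train(samples, labels)
predictions = [predict(x, w, b) for x in samples]
print(predictions)   # prints: [0, 0, 0, 1]
```

the loop stops early because AND is linearly separable; for data that is not linearly separable the rule never reaches a mistake-free pass, which is why a cap such as `max_epochs` is needed.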

torch.nn: the PyTorch module for constructing neural networks, with pre-defined classes and functions that streamline the process of building, training, and deploying deep learning models.
- supports a variety of layer implementations
    - linear layers: nn.Linear
    - convolutional layers: nn.Conv2d, nn.Conv3d for convolutional operations
    - recurrent layers: nn.RNN, nn.LSTM, nn.GRU for sequential data processing
    - normalization layers: nn.BatchNorm2d, nn.LayerNorm for normalizing activations
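
to make the list above concrete, the sketch below instantiates one layer of each kind and checks the output shapes. the tensor sizes and layer dimensions are arbitrary values chosen for the illustration, and the inputs are random, so only the shapes are meaningful.

```python
import torch
import torch.nn as nn

x_flat = torch.randn(4, 10)            # (batch, features)
x_image = torch.randn(4, 3, 16, 16)    # (batch, channels, height, width)
x_seq = torch.randn(4, 7, 10)          # (batch, time steps, features)

linear = nn.Linear(in_features=10, out_features=5)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
batch_norm = nn.BatchNorm2d(num_features=8)
lstm = nn.LSTM(input_size=10, hidden_size=6, batch_first=True)
layer_norm = nn.LayerNorm(normalized_shape=6)

out_linear = linear(x_flat)              # shape (4, 5)
out_conv = batch_norm(conv(x_image))     # shape (4, 8, 16, 16)
out_seq, (h_n, c_n) = lstm(x_seq)        # out_seq has shape (4, 7, 6)
out_norm = layer_norm(out_seq)           # shape (4, 7, 6)

print(out_linear.shape, out_conv.shape, out_norm.shape)
```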

In [5]:
import torch
import torch.nn as nn

# simple perceptron module using torch
class Perceptron(nn.Module):
    def __init__(self, input_dim):
        """
        initializes perceptron
        args:
            - input_dim: number of features in the input data
        """
        # initializing by calling the super method to access nn.Module
        super(Perceptron, self).__init__()

        # performs input * weight + bias
        self.linear = nn.Linear(in_features=input_dim, out_features=1) #in_features: dimension of the input, out_features: dimension of the output

    def forward(self, x):
        """
        performs the forward pass of the perceptron
        args:
            - x (torch.Tensor): input tensor of the shape (batch_size, input_dim)

        returns:
            - torch.Tensor: output tensor after applying the activation function
        """
        z = self.linear(x)

        # step activation: 0 where z < 0, 1 where z >= 0
        # (the second argument of torch.heaviside is the value returned where z == 0)
        out = torch.heaviside(z, z.new_tensor([1.0]))
        return out


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        initializes the network layers
        args:
            - input_dim: number of input dimensions
            - hidden_dim: number of neurons in the hidden layer
            - output_dim: number of output dimensions
        """
        super(SimpleNN, self).__init__()

        # a linear layer for input -> hidden
        self.fc1 = nn.Linear(input_dim, hidden_dim)

        # a linear layer for hidden -> output
        self.fc2 = nn.Linear(hidden_dim, output_dim)


    def forward(self, x):
        """
        forward pass through the network
        args:
            - x (torch.Tensor): input tensor of shape (batch_size, input_dim)
        returns (torch.Tensor): output tensor of shape (batch_size, output_dim)
        """
        hidden = F.relu(self.fc1(x))
        output = self.fc2(hidden)

        return output

In [None]:
"""
training loop for perceptron: implements perceptron learning algorithm
    - forward propagation: the input passes through network to get a prediction
    - compares the prediction with true labels
    - if the prediction is incorrect, weights and bias are updated
"""

def train_perceptron(model, inputs, targets, learning_rate=0.01, epochs=10):
    """
    train the perceptron model using the perceptron learning rule
    args:
        - model (Perceptron): an instance of the Perceptron class
        - inputs (torch.Tensor): input data of shape (num_samples, input_dim)
        - targets (torch.Tensor): binary labels of shape (num_samples, 1)
        - learning_rate (float): learning rate for updating weights and bias
        - epochs (int): number of times to iterate over the entire dataset

    returns:
        int: number of epochs taken until it converges
    """
    num_samples = inputs.size(0)

