<a href="https://www.kaggle.com/code/siddp6/linear-regression-neural-network-from-scratch?scriptVersionId=175106186" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

## Intuition
Consider the equation **y = ax₁ + bx₂ + cx₃ + dx₄ + ex₅ + bias**. It has 5 input variables, **x₁, x₂, x₃, x₄, and x₅**; 5 constant coefficients, **a, b, c, d, and e**; one **bias**; and one output variable, **y**. Because every input appears only to the first power and is multiplied by a constant, this is a linear equation.
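
As a quick illustration of the equation, here is a minimal sketch in plain Python. The input values are made up for this example; the coefficients and bias are the ones this notebook uses later.

```python
# Illustrative inputs (hypothetical values, not taken from the dataset)
inputs = [3.0, 1.0, 4.0, 1.0, 5.0]        # x1..x5
coefficients = [2.0, 8.0, 1.0, 3.0, 6.0]  # a..e
bias = 4.0

# y = a*x1 + b*x2 + c*x3 + d*x4 + e*x5 + bias
y = sum(c * x for c, x in zip(coefficients, inputs)) + bias
print(y)  # 55.0
```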

Here is what we are going to do:
- Create 5000 random input sets, each set will have 5 inputs **(x₁, x₂, x₃, x₄, x₅)**.
- Create 5 coefficients.
- Create one bias.
- Calculate 5000 outputs based on the above equation.

Our goal is to create a linear model that predicts the correct output **(y)** from the inputs **(x₁, x₂, x₃, x₄, x₅)**. The catch is that the model does not know the coefficients or the bias.


## Linear Model
- We will initialize 5 values to serve as the model's coefficients, called **weights**, and one more value to serve as its **bias**. (These are usually random; this notebook starts every one of them at 1.)
- We will pass each input set to the model, which generates a prediction from the inputs, weights, and bias using the equation above.
- We will calculate the loss by comparing the predicted value with the actual value (y). We use a **loss function** for this.
- Based on the loss, the model changes its weights and bias. It uses the **gradient** to decide whether each weight or the bias should increase or decrease. We also give the model a **learning rate**, which sets the magnitude of each change.
- Our end goal is for the model's weights and bias to end up close to the original coefficients and bias. The closer they are, the more accurately the model can predict y from the inputs.
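
The steps above can be sketched by hand for a toy model with a single weight. This is only an illustration: the sample, the starting values, and the learning rate are assumptions chosen to keep the arithmetic easy, and the gradients are derived manually instead of using PyTorch.

```python
# Toy problem: one sample with x = 2 and target y = 10 (true w = 3, true b = 4)
x, y_true = 2.0, 10.0
w, b = 1.0, 1.0      # initial guesses
lr = 0.01            # learning rate (illustrative value)

pred = w * x + b                 # model prediction
loss = (pred - y_true) ** 2      # squared error

# Gradients of the squared error, derived by hand with the chain rule
grad_w = 2 * (pred - y_true) * x
grad_b = 2 * (pred - y_true)

# Step against the gradient, scaled by the learning rate
w -= lr * grad_w
b -= lr * grad_b

new_loss = (w * x + b - y_true) ** 2
print(loss, new_loss)  # the loss is smaller after one update
```

Both gradients are negative here, so both parameters increase, which moves the prediction toward the target.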


### Gradients

- **Definition**: Gradients represent the rate of change of a function with respect to its parameters. In the context of neural networks, these parameters are the weights and biases of the network.

- **Purpose**: Gradients are crucial for optimizing the parameters of a neural network during the training process. By computing gradients with respect to a loss function, we can determine how each parameter should be adjusted to minimize the loss and improve the network's performance.

- **Backpropagation**: Gradients are computed using the backpropagation algorithm, which efficiently calculates the gradients of the loss function with respect to each parameter in the network by propagating error gradients backward through the network.

*Imagine you're trying to bake the perfect cake, but you're not quite sure how much of each ingredient to use. Gradients would be like telling you how the taste of the cake changes when you slightly adjust the amount of sugar, flour, or eggs. With this information, you can gradually tweak the recipe to make the cake taste just right.*

*In machine learning, gradients tell us how much we need to adjust the parameters (like weights in a neural network) to make our model perform better at its task. By following the gradients, the model can learn from its mistakes and improve over time, just like a chef perfecting a recipe with each attempt.*
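
To make "rate of change" concrete, the sketch below compares the gradient PyTorch computes with a simple finite-difference estimate. The function `f` and the step size `eps` are assumptions chosen for illustration.

```python
import torch

def f(p):
    # Simple function whose exact gradient is 2 * p
    return (p ** 2).sum()

p = torch.tensor([1.0, -2.0, 3.0], dtype=torch.float64, requires_grad=True)

f(p).backward()                 # autograd fills p.grad with df/dp
autograd_grad = p.grad.clone()

# Finite-difference estimate for the first component only:
# nudge p[0] slightly and see how much f changes
eps = 1e-4
with torch.no_grad():
    bumped = p.clone()
    bumped[0] += eps
    numeric_first = ((f(bumped) - f(p)) / eps).item()

print(autograd_grad)   # gradient from backpropagation
print(numeric_first)   # close to the first entry of the gradient
```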

### Loss function


A loss function measures how well a machine learning model performs by comparing its predictions to the actual target values. It quantifies the error between predicted and actual values. The goal is to minimize this error during training.

### Imports

In [1]:
import numpy as np
import torch

### Data

**inputs_np = np.random.randint(10, 100, size=(num_samples, num_features)).astype('float32')**

- `np.random.randint(10, 100, size=(num_samples, num_features))`: This function call generates random integers from 10 up to, but not including, 100 (so 10 to 99) with the specified shape `(num_samples, num_features)`.

- `num_samples` represents the number of samples or rows, and `num_features` represents the number of features or columns in the array.

- `.astype('float32')`: This part of the code converts the generated random integers to floating-point numbers of 32-bit precision (`float32`).

So, in summary, `inputs_np` will be a NumPy array of `float32` values with `num_samples` rows and `num_features` columns, where each element is a whole number between 10 and 99.
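
A small check of that behaviour, as a sketch with an arbitrary sample count chosen for this example:

```python
import numpy as np

# 1000 rows is an arbitrary size for this check
raw = np.random.randint(10, 100, size=(1000, 5))
as_float = raw.astype('float32')

print(raw.min() >= 10, raw.max() <= 99)  # the upper bound 100 is never produced
print(as_float.dtype, as_float.shape)    # float32 (1000, 5)
```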


**inputs_tensor = torch.tensor(inputs_np, requires_grad=False)**
- `torch.tensor(inputs_np, requires_grad=False)`: This function call copies the NumPy array `inputs_np` into a PyTorch tensor. The `requires_grad=False` argument (which is also the default) specifies that gradients should not be computed for this tensor. Gradients are needed for the parameters being trained, but the inputs are fixed data, so tracking them would only cost memory and computation.

So, in summary, `inputs_tensor` will be a PyTorch tensor containing the same data as `inputs_np`, without tracking gradients.
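
The following sketch shows what "containing the same data" means in practice. It contrasts `torch.tensor`, which copies the array, with `torch.from_numpy`, which shares memory with it; the second call is not used in this notebook and appears here only for comparison.

```python
import numpy as np
import torch

arr = np.array([[10.0, 20.0], [30.0, 40.0]], dtype='float32')

copied = torch.tensor(arr, requires_grad=False)  # independent copy of the data
shared = torch.from_numpy(arr)                   # view on the same memory

arr[0, 0] = -1.0  # modify the NumPy array afterwards

print(copied[0, 0].item())   # 10.0, the copy is unaffected
print(shared[0, 0].item())   # -1.0, the shared tensor sees the change
print(copied.dtype, copied.requires_grad)  # torch.float32 False
```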


**bias_tensor = torch.tensor([4], dtype=torch.float32, requires_grad=False)**
 
- `torch.tensor([4])`: This creates a one-dimensional PyTorch tensor with a single element, the number 4.
  
- `dtype=torch.float32`: This specifies the data type of the tensor. In this case, it's set to `torch.float32`, which means the tensor elements are 32-bit floating-point numbers.

- `requires_grad=False`: This parameter indicates whether PyTorch should track operations on this tensor to automatically compute gradients later for optimization algorithms like gradient descent. Setting it to `False` means that gradients won't be calculated for this tensor. 

So, overall, this line of code creates a simple one-dimensional tensor containing the number 4, with data type set to 32-bit floating point, and it won't require gradients for any operations performed on it.



**targets_tensor = torch.matmul(inputs_tensor, coefficients_tensor) + bias_tensor**
- `torch.matmul(inputs_tensor, coefficients_tensor)`: This multiplies the `(num_samples, num_features)` matrix `inputs_tensor` by the 1-D vector `coefficients_tensor`, giving a 1-D tensor with one weighted sum per sample.

- `+ bias_tensor`: This adds the single-element `bias_tensor` to every entry of that result through broadcasting.

So, overall, this line of code applies the linear equation to every row of `inputs_tensor` using `coefficients_tensor`, adds the bias represented by `bias_tensor`, and stores the `num_samples` results in `targets_tensor`.
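
Here is a two-sample sketch of the same computation, small enough to check by hand. The input rows are made up for this example; the coefficients and bias match the notebook's.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0, 4.0, 5.0],
                  [10.0, 20.0, 30.0, 40.0, 50.0]])  # shape (2, 5)
coeffs = torch.tensor([2.0, 8.0, 1.0, 3.0, 6.0])     # shape (5,)
bias = torch.tensor([4.0])                           # shape (1,)

# (2, 5) matrix times (5,) vector gives shape (2,); the bias broadcasts
y = torch.matmul(x, coeffs) + bias

print(y.shape)  # torch.Size([2])
print(y)        # tensor([ 67., 634.])
```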

In [2]:
num_samples = 5000
num_features = 5

inputs_np = np.random.randint(10, 100, size=(num_samples, num_features)).astype('float32')
inputs_tensor = torch.tensor(inputs_np, requires_grad=False) 

coefficients_tensor = torch.tensor([2, 8, 1, 3, 6], dtype=torch.float32, requires_grad=False)
bias_tensor = torch.tensor([4], dtype=torch.float32, requires_grad=False)

targets_tensor = torch.matmul(inputs_tensor, coefficients_tensor) + bias_tensor

In [3]:
print("Input features for the 3rd sample:", inputs_np[2])
print("Coefficients:", coefficients_tensor)
print("Bias:", bias_tensor)
print("Target value for the 3rd sample computed using matrix multiplication and bias addition:", targets_tensor[2])

Input features for the 3rd sample: [97. 50. 94. 13. 95.]
Coefficients: tensor([2., 8., 1., 3., 6.])
Bias: tensor([4.])
Target value for the 3rd sample computed using matrix multiplication and bias addition: tensor(1301.)


## Model

**w = torch.tensor([1, 1, 1, 1, 1], dtype=torch.float32, requires_grad=True)**

- `torch.tensor`: This is a function from the PyTorch library used to create a tensor. A tensor is a multi-dimensional array similar to NumPy arrays but with additional functionalities tailored for deep learning.

- `[1, 1, 1, 1, 1]`: This list contains the values that will be stored in the tensor. In this case, it's a one-dimensional tensor with five elements, all initialized to the value 1.

- `dtype=torch.float32`: This specifies the data type of the tensor. `torch.float32` indicates that the tensor will store floating-point numbers (32-bit floating point precision).

- `requires_grad=True`: This parameter indicates that operations involving this tensor should be tracked for gradient computation during backpropagation. This is essential for automatic differentiation, a key component of training neural networks using techniques like gradient descent.

So, in summary, this line of code creates a PyTorch tensor `w` with five elements, all initialized to 1, with a data type of 32-bit floating-point, and specifies that gradients should be computed with respect to this tensor during backpropagation.

**b = torch.tensor([1], dtype=torch.float32, requires_grad=True)**

- `torch.tensor`: This function from the PyTorch library is used to create a tensor.

- `[1]`: This list contains the value that will be stored in the tensor. In this case, it's a one-dimensional tensor with a single element, initialized to the value 1.

- `dtype=torch.float32`: This specifies the data type of the tensor. `torch.float32` indicates that the tensor will store floating-point numbers (32-bit floating point precision).

- `requires_grad=True`: This parameter indicates that operations involving this tensor should be tracked for gradient computation during backpropagation. Similar to the previous example, this enables automatic differentiation for this tensor.

In summary, this line of code creates a PyTorch tensor `b` with a single element initialized to 1, with a data type of 32-bit floating-point, and specifies that gradients should be computed with respect to this tensor during backpropagation.
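
The effect of `requires_grad` can be seen in a short sketch. The tensors below are illustrative and shorter than the notebook's `w`.

```python
import torch

w_demo = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)  # tracked
x_demo = torch.tensor([2.0, 3.0, 4.0])                      # not tracked

y_demo = (w_demo * x_demo).sum()  # y = w1*x1 + w2*x2 + w3*x3
y_demo.backward()

print(w_demo.grad)  # tensor([2., 3., 4.]), since dy/dw equals x
print(x_demo.grad)  # None, no gradient is stored for untracked tensors
```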

In [4]:
w = torch.tensor([1, 1, 1, 1, 1], dtype=torch.float32, requires_grad=True)
b = torch.tensor([1], dtype=torch.float32, requires_grad=True)

In [5]:
print(f"Model weight {w} and bias {b}")
print(f"Actual coefficients {coefficients_tensor} and bias {bias_tensor}")
print("Our goal is to train the model so that it adjusts its weights and bias toward the actual coefficients and bias")

Model weight tensor([1., 1., 1., 1., 1.], requires_grad=True) and bias tensor([1.], requires_grad=True)
Actual coefficients tensor([2., 8., 1., 3., 6.]) and bias tensor([4.])
Our goal is to train the model so that it adjusts its weights and bias toward the actual coefficients and bias


**torch.sum(diff*diff) / diff.numel()**
 - `torch.sum(diff*diff)`: This computes the sum of the squared differences between corresponding elements of `t1` and `t2`. Squaring the differences ensures that negative differences do not cancel out positive differences, emphasizing larger errors.
   
 - `diff.numel()`: This function calculates the total number of elements in the tensor `diff`. It's used to normalize the sum of squared differences by the total number of elements, yielding the mean squared error.

In [6]:
"""
This is a loss function
This function returns the average squared difference between t1 and t2, which is a common metric used as a loss function in regression problems. 
The aim during training is to minimize this value, meaning the model's predictions are as close to the actual targets as possible.
"""
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff*diff) / diff.numel()
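
As a sanity check, the sketch below compares this hand-written loss with PyTorch's built-in `torch.nn.functional.mse_loss` on three made-up values. The built-in is shown only for comparison; the notebook itself uses the `mse` function.

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

preds = torch.tensor([2.0, 4.0, 6.0])
targets = torch.tensor([1.0, 4.0, 8.0])

manual = mse(preds, targets)          # (1 + 0 + 4) / 3
builtin = F.mse_loss(preds, targets)  # default reduction is the mean

print(manual, builtin)  # both are about 1.6667
```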


**return x @ w.t() + b**
- `x`: This is the input data, either a matrix where each row represents a sample and each column represents a feature, or a single 1-D sample.
- `@`: This symbol denotes matrix multiplication in Python.
- `w`: This is a 1-D tensor holding the weights of the model, one per feature.
- `.t()`: This function transposes a 2-D tensor. Because `w` is 1-D here, `.t()` returns it unchanged, so `x @ w.t()` is the same as `x @ w`; the call would only matter if `w` were stored as a 2-D matrix.
- `b`: This is a single-element tensor holding the bias term of the model.

So, `x @ w.t()` multiplies the input data `x` by the weights `w`, resulting in a 1-D tensor with one predicted output per sample.

Finally, `+ b` adds the bias term to every prediction through broadcasting.
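
The shapes involved can be checked with a small sketch. The tensors are illustrative stand-ins with the same dimensionality as the notebook's `w` and `b`.

```python
import torch

x_demo = torch.ones(3, 5)                            # 3 samples, 5 features
w_demo = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])     # 1-D weights
b_demo = torch.tensor([1.0])                         # single-element bias

print(w_demo.t().shape)  # torch.Size([5]), unchanged for a 1-D tensor

out = x_demo @ w_demo.t() + b_demo
print(out.shape)  # torch.Size([3]), one prediction per sample
print(out)        # tensor([16., 16., 16.])
```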


In [7]:
"""
A model is simply a function that returns an output for a given input.
This linear model can be treated as a mathematical function like y = ax + b, where a is the coefficient, x is the input, and b is the bias.
"""
def model(x):
    return x @ w.t() + b

In [8]:
test_input = torch.tensor([2, 8, 1, 3, 6], dtype=torch.float32, requires_grad=False)
actual_output = test_input[0] * coefficients_tensor[0] +\
                test_input[1] * coefficients_tensor[1] +\
                test_input[2] * coefficients_tensor[2] +\
                test_input[3] * coefficients_tensor[3] +\
                test_input[4] * coefficients_tensor[4] +\
                bias_tensor

model_output = model(test_input)

print(actual_output, model_output) # Before training, the two values are far apart

tensor([118.]) tensor([21.], grad_fn=<AddBackward0>)


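- `with torch.no_grad():`: This line enters a context where operations are not tracked for gradients. The parameter updates are done here so that they do not become part of the computation graph; PyTorch would otherwise raise an error for an in-place change to a leaf tensor that requires gradients.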

- `w -= w.grad * lr`: This line updates the weights (`w`) using gradient descent. It subtracts the product of the gradients (`w.grad`) and the learning rate (`lr`) from the current weights.

- `b -= b.grad * lr`: This line updates the bias (`b`) in a similar manner as the weights.

- `w.grad.zero_()`: This line resets the gradients of the weights to zero. This is necessary because PyTorch accumulates gradients by default, and we want to clear them before the next iteration.

- `b.grad.zero_()`: This line resets the gradients of the bias to zero, similar to what was done for the weights.

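Together with the prediction, loss, and `loss.backward()` calls, these lines make up one iteration of the training loop. Because every iteration uses the full dataset, one iteration is also one epoch: predictions are made, the loss is calculated, gradients are computed, and parameters are updated using gradient descent.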
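
The need for `zero_()` is easy to see in isolation. The sketch below uses a made-up one-element parameter and calls `backward()` repeatedly on the same expression.

```python
import torch

p = torch.tensor([1.0], requires_grad=True)

(3 * p).sum().backward()
first = p.grad.clone()    # gradient of 3*p is 3

(3 * p).sum().backward()
second = p.grad.clone()   # 6: the new gradient was added to the old one

p.grad.zero_()            # clear the accumulated value
(3 * p).sum().backward()
third = p.grad.clone()    # back to 3

print(first, second, third)  # tensor([3.]) tensor([6.]) tensor([3.])
```

Without the reset, each epoch would step along the sum of all previous gradients instead of the current one.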

In [9]:
"""
epochs is the number of times we train the model on the same data; with each pass the model is expected to give better results.
lr is the learning rate, the magnitude by which we change the weights and bias.
As we saw earlier, the model's weights and bias start from arbitrary initial values (all 1 here).
In each epoch, we change these values by the gradient scaled by lr, so that the weights and bias move toward the actual coefficients and bias.
"""
epochs = 500
lr = 1e-5

for epoch in range(epochs):
    preds = model(inputs_tensor) # generate the prediction from model
    
    loss = mse(preds, targets_tensor) # calculate the loss
    loss.backward() # calculate the gradients
    
    # Adjust the weight and bias based on gradient and learning rate
    with torch.no_grad():
        w -= w.grad * lr
        b -= b.grad * lr
        
        w.grad.zero_()
        b.grad.zero_()

## Results

In [10]:
print(f"Model weights {w} and bias {b}")
print(f"Actual coefficients {coefficients_tensor} and bias {bias_tensor}")

Model weights tensor([2.0122, 8.0062, 1.0132, 3.0110, 6.0088], requires_grad=True) and bias tensor([1.0546], requires_grad=True)
Actual coefficients tensor([2., 8., 1., 3., 6.]) and bias tensor([4.])


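> We can see that the model weights are almost equal to the coefficients. However, the model bias (about 1.05) is still far from the actual bias (4). With inputs between 10 and 99, a small change in a weight moves the predictions far more than the same change in the bias, so at this small learning rate the bias is updated very slowly. Training for more epochs or rescaling the inputs would likely bring it closer.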
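
One technique that may help with the bias is feature standardization, which is not part of the original notebook. The sketch below is a self-contained illustration under several assumptions: it generates its own data with `torch.randint`, fixes a seed, and uses a learning rate of 0.1 and 200 epochs chosen for this example.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the sketch is repeatable

# Synthetic data in the same spirit as above: whole numbers from 10 to 99
x = torch.randint(10, 100, (5000, 5)).float()
true_w = torch.tensor([2.0, 8.0, 1.0, 3.0, 6.0])
true_b = torch.tensor([4.0])
y = x @ true_w + true_b

# Standardize each feature to zero mean and unit standard deviation
mean, std = x.mean(dim=0), x.std(dim=0)
x_scaled = (x - mean) / std

w_s = torch.ones(5, requires_grad=True)
b_s = torch.ones(1, requires_grad=True)
lr_s = 0.1  # a much larger step is stable once the inputs are scaled

for _ in range(200):
    loss = ((x_scaled @ w_s + b_s - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w_s -= lr_s * w_s.grad
        b_s -= lr_s * b_s.grad
        w_s.grad.zero_()
        b_s.grad.zero_()

# Map the learned parameters back to the original, unscaled inputs
w_orig = w_s.detach() / std
b_orig = b_s.detach() - (w_orig * mean).sum()
print(w_orig, b_orig)
```

The model is trained on the scaled inputs, so its raw parameters differ from the true coefficients; the last two lines undo the scaling to make them comparable.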

In [11]:
test_input = torch.tensor([2, 8, 1, 3, 6], dtype=torch.float32, requires_grad=False)
actual_output = test_input[0] * coefficients_tensor[0] +\
                test_input[1] * coefficients_tensor[1] +\
                test_input[2] * coefficients_tensor[2] +\
                test_input[3] * coefficients_tensor[3] +\
                test_input[4] * coefficients_tensor[4] +\
                bias_tensor

model_output = model(test_input)

print(actual_output, model_output) # After training, the two values are close

tensor([118.]) tensor([115.2278], grad_fn=<AddBackward0>)
