# Understanding Gradient Descent for Linear Regression

## Introduction

Welcome to this Jupyter Notebook, where we delve into the fundamental concept of Gradient Descent in the context of Linear Regression. In this project, our goal is to demystify the mechanics of gradient descent and its application to find optimal parameters for a simple linear regression model.

### The Project Overview

#### Step 1: Initializing Dummy Values
We kick off our exploration by initializing dummy values for the independent variable `x` and the dependent variable `y`. This step lays the foundation for our linear regression model, creating a synthetic dataset for experimentation.

#### Step 2: Initializing Model Parameters
Next, we set the initial values for the model parameters `b0` and `b1` in the equation `yhat = b0 + b1*x`. These parameters represent the intercept and slope of our linear regression line.

#### Step 3: Defining Learning Rate
The learning rate is a critical hyperparameter in the gradient descent process. We define its value to control the step size during the optimization, influencing the convergence speed and stability of the algorithm.

#### Step 4: Defining the Gradient Descent Function
The heart of our project lies in the implementation of the gradient descent algorithm. We define a function that iteratively updates the parameters `b0` and `b1` based on the calculated gradients, moving us closer to the optimal values that minimize the error in our linear regression model.

#### Step 5: Calling the Gradient Descent Function
With all components in place, we call our gradient descent function for a defined number of epochs. Each epoch represents a complete pass through the dataset, allowing the algorithm to refine the parameter estimates and converge towards the optimal values.

### Learning Objectives

Throughout this notebook, we will:

- Gain insights into the inner workings of gradient descent for linear regression.
- Visualize the progression of parameter updates over epochs.
- Understand the impact of the learning rate on convergence.
- Explore how gradient descent iteratively refines the model for better predictions.

So, join us on this journey to grasp the essentials of gradient descent in the context of linear regression and witness firsthand how it optimizes our model for more accurate predictions.

## Loading Libraries

In [32]:
import numpy as np

## Loading Dataset

In [33]:
x=np.random.randn(20, 1)
y=2*x + np.random.rand()
print(x, y)

[[ 0.82520658]
 [-0.00619132]
 [-1.21642199]
 [-0.4967771 ]
 [ 0.03278211]
 [ 0.75782988]
 [ 0.40746266]
 [-1.39257024]
 [-0.22905686]
 [-0.66839719]
 [-0.65254061]
 [-0.58985634]
 [-1.35823861]
 [-0.67070794]
 [-0.45241266]
 [-0.69356342]
 [ 0.79972314]
 [ 1.14203011]
 [ 0.63252591]
 [ 1.26975853]] [[ 2.39831074]
 [ 0.73551494]
 [-1.68494639]
 [-0.24565661]
 [ 0.81346181]
 [ 2.26355735]
 [ 1.5628229 ]
 [-2.03724289]
 [ 0.28978387]
 [-0.58889678]
 [-0.55718363]
 [-0.4318151 ]
 [-1.96857964]
 [-0.59351829]
 [-0.15692773]
 [-0.63922925]
 [ 2.34734386]
 [ 3.03195782]
 [ 2.01294941]
 [ 3.28741464]]


### Initialize the variable

In [44]:
# Initiliaze value of b0, b1 for (yhat=b0 + b1*x)

b0, b1=0.0, 0.1

# Define Learning rate

learning_rate=0.4

## Define Gradient Descent Fuction

In [45]:
def gradient(x, y, b0, b1, learning_rate):
    
    # Cost function= (y-(b0+b1*x))**2/2*N and b(new)=b(old)-L(dCf/db)
    N=x.shape[0]
    dCFdb0, dCFdb1=0.0, 0.0
    
    for xi, yi in zip(x, y):
        dCFdb0-=(yi-(b0+b1*xi))
        dCFdb1-=b1*(yi-(b0+b1*xi))
        
    b0=b0-learning_rate*(dCFdb0)/N
    b1=b1-learning_rate*(dCFdb1)/N
    
    return b0, b1

In [None]:
### Call the function for defined epoch

In [47]:
for epoch in range(200):
    b0, b1=gradient(x, y, b0, b1, learning_rate)
    yhat=b0+b1*x
    loss=np.divide(np.sum((y-yhat)**2, axis=0), x.shape[0])
    print(f'{epoch} loss is {loss}, parameter b0:{b0} and b1:{b1}')

0 loss is [2.29022716], parameter b0:[0.3240755] and b1:[0.13487426]
1 loss is [2.24447768], parameter b0:[0.39813171] and b1:[0.14486254]
2 loss is [2.22087204], parameter b0:[0.44307671] and b1:[0.15137339]
3 loss is [2.20803696], parameter b0:[0.470377] and b1:[0.15550592]
4 loss is [2.20078422], parameter b0:[0.4869687] and b1:[0.15808603]
5 loss is [2.19657555], parameter b0:[0.4970558] and b1:[0.15968066]
6 loss is [2.1940901], parameter b0:[0.50318968] and b1:[0.16066012]
7 loss is [2.19260573], parameter b0:[0.50692015] and b1:[0.16125946]
8 loss is [2.19171294], parameter b0:[0.50918911] and b1:[0.16162535]
9 loss is [2.19117361], parameter b0:[0.51056922] and b1:[0.16184841]
10 loss is [2.19084693], parameter b0:[0.5114087] and b1:[0.16198428]
11 loss is [2.19064871], parameter b0:[0.51191934] and b1:[0.162067]
12 loss is [2.19052833], parameter b0:[0.51222996] and b1:[0.16211734]
13 loss is [2.19045517], parameter b0:[0.51241891] and b1:[0.16214797]
14 loss is [2.1904107], p