***

# Gradient Descent 

***

#### What is it?

> Gradient descent is an iterative optimization algorithm for finding a local minimum of a differentiable function.

#### How does it work in machine learning?

> Gradient descent is an optimization algorithm used to find the values of the coefficients (parameters) of a function (f) that minimize a cost function. A cost function measures the performance of a machine learning model by quantifying the error between true and predicted values. The goal is to minimize the cost function, which reduces the errors and improves model performance. In this sense, gradient descent is what trains an ML model.

![gradientdesc.PNG](attachment:gradientdesc.PNG)

> For example, given the cost function: 

![cost2.PNG](attachment:cost2.PNG)

> The gradients can be calculated as:

![gradients.PNG](attachment:gradients.PNG)
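
> As a sanity check on these formulas, here is a minimal sketch that assumes the cost is the mean squared error of a line y = mx + b, which matches the derivatives used in the update code of this section. The function names `cost`, `gradients`, and `numeric_gradients` and the sample data are illustrative, not part of the original. It compares the analytic gradients with a finite-difference estimate.

```python
def cost(m, b, X, Y):
    """Mean squared error of the line y = m*x + b over the data (assumed cost)."""
    n = len(X)
    return sum((Y[i] - (m * X[i] + b)) ** 2 for i in range(n)) / n


def gradients(m, b, X, Y):
    """Analytic partial derivatives of the mean squared error w.r.t. m and b."""
    n = len(X)
    dm = sum(-2 * X[i] * (Y[i] - (m * X[i] + b)) for i in range(n)) / n
    db = sum(-2 * (Y[i] - (m * X[i] + b)) for i in range(n)) / n
    return dm, db


def numeric_gradients(m, b, X, Y, eps=1e-6):
    """Central finite differences, used only to check the analytic formulas."""
    dm = (cost(m + eps, b, X, Y) - cost(m - eps, b, X, Y)) / (2 * eps)
    db = (cost(m, b + eps, X, Y) - cost(m, b - eps, X, Y)) / (2 * eps)
    return dm, db


# Illustrative data generated from y = 2x + 1
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.0, 5.0, 7.0, 9.0]

analytic = gradients(0.5, 0.0, X, Y)
numeric = numeric_gradients(0.5, 0.0, X, Y)
print("analytic:", analytic)
print("numeric: ", numeric)
```

> The two results should agree to several decimal places, and both gradients vanish at m = 2, b = 1, where the line fits this data exactly.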

> Below are the steps for gradient descent implemented in a machine learning algorithm:

        - 1. Initialize the weights with random values
        - 2. Calculate the gradient of the error/cost w.r.t. the weights
        - 3. Adjust the weights in the direction opposite to the gradient, moving toward the values where the error is minimized
        - 4. Use the new weights for prediction and calculate the new error/cost
        - 5. Repeat until convergence, i.e. until further adjustments to the weights do not significantly reduce the error
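
> The steps above can be sketched as a small training loop. This is one possible arrangement rather than a definitive implementation: it assumes a line y = mx + b with a mean squared error cost, and the function name `train`, its parameters (`max_iters`, `tol`, `seed`), and the sample data are all hypothetical.

```python
import random


def train(X, Y, learning_rate=0.05, max_iters=10000, tol=1e-12, seed=0):
    """Fit y = m*x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    n = len(X)

    # Step 1: initialize the weights with random values.
    m, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    prev_cost = float("inf")
    cost = prev_cost

    for _ in range(max_iters):
        # Step 2: gradient of the cost with respect to m and b.
        errors = [Y[i] - (m * X[i] + b) for i in range(n)]
        dm = sum(-2 * X[i] * errors[i] for i in range(n)) / n
        db = sum(-2 * errors[i] for i in range(n)) / n

        # Step 3: move the weights against the gradient.
        m -= learning_rate * dm
        b -= learning_rate * db

        # Step 4: recompute the cost with the new weights.
        cost = sum((Y[i] - (m * X[i] + b)) ** 2 for i in range(n)) / n

        # Step 5: stop once the cost no longer changes noticeably.
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost

    return m, b, cost


# Illustrative data generated from y = 2x + 1
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.0, 5.0, 7.0, 9.0]

m, b, final_cost = train(X, Y)
print(m, b, final_cost)
```

> On this data the loop should recover a slope close to 2 and an intercept close to 1. The stopping rule used here (change in cost below `tol`) is only one common choice; a threshold on the gradient norm would work as well.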

> The learning rate, alpha, determines the step size for each iteration.

        - If the learning rate is optimal, the model converges to the minimum quickly
        - If the learning rate is too small, the model still converges but takes many more iterations to reach the minimum
        - If the learning rate is somewhat higher than the optimal value, the updates overshoot the minimum but still converge
        - If the learning rate is very high, the updates overshoot and diverge from the minimum
        
![lr.PNG](attachment:lr.PNG)
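
> These four regimes can be reproduced on a toy problem. The sketch below assumes the function f(x) = x², whose gradient is 2x and whose minimum is at x = 0. The helper name `descend` and the specific learning rates are illustrative choices, not values from the original.

```python
def descend(learning_rate, x0=1.0, steps=20):
    """Run gradient descent on f(x) = x**2 (gradient 2*x) and return the final x."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x


# Each update multiplies x by (1 - 2 * learning_rate), so for this function:
#   0.01 -> factor  0.98: converges, but slowly
#   0.5  -> factor  0.0 : reaches the minimum in a single step
#   0.9  -> factor -0.8 : overshoots (sign flips) but still converges
#   1.1  -> factor -1.2 : overshoots and diverges
for lr in (0.01, 0.5, 0.9, 1.1):
    print(f"learning rate {lr}: x after 20 steps = {descend(lr):.4f}")
```

> The thresholds between these regimes depend on the curvature of the cost function, so the specific values here apply only to this toy example.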

> Finally, below is the code for a single gradient descent update step for a line y = mx + b:

In [1]:
def updateWeights(m, b, X, Y, learningRate):
    """Perform one gradient descent step for the line y = m*x + b under mean squared error."""
    dw = 0
    db = 0
    n = len(X)

    for i in range(n):
        # Accumulate the partial derivatives of the squared error w.r.t. the weight m and bias b
        error = Y[i] - (m*X[i] + b)
        dw += -2*X[i] * error
        db += -2*error

    # Update the weight and bias by stepping against the average gradient
    m -= (dw / float(n)) * learningRate
    b -= (db / float(n)) * learningRate

    return m, b
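
> The same update can also be written without an explicit loop over samples. The sketch below assumes NumPy is available, which the original does not use. The name `update_weights_vectorized` and the sample data are hypothetical. It also shows how a single-step function is typically called repeatedly to train the model.

```python
import numpy as np


def update_weights_vectorized(m, b, X, Y, learning_rate):
    """One gradient descent step for y = m*x + b under mean squared error."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    residuals = Y - (m * X + b)

    # Average partial derivatives of the squared error w.r.t. m and b
    dm = np.mean(-2 * X * residuals)
    db = np.mean(-2 * residuals)

    return m - learning_rate * dm, b - learning_rate * db


# Illustrative data generated from y = 2x + 1
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.0, 5.0, 7.0, 9.0]

m, b = 0.0, 0.0
for _ in range(2000):
    m, b = update_weights_vectorized(m, b, X, Y, learning_rate=0.05)
print(m, b)
```

> With this data and learning rate, the parameters should approach m = 2 and b = 1. The number of iterations is fixed here for simplicity; in practice a convergence check would usually replace it.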