**Gradient descent** is an iterative optimization method, widely used in machine learning, for finding a (local) minimum of a function. Imagine trying to find the lowest point in a valley while blindfolded: you feel the slope under your feet and take small steps downhill. Gradient descent works the same way. It repeatedly adjusts a model's parameters in small steps in the direction opposite to the gradient, which is the direction that reduces the error (or cost) most quickly. The size of each step is controlled by the learning rate.
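
The idea can be sketched in a few lines for a single variable. This is a minimal illustration, not the regressor built later in this notebook: the function name `gradient_descent_1d` and the example function f(x) = (x - 3)^2 are assumptions chosen for clarity.

```python
def gradient_descent_1d(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        # Move downhill: subtract the gradient scaled by the learning rate
        x = x - learning_rate * grad(x)
    return x


# Hypothetical example: f(x) = (x - 3)^2 has gradient 2 * (x - 3)
# and its minimum at x = 3.
x_min = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # very close to 3
```

With this learning rate, each step shrinks the distance to the minimum by a constant factor, so the iterate settles near 3 well before the 100 steps are used up.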

There are **three main types of gradient descent** based on how much data you use to compute the "downhill" direction:

1. **Batch Gradient Descent**:
   - In batch gradient descent, the entire dataset is used to compute the gradient (downhill direction) before taking a step.
   - **Pros**: Each step uses the exact gradient of the training loss, so the updates are stable rather than noisy.
   - **Cons**: Slow on large datasets, because every single update requires a pass over all of the data.

2. **Stochastic Gradient Descent (SGD)**:
   - Instead of using the entire dataset, SGD uses only one data point at a time to compute the gradient and take a step.
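   - **Pros**: Each update is cheap, so the parameters start improving after a single data point has been seen.
   - **Cons**: Each gradient is a noisy estimate based on one sample, so the path zig-zags and the loss can fluctuate instead of decreasing steadily.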

3. **Mini-Batch Gradient Descent**:
   - A balance between batch and stochastic gradient descent. It uses a small random batch of data points to compute the gradient and update the parameters.
   - **Pros**: Cheaper per update than batch gradient descent and less noisy than SGD.
   - **Cons**: The batch size is an extra hyperparameter to choose, and it affects both speed and stability.

In summary:
- **Batch**: All data at once (slow but accurate).
- **SGD**: One data point at a time (fast but noisy).
- **Mini-Batch**: A few data points at a time (a balance).
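
In code, the three variants can share one training loop and differ only in the batch size. The sketch below assumes linear regression with a mean squared error loss; `minibatch_gd`, its default values, and the synthetic data are hypothetical and used only for illustration.

```python
import numpy as np


def minibatch_gd(X, y, batch_size, learning_rate=0.05, epochs=200, seed=0):
    """Linear regression by gradient descent on mean squared error.

    batch_size == len(X) gives batch gradient descent,
    batch_size == 1 gives SGD, anything in between gives mini-batch.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    coef = np.zeros(n_features)
    intercept = 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle the rows every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            residual = y[idx] - (X[idx] @ coef + intercept)
            # Gradient of the mean squared error over this batch only
            intercept -= learning_rate * (-2 * residual.mean())
            coef -= learning_rate * (-2 * residual @ X[idx] / len(idx))
    return coef, intercept


# Synthetic, noise-free data (an assumption): y = 2*x0 - 1*x1 + 0.5
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 2))
y_demo = X_demo @ np.array([2.0, -1.0]) + 0.5

for name, size in [("batch", 200), ("mini-batch", 32), ("SGD", 1)]:
    coef, intercept = minibatch_gd(X_demo, y_demo, batch_size=size)
    print(name, coef, intercept)
```

Because the demo data has no noise, all three settings should recover roughly the same parameters. On real data, the smaller batch sizes would be expected to fluctuate around the solution rather than settle exactly.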

## Using the built-in sklearn class

In [1]:
from sklearn.datasets import load_diabetes
import numpy as np

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

In [2]:
X, y = load_diabetes(return_X_y=True)

In [3]:
print(X.shape)
print(y.shape)

(442, 10)
(442,)


In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

In [5]:
reg = LinearRegression()
reg.fit(X_train, y_train)

In [6]:
print(reg.coef_)
print(reg.intercept_)

[  -9.15865318 -205.45432163  516.69374454  340.61999905 -895.5520019
  561.22067904  153.89310954  126.73139688  861.12700152   52.42112238]
151.88331005254167


In [7]:
y_pred = reg.predict(X_test)
r2_score(y_test, y_pred)

0.4399338661568968

## Creating our own class

In [8]:
class GDRegressor:
    """Linear regression fitted with batch gradient descent on mean squared error."""

    def __init__(self, learning_rate=0.01, epochs=100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs

    def fit(self, X_train, y_train):
        # Start with the intercept at 0 and every coefficient at 1
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])

        for _ in range(self.epochs):
            # Predictions for the whole training set (batch gradient descent)
            y_hat = np.dot(X_train, self.coef_) + self.intercept_

            # Gradient of the mean squared error with respect to the intercept
            intercept_der = -2 * np.mean(y_train - y_hat)
            self.intercept_ = self.intercept_ - (self.lr * intercept_der)

            # Gradient with respect to the coefficients, using the same y_hat
            coef_der = -2 * np.dot((y_train - y_hat), X_train) / X_train.shape[0]
            self.coef_ = self.coef_ - (self.lr * coef_der)

        print(self.intercept_, self.coef_)

    def predict(self, X_test):
        return np.dot(X_test, self.coef_) + self.intercept_
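
The two derivatives used in `fit` can be derived from the mean squared error. This is a sketch that assumes the loss is the MSE over the $n$ training rows, which is consistent with the `-2 * np.mean(...)` term in the code. The symbols $w$, $b$, $\eta$ and $n$ are notation introduced here for `coef_`, `intercept_`, `lr` and `X_train.shape[0]`.

$$
L(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,
\qquad \hat{y}_i = x_i^\top w + b
$$

$$
\frac{\partial L}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right),
\qquad
\nabla_w L = -\frac{2}{n} X^\top \left(y - \hat{y}\right)
$$

$$
b \leftarrow b - \eta \, \frac{\partial L}{\partial b},
\qquad
w \leftarrow w - \eta \, \nabla_w L
$$

Both gradients are evaluated with the same $\hat{y}$, computed once at the start of each epoch, before either parameter is updated.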

In [9]:
gdr = GDRegressor(epochs=1000, learning_rate=0.6)

In [10]:
gdr.fit(X_train, y_train)

151.98879593353988 [   7.81553013 -186.49510734  506.45449714  330.72279448  -45.79443697
 -123.85023287 -193.69457562   98.53559341  470.95977477   88.65520161]


In [11]:
y_pred = gdr.predict(X_test)

In [12]:
r2_score(y_test, y_pred)

0.4520736256827783
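
The coefficients printed by `GDRegressor` are not the same as those from `LinearRegression`, even though the two R2 scores are close. One way to investigate, offered here as a sketch rather than as part of the original notebook, is to record the training loss at every epoch and check whether it has levelled off. The function `gd_loss_history` and the synthetic data below are hypothetical; the update rule mirrors the one in `GDRegressor.fit`.

```python
import numpy as np


def gd_loss_history(X, y, learning_rate=0.1, epochs=100):
    """Run batch gradient descent and return the MSE measured at each epoch."""
    coef = np.ones(X.shape[1])  # same initialisation as GDRegressor
    intercept = 0.0
    history = []
    for _ in range(epochs):
        residual = y - (X @ coef + intercept)
        history.append(np.mean(residual ** 2))  # loss before this update
        intercept -= learning_rate * (-2 * residual.mean())
        coef -= learning_rate * (-2 * residual @ X / X.shape[0])
    return np.array(history)


# Synthetic, noise-free data for illustration only (not the diabetes dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0

history = gd_loss_history(X_demo, y_demo)
print(history[0], history[-1])
```

Calling the same function with `X_train`, `y_train`, `learning_rate=0.6` and `epochs=1000` would show whether the loss is still decreasing at the end of training. If it is, more epochs might bring the coefficients closer to the `LinearRegression` values.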