## In Regression: We predict continuous values
## In Classification: We predict discrete values

## Linear Regression: We want to approximate the data with a linear function

## Approximation:
$$ \hat{y} = w \cdot x + b $$

### Where 'w' are the weights (the slope, for a single feature) and 'b' is the bias, i.e. the shift along the y-axis
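
As a small illustration of the approximation above, the sketch below evaluates the line for a handful of inputs. The values of `w`, `b` and `x` are made up for the example and do not come from any dataset.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration
w = 2.0   # weight (slope)
b = 1.0   # bias (shift along the y-axis)
x = np.array([0.0, 1.0, 2.0, 3.0])

# Linear approximation y_hat = w * x + b, applied element-wise
y_hat = w * x + b
print(y_hat)  # [1. 3. 5. 7.]
```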

## Cost Function

$$ MSE = J(w, b) = \frac{1}{N} \sum_{i=1}^{N} (y_i - (w \cdot x_i + b))^2 $$
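
A minimal sketch of this cost function for a single feature, assuming NumPy arrays for the data. The helper name `mse` and the toy data are assumptions made for the example.

```python
import numpy as np

def mse(x, y, w, b):
    """Mean squared error of the line y_hat = w * x + b on the data (x, y)."""
    residuals = y - (w * x + b)
    return np.mean(residuals ** 2)

# Toy data generated by y = 2x + 1 (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

print(mse(x, y, w=2.0, b=1.0))  # 0.0, the line fits the data exactly
print(mse(x, y, w=0.0, b=0.0))  # 21.0, i.e. (1 + 9 + 25 + 49) / 4
```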

$$ \nabla J(w,b) = \begin{bmatrix}
                    \frac{\partial J}{\partial w} \\
                    \frac{\partial J}{\partial b}
                    \end{bmatrix} $$

$$ \nabla J(w,b) = \begin{bmatrix}
                     \frac{1}{N} \sum_{i=1}^{N} -2x_i(y_i - (w \cdot x_i + b)) \\
                     \frac{1}{N} \sum_{i=1}^{N} -2(y_i - (w \cdot x_i + b))
                     \end{bmatrix}
$$
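
One way to sanity-check the gradient formulas is to compare them with a finite-difference estimate of the cost. The sketch below does this for a single feature. The helper names `mse` and `mse_gradient`, the toy data, and the evaluation point are assumptions made for the example.

```python
import numpy as np

def mse(x, y, w, b):
    """Mean squared error of the line y_hat = w * x + b."""
    return np.mean((y - (w * x + b)) ** 2)

def mse_gradient(x, y, w, b):
    """Analytic gradient (dJ/dw, dJ/db) of the MSE for a single feature."""
    residuals = y - (w * x + b)
    dw = np.mean(-2.0 * x * residuals)
    db = np.mean(-2.0 * residuals)
    return dw, db

# Toy data and an arbitrary point (w, b), for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
w, b = 0.5, -1.0

dw, db = mse_gradient(x, y, w, b)

# Central finite differences as an independent numerical estimate
eps = 1e-6
dw_num = (mse(x, y, w + eps, b) - mse(x, y, w - eps, b)) / (2 * eps)
db_num = (mse(x, y, w, b + eps) - mse(x, y, w, b - eps)) / (2 * eps)

print(np.isclose(dw, dw_num), np.isclose(db, db_num))
```

If the analytic and numerical values agree, the derivation is very likely correct.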

                     
         
## Gradient Descent:
### It's an iterative method for finding the minimum of the cost function
### We start from an initial guess for 'w' and 'b' (zeros or random values) and repeatedly step in the direction of steepest descent, i.e. the negative gradient, to approach the minimum

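## With each iteration we apply the update rules:

$$ w_2 = w_1 - \alpha \cdot dw $$
$$ b_2 = b_1 - \alpha \cdot db $$
### Where alpha is the learning rate, 'dw' and 'db' are the partial derivatives of J with respect to 'w' and 'b', and the subscripts 1 and 2 denote the values before and after the update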
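
The update rules can be sketched as a plain loop for a single feature. This is a minimal illustration, not a tuned implementation: the toy data, the learning rate `alpha = 0.05` and the iteration count are assumptions picked so that this small example converges.

```python
import numpy as np

# Toy data generated by y = 2x + 1 (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

w, b = 0.0, 0.0   # initial guess
alpha = 0.05      # learning rate (assumed value for this example)
n_iters = 2000    # number of update steps (assumed value)

for _ in range(n_iters):
    residuals = y - (w * x + b)
    # Partial derivatives of the MSE with respect to w and b
    dw = np.mean(-2.0 * x * residuals)
    db = np.mean(-2.0 * residuals)
    # Update rules: step against the gradient
    w = w - alpha * dw
    b = b - alpha * db

# Should end up close to the generating values w = 2 and b = 1
print(round(w, 3), round(b, 3))
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small needs many more iterations to get close to the minimum.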
                  


In [1]:
import numpy as np

In [2]:
class LinearRegression:
    
    def __init__(self, lr = 0.001, n_iters = 1000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None
        
    def fit(self, X, y):
        # Train with gradient descent; X has shape (n_samples, n_features), y has shape (n_samples,)
        n_samples, n_features = X.shape
        # Initialize the parameters with zeros (small random values would also work)
        self.weights = np.zeros(n_features)
        self.bias = 0

        # Iterative gradient descent
        for _ in range(self.n_iters):
            # Current prediction: y_hat = X w + b
            y_predicted = np.dot(X, self.weights) + self.bias

            # Gradients of the MSE; the constant factor 2 from the derivative
            # is dropped here, which only rescales the learning rate
            dw = (1/n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1/n_samples) * np.sum(y_predicted - y)

            # Update rules: step against the gradient
            self.weights -= self.lr * dw
            self.bias -= self.lr * db
    
    def predict(self, X):
        y_predicted = np.dot(X, self.weights) + self.bias
        return y_predicted
