# Introduction

This project is my attempt to build intuition for how ML algorithms work, starting with Linear Regression. For the OOP part I got some help from Python Engineer's YouTube [video](https://www.youtube.com/watch?v=rLOyrWV8gmA). Both the Gradient Descent and Cost Function implementations are my solutions to assignments from Andrew Ng's Coursera Machine Learning course. This project and the ones that follow will serve as study notebooks for myself, to help me remember how the algorithms work.

Let's go with Linear Regression.


# Linear Regression

Linear regression (henceforth LR) is a modelling approach for determining the relationship between a dependent variable (y) and one or more independent variables (X). If X contains a single variable, we call this Simple or Univariate LR. If X has more variables, the process is called Multiple LR.




### LR Hypothesis

$$ h_\theta(x) = \theta_0 + \theta_1x_1 $$

Where:

* $h_\theta(x)$ is the hypothesis

* $x$ is the independent variable

* $\theta$ are the LR parameters to be learnt (note that in Multiple LR, the hypothesis gains further terms up to $\theta_n x_n$)
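
As a minimal sketch of the hypothesis above, the snippet below evaluates $h_\theta(x)$ with NumPy for the univariate case and for a two-feature case. The function name `hypothesis` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def hypothesis(X, theta):
    """Evaluate h_theta(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n.

    X     : array of shape (m, n), one row per data point (hypothetical input)
    theta : array of shape (n + 1,), where theta[0] is the intercept
    """
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    return theta[0] + X @ theta[1:]

# Univariate LR: one feature, h(x) = 1 + 2*x
X_simple = np.array([[0.0], [1.0], [2.0]])
pred_simple = hypothesis(X_simple, [1.0, 2.0])
print(pred_simple)  # [1. 3. 5.]

# Multiple LR: two features, h(x) = 1 + 2*x1 - 1*x2
X_multi = np.array([[1.0, 1.0], [2.0, 0.0]])
pred_multi = hypothesis(X_multi, [1.0, 2.0, -1.0])
print(pred_multi)  # [2. 5.]
```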



### Gradient Descent / derivatives

Update rules (GD is an iterative algorithm, so these are applied at every iteration):

$$ w = w - \alpha * dw $$

$$ b = b - \alpha * db $$ 


Mathematical formulas, where $\hat y_i = wx_i + b$ is the prediction for sample $i$:

$$ \frac{dJ}{dw} = dw = \frac{1}{N} \sum_{i=1}^{N} -2x_i\left(y_i - (wx_i + b)\right) = \frac{1}{N} \sum_{i=1}^{N} 2x_i(\hat y_i - y_i) $$

$$ \frac{dJ}{db} = db = \frac{1}{N} \sum_{i=1}^{N} -2\left(y_i - (wx_i + b)\right) = \frac{1}{N} \sum_{i=1}^{N} 2(\hat y_i - y_i) $$
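
One way to sanity-check the two derivative formulas is to compare them against finite differences. The sketch below assumes the cost is the plain mean squared error $J = \frac{1}{N}\sum_i (y_i - (wx_i + b))^2$, which is what the factor 2 in the formulas implies. The data and the helper names `mse` and `analytic_gradients` are made up for this check.

```python
import numpy as np

def mse(w, b, x, y):
    """Mean squared error J = (1/N) * sum((y_i - (w*x_i + b))**2)."""
    return np.mean((y - (w * x + b)) ** 2)

def analytic_gradients(w, b, x, y):
    """dw and db from the formulas above (factor 2 included)."""
    y_hat = w * x + b
    dw = np.mean(2 * x * (y_hat - y))
    db = np.mean(2 * (y_hat - y))
    return dw, db

# Small made-up data set, used only for this check
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 4.0, 8.0])
w, b, eps = 0.5, -0.2, 1e-6

dw, db = analytic_gradients(w, b, x, y)

# Central finite differences approximate the same partial derivatives
dw_num = (mse(w + eps, b, x, y) - mse(w - eps, b, x, y)) / (2 * eps)
db_num = (mse(w, b + eps, x, y) - mse(w, b - eps, x, y)) / (2 * eps)

print(dw, dw_num)  # the two values should agree closely
print(db, db_num)
```

The `LinRegression` class further down computes the same gradients without the constant 2, which only rescales the effective learning rate.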
                     
                                         





### Cost Function and Gradient Descent

A cost function (also known as a loss or error function) maps the values of one or more variables onto a real number that intuitively represents the "cost" of those values. An optimization problem seeks to minimize this function.

The objective of linear regression is to minimize the cost function, which means getting the predictions as close as possible to the real values.

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left( h_{\theta}(x^{(i)}) - y^{(i)}\right)^2$$

Recall that the parameters of your model are the $\theta_j$ values. These are
the values you will adjust to minimize cost $J(\theta)$. One way to do this is to
use the batch gradient descent algorithm. In batch gradient descent, each
iteration performs the update

$$ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad \text{simultaneously update } \theta_j \text{ for all } j$$

With each step of gradient descent, your parameters $\theta_j$ come closer to the optimal values that achieve the lowest cost $J(\theta)$, provided the learning rate $\alpha$ is not too large.
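
As a short worked step, the update rule above is simply $\theta_j := \theta_j - \alpha \, \partial J / \partial \theta_j$. This assumes the convention $x_0^{(i)} = 1$, so that $\theta_0$ is treated like every other parameter, which matches the $(m, n+1)$ input shape used in the code below. Differentiating $J(\theta)$ term by term gives:

$$ \frac{\partial J}{\partial \theta_j} = \frac{1}{2m} \sum_{i=1}^m 2\left( h_\theta(x^{(i)}) - y^{(i)}\right) \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} $$

The factor $\frac{1}{2}$ in $J(\theta)$ cancels the 2 produced by differentiating the square, which is why no 2 appears in the update.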

In [50]:
import numpy as np

class LinRegression():
    def __init__(self, learning_rate = 0.001, n_iters = 1000):
        self.learning_rate = learning_rate
        self.n_iters = n_iters
        self.weights = None #coefficients
        self.bias = None # bias 
        
    def fit(self, X, y):
        self.weights = np.zeros(X.shape[1])
        self.bias = 0
        
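        # gradient descent: repeat n_iters update steps
        for _ in range(self.n_iters):
            # current predictions, shape (n_samples,)
            y_prediction = np.dot(X, self.weights) + self.bias

            # gradients of the cost with respect to weights and bias
            # (constant factor 2 dropped; it only rescales the learning rate)
            dw = (1/len(X)) * np.dot(X.T, y_prediction - y)
            db = (1/len(X)) * np.sum(y_prediction - y)

            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db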
    
    
    def predict(self, X):
        y_prediction = np.dot(X, self.weights) + self.bias
        return y_prediction

In [51]:
# Cost function; this code is part of Andrew Ng's Coursera assignments

def cost_function(X, y, theta):
    """Compute the cost for linear regression.

    Computes the cost of using `theta` as the parameters
    for linear regression to fit the data points in X and y.

    Parameters
    ----------
    X : array_like
        The input array of shape (m, n+1), where m is the number of data points (rows)
        and n is the number of features (columns).

    y : array_like
        The target vector of shape (m, ).

    theta : array_like
        Parameters for regression, a vector of shape (n+1, ).

    Returns
    -------
    J : float
        The value of the cost function.
    """
    m = y.shape[0]  # number of training examples

    J = np.sum((np.dot(X, theta) - y)**2) / (2*m)

    return J
    

In [52]:
# Gradient Descent; this one is also part of Andrew Ng's Coursera assignments

def GradientDescent(X, y, theta, alpha, num_iters):
    """
    Performs gradient descent to learn `theta`. Updates theta by taking `num_iters`
    gradient steps with learning rate `alpha`.
    
    Parameters
    ----------
    X : array_like
        The input dataset of shape (m x n+1).
    
    y : array_like
        Value at given features. A vector of shape (m, ).
    
    theta : array_like
        Initial values for the linear regression parameters. 
        A vector of shape (n+1, ).
    
    alpha : float
        The learning rate.
    
    num_iters : int
        The number of iterations for gradient descent. 
    
    Returns
    -------
    theta : array_like
        The learned linear regression parameters. A vector of shape (n+1, ).
    
    J_history : list
        A python list for the values of the cost function after each iteration.
    
    Instructions
    ------------
    Perform a single gradient step on the parameter vector theta.

    While debugging, it can be useful to print out the values of 
    the cost function (cost_function) and gradient here.
    """
    # Initialize some useful values
    m = y.shape[0]  # number of training examples
    
    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()
    
    J_history = [] # Use a python list to save cost in every iteration
    
    for i in range(num_iters):
        
        theta = theta - (alpha / m)* (np.dot(X, theta) - y).dot(X)
        
        # save the cost J in every iteration
        J_history.append(cost_function(X, y, theta))
    
    return theta, J_history
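
A hedged usage sketch for the two functions above: since they expect `X` of shape (m, n+1), a column of ones is prepended so that `theta[0]` plays the role of the intercept. To keep the snippet runnable on its own, the functions are re-declared compactly under the hypothetical names `compute_cost` and `run_gradient_descent`. The data is made up.

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum((X @ theta - y)**2)."""
    m = y.shape[0]
    return np.sum((X @ theta - y) ** 2) / (2 * m)

def run_gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent; returns learned theta and the cost history."""
    m = y.shape[0]
    theta = theta.copy()
    history = []
    for _ in range(num_iters):
        theta = theta - (alpha / m) * (X @ theta - y) @ X
        history.append(compute_cost(X, y, theta))
    return theta, history

# Made-up noiseless data generated from y = 4 + 3*x
x = np.linspace(0.0, 1.0, 20)
y = 4.0 + 3.0 * x

# Prepend a column of ones so theta[0] acts as the intercept: shape (m, n+1)
X = np.column_stack([np.ones_like(x), x])

theta, history = run_gradient_descent(X, y, np.zeros(2), alpha=0.5, num_iters=2000)
print(theta)                     # close to [4, 3]
print(history[0] > history[-1])  # True: the cost went down
```

Plotting `history` against the iteration number is a common way to check that the chosen `alpha` makes the cost decrease.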

In [53]:
# Testing

In [54]:
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
X,y, coef = datasets.make_regression(n_samples=100, n_features=1,
                                      n_informative=1, noise=10,
                                      coef=True, random_state=0)

X_train = X[:70]
X_test = X[70:]
y_train = y[:70]
y_test = y[70:]

In [55]:
lin_reg = LinearRegression()
lin_reg.fit(X_train,y_train)
y_pred = lin_reg.predict(X_test)

In [56]:
lr = LinRegression(learning_rate=0.1)
lr.fit(X_train, y_train)
y_pred_self = lr.predict(X_test)

In [57]:
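# r2_score expects (y_true, y_pred); note that the second call passes the arguments
# in the opposite order, so the two scores are not computed the same way
r2_score(y_test,y_pred), r2_score(y_pred_self, y_test)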

(0.9339148813115155, 0.94305198251888)
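
`r2_score` takes its arguments as `(y_true, y_pred)`, and the result generally changes when the two are swapped, because the total sum of squares is computed from the first argument. The sketch below re-implements the standard $R^2$ formula in NumPy on made-up numbers. The helper `r2` is an illustrative assumption, written to mirror the definition rather than to call scikit-learn.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (SS_tot uses y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up targets and predictions
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.0, 2.5, 3.0])

score_correct = r2(y_true, y_hat)  # about 0.7
score_swapped = r2(y_hat, y_true)  # about -0.2
print(score_correct, score_swapped)
```

The same predictions produce very different scores depending on the argument order, so comparisons are only meaningful when every call uses the same order.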

In [58]:
# learning rates to try (despite the name, these are not model parameters)
theta = [0.001, 0.005, 0.01, 0.1, 1]

In [59]:
for x in theta:
    lr = LinRegression(learning_rate = x)
    lr.fit(X_train, y_train)
    y_pred_self = lr.predict(X_test)
    print(x, ':', r2_score(y_pred_self, y_test))

0.001 : 0.753036416960501
0.005 : 0.9436957652068035
0.01 : 0.9430581802945025
0.1 : 0.94305198251888
1 : 0.9430519825188799


In [60]:
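# Note: x is a scalar taken from the learning-rate list above, not a fitted parameter
# vector, so np.dot(X_train, x) has shape (70, 1) and broadcasts against y_train (70,)
for x in theta:
    print(x, ':', cost_function(X_train,y_train,x))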

0.001 : 77270.21077245871
0.005 : 77270.07250460779
0.01 : 77269.90139478355
0.1 : 77267.14916593912
1 : 77273.78166822693
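
The costs above are large and barely change between values. A likely reason is NumPy broadcasting: when `theta` is a scalar and `X` has shape (m, 1), the residual becomes an (m, m) matrix instead of an (m,) vector. The sketch below reproduces the effect on made-up data where the fit is perfect. `cost_function` is repeated here only so the snippet runs on its own.

```python
import numpy as np

def cost_function(X, y, theta):
    """Same formula as the notebook's cost_function, repeated so this runs alone."""
    m = y.shape[0]
    return np.sum((np.dot(X, theta) - y) ** 2) / (2 * m)

# Made-up data: 5 points, 1 feature, exactly y = 2*x
X = np.arange(5, dtype=float).reshape(-1, 1)  # shape (5, 1)
y = 2.0 * X[:, 0]                             # shape (5,)

# Scalar theta: np.dot(X, 2.0) keeps shape (5, 1), and (5, 1) - (5,) broadcasts
residual_scalar = np.dot(X, 2.0) - y
cost_scalar = cost_function(X, y, 2.0)
print(residual_scalar.shape)  # (5, 5)
print(cost_scalar)            # 40.0, although the fit is perfect

# Vector theta of shape (n,): np.dot returns shape (5,), so residuals line up
residual_vector = np.dot(X, np.array([2.0])) - y
cost_vector = cost_function(X, y, np.array([2.0]))
print(residual_vector.shape)  # (5,)
print(cost_vector)            # 0.0
```

Passing a parameter vector whose length matches the number of columns in `X` (including the column of ones, if one is used) avoids the unintended broadcast.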
