## 1. Linear Regression using Gradiant Decent algorithm

In linear regression there are $n$ input features and $1$ output feature that we want to predict. 
So visually what we're trying to do, is that we want to find a line \_n-dimensional\_ that "fits" data points. or more presicely, is as close as possible to all data points.

The description above gives us an idea about what we're looking for but it can't be considered as a problem statement.



### 1.1 What is the problem?
Making a clear problem statement is as important as solving the problem.
for that matter, let's note that we can model our input features with a vector $X^{(i)}$ in a vector space $V$ and output is a scaler $y^{(i)}$ in the field $F$ in which we define the vector space. here $V = \mathbb{R}^n$ and $F = \mathbb{R}$ and superscript $(i)$ indicates sample index.

Now, let's assume that there is a function $h^*: V \rightarrow F$ that maps these points to a scaler and is the function that exactly "fits" all datapoints. So $h^*(X^{(i)})$. Obviously it's not neccesserily linear or any other form.

We define $h$ as the *hypothesis* \_an estimation of $h^*$\_ given the constraints that $h$ is a linear function.

As it's known that every linear function can be represented with a vector of coefficients, the problem of finding $h$ is equivalent to finding it's vector of coefficients,
which is represented by $\Theta = [\theta_{1},\dots, \theta_{n}]$ . So it's convinient to write $h_{\Theta}$ instead of just $h$.

now we're ready to write the problem statement.

#### statement 1:
> Given value of $h^*$ for m points/vectors $X^{(1)}, \dots,X^{(m)}$,  find a linear function $h_{\Theta}$ which estimates $h^*$.

From the terms "as close as possible" (1.) and "estimates" (1.2) it's not very clear what we should do. In order to define a better metric for that, which means how good some function $h_{\Theta}$ is we define a cost function $J : V \rightarrow F$ as:
$$ J(\Theta) = \frac{1}{2m} \sum_{i=1}^m (h_{\Theta}(X^{(i)})-y^{(i)})^2 $$
So we want to find some $h_{\Theta}$ which minimizes the value of $J$.

now we can update the problem statement as:

#### statement 2:
> Given value of $h^*$ for m points/vectors $X^{(1)}, \dots,X^{(m)}$,  find a linear function $h_{\Theta}$ for which $J(\Theta)$ is minimized.


In [8]:
# import libraries
import numpy as np
import pandas as pd
import math
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LinearRegression
# import matplotlib.pyplot as plt

In [9]:
def h(self, x):
    res = 0
    # add fake feature
    x.append(1)
    for i in range(self.n):
        res += x[i]*self.theta[i]
    return res

def J(self):
    res = 0
    for i in range(self.m):
        res += (self.h(self.X[i])-self.y[i])**2
    res /= 2*self.m
    return res

### 1.2 Solution
We use gradiant decent algorithm to find some $\Theta$ which is local optimum for $J$.

This is an overview on how this algorithm works:

Let's assume we have $\Theta_{1}$ as first hypothesis. we can initialize this to some random vector.

We claim that $J(\Theta_{2}) \leq J(\Theta_{1})$ for $\Theta_{2} = \Theta_{1} - \eta \nabla J(\Theta_{1})$.

Doing this $p-1$ times we'll end up with a seqence $\Theta_{1}, \dots, \Theta_{p}$ and each one is a better estimation than the previous one.


Next we calculate gradiant of cost funcion this way:
$$
\nabla J(\Theta) = [\frac{\partial}{\partial \theta_{1}} J, \dots,  \frac{\partial}{\partial \theta_{n}} J]
$$


$$
\frac{\partial}{\partial \theta_{j}} J = \frac{1}{m} \sum_{i=1}^m(h_{\Theta}(X^{(i)}) - y^{i})X^{(i)}_j
$$


In [10]:
def gradiant(self):
    gradiant_vector = [0]*self.n
    for j in range(self.n):
        for i in range(self.m):
            gradiant_vector[j] += (self.h(self.X[i])-self.y[i])*self.X[i][j]
        gradiant_vector[j] /= self.m
    return gradiant_vector
    
def gradiant_decent(self):
    for i in range(self.p):
        for j in range(self.n):
            self.theta[j] -= self.nabla*self.gradiant()[j]


### 1.3 Wrap up
This is everything put together in Regression class

In [36]:
class Regression:
    n = m = p = eta = 0
    X = Y = theta = []
    
    def __init__(self, input_dataset, output_dataset, number_of_iterations, learning_rate):
        self.X = input_dataset
        self.Y = output_dataset
        self.m = len(self.X)
        # add fake feature
        for i in range(self.m):
            self.X[i].append(1)
        self.n = len(self.X[0])
        self.p = number_of_iterations
        self.eta = learning_rate
        self.theta = [0]*self.n

    def h(self, x):
        res = 0
        for j in range(self.n):
            res += x[j]*self.theta[j]
        return res

    def J(self):
        res = 0
        for i in range(self.m):
            res += (self.h(self.X[i])-self.Y[i])**2
        res /= 2*self.m
        return res

    def gradiant(self):
        nabla = [0]*self.n
        for i in range(self.m):
            val = (self.h(self.X[i])-self.Y[i])
            for j in range(self.n):
                nabla[j] += val*self.X[i][j]
        for j in range(self.n):
            nabla[j] /= self.m
        return nabla
        
    def gradiant_decent(self):
        for k in range(self.p):
            nabla = self.gradiant()
            for j in range(self.n):
                self.theta[j] -= self.eta*nabla[j]
    
    def predict(self, x):
        x.append(1)
        res = 0
        for j in range(self.n):
            res += self.theta[j]*x[j]
        return res
            
    def run(self):
        self.gradiant_decent()

    
    def test(self, X_test, Y_test):
        y_hat = [0]*len(X_test)
        for i in range(len(X_test)):
            y_hat[i] = reg.predict(X_test[i])
            
        print(y_hat)
        print(mean_absolute_error(y_hat, Y_test))
        print(math.sqrt(mean_squared_error(y_hat, Y_test)))
        print(r2_score(y_hat, Y_test))

### 1.4 Testing with dataset
Next, we use a dataset to test everything

In [37]:
df = pd.read_csv("~/Downloads/Flight_Price_Dataset_Q2.csv")
departure_time_mapping = {
    "Early_Morning": 0,
    "Morning": 1,
    "Afternoon": 2,
    "Night": 3, 
    "Late_Night": 4
}
stops_mapping = {
    "zero": 0,
    "one": 1,
    "two_or_more": 2
}
class_mapping = {
    "Economy": 0,
    "Business": 1
}
df["departure_time"] = df["departure_time"].map(departure_time_mapping)
df["stops"] = df["stops"].map(stops_mapping)
df["arrival_time"] = df["arrival_time"].map(departure_time_mapping)
df["class"] = df["class"].map(class_mapping)

df = df.dropna()
df = df.reset_index(drop=True)
Y = df["price"]
X = df.drop("price", axis=1)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, shuffle=True)

In [34]:
learning_rate = 0.1
number_of_iterations = 100
x = [
    [1],
    [2],
    [3],
    [4]
]
y = [1, 2, 3, 4]
xt = [
    [5],
    [6],
    [7],
    [8],
    [9]
]
yt = [5, 6, 7, 8, 9]
# reg = Regression(x, y, number_of_iterations, learning_rate)

reg = Regression(X_train.values.tolist(), Y_train.values.tolist(), number_of_iterations, learning_rate)
reg.run()
print("train finished")
# test(xt, yt)

train finished


In [38]:
reg.test(X_test.values.tolist(), Y_test.values.tolist())

TypeError: object of type 'Regression' has no len()

In [None]:
regr = LinearRegression()
regr.fit(X_train.values.tolist(), Y_train.values.tolist())
y_pred = regr.predict(X_test.values.tolist())
print(mean_absolute_error(Y_test.values.tolist(), Y_pred))
print(math.sqrt(mean_squared_error(Y_test.values.tolist(), Y_pred)))
print(r2_score(y_test.values.tolist(), Y_pred))