# Linear models and Least Squares using equation
Given a vector of inputs $X^T = (X_1,X_2,...,X_p),$ we predict the output $Y$ via the model <br/>

$\hat{Y} = \hat{\beta_0} + \sum_{j=1}^pX_j\hat{\beta_j}$<br/>

The term $\hat{\beta_0}$ is the intercept, also known as the bias in machine learning. Often it is convenient to include the constant variable 1 in $X$, include $\hat{\beta_0}$ in the vector of coefficients $\hat{\beta}$, and then write the linear model in vector form as an inner product <br/>

$\hat{Y} = X^T\hat{\beta}$ <br/>

where $X^T$ denotes vector or matrix transpose ($X$ being a column vector). Here we are modeling a single output, so $\hat{Y}$ is a scalar; in general $\hat{Y}$ can be a K–vector, in which case $\beta$ would be a $p \times K$ matrix of coefficients. In the (p + 1)-dimensional input–output space, $(X, \hat{Y})$ represents a hyperplane. If the constant is included in $X$, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the $Y$-axis at the point $(0,\hat{\beta_0})$. From now on we assume that the intercept is included in $\hat{\beta}$.<br/>
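
As a small illustration of absorbing the intercept into the coefficient vector, the sketch below prepends a column of ones to a made-up input matrix. The array values and the names `X_raw`, `X_design` and `beta_hat` are assumptions chosen for the example, not part of the data used later.

```python
import numpy as np

# Hypothetical inputs: N = 3 observations, p = 2 features.
X_raw = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

# Hypothetical coefficients: intercept first, then one per feature.
beta_hat = np.array([0.5, 2.0, -1.0])

# Prepend the constant 1 so the intercept becomes part of the inner product.
X_design = np.column_stack([np.ones(len(X_raw)), X_raw])

# Vector form: one inner product per row of the design matrix.
y_hat = X_design @ beta_hat

# The same prediction written with an explicit intercept term.
y_hat_explicit = beta_hat[0] + X_raw @ beta_hat[1:]
print(y_hat, y_hat_explicit)
```

Both expressions give the same predictions, which is why the intercept can be dropped from the notation once the constant column is in place.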

Viewed as a function over the p-dimensional input space, $f(X) = X^T\beta$ is linear, and the gradient $f'(X) = \beta$ is a vector in input space that points
in the steepest uphill direction. <br/>

How do we fit the linear model to a set of training data? There are many different methods, but by far the most popular is the method of **least squares**. In this approach, we pick the coefficients $\beta$ to minimize the residual sum of squares <br/>

$RSS(\beta) = \sum_{i=1}^N(y_i - x^T_i\beta)^2$ <br/>

$RSS(\beta)$ is a quadratic function of the parameters, and hence its minimum always exists, but may not be unique. The solution is easiest to characterize
in matrix notation. We can write <br/>

$RSS(\beta) = (y - X\beta)^T(y - X\beta)$ <br/>

where $X$ is an $N \times p$ matrix with each row an input vector, and y is an N-vector of the outputs in the training set. Differentiating w.r.t. $\beta$ we get the normal equations <br/>

$X^T(y - X\beta) = 0$ <br/>
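
The differentiation step can be spelled out using only the symbols defined above. Expanding the quadratic form gives <br/>

$RSS(\beta) = y^Ty - 2\beta^TX^Ty + \beta^TX^TX\beta$ <br/>

and taking the gradient with respect to $\beta$ yields <br/>

$\frac{\partial RSS}{\partial \beta} = -2X^Ty + 2X^TX\beta = -2X^T(y - X\beta)$ <br/>

Setting this gradient to zero and dividing by $-2$ recovers the normal equations. <br/>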

If $X^TX$ is nonsingular, then the unique solution is given by <br/>

$\hat{\beta} = (X^TX)^{-1}X^Ty$ <br/>

and the fitted value at the $i$th input $x_i$ is $\hat{y_i} = \hat{y}(x_i) = x^T_i\hat{\beta}$. At an arbitrary input $x_0$ the prediction is $\hat{y}(x_0) = x^T_0\hat{\beta}$. The entire fitted surface is characterized by the p parameters $\hat{\beta}$. Intuitively, it seems that we do not need a very large data set to fit such a model.
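
The closed-form solution can be sketched on synthetic data as follows. The data set, the seed, and the names `true_beta` and `x0` are assumptions made for illustration. The sketch solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, which is a common numerical choice and not something the text above prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic training set (an assumption for illustration): N = 200, two features.
N = 200
X_raw = rng.normal(size=(N, 2))
true_beta = np.array([1.0, 2.0, -3.0])      # intercept, then two slopes
X = np.column_stack([np.ones(N), X_raw])    # constant column carries the intercept
y = X @ true_beta + 0.1 * rng.normal(size=N)

# Solve the normal equations X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# RSS in summation form and in matrix form should agree.
residual = y - X @ beta_hat
rss_sum = sum((y[i] - X[i] @ beta_hat) ** 2 for i in range(N))
rss_matrix = residual @ residual

# Prediction at a new input x_0 (leading 1 for the intercept).
x0 = np.array([1.0, 0.5, -0.5])
y0_hat = x0 @ beta_hat
print(beta_hat, rss_sum, rss_matrix, y0_hat)
```

At the solution the residual is orthogonal to every column of $X$, which is exactly what the normal equations state.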

# Linear models and Least Squares using iteration
Computing $\hat{\beta}$ from the closed-form equation $\hat{\beta} = (X^TX)^{-1}X^Ty$ can be slow, because inverting the $n \times n$ matrix $X^TX$ costs $O(n^3)$ for $n$ features. When the data has more than roughly 10000 inputs, we need an alternative method. <br/>

The method below can be used for larger inputs. With $m$ training examples and $n$ features, each iteration costs $O(mn)$, so $k$ iterations take $O(kmn)$ time. <br/>

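repeat until convergence:<br/>
{<br/>
&nbsp;&nbsp;&nbsp; $\beta_j = \beta_j - \alpha \frac{1}{m} \sum_{i=1}^m(\hat{y}(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)}$ &nbsp;&nbsp; for j = 0...n <br/>
}<br/>
Here $m$ is the number of training examples, $\alpha$ is the learning rate, and $x_j^{(i)}$ is the $j$-th feature of the $i$-th example. Within one iteration, compute every new $\beta_j$ from the old $\beta$ and store it in a temporary vector. Only after the pass over all $j$ is finished, update all components of $\beta$ simultaneously.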
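
The update rule above can be sketched in vectorized form for several features at once. The function name `gradient_descent`, its default parameters, the stopping rule on the step size, and the synthetic data are all assumptions for illustration. Computing the whole gradient from the old `beta` before assigning `new_beta` is what makes the update simultaneous.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, tol=1e-10, max_iter=10_000):
    """Batch gradient descent for least squares (illustrative sketch).

    X is an (m, n+1) matrix whose first column is all ones; y has length m.
    """
    m = len(y)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        error = X @ beta - y                # y_hat(x_i) - y_i for every i
        gradient = X.T @ error / m          # one entry per beta_j
        new_beta = beta - alpha * gradient  # all components updated together
        step = np.max(np.abs(new_beta - beta))
        beta = new_beta
        if step < tol:                      # stop once beta barely changes
            break
    return beta

# Synthetic data (an assumption): 500 examples, intercept plus two features.
rng = np.random.default_rng(1)
m = 500
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
y = X @ np.array([0.5, -1.0, 2.0]) + 0.05 * rng.normal(size=m)

beta_gd = gradient_descent(X, y)
beta_exact = np.linalg.solve(X.T @ X, X.T @ y)  # closed form for comparison
print(beta_gd, beta_exact)
```

On this well-scaled data the iterative result matches the closed-form solution closely. With features on very different scales, a smaller `alpha` or feature scaling would likely be needed.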

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import csv
import matplotlib.pyplot as plt

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
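#Training

# Load the training set and drop rows with missing values
df = pd.read_csv('/kaggle/input/random-linear-regression/train.csv')
df = df.dropna()
data = df.values

X = data[:,0]  # single input feature
Y = data[:,1]  # target

# One feature and no intercept column, so X^T X and X^T y are scalars
# and the closed-form solution reduces to a ratio of two dot products.
temp = np.dot(X,X)  # X^T X
B = np.dot(X,Y)     # X^T y
Beta = B/temp #Model

print(Beta)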

$\hat{\beta} = (X^TX)^{−1}X^Ty $ <br/>
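
To see why the training cell can use a ratio of dot products, the sketch below compares that ratio with NumPy's general least-squares routine on a tiny made-up data set. The arrays `x` and `y` are assumptions standing in for the CSV columns.

```python
import numpy as np

# Made-up one-feature data standing in for the CSV columns.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# With one column and no intercept, X^T X and X^T y are both scalars.
beta_scalar = np.dot(x, y) / np.dot(x, x)

# The same fit through the general least-squares routine on an (N, 1) matrix.
beta_lstsq = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]
print(beta_scalar, beta_lstsq)
```

Both values agree, so the scalar formula is the $p = 1$, no-intercept special case of the matrix equation.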

In [None]:
#Testing

df_test=pd.read_csv('/kaggle/input/random-linear-regression/test.csv')
df_test = df_test.dropna()
data_test = df_test.values

X_test = data_test[:,0]
Y_test = data_test[:,1]

Y_cap = X_test * Beta #Prediction

In [None]:
#Comparison

rmse = (((Y_cap - Y_test)**2).mean())**0.5
print(rmse)

plt.figure(num=None, figsize=(20, 15), dpi=80, facecolor='w', edgecolor='k')
plt.plot(X_test, Y_test,'go',X_test, Y_cap,'r')

In [None]:
import random as rm

In [None]:
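alpha = 0.0001        # learning rate
beta = rm.random()    # random starting value in [0, 1)
m = len(Y)            # number of training examples (the gradient below is summed over the training set)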
'''
for i in range(50):
    Y_cap = beta * X
    temp = np.dot((Y_cap - Y),X)
    beta = beta - alpha*temp/m
    print("Beta",beta)
    rmse = (((Y_cap - Y)**2).mean())**0.5
    print("RMSE =",rmse)
'''

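# Sliding window of the last three RMSE values, seeded with a large number
rmse1 = 1000
rmse2 = 1000
rmse3 = 1000

while True:
    Y_cap = beta * X                        # predictions with the current beta
    temp = np.dot((Y_cap - Y),X)            # sum over examples of error * feature
    beta = beta - alpha*temp/m              # gradient step
    #print("Beta",beta)
    rmse = (((Y_cap - Y)**2).mean())**0.5   # RMSE of the predictions made before this step
    #print("RMSE =",rmse)
    rmse1 = rmse2
    rmse2 = rmse3
    rmse3 = rmse
    temp_mean = (rmse1 +rmse2 +rmse3 )/3
    # Stop when the current RMSE is within 0.001 of the three-step average
    if(abs(temp_mean - rmse) < 0.001):
        break
print("Beta",beta)
print("RMSE =",rmse)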

In [None]:
# Refine beta further: keep stepping until the RMSE stops decreasing
prev_rmse = float("inf")
while True:
    Y_cap = beta * X
    temp = np.dot((Y_cap - Y),X)
    beta = beta - alpha*temp/m
    #print("Beta",beta)
    rmse = (((Y_cap - Y)**2).mean())**0.5
    #print("RMSE =",rmse)
    if(rmse >= prev_rmse):  # no further improvement
        break
    prev_rmse = rmse
print("Beta",beta)
print("RMSE =",rmse)

$\beta_j = \beta_j - \alpha \frac{1}{m} \sum_{i=1}^m(\hat{y}(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)}$ &nbsp;&nbsp; for j = 0...n

In [None]:
#Testing

Y_cap = X_test * beta #Prediction

In [None]:
#Comparison

rmse = (((Y_cap - Y_test)**2).mean())**0.5
print(rmse)

plt.figure(num=None, figsize=(20, 15), dpi=80, facecolor='w', edgecolor='k')
plt.plot(X_test, Y_test,'go',X_test, Y_cap,'r')