## Content:
* [Matrix notation](#matrix)
* [Matrix Multiplication](#matrix-mult)
* [Mean Square Error definition](#mse)
* [Simple Statistics](#stat)
* [Derivatives](#derivat)
* [Function optimization basics](#optim)
* [Linear Regression theory](#lr)
* [Advertising Dataset](#advert)
* [Linear Regression and its coefficients](#lr_coef)


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

<h1> Matrix notation </h1> <a class="anchor" id="matrix"></a>

Assume we have a 2x2 matrix:


\begin{align}
A = \begin{pmatrix}
a_{11} & a_{12}\\ 
a_{21} & a_{22}
\end{pmatrix} = \begin{pmatrix}
1 & 2\\ 
3 & 4
\end{pmatrix}
\end{align}

The subscript indices denote the row and column of each element: $a_{ij}$ is the element in row $i$ and column $j$.
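The indexing convention can be tried out directly in Python. This is a minimal sketch that stores the matrix as a nested list (an illustrative choice, not the only possible representation); note that mathematical indices start at 1 while Python indices start at 0.

```python
# Illustrative sketch: the 2x2 matrix from above stored as a nested list.
A = [[1, 2],
     [3, 4]]

# Math indices start at 1, Python indices start at 0,
# so a_{ij} corresponds to A[i - 1][j - 1].
a_11 = A[0][0]  # first row, first column
a_12 = A[0][1]  # first row, second column
a_21 = A[1][0]  # second row, first column
a_22 = A[1][1]  # second row, second column

print(a_11, a_12, a_21, a_22)
```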

<h1> Matrix Multiplication </h1> <a class="anchor" id="matrix-mult"></a>
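Each element of the product is the sum of the products of a row of the first matrix with a column of the second matrix:

\begin{align}
C = A B = \begin{pmatrix}
1 & 2\\ 
3 & 4
\end{pmatrix}
\begin{pmatrix}
5 & 6\\ 
7 & 8
\end{pmatrix} = \begin{pmatrix}
1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8\\ 
3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8
\end{pmatrix} = \begin{pmatrix}
19 & 22\\ 
43 & 50
\end{pmatrix}
\end{align}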

Let's see a Python implementation.

In [None]:
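A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# np.array is used instead of np.matrix, which NumPy no longer recommends
mA = np.array(A)
mB = np.array(B)

# The @ operator performs matrix multiplication (equivalent to np.matmul)
mC = mA @ mB
print(mC)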
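To make the row-by-column rule explicit, here is a minimal sketch of the same product written with plain loops. The helper name `matmul_2d` is hypothetical and introduced only for illustration; in practice NumPy should be preferred.

```python
def matmul_2d(X, Y):
    """Multiply two matrices given as nested lists (rows of X times columns of Y)."""
    n_rows, n_inner, n_cols = len(X), len(Y), len(Y[0])
    result = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            # Element (i, j) is the dot product of row i of X and column j of Y
            for k in range(n_inner):
                result[i][j] += X[i][k] * Y[k][j]
    return result

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul_2d(A, B)
print(C)
```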

<h1> Mean Squared Error (MSE) </h1> <a class="anchor" id="mse"></a>
\begin{align}
MSE = \frac{1}{n}\sum_{i=1}^{n} (y_{i}-\hat{f}(x_{i}))^{2}  
\end{align} 
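Here $y_{i}$ is the observed value, $\hat{f}(x_{i})$ is the prediction for the $i$-th observation, and $n$ is the number of observations (Equation 2.5, p. 29).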

In [None]:
from sklearn.metrics import mean_squared_error

y_true = [12,32,5,67,3,6]
y_pred = [10,31,7,69,2,8]

mse_1 = mean_squared_error(y_true, y_pred)
print('mse1 = ',mse_1)

mse_2 = sum(list(map(lambda x, y: (x-y)**2, y_true, y_pred)))/len(y_true)
print('mse2 = ', mse_2)
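As a hand check of the two results above, the formula can be written out for `y_true` and `y_pred`. The differences are 2, 1, -2, -2, 1, -2, so both `mse_1` and `mse_2` should come out as 3:

\begin{align}
MSE = \frac{1}{6}\left((12-10)^{2}+(32-31)^{2}+(5-7)^{2}+(67-69)^{2}+(3-2)^{2}+(6-8)^{2}\right) = \frac{4+1+4+4+1+4}{6} = \frac{18}{6} = 3
\end{align}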

<h1> Simple Statistics </h1> <a class="anchor" id="stat"></a>

Mean, Mode, Median

Mean
\begin{align}
{\bar{X}} = \frac{1}{n} \sum_{i=1}^{n} x_{i}
\end{align}

For example, suppose X = {1,2,3,4,5}, then
\begin{align}
{\bar{X}} = \frac{1}{5} \sum_{i=1}^{5} x_{i} = \frac{1}{5} (x_{1}+x_{2}+x_{3}+x_{4}+x_{5})=\frac{15}{5}=3
\end{align}

Sample standard deviation (std)

\begin{align}
std = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{X})^{2}} = \sqrt{\frac{1}{5-1}((1-3)^{2}+(2-3)^{2}+(3-3)^{2}+(4-3)^{2}+(5-3)^{2})} = \sqrt{\frac{10}{4}} \approx 1.58
\end{align}

In [None]:
from scipy import stats
array = [10, 12]

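average_1 = np.mean(array)
# ddof=1 gives the sample standard deviation (n - 1 in the denominator),
# matching the formula above; np.std defaults to ddof=0 (division by n)
std_1 = np.std(array, ddof=1)

print("Average value_1 = ", average_1)
print("Standard deviation_1 = ", std_1)

# The same quantities computed by hand
average_2 = sum(array)/len(array)
std_2 = (sum((x - average_2)**2 for x in array)/(len(array) - 1))**(1/2)

print("Average value_2 = ", average_2)
print("Standard deviation_2 = ", std_2)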
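The section lists the mean, mode and median, but only the mean is worked out. A minimal sketch with the standard-library `statistics` module shows all three; the `sample` list is a made-up illustration chosen so that the mean differs from the median and mode.

```python
import statistics

# Made-up sample, used only for illustration
sample = [1, 2, 2, 3, 7]

mean_value = statistics.mean(sample)      # sum of values divided by their count
median_value = statistics.median(sample)  # middle element of the sorted data
mode_value = statistics.mode(sample)      # most frequent element
sample_std = statistics.stdev(sample)     # sample std, n - 1 in the denominator

print("mean =", mean_value)
print("median =", median_value)
print("mode =", mode_value)
print("sample std =", sample_std)
```

The single large value 7 pulls the mean upward, while the median and mode stay at the typical value of the data.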




<h1> Derivatives </h1> <a class="anchor" id="derivat"></a>

The [derivative](https://en.wikipedia.org/wiki/Derivative) of a function of a real variable measures the sensitivity of the function value (output value) to a change in its argument (input value). There is a simple [visualization](https://www.geogebra.org/m/BDYnGhbt). 
To calculate the derivative of an arbitrary function, we need to know the [table of derivatives](http://www.math.com/tables/derivatives/tableof.htm).

For example, the function
\begin{align}
f(x) = x^{3}-2x
\end{align}
and its derivative 
\begin{align}
{f}'(x) = 3x^{2}-2
\end{align}
are [visualized](https://www.geogebra.org/m/MeMdCUEm) together.

So, the **derivative** is a function, while the **slope** is the value of the derivative at a particular point.

For a function of one variable, the gradient is equivalent to the derivative.
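The difference between the derivative (a function) and the slope (its value at a point) can be checked numerically. The sketch below compares the analytical derivative of the example function with a central-difference approximation; the helper `central_difference` and the step size `h` are illustrative assumptions, not part of the original notebook.

```python
def f(x):
    return x**3 - 2*x

def f_prime(x):
    # Analytical derivative from the table of derivatives
    return 3*x**2 - 2

def central_difference(func, x, h=1e-5):
    # Numerical approximation of the slope of func at the point x
    return (func(x + h) - func(x - h)) / (2*h)

for x0 in [-1.0, 0.0, 2.0]:
    print("x =", x0,
          "analytical slope =", f_prime(x0),
          "numerical slope =", central_difference(f, x0))
```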

<h1> Function optimization basics </h1> <a class="anchor" id="optim"></a>

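Given a particular function, we are often interested in determining its largest and smallest values (its extrema). For a differentiable function, extrema inside the domain can only occur at critical points, which are the points where the derivative is zero. Not every critical point is an extremum, so each candidate still has to be checked.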
\begin{align}
{f}'(x) = 0
\end{align}
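As a small sketch, the critical points of the example function f(x) = x^3 - 2x from the previous section can be found by solving 3x^2 - 2 = 0 by hand. The second derivative test used to classify them is an addition for illustration and is not covered in the text above.

```python
import math

def f(x):
    return x**3 - 2*x

def f_prime(x):
    return 3*x**2 - 2

def f_second(x):
    # Second derivative, used here only to classify the critical points
    return 6*x

# 3x^2 - 2 = 0  =>  x = -sqrt(2/3) or x = +sqrt(2/3)
critical_points = [-math.sqrt(2/3), math.sqrt(2/3)]

for x0 in critical_points:
    kind = "local minimum" if f_second(x0) > 0 else "local maximum"
    print("x =", x0, "f(x) =", f(x0), "->", kind)
```

These are local extrema only: a cubic is unbounded, so this function has no global largest or smallest value.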


<h1> Linear Regression theory </h1> <a class="anchor" id="lr"></a>

Suppose we have a table of values like the one below.

Table 1.
![image.png](attachment:image.png)

In regression, we try to recover the relation between X and Y as an analytical function.

Equation (1)
![image.png](attachment:image.png) 

The prediction error is as follows:

Equation (2)
![image.png](attachment:image.png)

The simplest case is [linear regression](https://towardsdatascience.com/supervised-machine-learning-using-linear-regression-part1-ed59d0136be8) between one independent variable (X) and one dependent variable (Y).

The main equation in linear regression is

Equation (3)
![image.png](attachment:image.png)

In other words, we try to minimize the errors. One of the common methods for this problem is the [least squares](https://en.wikipedia.org/wiki/Least_squares) method.

So we substitute (2) into (3) and solve the resulting optimization problem. We get the following:

Equation (4)
![image.png](attachment:image.png)

In equation (4), the values
\begin{align}
x_{i}, y_{i}
\end{align}
are constants, and the parameters *a* and *b* are unknown.

So we need to take the partial derivatives of equation (4) with respect to *a* and *b* and set them to zero, as follows:

Equation (5)
![image.png](attachment:image.png)

Equation (6)
![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Consider a numerical example.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Let's reconstruct the analytical relation between *x* and *y*.
Equation (4) becomes

\begin{align}
E = \sum_{i=1}^{5}(a+bx_{i}-y_{i})^2\rightarrow min
\end{align}

\begin{align}
E = (a+b*1-3)^2+(a+b*2-5)^2+(a+b*3-7)^2+(a+b*4-9)^2+(a+b*5-11)^2\rightarrow min 
\end{align}

Equation (6) becomes

\begin{align}
(a+b*1-3)+(a+b*2-5)+(a+b*3-7)+(a+b*4-9)+(a+b*5-11)=0 
\end{align}

\begin{align}
(a+b*1-3)*1+(a+b*2-5)*2+(a+b*3-7)*3+(a+b*4-9)*4+(a+b*5-11)*5=0 
\end{align}
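The two equations above are linear in *a* and *b*, so they can be solved directly. The sketch below is one possible way to do it, assuming the data points (1, 3), (2, 5), (3, 7), (4, 9), (5, 11) read off from the expanded sums; the variable names `M` and `rhs` are illustrative.

```python
import numpy as np

# Data points taken from the expanded sums above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 5, 7, 9, 11], dtype=float)
n = len(x)

# Collecting the terms in a and b gives a 2x2 linear system:
#   sum(a + b*x_i - y_i) = 0          ->  n*a      + sum(x)*b   = sum(y)
#   sum((a + b*x_i - y_i) * x_i) = 0  ->  sum(x)*a + sum(x^2)*b = sum(x*y)
M = np.array([[n, x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x*y).sum()])

a, b = np.linalg.solve(M, rhs)
print("a =", a, "b =", b)
```

For these points the solution is a = 1 and b = 2, that is y = 1 + 2x, which passes through every point exactly, so the minimized error E is zero.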

<h1> Advertising Dataset </h1> <a class="anchor" id="advert"></a>

In [None]:
import matplotlib.pyplot as plt 
import seaborn as sns
advertising = pd.read_csv("../input/advertising-dataset/advertising.csv")

advertising.head()



The **rstrip()** method removes trailing characters (characters at the end of a string). Called without arguments, it removes trailing whitespace, which includes the newline character at the end of each line read from a file.
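A short sketch of this behaviour, using a made-up string rather than a real line from the dataset:

```python
# Made-up line with trailing spaces and a newline, as read from a text file
line = "a,b,c  \n"

stripped_default = line.rstrip()      # removes all trailing whitespace
stripped_newline = line.rstrip("\n")  # removes only trailing newline characters

print(repr(stripped_default))
print(repr(stripped_newline))
```

Passing an argument restricts the set of characters that are stripped, which is useful when trailing spaces are meaningful.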


In [None]:
with open('../input/advertising-dataset/advertising.csv') as f:
    alist = [line.rstrip() for line in f]

In [None]:
print(alist)

In [None]:
type(alist)

In [None]:
sns.pairplot(advertising, x_vars=['TV', 'Newspaper', 'Radio'], y_vars='Sales', height=4, aspect=1, kind='scatter')
plt.show()

In [None]:
x_list = list(advertising['TV'])
y_list = list(advertising['Sales'])
print(x_list)
print(y_list)


<h1> Linear Regression and its coefficients </h1> <a class="anchor" id="lr_coef"></a>

\begin{align}
\hat{\beta_{1}} = \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}} 
\end{align}

\begin{align}
\hat{\beta_{0}} = \bar{y}-\hat{\beta_{1}}\bar{x} 
\end{align}

Let's assume we have X = {1,2,3,4,5} and Y = {2,4,6,8,10}. Then




\begin{align}
Y = \hat{\beta_{0}} + \hat{\beta_{1}}*X
\end{align}

\begin{align}
2 = \hat{\beta_{0}} + \hat{\beta_{1}}*1
\end{align}

\begin{align}
4 = \hat{\beta_{0}} + \hat{\beta_{1}}*2
\end{align}

\begin{align}
\bar{x} = 3,
\bar{y} = 6
\end{align}

\begin{align}
\hat{\beta_{1}} = \frac{(x_{1}-\bar{x})(y_{1}-\bar{y})+(x_{2}-\bar{x})(y_{2}-\bar{y})+(x_{3}-\bar{x})(y_{3}-\bar{y})+(x_{4}-\bar{x})(y_{4}-\bar{y})+(x_{5}-\bar{x})(y_{5}-\bar{y})}{(x_{1}-\bar{x})^{2}+(x_{2}-\bar{x})^{2}+(x_{3}-\bar{x})^{2}+(x_{4}-\bar{x})^{2}+(x_{5}-\bar{x})^{2}} =
\frac{(1-3)(2-6)+(2-3)(4-6)+(3-3)(6-6)+(4-3)(8-6)+(5-3)(10-6)}{(1-3)^{2}+(2-3)^{2}+(3-3)^{2}+(4-3)^{2}+(5-3)^{2}} = 2
\end{align}
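The hand calculation can be checked with a few lines of plain Python. This is a minimal sketch on the toy data X = {1,2,3,4,5}, Y = {2,4,6,8,10}; it also evaluates the intercept, which follows from the formula for the intercept given earlier in this section.

```python
x_toy = [1, 2, 3, 4, 5]
y_toy = [2, 4, 6, 8, 10]

x_mean = sum(x_toy) / len(x_toy)  # 3.0
y_mean = sum(y_toy) / len(y_toy)  # 6.0

# Numerator and denominator of the slope formula
numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_toy, y_toy))
denominator = sum((x - x_mean) ** 2 for x in x_toy)

beta_1 = numerator / denominator
beta_0 = y_mean - beta_1 * x_mean

print("beta_1 =", beta_1)
print("beta_0 =", beta_0)
```

The numerator is 20 and the denominator is 10, so the slope is 2 and the intercept is 6 - 2 * 3 = 0, giving the line Y = 2X.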

In [None]:
# Closed-form least-squares estimates for Sales as a function of TV
# (x_list and y_list come from the advertising data loaded above)
x_mean = sum(x_list)/len(x_list)
y_mean = sum(y_list)/len(y_list)
beta_1 = sum((x - x_mean)*(y - y_mean) for x, y in zip(x_list, y_list)) / sum((x - x_mean)**2 for x in x_list)
beta_0 = y_mean - beta_1*x_mean

print('beta_1', beta_1)
print('beta_0', beta_0)

In [None]:
from sklearn.linear_model import LinearRegression

x_toy = [1,2,3,4,5]
y_toy = [2,4,6,8,10]

lr = LinearRegression()
x_list1 = np.transpose(np.atleast_2d(x_toy))
lr.fit(x_list1,y_toy)
print('beta_1',lr.coef_)
print('beta_0',lr.intercept_)
