https://stattrek.com/regression/regression-example.aspx

# Ordinary Least Squares (OLS) method

In [73]:
import pandas as pd
import numpy as np

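def fit(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; return (slope, intercept)."""
    df = pd.DataFrame({'x': x, 'y': y})

    x_mean = np.mean(df.x)
    y_mean = np.mean(df.y)

    # Deviation of each observation from its mean
    df['x_dev'] = df.x - x_mean
    df['y_dev'] = df.y - y_mean

    # Cross-products of the deviations and squared x-deviations
    df['xy'] = df.x_dev * df.y_dev
    df['x_dev_sq'] = df.x_dev ** 2

    # Slope: sum of cross-products divided by sum of squared x-deviations
    b1 = np.sum(df.xy) / np.sum(df.x_dev_sq)
    # Intercept: the fitted line passes through (x_mean, y_mean)
    b0 = y_mean - (b1 * x_mean)

    return b1, b0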
    

In [None]:
def predict(x, beta):
    """Return predictions for x, where beta = (slope, intercept) as returned by fit."""
    slope, intercept = beta
    return [intercept + (slope * xi) for xi in x]

In [98]:
# x = [95,85,80,70,60]
# y = [85,95,70,65,70]

x = [43,44,45,46,47]
y= [41,45,49,47,44]

In [99]:
beta=fit(x,y)

In [105]:
beta

(0.8, 9.200000000000003)

In [106]:
predict(x, beta)

[43.6, 44.400000000000006, 45.2, 46.00000000000001, 46.800000000000004]
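
As a sanity check on the hand-written `fit`, the same slope and intercept can be recovered with NumPy's `np.polyfit`. This is a minimal sketch that only reuses the `x` and `y` lists from the cell above; it is an independent cross-check, not part of the original derivation.

```python
import numpy as np

x = [43, 44, 45, 46, 47]
y = [41, 45, 49, 47, 44]

# Degree-1 polynomial fit; coefficients are returned highest degree first,
# so the slope comes before the intercept.
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)
```

Both values should agree with the `(0.8, 9.2...)` tuple returned by `fit` up to floating-point rounding.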

# Linear Regression with Ordinary Least Squares

In [66]:
from sklearn.linear_model import LinearRegression

In [67]:
lr = LinearRegression()

In [69]:
lr.fit(df[['x1', 'x2']], df[['y']])

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [70]:
lr.intercept_

array([1.5])

In [71]:
lr.coef_

array([[-7.02166694e-17,  5.00000000e-01]])

In [101]:
lr.score(X = df[['x1','x2']], y=df['y'])

1.0
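
For a regressor, `score` returns the coefficient of determination R². The sketch below recomputes it by hand for the same toy data, plugging in the intercept (1.5) and coefficients (about 0 for `x1`, 0.5 for `x2`) reported above; the variable names are illustrative.

```python
import numpy as np

# Same toy data as the x2 and y columns of df
x2 = np.array([-1, -1, 1, 1], dtype=float)
y = np.array([1, 1, 2, 2], dtype=float)

# Predictions from the fitted model: the x1 coefficient is effectively zero
y_pred = 1.5 + 0.5 * x2

ss_res = np.sum((y - y_pred) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

Because the line reproduces every `y` value exactly, the residual sum of squares is zero and R² equals 1.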

# SGD Regressor

In [64]:
x1 = [-1,-2, 1, 2]
x2 = [-1, -1 ,1,1]
y = [1, 1, 2, 2]

In [65]:
df=pd.DataFrame({'x1':x1, 'x2':x2, 'y':y})

In [72]:
from sklearn.linear_model import SGDRegressor

In [74]:
sgd=SGDRegressor(max_iter=1000, tol=1e-3)

In [77]:
sgd.fit(X = df[['x1','x2']], y=df['y'])

SGDRegressor(alpha=0.0001, average=False, early_stopping=False, epsilon=0.1,
             eta0=0.01, fit_intercept=True, l1_ratio=0.15,
             learning_rate='invscaling', loss='squared_loss', max_iter=1000,
             n_iter_no_change=5, penalty='l2', power_t=0.25, random_state=None,
             shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0,
             warm_start=False)

In [78]:
sgd.intercept_

array([1.17191275])

In [79]:
sgd.coef_

array([0.19983307, 0.17383926])

In [89]:
import numpy as np

In [93]:
type(np.asarray(df.y))

numpy.ndarray

In [100]:
sgd.score(X = df[['x1','x2']], y=df['y'])

0.5267115713561876
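
The SGD score above (about 0.53) is well below the OLS score of 1.0 on the same data. One plausible reason, stated here as an assumption, is that the optimizer stopped before converging on so few samples. The sketch below runs plain batch gradient descent on the mean squared error. It is not `SGDRegressor`'s actual algorithm, and the function name `gradient_descent`, the step size, and the iteration count are illustrative choices. It suggests that the same loss does reach the OLS solution when run long enough.

```python
import numpy as np

# Rows are (x1, x2) pairs from the toy data above
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]], dtype=float)
y = np.array([1, 1, 2, 2], dtype=float)

def gradient_descent(X, y, lr=0.1, n_iter=2000):
    """Minimize mean squared error of y ~ X @ w + b with full-batch updates."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        residual = X @ w + b - y
        # Gradients of the mean squared error with respect to w and b
        w -= lr * (2 / n_samples) * (X.T @ residual)
        b -= lr * (2 / n_samples) * residual.sum()
    return w, b

w, b = gradient_descent(X, y)
print(w, b)
```

With these settings the weights approach `[0, 0.5]` and the intercept approaches `1.5`, matching the `LinearRegression` result.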

# Ordinary Least Squares Regression

- Ordinary least squares (OLS) regression is a statistical method of analysis that estimates the relationship between one or more independent variables and a dependent variable.
- The method fits a straight line by minimizing the sum of squared differences between the observed and predicted values of the dependent variable.
- Here OLS regression is discussed in the context of a bivariate model, that is, a model with only one independent variable (X) predicting a dependent variable (Y). The same logic extends to the multivariate model, in which there are two or more independent variables.
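
To illustrate the extension to more than one independent variable, here is a minimal sketch that solves the least-squares problem directly with `np.linalg.lstsq` on the two-predictor toy data used earlier in this notebook. The design-matrix layout (a leading column of ones for the intercept) is a common convention assumed here, not something the notebook itself prescribes.

```python
import numpy as np

x1 = np.array([-1, -2, 1, 2], dtype=float)
x2 = np.array([-1, -1, 1, 1], dtype=float)
y = np.array([1, 1, 2, 2], dtype=float)

# Design matrix: a column of ones for the intercept, then one column per predictor
A = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution of A @ beta ~ y
beta, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)
intercept, coefs = beta[0], beta[1:]
print(intercept, coefs)
```

The result should match `lr.intercept_` and `lr.coef_` from the `LinearRegression` cell: an intercept of 1.5 and coefficients of roughly 0 and 0.5.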

# Real-World Example

- Social scientists are often concerned with questions about the relationship between two variables. 
- These include the following: Among women, is there a **relationship between education and fertility?** Do more-educated women have fewer children, and less-educated women have more children? 

<img src="style/formula.png" width ="500" height=500 >

### Intercept, slope, and error

- **Intercept:** a is the point where the straight line intersects the Y-axis (the vertical axis);
- **Slope:** b is the slope: the steepness of the straight line, i.e. the change in Y for a one-unit increase in X;
- **Error:** e represents the error, the difference between the observed and predicted values of Y.

- The intercept, a, is where the regression line crosses the Y-axis. It gives the predicted (average) value of Y when X = 0.
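
Assuming the pictured formula is the usual bivariate regression equation (an assumption, since only the image is referenced here), the model in the a, b, e notation of the list above can be written as:

$$
Y_i = a + b X_i + e_i
$$

The least-squares estimates, which are what the `fit` function at the top of this notebook computes from the deviations, are:

$$
b = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
$$

Here $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y.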

# Analysis

<img src="style/correlation.jpg" width ="500" height=500 >

# Squared Error of Regression Line