## Simple Linear Regression Models
“There are three kinds of lies: lies, damned lies and statistics.”
— Mark Twain

## Simple linear regression models
* Response variable: the variable being estimated
* Predictor variables: the variables used to predict the response; also called predictors or factors
* Regression model: predicts a response for a given set of predictor variables
* Linear regression models: the response is a linear function of the predictors
* Simple linear regression models: only one predictor
    
## Outline
* Definition of a Good Model
* **Estimation of Model parameters**
* Allocation of Variation
* Standard deviation of Errors
* Confidence Intervals for Regression Parameters
* Confidence Intervals for Predictions
* Visual Tests for Verifying Regression Assumptions

## 2 Definition of a good regression model
![](./images/1.png)

Regression models fit a line (or curve) to the observation points (data) so that the vertical distance between each observation point and the model line (or curve) is as small as possible. This distance is called the residual, the modeling error, or simply the error. A first requirement is that the negative and positive errors cancel out, giving zero overall error. Many lines satisfy this criterion, however, so it cannot by itself identify a single best model.

## 2.1 Linear Regression Model:
Given $n$ observation pairs $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the estimated response for the $i$-th observation is

$$ \hat y_i = b_0 + b_1 x_i $$

where the regression parameters $b_0$ and $b_1$ are chosen to minimize the sum of squares of the errors at the given data (observations).

Formally, the model has the form

$$ \hat y = b_0 + b_1 x $$

where $\hat y$ is the predicted response when the predictor variable is $x$. The error is

$$ e_i = y_i - \hat y_i \qquad \text{and} \qquad \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2 $$

The best linear model minimizes the sum of squared errors (SSE), subject to the constraint that the overall mean error is zero:

$$ \sum_{i=1}^n e_i = \sum_{i=1}^n (y_i - b_0 - b_1 x_i) = 0 $$
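
As a sketch of why this criterion selects a single line, setting the partial derivatives of SSE with respect to $b_0$ and $b_1$ to zero gives two equations in the two unknowns. The shorthand $\bar x$ and $\bar y$ for the sample means of $x$ and $y$ is introduced here for compactness; the closed forms are the standard least-squares estimates and assume the $x_i$ are not all equal.

$$ \frac{\partial\,\mathrm{SSE}}{\partial b_0} = -2 \sum_{i=1}^n (y_i - b_0 - b_1 x_i) = 0, \qquad \frac{\partial\,\mathrm{SSE}}{\partial b_1} = -2 \sum_{i=1}^n x_i\,(y_i - b_0 - b_1 x_i) = 0 $$

$$ b_1 = \frac{\sum_{i=1}^n x_i y_i - n\,\bar x\,\bar y}{\sum_{i=1}^n x_i^2 - n\,\bar x^2}, \qquad b_0 = \bar y - b_1 \bar x $$

The first equation is exactly the zero-overall-error condition, so a least-squares line with an intercept term satisfies that constraint automatically.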

## 2.2 Linear Regression Model - the statistical view
Regression analysis is the art and science of fitting straight lines to patterns of data. In a linear regression model, the variable of interest (the so-called “dependent” variable) is predicted from one or more other variables (the so-called “independent” variables) using a linear equation.

If $Y$ denotes the dependent variable and $X_1, X_2, \ldots, X_k$ are the independent variables, then the assumption is that the value of $Y_i$ in the population is determined by the linear equation

$$ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i $$

where the betas are constants and the epsilons are independent and identically distributed (i.i.d.) normal random variables with mean zero (the “noise” in the system).

$\beta_0$ is the so-called intercept of the model, that is, the expected value of $Y$ when all the $X$’s are zero, and $\beta_j$ is the coefficient (multiplier) of the variable $X_j$. **The betas, together with the mean and standard deviation of the epsilons, are the parameters of the model.**

The corresponding equation for predicting $Y_i$ from the corresponding values of the $X$’s is therefore

$$ \hat y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik} $$

where the $b$’s are estimates of the betas obtained by least squares, i.e., by minimizing the squared prediction error within the sample. __*Multiple regression allows more than one x variable.*__

**Assumptions**

The error terms $\varepsilon_i$ are mutually independent and identically distributed, with zero mean and constant variance:

$$ E[\varepsilon_i] = 0, \qquad V[\varepsilon_i] = \sigma^2 $$

This is so because the observations $Y_1, Y_2, \ldots, Y_n$ are a random sample: they are mutually independent, and hence the error terms are also mutually independent.

The distribution of the error term is independent of the joint distribution of $X_1, X_2, \ldots, X_k$. The unknown parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are constants.

## 2.2.1 Summary of multiple linear regression model
* **Independent variables:** $X_1, X_2, \ldots, X_k$
* **Data:** $\{(y_1, x_{11}, x_{12}, \ldots, x_{1k}), \ldots, (y_n, x_{n1}, x_{n2}, \ldots, x_{nk})\}$
* **Population model:** $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i$, where the $\varepsilon_i$ are i.i.d. random variables following the normal distribution $N(0, \sigma)$
* **Regression coefficients:** $b_0, b_1, \ldots, b_k$ are estimates of $\beta_0, \beta_1, \ldots, \beta_k$
* **Regression estimates of $Y_i$:** $\hat y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik}$
* **Goal:** choose $b_0, b_1, \ldots, b_k$ to minimize the residual sum of squares $\sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat y_i)^2$
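
The goal above has a compact statement in matrix form. The notation here is an assumption introduced only for this sketch: $\mathbf{y}$ is the column of the $n$ responses, $\mathbf{X}$ is the $n \times (k+1)$ matrix whose first column is all ones and whose remaining columns hold the predictor values, and $\mathbf{b}$ is the column $(b_0, b_1, \ldots, b_k)$. Minimizing the residual sum of squares then leads to the normal equations:

$$ \mathbf{X}^{T}\mathbf{X}\,\mathbf{b} = \mathbf{X}^{T}\mathbf{y} \quad\Longrightarrow\quad \mathbf{b} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} $$

The explicit inverse exists only when $\mathbf{X}^{T}\mathbf{X}$ is invertible, i.e., when no predictor is an exact linear combination of the others.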

## 2.2.2 Summary of single variable linear regression model
Assuming that the data are a sample from a population, the linear regression model can be described as follows:

**Data:** $\{(x_1, y_1), \ldots, (x_n, y_n)\}$

**Model of the population:** $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent and identically distributed (i.i.d.) random variables with normal distribution $N(0, \sigma)$. This is the true relation between $y$ and $x$; its unknown parameters $\beta_0$ and $\beta_1$ must be estimated from a sample (data) of the population.

Comments:

* $E(y_i \mid x_i) = \beta_0 + \beta_1 x_i$
* $SD(y_i \mid x_i) = \sigma$
* The relationship is linear, described by a "line"
* $\beta_0$ = "baseline" value of $y$ (i.e., the value of $y$ if $x$ is 0)
* $\beta_1$ = "slope" of the line (average change in $y$ per unit change in $x$)

**Prediction regression model:**

$$ \hat y_i = b_0 + b_1 x_i $$

where the $b$’s are estimates of the betas obtained by least squares, i.e., by minimizing the squared prediction error within the sample.
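
The distinction between the population parameters and their sample estimates can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the "true" values `BETA0`, `BETA1` and `SIGMA`, the sample size, and the range of `x` are all hypothetical, and the estimates are computed with the usual closed-form least-squares expressions for one predictor.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical "true" population parameters (assumptions for illustration).
BETA0, BETA1, SIGMA = 1.0, 2.0, 1.0
n = 200

# Draw a sample from the population model y = beta0 + beta1 * x + eps.
xs = [random.uniform(0.0, 10.0) for _ in range(n)]
ys = [BETA0 + BETA1 * x + random.gauss(0.0, SIGMA) for x in xs]

# Least-squares estimates b0, b1 computed from the sample only.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (
    sum(x * x for x in xs) - n * x_bar ** 2
)
b0 = y_bar - b1 * x_bar

print(f"true:      beta0 = {BETA0}, beta1 = {BETA1}")
print(f"estimated: b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The estimates land near the true values but do not equal them; a different seed (a different sample) gives slightly different `b0` and `b1`.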

![](./images/2.png)

## 3 Estimation of model parameters 
![](./images/3.png)
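
A minimal sketch of the estimation step for one predictor, assuming the standard closed-form least-squares expressions $b_1 = (\sum x_i y_i - n \bar x \bar y) / (\sum x_i^2 - n \bar x^2)$ and $b_0 = \bar y - b_1 \bar x$. The function name `fit_simple_linear` and the data are hypothetical; the data are not the measurements used in the examples of these notes.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_linear(xs, ys)
errors = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"sum of errors = {sum(errors):.2e}")  # zero up to rounding
```

The printed sum of errors is zero up to floating-point rounding, which is the zero-overall-error property of the least-squares line.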

## Example 1
The number of disk I/Os and the processor time of seven programs were measured as follows:
![](./images/4.png)
![](./images/5.png)
**Error Computation**
![](./images/6.png)

## 4 Allocation of variation
**Error variance from the sample mean = variance of the response about the mean value of the observations**

If the sample mean $\bar y$ is used as the predicted response, the error is

$$ e_i = \text{observed response} - \text{response predicted from the mean} = y_i - \bar y $$

Variance of errors from the sample mean:

$$ \frac{1}{n-1} \sum_{i=1}^n e_i^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar y)^2 = \text{variance of } y $$

Note that the standard error is not the square root of the average value of the squared errors within the sample. The sum of squared errors is divided by $n - 1$ rather than $n$ under the square root sign, because this adjusts for the fact that one degree of freedom for error has been used up by estimating one model parameter (namely the mean) from the sample of $n$ data points.

The sum of squared errors from the sample mean,

$$ SST = \sum_{i=1}^n (y_i - \bar y)^2 $$

is called the total sum of squares. It is a measure of $y$’s variability and is called the variation of $y$. SST can be computed as follows:

$$ SST = \sum_{i=1}^n (y_i - \bar y)^2 = \left( \sum_{i=1}^n y_i^2 \right) - n \bar y^2 = SSY - SS0 $$

where SSY is the sum of squares of $y$ and SS0 is the sum of squares of $\bar y$, which is equal to $n \bar y^2$.

The difference between SST and SSE is the sum of squares explained by the regression. It is called SSR:

$$ SSR = SST - SSE \qquad \text{or} \qquad SST = SSR + SSE $$

The fraction of the variation that is explained determines the goodness of the regression and is called the coefficient of determination:

$$ R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST} $$

The higher the value of $R^2$, the better the regression: $R^2 = 1$ indicates a perfect fit and $R^2 = 0$ indicates no fit.
![](./images/7.png)
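
The allocation of variation can be sketched in a few lines. The function name `allocate_variation` and the data are hypothetical; the line is fitted with the usual closed-form least-squares estimates, and the sums of squares follow the definitions of SSY, SS0, SST, SSE and SSR given in this section.

```python
def allocate_variation(xs, ys):
    """Return SST, SSE, SSR and R^2 for the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (
        sum(x * x for x in xs) - n * x_bar ** 2
    )
    b0 = y_bar - b1 * x_bar

    ssy = sum(y * y for y in ys)   # sum of squares of y
    ss0 = n * y_bar ** 2           # sum of squares of the mean
    sst = ssy - ss0                # total variation of y
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ssr = sst - sse                # variation explained by the regression
    return {"SST": sst, "SSE": sse, "SSR": ssr, "R2": ssr / sst}


# Hypothetical data, for illustration only.
result = allocate_variation([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

For data that lie close to a straight line, as here, SSE is small compared with SST and $R^2$ is close to 1.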

## Example 3
For the disk I/O-CPU time data, SSE = 5.87, SST = 205.71, SSR = 199.84, and $R^2$ = 0.9715.
The linear regression explains 97% of the variation in CPU time.

## 5 Standard deviation of errors
Since errors are obtained after calculating two regression parameters from the data, errors have $n - 2$ degrees of freedom. SSE/($n - 2$) is called the mean squared error (MSE):

$$ s_e^2 = \frac{SSE}{n - 2} $$

Standard deviation of errors = square root of MSE.

Note:

* SSY has $n$ degrees of freedom, since it is obtained from $n$ independent observations without estimating any parameters.
* SS0 has just one degree of freedom, since it can be computed simply from $\bar y$.
* SST has $n - 1$ degrees of freedom, since one parameter must be calculated from the data before SST can be computed.
* SSR, which is the difference between SST and SSE, has the remaining one degree of freedom.

Overall,

$$ SST = SSY - SS0 = SSR + SSE $$

$$ n - 1 = n - 1 = 1 + (n - 2) $$

Notice that the degrees of freedom add just the way the sums of squares do.

## Example
For the disk I/O-CPU data we have

* SS: SST (205.71) = SSY (828) - SS0 (622.29) = SSR (199.84) + SSE (5.87)
* DF: SST (6) = SSY (7) - SS0 (1) = SSR (1) + SSE (5)

The mean squared error is:

$$ MSE = \frac{SSE}{\text{DF for errors}} = \frac{5.87}{5} = 1.174 $$

The standard deviation of errors is:

$$ s_e = \sqrt{MSE} = \sqrt{1.174} = 1.0835 $$
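
The arithmetic of this example can be re-traced with a short script. The inputs are the values quoted in the text (SSY = 828, SS0 = 622.29, SSE = 5.87, and n = 7 programs); the variable names are chosen here for illustration.

```python
import math

# Values quoted in the text for the disk I/O-CPU time data.
n = 7
ssy, ss0 = 828.0, 622.29
sse = 5.87

sst = ssy - ss0        # total sum of squares
ssr = sst - sse        # sum of squares explained by the regression
r2 = ssr / sst         # coefficient of determination
mse = sse / (n - 2)    # two parameters estimated, so n - 2 degrees of freedom
s_e = math.sqrt(mse)   # standard deviation of errors

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, R^2 = {r2:.4f}")
print(f"MSE = {mse:.3f}, s_e = {s_e:.4f}")
```

Rounded to the precision used in the text, the script reproduces SST = 205.71, SSR = 199.84, $R^2$ = 0.9715, MSE = 1.174 and $s_e$ = 1.0835.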


## 6 Regression Statistics
![](./images/8.png) 

## 7 CIs for regression parameters
1. Regression coefficients $b_0$ and $b_1$ are estimates from a single random sample of size $n$.
2. Using another sample, the estimates may be different.

**If $\beta_0$ and $\beta_1$ are the true parameters of the population (i.e., $y = \beta_0 + \beta_1 x$), then the computed coefficients $b_0$ and $b_1$ are estimates of $\beta_0$ and $\beta_1$, respectively.**

Sample standard deviation of b_0 and b_1
![](./images/9.png) 

The $100(1-\alpha)\%$ confidence intervals for $b_0$ and $b_1$ can be computed using $t_{[1-\alpha/2;\, n-2]}$, the $1 - \alpha/2$ quantile of a $t$ variate with $n - 2$ degrees of freedom.
The confidence intervals are:

$$ b_0 \mp t\, s_{b_0} $$

$$ b_1 \mp t\, s_{b_1} $$

If a confidence interval includes zero, then the regression parameter cannot be considered different from zero at the $100(1-\alpha)\%$ confidence level.
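
A minimal sketch of the interval computation, assuming the standard textbook expressions $s_{b_0} = s_e \sqrt{1/n + \bar x^2 / (\sum x_i^2 - n \bar x^2)}$ and $s_{b_1} = s_e / \sqrt{\sum x_i^2 - n \bar x^2}$ for the standard deviations of the coefficients. The function name and the data are hypothetical, and the $t$ quantile is passed in by the caller (looked up in a $t$ table) because the Python standard library has no $t$ distribution.

```python
import math


def parameter_confidence_intervals(xs, ys, t_quantile):
    """Return ((b0_low, b0_high), (b1_low, b1_high)).

    t_quantile is t[1 - a/2; n - 2], supplied by the caller from a t table.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / sxx
    b0 = y_bar - b1 * x_bar

    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))                      # standard deviation of errors
    s_b0 = s_e * math.sqrt(1.0 / n + x_bar ** 2 / sxx)  # standard deviation of b0
    s_b1 = s_e / math.sqrt(sxx)                         # standard deviation of b1

    ci_b0 = (b0 - t_quantile * s_b0, b0 + t_quantile * s_b0)
    ci_b1 = (b1 - t_quantile * s_b1, b1 + t_quantile * s_b1)
    return ci_b0, ci_b1


# Hypothetical data with n = 5, so n - 2 = 3 degrees of freedom.
# 2.353 is the tabulated 0.95 quantile of t with 3 degrees of freedom,
# which corresponds to a 90% confidence interval.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ci_b0, ci_b1 = parameter_confidence_intervals(xs, ys, t_quantile=2.353)

for name, (low, high) in (("b0", ci_b0), ("b1", ci_b1)):
    verdict = "includes zero" if low <= 0.0 <= high else "excludes zero"
    print(f"{name}: ({low:.3f}, {high:.3f}) {verdict}")
```

For this hypothetical data the interval for the intercept includes zero while the interval for the slope does not, so only the slope can be considered different from zero at the 90% level.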

## Example
![](./images/10.png) 

## Case study: remote procedure call
![](./images/11.png) 

## 8 CIs for predictions
$$ \hat y_p = b_0 + b_1x_p $$
![](./images/12.png) 
![](./images/13.png) 
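
A minimal sketch of a confidence interval for a predicted response, assuming the standard textbook expression for the standard deviation of the mean of $m$ future observations at $x_p$: $s_e \sqrt{1/m + 1/n + (x_p - \bar x)^2 / (\sum x_i^2 - n \bar x^2)}$. The function name, the parameter `m`, and the data are hypothetical; the $t$ quantile is supplied by the caller from a $t$ table.

```python
import math


def prediction_interval(xs, ys, x_p, t_quantile, m=1):
    """Confidence interval for the mean of m future responses at x_p.

    t_quantile is t[1 - a/2; n - 2]; m=math.inf gives the interval for the
    mean response over a very large number of future observations.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / sxx
    b0 = y_bar - b1 * x_bar

    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    # Standard deviation of the predicted mean of m future observations.
    s_pred = s_e * math.sqrt(1.0 / m + 1.0 / n + (x_p - x_bar) ** 2 / sxx)

    y_p = b0 + b1 * x_p
    return y_p - t_quantile * s_pred, y_p + t_quantile * s_pred


# Hypothetical data (n = 5); 2.353 is the tabulated 0.95 quantile of t
# with 3 degrees of freedom, giving 90% intervals.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

near = prediction_interval(xs, ys, x_p=3.0, t_quantile=2.353)
far = prediction_interval(xs, ys, x_p=10.0, t_quantile=2.353)
mean_near = prediction_interval(xs, ys, x_p=3.0, t_quantile=2.353, m=math.inf)

print(f"single observation at x = 3:  ({near[0]:.3f}, {near[1]:.3f})")
print(f"single observation at x = 10: ({far[0]:.3f}, {far[1]:.3f})")
print(f"mean response at x = 3:       ({mean_near[0]:.3f}, {mean_near[1]:.3f})")
```

The interval is narrowest at $x_p = \bar x$ and widens as $x_p$ moves away from the measured range, and the interval for a mean response is narrower than the one for a single future observation.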


## 9 Visual tests for regression assumptions
Regression assumptions:

1. The true relationship between the response variable $y$ and the predictor variable $x$ is linear.
2. The predictor variable $x$ is non-stochastic and is measured without any error.
3. The model errors are statistically independent.
4. The errors are normally distributed with zero mean and a constant standard deviation.
![](./images/14.png) 
![](./images/15.png) 
![](./images/16.png) 
![](./images/17.png) 
![](./images/18.png)
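
Visual tests of this kind are typically made by plotting quantities derived from the residuals. The following is a minimal sketch, under stated assumptions, that prepares the numbers for two common plots without drawing them: residuals against predicted response (a visible trend or changing spread would suggest a violation of linearity or of constant standard deviation) and normal quantile-quantile pairs (points far from a straight line would suggest non-normal errors). The function name, the data, and the plotting position $(i - 0.5)/n$ are choices made for this illustration.

```python
from statistics import NormalDist


def residual_diagnostics(xs, ys):
    """Return (predicted, residuals, qq_pairs) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (
        sum(x * x for x in xs) - n * x_bar ** 2
    )
    b0 = y_bar - b1 * x_bar

    predicted = [b0 + b1 * x for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]

    # Pair the i-th smallest residual with the standard normal quantile
    # at (i - 0.5) / n, one common plotting-position convention.
    normal = NormalDist()
    quantiles = [normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    qq_pairs = list(zip(quantiles, sorted(residuals)))
    return predicted, residuals, qq_pairs


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
predicted, residuals, qq_pairs = residual_diagnostics(xs, ys)

print("predicted response vs. residual:")
for p, r in zip(predicted, residuals):
    print(f"  {p:7.3f}  {r:+.3f}")

print("normal quantile vs. sorted residual:")
for q, r in qq_pairs:
    print(f"  {q:+.3f}  {r:+.3f}")
```

The two lists of pairs can be passed to any plotting tool; with only five hypothetical points the output is meant to show the mechanics, not to support a conclusion about the assumptions.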