# Regression

### 2 Variable Regression:
- Models how a “response” variable y changes as a “predictor/explanatory” variable x changes
- y = mx + b, where m is the slope and b is the intercept

### Multiple Regression:
- Models how a “response” variable y changes as the “predictor/explanatory” variables x1 through xn change
- y = m1x1 + m2x2 + … + mnxn + b
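
As a minimal sketch of a two-predictor fit, the cell below uses R's built-in `lm()` on made-up data. The vectors `x1`, `x2`, and `y` are hypothetical and are constructed so that y = 2*x1 + 3*x2 + 1 holds exactly, which makes the recovered coefficients easy to check by eye.

```r
# Hypothetical data (not from these notes): y is built exactly as 2*x1 + 3*x2 + 1
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- 2 * x1 + 3 * x2 + 1

# Fit y = m1*x1 + m2*x2 + b
fit <- lm(y ~ x1 + x2)

# coef() returns a named vector in the order: (Intercept), x1, x2
coefs <- coef(fit)
print(round(coefs, 6))
```

Because the data contains no noise, the fit should return b = 1, m1 = 2, and m2 = 3 up to floating-point error.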

### Polynomial Regression:
- Degree 1 (a straight line): y = b + m1x
- Higher degrees add powers of the same x: y = b + m1x + m2x^2 + …
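
A degree-2 fit can be sketched with `lm()` by adding a squared term to the formula. The `x` and `y` vectors are the ones used in the cells later in these notes; they happen to satisfy y = 1 + x + x^2 exactly, so the quadratic fit should recover those coefficients. The names `quad_fit` and `quad_coefs` are illustrative.

```r
x <- c(0, 1, 2, 3, 4)
y <- c(1, 3, 7, 13, 21)

# I() protects ^ so that it means arithmetic squaring inside the formula
quad_fit <- lm(y ~ x + I(x^2))

# Order of coefficients: intercept b, m1 (for x), m2 (for x^2)
quad_coefs <- coef(quad_fit)
print(round(quad_coefs, 6))
```

Note that the model is still linear in the coefficients b, m1, and m2, which is why the same least-squares machinery applies.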

### Regression Strategy: Ordinary Least Squares (OLS)
- The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible
- For each data point, draw a vertical line from the regression line to the point:
    - make a square with that line as its side
    - calculate the area of the square
    - add up the areas of all the squares
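
The "add up the squares" procedure above can be sketched as a small helper. The function name `sse` is an assumption made for illustration; the data and the line y = 5x - 1 are the ones that appear in the `lm()` output later in these notes, and the second line (y = 4x + 1) is an arbitrary alternative chosen only for comparison.

```r
x <- c(0, 1, 2, 3, 4)
y <- c(1, 3, 7, 13, 21)

# Sum of squared vertical distances between the points and
# the line y = slope * x + intercept
sse <- function(slope, intercept, x, y) {
  residuals <- y - (slope * x + intercept)
  sum(residuals^2)
}

sse_ols   <- sse(5, -1, x, y)  # the least-squares line for this data
sse_other <- sse(4, 1, x, y)   # an arbitrary alternative line

print(c(ols = sse_ols, other = sse_other))
```

The least-squares line gives the smaller total area (14 versus 24 here), which is exactly the property that defines it.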

### Solution for Regression Line

#### Approach #1: Closed Form Solution
- Method 1 - Compute Gradient:
    - Vector of Partial Derivatives
    - Set gradient to zero
    - Compute slope and intercept
- Method 2 - Matrix Approach
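
The notes do not spell out the matrix approach, so the sketch below assumes it refers to the standard normal equations, (X'X) beta = X'y, where X is a design matrix with a column of ones for the intercept. The names `X` and `beta` are illustrative.

```r
x <- c(0, 1, 2, 3, 4)
y <- c(1, 3, 7, 13, 21)

# Design matrix: a column of 1s for the intercept, then the predictor
X <- cbind(1, x)

# Solve (X'X) beta = X'y; solve(A, b) avoids forming the inverse explicitly
beta <- solve(t(X) %*% X, t(X) %*% y)

intercept <- beta[1]
slope     <- beta[2]
print(c(intercept = intercept, slope = slope))
```

For this data the result should agree with the other closed-form methods: a slope of 5 and an intercept of -1.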

#### Approach #2
- Gradient Descent Algorithm
- Gradually change the slope and intercept until we reach the optimal solution
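
A minimal sketch of gradient descent for a single predictor follows. The function name `gradientDescent`, the starting values of 0, the learning rate of 0.05, and the 5000 iterations are all assumptions picked so that this small example converges; they are not tuned values and would need adjusting for other data.

```r
gradientDescent <- function(x, y, learningRate = 0.05, iterations = 5000) {
  slope <- 0
  intercept <- 0
  n <- length(x)

  for (i in seq_len(iterations)) {
    # Prediction error of the current line at every data point
    error <- (slope * x + intercept) - y

    # Partial derivatives of the mean squared error
    gradSlope     <- (2 / n) * sum(error * x)
    gradIntercept <- (2 / n) * sum(error)

    # Step against the gradient
    slope     <- slope - learningRate * gradSlope
    intercept <- intercept - learningRate * gradIntercept
  }

  list(slope = slope, intercept = intercept)
}

x <- c(0, 1, 2, 3, 4)
y <- c(1, 3, 7, 13, 21)

fit <- gradientDescent(x, y)
print(fit)
```

With these settings the estimates should settle very close to the closed-form answer (slope 5, intercept -1). A learning rate that is too large makes the updates diverge instead.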

### Approach #1, Method 1:
- Slope = ( (Sum of X*Y) – (1/N)*(Sum of X)*(Sum of Y) ) / ( (Sum of X^2) – (1/N)*(Sum of X)*(Sum of X) )
- Intercept = (Mean of Y) – slope * (Mean of X)
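
For reference, here is a sketch of where these formulas come from, followed by the arithmetic for the first example data set used below (x = 0, 1, 2, 3, 4 and y = 1, 3, 7, 13, 21). The symbol S for the sum of squared distances is introduced here for illustration; m and b are the slope and intercept as above.

```latex
\[
S(m, b) = \sum_{i=1}^{N} (y_i - m x_i - b)^2
\]
\[
\frac{\partial S}{\partial m} = -2 \sum_{i=1}^{N} x_i (y_i - m x_i - b) = 0,
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{N} (y_i - m x_i - b) = 0
\]
\[
m = \frac{\sum x_i y_i - \frac{1}{N} \sum x_i \sum y_i}
         {\sum x_i^2 - \frac{1}{N} \left( \sum x_i \right)^2},
\qquad
b = \bar{y} - m \, \bar{x}
\]
\[
m = \frac{140 - \frac{1}{5}(10)(45)}{30 - \frac{1}{5}(10)^2} = \frac{50}{10} = 5,
\qquad
b = 9 - 5 \cdot 2 = -1
\]
```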

In [1]:
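regressionEquation1 <- function(x, y){
  n <- length(x)

  # Slope: (sum(xy) - sum(x)*sum(y)/n) / (sum(x^2) - sum(x)^2/n)
  slope <- ( sum(x*y) - (1/n)*sum(x)*sum(y) ) /
    ( sum(x^2) - (1/n)*sum(x)*sum(x) )

  # Intercept: the fitted line passes through (mean(x), mean(y))
  intercept <- mean(y) - slope*mean(x)

  # Format the equation, showing a negative intercept as a subtraction
  if (intercept < 0){
    paste0("y = ", slope, "x - ", abs(intercept))
  }
  else{
    paste0("y = ", slope, "x + ", intercept)
  }
}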

In [2]:
x <- c(0,1,2,3,4)
y <- c(1,3,7,13,21)

regressionEquation1(x,y)

In [3]:
price <- c(49,69,89,99,109)
demand <- c(124,95,71,45,18)

regressionEquation1(price, demand)

### Approach #1, Method 2:
- Computing Slope and Intercept with Mean
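
The mean-based formula is the sum-based formula from Method 1 with the numerator and denominator both divided by N. A short sketch of that step, with the arithmetic for x = 0, 1, 2, 3, 4 and y = 1, 3, 7, 13, 21 (the overline notation for a mean is introduced here for illustration):

```latex
\[
m = \frac{\frac{1}{N} \sum x_i y_i - \left( \frac{1}{N} \sum x_i \right) \left( \frac{1}{N} \sum y_i \right)}
         {\frac{1}{N} \sum x_i^2 - \left( \frac{1}{N} \sum x_i \right)^2}
  = \frac{\overline{xy} - \bar{x} \, \bar{y}}{\overline{x^2} - \bar{x}^2}
\]
\[
m = \frac{28 - 2 \cdot 9}{6 - 2^2} = \frac{10}{2} = 5,
\qquad
b = \bar{y} - m \, \bar{x} = 9 - 5 \cdot 2 = -1
\]
```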

In [4]:
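regressionEquation2 <- function(x, y){
  # Slope: (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)
  slope <- (mean(x*y) - mean(x)*mean(y)) / (mean(x^2) - mean(x)*mean(x))

  # Intercept: the fitted line passes through (mean(x), mean(y))
  intercept <- mean(y) - slope*mean(x)

  # Format the equation, showing a negative intercept as a subtraction
  if (intercept < 0){
    paste0("y = ", slope, "x - ", abs(intercept))
  }
  else{
    paste0("y = ", slope, "x + ", intercept)
  }
}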

In [5]:
x <- c(0,1,2,3,4)
y <- c(1,3,7,13,21)

regressionEquation2(x,y)

In [6]:
price <- c(49,69,89,99,109)
demand <- c(124,95,71,45,18)

regressionEquation2(price, demand)

### Approach #1, Method 3:
- Computing Slope and Intercept with Correlation
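
The correlation form works because cor(x, y) = cov(x, y) / (sd(x) * sd(y)), so multiplying by sd(y) / sd(x) leaves cov(x, y) / var(x). The sketch below checks that the two expressions agree on the example data; the variable names `slopeCor` and `slopeCov` are illustrative.

```r
x <- c(0, 1, 2, 3, 4)
y <- c(1, 3, 7, 13, 21)

# Slope from the correlation and the two standard deviations
slopeCor <- cor(x, y) * sd(y) / sd(x)

# Equivalent form: covariance divided by the variance of x
slopeCov <- cov(x, y) / var(x)

print(c(fromCor = slopeCor, fromCov = slopeCov))
```

Both expressions should evaluate to 5 for this data, up to floating-point error.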

In [7]:
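regressionEquation3 <- function(x, y){
  # Slope: correlation scaled by the ratio of standard deviations
  slope <- cor(x, y) * sd(y) / sd(x)

  # Intercept: the fitted line passes through (mean(x), mean(y))
  intercept <- mean(y) - slope*mean(x)

  # Format the equation, showing a negative intercept as a subtraction
  if (intercept < 0){
    paste0("y = ", slope, "x - ", abs(intercept))
  }
  else{
    paste0("y = ", slope, "x + ", intercept)
  }
}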

In [8]:
x <- c(0,1,2,3,4)
y <- c(1,3,7,13,21)

regressionEquation3(x,y)

In [9]:
price <- c(49,69,89,99,109)
demand <- c(124,95,71,45,18)

regressionEquation3(price, demand)

### Using the Built-in lm() Function

In [10]:
x <- c(0,1,2,3,4)
y <- c(1,3,7,13,21)

lm(y~x)


Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
         -1            5  


In [11]:
summary(lm(y~x))


Call:
lm(formula = y ~ x)

Residuals:
 1  2  3  4  5 
 2 -1 -2 -1  2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -1.0000     1.6733  -0.598  0.59220   
x             5.0000     0.6831   7.319  0.00527 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.16 on 3 degrees of freedom
Multiple R-squared:  0.947,	Adjusted R-squared:  0.9293 
F-statistic: 53.57 on 1 and 3 DF,  p-value: 0.005268
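
The "Multiple R-squared" value in the summary above can be reproduced by hand as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares around the mean of y. A minimal sketch, with illustrative variable names:

```r
x <- c(0, 1, 2, 3, 4)
y <- c(1, 3, 7, 13, 21)

fit <- lm(y ~ x)

# Sum of squared residuals and total sum of squares
sse <- sum(residuals(fit)^2)
sst <- sum((y - mean(y))^2)

r2_manual  <- 1 - sse / sst
r2_summary <- summary(fit)$r.squared

print(c(manual = r2_manual, summary = r2_summary))
```

Here SSE is 14 and SST is 264, so both values should round to the 0.947 shown in the summary output.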


In [12]:
price <- c(49,69,89,99,109)
demand <- c(124,95,71,45,18)

lm(demand~price)


Call:
lm(formula = demand ~ price)

Coefficients:
(Intercept)        price  
    211.271       -1.695  
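
Once a model is fitted, its coefficients can be pulled out with `coef()` and new values predicted with `predict()`. The sketch below reuses the price and demand data; the price of 79 is a hypothetical input chosen only for illustration.

```r
price <- c(49, 69, 89, 99, 109)
demand <- c(124, 95, 71, 45, 18)

model <- lm(demand ~ price)

# coef() returns a named numeric vector: "(Intercept)" and "price"
intercept <- coef(model)[["(Intercept)"]]
slope     <- coef(model)[["price"]]

# Predicted demand at a hypothetical price of 79
predicted <- predict(model, newdata = data.frame(price = 79))

print(c(intercept = intercept, slope = slope))
print(predicted)
```

The prediction is simply intercept + slope * 79, so it can be cross-checked against the coefficients printed above.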
