# "Regression Analysis"
> "Introduction to Statistics, Jeonbuk National University"

- toc: true
- branch: master
- badges: true
- comments: true
- author: Kim Jeewoo
- categories: [Introduction to Statistics]
- image: images/20211130linear.png

In [55]:
library(tidyverse)

# Inference for Slope and Correlation

## Example 5.2

In [14]:
file_url = "http://www.lock5stat.com/datasets2e/InkjetPrinters.csv"

In [17]:
inkjet = read.csv(file_url)
inkjet

Model,PPM,PhotoTime,Price,CostBW,CostColor
<chr>,<dbl>,<int>,<int>,<dbl>,<dbl>
HP Photosmart Pro 8500A e-All-in-One,3.9,67,300,1.6,7.2
Canon Pixma MX882,2.9,63,199,5.2,13.4
Lexmark Impact S305,2.7,43,79,6.9,9.0
Lexmark Interpret S405,2.9,42,129,4.9,13.9
Epson Workforce 520,2.4,170,70,4.9,14.4
Brother MFC-J6910DW,4.1,143,348,1.7,7.9
HP Officejet 7500A Wide Format e-All-in-One,3.4,66,299,2.7,9.1
Canon Pixma iX7000 Inkjet Business Printer,2.8,66,248,4.1,9.8
Kodak ESP Office 2170 All-in-One Printer,3.0,42,150,3.7,11.3
HP Photosmart Plus e-All-in-One,3.2,77,150,4.2,11.4


In [19]:
fit = lm(Price ~ PPM, data = inkjet)
summary(fit)


Call:
lm(formula = Price ~ PPM, data = inkjet)

Residuals:
   Min     1Q Median     3Q    Max 
-79.38 -51.40  -3.49  43.85  87.76 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -94.22      56.40  -1.671 0.112086    
PPM            90.88      19.49   4.663 0.000193 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 58.55 on 18 degrees of freedom
Multiple R-squared:  0.5471,	Adjusted R-squared:  0.522 
F-statistic: 21.75 on 1 and 18 DF,  p-value: 0.0001934


- $b_1 = 90.88, SE = 19.49$

In [25]:
# Degrees of freedom for simple linear regression: n - 2
t_df = nrow(inkjet) - 2
t_df

For a t-distribution with 18 degrees of freedom, $t^* = 2.10$, so the 95% confidence interval is

$b_1 \pm t^* \cdot SE$

In [32]:
90.88 - 2.10 * 19.49
90.88 + 2.10 * 19.49

Based on these data, we are **95% confident that the slope (the increase in printer price for each additional page per minute of print speed) is between 49.95 and 131.81.**
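
The critical value $t^* = 2.10$ above is the rounded table value. As a minimal sketch, it can also be computed with `qt()`. The slope, standard error, and degrees of freedom below are copied by hand from the regression output rather than extracted from `fit`, so the variable names `se`, `t_star`, and `ci` are assumptions of this sketch.

```r
# Values copied from the summary(fit) output for Price ~ PPM
b1 = 90.88   # estimated slope for PPM
se = 19.49   # standard error of the slope
df = 18      # residual degrees of freedom (n - 2)

# Critical value for a 95% interval: the 0.975 quantile of t with 18 df
t_star = qt(0.975, df)

# Lower and upper limits of the interval b1 +/- t* x SE
ci = b1 + c(-1, 1) * t_star * se

round(t_star, 3)
round(ci, 2)
```

Because `qt()` does not round the critical value to 2.10, the limits may differ slightly from the hand calculation. If the fitted model is still in the session, `confint(fit, "PPM", level = 0.95)` should give essentially the same interval.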

## Example 5.3

In [33]:
file_url = "http://www.lock5stat.com/datasets2e/RestaurantTips.csv"

In [35]:
RestaurantTips = read.csv(file_url)
head(RestaurantTips)

,Bill,Tip,Credit,Guests,Day,Server,PctTip
,<dbl>,<dbl>,<chr>,<int>,<chr>,<chr>,<dbl>
1,23.7,10.0,n,2,f,A,42.2
2,36.11,7.0,n,3,f,B,19.4
3,31.99,5.01,y,2,f,A,15.7
4,17.39,3.61,y,2,f,B,20.8
5,15.41,3.0,n,2,f,B,19.5
6,18.62,2.5,n,2,f,A,13.4


In [37]:
fit = lm(Tip ~ Bill, data = RestaurantTips)
summary(fit)


Call:
lm(formula = Tip ~ Bill, data = RestaurantTips)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.3911 -0.4891 -0.1108  0.2839  5.9738 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.292267   0.166160  -1.759   0.0806 .  
Bill         0.182215   0.006451  28.247   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9795 on 155 degrees of freedom
Multiple R-squared:  0.8373,	Adjusted R-squared:  0.8363 
F-statistic: 797.9 on 1 and 155 DF,  p-value: < 2.2e-16


In [48]:
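b1 = 0.182      # slope estimate for Bill, rounded from 0.182215
SE = 0.006451   # standard error of the slope
n = nrow(RestaurantTips)
df = n - 2      # degrees of freedom for simple linear regression
df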

For a t-distribution with 155 degrees of freedom and 90% confidence, $t^* = 1.655$.

The 90% confidence interval is as follows.

In [51]:
b1-1.655*SE
b1+1.655*SE
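
The same calculation can be wrapped in a small function so that the confidence level is a parameter. This is only a sketch: `slope_ci` is a hypothetical helper name, not part of any package, and the inputs are the rounded values used in the cell above.

```r
# Hypothetical helper: confidence interval for a regression slope
# b1    : slope estimate
# se    : standard error of the slope
# df    : residual degrees of freedom (n - 2)
# level : confidence level, e.g. 0.90 or 0.95
slope_ci = function(b1, se, df, level = 0.95) {
  alpha = 1 - level
  t_star = qt(1 - alpha / 2, df)   # two-sided critical value
  c(lower = b1 - t_star * se, upper = b1 + t_star * se)
}

# 90% interval for the slope of Tip ~ Bill (155 degrees of freedom)
ci90 = slope_ci(b1 = 0.182, se = 0.006451, df = 155, level = 0.90)
round(ci90, 4)
```

With `level = 0.90`, the helper uses the 0.95 quantile of the t-distribution, which corresponds to the $t^* = 1.655$ used above.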

## Example 5.4

In [54]:
fit = lm(PctTip ~ Bill, data = RestaurantTips)
summary(fit)


Call:
lm(formula = PctTip ~ Bill, data = RestaurantTips)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.9927 -2.3096 -0.6455  1.4679 25.5335 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15.50965    0.73956   20.97   <2e-16 ***
Bill         0.04881    0.02871    1.70   0.0911 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.36 on 155 degrees of freedom
Multiple R-squared:  0.01831,	Adjusted R-squared:  0.01197 
F-statistic:  2.89 on 1 and 155 DF,  p-value: 0.09112
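
The `t value` and `Pr(>|t|)` reported for `Bill` in the output above follow from the estimate and its standard error. The sketch below reproduces them by hand. The numbers are copied from the rounded output, so small discrepancies are possible, and the names `t_stat` and `p_value` are assumptions of this sketch.

```r
# Values copied from the summary(fit) output for PctTip ~ Bill
b1 = 0.04881   # estimated slope for Bill
se = 0.02871   # standard error of the slope
df = 155       # residual degrees of freedom (n - 2)

# Test statistic for H0: slope = 0
t_stat = b1 / se

# Two-sided p-value from the t-distribution with 155 df
p_value = 2 * pt(-abs(t_stat), df)

round(t_stat, 2)
round(p_value, 4)
```

Since the reported p-value of 0.0911 is above 0.05, these data do not give significant evidence at the 5% level of a linear relationship between `Bill` and `PctTip`.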


# t-Test for Correlation

## Example 5.5

In [56]:
r = -0.636