# Simple Linear Regression Analysis
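Build a regression model from the given height and weight data, then compute the value asked for in each sub-question.
- 키 (height): dependent variable
- 몸무게 (weight): independent variable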

In [1]:
import pandas as pd

df = pd.DataFrame({
    '키': [150, 160, 170, 175, 165, 155, 172, 168, 174, 158,
          162, 173, 156, 159, 167, 163, 171, 169, 176, 161],
    '몸무게': [74, 50, 70, 64, 56, 48, 68, 60, 65, 52,
            54, 67, 49, 51, 58, 55, 69, 61, 66, 53]
})

In [3]:
# Fit the model and print the summary
from statsmodels.formula.api import ols
model = ols('키 ~ 몸무게', data=df).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                      키   R-squared:                       0.280
Model:                            OLS   Adj. R-squared:                  0.240
Method:                 Least Squares   F-statistic:                     6.984
Date:                Sat, 11 May 2024   Prob (F-statistic):             0.0165
Time:                        19:00:25   Log-Likelihood:                -64.701
No. Observations:                  20   AIC:                             133.4
Df Residuals:                      18   BIC:                             135.4
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    135.8209     11.211     12.115      0.0

In [4]:
# Coefficient of determination (R-squared)
# 0.28
model.rsquared

0.27954323113299495
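As a cross-check, R-squared can be recomputed by hand as 1 - SSE/SST. The adjusted R-squared in the summary is 1 - (1 - R²)(n - 1)/(n - p - 1), with p = 1 predictor. The sketch below is a plain-Python version with no pandas or statsmodels. It assumes the same 20 observations; `heights` and `weights` are illustrative stand-ins for the 키 and 몸무게 columns.

```python
# Illustrative copies of the 키 (height) and 몸무게 (weight) columns
heights = [150, 160, 170, 175, 165, 155, 172, 168, 174, 158,
           162, 173, 156, 159, 167, 163, 171, 169, 176, 161]
weights = [74, 50, 70, 64, 56, 48, 68, 60, 65, 52,
           54, 67, 49, 51, 58, 55, 69, 61, 66, 53]
n = len(heights)
mean_x = sum(weights) / n
mean_y = sum(heights) / n

# Least-squares slope and intercept (closed form for one predictor)
sxx = sum((x - mean_x) ** 2 for x in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, heights))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual (SSE) and total (SST) sums of squares
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(weights, heights))
sst = sum((y - mean_y) ** 2 for y in heights)

r_squared = 1 - sse / sst
p = 1  # number of predictors (몸무게 only)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(round(r_squared, 3), round(adj_r_squared, 3))
```

The rounded values should match R-squared 0.280 and Adj. R-squared 0.240 in the summary table.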

In [6]:
# Slope (regression coefficient of 몸무게)
# 0.4938
model.params['몸무게']

0.49376558603491116

In [7]:
# Intercept (regression coefficient)
# 135.8209
model.params['Intercept']

135.82094763092283
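With a single predictor, the least-squares estimates have a closed form: slope = Sxy / Sxx and intercept = ȳ - slope · x̄. Here Sxx = Σ(xᵢ - x̄)² and Sxy = Σ(xᵢ - x̄)(yᵢ - ȳ). For this data Sxx = 1203 and Sxy = 594, so the slope is 594/1203. A minimal plain-Python sketch, assuming the same data (the list names are illustrative):

```python
# Illustrative copies of the 키 (height) and 몸무게 (weight) columns
heights = [150, 160, 170, 175, 165, 155, 172, 168, 174, 158,
           162, 173, 156, 159, 167, 163, 171, 169, 176, 161]
weights = [74, 50, 70, 64, 56, 48, 68, 60, 65, 52,
           54, 67, 49, 51, 58, 55, 69, 61, 66, 53]
n = len(heights)
mean_x = sum(weights) / n  # 59.5
mean_y = sum(heights) / n  # 165.2

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, heights))

slope = sxy / sxx                   # regression coefficient of 몸무게
intercept = mean_y - slope * mean_x
print(slope, intercept)
```

The fitted line always passes through the point of means (x̄, ȳ) = (59.5, 165.2). That is why the intercept follows directly once the slope is known.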

In [8]:
# p-value: is the regression coefficient of 몸무게 statistically significant?
# 0.017
model.pvalues['몸무게']

0.0165401344531702
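This p-value comes from a t-test on the slope, t = slope / SE(slope). The standard error is SE(slope) = sqrt(s² / Sxx), where s² = SSE / (n - 2), Sxx = Σ(xᵢ - x̄)², and n - 2 = 18 residual degrees of freedom. The sketch below recomputes it in plain Python. The standard library has no t distribution, so the sketch integrates the Student's t density numerically with Simpson's rule. That is an illustrative substitute; `scipy.stats.t.sf` would be the usual shortcut. With a single predictor, t² equals the F-statistic (6.984) in the summary.

```python
import math

# Illustrative copies of the 키 (height) and 몸무게 (weight) columns
heights = [150, 160, 170, 175, 165, 155, 172, 168, 174, 158,
           162, 173, 156, 159, 167, 163, 171, 169, 176, 161]
weights = [74, 50, 70, 64, 56, 48, 68, 60, 65, 52,
           54, 67, 49, 51, 58, 55, 69, 61, 66, 53]
n = len(heights)
mean_x = sum(weights) / n
mean_y = sum(heights) / n
sxx = sum((x - mean_x) ** 2 for x in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, heights))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

dof = n - 2  # residual degrees of freedom (Df Residuals)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(weights, heights))
s2 = sse / dof  # estimate of the error variance
se_slope = math.sqrt(s2 / sxx)
t_stat = slope / se_slope


def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)


def simpson(f, a, b, steps=10_000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Two-sided p-value: P(|T| >= |t|) = 1 - 2 * (area under the density from 0 to |t|)
p_value = 1 - 2 * simpson(lambda t: t_pdf(t, dof), 0.0, abs(t_stat))
print(round(t_stat, 3), round(p_value, 4))
```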

In [10]:
# Predicted height (키) when weight (몸무게) is 50
newdata = pd.DataFrame({'몸무게': [50]})
model.predict(newdata)

0    160.509227
dtype: float64
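The OLS line passes through (x̄, ȳ), so the prediction can also be written as ŷ = ȳ + slope · (x0 - x̄). For x0 = 50, x0 - x̄ = -9.5. The predicted height therefore sits about 4.69 below the mean height of 165.2. A small sketch with a hypothetical helper `predict_height`, assuming the same data:

```python
# Illustrative copies of the 키 (height) and 몸무게 (weight) columns
heights = [150, 160, 170, 175, 165, 155, 172, 168, 174, 158,
           162, 173, 156, 159, 167, 163, 171, 169, 176, 161]
weights = [74, 50, 70, 64, 56, 48, 68, 60, 65, 52,
           54, 67, 49, 51, 58, 55, 69, 61, 66, 53]
n = len(heights)
mean_x = sum(weights) / n
mean_y = sum(heights) / n
sxx = sum((x - mean_x) ** 2 for x in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, heights))
slope = sxy / sxx


def predict_height(weight):
    """Hypothetical helper: predicted 키 for a given 몸무게 on the fitted line."""
    return mean_y + slope * (weight - mean_x)


print(predict_height(50))
```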

In [13]:
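# Residual sum of squares (SSE)
# residual = observed (actual) value - predicted value, stored in column '잔차'
# same as: df['키'] - model.predict()  (or model.resid)
df['잔차'] = df['키'] - model.predict(df)
sum(df['잔차'] ** 2)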

755.9032418952617
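The residual sum of squares is one part of the decomposition SST = SSR + SSE (total = explained + residual). The same pieces give the F-statistic in the summary, F = (SSR / 1) / (SSE / (n - 2)). A plain-Python sketch, under the same data assumptions (illustrative list names):

```python
# Illustrative copies of the 키 (height) and 몸무게 (weight) columns
heights = [150, 160, 170, 175, 165, 155, 172, 168, 174, 158,
           162, 173, 156, 159, 167, 163, 171, 169, 176, 161]
weights = [74, 50, 70, 64, 56, 48, 68, 60, 65, 52,
           54, 67, 49, 51, 58, 55, 69, 61, 66, 53]
n = len(heights)
mean_x = sum(weights) / n
mean_y = sum(heights) / n
sxx = sum((x - mean_x) ** 2 for x in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, heights))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * x for x in weights]
sst = sum((y - mean_y) ** 2 for y in heights)             # total variation
ssr = sum((f - mean_y) ** 2 for f in fitted)              # explained by the line
sse = sum((y - f) ** 2 for y, f in zip(heights, fitted))  # residual

# 1 numerator df (one predictor), n - 2 denominator df
f_stat = (ssr / 1) / (sse / (n - 2))
print(round(sst, 4), round(ssr, 4), round(sse, 4), round(f_stat, 3))
```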

In [14]:
df['잔차']

0    -22.359601
1     -0.509227
2     -0.384539
3      7.578055
4      1.528180
5     -4.521696
6      2.602993
7      2.553117
8      6.084289
9     -3.496758
10    -0.484289
11     4.096758
12    -4.015461
13    -2.002993
14     2.540648
15     0.021945
16     1.109227
17     3.059352
18     7.590524
19    -0.990524
Name: 잔차, dtype: float64
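Two properties of OLS residuals are easy to check against the list above. Because the model has an intercept, the residuals sum to zero up to rounding. They are also orthogonal to the predictor, Σ eᵢ·xᵢ = 0, which is what the normal equations require. The listing also shows that observation 0 (키 150, 몸무게 74) has by far the largest residual, about -22.36. A plain-Python sketch, assuming the same data (illustrative list names):

```python
# Illustrative copies of the 키 (height) and 몸무게 (weight) columns
heights = [150, 160, 170, 175, 165, 155, 172, 168, 174, 158,
           162, 173, 156, 159, 167, 163, 171, 169, 176, 161]
weights = [74, 50, 70, 64, 56, 48, 68, 60, 65, 52,
           54, 67, 49, 51, 58, 55, 69, 61, 66, 53]
n = len(heights)
mean_x = sum(weights) / n
mean_y = sum(heights) / n
sxx = sum((x - mean_x) ** 2 for x in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, heights))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(weights, heights)]

print(sum(residuals))                                  # ~0 (intercept in the model)
print(sum(r * x for r, x in zip(residuals, weights)))  # ~0 (normal equation for the slope)

# Index of the observation with the largest absolute residual
worst = max(range(n), key=lambda i: abs(residuals[i]))
print(worst, residuals[worst])
```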

In [15]:
# MSE (mean squared error) = mean of the squared residuals
(df['잔차'] ** 2).mean()

37.795162094763086

In [17]:
# MSE computed with scikit-learn
from sklearn.metrics import mean_squared_error
pred = model.predict(df)
mean_squared_error(df['키'], pred)

37.795162094763086
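Both cells divide SSE by n = 20, which is also what scikit-learn's `mean_squared_error` does. statsmodels additionally reports `model.mse_resid`, which divides by the residual degrees of freedom n - 2 = 18 instead. That unbiased estimate of the error variance is the s² used for the standard errors, t-tests, and intervals. A plain-Python sketch of the difference, assuming the same data (illustrative list names):

```python
# Illustrative copies of the 키 (height) and 몸무게 (weight) columns
heights = [150, 160, 170, 175, 165, 155, 172, 168, 174, 158,
           162, 173, 156, 159, 167, 163, 171, 169, 176, 161]
weights = [74, 50, 70, 64, 56, 48, 68, 60, 65, 52,
           54, 67, 49, 51, 58, 55, 69, 61, 66, 53]
n = len(heights)
mean_x = sum(weights) / n
mean_y = sum(heights) / n
sxx = sum((x - mean_x) ** 2 for x in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, heights))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(weights, heights))
mse = sse / n              # same as .mean() of squared residuals / mean_squared_error
mse_resid = sse / (n - 2)  # divides by residual degrees of freedom instead
rmse = mse ** 0.5
print(mse, mse_resid, rmse)
```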

In [None]:
# 95% confidence interval for the 몸무게 regression coefficient
# 0.101 ~ 0.886
model.conf_int(alpha=0.05).loc['몸무게']
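This interval is slope ± t(0.975, 18) · SE(slope), with SE(slope) = sqrt(SSE / (n - 2) / Sxx). The sketch hardcodes the two-sided 95% critical value for 18 degrees of freedom as about 2.100922, the rounded value that `scipy.stats.t.ppf(0.975, 18)` returns. Treat that constant as an assumption: it must change if n or alpha changes.

```python
import math

# Illustrative copies of the 키 (height) and 몸무게 (weight) columns
heights = [150, 160, 170, 175, 165, 155, 172, 168, 174, 158,
           162, 173, 156, 159, 167, 163, 171, 169, 176, 161]
weights = [74, 50, 70, 64, 56, 48, 68, 60, 65, 52,
           54, 67, 49, 51, 58, 55, 69, 61, 66, 53]
n = len(heights)
mean_x = sum(weights) / n
mean_y = sum(heights) / n
sxx = sum((x - mean_x) ** 2 for x in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, heights))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(weights, heights))
se_slope = math.sqrt(sse / (n - 2) / sxx)

# Assumed constant: two-sided 95% critical value of t with n - 2 = 18 df
t_crit = 2.100922
ci_lower = slope - t_crit * se_slope
ci_upper = slope + t_crit * se_slope
print(round(ci_lower, 3), round(ci_upper, 3))
```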

In [18]:
# Confidence interval and prediction interval for the predicted height when 몸무게 is 50
newdata = pd.DataFrame({'몸무게': [50]})
pred = model.get_prediction(newdata)
pred.summary_frame(alpha=0.05)

         mean   mean_se  mean_ci_lower  mean_ci_upper  obs_ci_lower  obs_ci_upper
0  160.509227  2.291332     155.695318     165.323136    146.068566    174.949888


In [None]:
# Confidence interval (95%, mean height):       155.695318 ~ 165.323136
# Prediction interval (95%, individual height): 146.068566 ~ 174.949888
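The two intervals differ only in their standard error. For the mean response at x0, SE = s · sqrt(1/n + (x0 - x̄)²/Sxx), where s² = SSE / (n - 2) and Sxx = Σ(xᵢ - x̄)². For a single new observation an extra 1 enters under the square root, SE = s · sqrt(1 + 1/n + (x0 - x̄)²/Sxx). That extra term is why the prediction interval is much wider. The plain-Python sketch below hardcodes the t critical value, about 2.100922 for 95% and 18 degrees of freedom, as an assumption; the list names are illustrative.

```python
import math

# Illustrative copies of the 키 (height) and 몸무게 (weight) columns
heights = [150, 160, 170, 175, 165, 155, 172, 168, 174, 158,
           162, 173, 156, 159, 167, 163, 171, 169, 176, 161]
weights = [74, 50, 70, 64, 56, 48, 68, 60, 65, 52,
           54, 67, 49, 51, 58, 55, 69, 61, 66, 53]
n = len(heights)
mean_x = sum(weights) / n
mean_y = sum(heights) / n
sxx = sum((x - mean_x) ** 2 for x in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, heights))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(weights, heights))
s = math.sqrt(sse / (n - 2))  # residual standard error
t_crit = 2.100922             # assumed: t critical value, 95%, 18 df

x0 = 50  # 몸무게 value to predict at
y0 = intercept + slope * x0
leverage = 1 / n + (x0 - mean_x) ** 2 / sxx
mean_se = s * math.sqrt(leverage)     # SE of the mean response (mean_se)
obs_se = s * math.sqrt(1 + leverage)  # SE for one new observation

conf_int = (y0 - t_crit * mean_se, y0 + t_crit * mean_se)  # mean_ci_lower/upper
pred_int = (y0 - t_crit * obs_se, y0 + t_crit * obs_se)    # obs_ci_lower/upper
print(round(mean_se, 6), conf_int, pred_int)
```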