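### Predicting wine alcohol content
- [Intro] How well do you know wine? White wines are said to run from 12% to 13.5% alcohol by volume and red wines from 12% to 15%, and premium wines reportedly tend to have a higher alcohol content.
- Using scikit-learn's wine dataset, let's predict the alcohol content of wine with linear regression.
- Requirement 1. print() the detailed description of the wine dataset.
- Requirement 2. Turn the wine data into a DataFrame, then store the alcohol column in the variable y and all remaining columns in X.
- Requirement 3. Store the X data with a constant term added in the variable X_0 (named `X0` in the code below).
- Requirement 4. Build a linear regression model from X_0 and y, and print its summary.
- Use OLS.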
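
For reference, the model that requirements 3 and 4 describe can be written as below. The symbols $\beta_j$ and $\varepsilon_i$ are notation introduced here for illustration and do not appear in the assignment itself; $x_{ij}$ stands for the value of the $j$-th of the 12 non-alcohol columns for wine $i$.

$$
\text{alcohol}_i = \beta_0 + \sum_{j=1}^{12} \beta_j \, x_{ij} + \varepsilon_i, \qquad i = 1, \dots, 178
$$

The intercept $\beta_0$ is the reason for requirement 3: statsmodels' `OLS` does not add a constant column on its own, so it has to be added to X before fitting.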

In [1]:
from sklearn.datasets import load_wine
from statsmodels.regression.linear_model import OLS

import pandas as pd
import statsmodels.api as sm

In [15]:
wine = load_wine()
print(wine.DESCR)

.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0
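
The instance line in this description reads "50 in each of three classes", yet the three classes in this dataset are not balanced. As a quick sanity check, the sketch below counts the samples per class directly. It uses only `load_wine` and NumPy; the variable name `counts` is chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Count how many samples fall into each of the three classes
counts = np.bincount(wine.target)
for name, count in zip(wine.target_names, counts):
    print(name, count)
```

The counts come out as 59, 71 and 48, which sum to the 178 instances stated above. The class label is not used in the regression that follows, so this only affects how the description should be read.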

In [19]:
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)

In [20]:
wine_df.tail()

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 173 | 13.71 | 5.65 | 2.45 | 20.5 | 95.0 | 1.68 | 0.61 | 0.52 | 1.06 | 7.7 | 0.64 | 1.74 | 740.0 |
| 174 | 13.4 | 3.91 | 2.48 | 23.0 | 102.0 | 1.8 | 0.75 | 0.43 | 1.41 | 7.3 | 0.7 | 1.56 | 750.0 |
| 175 | 13.27 | 4.28 | 2.26 | 20.0 | 120.0 | 1.59 | 0.69 | 0.43 | 1.35 | 10.2 | 0.59 | 1.56 | 835.0 |
| 176 | 13.17 | 2.59 | 2.37 | 20.0 | 120.0 | 1.65 | 0.68 | 0.53 | 1.46 | 9.3 | 0.6 | 1.62 | 840.0 |
| 177 | 14.13 | 4.1 | 2.74 | 24.5 | 96.0 | 2.05 | 0.76 | 0.56 | 1.35 | 9.2 | 0.61 | 1.6 | 560.0 |


In [26]:
# Target: the alcohol column; predictors: the remaining 12 feature columns
y = wine_df['alcohol']
X = wine_df.drop(columns='alcohol')
X.tail()

| | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 173 | 5.65 | 2.45 | 20.5 | 95.0 | 1.68 | 0.61 | 0.52 | 1.06 | 7.7 | 0.64 | 1.74 | 740.0 |
| 174 | 3.91 | 2.48 | 23.0 | 102.0 | 1.8 | 0.75 | 0.43 | 1.41 | 7.3 | 0.7 | 1.56 | 750.0 |
| 175 | 4.28 | 2.26 | 20.0 | 120.0 | 1.59 | 0.69 | 0.43 | 1.35 | 10.2 | 0.59 | 1.56 | 835.0 |
| 176 | 2.59 | 2.37 | 20.0 | 120.0 | 1.65 | 0.68 | 0.53 | 1.46 | 9.3 | 0.6 | 1.62 | 840.0 |
| 177 | 4.1 | 2.74 | 24.5 | 96.0 | 2.05 | 0.76 | 0.56 | 1.35 | 9.2 | 0.61 | 1.6 | 560.0 |
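
The same split into target and predictors can also be sketched without building the DataFrame by hand. This assumes a scikit-learn version that supports the `as_frame=True` option of `load_wine` (0.23 or later); the name `features` is chosen here for illustration.

```python
from sklearn.datasets import load_wine

# as_frame=True returns the feature matrix as a pandas DataFrame
wine = load_wine(as_frame=True)
features = wine.data

# Target: alcohol; predictors: every other feature column
y = features["alcohol"]
X = features.drop(columns="alcohol")

print(X.shape)
```

Dropping the column by name avoids relying on `alcohol` being the first entry of `feature_names`.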


In [29]:
# Prepend an intercept column of ones (named 'const') to the predictors
X0 = sm.add_constant(X)
X0.tail()

| | const | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 173 | 1.0 | 5.65 | 2.45 | 20.5 | 95.0 | 1.68 | 0.61 | 0.52 | 1.06 | 7.7 | 0.64 | 1.74 | 740.0 |
| 174 | 1.0 | 3.91 | 2.48 | 23.0 | 102.0 | 1.8 | 0.75 | 0.43 | 1.41 | 7.3 | 0.7 | 1.56 | 750.0 |
| 175 | 1.0 | 4.28 | 2.26 | 20.0 | 120.0 | 1.59 | 0.69 | 0.43 | 1.35 | 10.2 | 0.59 | 1.56 | 835.0 |
| 176 | 1.0 | 2.59 | 2.37 | 20.0 | 120.0 | 1.65 | 0.68 | 0.53 | 1.46 | 9.3 | 0.6 | 1.62 | 840.0 |
| 177 | 1.0 | 4.1 | 2.74 | 24.5 | 96.0 | 2.05 | 0.76 | 0.56 | 1.35 | 9.2 | 0.61 | 1.6 | 560.0 |
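
Conceptually, the `const` column shown above is a column of ones placed in front of the predictors, so that its coefficient plays the role of the intercept. The NumPy-only sketch below illustrates the idea on a tiny array built from the malic_acid and ash values of rows 173 to 175; the names `X_toy` and `X0_toy` are hypothetical and this is not how statsmodels implements `add_constant` internally.

```python
import numpy as np

# Toy predictors: malic_acid and ash for rows 173-175 of the table above
X_toy = np.array([[5.65, 2.45],
                  [3.91, 2.48],
                  [4.28, 2.26]])

# Prepend a column of ones so the first coefficient acts as the intercept
ones = np.ones((X_toy.shape[0], 1))
X0_toy = np.hstack([ones, X_toy])

print(X0_toy)
```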


In [30]:
# Fit ordinary least squares: alcohol regressed on the constant plus 12 predictors
model_wine = OLS(y, X0)
result_wine = model_wine.fit()
print(result_wine.summary())

                            OLS Regression Results                            
Dep. Variable:                alcohol   R-squared:                       0.594
Model:                            OLS   Adj. R-squared:                  0.564
Method:                 Least Squares   F-statistic:                     20.08
Date:                Thu, 18 Feb 2021   Prob (F-statistic):           1.61e-26
Time:                        22:47:52   Log-Likelihood:                -134.83
No. Observations:                 178   AIC:                             295.7
Df Residuals:                     165   BIC:                             337.0
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
                                   coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------
const           
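
Several figures in the summary header are tied together by simple formulas, which makes them easy to cross-check. The sketch below recomputes the adjusted R-squared, AIC and BIC from the other values printed in the header, using the standard definitions for a model with a constant. The variable names are chosen here for illustration, and the inputs are the rounded values from the summary, so only agreement up to rounding can be expected.

```python
import math

# Values copied from the OLS summary header
n_obs = 178
df_model = 12        # number of predictors, excluding the constant
df_resid = 165
r_squared = 0.594
log_lik = -134.83

n_params = df_model + 1  # predictors plus the constant

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# Information criteria computed from the log-likelihood
aic = -2 * log_lik + 2 * n_params
bic = -2 * log_lik + math.log(n_obs) * n_params

print(round(adj_r_squared, 3), round(aic, 1), round(bic, 1))
```

The three results agree with the 0.564, 295.7 and 337.0 reported in the summary. An R-squared of 0.594 means the 12 predictors account for roughly 59% of the variance in alcohol content in this sample.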