# Multiple Regression: Exercise Solution

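We have a dataset about real estate.

It is common for a property's price to be causally related to attributes such as its floor area (size).

The data is stored in the file 'real_estate_price_size_year.csv'.

Let's build a multiple regression model from this data.

In this exercise, the dependent variable is price and the independent variables are size and year.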
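Written out, the model being fitted has the standard linear form below. This is a sketch in the usual textbook notation (the $\beta$ and $\varepsilon$ symbols are not from the original notebook), with $n = 100$ rows as reported by `data.describe()`:

$$
\text{price}_i = \beta_0 + \beta_1\,\text{size}_i + \beta_2\,\text{year}_i + \varepsilon_i, \qquad i = 1, \dots, n
$$

Ordinary least squares picks $\hat\beta$ to minimise $\sum_i \varepsilon_i^2$. When the design matrix $X$ (a column of ones, the size column, and the year column) has full column rank, this gives $\hat\beta = (X^\top X)^{-1} X^\top y$.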

## Importing the libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
sns.set()  # apply seaborn's default style to matplotlib plots

## Loading the data

In [2]:
data = pd.read_csv('real_estate_price_size_year.csv')

In [3]:
data.head()

|   | price | size | year |
|---|------:|-----:|-----:|
| 0 | 234314.144 | 643.09 | 2015 |
| 1 | 228581.528 | 656.22 | 2009 |
| 2 | 281626.336 | 487.29 | 2018 |
| 3 | 401255.608 | 1504.75 | 2015 |
| 4 | 458674.256 | 1275.46 | 2009 |


In [4]:
data.describe()

|       | price | size | year |
|-------|------:|-----:|-----:|
| count | 100.0 | 100.0 | 100.0 |
| mean | 292289.47016 | 853.0242 | 2012.6 |
| std | 77051.727525 | 297.941951 | 4.729021 |
| min | 154282.128 | 479.75 | 2006.0 |
| 25% | 234280.148 | 643.33 | 2009.0 |
| 50% | 280590.716 | 696.405 | 2015.0 |
| 75% | 335723.696 | 1029.3225 | 2018.0 |
| max | 500681.128 | 1842.51 | 2018.0 |


## Building the regression

### Declaring the dependent and independent variables

In [5]:
y = data['price']             # dependent variable (Series)
x1 = data[['size', 'year']]   # independent variables (DataFrame with two columns)
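Single versus double brackets matter in the cell above: `data['price']` returns a one-dimensional Series, while `data[['size', 'year']]` returns a two-column DataFrame. Keeping the column names is what lets statsmodels label the coefficients `size` and `year` in the summary. The sketch below re-types the first three rows of the `data.head()` output so it runs without the CSV file.

```python
import pandas as pd

# First three rows of the data.head() output above, re-typed for a self-contained example
data = pd.DataFrame({
    "price": [234314.144, 228581.528, 281626.336],
    "size": [643.09, 656.22, 487.29],
    "year": [2015, 2009, 2018],
})

y = data["price"]            # a single column name returns a Series
x1 = data[["size", "year"]]  # a list of column names returns a DataFrame

print(type(y).__name__, y.shape)    # Series (3,)
print(type(x1).__name__, x1.shape)  # DataFrame (3, 2)
```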

### Regression
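As a rough picture of what `sm.add_constant` and `sm.OLS(y, x).fit()` compute, here is a minimal NumPy sketch of least squares. It is not statsmodels' implementation. It runs on synthetic data whose ranges and "true" coefficients are made up (loosely modelled on the table below), and `add_intercept`, `fit_ols` and `r_squared` are hypothetical helper names. One design choice is worth noting: `np.linalg.lstsq` avoids forming $(X^\top X)^{-1}$ explicitly. That matters here because year values around 2000 are nearly collinear with the intercept column, which is also a likely contributor to the large `Cond. No.` in the summary.

```python
import numpy as np


def add_intercept(features: np.ndarray) -> np.ndarray:
    """Prepend a column of ones, as sm.add_constant does by default."""
    return np.column_stack([np.ones(len(features)), features])


def fit_ols(features: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return least-squares coefficients [const, b_size, b_year]."""
    X = add_intercept(features)
    # lstsq minimises ||X b - y||^2 via SVD instead of inverting X^T X
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta


def r_squared(features: np.ndarray, target: np.ndarray, beta: np.ndarray) -> float:
    """1 - residual sum of squares / total sum of squares."""
    resid = target - add_intercept(features) @ beta
    return float(1 - resid @ resid / np.sum((target - target.mean()) ** 2))


# Synthetic data, NOT the real CSV: ranges and "true" coefficients are made up
rng = np.random.default_rng(0)
size = rng.uniform(480, 1850, 100)
year = rng.integers(2006, 2019, 100).astype(float)
true_beta = np.array([-5.77e6, 227.7, 2916.8])
features = np.column_stack([size, year])
price = add_intercept(features) @ true_beta + rng.normal(0, 40000, 100)

beta = fit_ols(features, price)
print("estimated coefficients:", beta)
print("R-squared:", r_squared(features, price, beta))
```

With noise-free prices the same function recovers `true_beta` almost exactly, which is a quick sanity check of the fit.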

In [8]:
x = sm.add_constant(x1)       # prepend an intercept column named 'const'
results = sm.OLS(y, x).fit()  # fit by ordinary least squares
results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.776 |
| Model: | OLS | Adj. R-squared: | 0.772 |
| Method: | Least Squares | F-statistic: | 168.5 |
| Date: | Thu, 11 Mar 2021 | Prob (F-statistic): | 2.77e-32 |
| Time: | 22:22:41 | Log-Likelihood: | -1191.7 |
| No. Observations: | 100 | AIC: | 2389.0 |
| Df Residuals: | 97 | BIC: | 2397.0 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |
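Adjusted R² and the F-statistic follow from R², the number of observations $n$ and the number of regressors $k$ (here `Df Model` = 2, and `Df Residuals` = $n - k - 1$ = 97). The sketch below applies the standard formulas to the printed values. Because R² is shown rounded to three decimals, the results land slightly off the table (0.771 vs. 0.772, 168.0 vs. 168.5). The function names are illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k regressors (constant not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F-test: explained vs. residual variance, each per degree of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Values from the summary: R-squared 0.776, 100 observations, 2 regressors (size, year)
print(round(adjusted_r2(0.776, 100, 2), 3))  # 0.771
print(round(f_statistic(0.776, 100, 2), 1))  # 168.0
```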

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---:|---:|---:|---:|---:|---:|
| const | -5.772e+06 | 1.58e+06 | -3.647 | 0.000 | -8.91e+06 | -2.63e+06 |
| size | 227.7009 | 12.474 | 18.254 | 0.000 | 202.943 | 252.458 |
| year | 2916.7853 | 785.896 | 3.711 | 0.000 | 1357.000 | 4476.571 |
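Reading the coefficient table: holding year fixed, each additional unit of size raises the predicted price by about 227.7. Holding size fixed, each later year adds about 2,917. The large negative constant corresponds to size 0 in year 0, far outside the data, so it has no direct interpretation. The `t` column is `coef / std err`. The sketch below recomputes it from the rounded table values (size and year reproduce 18.254 and 3.711; the constant gives -3.653 instead of -3.647 because both of its inputs are heavily rounded). It also makes a point prediction with a hypothetical `predict_price` helper. Since the constant is printed to only four significant figures, such predictions can be off by a few hundred. With the real fitted model, `results.predict` on a DataFrame built with `sm.add_constant(..., has_constant='add')` avoids this; `has_constant='add'` is needed because a single-row frame looks constant in every column.

```python
# Coefficients and standard errors copied from the table above (rounded as printed)
coef = {"const": -5.772e6, "size": 227.7009, "year": 2916.7853}
std_err = {"const": 1.58e6, "size": 12.474, "year": 785.896}

# t statistic for each coefficient: estimate divided by its standard error
t_values = {name: coef[name] / std_err[name] for name in coef}


def predict_price(size: float, year: int) -> float:
    """Point prediction from the fitted plane (approximate: const is rounded)."""
    return coef["const"] + coef["size"] * size + coef["year"] * year


print({name: round(t, 3) for name, t in t_values.items()})
# {'const': -3.653, 'size': 18.254, 'year': 3.711}
print(round(predict_price(750, 2009)))  # 258597
```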

| | | | |
|---|---|---|---|
| Omnibus: | 10.083 | Durbin-Watson: | 2.25 |
| Prob(Omnibus): | 0.006 | Jarque-Bera (JB): | 3.678 |
| Skew: | 0.095 | Prob(JB): | 0.159 |
| Kurtosis: | 2.08 | Cond. No. | 9.41e+05 |
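The Durbin-Watson value measures first-order autocorrelation in the residuals. Values near 2 suggest little autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation, so 2.25 is close to the no-autocorrelation case. Because the statistic depends on row order, it is only informative when the rows have a meaningful order. statsmodels provides it as `statsmodels.stats.stattools.durbin_watson`. The NumPy function below is a hypothetical re-implementation of the same formula, run on toy residuals rather than this model's. For the real fit you would pass `results.resid`.

```python
import numpy as np


def durbin_watson(residuals: np.ndarray) -> float:
    """Sum of squared differences of consecutive residuals / sum of squared residuals."""
    diff = np.diff(residuals)
    return float(diff @ diff / (residuals @ residuals))


# Toy residual series, not taken from this model
print(durbin_watson(np.array([1.0, -1.0, 1.0, -1.0])))  # 3.0 (signs alternate)
print(durbin_watson(np.array([1.0, 1.0, 1.0, 1.0])))    # 0.0 (neighbours identical)
```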


In [None]:
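plt.scatter(data['size'], y)  # x1 has two columns, so plot price against size only
b = results.params  # fitted coefficients: const, size, year
yhat = b['const'] + b['size'] * data['size'] + b['year'] * data['year'].mean()  # fitted plane with year held at its mean
plt.plot(data['size'], yhat, lw=4, c='orange', label='regression line (year = mean)')
plt.xlabel('size', fontsize=20)
plt.ylabel('price', fontsize=20)
plt.legend()
plt.show()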