- Using the house prices dataset, predict house prices.
- Explore the data and identify some variables that you think will be useful for predicting house prices.
- Build your first model using these features and estimate the parameters with OLS.

In [10]:
%matplotlib inline
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
sns.set()

In [11]:
df = pd.read_csv("house.csv")
df_cat = df[df.select_dtypes(include = ["object"]).columns]
df_num = df[df.select_dtypes(exclude = ["object"]).columns]

In [12]:
# data cleaning: count and share of missing values per column
total = df.isnull().sum().sort_values(ascending=False)
percent = (df.isnull().sum()/df.isnull().count()).sort_values(ascending=False)
missing = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing.head(20)

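# drop the rows with a missing 'Electrical' value, then every column that had missing values
# note: df_cat and df_num were created before this step, so these drops do not change them
df = df.drop(df.loc[df['Electrical'].isnull()].index)
df = df.drop(columns=missing[missing['Total'] > 0].index)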

In [13]:
# encode the categorical variables as binary (0/1) dummy variables
df_cat = pd.get_dummies(df_cat)
df_all = pd.concat([df_cat,df_num],axis=1)
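
To make the encoding step concrete, here is a minimal sketch of what `pd.get_dummies` produces. The toy frame is hypothetical (it is not read from `house.csv`); only the column name mirrors one used in the notebook. The `drop_first=True` variant is not used in the notebook and is shown only as an option for avoiding perfectly collinear indicator columns.

```python
import pandas as pd

# Hypothetical toy data, not taken from house.csv.
toy = pd.DataFrame({"KitchenQual": ["Ex", "Gd", "TA", "Gd"]})

# One indicator column per category, named <column>_<category>.
dummies = pd.get_dummies(toy)
print(list(dummies.columns))

# Optional: drop the first category so the indicators are not
# perfectly collinear with an intercept term.
dummies_reduced = pd.get_dummies(toy, drop_first=True)
print(list(dummies_reduced.columns))
```

Column names such as `KitchenQual_Ex` in the feature list come from this naming scheme.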

In [14]:
# correlation of each variable with SalePrice (keep those above 0.4)
corr_price = df_all.corr()['SalePrice']
corr_price[corr_price > 0.4].sort_values(ascending=False)

Y = df_all['SalePrice']
X = df_all[['OverallQual', 'GrLivArea','GarageCars','GarageArea','TotalBsmtSF','1stFlrSF','FullBath','TotRmsAbvGrd','YearBuilt','YearRemodAdd','KitchenQual_Ex','Foundation_PConc','Fireplaces','ExterQual_Gd','ExterQual_Ex','HeatingQC_Ex','Neighborhood_NridgHt']]
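
The correlation screen used to pick the features can be sketched on synthetic data. Everything below is an assumption for illustration: the column names `area`, `noise`, and `price` are hypothetical and the data is randomly generated, not taken from `house.csv`. Note that the target always correlates perfectly with itself, so it is dropped from the selection explicitly.

```python
import numpy as np
import pandas as pd

# Synthetic data (hypothetical columns): 'area' drives 'price', 'noise' does not.
rng = np.random.default_rng(0)
n = 500
area = rng.normal(size=n)
noise = rng.normal(size=n)
price = 3.0 * area + rng.normal(size=n)
toy = pd.DataFrame({"area": area, "noise": noise, "price": price})

# Correlation of every column with the target, computed once.
corr_with_price = toy.corr()["price"]

# Keep columns above the threshold; drop the target itself (correlation 1.0).
selected = (
    corr_with_price[corr_with_price > 0.4]
    .drop("price")
    .sort_values(ascending=False)
)
print(selected)
```

A threshold on the raw correlation only catches positive linear relationships; filtering on the absolute value would also keep strongly negative ones.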

In [15]:
# fitting and inspecting the model with scikit-learn
model = LinearRegression().fit(X, Y)
prediction = model.predict(X)
print('Coefficients:', model.coef_)
print('Intercept:', model.intercept_)

Coefficients: [ 1.29309173e+04  4.92352201e+01  1.05960152e+04  1.02169229e+01
  1.54199301e+01  7.13074657e+00 -2.93398619e+03 -9.33819464e+02
  2.41261067e+02  2.17177533e+02  3.41672570e+04  9.30015188e+02
  9.95790319e+03  7.09792609e+03  3.67109343e+04  4.80940434e+03
  1.86438895e+04]
Intercept: -932656.43248767
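
The values in `coef_` and `intercept_` are ordinary least squares estimates. As a minimal sketch of what that means, the same kind of estimate can be obtained with `numpy.linalg.lstsq` on a design matrix that includes a column of ones. The data below is synthetic and noise-free (an assumption for illustration), so the known parameters 2, 3, and -1 are recovered up to floating-point error.

```python
import numpy as np

# Synthetic, noise-free data: y = 2 + 3*x1 - 1*x2 (hypothetical example).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 2))
y_toy = 2.0 + 3.0 * X_toy[:, 0] - 1.0 * X_toy[:, 1]

# Prepend a column of ones so the first parameter acts as the intercept.
X_design = np.column_stack([np.ones(len(X_toy)), X_toy])

# Least squares solution: minimizes ||X_design @ beta - y_toy||^2.
beta, *_ = np.linalg.lstsq(X_design, y_toy, rcond=None)
intercept, coefs = beta[0], beta[1:]
print("Intercept:", intercept)
print("Coefficients:", coefs)
```

The column of ones plays the same role as `sm.add_constant(X)` in the statsmodels cell.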


In [16]:
# fitting and inspecting the model with statsmodels
X = sm.add_constant(X)
results = sm.OLS(Y, X).fit()
results.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | SalePrice | R-squared: | 0.805 |
| Model: | OLS | Adj. R-squared: | 0.803 |
| Method: | Least Squares | F-statistic: | 351.2 |
| Date: | Tue, 31 Mar 2020 | Prob (F-statistic): | 0.0 |
| Time: | 23:19:26 | Log-Likelihood: | -17349.0 |
| No. Observations: | 1460 | AIC: | 34730.0 |
| Df Residuals: | 1442 | BIC: | 34830.0 |
| Df Model: | 17 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -9.327e+05 | 1.41e+05 | -6.626 | 0.000 | -1.21e+06 | -6.57e+05 |
| OverallQual | 1.293e+04 | 1226.027 | 10.547 | 0.000 | 1.05e+04 | 1.53e+04 |
| GrLivArea | 49.2352 | 4.098 | 12.014 | 0.000 | 41.196 | 57.274 |
| GarageCars | 1.06e+04 | 2858.719 | 3.707 | 0.000 | 4988.323 | 1.62e+04 |
| GarageArea | 10.2169 | 9.698 | 1.053 | 0.292 | -8.807 | 29.241 |
| TotalBsmtSF | 15.4199 | 4.028 | 3.829 | 0.000 | 7.520 | 23.320 |
| 1stFlrSF | 7.1307 | 4.698 | 1.518 | 0.129 | -2.084 | 16.346 |
| FullBath | -2933.9862 | 2536.426 | -1.157 | 0.248 | -7909.467 | 2041.494 |
| TotRmsAbvGrd | -933.8195 | 1056.411 | -0.884 | 0.377 | -3006.086 | 1138.447 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 658.066 | Durbin-Watson: | 2.01 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 94966.443 |
| Skew: | -1.034 | Prob(JB): | 0.0 |
| Kurtosis: | 42.457 | Cond. No. | 550000.0 |
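
Two of the reported quantities can be checked by hand from the other numbers in the summary. The sketch below uses only values printed above (R-squared, number of observations, number of predictors, and the GrLivArea coefficient and standard error). Because it starts from rounded figures, it reproduces the reported values only to the displayed precision.

```python
# Values copied from the summary output above.
r_squared = 0.805
n_obs = 1460
n_predictors = 17  # "Df Model", excludes the constant

# Residual degrees of freedom: observations minus predictors minus the constant.
df_resid = n_obs - n_predictors - 1

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid
print("Df Residuals:", df_resid)
print("Adj. R-squared:", round(adj_r_squared, 3))

# A t statistic is the coefficient divided by its standard error (GrLivArea row).
t_grlivarea = 49.2352 / 4.098
print("t (GrLivArea):", round(t_grlivarea, 3))
```

With 17 predictors and 1460 observations the adjustment is small, which is why the two R-squared values differ only in the third decimal place.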
