## This is a statistical analysis of the cleaned data set of housing properties in the City of Burlington. The main objective is to explore relationships between features, especially how CurrentValue and SalePrice relate to the other features.

In [1]:
import pandas as pd
import numpy as np

#### We will first convert the Grade column to numerics based on a reasonable rating system. 

In [2]:
df = pd.read_csv('Property_2018.12.4.csv')
df.Grade.unique()

array(['FAIR PLUS', 'AVERAGE', 'GOOD', 'AVERAGE PLUS', 'GOOD MINUS',
       'AVERAGEMINUS', 'GOOD PLUS', 'FAIR', 'FAIR MINUS', 'EXCLNT MINUS',
       'EXCELLENT', 'VRYGOODPLUS', 'VERY GOOD', 'VRYGOODMINUS',
       'EXCLT PLUS', 'POOR MINUS', 'CUSTOM MINUS', 'POOR', 'POOR PLUS',
       'CUSTOM'], dtype=object)

In [9]:
df.columns

Index(['FID', 'StreetNumber', 'StreetName', 'LandUse', 'CurrentAcres',
       'TotalGrossArea', 'FinishedArea', 'CurrentValue', 'CurrentLandValue',
       'CurrentYardItemsValue', 'CurrentBuildingValue', 'BuildingType',
       'HeatFuel', 'HeatType', 'Grade', 'YearBlt', 'SalePrice', 'NumofRooms',
       'NumofBedrooms', 'NumofUnits', 'ZoningCode', 'Foundation',
       'Depreciation', 'PropertyCenterPoint_x', 'PropertyCenterPoint_y'],
      dtype='object')

#### A reference for ratings can be found at https://appraisersforum.com/forums/threads/defining-fair-average-good.157245/.
#### Since CUSTOM has a high average CurrentValue, as seen in the Visualization code, it is rated right below 'EXCLNT MINUS'.

In [3]:
Grade_dict={'POOR MINUS':0, 'POOR':1, 'POOR PLUS':2, 'FAIR MINUS':3, 'FAIR':4, 'FAIR PLUS':5, 'AVERAGEMINUS':6,
           'AVERAGE':7,'AVERAGE PLUS':8, 'GOOD MINUS':9,'GOOD':10,'GOOD PLUS':11, 'VRYGOODMINUS':12,'VERY GOOD':13,
           'VRYGOODPLUS':14,'CUSTOM MINUS':15,'CUSTOM':16,'EXCLNT MINUS':17,'EXCELLENT':18,'EXCLT PLUS':19}
Grade_dict

{'POOR MINUS': 0,
 'POOR': 1,
 'POOR PLUS': 2,
 'FAIR MINUS': 3,
 'FAIR': 4,
 'FAIR PLUS': 5,
 'AVERAGEMINUS': 6,
 'AVERAGE': 7,
 'AVERAGE PLUS': 8,
 'GOOD MINUS': 9,
 'GOOD': 10,
 'GOOD PLUS': 11,
 'VRYGOODMINUS': 12,
 'VERY GOOD': 13,
 'VRYGOODPLUS': 14,
 'CUSTOM MINUS': 15,
 'CUSTOM': 16,
 'EXCLNT MINUS': 17,
 'EXCELLENT': 18,
 'EXCLT PLUS': 19}

In [4]:
df.Grade = df.Grade.map(Grade_dict)
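
#### A quick sanity check on the mapping step: `Series.map` returns NaN for any label missing from the dictionary, so it can be worth listing unmapped labels. The sketch below uses a toy Series and a shortened dictionary; the misspelled label is hypothetical and not taken from the data set.

```python
import pandas as pd

# Subset of the rating dictionary above, shortened for illustration.
grade_subset = {'FAIR': 4, 'FAIR PLUS': 5, 'AVERAGE': 7, 'GOOD': 10}

# Toy Grade column; 'AVERAGE  PLUS' (two spaces) is a hypothetical misspelling.
grades = pd.Series(['FAIR PLUS', 'AVERAGE', 'GOOD', 'AVERAGE  PLUS'])

# Labels that are not keys of the dictionary become NaN after map().
mapped = grades.map(grade_subset)

# Collect the original labels that failed to map.
unmapped_labels = grades[mapped.isna()].unique().tolist()
print(unmapped_labels)
```

#### An empty list means every grade label received a numeric rating.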

In [5]:
# index=False avoids writing the row index as an extra unnamed column
df.to_csv('Property_2019_1_13.csv', index=False)

#### When calculating the correlation matrix, the resulting matrix is missing the feature 'StreetNumber'. This is because the StreetNumber column initially contains data of object type. After it is converted to float type, the correlation matrix looks fine.

In [38]:
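# The second positional argument of to_numeric is `errors`; 'coerce' turns non-numeric entries into NaN
df.StreetNumber = pd.to_numeric(df.StreetNumber, errors='coerce')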
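
#### The effect of the conversion can be illustrated on a toy frame. This is a minimal sketch with made-up values; it assumes non-numeric street numbers (such as the hypothetical '7A') should become NaN, which is an assumption about the intent rather than something stated in the data.

```python
import pandas as pd

# Toy data (made up): StreetNumber is stored as strings, so it is not numeric.
toy = pd.DataFrame({
    "StreetNumber": ["12", "40", "7A", "101"],
    "CurrentValue": [200000.0, 250000.0, 180000.0, 320000.0],
    "FinishedArea": [1200.0, 1500.0, 1100.0, 2000.0],
})

# Only numeric columns can take part in a correlation matrix.
numeric_before = toy.select_dtypes("number").columns.tolist()

# Convert to float; entries that cannot be parsed ('7A') become NaN.
toy["StreetNumber"] = pd.to_numeric(toy["StreetNumber"], errors="coerce")
numeric_after = toy.select_dtypes("number").columns.tolist()

# All three columns are numeric now, so the matrix is 3 by 3.
corr = toy.corr()
print(numeric_before, numeric_after, corr.shape)
```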

In [39]:
df_features = df.drop(columns=['FID'])

#### We do not need the FID column for statistical analysis. Excluding FID, df_features contains 24 columns, so its correlation matrix is 24 by 24.

In [40]:
df_features.corr().shape

(24, 24)

In [41]:
len(df_features.columns)

24

In [42]:
df_features.corr()

Feature,StreetNumber,StreetName,LandUse,CurrentAcres,TotalGrossArea,FinishedArea,CurrentValue,CurrentLandValue,CurrentYardItemsValue,CurrentBuildingValue,...,YearBlt,SalePrice,NumofRooms,NumofBedrooms,NumofUnits,ZoningCode,Foundation,Depreciation,PropertyCenterPoint_x,PropertyCenterPoint_y
StreetNumber,1.0,0.618315,0.028728,0.051382,0.024646,0.023446,0.005545,0.006574,0.033257,0.002747,...,-0.001154,-0.002163,-0.005766,-0.006574,0.028176,0.079725,0.073228,0.055325,-0.036015,0.050908
StreetName,0.618315,1.0,-0.22605,-0.190254,-0.064949,-0.023519,-0.066724,-0.187422,-0.018919,0.043478,...,0.149794,-0.06464,-0.075995,-0.086926,0.012692,-0.230019,-0.215561,-0.147415,0.048879,-0.119226
LandUse,0.028728,-0.22605,1.0,0.335019,-0.065114,-0.136868,-0.027871,0.151014,0.067701,-0.153841,...,0.039381,-0.042184,-0.274425,-0.214037,-0.321328,0.506096,0.420221,0.085857,-0.240148,0.169863
CurrentAcres,0.051382,-0.190254,0.335019,1.0,0.383233,0.281656,0.416455,0.609727,0.272474,0.144911,...,-0.142166,0.284079,0.26439,0.249298,0.15164,0.294689,0.310166,0.181362,-0.115001,0.073909
TotalGrossArea,0.024646,-0.064949,-0.065114,0.383233,1.0,0.970654,0.852118,0.694539,0.311617,0.712064,...,-0.196885,0.825751,0.674898,0.653386,0.834319,0.033821,0.102979,0.081369,0.08156,-0.100479
FinishedArea,0.023446,-0.023519,-0.136868,0.281656,0.970654,1.0,0.851266,0.637366,0.273201,0.753937,...,-0.147517,0.842388,0.660258,0.637366,0.885147,-0.028326,0.048617,0.021483,0.094574,-0.098365
CurrentValue,0.005545,-0.066724,-0.027871,0.416455,0.852118,0.851266,1.0,0.763421,0.220616,0.875832,...,-0.091734,0.899271,0.55032,0.515244,0.704488,-0.025296,0.057184,-0.088107,0.138348,-0.20544
CurrentLandValue,0.006574,-0.187422,0.151014,0.609727,0.694539,0.637366,0.763421,1.0,0.191185,0.356987,...,-0.35229,0.716825,0.49264,0.474533,0.502617,0.125808,0.261677,0.252196,0.148931,-0.163977
CurrentYardItemsValue,0.033257,-0.018919,0.067701,0.272474,0.311617,0.273201,0.220616,0.191185,1.0,0.165487,...,-0.031255,0.127509,0.29716,0.279611,0.186994,0.088764,0.086237,0.037608,-0.055122,0.033783
CurrentBuildingValue,0.002747,0.043478,-0.153841,0.144911,0.712064,0.753937,0.875832,0.356987,0.165487,1.0,...,0.130454,0.765794,0.425879,0.388797,0.642933,-0.131536,-0.113477,-0.316251,0.089809,-0.175495


#### The following lists some highly correlated pairs with the absolute values of their correlations. It confirms the observations made in the visualization and preliminary data analysis that
#### 1. SalePrice is highly related to CurrentValue;
#### 2. TotalGrossArea is highly related to FinishedArea.
#### Some interesting findings are
#### 3. NumofRooms is highly related to NumofBedrooms. This suggests that the number of bedrooms grows roughly in proportion to the total number of rooms across most housing properties.
#### 4. FinishedArea is highly related to NumofUnits.

In [55]:
c = df_features.corr().abs()
s = c.unstack()
so = s.sort_values(kind="quicksort",ascending=False).to_frame()
so.iloc[25:50,:]

                                                   0
TotalGrossArea        FinishedArea          0.970654
NumofRooms            NumofBedrooms         0.909316
NumofBedrooms         NumofRooms            0.909316
CurrentValue          SalePrice             0.899271
SalePrice             CurrentValue          0.899271
FinishedArea          NumofUnits            0.885147
NumofUnits            FinishedArea          0.885147
CurrentBuildingValue  CurrentValue          0.875832
CurrentValue          CurrentBuildingValue  0.875832
CurrentValue          TotalGrossArea        0.852118
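
#### Because the correlation matrix is symmetric, unstacking it lists every pair twice and also includes the diagonal of ones. One possible way to keep each pair once is to read only the strict upper triangle. The sketch below uses synthetic columns `a`, `b`, `c` (not from the data set) purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: b is a noisy copy of a, c is unrelated noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})

c = toy.corr().abs()

# Row/column positions of the strict upper triangle (k=1 skips the diagonal).
i, j = np.triu_indices(len(c), k=1)

# One entry per unordered pair, sorted from strongest to weakest.
pairs = pd.Series(
    c.values[i, j],
    index=pd.MultiIndex.from_arrays([c.index[i], c.columns[j]]),
).sort_values(ascending=False)
print(pairs)
```

#### With n features this yields n(n-1)/2 rows, so no manual offset into the sorted list is needed to skip the diagonal or the mirrored pairs.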


In [56]:
so.iloc[51:75,:]

                                                     0
PropertyCenterPoint_x  PropertyCenterPoint_y  0.732928
SalePrice              CurrentLandValue       0.716825
CurrentLandValue       SalePrice              0.716825
CurrentBuildingValue   TotalGrossArea         0.712064
TotalGrossArea         CurrentBuildingValue   0.712064
NumofUnits             CurrentValue           0.704488
CurrentValue           NumofUnits             0.704488
SalePrice              NumofUnits             0.697539
NumofUnits             SalePrice              0.697539
TotalGrossArea         CurrentLandValue       0.694539


#### The following shows the variables most highly correlated with CurrentValue and SalePrice. The list serves as a strong recommendation for feature selection in machine learning. Unsurprisingly, the top features with strong correlations to CurrentValue are CurrentBuildingValue, TotalGrossArea, FinishedArea, CurrentLandValue, NumofUnits, NumofRooms, NumofBedrooms, CurrentAcres, Grade and CurrentYardItemsValue.
#### Note that PropertyCenterPoint_y has an absolute correlation of 0.205 with CurrentValue, which is close to the correlation between CurrentYardItemsValue and CurrentValue (0.221). Location is also important.

In [59]:
df_features.corr().CurrentValue.sort_values(ascending=False)

CurrentValue             1.000000
SalePrice                0.899271
CurrentBuildingValue     0.875832
TotalGrossArea           0.852118
FinishedArea             0.851266
CurrentLandValue         0.763421
NumofUnits               0.704488
NumofRooms               0.550320
NumofBedrooms            0.515244
CurrentAcres             0.416455
Grade                    0.385713
CurrentYardItemsValue    0.220616
PropertyCenterPoint_x    0.138348
HeatFuel                 0.068820
Foundation               0.057184
StreetNumber             0.005545
HeatType                -0.012826
ZoningCode              -0.025296
LandUse                 -0.027871
BuildingType            -0.059030
StreetName              -0.066724
Depreciation            -0.088107
YearBlt                 -0.091734
PropertyCenterPoint_y   -0.205440
Name: CurrentValue, dtype: float64

#### The top list of correlations with SalePrice is almost the same. However, both PropertyCenterPoint_x and PropertyCenterPoint_y have higher absolute correlation coefficients with SalePrice than CurrentYardItemsValue does. Location really matters in determining sale prices.

In [60]:
df_features.corr().SalePrice.sort_values(ascending=False)

SalePrice                1.000000
CurrentValue             0.899271
FinishedArea             0.842388
TotalGrossArea           0.825751
CurrentBuildingValue     0.765794
CurrentLandValue         0.716825
NumofUnits               0.697539
NumofRooms               0.444048
NumofBedrooms            0.439134
Grade                    0.290607
CurrentAcres             0.284079
PropertyCenterPoint_x    0.133821
CurrentYardItemsValue    0.127509
HeatFuel                 0.058047
Foundation               0.049044
HeatType                -0.000754
StreetNumber            -0.002163
ZoningCode              -0.020556
BuildingType            -0.036554
LandUse                 -0.042184
Depreciation            -0.057738
StreetName              -0.064640
YearBlt                 -0.092049
PropertyCenterPoint_y   -0.172680
Name: SalePrice, dtype: float64

#### Among the remaining features (those with absolute correlation coefficients below 0.12), YearBlt is relatively important, but not as important as the features in the top list (those with absolute correlation coefficients above 0.12). StreetName, StreetNumber, ZoningCode, Foundation, BuildingType, LandUse, HeatFuel, HeatType and Depreciation seem to have almost no linear relationship with SalePrice/CurrentValue as far as the correlation matrix is concerned.
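
#### The 0.12 cut-off described above can be applied programmatically. The following is a minimal sketch that reuses a handful of the SalePrice correlations reported earlier; the variable names are illustrative, and the threshold is simply the one quoted in the text.

```python
import pandas as pd

# A few of the SalePrice correlations reported in the output above.
corr_with_saleprice = pd.Series({
    "CurrentValue": 0.899271,
    "FinishedArea": 0.842388,
    "Grade": 0.290607,
    "CurrentYardItemsValue": 0.127509,
    "HeatType": -0.000754,
    "YearBlt": -0.092049,
    "PropertyCenterPoint_y": -0.172680,
})

threshold = 0.12

# Rank by absolute value so strong negative correlations are not overlooked.
abs_corr = corr_with_saleprice.abs().sort_values(ascending=False)
selected = abs_corr[abs_corr > threshold].index.tolist()
print(selected)
```

#### Ranking by absolute value matters here: PropertyCenterPoint_y sits at the bottom of the signed ranking but passes the threshold.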

#### The following shows that NumofUnits has only a moderately strong relationship with NumofRooms/NumofBedrooms, while NumofRooms and NumofBedrooms have a strong relationship with each other.

In [61]:
df_features.corr().NumofUnits.sort_values(ascending=False)

NumofUnits               1.000000
FinishedArea             0.885147
TotalGrossArea           0.834319
CurrentValue             0.704488
SalePrice                0.697539
NumofRooms               0.664093
CurrentBuildingValue     0.642933
NumofBedrooms            0.584383
CurrentLandValue         0.502617
CurrentYardItemsValue    0.186994
CurrentAcres             0.151640
PropertyCenterPoint_x    0.127032
Depreciation             0.095117
HeatFuel                 0.048022
Foundation               0.039489
StreetNumber             0.028176
StreetName               0.012692
Grade                   -0.000842
BuildingType            -0.042042
HeatType                -0.066729
PropertyCenterPoint_y   -0.073696
ZoningCode              -0.112449
YearBlt                 -0.217301
LandUse                 -0.321328
Name: NumofUnits, dtype: float64

In [62]:
df_features.corr().NumofRooms.sort_values(ascending=False)

NumofRooms               1.000000
NumofBedrooms            0.909316
TotalGrossArea           0.674898
NumofUnits               0.664093
FinishedArea             0.660258
CurrentValue             0.550320
CurrentLandValue         0.492640
SalePrice                0.444048
CurrentBuildingValue     0.425879
CurrentYardItemsValue    0.297160
CurrentAcres             0.264390
Depreciation             0.175260
PropertyCenterPoint_x    0.173674
Foundation               0.160711
BuildingType             0.086882
HeatFuel                 0.079025
Grade                    0.020691
StreetNumber            -0.005766
ZoningCode              -0.021093
HeatType                -0.035883
StreetName              -0.075995
PropertyCenterPoint_y   -0.109298
LandUse                 -0.274425
YearBlt                 -0.396568
Name: NumofRooms, dtype: float64

In [66]:
df_features[['NumofUnits','NumofRooms','NumofBedrooms']].head(10)

   NumofUnits  NumofRooms  NumofBedrooms
0           1           5              2
1           1           7              3
2           1           6              2
3           1           8              4
4          26         103             53
5           1           6              2
6           1           4              2
7           1           8              3
8           1           6              2
9           1           6              4


In [82]:
df_features.corr().Grade.sort_values(ascending=False)

Grade                    1.000000
CurrentBuildingValue     0.565667
CurrentValue             0.385713
YearBlt                  0.324017
SalePrice                0.290607
PropertyCenterPoint_x    0.158225
StreetName               0.128065
FinishedArea             0.124200
HeatFuel                 0.101719
TotalGrossArea           0.094721
NumofRooms               0.020691
NumofUnits              -0.000842
CurrentLandValue        -0.009341
CurrentYardItemsValue   -0.011624
HeatType                -0.019057
NumofBedrooms           -0.019352
StreetNumber            -0.047932
CurrentAcres            -0.056472
LandUse                 -0.194518
PropertyCenterPoint_y   -0.249449
ZoningCode              -0.259491
Foundation              -0.331517
BuildingType            -0.392814
Depreciation            -0.565802
Name: Grade, dtype: float64

#### Out of curiosity, we look at how grades are determined. Grade appears to be more strongly related to Depreciation, CurrentBuildingValue, BuildingType, CurrentValue, Foundation and YearBlt than to the other features.

In [85]:
grade_corr_abs = df_features.corr().Grade.abs()
grade_corr_abs.sort_values(kind="quicksort",ascending=False)

Grade                    1.000000
Depreciation             0.565802
CurrentBuildingValue     0.565667
BuildingType             0.392814
CurrentValue             0.385713
Foundation               0.331517
YearBlt                  0.324017
SalePrice                0.290607
ZoningCode               0.259491
PropertyCenterPoint_y    0.249449
LandUse                  0.194518
PropertyCenterPoint_x    0.158225
StreetName               0.128065
FinishedArea             0.124200
HeatFuel                 0.101719
TotalGrossArea           0.094721
CurrentAcres             0.056472
StreetNumber             0.047932
NumofRooms               0.020691
NumofBedrooms            0.019352
HeatType                 0.019057
CurrentYardItemsValue    0.011624
CurrentLandValue         0.009341
NumofUnits               0.000842
Name: Grade, dtype: float64

### We will see the statistics of several linear regression models to explore the relationships between variables.

#### CurrentValue and FinishedArea are strongly related to SalePrice, as the following three models indicate. The R-squared values are fairly high, and the p-values of all t-tests are reported as zero. The t-test checks whether the coefficient of a variable in the linear model is zero. When the p-value is small, we reject the null hypothesis that the coefficient is zero (that is, that there is no linear relation between the independent variable and the target variable).

In [67]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
m = ols('SalePrice ~ CurrentValue+FinishedArea',df_features).fit()
print(m.summary())

                            OLS Regression Results                            
Dep. Variable:              SalePrice   R-squared:                       0.830
Model:                            OLS   Adj. R-squared:                  0.830
Method:                 Least Squares   F-statistic:                 1.052e+04
Date:                Sun, 13 Jan 2019   Prob (F-statistic):               0.00
Time:                        18:41:36   Log-Likelihood:                -56509.
No. Observations:                4308   AIC:                         1.130e+05
Df Residuals:                    4305   BIC:                         1.130e+05
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept     -2.46e+04   3091.275     -7.957   

In [78]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
m = ols('SalePrice ~ CurrentValue',df_features).fit()
print(m.summary())

                            OLS Regression Results                            
Dep. Variable:              SalePrice   R-squared:                       0.809
Model:                            OLS   Adj. R-squared:                  0.809
Method:                 Least Squares   F-statistic:                 1.820e+04
Date:                Sun, 13 Jan 2019   Prob (F-statistic):               0.00
Time:                        18:55:01   Log-Likelihood:                -56765.
No. Observations:                4308   AIC:                         1.135e+05
Df Residuals:                    4306   BIC:                         1.135e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept    -4.124e+04   3191.760    -12.921   

In [79]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
m = ols('SalePrice ~ FinishedArea',df_features).fit()
print(m.summary())

                            OLS Regression Results                            
Dep. Variable:              SalePrice   R-squared:                       0.710
Model:                            OLS   Adj. R-squared:                  0.710
Method:                 Least Squares   F-statistic:                 1.052e+04
Date:                Sun, 13 Jan 2019   Prob (F-statistic):               0.00
Time:                        18:55:20   Log-Likelihood:                -57664.
No. Observations:                4308   AIC:                         1.153e+05
Df Residuals:                    4306   BIC:                         1.153e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept     7.811e+04   3229.692     24.186   
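
#### To make the t-test in these summaries concrete, the sketch below recomputes coefficients, standard errors, t-values and R-squared by hand with NumPy on synthetic data (the variables `x`, `noise_feature` and `y` are made up and unrelated to the property data). It is only meant to illustrate what the columns of the summary table represent, not how statsmodels is implemented.

```python
import numpy as np

# Synthetic data: y depends on x but not on noise_feature.
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
noise_feature = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)

# Design matrix with an intercept column, as ols() adds by default.
X = np.column_stack([np.ones(n), x, noise_feature])

# Least-squares coefficients and residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Residual variance with n - (number of coefficients) degrees of freedom.
dof = n - X.shape[1]
sigma2 = resid @ resid / dof

# Standard errors come from the diagonal of sigma^2 * (X'X)^-1.
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# t-value = coefficient / standard error; large |t| means a small p-value.
t_values = beta / se
r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
print(t_values, r_squared)
```

#### For a sample this large, |t| above roughly 2 corresponds to a p-value below 0.05, so the coefficient of `x` is clearly nonzero while the coefficient of `noise_feature` typically is not distinguishable from zero.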

#### Incorporating more features results in a higher R-squared, which means a better-fitting linear model in some sense. The statistics show that the coefficients of HeatFuel, HeatType and BuildingType are not significant at the 1% level (HeatFuel is not significant even at the 5% level). This suggests that HeatFuel, HeatType and BuildingType might not be very relevant to SalePrice, at least not in the following linear model.

In [75]:
m = ols('SalePrice ~ CurrentLandValue+CurrentBuildingValue+CurrentYardItemsValue+CurrentAcres+HeatFuel+HeatType+BuildingType+YearBlt',df_features).fit()
print(m.summary())

                            OLS Regression Results                            
Dep. Variable:              SalePrice   R-squared:                       0.834
Model:                            OLS   Adj. R-squared:                  0.834
Method:                 Least Squares   F-statistic:                     2708.
Date:                Sun, 13 Jan 2019   Prob (F-statistic):               0.00
Time:                        18:48:36   Log-Likelihood:                -56454.
No. Observations:                4308   AIC:                         1.129e+05
Df Residuals:                    4299   BIC:                         1.130e+05
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept             -6.061e+
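
#### A caveat when comparing R-squared across models: adding a predictor to an OLS model with an intercept can never lower R-squared, even if the predictor is pure noise, which is why the summaries also report an adjusted R-squared. The sketch below illustrates this with synthetic data; the helper names `r_squared` and `adjusted_r_squared` are illustrative and not part of statsmodels.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on X (X must contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic data: y depends on x only; junk is unrelated noise.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
junk = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, junk])

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)
adj_small = adjusted_r_squared(r2_small, n, p=1)
adj_big = adjusted_r_squared(r2_big, n, p=2)
print(r2_small, r2_big, adj_small, adj_big)
```

#### The adjusted value penalizes each extra predictor, so it is the safer number to compare between models with different numbers of features.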

#### The low R-squared values in the following models mean the linear models fit poorly, indicating that neither YearBlt nor BuildingType has a strong direct linear relationship with SalePrice.

In [76]:
m = ols('SalePrice ~ YearBlt',df_features).fit()
print(m.summary())

                            OLS Regression Results                            
Dep. Variable:              SalePrice   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.008
Method:                 Least Squares   F-statistic:                     36.80
Date:                Sun, 13 Jan 2019   Prob (F-statistic):           1.42e-09
Time:                        18:49:25   Log-Likelihood:                -60309.
No. Observations:                4308   AIC:                         1.206e+05
Df Residuals:                    4306   BIC:                         1.206e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.706e+06   2.32e+05      7.359      0.0

In [77]:
m = ols('SalePrice ~ BuildingType',df_features).fit()
print(m.summary())

                            OLS Regression Results                            
Dep. Variable:              SalePrice   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                  0.001
Method:                 Least Squares   F-statistic:                     5.761
Date:                Sun, 13 Jan 2019   Prob (F-statistic):             0.0164
Time:                        18:51:05   Log-Likelihood:                -60324.
No. Observations:                4308   AIC:                         1.207e+05
Df Residuals:                    4306   BIC:                         1.207e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept     3.144e+05   7468.550     42.100   

#### The high R-squared and the zero p-value of the t-test indicate that FinishedArea depends strongly and linearly on TotalGrossArea.

In [81]:
m = ols('FinishedArea ~ TotalGrossArea',df_features).fit()
print(m.summary())

                            OLS Regression Results                            
Dep. Variable:           FinishedArea   R-squared:                       0.942
Model:                            OLS   Adj. R-squared:                  0.942
Method:                 Least Squares   F-statistic:                 7.015e+04
Date:                Sun, 13 Jan 2019   Prob (F-statistic):               0.00
Time:                        19:18:40   Log-Likelihood:                -32705.
No. Observations:                4308   AIC:                         6.541e+04
Df Residuals:                    4306   BIC:                         6.543e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept       -378.4946     10.999    -34.
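
#### For a regression with a single predictor, R-squared equals the square of the Pearson correlation between the two variables, so the single-predictor summaries in this notebook can be cross-checked against the correlation tables. The sketch below does this using only numbers already reported in the outputs; the dictionary name is illustrative.

```python
# (reported correlation, reported R-squared) for each single-predictor model.
reported = {
    "SalePrice ~ CurrentValue": (0.899271, 0.809),
    "SalePrice ~ FinishedArea": (0.842388, 0.710),
    "SalePrice ~ YearBlt": (-0.092049, 0.008),
    "SalePrice ~ BuildingType": (-0.036554, 0.001),
    "FinishedArea ~ TotalGrossArea": (0.970654, 0.942),
    "NumofRooms ~ NumofBedrooms": (0.909316, 0.827),
    "FinishedArea ~ NumofUnits": (0.885147, 0.783),
}

# Largest gap between r squared and the reported R-squared, per model.
gaps = {name: abs(r ** 2 - r2) for name, (r, r2) in reported.items()}
for name, gap in gaps.items():
    print(f"{name}: |r^2 - R^2| = {gap:.6f}")
```

#### All gaps are within the rounding of the three-decimal R-squared values, which confirms that the regression summaries and the correlation matrix agree.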

#### The following linear model for Grade does not fit well, since its R-squared is not high. Grade does not seem to depend linearly on LandUse, since LandUse fails the t-test with a high p-value. CurrentBuildingValue, Foundation, YearBlt, BuildingType, ZoningCode and Depreciation serve as indicators for Grade, but no strong relationships between the other features and Grade have been found.

In [88]:
m = ols('Grade ~ CurrentBuildingValue+Foundation+YearBlt+BuildingType+Depreciation+ZoningCode+LandUse',df_features).fit()
print(m.summary())

                            OLS Regression Results                            
Dep. Variable:                  Grade   R-squared:                       0.518
Model:                            OLS   Adj. R-squared:                  0.517
Method:                 Least Squares   F-statistic:                     660.1
Date:                Sun, 13 Jan 2019   Prob (F-statistic):               0.00
Time:                        19:32:19   Log-Likelihood:                -8305.4
No. Observations:                4308   AIC:                         1.663e+04
Df Residuals:                    4300   BIC:                         1.668e+04
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
Intercept               15.3214 

#### NumofRooms and NumofBedrooms are highly correlated.

In [89]:
m = ols('NumofRooms ~ NumofBedrooms',df_features).fit()
print(m.summary())

                            OLS Regression Results                            
Dep. Variable:             NumofRooms   R-squared:                       0.827
Model:                            OLS   Adj. R-squared:                  0.827
Method:                 Least Squares   F-statistic:                 2.056e+04
Date:                Sun, 13 Jan 2019   Prob (F-statistic):               0.00
Time:                        19:42:04   Log-Likelihood:                -8434.8
No. Observations:                4308   AIC:                         1.687e+04
Df Residuals:                    4306   BIC:                         1.689e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept         1.1092      0.049     22.755

#### FinishedArea and NumofUnits are correlated.

In [90]:
m = ols('FinishedArea ~ NumofUnits',df_features).fit()
print(m.summary())

                            OLS Regression Results                            
Dep. Variable:           FinishedArea   R-squared:                       0.783
Model:                            OLS   Adj. R-squared:                  0.783
Method:                 Least Squares   F-statistic:                 1.558e+04
Date:                Sun, 13 Jan 2019   Prob (F-statistic):               0.00
Time:                        19:42:45   Log-Likelihood:                -35549.
No. Observations:                4308   AIC:                         7.110e+04
Df Residuals:                    4306   BIC:                         7.111e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    578.1823     17.192     33.630      0.0

## Conclusions
#### 1. SalePrice is highly related to CurrentValue.
#### 2. TotalGrossArea is highly related to FinishedArea.
#### 3. NumofRooms is highly related to NumofBedrooms. This suggests that the number of bedrooms grows roughly in proportion to the total number of rooms across most housing properties.
#### 4. FinishedArea is highly related to NumofUnits.