Presentables
Given housing data for King County, Washington (the Seattle area), our goal was to find price predictors and create a linear regression model capable of predicting prices for new homes. This README also serves as a general outline and explanation of the analysis and development process.
An exploration of distances revealed some interesting trends that warrant further investigation. After numerous failed attempts at using recursive feature elimination, we settled on standard feature selection for our final model. While the model initially seemed promising, further investigation revealed that its score was artificially high due to the lack of a constant in the model.
Our initial investigation looked at the relationship between house grade and house price here.
As we can see, there does appear to be a linear relationship between house grade and sale price, with higher grades corresponding to higher prices.
We then looked into the effects of renovation, and whether more recent renovations had a stronger effect on the sale price of the house.
While the relationship in our data was not strong, there did appear to be a slight trend: more recently renovated homes tended to have a slightly higher selling price than homes that were renovated longer ago.
Due to the lack of information on waterfront properties, we were curious whether a house's proximity to water, as well as its distance from downtown Seattle, would have an effect on the sale price, which we investigated here.
There did indeed appear to be a correlation between the price of a house and its distance from Seattle, with houses closer to the city tending to have higher prices.
Next we determined a few hotspot locations in the water to get a better idea of how house prices relate to proximity to water. Using these hotspots, we compared each home's distance from its closest hotspot to the home's price, which would give us... This shows a much stronger relationship between house price and distance from water than between house price and distance from Seattle.
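The two distance features described above can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the haversine formula is one common way to measure distance between coordinates, and the downtown coordinates and the `WATER_HOTSPOTS` list are assumed placeholder values, not the hotspots we actually chose.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Assumed reference points for illustration only (hypothetical values).
DOWNTOWN_SEATTLE = (47.6062, -122.3321)
WATER_HOTSPOTS = [(47.60, -122.25), (47.70, -122.40)]

def distance_features(lat, lon):
    """Return (km to downtown, km to the nearest water hotspot) for one home."""
    to_downtown = haversine_km(lat, lon, *DOWNTOWN_SEATTLE)
    to_water = min(haversine_km(lat, lon, h_lat, h_lon)
                   for h_lat, h_lon in WATER_HOTSPOTS)
    return to_downtown, to_water
```

Applied to each home's latitude and longitude, this yields two numeric columns that can be plotted against price or passed to the regression.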
We attempted numerous modeling approaches, typically with poor results. Here we made our first model using a blanket approach, just to see what kind of results we'd get. In terms of R2, our results weren't bad. However, nearly everything else about the model was, with numerous instances of multicollinearity and an unacceptable amount of kurtosis.
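For readers unfamiliar with these diagnostics, here is a rough sketch of how multicollinearity and kurtosis can be checked with NumPy alone. It uses the variance inflation factor (VIF) as the multicollinearity measure, which is an assumption on our part about a reasonable check, and the data is synthetic, not the housing data.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        centered = target - target.mean()
        r2 = 1 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def kurtosis(x):
    """Pearson kurtosis; a normal distribution scores about 3."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return np.mean(z ** 4) / np.mean(z ** 2) ** 2

# Synthetic features: b is almost a copy of a, c is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.05 * rng.normal(size=500)
c = rng.normal(size=500)
vifs = vif(np.column_stack([a, b, c]))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity. Here the first two columns score far above that and the third stays near 1.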
Next we attempted to use Recursive Feature Elimination here and here.
While not all models were saved, we made numerous attempts with various adjustments, all of which produced similar or diminishing results. High amounts of multicollinearity seemed unavoidable with this method.
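The idea behind recursive feature elimination can be sketched as below. This is a hand-rolled NumPy illustration of the technique on synthetic data, not the tooling or feature set used in our notebooks; the function name `recursive_elimination` is hypothetical.

```python
import numpy as np

def recursive_elimination(X, y, n_keep):
    """Refit OLS repeatedly, dropping the feature with the smallest
    absolute coefficient (on standardized inputs) until n_keep remain."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # put coefficients on one scale
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        A = np.column_stack([np.ones(len(y)), Z[:, remaining]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        weakest = int(np.argmin(np.abs(beta[1:])))  # skip the intercept
        remaining.pop(weakest)
    return remaining

# Synthetic data: only columns 0 and 3 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=300)
selected = recursive_elimination(X, y, n_keep=2)
```

Note that the ranking looks only at coefficient size. Nothing in the procedure prevents two strongly correlated features from both surviving, which is consistent with the multicollinearity we kept running into.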
Eventually we settled on our final model. At first glance, it looked like we had finally made a great model for our provided data! However, further investigation showed that our test results weren't matching up with each other. Essentially, we had forgotten to include a constant in our model. Without a constant, the regression is forced through the origin, which implies that a house with every feature at zero sells for $0, and a house will never sell for $0. After including the constant we got a much different result that confirmed what our initial test results were trying to tell us...
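One likely mechanism for the inflated score, sketched here as an assumption and not a reconstruction of our exact model: when no constant is present, R2 is often computed against zero instead of against the mean of the target (statsmodels, for instance, reports this uncentered version). The synthetic example below uses a weak predictor and a large baseline price to show how far apart the two numbers can be.

```python
import numpy as np

def fit_ols(x, y, add_constant):
    """Least-squares fit; returns the residuals."""
    X = np.asarray(x, dtype=float).reshape(len(y), -1)
    if add_constant:
        X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def r2_centered(y, resid):
    """Usual R^2: variation explained around the mean of y."""
    centered = y - y.mean()
    return 1 - (resid @ resid) / (centered @ centered)

def r2_uncentered(y, resid):
    """R^2 measured around zero, as typically reported with no constant."""
    return 1 - (resid @ resid) / (y @ y)

# Synthetic data: large baseline, weak dependence on x, lots of noise.
rng = np.random.default_rng(0)
x = rng.uniform(1, 3, size=200)
y = 500.0 + 5.0 * x + rng.normal(scale=20.0, size=200)

inflated = r2_uncentered(y, fit_ols(x, y, add_constant=False))
honest = r2_centered(y, fit_ols(x, y, add_constant=True))
```

In this setup `inflated` comes out high while `honest` stays close to zero, even though the predictor is identical in both fits. The no-constant score is mostly crediting the model for the fact that prices are far from zero.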