***


## Executive Summary

This exercise examines how individual features affect the predictions of a house price model for the city of Ames, Iowa.

Using historical housing sale data, we are tasked with optimising the price prediction model using features that help explain the sale value.


Contents:
- 1.0 Data Cleaning
- 2.0 Exploratory Data Analysis
- 3.0 Preprocessing and Feature Engineering
- 4.0 Modeling Benchmarks
- 5.0 Prepare Test Set for Modeling
- 6.0 Model on Test Set
- 7.0 The Ames Housing Project Report 


Data:
- The Ames Housing Data

Using the results from our model, we know that the following coefficients have a large impact on the predicted price.

We may narrow these value-adding features down into 3 groups of interest:

1. Quality
2. Total Area (Sqft)
3. Neighbourhood

However, this model may not generalise well to other cities, as these features are specific to this area and property market. To generalise the model, we would have to fit it less closely to the Ames data (for example, with stronger regularisation), which would reduce its accuracy here and increase its bias, but decrease its variance on unseen markets.






***

# 7.0 The Ames Housing Project Report

## 7.1 Problem Statement

Houses of all kinds are currently valued using different parameters, each carrying its own weight in the eyes of the buyer and the seller.

In the property market, there is no single agreed way to determine how these parameters affect a house’s final sale value. Estimators exist, but their accuracy varies. The only way to get a house evaluated is through a real estate appraiser.

While many would rely on a few important parameters to value a property in the current economic climate, this does not clearly justify how the house and its features affect the price.

I aim to create an ElasticNet model that accurately predicts a house’s sale price from specific parameters in previous sales data, giving both buyers and sellers a more accurate estimate along with its justification.

***

## 7.2 My Model


As our goal is to find the optimum model complexity while minimising overfitting, I have chosen ElasticNet for its ability to combine two regularised regression methods into one.

ElasticNet can regularise the large coefficients that exist in my feature-engineered columns, which might otherwise command larger coefficients and result in higher variance. By combining Ridge regression and Lasso regression, it balances the two penalties on our coefficients by adding both to the loss function.
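For reference, the combined penalty can be written out explicitly. The form below follows scikit-learn's `ElasticNet` parameterisation, which is an assumption about the implementation used in this project. Here $n$ is the number of training rows, $w$ the coefficient vector, $\alpha$ the overall penalty strength and $\rho$ the L1 ratio.

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
$$

Under this parameterisation, $\rho = 1$ gives a pure Lasso (L1) penalty and $\rho = 0$ a pure Ridge-style (L2) penalty, so values in between blend the two.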

To tune the model, I have also chosen to run ElasticNet inside a pipeline together with GridSearchCV to search for the best hyperparameters.

Our model works best with an alpha of 4.893900918477494 and an L1 to L2 ratio of 0.8105263157894738.
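The tuning setup described above could look roughly like the sketch below. It is illustrative only: the data is synthetic (`make_regression` stands in for the cleaned Ames features), and the step names, scaler, parameter grids and scoring metric are assumptions rather than the settings used in the modelling notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned Ames feature matrix (assumption).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)

# Scale features first so the penalty treats every coefficient comparably.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("enet", ElasticNet(max_iter=10_000)),
])

# Hypothetical search grids; the notebook's actual grids may differ.
param_grid = {
    "enet__alpha": np.logspace(-2, 1, 10),
    "enet__l1_ratio": np.linspace(0.1, 1.0, 10),
}

# 5-fold cross-validated grid search over both hyperparameters.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X, y)

best_params = search.best_params_
print(best_params)
```

Placing the scaler inside the pipeline means it is refitted on each training fold, which avoids leaking information from the validation fold into the search.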



***

#### Correlation Coefficient 

![Coeff_postiive.png](attachment:Coeff_postiive.png)


We see the top positive coefficients used by the model to arrive at its predicted sale price.

The top factors positively affecting price are the property's build and material quality, total area, and neighbourhood.
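A ranking like the one in the figure above can be produced by sorting a fitted model's coefficients. The sketch below is a minimal illustration on synthetic data: the feature names, the true weights and the `alpha` and `l1_ratio` values are all made up for the example and are not taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)

# Hypothetical feature names; the last column is deliberately irrelevant.
feature_names = ["quality", "total_area", "neighbourhood_score", "noise"]
X = rng.normal(size=(300, 4))

# Synthetic target with known weights 5, 3, 1 and 0 (assumption).
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.1, size=300)

model = ElasticNet(alpha=0.01, l1_ratio=0.8).fit(X, y)

# Label each coefficient with its feature name and sort from largest to smallest.
coefs = pd.Series(model.coef_, index=feature_names).sort_values(ascending=False)
top_positive = coefs[coefs > 0].head(3)
print(top_positive)
```

Coefficients are only comparable in size when the features share a scale, which is the case here because every synthetic column is drawn from a standard normal distribution.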


![distplot_saleprice.png](attachment:distplot_saleprice.png)

We see that the distribution plots are right-skewed with a very long tail, indicating that there are some extreme values. These do not appear to be outliers, as they still fall within the city demographics sourced from the internet.
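Right skew can also be checked numerically rather than by eye. The sketch below computes the sample skewness of a synthetic, log-normally distributed price column; the data and the helper name `sample_skewness` are assumptions for illustration, not the Ames sale prices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic right-skewed "prices" (assumption, not the Ames data).
prices = rng.lognormal(mean=12.0, sigma=0.4, size=2000)


def sample_skewness(values):
    """Return the third standardised moment; positive means a longer right tail."""
    values = np.asarray(values, dtype=float)
    centred = values - values.mean()
    return (centred ** 3).mean() / (centred ** 2).mean() ** 1.5


skew = sample_skewness(prices)

# In a right-skewed distribution the long tail pulls the mean above the median.
mean_above_median = prices.mean() > np.median(prices)
print(skew, mean_above_median)
```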

![barplot_saleprice.png](attachment:barplot_saleprice.png)

We see the bars rising in value as the ordinal ratings increase. This means that a house with a better-quality build will fetch a much higher sale price than a house that scored only average on quality.

![Scatterplot_saleprice.png](attachment:Scatterplot_saleprice.png)

The scatter plots of the features with the highest coefficients show a strong correlation with sale price.

![boxplot_neighbourhood.png](attachment:boxplot_neighbourhood.png)

This boxplot shows the percentiles of sale prices for each neighbourhood, along with the average median across the neighbourhoods. It also shows a good number of neighbourhoods falling below that average.
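The comparison behind the boxplot can be reproduced with a group-by. The sketch below uses a tiny made-up table and takes "average median" to mean the mean of the per-neighbourhood medians, which is one possible reading. The column names `Neighborhood` and `SalePrice` and all values are illustrative assumptions.

```python
import pandas as pd

# Toy data with three made-up neighbourhoods (assumption, not real sales).
df = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "SalePrice": [100, 120, 140, 200, 220, 240, 300, 500, 700],
})

# Median sale price within each neighbourhood.
medians = df.groupby("Neighborhood")["SalePrice"].median()

# Average of those medians, used as the reference line.
average_median = medians.mean()

# Neighbourhoods whose median falls below the reference line.
below_average = medians[medians < average_median].index.tolist()
print(medians.to_dict(), average_median, below_average)
```

Because one expensive neighbourhood lifts the average, most neighbourhoods can sit below it, which matches the pattern described above.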

## 7.3 Conclusion & Recommendations


Using the results from our model, we know that the following coefficients have a large impact on the predicted price.

We may narrow these value-adding features down into 3 groups of interest:
1. Quality
2. Total Area (Sqft)
3. Neighbourhood


1. Quality
We can see that quality and its permutations in the data carry a large weight in the predictions. This shows that buyers are willing to pay top dollar for houses built with quality in mind. This is echoed by the quality of other assets on the property.

Sellers should therefore focus on improving the quality of their house, since they cannot easily change its location or total area.

2. Total Area (Sqft)
Just as we expect to pay more for a larger cake, we should expect the same when buying land. Although area has a large impact on value, the negative impact of poor quality in a poorly valued neighbourhood can reverse the price increase that the extra area would bring.

3. Neighbourhood
We can see from the EDA that some neighbourhoods clearly have higher-valued lots. We do not know exactly what causes a neighbourhood to be valued as it is; more data is needed here. The top 3 neighbourhoods with the highest median sale price are: Stone Brook, Northridge and Northridge Heights.


Conversely, it is suggested that poorly constructed, aesthetically poor or unfinished houses often receive lower values. 

However, this model may not generalise well to other cities, as these features are specific to this area and property market. To generalise the model, we would have to fit it less closely to the Ames data (for example, with stronger regularisation), which would reduce its accuracy here and increase its bias, but decrease its variance on unseen markets.

***

Proceed back to:
- [01_Cleaning_for_Train_and_Test_Sets](01_Cleaning_for_Train_and_Test_Sets.ipynb)
- [02_EDA](02_EDA.ipynb)
- [03_Preprocessing_and_Modeling](03_Preprocessing_and_Modeling.ipynb)
