# Forecasting Real Estate Property Values around the Chicago Metropolitan Area
Modeling and analysis by Matt Carr, Johnhoy Stephens, and Luluva Lakdawala

## Setting the Scene:

This project investigates trends in the market value of homes around the Chicago metropolitan area and uses time series modeling techniques to forecast values over a one-year period. From these forecasts we calculate the Return on Investment (ROI) for each region and select the best regions purely on ROI. Real estate investments are capital intensive, and locating properties in which to invest can involve substantial work. We aim to provide a guide that narrows the search down to the zip codes most worth investigating.
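The ROI-based selection described above can be sketched with pandas. This is a minimal illustration, not the project's actual pipeline: it assumes a DataFrame with one row per zip code and two hypothetical columns, `current_value` (the latest median home value, treated as the cost of the investment) and `forecast_value` (the model's one-year forecast), and ranks zip codes by forecast ROI.

```python
import pandas as pd


def rank_by_forecast_roi(df: pd.DataFrame, top_n: int = 5) -> pd.DataFrame:
    """Rank zip codes by forecast one-year ROI.

    Assumes hypothetical columns 'current_value' (treated as the cost of the
    investment) and 'forecast_value' (the one-year forecast).
    """
    out = df.copy()
    # ROI = (forecast value - current value) / current value
    out["roi"] = (out["forecast_value"] - out["current_value"]) / out["current_value"]
    # Highest ROI first; keep only the top_n candidates
    return out.sort_values("roi", ascending=False).head(top_n)


if __name__ == "__main__":
    # Hypothetical placeholder values for illustration only, not Zillow data
    example = pd.DataFrame(
        {"current_value": [200_000, 300_000, 250_000],
         "forecast_value": [210_000, 330_000, 250_000]},
        index=pd.Index(["zip_a", "zip_b", "zip_c"], name="zip_code"),
    )
    print(rank_by_forecast_roi(example, top_n=2))
```

Ranking purely on ROI, as the project does, ignores forecast uncertainty; the width of each model's confidence interval could be carried alongside the `roi` column if a risk-adjusted view is wanted.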

### Goals:

Our project aims to:
- Identify regions around the Chicago metropolitan area where property market values have grown steadily over the years.
- Build time series models from the historical values of homes sold in those regions.
- Forecast median home prices for the identified regions.
- Recommend real estate investments in the zip codes that show the best return on investment.
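The first goal depends on what counts as "steady growth." One way to make it concrete, offered here only as a hypothetical screening rule and not as the criterion used in this project, is to require that a zip code's median value rose year over year for at least a given share of its history. The sketch assumes a pandas Series of monthly median values indexed by date; `is_steady_grower` and `min_share` are illustrative names.

```python
import pandas as pd


def is_steady_grower(monthly_values: pd.Series, min_share: float = 1.0) -> bool:
    """Hypothetical screen: True if the share of positive year-over-year
    changes is at least min_share.

    monthly_values: monthly median home values indexed by date (assumed).
    """
    # Compare each month with the same month one year (12 periods) earlier;
    # fill_method=None avoids silently forward-filling missing months.
    yoy = monthly_values.pct_change(periods=12, fill_method=None).dropna()
    if yoy.empty:
        return False  # fewer than 13 months: not enough history to judge
    share_positive = (yoy > 0).mean()
    return bool(share_positive >= min_share)
```

With the default `min_share=1.0` every year-over-year change must be positive; lowering it (for example to 0.9) tolerates a few down years such as those around 2008.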

### Definitions:

- Return on Investment:
    - Return on Investment (ROI) is a performance measure used to evaluate the efficiency of an investment. ROI is calculated by dividing the net return (final value less the cost) by the cost of the investment.

- Historical ROI:
    - Similar to ROI, but calculated on historical data: (present value less past value) divided by past value.

- Model:
    - Throughout this project, "model" refers to the various time series models built to make forecasts.

- AIC score:
    - The metric we use to compare the models we build. The Akaike information criterion (AIC) estimates a model's prediction error and therefore its quality relative to other models fit to the same data. Among candidate models, the one with the lowest AIC is preferred.
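The Historical ROI and AIC definitions above can be written out as small Python functions. This is a sketch: `historical_roi`, `aic`, and `best_by_aic` are illustrative names, and `aic` uses the standard formula `AIC = 2k - 2 ln(L)`, where `k` is the number of estimated parameters and `L` is the maximized likelihood. In practice, fitted statsmodels results (for example from `SARIMAX(...).fit()`) expose this value directly as the `.aic` attribute.

```python
from typing import Dict, Hashable


def historical_roi(past_value: float, present_value: float) -> float:
    """(present value - past value) / past value, as defined above."""
    if past_value == 0:
        raise ValueError("past_value must be non-zero")
    return (present_value - past_value) / past_value


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(scores: Dict[Hashable, float]) -> Hashable:
    """Return the key (e.g. a model order) with the lowest AIC."""
    return min(scores, key=scores.get)
```

AIC values are only comparable across models fit to the same data, so a comparison like `best_by_aic` should be run per zip code rather than across zip codes.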
    
### Data:

The data used in this project comes from the [Zillow Research page](https://www.zillow.com/research/data/).

Note that this modeling analysis presents a streamlined version of the iterations from our first simple model to our final models. For a more in-depth view of our exploration process, mistakes, and triumphs, please refer to the exploratory folder within the notebooks folder.

### Analysis Takeaways:  


### Future Investigations:


### Recommendations:


In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
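# Standard library
import itertools
import os
import sys
import warnings

# Data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sns
%matplotlib inline
plt.style.use('fivethirtyeight')
warnings.filterwarnings('ignore')

# Make the project's src package importable from a notebook two levels down
module_path = os.path.abspath(os.path.join(os.pardir, os.pardir))
if module_path not in sys.path:
    sys.path.append(module_path)
from src import cleaning_functions as cfs

# Time series modeling and evaluation
from statsmodels.graphics.tsaplots import plot_pacf, plot_acf
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.statespace.sarimax import SARIMAX
from pmdarima import auto_arima
from dateutil.easter import easter
from sklearn.metrics import mean_squared_error as mse

# statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13;
# the replacement's fit() takes different arguments (e.g. no `disp`).
try:
    from statsmodels.tsa.arima.model import ARIMA
except ImportError:
    from statsmodels.tsa.arima_model import ARIMA

# The fbprophet package was renamed to prophet in version 1.0.
try:
    from prophet import Prophet
except ImportError:
    from fbprophet import Prophet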