This project uses machine learning to predict median housing prices for each US state over the next three years. The model draws on macroeconomic indicators such as interest rates, inflation, and unemployment rates. It is intended to help real estate investors, policymakers, and financial institutions make informed decisions about housing market trends.
- Data
  - data/: Contains all datasets used in the project.
    - cpi.xlsx: Consumer Price Index data.
    - fed_interest_rate.csv: Federal interest rates.
    - house_rental_index.csv: House rental index data.
    - house_value_index.csv: House value index data.
    - pop_unemployment.xlsx: Population and unemployment data.
    - data.csv: Joined and cleansed dataset combining all the above tables.
- Code
  - us-housing-price.ipynb: Jupyter notebook containing all the code for data wrangling, preprocessing, EDA, model training, and prediction.
- Documentation
  - docs/: Contains all project documentation.
    - Predicting_USA_Median_Housing_Prices.pdf: The detailed white paper.
    - Presentation_Slides.pptx: PowerPoint presentation of the project.
Clone the repository:

    git clone https://github.com/yourusername/housing-price-prediction.git
    cd housing-price-prediction

Create a virtual environment and install dependencies:

    python3 -m venv housing-price-venv
    source housing-price-venv/bin/activate
    pip install -r requirements.txt

Run the Jupyter Notebook:

    jupyter notebook us-housing-price.ipynb

This notebook includes:
- Data wrangling and preprocessing
- Exploratory Data Analysis (EDA)
- Model training and evaluation
- Predictions for future median housing prices
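
The evaluation step can be illustrated with the three metrics this project reports: R-squared, MAE, and RMSE. The following is a minimal sketch in plain Python, not the notebook's actual code. The function name `regression_metrics` and the sample prices are assumptions made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R-squared, MAE, and RMSE for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average absolute error, in the same units as the target.
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean squared error; penalizes large misses more.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R-squared: 1 minus residual variation over total variation around the mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Illustrative values only, not taken from the project's data.
actual = [200_000.0, 250_000.0, 300_000.0, 350_000.0]
predicted = [210_000.0, 240_000.0, 310_000.0, 340_000.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

MAE and RMSE are expressed in the units of the target, so they are easiest to interpret next to the typical magnitude of the values being predicted.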
This project develops a machine learning model to predict median housing prices across US states, leveraging economic indicators, demographic data, and historical housing prices.
The goal is to predict housing prices from macroeconomic indicators so that stakeholders can make informed decisions.
- Sources: Consumer Price Index, Federal Reserve Economic Data, Zillow, public housing databases, U.S. Census Bureau.
- Preparation: Handling missing values, scaling, and encoding.
- Models Used: Linear Regression, Random Forest, XGBoost.
- Evaluation Metrics: R-squared, MAE, RMSE.
- Feature Importance: Key features include rent, CPI index, total population, interest rates, and inflation.
- Model Performance: High accuracy with R-squared of 0.9988, MAE of 1817, and RMSE of 4093.
- Real-Time Analysis: Integrate with real-time data feeds.
- Local Predictions: Extend to city or neighborhood levels.
- Policy and Planning: Aid in sustainable development.
- Data Privacy: Ensure compliance with regulations.
- Bias and Fairness: Regular audits to prevent bias.
- Transparency: Use interpretable models.
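
One way to connect the feature-importance and transparency points above is permutation importance: shuffle one feature at a time and measure how much the prediction error grows. The sketch below is plain Python and makes no claim about how the notebook computed its feature rankings. The `toy_model` function, its coefficients, and the sample rows are hypothetical stand-ins for a trained model and real data.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_repeats=10, seed=0):
    """Mean increase in MAE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y_true, [predict(r) for r in rows])
    importances = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving every other feature untouched.
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            score = mean_absolute_error(y_true, [predict(r) for r in permuted])
            increases.append(score - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances


def toy_model(row):
    # Hypothetical stand-in for a trained model; coefficients are made up.
    return 1500.0 * row["rent"] + 200.0 * row["cpi"]


# Made-up rows; "noise" is a feature the toy model never reads.
rows = [
    {"rent": 1000.0, "cpi": 250.0, "noise": 3.0},
    {"rent": 1250.0, "cpi": 260.0, "noise": 7.0},
    {"rent": 1500.0, "cpi": 270.0, "noise": 1.0},
    {"rent": 1750.0, "cpi": 280.0, "noise": 9.0},
    {"rent": 2000.0, "cpi": 290.0, "noise": 5.0},
]
prices = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, prices)
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.1f}")
```

A feature the model ignores gets an importance of zero, while features the predictions depend on get larger values. Because the approach only needs a prediction function, it applies equally to linear, tree-based, and boosted models.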
Jubyung Ha - jyubaeng@gmail.com