# COGS 108 - Project Proposal
## Names
- Peng Yutong
- Richard Rangel
- Muhammad Omer
- Matthew Palmer
- Shijun Li

## Research Question
*"Using ensemble methods, how do our selected institutional investor metrics compare to traditional economic indicators in predicting San Diego housing market movements (2018 - 2023), and can we identify zip codes where investor activity appears to have outsized influence relative to local economic fundamentals?"*

---

## Background and Prior Work 
In recent years, the San Diego housing market has changed significantly, notably through the growing involvement of institutional investors in residential real estate. Housing market analyses have traditionally focused on economic indicators such as employment rates, median income, and interest rates. However, we think the growing presence of institutional investors introduces dynamics that these conventional metrics may not fully capture. Understanding how investor activity interacts with local market fundamentals is crucial for accurately modeling housing price movements and for assessing potential long-term impacts on San Diego's housing market and communities.
 
Research has documented the rising role of institutional investors in the housing market. A report by the U.S. Government Accountability Office (GAO) found that while institutional investors own approximately 2% of single-family rental homes nationwide, their ownership is concentrated in certain metropolitan areas, such as Atlanta and Charlotte, where they hold a larger share of the market. The report suggests that institutional investors' activities can influence local housing markets, potentially affecting home prices and availability (GAO, 2020).

Machine learning techniques have been increasingly applied to housing price prediction. A study published in the International Journal of Advanced Computer Science and Applications explored the use of advanced machine learning algorithms, including support vector regression and artificial neural networks, for predicting house prices. The study found that these models could effectively capture complex patterns in housing data, leading to improved prediction accuracy (Al-Masum et al., 2021).

Despite these advancements, there is a noticeable gap in the literature regarding predictive models that explicitly incorporate institutional investor activity as a variable. Most existing models focus on traditional economic indicators and property features, without accounting for the influence of large-scale investors. Addressing this gap could enhance the accuracy of housing market predictions and provide deeper insights into the factors driving price changes.

Our project aims to fill this gap by developing a predictive model that integrates both traditional economic indicators and data on institutional investor activity. By leveraging machine learning techniques, we seek to assess the impact of institutional investors on housing price movements at the zip code level in the San Diego area. This approach will allow us to identify areas where investor influence is particularly pronounced and evaluate how this influence interacts with other market factors.


References

[1] U.S. Government Accountability Office. "Rental Housing: As More Households Rent, the Poorest Face Affordability and Housing Quality Challenges." GAO-20-427, May 2020. https://www.gao.gov/products/gao-20-427

[2] Al-Masum, Mohammad, et al. "Advanced Machine Learning Algorithms for House Price Prediction." International Journal of Advanced Computer Science and Applications, vol. 12, no. 12, 2021, pp. 699-706. https://thesai.org/Downloads/Volume12No12/Paper_91-Advanced_Machine_Learning_Algorithms.pdf

---

## Hypothesis
We predict that zip codes with high institutional investor activity will show housing price movements that deviate significantly from what traditional economic indicators would suggest. Furthermore, we expect that ensemble methods combining both institutional investor metrics and traditional economic indicators will provide more accurate predictions of housing market movements than either set of metrics alone.
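
One way the comparison in this hypothesis could be set up is sketched below. This is a minimal illustration, not our final pipeline: it assumes scikit-learn's `RandomForestRegressor` as the ensemble method, uses cross-validated R² as the accuracy measure, and runs on synthetic data generated inside the block. The helper names `compare_feature_sets` and `make_synthetic_data`, the feature counts, and the coefficients are all hypothetical placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def compare_feature_sets(feature_sets, y, n_splits=5, seed=0):
    """Return the mean cross-validated R^2 of a random forest for each feature set."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, X in feature_sets.items():
        # Same model configuration for every feature set, so only the inputs differ.
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        fold_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
        scores[name] = float(fold_scores.mean())
    return scores


def make_synthetic_data(n=300, seed=0):
    """Build placeholder data (NOT real housing data) where both groups matter."""
    rng = np.random.default_rng(seed)
    economic = rng.normal(size=(n, 2))   # stand-in for e.g. unemployment, income
    investor = rng.normal(size=(n, 2))   # stand-in for e.g. purchase volume, value
    noise = rng.normal(scale=0.3, size=n)
    y = 1.5 * economic[:, 0] + 1.5 * investor[:, 0] + noise
    feature_sets = {
        "economic_only": economic,
        "investor_only": investor,
        "combined": np.hstack([economic, investor]),
    }
    return feature_sets, y


feature_sets, y = make_synthetic_data()
scores = compare_feature_sets(feature_sets, y)
for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

Because the synthetic target depends on one feature from each group, the combined set should score highest here by construction; that says nothing about what the real data will show. With real monthly observations, a time-ordered split such as scikit-learn's `TimeSeriesSplit` would likely be a better choice than shuffled folds, since shuffling lets the model train on months that come after the ones it is tested on.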

---

## Data

### Variables to be Measured:
- Monthly housing price indices by zip code
- Institutional investor purchase volumes and total transaction values
- Property types acquired by investors
- Investor hold periods and exit strategies
- Traditional economic indicators (unemployment rates, median income, interest rates)
- Local demographic data
- Building permits and new construction data
- Rental market metrics
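
As a rough sketch of how the investor purchase volumes and total transaction values above might be derived, the block below rolls individual transaction records up to one row per zip code and month. The record layout is an assumption: the field names `zip_code`, `sale_date`, `sale_price`, and `is_institutional` are hypothetical, the sample records are made up for illustration, and how a buyer gets flagged as institutional is left open.

```python
from collections import defaultdict


def aggregate_investor_activity(transactions):
    """Sum institutional purchases by (zip_code, "YYYY-MM")."""
    totals = defaultdict(lambda: {"purchase_count": 0, "total_value": 0.0})
    for t in transactions:
        if not t["is_institutional"]:
            continue  # only institutional purchases count toward these metrics
        month = t["sale_date"][:7]  # "2021-03-15" -> "2021-03"
        key = (t["zip_code"], month)
        totals[key]["purchase_count"] += 1
        totals[key]["total_value"] += t["sale_price"]
    return dict(totals)


# Made-up records for illustration only; not real transactions.
sample = [
    {"zip_code": "92101", "sale_date": "2021-03-15", "sale_price": 750000.0, "is_institutional": True},
    {"zip_code": "92101", "sale_date": "2021-03-28", "sale_price": 820000.0, "is_institutional": True},
    {"zip_code": "92101", "sale_date": "2021-03-30", "sale_price": 640000.0, "is_institutional": False},
    {"zip_code": "92037", "sale_date": "2021-04-02", "sale_price": 1500000.0, "is_institutional": True},
]

monthly = aggregate_investor_activity(sample)
print(monthly[("92101", "2021-03")])
# -> {'purchase_count': 2, 'total_value': 1570000.0}
```

Keying on zip code and month gives the same granularity as the monthly housing price indices, so the two could later be joined on that pair.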

### Potential Data Sources:
- **San Diego County Assessor's Office** (property transactions)
- **U.S. Bureau of Labor Statistics** (economic indicators)
- **Federal Reserve Economic Data (FRED)**
- **CoreLogic or Zillow** (housing price data)
- **SEC filings for publicly traded institutional investors**
- **San Diego Association of Governments (SANDAG)** (demographic data)


---

## Ethics & Privacy
- **Data Privacy & Confidentiality** 

    Data privacy concerns exist regarding property transaction records and demographic information. While most of this data is publicly available, we must ensure our analysis doesn't inadvertently reveal personally identifiable information about individual property owners or tenants. We will aggregate data at the zip code level to maintain privacy.

- **Bias in Data Collection & Representation**  

    There are potential biases in our dataset that need to be acknowledged and addressed. Institutional investor activity may be underreported in certain areas, and some demographic groups might be disproportionately affected by investor activity. We will carefully document any data limitations and potential sampling biases in our analysis.

- **Detection and Mitigation of Bias** 

    Exploratory Data Analysis (EDA) will be conducted to examine demographic distributions and missing data patterns. If biases are detected, adjustments such as re-weighting or stratified sampling will be applied.

- **Post-Analysis Considerations** 
    
    The findings of this research could have implications for housing policy and community development. We must be transparent about our methodology and careful not to draw causal conclusions where only correlational relationships exist. We will also consider the potential impact of our findings on various stakeholders, including local residents, policymakers, and market participants.
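
The re-weighting adjustment mentioned under Detection and Mitigation of Bias could take the form of simple post-stratification weights, sketched below. This is only one possible approach. The function name `poststratification_weights`, the group labels, and the population shares are hypothetical; in practice the shares would have to come from a reference source such as census or SANDAG figures.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each record so group shares in the sample match population shares."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = []
    for group in sample_groups:
        sample_share = counts[group] / n
        # Over-represented groups get weights below 1, under-represented above 1.
        weights.append(population_shares[group] / sample_share)
    return weights


# Illustrative example: group "A" is 75% of the sample but 50% of the population.
groups = ["A", "A", "A", "B"]
shares = {"A": 0.5, "B": 0.5}
weights = poststratification_weights(groups, shares)
print([round(w, 3) for w in weights])
# -> [0.667, 0.667, 0.667, 2.0]
```

The weights sum to the sample size, so weighted averages stay on the same scale as unweighted ones while each group contributes in proportion to its assumed population share.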

---

## Team Expectations
- Group chat is our main form of communication; Discord is used for sharing data.
- Communication should be polite but blunt.
- Decisions are made by majority agreement. If someone does not respond within the response time, the majority of available members will make the decision.
- Members volunteer for tasks as they come up during meetings, and credit will be given accordingly.
- Future deadlines will be discussed during meetings.
- Zoom meetings every Sunday; in-person meetings every other week.
- If someone is struggling, they should mention it at least 3 days before the deadline so that others can help.


---

## Project Timeline Proposal
| Meeting Date | Meeting Time | Completed Before Meeting | Discuss at Meeting |
|-------------|-------------|------------------------|--------------------|
| 2/1 | 11:30 AM | Zoom meeting; ice breaker; review of previous projects. | Briefly discussed our project's topic. Reviewed previous projects together. |
| 2/5 | 3 PM | Read and think about COGS 108 expectations; brainstorm topics/questions. | Further discussed our project idea, made adjustments, and set group policies. |
| 2/8 | 1 PM | Shijun Li proposed several project topic ideas and hypotheses that need further discussion. | Came up with new project ideas. |
| 2/9 | 3:50 PM | For the project proposal, Richard took the Background and Hypothesis sections; Peng took the Ethics & Privacy section and the timeline proposal. | Worked out in more detail who is responsible for each section of the project proposal. |
| 2/9 | 7 PM | Zoom meeting; discussed the project proposal on GitHub. | Discuss/edit Analysis; complete project check-in. |

