# ECON 5140: Applied Econometrics
## Midterm Project Options

**Instructions:**

For the midterm project, you will work in groups and choose **ONE** project from the options below. Each project focuses on either:
- **GLM (Logistic Regression)**: Binary classification problems
- **Time Series Forecasting**: Predicting future values using ARIMA, ETS, Prophet, or other forecasting models

**Project Requirements:**
- Complete data loading, preprocessing, and exploratory data analysis
- Implement appropriate models (logistic regression for GLM projects, time series models for forecasting projects)
- Evaluate model performance using appropriate metrics
- Provide visualizations and interpretations
- Prepare a 20-minute presentation covering:
  - Project introduction and data overview
  - Methodology and model implementation
  - Results and key findings
  - Conclusions and insights


## Project Option 1: Loan Default Prediction (GLM)

**Objective:** GLM (Logistic Regression) - Binary Classification

**Description:** Predict whether a borrower will default on a loan based on borrower characteristics, loan terms, and economic conditions. Relevant for financial economics and credit risk analysis.

**Data Link:** https://www.kaggle.com/datasets/wordsforthewise/lending-club

**Alternative:** https://www.kaggle.com/datasets/laotse/credit-risk-dataset

**Key Tasks:**
- Predict loan default (default/non-default) using logistic regression
- Analyze which borrower and loan characteristics are most predictive
- Evaluate the model using accuracy, ROC-AUC, and a confusion matrix
- Discuss implications for credit risk management and lending policies
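
As a rough illustration of the evaluation step, the sketch below computes the three metrics named above from predicted default probabilities. The labels and probabilities are made-up toy values, and the helper names (`confusion_counts`, `accuracy`, `roc_auc`) are hypothetical, not part of any library. Because defaults are usually the minority class, accuracy alone can look good for a model that rarely predicts default, which is one reason to report ROC-AUC and the confusion matrix alongside it.

```python
import numpy as np

def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) after classifying at the given probability cutoff."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_prob, dtype=float) >= threshold).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tn, fp, fn, tp

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of observations classified correctly at the cutoff."""
    tn, fp, fn, tp = confusion_counts(y_true, y_prob, threshold)
    return (tp + tn) / (tn + fp + fn + tp)

def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive is scored above a random
    negative (ties count one half). Compares all pairs, so it is only
    suitable for small samples; use a library routine on large data."""
    y_true = np.asarray(y_true).astype(int)
    y_prob = np.asarray(y_prob, dtype=float)
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

# Toy data (assumption): 1 = default, 0 = non-default
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.20, 0.30, 0.60, 0.40, 0.70, 0.05, 0.90]

print("tn, fp, fn, tp:", confusion_counts(y_true, y_prob))
print("accuracy:", accuracy(y_true, y_prob))
print("ROC-AUC:", roc_auc(y_true, y_prob))
```

Lowering `threshold` trades false negatives (missed defaults) for false positives (rejected good borrowers), which ties directly into the lending-policy discussion.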

## Project Option 2: Employment Status Prediction (GLM)

**Objective:** GLM (Logistic Regression) - Binary Classification

**Description:** Predict employment status (employed/unemployed) or job finding probability based on individual characteristics, education, and economic conditions. Relevant for labor economics.

**Data Link:** 
- HR Analytics Dataset (predict employee attrition/employment status): https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset

**Key Tasks:**
- Predict employment status using logistic regression (in the HR Analytics dataset, the binary outcome is the `Attrition` column)
- Analyze which individual and economic factors are most predictive
- Evaluate model performance and interpret coefficients
- Discuss implications for labor market policies and unemployment
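
To make the "interpret coefficients" task concrete, here is a minimal sketch that fits a logistic regression by Newton-Raphson (iteratively reweighted least squares) and converts the coefficients to odds ratios. The data are simulated, the regressors `education` and `age` are hypothetical standardized variables, and `fit_logit` is an illustrative helper; in a real project a library routine such as the `Logit` model in `statsmodels` would normally be used instead.

```python
import numpy as np

def fit_logit(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson (IRLS).
    X must already contain an intercept column.
    Returns the coefficient estimates and their standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = p * (1.0 - p)                        # IRLS weights
        grad = X.T @ (y - p)                     # score vector
        hess = X.T @ (X * w[:, None])            # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information at the final estimate
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))

# Simulated data (assumption): two standardized regressors
rng = np.random.default_rng(0)
n = 5000
education = rng.normal(0.0, 1.0, n)
age = rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), education, age])
true_beta = np.array([-0.5, 1.0, -0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_hat, se = fit_logit(X, y)
odds_ratios = np.exp(beta_hat)

for name, b, s, o in zip(["intercept", "education", "age"], beta_hat, se, odds_ratios):
    print(f"{name:>10}: coef={b: .3f}  se={s:.3f}  odds ratio={o:.3f}")
```

An odds ratio above 1 means a one-unit increase in the regressor raises the odds of the outcome, holding the others fixed; below 1 means it lowers them. The effect on the probability itself is not constant and depends on where the other regressors sit.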

## Project Option 3: Store Sales Forecasting (Time Series)

**Objective:** Time Series Forecasting

**Description:** Forecast store sales by product family (one level of hierarchy). You can forecast sales for different product families separately, or aggregate to overall sales. This allows for manageable complexity while still working with real retail data.

**Data Link:** 
- Kaggle Competition: https://www.kaggle.com/competitions/store-sales-time-series-forecasting/data

**How to Download Data:**
1. Install Kaggle API: `pip install kaggle`
2. Set up Kaggle credentials (download `kaggle.json` from your Kaggle account settings)
3. Download data using one of these methods:
   - **Command line:** `kaggle competitions download -c store-sales-time-series-forecasting`
   - **Python code:**
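     ```python
     from kaggle.api.kaggle_api_extended import KaggleApi

     # Authenticate using the credentials stored in kaggle.json
     api = KaggleApi()
     api.authenticate()

     # Download the competition files (a zip archive) into ./data
     api.competition_download_files('store-sales-time-series-forecasting', path='./data')
     ```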

**Key Tasks:**
- Load and explore the store sales data
- Aggregate sales by product family (one level of hierarchy) or use overall sales
- Create time series at daily/weekly level
- Forecast sales **1 month ahead** (30 days) using ARIMA, ETS, and Prophet models
- Handle seasonalities (weekly, monthly, yearly patterns)
- Compare different forecasting models using MAE, RMSE, MAPE
- Provide forecast visualizations and confidence intervals
- **Note:** You can forecast by product family (e.g., GROCERY I, BEVERAGES, etc.) or aggregate to total sales - choose one approach
- **Forecast Horizon:** 1 month (30 days) ahead
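
The aggregation and model-comparison tasks above can be sketched as follows. The data here are simulated and only mimic the `date`, `store_nbr`, `family`, and `sales` columns of the competition's training file; the two forecasts are simple baselines (historical mean and weekly seasonal naive), not ARIMA, ETS, or Prophet. The metric helpers are hypothetical names. The point is the workflow: aggregate to one series, hold out the last 30 days, and score every candidate on the same holdout.

```python
import numpy as np
import pandas as pd

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mape(y, yhat):
    """MAPE in percent. Days with zero actual sales are skipped because
    the percentage error is undefined there."""
    mask = y != 0
    return float(100 * np.mean(np.abs((y[mask] - yhat[mask]) / y[mask])))

# Simulated long-format data (assumption): 2 stores x 2 families x 364 days
rng = np.random.default_rng(42)
dates = pd.date_range("2016-01-01", periods=364, freq="D")
weekly = 20 * np.sin(2 * np.pi * dates.dayofweek.to_numpy() / 7)
frames = []
for store in (1, 2):
    for family in ("GROCERY I", "BEVERAGES"):
        sales = 100 + weekly + rng.normal(0, 5, len(dates))
        frames.append(pd.DataFrame(
            {"date": dates, "store_nbr": store, "family": family, "sales": sales}))
train = pd.concat(frames, ignore_index=True)

# Aggregate across stores: one daily series per product family
daily = train.groupby(["family", "date"])["sales"].sum().unstack("family")
series = daily["GROCERY I"].asfreq("D")

# Hold out the last 30 days as the test window
h = 30
history, actual = series.iloc[:-h], series.iloc[-h:].to_numpy()

# Baseline 1: historical mean. Baseline 2: repeat the last observed week.
mean_fc = np.full(h, history.mean())
snaive_fc = np.tile(history.iloc[-7:].to_numpy(), h // 7 + 1)[:h]

results = pd.DataFrame({
    name: {"MAE": mae(actual, fc), "RMSE": rmse(actual, fc), "MAPE": mape(actual, fc)}
    for name, fc in {"mean": mean_fc, "seasonal_naive": snaive_fc}.items()
}).T
print(results.round(2))
```

Any fitted model can be added to the comparison by producing a length-30 forecast array from `history` and adding it to the dictionary. A model that cannot beat the seasonal naive baseline on the holdout is probably not capturing more than the weekly pattern.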

## Project Option 4: Energy Consumption Forecasting (Time Series)

**Objective:** Time Series Forecasting

**Description:** Forecast household electric power consumption using historical usage patterns. The dataset contains minute-level measurements of power consumption.

**Data Link:** https://www.kaggle.com/datasets/uciml/electric-power-consumption-data-set

**Alternative:** https://www.kaggle.com/datasets/robikscube/hourly-energy-consumption

**Data Structure:**
- **Date**, **Time**: Date and time of each measurement, stored as two separate columns (the file is semicolon-delimited)
- **Global_active_power**: Total household active power consumption (kilowatts) - **RECOMMENDED TO FORECAST**
- **Global_reactive_power**: Reactive power (kilowatts)
- **Voltage**: Voltage (volts)
- **Global_intensity**: Current intensity (amperes)
- **Sub_metering_1**: Kitchen appliances (dishwasher, oven, microwave), in watt-hours of active energy
- **Sub_metering_2**: Laundry room (washing machine, tumble-drier, refrigerator, light), in watt-hours of active energy
- **Sub_metering_3**: Electric water-heater and air-conditioner, in watt-hours of active energy

**What to Forecast:**
- **Primary target**: `Global_active_power` (total household power consumption in kilowatts)
- **Alternative**: You can also forecast individual sub-metering categories (Sub_metering_1, 2, or 3) or aggregate them
- **Aggregation**: Aggregate minute-level data to hourly or daily level for forecasting
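
The aggregation step might look like the sketch below. `load_power` is a hypothetical helper that assumes the raw file is semicolon-delimited, marks missing values with `?`, and stores dates as day/month/year; check these assumptions against the file you actually download. The demonstration itself runs on simulated minute-level data, so no download is needed to try it.

```python
import numpy as np
import pandas as pd

def load_power(path):
    """Read the raw file into a DataFrame indexed by timestamp.
    Assumes ';' as delimiter, '?' for missing values, and
    dd/mm/yyyy dates (verify against your copy of the data)."""
    df = pd.read_csv(path, sep=";", na_values="?", low_memory=False)
    df["timestamp"] = pd.to_datetime(
        df["Date"] + " " + df["Time"], format="%d/%m/%Y %H:%M:%S")
    return df.set_index("timestamp").drop(columns=["Date", "Time"])

# Simulated stand-in (assumption): two days of minute-level readings in kW
rng = np.random.default_rng(1)
idx = pd.date_range("2007-01-01", periods=2 * 24 * 60, freq="min")
power = 1.0 + 0.5 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24)
minute = pd.DataFrame(
    {"Global_active_power": power + rng.normal(0, 0.1, len(idx))}, index=idx)
minute.iloc[100:130] = np.nan  # simulate a short gap in the measurements

# Minute -> hourly: the mean skips missing minutes by default
hourly = minute["Global_active_power"].resample("h").mean()

# Hourly -> daily average power
daily = hourly.resample("D").mean()

print(hourly.head())
print(daily)
```

Averaging keeps the unit in kilowatts; the hourly mean in kW is numerically equal to the energy used in that hour in kWh. If an entire hour is missing, its mean will be `NaN` and needs to be filled (for example by interpolation) before fitting most time series models.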

**Key Tasks:**
- Load and aggregate data (minute → hourly or daily)
- Forecast **1-week ahead** (7 days) energy consumption using time series models (ARIMA, ETS, Prophet)
- Analyze trends and seasonality patterns (daily, weekly, seasonal)
- Compare different forecasting models using MAE, RMSE, MAPE
- Evaluate forecast accuracy and provide insights for energy planning
- **Forecast Horizon:** 1 week (7 days) ahead

## Project Option 5: NASDAQ Index Forecasting (Time Series)

**Objective:** Time Series Forecasting

**Description:** Forecast NASDAQ Composite Index prices using historical data. Provide 5-day ahead forecasts.

**Data Link:** 
- Yahoo Finance (via yfinance): NASDAQ Composite Index (^IXIC)
- Can be downloaded using: `yfinance` library in Python
- Example code: `import yfinance as yf; data = yf.download('^IXIC', start='2019-01-01', end=None)`

**Key Tasks:**
- Download NASDAQ index data using yfinance
- Forecast index values **5 days ahead** using ARIMA, ETS, and Prophet models
- Analyze trends and volatility patterns
- Compare forecast accuracy across different models (MAE, RMSE, MAPE)
- Evaluate 5-day ahead forecast performance
- Discuss the challenges of forecasting financial time series
- **Forecast Horizon:** 5 days ahead
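
One way to evaluate 5-day-ahead performance is a rolling-origin backtest, sketched below on a simulated price path (a geometric random walk, not real NASDAQ data). The function name and its parameters (`min_train`, `step`) are illustrative assumptions. It scores two benchmarks, a naive "last value" forecast and a random walk with drift, which is a useful reference point: financial series are often hard to forecast better than these.

```python
import numpy as np

def rolling_origin_errors(prices, horizon=5, min_train=250, step=5):
    """Backtest h-step-ahead forecasts from successive forecast origins.
    Returns absolute errors for the naive and the drift forecast."""
    prices = np.asarray(prices, dtype=float)
    log_p = np.log(prices)
    naive_err, drift_err = [], []
    for origin in range(min_train, len(prices) - horizon + 1, step):
        train = log_p[:origin]                    # data known at the origin
        actual = prices[origin + horizon - 1]     # value h steps after the last training point
        # Naive: the forecast equals the last observed price
        naive = np.exp(train[-1])
        # Drift: extrapolate the average historical log return
        mean_ret = (train[-1] - train[0]) / (len(train) - 1)
        drift = np.exp(train[-1] + horizon * mean_ret)
        naive_err.append(abs(actual - naive))
        drift_err.append(abs(actual - drift))
    return np.array(naive_err), np.array(drift_err)

# Simulated daily closes (assumption): 1000 trading days
rng = np.random.default_rng(7)
log_returns = rng.normal(0.0003, 0.012, 1000)
prices = 7000 * np.exp(np.cumsum(log_returns))

naive_err, drift_err = rolling_origin_errors(prices)
print("forecast origins:", len(naive_err))
print("naive MAE:", round(float(naive_err.mean()), 2))
print("drift MAE:", round(float(drift_err.mean()), 2))
```

To score ARIMA, ETS, or Prophet the same way, refit the model on `prices[:origin]` inside the loop and compare its 5-step-ahead forecast with `actual`. Averaging errors over many origins gives a much more reliable picture than a single train/test split.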