# Project Proposal: Predicting Hurricane Categories Using Machine Learning

## Course: UNC MADS DATA780

### Authors:
- **Hubert Hwang**
- **Sooji Rhodes**

---

### 1. Objective:
We aim to develop a machine learning model to accurately predict hurricane categories based on meteorological data (wind speed, pressure, geographic position) from NOAA’s Atlantic Hurricane Database (HURDAT2). The goal is to improve predictive accuracy and better understand storm dynamics, offering a tool for enhancing disaster preparedness.
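
HURDAT2 records maximum sustained wind in knots rather than a category, so the target label presumably has to be derived. A minimal sketch, assuming the label follows the Saffir-Simpson Hurricane Wind Scale and that negative wind values mark missing data (the function name is illustrative, not part of the project code):

```python
from typing import Optional

# Lower bounds in knots for Saffir-Simpson categories, checked strongest first.
CATEGORY_MIN_KNOTS = [(5, 137), (4, 113), (3, 96), (2, 83), (1, 64)]


def saffir_simpson_category(max_wind_knots: int) -> Optional[int]:
    """Return 1-5 for hurricanes, 0 below hurricane strength, None if missing."""
    if max_wind_knots < 0:  # assumption: negative sentinel means "no reading"
        return None
    for category, lower_bound in CATEGORY_MIN_KNOTS:
        if max_wind_knots >= lower_bound:
            return category
    return 0
```

Because this label is a direct function of the wind reading at the same time step, a model given that reading as a feature would only relearn the thresholds. One option is to predict the category at a later observation instead.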

---

### 2. Current Practice and Limitations:
Today, hurricane categories are assigned on the Saffir-Simpson Hurricane Wind Scale from maximum sustained wind, which meteorologists estimate from observations and forecast with numerical models. However, these methods often struggle to predict category changes in real time, especially for rapidly intensifying storms, because storm evolution is complex. Traditional models may miss key nonlinear relationships that machine learning can uncover.

---

### 3. Approach:
We will explore several machine learning techniques (Random Forest, Gradient Boosting, Artificial Neural Networks, and LSTMs) to handle both aggregated and sequential data. Sequence models such as LSTMs take a storm's successive observations as input, so they can capture how the storm evolves over time and may improve on current methods.
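
To make the sequential setup concrete, here is a minimal sketch of how one storm's track could be cut into fixed-length windows for a sequence model. It assumes each observation is a list of numeric features and that the target is the category at the observation immediately after the window; the function name and the window length are illustrative choices, not decisions the proposal has made.

```python
from typing import List, Sequence, Tuple


def make_windows(
    track: Sequence[Sequence[float]],
    labels: Sequence[int],
    window: int,
) -> Tuple[List[List[List[float]]], List[int]]:
    """Pair each run of `window` consecutive observations with the next label."""
    inputs: List[List[List[float]]] = []
    targets: List[int] = []
    for start in range(len(track) - window):
        inputs.append([list(step) for step in track[start:start + window]])
        targets.append(labels[start + window])
    return inputs, targets
```

Windows should be built per storm so that no window mixes observations from two different storms.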

---

### 4. Impact:
If successful, this project will provide an improved tool for predicting hurricane categories, which is critical for early warnings and disaster management. More accurate predictions could save lives and reduce damage by enabling timely evacuations and preparations.

---

### 5. Challenges & Risks:
- **Data Imbalance**: Extreme categories (e.g., Category 5) have far fewer examples, which may bias the model toward the more common categories.
- **Sequential Modeling**: Training LSTMs requires large amounts of data and careful tuning to avoid overfitting.
- **Missing Data**: Records from the early years of the database may have gaps, necessitating imputation or exclusion of some records.
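
One common way to counter the imbalance noted above is to weight each class inversely to its frequency. The sketch below computes `n_samples / (n_classes * class_count)`, the same heuristic scikit-learn applies for `class_weight="balanced"`. Whether the project will use class weights, resampling, or something else is an open choice, and the function name is illustrative.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }
```

Rare classes receive weights above 1 and common classes weights below 1, so misclassifying a rare category costs more during training.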

---

### 6. Evaluation Criteria:
- **Accuracy** and the **confusion matrix** will be our primary metrics.
- **ROC-AUC** scores will evaluate overall performance across categories.
- **Precision, Recall, and F1-Score** will be used to ensure balanced performance across all hurricane categories, particularly for underrepresented ones.
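
To illustrate why per-class metrics matter for underrepresented categories, the sketch below computes precision, recall, and F1 for each class and then the unweighted (macro) average of F1. The function names are illustrative, and in practice the same numbers would likely come from a library such as scikit-learn.

```python
from typing import Dict, Sequence


def per_class_scores(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[int, Dict[str, float]]:
    """Compute precision, recall, and F1 separately for every class."""
    scores: Dict[int, Dict[str, float]] = {}
    pairs = list(zip(y_true, y_pred))
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(scores: Dict[int, Dict[str, float]]) -> float:
    """Average F1 over classes, giving rare classes equal weight."""
    return sum(s["f1"] for s in scores.values()) / len(scores)
```

Accuracy alone can look strong while a rare category is frequently missed. The macro average exposes that gap because each class counts equally regardless of its size.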

---

### 7. Methods:
1. **Baseline**: Random Forest and Gradient Boosting for tabular data.
2. **Extensions**: Artificial Neural Networks (ANNs) and LSTMs for sequential data. These will allow us to test whether time-series analysis improves predictive power.
   
---

### 8. Data Sources:
The primary dataset for this project is the [NOAA Atlantic Hurricane Database (HURDAT2)](https://www.nhc.noaa.gov/data/hurdat/hurdat2-1851-2023-051124.txt). This dataset contains detailed historical information on Atlantic hurricanes from 1851 to 2023. An explanation of the data format can be found in the [HURDAT2 Data Format Documentation](https://www.nhc.noaa.gov/data/hurdat/hurdat2-format-atl-1851-2021.pdf).
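
HURDAT2 is a comma-separated text file in which a header line for each storm (identifier, name, number of entries) is followed by that storm's observation lines. The sketch below shows one way to read it. It assumes the field order described in the format documentation (date, time, record identifier, status, latitude, longitude, maximum wind, minimum pressure) and treats negative wind or pressure values as missing. The function names are illustrative, and the sample lines are made up for demonstration rather than taken from the dataset.

```python
from typing import Dict, List, Optional


def parse_coordinate(text: str) -> float:
    """Convert '25.0N' or '80.0W' to signed degrees (south/west negative)."""
    value, hemisphere = float(text[:-1]), text[-1]
    return -value if hemisphere in "SW" else value


def _missing_if_negative(text: str) -> Optional[int]:
    """Assumption: negative sentinel values mark missing measurements."""
    number = int(text)
    return None if number < 0 else number


def parse_hurdat2(lines: List[str]) -> List[Dict]:
    """Flatten header and observation lines into one record per observation."""
    records: List[Dict] = []
    storm_id, storm_name = None, None
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        if fields[0][:2].isalpha():  # header lines start with a basin code
            storm_id, storm_name = fields[0], fields[1]
            continue
        records.append({
            "storm_id": storm_id,
            "name": storm_name,
            "date": fields[0],
            "time": fields[1],
            "status": fields[3],
            "lat": parse_coordinate(fields[4]),
            "lon": parse_coordinate(fields[5]),
            "wind_kt": _missing_if_negative(fields[6]),
            "pressure_mb": _missing_if_negative(fields[7]),
        })
    return records


# Made-up lines in the HURDAT2 layout, for illustration only.
SAMPLE_LINES = [
    "AL012000,            EXAMPLE,      2,",
    "20000901, 0000,  , TS, 25.0N,  80.0W,  45, 1000,",
    "20000901, 0600,  , HU, 25.5N,  81.0W,  70, -999,",
]
sample_records = parse_hurdat2(SAMPLE_LINES)
```

Keeping the storm identifier on every record makes it straightforward to group observations by storm later, for example when building per-storm sequences.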

---

### 9. Timeline:
- **Week 1**: Data Collection, Preprocessing, Feature Engineering.
- **Week 2**: Baseline Model Training (Random Forest, Gradient Boosting).
- **Week 3**: Advanced Model Training (ANNs, LSTMs).
- **Week 4**: Model Evaluation, Tuning, and Final Report.

---

### 10. Tools:
- **Python**, **Scikit-learn**, **TensorFlow**.
- **GitHub repository**: [soojirhodes/DATA780_Final_Project](https://github.com/soojirhodes/DATA780_Final_Project).

---

### AI Usage Documentation:
This project made use of **ChatGPT** in the following ways:

| **Usage** | **Tool Used (e.g., ChatGPT-4)** | **How you edited the output** | **Conversation Link (If available)** |
|-----------|----------------------------------|-------------------------------|-------------------------------------|
| Brainstorming and idea generation | ChatGPT | Output reviewed and refined | Not available |
| Research | ChatGPT | Verified information accuracy | Not available |
| Drafting | ChatGPT | Reworded for clarity and adherence to project requirements | Not available |
| Polishing | ChatGPT | Edited and revised for final submission | Not available |

#### Acknowledgment:
ChatGPT was used to assist in brainstorming ideas, drafting sections, and improving the organization and clarity of this proposal. The final content was reviewed, verified, and edited by the authors to ensure accuracy and adherence to course guidelines.

---

### Appendix D: Syllabus Guidelines for Generative AI Use:
Per **Appendix D: Syllabus Guidelines for Generative AI**, the use of generative AI technologies (such as ChatGPT) must be declared and explained in submissions. As outlined, AI was employed responsibly in this project, and the authors remain fully accountable for the content, including verification of the AI-generated material.