<a href="https://colab.research.google.com/github/rp-rurouni/AIM240-capstone-rahul/blob/main/Rahul_Poddar_AIM240_Initial_Project_Planning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AIM 240: AI/ML Capstone Project Initialization Document

**Student Name:** Rahul Poddar

**Semester/Year:** Spring 2026

---

## Introduction

Welcome to your Capstone Project! This notebook will guide you through the process of defining, scoping, and planning your individual AI/ML project. Since this course is asynchronous, this document serves as our initial communication tool for understanding your vision and providing targeted feedback.

**Please complete ALL sections thoroughly.** Incomplete submissions will be returned for revision before approval.

### How to Use This Notebook

1. Read through the entire notebook first to understand what's expected
2. Complete Section 1 (Project Ideation) with 2-3 potential project ideas
3. Use Section 2 to systematically evaluate and rank your ideas
4. Complete Sections 3-12 for your **top-ranked project only**
5. Export as HTML or PDF and submit via the course LMS by the posted deadline
6. Await instructor feedback before beginning implementation

---

## Section 1: Project Ideation

### 1.1 Brainstorming Your Project Ideas

Before committing to a single project, explore multiple possibilities. List **2-3 distinct project ideas** that interest you. For each idea, provide a brief description (3-5 sentences) covering:

- What problem does it solve?
- Who would benefit from this solution?
- What type of AI/ML would be involved (classification, generation, detection, etc.)?

#### Project Idea A

**Working Title:** <span style="color:blue">Airline Flight Delay Prediction Using Pre-Departure Data</span>

**Brief Description:**

<span style="color:blue">Airlines experience frequent delays due to factors such as scheduling, congestion, and operational constraints. This project builds a binary classification model that predicts whether a U.S. domestic flight will arrive 15 minutes or more late using only pre-departure information. Features include airline, origin and destination airports, scheduled departure time (hour of day), calendar variables (month, day, and day of week), scheduled flight duration, and distance. The solution is implemented as a Kaggle-style, reproducible end-to-end notebook, covering data preparation, modeling, and evaluation.</span>

**Primary ML Domain(s):** <span style="color:blue">Tabular/Structured Data, Time Series/Forecasting</span>

#### Project Idea B

**Working Title:** <span style="color:blue">Safety Incident Risk Prediction in Manufacturing Environments</span>

**Brief Description:**

<span style="color:blue">Manufacturing facilities collect large amounts of safety data, including incidents, inspections, and injury records, but this data is often reviewed only after incidents occur. This project aims to predict which work areas or time periods have a higher risk of safety incidents using historical safety records. Safety managers and plant leadership would benefit by taking preventive actions before incidents happen. The project uses supervised machine learning classification on structured safety data.</span>

**Primary ML Domain(s):** <span style="color:blue">Tabular/Structured Data</span>

#### Project Idea C (Optional but Recommended)

**Working Title:** <span style="color:blue">Classification of Airline Delay Causes Using Operational Data</span>

**Brief Description:**

<span style="color:blue">Flight delays can occur for multiple reasons, such as weather, airline operations, or air traffic control issues. This project focuses on classifying the primary cause of a flight delay using historical operational data. Airline performance analysts and planners would benefit from identifying recurring delay patterns and root causes. The project uses multi-class supervised machine learning classification on structured data.</span>

**Primary ML Domain(s):** <span style="color:blue">Tabular/Structured Data</span>

---

## Section 2: Project Ranking & Selection

### 2.1 Evaluation Criteria Matrix

Rate each of your project ideas on a scale of **1-5** for each criterion below. Be honest in your assessment - this exercise helps you avoid projects that may become problematic later.

**Rating Scale:**
- 1 = Very Poor / Major Concerns
- 2 = Below Average / Some Concerns
- 3 = Adequate / Neutral
- 4 = Good / Minor Concerns
- 5 = Excellent / No Concerns

```python
# ============================================================
# PROJECT RANKING CALCULATOR
# Fill in your scores below (1-5 for each criterion)
# ============================================================

import pandas as pd

# Define the evaluation criteria
criteria = [
    "Personal Interest",      # How excited are you about this project?
    "Data Availability",      # Is appropriate data publicly available?
    "Technical Feasibility",  # Can this be accomplished with your current skills + learning?
    "Scope Appropriateness",  # Is this achievable in one semester?
    "Novelty/Learning Value", # Will you learn new skills beyond what you know?
    "Portfolio Value",        # Will this impress future employers?
    "Ethical Clarity",        # Are there minimal ethical concerns?
    "MLOps Applicability"     # Can you apply MLOps concepts?
]

# ============================================================
# ENTER YOUR SCORES HERE (1-5 for each)
# ============================================================

# Project Idea A scores
idea_a_scores = {
    "Personal Interest": 5,       # <-- Enter score 1-5
    "Data Availability": 4,       # <-- Enter score 1-5
    "Technical Feasibility": 4,   # <-- Enter score 1-5
    "Scope Appropriateness": 4,   # <-- Enter score 1-5
    "Novelty/Learning Value": 3,  # <-- Enter score 1-5
    "Portfolio Value": 3,         # <-- Enter score 1-5
    "Ethical Clarity": 5,         # <-- Enter score 1-5
    "MLOps Applicability": 3      # <-- Enter score 1-5
}

# Project Idea B scores
idea_b_scores = {
    "Personal Interest": 4,       # <-- Enter score 1-5
    "Data Availability": 4,       # <-- Enter score 1-5
    "Technical Feasibility": 4,   # <-- Enter score 1-5
    "Scope Appropriateness": 3,   # <-- Enter score 1-5
    "Novelty/Learning Value": 5,  # <-- Enter score 1-5
    "Portfolio Value": 3,         # <-- Enter score 1-5
    "Ethical Clarity": 3,         # <-- Enter score 1-5
    "MLOps Applicability": 3      # <-- Enter score 1-5
}

# Project Idea C scores (set to 0 if not using)
idea_c_scores = {
    "Personal Interest": 3,       # <-- Enter score 1-5 (or leave as 0)
    "Data Availability": 3,       # <-- Enter score 1-5 (or leave as 0)
    "Technical Feasibility": 3,   # <-- Enter score 1-5 (or leave as 0)
    "Scope Appropriateness": 3,   # <-- Enter score 1-5 (or leave as 0)
    "Novelty/Learning Value": 5,  # <-- Enter score 1-5 (or leave as 0)
    "Portfolio Value": 4,         # <-- Enter score 1-5 (or leave as 0)
    "Ethical Clarity": 3,         # <-- Enter score 1-5 (or leave as 0)
    "MLOps Applicability": 3      # <-- Enter score 1-5 (or leave as 0)
}

# One row per criterion, one column per idea
df_scores = pd.DataFrame({
    "Criterion": criteria,
    "Idea A": [idea_a_scores[c] for c in criteria],
    "Idea B": [idea_b_scores[c] for c in criteria],
    "Idea C": [idea_c_scores[c] for c in criteria]
})

# Single summary row holding the total score for each idea
totals = pd.DataFrame({
    "Criterion": ["TOTAL SCORE"],
    "Idea A": [sum(idea_a_scores.values())],
    "Idea B": [sum(idea_b_scores.values())],
    "Idea C": [sum(idea_c_scores.values())]
})

df_display = pd.concat([df_scores, totals], ignore_index=True)

print("=" * 60)
print("PROJECT EVALUATION MATRIX")
print("=" * 60)
print(df_display.to_string(index=False))
print("\n" + "=" * 60)
# Each criterion is scored out of 5, so the maximum is 5 per criterion
print(f"Maximum possible score: {len(criteria) * 5}")
print("=" * 60)
```

```
PROJECT EVALUATION MATRIX
             Criterion  Idea A  Idea B  Idea C
     Personal Interest       5       4       3
     Data Availability       4       4       3
 Technical Feasibility       4       4       3
 Scope Appropriateness       4       3       3
Novelty/Learning Value       3       5       5
       Portfolio Value       3       3       4
       Ethical Clarity       5       3       3
   MLOps Applicability       3       3       3
           TOTAL SCORE      31      29      27

Maximum possible score: 40
```
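
The totals above translate directly into a ranking. The following is a minimal sketch of that step, with the three totals copied by hand from the matrix; the `rank_ideas` helper is a hypothetical name introduced only for illustration and is not part of the course template.

```python
# Totals copied from the TOTAL SCORE row of the evaluation matrix.
totals = {"Idea A": 31, "Idea B": 29, "Idea C": 27}


def rank_ideas(scores):
    """Return idea names ordered from highest to lowest total score."""
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_ideas(totals)
for position, idea in enumerate(ranking, start=1):
    print(f"{position}. {idea} ({totals[idea]}/40)")
```

A total-score ranking treats all eight criteria as equally important; weighting some criteria more heavily could change the order.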


### 2.2 Final Ranking & Justification

**Your Final Ranking:**

1. **First Choice:** <span style="color:blue">Airline Flight Delay Prediction Using Pre-Departure Data</span>
2. **Second Choice:** <span style="color:blue">Safety Incident Risk Prediction in Manufacturing Environments</span>
3. **Third Choice:** <span style="color:blue">Classification of Airline Delay Causes Using Operational Data</span>

**Justification for Your Top Choice (3-5 sentences):**

<span style="color:blue">This project combines strong personal interest, driven by my client's business in airline operations software, with excellent data availability and manageable technical complexity. Public datasets from the U.S. Department of Transportation provide large, well-structured flight-level data that is well suited to a supervised machine learning classification problem. The problem scope can be clearly constrained to pre-departure features, reducing complexity while remaining realistic and meaningful. In addition, this project fits my current skill level, which is still limited in Python and ML coding, and it supports a fully reproducible, end-to-end notebook workflow that builds on Kaggle competition references.</span>

**What would need to change for you to switch to your second choice?**

<span style="color:blue">I would switch to the Safety Incident Risk Prediction project if airline datasets were unavailable, incomplete, or required excessive preprocessing beyond the project timeline. A second trigger would be if model performance for flight delay prediction proved consistently weak despite reasonable feature engineering, suggesting limited learning value. In that case, the safety incident project would provide a strong alternative with similarly structured data and a clearer signal for classification.</span>

---

## Section 3: Problem Statement

*Complete this section and all following sections for your TOP-RANKED PROJECT ONLY.*

### 3.1 Problem Definition

Write a clear, concise problem statement below.

#### Example Problem Statements:

**Good Example:**
> Small-scale farmers in developing regions currently struggle with identifying crop diseases early because they lack access to agricultural experts. This results in significant crop losses (estimated 20-40% annually) and reduced income. An AI/ML solution could enable early disease detection and treatment recommendations by analyzing smartphone photos of affected plants.

**Poor Example:**
> I want to make a plant disease detector because it would be cool and use computer vision.

*The poor example lacks specificity about users, problems, consequences, and approach.*

---

**Your Problem Statement:**

<span style="color:blue">Flight delays are a persistent operational challenge for airlines and a frequent source of disruption for passengers. In the United States, one analysis of nationwide flight-level data published by the U.S. Department of Transportation Bureau of Transportation Statistics (BTS) found that approximately 22.6% of U.S. domestic flights were delayed in 2025, indicating that delays affect a substantial portion of airline operations. This conclusion is based on aggregated BTS on-time performance data covering millions of scheduled flights across major U.S. airports and carriers.

Despite the availability of this detailed historical data, delay analysis is often retrospective and focused on reporting and compliance rather than proactive prediction. As a result, airline operations teams have limited ability to anticipate delays before they occur and communicate expected disruptions early. An AI/ML solution can address this gap by using historical flight and airport data to predict the likelihood of delays in advance, enabling better operational planning and earlier communication with passengers.</span>

### 3.2 Existing Solutions

Research and document current solutions to this problem:

| Solution | Type | Strengths | Weaknesses | Why Yours Will Be Different |
|----------|------|-----------|------------|-----------------------------|
| <span style="color:blue">Airline Dashboards</span> | <span style="color:blue">Descriptive analytics</span> | <span style="color:blue">Widely used; shows trends</span> | <span style="color:blue">Retrospective</span> | <span style="color:blue">Prediction focus</span> |
| <span style="color:blue">Sabre/CAE, Amadeus</span> | <span style="color:blue">Rule-based operations software</span> | <span style="color:blue">Integrates schedules, rotation, and crew planning</span> | <span style="color:blue">Limited predictions; opaque</span> | <span style="color:blue">Transparent ML models to estimate delay risk</span> |
| <span style="color:blue">Public Delay Reports (US DOT/BTS)</span> | <span style="color:blue">Reporting/Compliance</span> | <span style="color:blue">Standardized, authoritative, public data</span> | <span style="color:blue">Not actionable for operational prediction</span> | <span style="color:blue">Same data, used to build predictive models</span> |
| <span style="color:blue">Flight delay prediction notebooks on Kaggle</span> | <span style="color:blue">Community ML projects</span> | <span style="color:blue">Demonstrate end-to-end ML workflows; public datasets; strong baselines</span> | <span style="color:blue">Often optimized for leaderboard performance; limited interpretability, scope control, or reproducibility</span> | <span style="color:blue">Clean, scoped, and reproducible notebook focused on pre-departure features</span> |

**If no existing solutions exist, explain why:**

<span style="color:blue">Not applicable. Multiple solutions exist for monitoring and reporting airline delays, as well as for operational scheduling. However, these solutions primarily focus on descriptive analytics or rule-based decision support rather than using publicly available historical data to build transparent, supervised machine learning models for proactive delay prediction.</span>

---

## Section 4: Project Scope Definition

### 4.1 Core Features (Must-Have)

List the **essential features** your project must have to be considered minimally viable. These are non-negotiable deliverables. Aim for 3-5 core features.

| # | Feature | Description | Success Looks Like |
|---|---------|-------------|-------------------|
| 1 | <span style="color:blue">Flight data ingestion</span> | <span style="color:blue">Load and preprocess historical U.S. airline on-time performance data</span> | <span style="color:blue">Clean, filtered dataset ready for modeling</span> |
| 2 | <span style="color:blue">Delay Label Creation</span> | <span style="color:blue">Define a binary delay label (arrival delay ≥ 15 minutes)</span> | <span style="color:blue">Label matches delay definition correctly</span> |
| 3 | <span style="color:blue">Delay Prediction Model</span> | <span style="color:blue">Train a supervised ML model to predict delays</span> | <span style="color:blue">Model outperforms simple baseline</span> |
| 4 | <span style="color:blue">Model Evaluation</span> | <span style="color:blue">Evaluate model using standard performance metrics</span> | <span style="color:blue">Metrics clearly reported and explained</span> |
| 5 | <span style="color:blue">Reproducible Workflow</span> | <span style="color:blue">Implement an end-to-end ML workflow in a single notebook</span> | <span style="color:blue">Notebook runs end-to-end without errors</span> |
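
Core feature 2 (delay label creation) can be sketched in a few lines of pandas. This is a minimal illustration, not the final implementation: the column name `ARR_DELAY` (arrival delay in minutes) and the output column `IS_DELAYED` are assumptions and should be replaced with the names used in the chosen dataset. Dropping rows with a missing arrival delay (for example, cancelled flights) is also an assumption made here for simplicity.

```python
import pandas as pd

# Threshold taken from the label definition: arrival delay >= 15 minutes.
DELAY_THRESHOLD_MINUTES = 15


def add_delay_label(flights, delay_col="ARR_DELAY"):
    """Return a copy of `flights` with a binary IS_DELAYED column.

    `delay_col` is an assumed column name holding arrival delay in minutes.
    Rows without an arrival delay are dropped (an assumption of this sketch).
    """
    labeled = flights.dropna(subset=[delay_col]).copy()
    labeled["IS_DELAYED"] = (labeled[delay_col] >= DELAY_THRESHOLD_MINUTES).astype(int)
    return labeled


# Tiny hand-made sample: early, just under, exactly at, well over, and missing.
sample = pd.DataFrame({"ARR_DELAY": [-5.0, 14.0, 15.0, 42.0, None]})
labeled = add_delay_label(sample)
print(labeled)
```

The boundary case matters for the "label matches delay definition" success criterion: a flight arriving exactly 15 minutes late counts as delayed, while one arriving 14 minutes late does not.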

**Example Core Features (Plant Disease Detection):**

| # | Feature | Description | Success Looks Like |
|---|---------|-------------|-------------------|
| 1 | Image Classification | Model classifies plant images into healthy/diseased categories | >85% accuracy on test set |
| 2 | Multi-disease Detection | System identifies at least 5 common diseases per crop type | Correct disease identification in 80% of cases |
| 3 | Web Interface | Users can upload images via simple web form | Interface loads in <3 seconds, works on mobile |

### 4.2 Extended Features (Nice-to-Have)

List features you would add if time permits, in priority order:

| Priority | Feature | Description | Dependencies |
|----------|---------|-------------|--------------|
| 1 | <span style="color:blue">Multi-Airport Analysis</span> | <span style="color:blue">Compare delay prediction performance across multiple airports</span> | <span style="color:blue">Core delay prediction model completed</span> |
| 2 | <span style="color:blue">Feature Importance Visualization</span> | <span style="color:blue">Visualize which pre-departure features most influence delay predictions</span> | <span style="color:blue">Trained model and evaluation metrics available</span> |
| 3 | <span style="color:blue">Time-Based Performance Analysis</span> | <span style="color:blue">Analyze model performance by time of day or season</span> | <span style="color:blue">Cleaned dataset with time features</span> |

### 4.3 Out of Scope

Explicitly state what your project will NOT include. This prevents scope creep and sets clear expectations.

**This project will NOT include:**
- <span style="color:blue">Real-time or live flight data ingestion</span>
- <span style="color:blue">Integration with airline operational, dispatch, or crew management systems</span>
- <span style="color:blue">Passenger-facing applications, dashboards, or user interfaces</span>
- <span style="color:blue">Production deployment or real-time decision support systems</span>

**Example Out of Scope Items:**
- Real-time video processing (images only)
- Integration with farm management software
- Support for languages other than English
- IoT sensor integration

### 4.4 Scope Validation Checklist

Answer honestly by changing `[ ]` to `[x]` for items that apply:

- [X] Can I complete the core features in 10-12 weeks of part-time work?
- [X] Have I built something of similar complexity before, OR have I allocated time to learn?
- [X] If I removed all extended features, would I still have a meaningful project?
- [X] Is this project substantially different from a homework assignment or tutorial?
- [X] Does this project require me to stretch my skills without being overwhelming?

**If you checked fewer than 4 boxes, reconsider your scope.**

---

## Section 5: Data Strategy

### 5.1 Data Requirements

**What data do you need?**

| Data Type | Description | Volume Needed | Format |
|-----------|-------------|---------------|--------|
| <span style="color:blue">Flight Operations Data</span> | <span style="color:blue">Historical U.S. domestic flight records including scheduled and actual arrival times</span> | <span style="color:blue">1–3 years of flights (millions of records available; subset used)</span> | <span style="color:blue">CSV</span> |
| <span style="color:blue">Pre-Departure Flight Features</span> | <span style="color:blue">Airline, origin airport, destination airport, scheduled departure time, scheduled duration, and distance</span> | <span style="color:blue">Same as flight records</span> | <span style="color:blue">CSV</span> |
| <span style="color:blue">Calendar Features</span> | <span style="color:blue">Month, day of month, and day of week derived from flight date</span> | <span style="color:blue">Same as flight records</span> | <span style="color:blue">Derived fields (CSV)</span> |
| <span style="color:blue">Delay Outcome Label</span> | <span style="color:blue">Binary indicator of arrival delay (≥ 15 minutes)</span> | <span style="color:blue">One label per flight</span> | <span style="color:blue">Derived fields (CSV)</span> |

**Notes on data sourcing**

All required data can be obtained from public datasets derived from the U.S. Department of Transportation Bureau of Transportation Statistics (BTS) on-time performance data.

Only historical, publicly available data is used.

No real-time or proprietary airline data is required.
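
The calendar features listed in the table are derived fields, so they can be computed from the flight date rather than downloaded. A minimal sketch follows, assuming the date is stored in a column named `FL_DATE`; that name and the output column names are assumptions, not confirmed fields of the dataset.

```python
import pandas as pd


def add_calendar_features(flights, date_col="FL_DATE"):
    """Return a copy of `flights` with month, day-of-month, and day-of-week columns.

    `date_col` is an assumed column name containing the flight date.
    """
    out = flights.copy()
    dates = pd.to_datetime(out[date_col])
    out["MONTH"] = dates.dt.month
    out["DAY_OF_MONTH"] = dates.dt.day
    # pandas numbers Monday as 0; shift so that 1 = Monday ... 7 = Sunday.
    out["DAY_OF_WEEK"] = dates.dt.dayofweek + 1
    return out


# 2024-01-01 was a Monday and 2024-07-04 was a Thursday.
sample = pd.DataFrame({"FL_DATE": ["2024-01-01", "2024-07-04"]})
calendar = add_calendar_features(sample)
print(calendar)
```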

**Example:**
| Data Type | Description | Volume Needed | Format |
|-----------|-------------|---------------|--------|
| Plant Images | Photos of healthy and diseased plants | 10,000+ images | JPEG/PNG |
| Disease Labels | Classification of disease type | Labels for all images | CSV/JSON |
| Treatment Info | Recommended treatments per disease | 1 entry per disease type | Text/JSON |

### 5.2 Data Sources

For each data source, complete the following:

#### Data Source 1

- **Name/URL:** <span style="color:blue">U.S. Department of Transportation – Bureau of Transportation Statistics (BTS) On-Time Performance Data
https://www.transtats.bts.gov/ONTIME/</span>
- **Type:** <span style="color:blue">Public Dataset</span>
- **Access Method:** <span style="color:blue">Downloaded as CSV files from the BTS TranStats website or accessed via BTS-derived datasets mirrored on Kaggle</span>
- **License/Terms:** <span style="color:blue">Publicly available U.S. government data; no usage restrictions for academic or non-commercial use</span>
- **Quality Assessment:** <span style="color:blue">High-quality, standardized, and widely used in academic research and industry analysis. Data is generally reliable but may contain missing values, canceled flights, or reporting inconsistencies that require preprocessing.</span>
- **Volume Available:** <span style="color:blue">Millions of U.S. domestic flight records per year, covering multiple years and all major airports and carriers</span>

#### Data Source 2

- **Name/URL:** <span style="color:blue">Flight Delay Prediction Datasets on Kaggle https://www.kaggle.com/code/rahulladhani/predicting-flight-delays-using-machine-learning/input</span>
- **Type:** <span style="color:blue">Public Dataset</span>
- **Access Method:** <span style="color:blue">Downloaded directly from Kaggle as CSV files and loaded into a Jupyter/Colab notebook</span>
- **License/Terms:** <span style="color:blue">Dataset-specific licenses; generally permitted for educational and research use as stated on Kaggle dataset pages. The notebook page states that it is released under the Apache 2.0 open source license.</span>
- **Quality Assessment:** <span style="color:blue">Preprocessed and cleaned versions of BTS data suitable for modeling. Some datasets may simplify or omit fields, so feature selection must align with available columns.</span>
- **Volume Available:** <span style="color:blue">Typically 1–5 years of flight records, ranging from hundreds of thousands to several million rows depending on the dataset (~3M rows in this dataset)</span>

*(Add more sources as needed)*

### 5.3 Data Collection Plan

If you need to collect, generate, or augment data, describe your plan:

**Data Collection Methodology:**

<span style="color:blue">This project will use existing public datasets rather than collecting new data. Historical U.S. domestic flight data will be obtained from the U.S. Department of Transportation Bureau of Transportation Statistics (BTS) or BTS-derived datasets available on Kaggle. Data will be downloaded as CSV files and loaded into a Jupyter/Google Colab notebook using Python libraries such as pandas and numpy. The timeline for data collection is minimal, as data access is immediate; most effort will be spent on filtering records, selecting pre-departure features, handling missing values, and creating the delay label.</span>
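
The filtering and feature-selection step described above can be sketched as follows. This is an illustration under stated assumptions: every column name (`AIRLINE`, `ORIGIN`, `DEST`, `CRS_DEP_TIME`, `CRS_ELAPSED_TIME`, `DISTANCE`, `DEP_DELAY`) is assumed rather than taken from the actual file, the scheduled departure time is assumed to be an `hhmm` integer, and dropping rows with missing values is only one possible way to handle them.

```python
import pandas as pd

# Assumed names for columns that are known before the flight departs.
PRE_DEPARTURE_COLUMNS = [
    "AIRLINE", "ORIGIN", "DEST",
    "CRS_DEP_TIME", "CRS_ELAPSED_TIME", "DISTANCE",
]


def select_pre_departure(flights):
    """Keep only pre-departure columns, drop incomplete rows, add DEP_HOUR."""
    features = flights[PRE_DEPARTURE_COLUMNS].dropna().copy()
    # Assumes scheduled departure is stored as hhmm (e.g. 1430 -> hour 14).
    features["DEP_HOUR"] = (features["CRS_DEP_TIME"] // 100).astype(int)
    return features.drop(columns=["CRS_DEP_TIME"])


flights = pd.DataFrame({
    "AIRLINE": ["AA", "DL", "UA"],
    "ORIGIN": ["JFK", "ATL", "ORD"],
    "DEST": ["LAX", "SEA", "DEN"],
    "CRS_DEP_TIME": [1430, 605, 2359],
    "CRS_ELAPSED_TIME": [360.0, 310.0, 150.0],
    "DISTANCE": [2475.0, None, 888.0],   # one missing value
    "DEP_DELAY": [12.0, -3.0, 40.0],     # known only after departure
})
features = select_pre_departure(flights)
print(features)
```

Selecting columns from an explicit allow-list, rather than dropping known post-departure columns, keeps fields such as the actual departure delay from leaking into the model by accident.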

**Data Augmentation Strategy (if applicable):**

<span style="color:blue">No data augmentation techniques will be used. The dataset is large enough to support model training without synthetic data generation. The focus will be on feature engineering using existing pre-departure information rather than augmenting or artificially expanding the dataset.</span>

### 5.4 Data Validation Checklist

Answer honestly by changing `[ ]` to `[x]` for items that apply:

- [X] I have verified that my primary data source is accessible right now
- [X] I have reviewed the license and confirmed I can use this data for my project
- [X] I have examined sample data and understand its structure
- [X] I have a backup plan if my primary data source becomes unavailable
- [X] My data volume is sufficient for training a meaningful model
- [X] I have considered and documented any data quality issues

**Backup Data Plan:**

<span style="color:blue">If the primary data source from the U.S. Department of Transportation Bureau of Transportation Statistics (BTS) becomes unavailable, I will use previously downloaded local copies or BTS-derived datasets hosted on Kaggle. These datasets mirror the same underlying structure and fields needed for the project and are sufficient to complete the modeling and evaluation. If necessary, the project scope can be limited to a smaller date range or subset of airports to ensure timely completion.</span>

---

## Section 6: Technical Approach

### 6.1 High-Level Architecture

Describe the overall technical architecture of your system:

1. **Input:** <span style="color:blue">[What does your system take as input?]</span>
2. **Processing:** <span style="color:blue">[What happens to the input?]</span>
3. **Model:** <span style="color:blue">[What type of model(s) will you use?]</span>
4. **Output:** <span style="color:blue">[What does your system produce?]</span>
5. **Interface:** <span style="color:blue">[How do users interact with the system?]</span>

**Architecture Diagram:**

```
[Draw or describe your system architecture here]
```

### 6.2 Model Selection

**Primary Model Architecture:**

<span style="color:blue">[Describe the model type you plan to use and why]</span>

**Baseline Model:**

<span style="color:blue">[What simple model will you compare against? Every project needs a baseline.]</span>

**Alternative Models to Explore:**

| Model | Why Consider It | When to Try It |
|-------|-----------------|----------------|
| <span style="color:blue">[Model 1]</span> | <span style="color:blue">[Reason]</span> | <span style="color:blue">[Condition]</span> |
| <span style="color:blue">[Model 2]</span> | <span style="color:blue">[Reason]</span> | <span style="color:blue">[Condition]</span> |

**Example Model Selection (Image Classification):**
- **Primary:** EfficientNet-B0 (good accuracy/efficiency tradeoff, transfer learning available)
- **Baseline:** Logistic regression on image features (establishes minimum performance)
- **Alternatives:** ResNet50 (if EfficientNet underperforms), Vision Transformer (if more data becomes available)

### 6.3 Training Strategy (if applicable)

**Training Approach:** <span style="color:blue">[Select: Train from scratch / Transfer learning / Fine-tuning / Few-shot/Zero-shot / Other]</span>

**Justification:**

<span style="color:blue">[Why this approach?]</span>

**Hyperparameter Tuning Plan:**

<span style="color:blue">[How will you search for optimal hyperparameters? Grid search, random search, Bayesian optimization, etc.]</span>

### 6.4 Technology Stack

Complete the following table with your planned tools. You don't have to use the items listed below; they are provided only to give you ideas for what to include:

| Component | Technology | Justification |
|-----------|------------|---------------|
| Programming Language | <span style="color:blue">[e.g., Python 3.10]</span> | <span style="color:blue">[Why?]</span> |
| ML Framework | <span style="color:blue">[e.g., PyTorch, TensorFlow]</span> | <span style="color:blue">[Why?]</span> |
| Data Processing | <span style="color:blue">[e.g., Pandas, NumPy]</span> | <span style="color:blue">[Why?]</span> |
| Experiment Tracking | <span style="color:blue">[e.g., MLflow, W&B]</span> | <span style="color:blue">[Why?]</span> |
| Model Registry | <span style="color:blue">[e.g., MLflow]</span> | <span style="color:blue">[Why?]</span> |
| Version Control | <span style="color:blue">[e.g., Git + GitHub]</span> | <span style="color:blue">[Why?]</span> |
| Deployment Platform | <span style="color:blue">[e.g., FastAPI + Docker]</span> | <span style="color:blue">[Why?]</span> |
| Monitoring | <span style="color:blue">[e.g., Prometheus]</span> | <span style="color:blue">[Why?]</span> |
| Testing Framework | <span style="color:blue">[e.g., pytest]</span> | <span style="color:blue">[Why?]</span> |
| Documentation | <span style="color:blue">[e.g., Sphinx]</span> | <span style="color:blue">[Why?]</span> |

---

## Section 7: Evaluation Strategy

### 7.1 Success Metrics

Define **quantitative metrics** for evaluating your project:

| Metric | Target Value | Measurement Method | Priority |
|--------|--------------|-------------------|----------|
| <span style="color:blue">[Metric 1]</span> | <span style="color:blue">[Target]</span> | <span style="color:blue">[Method]</span> | <span style="color:blue">[Must-have/Should-have/Nice-to-have]</span> |
| <span style="color:blue">[Metric 2]</span> | <span style="color:blue">[Target]</span> | <span style="color:blue">[Method]</span> | <span style="color:blue">[Priority]</span> |
| <span style="color:blue">[Metric 3]</span> | <span style="color:blue">[Target]</span> | <span style="color:blue">[Method]</span> | <span style="color:blue">[Priority]</span> |

**Example Metrics:**
| Metric | Target Value | Measurement Method | Priority |
|--------|--------------|-------------------|----------|
| Classification Accuracy | >85% | Test set evaluation | Must-have |
| F1 Score (macro) | >0.80 | Test set evaluation | Must-have |
| Inference Latency | <500ms | End-to-end timing | Should-have |
| Model Size | <100MB | File size measurement | Nice-to-have |
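
For reference, the macro F1 score in the example table is the unweighted mean of the per-class F1 scores. The sketch below computes it in plain Python on a small hand-made example; the function names are introduced here for illustration, and a library implementation would normally be used in the actual project.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    # Convention used in this sketch: F1 is 0 when the class never appears.
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# 1 = delayed, 0 = on time (hand-made example, not real results).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, macro F1={score:.3f}")
```

In this example the delayed class has an F1 of 0.5 and the on-time class 0.75, giving a macro F1 of 0.625 even though accuracy is about 0.667. The gap shows why macro F1 is informative when one class is much rarer than the other.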

### 7.2 Evaluation Methodology

**Train/Validation/Test Split:**

<span style="color:blue">[Describe your data splitting strategy and rationale. Example: 70% train, 15% validation, 15% test, stratified by class]</span>
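
The 70/15/15 stratified split mentioned in the template example can be sketched with pandas alone. This is one possible approach under assumptions, not a chosen strategy for this project: the label column name `IS_DELAYED` and the helper `stratified_split` are hypothetical, and the DataFrame is assumed to have a unique index.

```python
import pandas as pd


def stratified_split(df, label_col, seed=42):
    """Split into roughly 70% train, 15% validation, 15% test per class."""
    # Sample 70% of each class for training.
    train = df.groupby(label_col).sample(frac=0.70, random_state=seed)
    remainder = df.drop(train.index)
    # Split what is left in half, again within each class.
    val = remainder.groupby(label_col).sample(frac=0.50, random_state=seed)
    test = remainder.drop(val.index)
    return train, val, test


# Toy data: 20 delayed and 80 on-time flights.
toy = pd.DataFrame({"IS_DELAYED": [1] * 20 + [0] * 80})
train, val, test = stratified_split(toy, "IS_DELAYED")
print(len(train), len(val), len(test))
```

Each subset keeps the same 20% share of delayed flights as the full toy dataset, which is the point of stratifying by class.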

**Cross-Validation Plan:**

<span style="color:blue">[Will you use cross-validation? What kind and why?]</span>

**Evaluation Frequency:**

<span style="color:blue">[How often will you evaluate during training? What triggers a checkpoint?]</span>

### 7.3 Baseline Comparisons

You must compare your model against baselines. List your planned comparisons:

| Baseline | Expected Performance | Purpose |
|----------|---------------------|--------|
| Random/Majority Class | <span style="color:blue">[Expected]</span> | Shows model is learning |
| Simple Model (e.g., Logistic Regression) | <span style="color:blue">[Expected]</span> | Justifies complexity |
| Existing Solution (if any) | <span style="color:blue">[Expected]</span> | Shows improvement |
| Human Performance (if measurable) | <span style="color:blue">[Expected]</span> | Establishes ceiling |
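
The first row of the table, the majority-class baseline, is simple enough to sketch in plain Python. The helper name and the toy labels below are illustrative assumptions, not project results.

```python
from collections import Counter


def majority_class_baseline(y_train, y_test):
    """Predict the most common training label for every test example.

    Returns the majority label and the resulting test accuracy.
    """
    majority_label = Counter(y_train).most_common(1)[0][0]
    predictions = [majority_label] * len(y_test)
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return majority_label, correct / len(y_test)


# Toy labels: 1 = delayed, 0 = on time.
y_train = [0] * 8 + [1] * 2
y_test = [0, 0, 0, 1]

majority_label, baseline_accuracy = majority_class_baseline(y_train, y_test)
print(majority_label, baseline_accuracy)
```

If the roughly 22.6% delay rate cited in Section 3.1 also held in the modeling subset, always predicting "on time" would score about 77% accuracy, so accuracy alone would not show that a model has learned anything about delays.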

### 7.4 Qualitative Evaluation

How will you evaluate aspects that can't be measured numerically?

<span style="color:blue">[Describe your plan for user testing, expert review, or other qualitative evaluation]</span>

---

## Section 8: Deployment Strategy

At a high level, please detail how you plan to deploy your project. You don't need to know the details of how to do this yet, but it's important to think about the end state of how users will interact with your system once it's completed. This can affect many other components of your project, including your requirements, system architecture, and timeline. For example:

**Deployment Target:** <span style="color:blue">[REST API / Web Application / Mobile App / Desktop App / Batch Pipeline / Serverless / Other]</span>

**Deployment Platform:**

<span style="color:blue">[Where will you deploy? Local, Managed Cloud Platforms, Hosted Inference Services, Serverless, High-Performance Inference Servers, Simple App Frameworks (e.g. FastAPI)]</span>

---

## Section 9: Timeline & Milestones

### 9.1 Project Phases

Break your project into phases aligned with the semester schedule:

| Phase | Week(s) | Deliverables | Checkpoint |
|-------|---------|--------------|------------|
| 1: Setup & Data | 1-2 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |
| 2: Baseline & EDA | 3-4 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |
| 3: Model Development | 5-7 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |
| 4: Iteration & Improvement | 8-10 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |
| 5: Deployment & Polish | 11-13 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |
| 6: Documentation & Presentation | 14-15 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |

**Example Phase Breakdown:**

| Phase | Week(s) | Deliverables | Checkpoint |
|-------|---------|--------------|------------|
| 1: Setup & Data | 1-2 | Repo setup, data acquired, EDA notebook | Data Review |
| 2: Baseline & EDA | 3-4 | Baseline model, data pipeline, initial metrics | Baseline Review |
| 3: Model Development | 5-7 | Primary model trained, experiment tracking in place | Midterm Review |
| 4: Iteration & Improvement | 8-10 | Model improvements, hyperparameter tuning complete | Progress Check |
| 5: Deployment & Polish | 11-13 | API deployed, monitoring set up, tests passing | Deployment Review |
| 6: Documentation & Presentation | 14-15 | Final report, presentation, code cleanup | Final Submission |

### 9.2 Weekly Goals

For the first four weeks, define specific goals:

**Week 1:**
- [ ] <span style="color:blue">[Goal 1]</span>
- [ ] <span style="color:blue">[Goal 2]</span>
- [ ] <span style="color:blue">[Goal 3]</span>

**Week 2:**
- [ ] <span style="color:blue">[Goal 1]</span>
- [ ] <span style="color:blue">[Goal 2]</span>
- [ ] <span style="color:blue">[Goal 3]</span>

**Week 3:**
- [ ] <span style="color:blue">[Goal 1]</span>
- [ ] <span style="color:blue">[Goal 2]</span>
- [ ] <span style="color:blue">[Goal 3]</span>

**Week 4:**
- [ ] <span style="color:blue">[Goal 1]</span>
- [ ] <span style="color:blue">[Goal 2]</span>
- [ ] <span style="color:blue">[Goal 3]</span>

### 9.3 Checkpoint Submissions

List what you will submit at each checkpoint that you have identified:

| Checkpoint | Due Date | Deliverables |
|------------|----------|-------------|
| Project Proposal (this document) | <span style="color:blue">[Date]</span> | This completed notebook |
| Data & Baseline Review | <span style="color:blue">[Date]</span> | <span style="color:blue">[Deliverables]</span> |
| Midterm Progress Report | <span style="color:blue">[Date]</span> | <span style="color:blue">[Deliverables]</span> |
| Deployment Demo | <span style="color:blue">[Date]</span> | <span style="color:blue">[Deliverables]</span> |
| Final Submission | <span style="color:blue">[Date]</span> | <span style="color:blue">[Deliverables]</span> |

---

## Section 10: Risk Assessment

### 10.1 Risk Identification

Identify potential risks and their mitigation strategies:

| Risk | Likelihood (L/M/H) | Impact (L/M/H) | Mitigation Strategy | Contingency Plan |
|------|-------------------|----------------|--------------------|-----------------|
| <span style="color:blue">[Risk 1]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[Mitigation]</span> | <span style="color:blue">[Contingency]</span> |
| <span style="color:blue">[Risk 2]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[Mitigation]</span> | <span style="color:blue">[Contingency]</span> |
| <span style="color:blue">[Risk 3]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[Mitigation]</span> | <span style="color:blue">[Contingency]</span> |
| <span style="color:blue">[Risk 4]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[Mitigation]</span> | <span style="color:blue">[Contingency]</span> |
| <span style="color:blue">[Risk 5]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[Mitigation]</span> | <span style="color:blue">[Contingency]</span> |

**Common Risks to Consider:**
- Data unavailability or quality issues
- Model underperformance
- Computational resource limitations
- Technical skill gaps
- Time constraints / scope creep
- Dependency/library issues
- Deployment platform issues

**Example Risk Entry:**

| Risk | Likelihood | Impact | Mitigation | Contingency |
|------|------------|--------|------------|-------------|
| Primary dataset becomes unavailable | Low | High | Download and store locally immediately | Use PlantVillage dataset as backup |
| Model accuracy below target | Medium | Medium | Start with proven architecture, iterate | Reduce scope to fewer disease classes |
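
One possible way to prioritize the risks in the table is to map the L/M/H ratings to numbers and rank by likelihood times impact. This is only a sketch: the 1/2/3 mapping and the `risk_score` helper are assumptions for illustration, not a scoring scheme required by this template. The two entries are taken from the example above.

```python
# Assumed numeric mapping for the L/M/H ratings (illustrative only).
LEVELS = {"L": 1, "M": 2, "H": 3}


def risk_score(likelihood, impact):
    """Score a risk as likelihood x impact on the assumed 1-3 scale."""
    return LEVELS[likelihood] * LEVELS[impact]


# (risk, likelihood, impact) taken from the example risk entries.
risks = [
    ("Primary dataset becomes unavailable", "L", "H"),
    ("Model accuracy below target", "M", "M"),
]

# Highest score first, so the most pressing risk is listed at the top.
ranked = sorted(risks, key=lambda r: risk_score(r[1], r[2]), reverse=True)

for name, likelihood, impact in ranked:
    print(f"{risk_score(likelihood, impact)}  {name}")
```

A multiplicative score treats likelihood and impact symmetrically; if a low-likelihood, high-impact risk worries you more, you may prefer to sort by impact first.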

### 10.2 Assumptions

List assumptions you are making that, if wrong, could affect your project:

1. <span style="color:blue">[Assumption 1]</span>
2. <span style="color:blue">[Assumption 2]</span>
3. <span style="color:blue">[Assumption 3]</span>
4. <span style="color:blue">[Assumption 4]</span>
5. <span style="color:blue">[Assumption 5]</span>

**Example Assumptions:**
1. The public dataset labels are accurate and consistent
2. I will have access to a GPU for training (university cluster or Google Colab)
3. Users will have smartphones with decent cameras
4. 10,000 images will be sufficient for training

### 10.3 Dependencies

List external dependencies that could affect your timeline:

| Dependency | Type | Criticality | Alternative |
|------------|------|-------------|-------------|
| <span style="color:blue">[Dependency 1]</span> | <span style="color:blue">[Type]</span> | <span style="color:blue">[Low/Medium/High]</span> | <span style="color:blue">[Alternative]</span> |
| <span style="color:blue">[Dependency 2]</span> | <span style="color:blue">[Type]</span> | <span style="color:blue">[Low/Medium/High]</span> | <span style="color:blue">[Alternative]</span> |
| <span style="color:blue">[Dependency 3]</span> | <span style="color:blue">[Type]</span> | <span style="color:blue">[Low/Medium/High]</span> | <span style="color:blue">[Alternative]</span> |

**Example Dependencies:**

| Dependency | Type | Criticality | Alternative |
|------------|------|-------------|-------------|
| AWS Free Tier | Platform | Medium | Google Cloud, local deployment |
| Pre-trained EfficientNet weights | Model | High | Train from scratch (longer) |
| Plant pathology expert for validation | Human | Low | Rely on dataset labels only |

---

## Appendix A: Project Complexity Guidelines

Your project should fall within the "Appropriate" range:

| Level | Characteristics | Example |
|-------|-----------------|--------|
| **Too Simple** | Tutorial-level, no novel combination, single notebook | Image classifier with dataset from Kaggle |
| **Appropriate** | Real-world problem, multiple components, deployed | Plant disease detector with API, monitoring, CI/CD, and extensive evals |