# AIM 240: AI/ML Capstone Project Initialization Document

**Student Name:**

**Semester/Year:** Spring 2026

---

## Introduction

Welcome to your Capstone Project! This notebook will guide you through the process of defining, scoping, and planning your individual AI/ML project. Since this course is asynchronous, this document serves as our initial communication tool for understanding your vision and providing targeted feedback.

**Please complete ALL sections thoroughly.** Incomplete submissions will be returned for revision before approval.

### How to Use This Notebook

1. Read through the entire notebook first to understand what's expected
2. Complete Section 1 (Project Ideation) with 2-3 potential project ideas
3. Use Section 2 to systematically evaluate and rank your ideas
4. Complete Sections 3-10 for your **top-ranked project only**
5. Export as HTML or PDF and submit via the course LMS by the posted deadline
6. Await instructor feedback before beginning implementation

---

## Section 1: Project Ideation

### 1.1 Brainstorming Your Project Ideas

Before committing to a single project, explore multiple possibilities. List **2-3 distinct project ideas** that interest you. For each idea, provide a brief description (3-5 sentences) covering:

- What problem does it solve?
- Who would benefit from this solution?
- What type of AI/ML would be involved (classification, generation, detection, etc.)?

#### Project Idea A

**Working Title:** <span style="color:blue">[LLM Projection to Image Generator]</span>

**Brief Description:**

<span style="color:blue">[Fine-tune, or place a head on, a model like Qwen VL 8B so that it outputs embedding tokens for an image generator: either the pixel-art diffusion model I have already created or something like Stable Diffusion. I might learn some tricks from Meta's Chameleon model. From discussions with Gemini, research examples have done this with a frozen LLM, which sounds appealing. If I could generate the image output fast enough, this could be the start of a game-like interface. A related avenue is experimenting with feedback loops in which the model outputs an image and then modifies it in a painterly way. Doing this project as a clone of InstructPix2Pix would also be great. By that I mean a text-conditioned diffusion model that takes an input image plus text editing instructions and edits a portion of the input image. I would be willing to construct an even larger dataset, perhaps with more video-game-themed edits or with conditioning on player controller input such as the d-pad. I would also be willing to drop the model's resolution to something more in line with classic video games, and no more than 512x512. Lastly, exploring preference tuning on Pix2Pix or image-generating LLMs would be interesting. Since companies put their models through something like RLHF, what if I applied RLHF to one of these LLM image generators, with a VLM judging the outputs and expressing a preference between images? This would not use a frozen LLM, but there are many fine-tuning and preference-tuning frameworks available, right?]</span>

**Primary ML Domain(s):** <span style="color:blue">[ Computer Vision, NLP, Generative AI, Multi-modal]</span>

GPT-suggested data sources:

| Dataset                     | Size    | Caption Quality | Best For                    |
| --------------------------- | ------- | --------------- | --------------------------- |
| **MS-COCO**                 | 330k    | ⭐⭐⭐⭐⭐ (human)   | Grounded objects, actions   |
| **Flickr30k**               | 31k     | ⭐⭐⭐⭐⭐           | Fast prototyping            |
| **Conceptual Captions 3M**  | 3M      | ⭐⭐⭐⭐            | Medium-scale training       |
| **Conceptual Captions 12M** | 12M     | ⭐⭐⭐             | Larger noisy training       |
| **LAION-400M / 5B**         | 400M–5B | ⭐⭐              | Full-scale diffusion models |
| **WIT (Wikipedia)**         | 37M     | ⭐⭐⭐⭐            | Factual / entity grounding  |
| **WikiArt**                 | 80k     | ⭐⭐⭐ (labels)    | Artistic style learning     |
| **Synthetic (I am comfortable with the concept of data augmentation)**  | Any     | ⭐⭐⭐⭐⭐           | Controllability, structure  |
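
The preference-tuning idea in the description above (a VLM judging pairs of generated images) could start from something like the sketch below. It is only an illustration under stated assumptions: `toy_judge` is a hypothetical stand-in for a real VLM call, the image names are made up, and the `prompt`/`chosen`/`rejected` record layout is one common convention for preference data rather than a requirement of any particular framework.

```python
from itertools import combinations

def build_preference_pairs(prompt, candidates, judge_score, min_margin=0.1):
    """Turn per-image judge scores into (chosen, rejected) preference records.

    Pairs whose scores differ by less than `min_margin` are skipped, since a
    near-tie gives the preference tuner very little signal.
    """
    scored = [(judge_score(prompt, c), c) for c in candidates]
    pairs = []
    for (s1, c1), (s2, c2) in combinations(scored, 2):
        if abs(s1 - s2) < min_margin:
            continue  # judge has no clear preference
        chosen, rejected = (c1, c2) if s1 > s2 else (c2, c1)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs

# Hypothetical judge: fixed scores standing in for a real VLM call.
def toy_judge(prompt, image_id):
    return {"img_a.png": 0.9, "img_b.png": 0.4, "img_c.png": 0.45}[image_id]

pairs = build_preference_pairs(
    "a knight sprite, 64x64 pixel art",
    ["img_a.png", "img_b.png", "img_c.png"],
    toy_judge,
)
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

The margin threshold is a design choice: a noisy judge tends to flip on near-ties, so dropping them trades dataset size for cleaner labels.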




#### Project Idea B

**Working Title:** <span style="color:blue">[Godot or Pygame Specialized Dataset: Vetted Code]</span>

**Brief Description:**

<span style="color:blue">[Create a dataset of Godot code, using a mixture of local and commercial models to generate and vet the code. Generation would benefit from following a curriculum, which I would have to establish. For example, I might use the cleaned Alpaca dataset and an English-to-French dataset to add variety to the generation prompts, and ask the model to write a coding example for, say, every method available on each node in the Godot game engine. I could also pick particular design patterns from the Gang of Four book and implement those systems in Godot. Perhaps a RAG system could retrieve other working samples: the model would still proceed by trial and error at first, but the next time it uses a given function it would see a proper example and use that instead. In this training loop, the LLM generating the data benefits from prior experience only through the RAG interface, which presents relevant examples. Eventually the whole corpus could be used to fine-tune an LLM, although maybe it should always remain a RAG system. That is what benchmarks are for, right? While commercial models know some GDScript, local ones certainly need help. It should be possible to make them do better because I feel GDScript is underrepresented, but I can't prove this yet.]</span>

**Primary ML Domain(s):** <span style="color:blue">[NLP, Other]</span>
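
The retrieval step described above (showing the generating model previously vetted examples) could be prototyped with something as simple as token overlap before reaching for an embedding model. The sketch below is an assumption-laden illustration: the `vetted_samples` entries, their `id` values, and the placeholder `code` strings are all hypothetical, and Jaccard similarity is swapped in for whatever retriever the final system would use.

```python
import re

def tokenize(text):
    """Lowercase and split into identifier-like tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def retrieve(query, corpus, k=2):
    """Return up to k corpus entries ranked by Jaccard overlap with the query."""
    q = tokenize(query)
    scored = []
    for doc in corpus:
        d = tokenize(doc["description"])
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

# Hypothetical vetted corpus; the code fields are placeholders, not real GDScript.
vetted_samples = [
    {"id": "move_body", "description": "move a CharacterBody2D with move_and_slide", "code": "<vetted snippet>"},
    {"id": "play_sound", "description": "play a sound with AudioStreamPlayer", "code": "<vetted snippet>"},
    {"id": "button_signal", "description": "connect a button pressed signal", "code": "<vetted snippet>"},
]

hits = retrieve("how do I move a CharacterBody2D", vetted_samples, k=1)
print([h["id"] for h in hits])
```

Retrieved snippets would then be prepended to the generation prompt, so a benchmark could compare the same model with and without retrieval.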

#### Project Idea C (Optional but Recommended)

**Working Title:** <span style="color:blue">[Synth-Speaking and/or Singing Moshi Full-Duplex Voice Assistant]</span>

**Brief Description:**

<span style="color:blue">[This could help people who want to interact with LLMs in a different way. Because the model is full-duplex, it might be able to interact with music in a harmonically and perhaps even rhythmically aware way. I would explore using embellishment-technique markup to denote special vocal articulations, something like this list:
https://imgv2-2-f.scribdassets.com/img/document/380229964/original/7aeedd745b/1588863549?v=1
I would also explore audio generation, which was removed from ChatGPT's Advanced Voice Mode.]</span>

**Primary ML Domain(s):** <span style="color:blue">[Select all that apply: Computer Vision, NLP, Tabular/Structured Data, Time Series/Forecasting, Recommender Systems, Reinforcement Learning, Generative AI, Multi-modal, Other]</span>

### Additional Notes

Make a dataset to fine-tune the Moshi model, and get it working on my local machine or serve it online. The dataset should include pitch and scale annotations.

Since Moshi models and responds to a user's audio stream in a duplex manner, I hope it could recognize a scale and try to play something back.

Make samples of monophonic synths moving up and down scales, perhaps using different timbres. To fine-tune, I would need to rent compute, which would be a good lesson.

The dataset I found was not terribly large. I could augment it by transposing the samples and adjusting the labels accordingly. I could also look for different datasets, but multiple singers might present an issue.

https://zenodo.org/records/1442513?utm_source=chatgpt.com
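
The transposition augmentation mentioned above comes down to shifting every pitch label by the same number of semitones that the audio is shifted. Here is a minimal sketch of the label side only, assuming notes are annotated as MIDI numbers with A4 = 440 Hz and middle C written as C4; the dictionary layout (`midi`, `start`, `dur`) is a hypothetical annotation format, not the format of the dataset linked above.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_hz(midi_note):
    """Equal-temperament frequency, with MIDI note 69 (A4) at 440 Hz."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)

def midi_to_name(midi_note):
    """Note name with octave, using the convention that MIDI 60 is C4."""
    return f"{NOTE_NAMES[midi_note % 12]}{midi_note // 12 - 1}"

def pitch_ratio(semitones):
    """Frequency ratio the audio must be shifted by to match the new labels."""
    return 2 ** (semitones / 12)

def transpose_labels(notes, semitones):
    """Shift every labelled note; onsets and durations are unchanged."""
    return [{**n, "midi": n["midi"] + semitones} for n in notes]

# One octave of C major, half a second per note (hypothetical annotation layout).
c_major = [
    {"midi": m, "start": i * 0.5, "dur": 0.5}
    for i, m in enumerate([60, 62, 64, 65, 67, 69, 71, 72])
]
d_major = transpose_labels(c_major, 2)

print([midi_to_name(n["midi"]) for n in d_major])
print(round(pitch_ratio(2), 4))
```

The same `midi_to_hz` helper could drive a simple oscillator to generate the monophonic synth scales, so the labels and the synthetic audio come from one source of truth.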

---

## Section 2: Project Ranking & Selection

### 2.1 Evaluation Criteria Matrix

Rate each of your project ideas on a scale of **1-5** for each criterion below. Be honest in your assessment - this exercise helps you avoid projects that may become problematic later.

**Rating Scale:**
- 1 = Very Poor / Major Concerns
- 2 = Below Average / Some Concerns
- 3 = Adequate / Neutral
- 4 = Good / Minor Concerns
- 5 = Excellent / No Concerns

```python
# ============================================================
# PROJECT RANKING CALCULATOR
# Fill in your scores below (1-5 for each criterion)
# ============================================================

import pandas as pd

# Define the evaluation criteria
criteria = [
    "Personal Interest",      # How excited are you about this project?
    "Data Availability",      # Is appropriate data publicly available?
    "Technical Feasibility",  # Can this be accomplished with your current skills + learning?
    "Scope Appropriateness",  # Is this achievable in one semester?
    "Novelty/Learning Value", # Will you learn new skills beyond what you know?
    "Portfolio Value",        # Will this impress future employers?
    "Ethical Clarity",        # Are there minimal ethical concerns?
    "MLOps Applicability"     # Can you apply MLOps concepts?
]

# ============================================================
# ENTER YOUR SCORES HERE (1-5 for each)
# ============================================================

# Project Idea A scores
idea_a_scores = {
    "Personal Interest": 0,       # <-- Enter score 1-5
    "Data Availability": 0,       # <-- Enter score 1-5
    "Technical Feasibility": 0,   # <-- Enter score 1-5
    "Scope Appropriateness": 0,   # <-- Enter score 1-5
    "Novelty/Learning Value": 0,  # <-- Enter score 1-5
    "Portfolio Value": 0,         # <-- Enter score 1-5
    "Ethical Clarity": 0,         # <-- Enter score 1-5
    "MLOps Applicability": 0      # <-- Enter score 1-5
}

# Project Idea B scores
idea_b_scores = {
    "Personal Interest": 0,       # <-- Enter score 1-5
    "Data Availability": 0,       # <-- Enter score 1-5
    "Technical Feasibility": 0,   # <-- Enter score 1-5
    "Scope Appropriateness": 0,   # <-- Enter score 1-5
    "Novelty/Learning Value": 0,  # <-- Enter score 1-5
    "Portfolio Value": 0,         # <-- Enter score 1-5
    "Ethical Clarity": 0,         # <-- Enter score 1-5
    "MLOps Applicability": 0      # <-- Enter score 1-5
}

# Project Idea C scores (set to 0 if not using)
idea_c_scores = {
    "Personal Interest": 0,       # <-- Enter score 1-5 (or leave as 0)
    "Data Availability": 0,       # <-- Enter score 1-5 (or leave as 0)
    "Technical Feasibility": 0,   # <-- Enter score 1-5 (or leave as 0)
    "Scope Appropriateness": 0,   # <-- Enter score 1-5 (or leave as 0)
    "Novelty/Learning Value": 0,  # <-- Enter score 1-5 (or leave as 0)
    "Portfolio Value": 0,         # <-- Enter score 1-5 (or leave as 0)
    "Ethical Clarity": 0,         # <-- Enter score 1-5 (or leave as 0)
    "MLOps Applicability": 0      # <-- Enter score 1-5 (or leave as 0)
}

# One row per criterion, one column per idea
df_scores = pd.DataFrame({
    "Criterion": criteria,
    "Idea A": [idea_a_scores[c] for c in criteria],
    "Idea B": [idea_b_scores[c] for c in criteria],
    "Idea C": [idea_c_scores[c] for c in criteria]
})

# Summary row with the total for each idea
totals = pd.DataFrame({
    "Criterion": ["TOTAL SCORE"],
    "Idea A": [sum(idea_a_scores.values())],
    "Idea B": [sum(idea_b_scores.values())],
    "Idea C": [sum(idea_c_scores.values())]
})

df_display = pd.concat([df_scores, totals], ignore_index=True)

# Highest total an idea can reach: 5 points on every criterion
max_score = 5 * len(criteria)

print("=" * 60)
print("PROJECT EVALUATION MATRIX")
print("=" * 60)
print(df_display.to_string(index=False))
print("\n" + "=" * 60)
print(f"Maximum possible score: {max_score}")
print("=" * 60)
```
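
Once scores are entered, the totals can be turned into a draft ranking for Section 2.2. The sketch below is one way to do that; `rank_ideas` is a hypothetical helper, and the scores shown are made-up placeholders (with only two criteria for brevity), not recommendations.

```python
def rank_ideas(score_sheets):
    """Return (idea, total) pairs sorted best-first.

    Ideas whose scores are all 0 are treated as "not used" and skipped.
    """
    totals = {
        name: sum(scores.values())
        for name, scores in score_sheets.items()
        if any(scores.values())
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Placeholder scores for illustration only.
example_sheets = {
    "Idea A": {"Personal Interest": 5, "Data Availability": 3},
    "Idea B": {"Personal Interest": 4, "Data Availability": 5},
    "Idea C": {"Personal Interest": 0, "Data Availability": 0},
}

ranking = rank_ideas(example_sheets)
for position, (name, total) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {total}")
```

Treat the numeric ranking as a starting point; the justification in Section 2.2 should still weigh qualitative factors.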

### 2.2 Final Ranking & Justification

**Your Final Ranking:**

1. **First Choice:** <span style="color:blue">[Enter project title]</span>
2. **Second Choice:** <span style="color:blue">[Enter project title]</span>
3. **Third Choice:** <span style="color:blue">[Enter project title or N/A]</span>

**Justification for Your Top Choice (3-5 sentences):**

<span style="color:blue">[Explain why your first choice is the best fit for you. Consider not just the scores, but any qualitative factors that influenced your decision.]</span>

**What would need to change for you to switch to your second choice?**

<span style="color:blue">[Your response here - this helps identify risk factors]</span>

---

## Section 3: Problem Statement

*Complete this section and all following sections for your TOP-RANKED PROJECT ONLY.*

### 3.1 Problem Definition

Write a clear, concise problem statement below.

#### Example Problem Statements:

**Good Example:**
> Small-scale farmers in developing regions currently struggle with identifying crop diseases early because they lack access to agricultural experts. This results in significant crop losses (estimated 20-40% annually) and reduced income. An AI/ML solution could enable early disease detection and treatment recommendations by analyzing smartphone photos of affected plants.

**Poor Example:**
> I want to make a plant disease detector because it would be cool and use computer vision.

*The poor example lacks specificity about users, problems, consequences, and approach.*

---

**Your Problem Statement:**

<span style="color:blue">[Write your problem statement here, modeled on the good example above]</span>

### 3.2 Existing Solutions

Research and document current solutions to this problem:

| Solution | Type | Strengths | Weaknesses | Why Yours Will Be Different |
|----------|------|-----------|------------|-----------------------------|
| <span style="color:blue">[Solution 1]</span> | <span style="color:blue">[Type]</span> | <span style="color:blue">[Strengths]</span> | <span style="color:blue">[Weaknesses]</span> | <span style="color:blue">[Differentiation]</span> |
| <span style="color:blue">[Solution 2]</span> | <span style="color:blue">[Type]</span> | <span style="color:blue">[Strengths]</span> | <span style="color:blue">[Weaknesses]</span> | <span style="color:blue">[Differentiation]</span> |
| <span style="color:blue">[Solution 3]</span> | <span style="color:blue">[Type]</span> | <span style="color:blue">[Strengths]</span> | <span style="color:blue">[Weaknesses]</span> | <span style="color:blue">[Differentiation]</span> |

**If no solutions currently exist, explain why:**

<span style="color:blue">[Your explanation]</span>

---

## Section 4: Project Scope Definition

### 4.1 Core Features (Must-Have)

List the **essential features** your project must have to be considered minimally viable. These are non-negotiable deliverables. Aim for 3-5 core features.

| # | Feature | Description | Success Looks Like |
|---|---------|-------------|-------------------|
| 1 | <span style="color:blue">[Feature]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[Success criteria]</span> |
| 2 | <span style="color:blue">[Feature]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[Success criteria]</span> |
| 3 | <span style="color:blue">[Feature]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[Success criteria]</span> |
| 4 | <span style="color:blue">[Feature]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[Success criteria]</span> |
| 5 | <span style="color:blue">[Feature]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[Success criteria]</span> |

**Example Core Features (Plant Disease Detection):**

| # | Feature | Description | Success Looks Like |
|---|---------|-------------|-------------------|
| 1 | Image Classification | Model classifies plant images into healthy/diseased categories | >85% accuracy on test set |
| 2 | Multi-disease Detection | System identifies at least 5 common diseases per crop type | Correct disease identification in 80% of cases |
| 3 | Web Interface | Users can upload images via simple web form | Interface loads in <3 seconds, works on mobile |

### 4.2 Extended Features (Nice-to-Have)

List features you would add if time permits, in priority order:

| Priority | Feature | Description | Dependencies |
|----------|---------|-------------|--------------|
| 1 | <span style="color:blue">[Feature]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[What must be done first]</span> |
| 2 | <span style="color:blue">[Feature]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[What must be done first]</span> |
| 3 | <span style="color:blue">[Feature]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[What must be done first]</span> |

### 4.3 Out of Scope

Explicitly state what your project will NOT include. This prevents scope creep and sets clear expectations.

**This project will NOT include:**
- <span style="color:blue">[Item 1]</span>
- <span style="color:blue">[Item 2]</span>
- <span style="color:blue">[Item 3]</span>
- <span style="color:blue">[Item 4]</span>

**Example Out of Scope Items:**
- Real-time video processing (images only)
- Integration with farm management software
- Support for languages other than English
- IoT sensor integration

### 4.4 Scope Validation Checklist

Answer honestly by changing `[ ]` to `[x]` for items that apply:

- [ ] Can I complete the core features in 10-12 weeks of part-time work?
- [ ] Have I built something of similar complexity before, OR have I allocated time to learn?
- [ ] If I removed all extended features, would I still have a meaningful project?
- [ ] Is this project substantially different from a homework assignment or tutorial?
- [ ] Does this project require me to stretch my skills without being overwhelming?

**If you checked fewer than 4 boxes, reconsider your scope.**

---

## Section 5: Data Strategy

### 5.1 Data Requirements

**What data do you need?**

| Data Type | Description | Volume Needed | Format |
|-----------|-------------|---------------|--------|
| <span style="color:blue">[Type]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[Volume]</span> | <span style="color:blue">[Format]</span> |
| <span style="color:blue">[Type]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[Volume]</span> | <span style="color:blue">[Format]</span> |
| <span style="color:blue">[Type]</span> | <span style="color:blue">[Description]</span> | <span style="color:blue">[Volume]</span> | <span style="color:blue">[Format]</span> |

**Example:**

| Data Type | Description | Volume Needed | Format |
|-----------|-------------|---------------|--------|
| Plant Images | Photos of healthy and diseased plants | 10,000+ images | JPEG/PNG |
| Disease Labels | Classification of disease type | Labels for all images | CSV/JSON |
| Treatment Info | Recommended treatments per disease | 1 entry per disease type | Text/JSON |

### 5.2 Data Sources

For each data source, complete the following:

#### Data Source 1

- **Name/URL:** <span style="color:blue">[Enter name and URL]</span>
- **Type:** <span style="color:blue">[Public Dataset / API / Web Scraping / Self-collected / Synthetic / Other]</span>
- **Access Method:** <span style="color:blue">[How will you obtain this data?]</span>
- **License/Terms:** <span style="color:blue">[What are the usage restrictions?]</span>
- **Quality Assessment:** <span style="color:blue">[What is the quality like? Any known issues?]</span>
- **Volume Available:** <span style="color:blue">[How much data is available?]</span>

#### Data Source 2

- **Name/URL:** <span style="color:blue">[Enter name and URL]</span>
- **Type:** <span style="color:blue">[Public Dataset / API / Web Scraping / Self-collected / Synthetic / Other]</span>
- **Access Method:** <span style="color:blue">[How will you obtain this data?]</span>
- **License/Terms:** <span style="color:blue">[What are the usage restrictions?]</span>
- **Quality Assessment:** <span style="color:blue">[What is the quality like? Any known issues?]</span>
- **Volume Available:** <span style="color:blue">[How much data is available?]</span>

*(Add more sources as needed)*

### 5.3 Data Collection Plan

If you need to collect, generate, or augment data, describe your plan:

**Data Collection Methodology:**

<span style="color:blue">[Describe your data collection methodology, timeline, and any tools you'll use]</span>

**Data Augmentation Strategy (if applicable):**

<span style="color:blue">[Describe augmentation techniques you'll use: rotation, flipping, synthetic generation, etc.]</span>

### 5.4 Data Validation Checklist

Answer honestly by changing `[ ]` to `[x]` for items that apply:

- [ ] I have verified that my primary data source is accessible right now
- [ ] I have reviewed the license and confirmed I can use this data for my project
- [ ] I have examined sample data and understand its structure
- [ ] I have a backup plan if my primary data source becomes unavailable
- [ ] My data volume is sufficient for training a meaningful model
- [ ] I have considered and documented any data quality issues

**Backup Data Plan:**

<span style="color:blue">[What will you do if your primary data source fails?]</span>

---

## Section 6: Technical Approach

### 6.1 High-Level Architecture

Describe the overall technical architecture of your system:

1. **Input:** <span style="color:blue">[What does your system take as input?]</span>
2. **Processing:** <span style="color:blue">[What happens to the input?]</span>
3. **Model:** <span style="color:blue">[What type of model(s) will you use?]</span>
4. **Output:** <span style="color:blue">[What does your system produce?]</span>
5. **Interface:** <span style="color:blue">[How do users interact with the system?]</span>

**Architecture Diagram:**

```
[Draw or describe your system architecture here]
```

### 6.2 Model Selection

**Primary Model Architecture:**

<span style="color:blue">[Describe the model type you plan to use and why]</span>

**Baseline Model:**

<span style="color:blue">[What simple model will you compare against? Every project needs a baseline.]</span>

**Alternative Models to Explore:**

| Model | Why Consider It | When to Try It |
|-------|-----------------|----------------|
| <span style="color:blue">[Model 1]</span> | <span style="color:blue">[Reason]</span> | <span style="color:blue">[Condition]</span> |
| <span style="color:blue">[Model 2]</span> | <span style="color:blue">[Reason]</span> | <span style="color:blue">[Condition]</span> |

**Example Model Selection (Image Classification):**
- **Primary:** EfficientNet-B0 (good accuracy/efficiency tradeoff, transfer learning available)
- **Baseline:** Logistic regression on image features (establishes minimum performance)
- **Alternatives:** ResNet50 (if EfficientNet underperforms), Vision Transformer (if more data becomes available)

### 6.3 Training Strategy (if applicable)

**Training Approach:** <span style="color:blue">[Select: Train from scratch / Transfer learning / Fine-tuning / Few-shot/Zero-shot / Other]</span>

**Justification:**

<span style="color:blue">[Why this approach?]</span>

**Hyperparameter Tuning Plan:**

<span style="color:blue">[How will you search for optimal hyperparameters? Grid search, random search, Bayesian optimization, etc.]</span>

### 6.4 Technology Stack

Complete the following table with your planned tools. You don't have to use the items listed below; they are provided only to give you ideas for what to include:

| Component | Technology | Justification |
|-----------|------------|---------------|
| Programming Language | <span style="color:blue">[e.g., Python 3.10]</span> | <span style="color:blue">[Why?]</span> |
| ML Framework | <span style="color:blue">[e.g., PyTorch, TensorFlow]</span> | <span style="color:blue">[Why?]</span> |
| Data Processing | <span style="color:blue">[e.g., Pandas, NumPy]</span> | <span style="color:blue">[Why?]</span> |
| Experiment Tracking | <span style="color:blue">[e.g., MLflow, W&B]</span> | <span style="color:blue">[Why?]</span> |
| Model Registry | <span style="color:blue">[e.g., MLflow]</span> | <span style="color:blue">[Why?]</span> |
| Version Control | <span style="color:blue">[e.g., Git + GitHub]</span> | <span style="color:blue">[Why?]</span> |
| Deployment Platform | <span style="color:blue">[e.g., FastAPI + Docker]</span> | <span style="color:blue">[Why?]</span> |
| Monitoring | <span style="color:blue">[e.g., Prometheus]</span> | <span style="color:blue">[Why?]</span> |
| Testing Framework | <span style="color:blue">[e.g., pytest]</span> | <span style="color:blue">[Why?]</span> |
| Documentation | <span style="color:blue">[e.g., Sphinx]</span> | <span style="color:blue">[Why?]</span> |

---

## Section 7: Evaluation Strategy

### 7.1 Success Metrics

Define **quantitative metrics** for evaluating your project:

| Metric | Target Value | Measurement Method | Priority |
|--------|--------------|-------------------|----------|
| <span style="color:blue">[Metric 1]</span> | <span style="color:blue">[Target]</span> | <span style="color:blue">[Method]</span> | <span style="color:blue">[Must-have/Should-have/Nice-to-have]</span> |
| <span style="color:blue">[Metric 2]</span> | <span style="color:blue">[Target]</span> | <span style="color:blue">[Method]</span> | <span style="color:blue">[Priority]</span> |
| <span style="color:blue">[Metric 3]</span> | <span style="color:blue">[Target]</span> | <span style="color:blue">[Method]</span> | <span style="color:blue">[Priority]</span> |

**Example Metrics:**

| Metric | Target Value | Measurement Method | Priority |
|--------|--------------|-------------------|----------|
| Classification Accuracy | >85% | Test set evaluation | Must-have |
| F1 Score (macro) | >0.80 | Test set evaluation | Must-have |
| Inference Latency | <500ms | End-to-end timing | Should-have |
| Model Size | <100MB | File size measurement | Nice-to-have |
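
For reference, macro F1 (listed in the example metrics above) is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. The sketch below is a dependency-free illustration with made-up labels; in practice a library implementation would normally be used, and `macro_f1` here is a hypothetical helper name.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Made-up labels for illustration.
y_true = ["healthy", "healthy", "rust", "rust"]
y_pred = ["healthy", "rust", "rust", "rust"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}, macro F1 = {macro_f1(y_true, y_pred):.3f}")
```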

### 7.2 Evaluation Methodology

**Train/Validation/Test Split:**

<span style="color:blue">[Describe your data splitting strategy and rationale. Example: 70% train, 15% validation, 15% test, stratified by class]</span>

**Cross-Validation Plan:**

<span style="color:blue">[Will you use cross-validation? What kind and why?]</span>

**Evaluation Frequency:**

<span style="color:blue">[How often will you evaluate during training? What triggers a checkpoint?]</span>
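
As an illustration of the 70/15/15 stratified split given as an example above, the sketch below splits indices class by class so each subset keeps roughly the original class balance. It is a standard-library sketch under stated assumptions: `stratified_split` is a hypothetical helper, the labels are made up, and per-class rounding means the fractions are only approximate for small classes.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists, split separately within each class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Made-up, imbalanced labels for illustration.
labels = ["healthy"] * 60 + ["diseased"] * 40
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Fixing the seed and saving the resulting index lists keeps the test set untouched across experiments.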

### 7.3 Baseline Comparisons

You must compare your model against baselines. List your planned comparisons:

| Baseline | Expected Performance | Purpose |
|----------|---------------------|--------|
| Random/Majority Class | <span style="color:blue">[Expected]</span> | Shows model is learning |
| Simple Model (e.g., Logistic Regression) | <span style="color:blue">[Expected]</span> | Justifies complexity |
| Existing Solution (if any) | <span style="color:blue">[Expected]</span> | Shows improvement |
| Human Performance (if measurable) | <span style="color:blue">[Expected]</span> | Establishes ceiling |
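
The first row of the table, the majority-class baseline, takes only a few lines to compute and is worth recording before any model is trained. A minimal sketch with made-up labels follows; `majority_baseline_accuracy` is a hypothetical helper name.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training class for every test example."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(label == majority for label in test_labels)
    return majority, correct / len(test_labels)

# Made-up labels for illustration.
train_labels = ["healthy"] * 70 + ["diseased"] * 30
test_labels = ["healthy"] * 12 + ["diseased"] * 8

majority_class, baseline_acc = majority_baseline_accuracy(train_labels, test_labels)
print(f"always predict '{majority_class}': accuracy = {baseline_acc:.2f}")
```

Any trained model should clear this number comfortably; on imbalanced data the baseline can be deceptively high, which is one reason to report macro F1 alongside accuracy.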

### 7.4 Qualitative Evaluation

How will you evaluate aspects that can't be measured numerically?

<span style="color:blue">[Describe your plan for user testing, expert review, or other qualitative evaluation]</span>

---

## Section 8: Deployment Strategy

At a high level, describe how you plan to deploy your project. You don't need to know the details of how to do this yet, but it's important to think about the end state: how users will interact with your system once it's completed. This can affect many other components of your project, including your requirements, system architecture, and timeline. For example:

**Deployment Target:** <span style="color:blue">[REST API / Web Application / Mobile App / Desktop App / Batch Pipeline / Serverless / Other]</span>

**Deployment Platform:**

<span style="color:blue">[Where will you deploy? Local, Managed Cloud Platforms, Hosted Inference Services, Serverless, High-Performance Inference Servers, Simple App Frameworks (e.g., FastAPI)]</span>

---

## Section 9: Timeline & Milestones

### 9.1 Project Phases

Break your project into phases aligned with the semester schedule:

| Phase | Week(s) | Deliverables | Checkpoint |
|-------|---------|--------------|------------|
| 1: Setup & Data | 1-2 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |
| 2: Baseline & EDA | 3-4 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |
| 3: Model Development | 5-7 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |
| 4: Iteration & Improvement | 8-10 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |
| 5: Deployment & Polish | 11-13 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |
| 6: Documentation & Presentation | 14-15 | <span style="color:blue">[Deliverables]</span> | <span style="color:blue">[Checkpoint]</span> |

**Example Phase Breakdown:**

| Phase | Week(s) | Deliverables | Checkpoint |
|-------|---------|--------------|------------|
| 1: Setup & Data | 1-2 | Repo setup, data acquired, EDA notebook | Data Review |
| 2: Baseline & EDA | 3-4 | Baseline model, data pipeline, initial metrics | Baseline Review |
| 3: Model Development | 5-7 | Primary model trained, experiment tracking in place | Midterm Review |
| 4: Iteration & Improvement | 8-10 | Model improvements, hyperparameter tuning complete | Progress Check |
| 5: Deployment & Polish | 11-13 | API deployed, monitoring set up, tests passing | Deployment Review |
| 6: Documentation & Presentation | 14-15 | Final report, presentation, code cleanup | Final Submission |

### 9.2 Weekly Goals

For the first four weeks, define specific goals:

**Week 1:**
- [ ] <span style="color:blue">[Goal 1]</span>
- [ ] <span style="color:blue">[Goal 2]</span>
- [ ] <span style="color:blue">[Goal 3]</span>

**Week 2:**
- [ ] <span style="color:blue">[Goal 1]</span>
- [ ] <span style="color:blue">[Goal 2]</span>
- [ ] <span style="color:blue">[Goal 3]</span>

**Week 3:**
- [ ] <span style="color:blue">[Goal 1]</span>
- [ ] <span style="color:blue">[Goal 2]</span>
- [ ] <span style="color:blue">[Goal 3]</span>

**Week 4:**
- [ ] <span style="color:blue">[Goal 1]</span>
- [ ] <span style="color:blue">[Goal 2]</span>
- [ ] <span style="color:blue">[Goal 3]</span>

### 9.3 Checkpoint Submissions

List what you will submit at each checkpoint that you have identified:

| Checkpoint | Due Date | Deliverables |
|------------|----------|-------------|
| Project Proposal (this document) | <span style="color:blue">[Date]</span> | This completed notebook |
| Data & Baseline Review | <span style="color:blue">[Date]</span> | <span style="color:blue">[Deliverables]</span> |
| Midterm Progress Report | <span style="color:blue">[Date]</span> | <span style="color:blue">[Deliverables]</span> |
| Deployment Demo | <span style="color:blue">[Date]</span> | <span style="color:blue">[Deliverables]</span> |
| Final Submission | <span style="color:blue">[Date]</span> | <span style="color:blue">[Deliverables]</span> |

---

## Section 10: Risk Assessment

### 10.1 Risk Identification

Identify potential risks and their mitigation strategies:

| Risk | Likelihood (L/M/H) | Impact (L/M/H) | Mitigation Strategy | Contingency Plan |
|------|-------------------|----------------|--------------------|-----------------|
| <span style="color:blue">[Risk 1]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[Mitigation]</span> | <span style="color:blue">[Contingency]</span> |
| <span style="color:blue">[Risk 2]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[Mitigation]</span> | <span style="color:blue">[Contingency]</span> |
| <span style="color:blue">[Risk 3]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[Mitigation]</span> | <span style="color:blue">[Contingency]</span> |
| <span style="color:blue">[Risk 4]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[Mitigation]</span> | <span style="color:blue">[Contingency]</span> |
| <span style="color:blue">[Risk 5]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[L/M/H]</span> | <span style="color:blue">[Mitigation]</span> | <span style="color:blue">[Contingency]</span> |

**Common Risks to Consider:**
- Data unavailability or quality issues
- Model underperformance
- Computational resource limitations
- Technical skill gaps
- Time constraints / scope creep
- Dependency/library issues
- Deployment platform issues

**Example Risk Entry:**

| Risk | Likelihood | Impact | Mitigation | Contingency |
|------|------------|--------|------------|-------------|
| Primary dataset becomes unavailable | Low | High | Download and store locally immediately | Use PlantVillage dataset as backup |
| Model accuracy below target | Medium | Medium | Start with proven architecture, iterate | Reduce scope to fewer disease classes |

### 10.2 Assumptions

List assumptions you are making that, if wrong, could affect your project:

1. <span style="color:blue">[Assumption 1]</span>
2. <span style="color:blue">[Assumption 2]</span>
3. <span style="color:blue">[Assumption 3]</span>
4. <span style="color:blue">[Assumption 4]</span>
5. <span style="color:blue">[Assumption 5]</span>

**Example Assumptions:**
1. The public dataset labels are accurate and consistent
2. I will have access to a GPU for training (university cluster or Google Colab)
3. Users will have smartphones with decent cameras
4. 10,000 images will be sufficient for training

### 10.3 Dependencies

List external dependencies that could affect your timeline:

| Dependency | Type | Criticality | Alternative |
|------------|------|-------------|-------------|
| <span style="color:blue">[Dependency 1]</span> | <span style="color:blue">[Type]</span> | <span style="color:blue">[Low/Medium/High]</span> | <span style="color:blue">[Alternative]</span> |
| <span style="color:blue">[Dependency 2]</span> | <span style="color:blue">[Type]</span> | <span style="color:blue">[Low/Medium/High]</span> | <span style="color:blue">[Alternative]</span> |
| <span style="color:blue">[Dependency 3]</span> | <span style="color:blue">[Type]</span> | <span style="color:blue">[Low/Medium/High]</span> | <span style="color:blue">[Alternative]</span> |

**Example Dependencies:**

| Dependency | Type | Criticality | Alternative |
|------------|------|-------------|-------------|
| AWS Free Tier | Platform | Medium | Google Cloud, local deployment |
| Pre-trained EfficientNet weights | Model | High | Train from scratch (longer) |
| Plant pathology expert for validation | Human | Low | Rely on dataset labels only |

---

## Appendix A: Project Complexity Guidelines

Your project should fall within the "Appropriate" range:

| Level | Characteristics | Example |
|-------|-----------------|--------|
| **Too Simple** | Tutorial-level, no novel combination, single notebook | Image classifier with dataset from Kaggle |
| **Appropriate** | Real-world problem, multiple components, deployed | Plant disease detector with API, monitoring, CI/CD, and extensive evals |