# Capstone Project Details and Deliverables

CSE 4/573 is the [Capstone Course](https://engineering.buffalo.edu/computer-science-engineering/graduate/degrees-and-programs/ms-in-computer-science-and-engineering/ms-tracks-and-specializations.html) at UB. The Capstone Project in this course brings concepts from across the degree into a single, semester-long group project.

## Summary of deliverables and important deadlines:

1. Project Milestone 1 (Problem Statement Definition, Datasets and Evaluation Metrics Identification).
   - `1 PDF file submission.`
   - Deadline: July 17, 11:59 PM Eastern on UBLearns.
2. Project Milestone 2 (Benchmarking Baselines, Results comparison with a preliminary approach).
   - `1 PDF file submission with tables for results.` 
   - Deadline: July 31, 11:59 PM Eastern on UBLearns.
3. Project Presentations.
   - `1 Presentation file.` 
   - In-class presentations.
   - *Students may volunteer for either Batch 1 or Batch 2.*
   - Deadline: Batch 1: August 12, Batch 2: August 14, 11:59 PM Eastern.
4. Final Project Submission 
   - `.zip file submission containing: 1 PDF report, 1 Presentation file, Entire codebase.` 
   - Deadline: August 14, 11:59 PM Eastern on UBLearns.

Please check the [Dates and Deliverables page](dates-deadlines.md) for more details about the entire course.

```{admonition} The Summer 2025 Capstone Course gives students flexibility to encourage novel ideas
:class: tip

Dear Students,

As a PhD researcher in Computer Vision, my primary goal for this capstone project is to **encourage creativity, hands-on exploration, and original thinking**.

Instead of assigning a fixed template project, this course gives you the freedom to choose a project that matches your skill level and personal learning goals, whether it's your first exposure to computer vision or you're preparing for graduate-level research.

To help you get started, I’ve grouped projects into **three categories** so that expectations and deliverables are fair and aligned with your background.
```

## Step 1: Identify your category, to balance expectations and deliverables

Students in this course come from different academic and technical backgrounds. To ensure that everyone can work on a meaningful and achievable project, please identify the category that best matches your current experience with Computer Vision.

1. <span style="background-color:#28a745;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Beginner</span> - **You are new to Computer Vision.**  
   This is for students who are just getting started with image processing or machine learning. You may be an undergraduate or a graduate student taking your first CV course. The goal is to build simple, working applications and gain hands-on exposure to core concepts.

2. <span style="background-color:orange;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Intermediate</span> - **You have some prior experience with CV and want to go deeper.**  
   You may have worked on a few ML or CV projects and are comfortable using tools like OpenCV, PyTorch, or TensorFlow. The focus is on building moderately complex applications and developing skills for future internships or job roles.

3. <span style="background-color:red;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Advanced</span> - **You are preparing for research or a PhD in Computer Vision.**  
   You have significant experience with deep learning, academic reading, and model development. Your project should explore a challenging topic with clear benchmarks and may involve experimenting with recent research papers or advanced models.

Each category has different expectations in terms of project complexity, evaluation, and final deliverables. *You are free to choose any project from the list.*

## Step 2: Choose a project and form your group

Once you’ve identified your category, the next step is to explore the list of curated project ideas. These span classical vision tasks, deep learning, and generative AI, and are marked by level for easy filtering.

You may work individually or in a group of up to 3 students. Every group must submit a short proposal, choose a project scope appropriate to the category of its members, and begin building.

📌 If you are proposing your own idea, make sure your proposal includes:
- A clear real-world motivation
- A well-defined problem statement
- Baselines or related work for comparison
- Planned methodology and evaluation metrics

### Core Computer Vision + Software Development

- *Please keep in mind that the focus is on functionality rather than aesthetics or frontend polish.*
- *Technologies listed in each project are suggestions only. Students may choose any frontend/backend frameworks they find appropriate and are comfortable with.*

```{admonition} <span style="background-color:#28a745;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Beginner</span> Web Application for Basic Computer Vision Tasks
:class: tip, dropdown, closed

This project is about building a simple web application that helps users understand and apply basic image processing techniques. Users will be able to upload images and see how different image processing operations work, such as converting images to grayscale, detecting edges, and finding key points in the image. The project will also include simple tools for rectifying a pair of images taken from two slightly different angles. *Students can extend this project to perform 2D to 3D reconstruction using two images if they are interested.*

- **Goals**
  - Help users understand basic image processing tasks using a visual interface.
  - Make it easy to upload and process images in the browser.
  - Allow simple camera calibration using chessboard patterns.
  - Allow image rectification using feature matching between two uploaded images.

- **Features**
  - **Image Upload**
    - Users can upload one or two images using the browser.
  - **Basic Image Processing**
    - Convert to grayscale.
    - Apply blur to reduce noise.
    - Detect edges using the Canny edge detector.
    - Adjust image brightness and contrast.
  - **Feature Detection**
    - Detect keypoints in an image using ORB (a simple and fast feature detector).
    - Match keypoints between two images and draw lines to show matches.
  - **Camera Calibration (Simple Version)**
    - Users can upload multiple chessboard images.
    - The app detects corners and computes the camera calibration matrix.
    - The app shows the estimated camera matrix and visualizes corner detection.
  - **Image Rectification**
    - Given a pair of stereo images, match features and align them.
    - Display the aligned (rectified) version of the stereo images side by side.

- **Technologies** (These are only suggestions; students may use any suitable tool or web technology)
  - **Frontend**: HTML, CSS, or any JavaScript library
  - **Backend**: Python
  - **Computer Vision**: OpenCV
  - **Optional**: Docker for containerized deployment

- **Deliverables**
  - A working web application that runs locally or on a server.
  - Frontend interface to upload images and view results.
  - Python modules:
    - `processing.py` for grayscale, blur, edge detection, etc.
    - `features.py` for feature detection and matching.
    - `calibration.py` for camera calibration functions.
    - `rectify.py` for stereo rectification logic.
  - Clear documentation:
    - Setup instructions in `README.md`
    - Sample test images
    - Steps to use each feature
  - A sample test suite with example inputs and outputs.

- **Learning Outcomes**
  - Learn how to use OpenCV for basic vision tasks.
  - Understand how camera calibration and stereo rectification work.
  - Practice building a simple full-stack application with Python and web technologies.
  - Gain experience in structuring modular, testable code.
```
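
For the Beginner web application above, the core of a `processing.py` module can be quite small. The following is a minimal sketch rather than a required design: it uses plain NumPy so the arithmetic is visible, and the function names (`to_grayscale`, `adjust_brightness_contrast`, `sobel_magnitude`) are illustrative assumptions. In an actual project, OpenCV calls such as `cv2.cvtColor` and `cv2.Canny` would normally replace these hand-written versions; the Sobel gradient shown here is only the first stage of a Canny-style edge detector.

```python
import numpy as np


def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB image to (H, W) using BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights


def adjust_brightness_contrast(gray, alpha=1.0, beta=0.0):
    """Linear point operation: alpha scales contrast, beta shifts brightness."""
    return np.clip(alpha * gray + beta, 0, 255)


def sobel_magnitude(gray):
    """Gradient magnitude from 3x3 Sobel kernels (edge-padded borders)."""
    p = np.pad(gray, 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)
```

Keeping each operation as a pure function of an array makes it straightforward to call from a web backend and to cover with the sample test suite listed in the deliverables.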

```{admonition} <span style="background-color:#28a745;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Beginner</span> Web-Based Lite Image Editing Tool (Lightroom Clone)
:class: tip, dropdown, closed

This project involves building a simplified version of an image editing application like Adobe Lightroom. The tool will run in the browser and allow users to upload and apply various non-destructive image enhancements. The focus is on basic editing functionality such as exposure, contrast, saturation adjustments, cropping, and applying simple filters. The goal is to help users understand how basic image enhancement and transformation operations work using computer vision techniques.

- **Goals**
  - Provide a visual interface for uploading and editing images in the browser.
  - Allow non-destructive edits with real-time preview and adjustable sliders.
  - Enable basic image transformations such as rotation and cropping.
  - Help users understand how filters and adjustments are applied at the pixel level.

- **Features**
  - **Image Upload**
    - Upload a single image through a simple browser interface.
    - Support common image formats such as JPG, PNG, and BMP.

  - **Adjustment Sliders**
    - Exposure/Brightness adjustment using scalar multiplication.
    - Contrast adjustment using histogram stretching or linear transforms.
    - Saturation control for color enhancement.
    - Sharpness using kernel filters (e.g., Laplacian).
    - Temperature (warm/cool) simulation using color balance.

  - **Cropping and Rotation**
    - Basic crop tool using a drag rectangle interface.
    - Rotate the image left, right, or by custom angles.

  - **Preset Filters**
    - Grayscale
    - Sepia
    - Vintage
    - High contrast black and white

  - **Preview and Export**
    - Show live preview of all edits in a canvas.
    - Allow users to reset to the original.
    - Export the edited image as a downloadable file.

- **Technologies** (These are only suggestions; students may use any suitable tool or web technology)
  - **Frontend**: HTML, CSS, or any JavaScript library
  - **Backend** (optional): Flask (only if image is processed server-side)
  - **Image Processing**: OpenCV (if backend), or pure JS (if frontend-only)
  - **Optional**: Use the WebAssembly (WASM) build of OpenCV for in-browser acceleration

- **Deliverables**
  - Fully functional web application with a clean UI.
  - Image editor with sliders and toggle buttons for all basic enhancements.
  - Optional backend support for saving state or processing larger images.
  - Clear modular JavaScript or Python code.
  - README documentation with setup and usage instructions.
  - A test folder with 3 to 5 sample images to demonstrate functionality.

- **Learning Outcomes**
  - Understand basic image enhancement techniques such as brightness and contrast control.
  - Learn to work with pixel values directly using JavaScript or OpenCV.
  - Learn how to create interactive UI elements like sliders and preview areas.
  - Gain confidence in building client-facing image applications from scratch.
```
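
For the image editing tool above, "non-destructive" editing can be read as: keep the original pixels untouched and recompute the preview from the original plus the current slider values. The sketch below illustrates that idea with NumPy. The function name `apply_edits`, its parameters, and the sepia coefficients (a commonly used matrix, not something specified by this course) are assumptions for illustration only.

```python
import numpy as np

# Commonly used sepia mixing matrix (rows produce the output R, G, B values).
SEPIA = np.array([[0.393, 0.769, 0.189],
                  [0.349, 0.686, 0.168],
                  [0.272, 0.534, 0.131]])


def apply_edits(original, exposure=1.0, contrast=1.0, sepia=False):
    """Return an edited copy of an (H, W, 3) uint8 image.

    The input array is never modified, so sliders can be changed freely
    and "reset" simply means showing `original` again.
    """
    img = original.astype(np.float64)
    img = img * exposure                      # exposure: scalar multiplication
    img = (img - 127.5) * contrast + 127.5    # contrast: linear stretch about mid-gray
    if sepia:
        img = img @ SEPIA.T                   # per-pixel color mixing
    return np.clip(img, 0, 255).round().astype(np.uint8)
```

A frontend-only implementation would apply the same arithmetic to the canvas pixel buffer in JavaScript; the structure (original image plus a dictionary of edit parameters) stays the same.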

```{admonition} <span style="background-color:orange;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Intermediate</span> Web-Based Object Detection Annotation System
:class: none, dropdown, closed

This project is about building a simple web-based annotation tool that allows users to manually annotate bounding boxes for object detection tasks. Users will upload a `.zip` file containing a set of images, and the web interface will allow them to draw bounding boxes, assign class labels, and export the annotations in a standard object detection format. The focus is on creating a clean, easy-to-use tool that can support datasets for training object detection models.

- **Goals**
  - Provide a visual interface for uploading and annotating a batch of images.
  - Allow users to draw bounding boxes and assign class labels interactively.
  - Export annotations in a format compatible with object detection models (Pascal VOC `.xml` or COCO-style `.json`).
  - Keep the tool lightweight, browser-accessible, and beginner-friendly.

- **Features**
  - **Image Upload**
    - Accept a `.zip` file containing images.
    - Automatically extract and display image thumbnails for annotation.
    - Support common formats like `.jpg`, `.jpeg`, `.png`.

  - **Annotation Interface**
    - Display one image at a time for annotation.
    - Allow users to:
      - Draw multiple bounding boxes.
      - Assign class labels to each box.
      - Edit or delete boxes before saving.
    - Auto-save progress as the user annotates.

  - **Class Label Input**
    - Textbox or dropdown to enter/select class label for each box.
    - Option to preload a list of class labels from a `.txt` file.

  - **Navigation**
    - Buttons to go to next, previous, or specific image.
    - Visual indicator showing annotation progress across all images.

  - **Export Functionality**
    - Export annotations in **Pascal VOC** XML format (one `.xml` per image).
    - Download annotations as a `.zip` of XML files or a consolidated folder.
    - Optionally support COCO JSON export in future versions.

- **Technologies** (These are only suggestions; students may use any suitable tool or web technology)
  - **Frontend**: HTML, CSS, or any JavaScript library
  - **Backend**: Python with Flask or FastAPI (for file upload, extraction, and export packaging)
  - **Canvas Drawing**: Use HTML5 `<canvas>` API or libraries like `fabric.js` for drawing boxes
  - **File Handling**: `zipfile`, `os`, and `shutil` for zip extraction and annotation export

- **Deliverables**
  - A fully working web application for local use or deployment on a server.
  - Core modules:
    - `upload_handler.py`: to handle zip upload and image extraction.
    - `annotator.js`: handles drawing logic and frontend interactivity.
    - `exporter.py`: converts annotation data into Pascal VOC XML files.
  - Sample `.zip` of test images and `.txt` of class labels.
  - Downloadable output as `.zip` containing `.xml` annotation files.
  - A clear README explaining how to use the tool.

- **Learning Outcomes**
  - Understand how object detection datasets are annotated and formatted.
  - Learn to build user interfaces with drawing capabilities in the browser.
  - Practice integrating frontend interaction with backend data export logic.
  - Gain hands-on experience with Pascal VOC format and XML generation.
```
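
For the annotation system above, the export step amounts to turning a list of boxes into one Pascal VOC-style XML document per image. The following is a minimal sketch using only the Python standard library. The function name `boxes_to_voc_xml` and the dictionary layout of each box are assumptions for illustration, and only a subset of the fields found in the official VOC files is written.

```python
import xml.etree.ElementTree as ET


def boxes_to_voc_xml(filename, width, height, boxes, depth=3):
    """Build a Pascal VOC-style annotation string for one image.

    `boxes` is a list of dicts with keys: label, xmin, ymin, xmax, ymax
    (pixel coordinates, as sent by a hypothetical frontend).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)

    for box in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = box["label"]
        ET.SubElement(obj, "difficult").text = "0"
        bnd = ET.SubElement(obj, "bndbox")
        for tag in ("xmin", "ymin", "xmax", "ymax"):
            # Canvas coordinates may be fractional; VOC stores integers.
            ET.SubElement(bnd, tag).text = str(int(box[tag]))

    return ET.tostring(root, encoding="unicode")
```

Each returned string can be written to `<image_name>.xml` and the files bundled with `zipfile` for download.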

```{admonition} <span style="background-color:orange;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Intermediate</span> Perspective Correction Tool for Scanned Documents
:class: none, dropdown, closed

This project is about building a simple web-based tool to correct the perspective of scanned or photographed documents that are tilted or captured at an angle. Users will upload an image, and the system will detect the document's corners and apply a geometric transformation to make the document appear as if it were scanned from a flat, top-down view. If the automatic detection fails, users will be able to manually select the corner points. The result can be downloaded as a cleaned-up `.png` or `.pdf` version of the document.

- **Goals**
  - Automatically correct the perspective of document images.
  - Help users clean up images of receipts, forms, or notes taken with a mobile phone.
  - Make the correction process fast, user-friendly, and visual.
  - Provide manual control for difficult cases.

- **Features**
  - **Image Upload**
    - Upload one image at a time (JPG, PNG).
    - Validate file type and display preview.

  - **Automatic Corner Detection**
    - Use edge detection and contour approximation to find the document.
    - Highlight the detected four corners.
    - Display a warning or fallback option if corners cannot be found.

  - **Manual Corner Adjustment**
    - Allow users to drag or click on the image to select the four corners.
    - Display a zoomed-in preview near the cursor for precision.

  - **Perspective Correction**
    - Compute the homography matrix based on the selected corner points.
    - Apply the transformation to produce a flat, top-down view of the document.
    - Resize or pad the output to a standard aspect ratio (A4 or original dimensions).

  - **Export Options**
    - Download corrected image as `.png`.
    - Convert and download the corrected document as a single-page `.pdf`.

- **Technologies** (These are only suggestions; students may use any suitable tool or web technology)
  - **Frontend**: HTML, CSS, or any JavaScript library
  - **Canvas API**: For corner selection and interactive adjustment
  - **Backend**: Python with Flask (or FastAPI)
  - **Computer Vision**: OpenCV for edge detection, contour finding, and perspective transformation
  - **PDF Export**: Python libraries such as `reportlab` or `img2pdf`

- **Deliverables**
  - A clean web application where users can upload and fix document images.
  - Automatic and manual workflows for selecting corners.
  - Python backend with modules:
    - `corner_detection.py` for edge and contour detection.
    - `transform.py` for homography and warping.
    - `exporter.py` for image and PDF download.
  - Exported results saved locally or offered for download.
  - A small set of test images (3–5 document photos with different angles).
  - Documentation:
    - README with setup instructions.
    - Screenshot examples before and after correction.

- **Learning Outcomes**
  - Learn how to detect document boundaries using edge and contour methods.
  - Understand the mathematics of homography and perspective correction.
  - Practice building interactive visual tools for real-world CV use cases.
  - Gain experience in converting images into flattened, readable formats.
```
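
For the perspective correction tool above, the homography is fully determined by the four corner correspondences. The sketch below solves for it directly with NumPy by fixing the bottom-right matrix entry to 1 and solving the resulting 8×8 linear system. The function names and the example corner coordinates are assumptions for illustration; in practice, OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` would do the same job and also warp the pixels.

```python
import numpy as np


def homography_from_corners(src, dst):
    """Solve for the 3x3 homography H mapping four src points to four dst points.

    Each correspondence (x, y) -> (u, v) gives two linear equations in the
    eight unknown entries of H (the ninth entry is fixed to 1).
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, pt):
    """Map a single (x, y) point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w


# Hypothetical detected corners (clockwise from top-left) mapped to an A4-shaped rectangle.
corners = [(10, 20), (200, 30), (220, 300), (5, 280)]
target = [(0, 0), (210, 0), (210, 297), (0, 297)]
H = homography_from_corners(corners, target)
```

The same function serves both workflows: the automatic detector and the manual corner picker only differ in how the four `corners` are obtained.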

```{admonition} <span style="background-color:red;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Advanced</span> Image Region Labeling Tool for Semantic Segmentation
:class: warning, dropdown, closed

This project is about building a simple browser-based tool that allows users to annotate regions in an image for semantic segmentation tasks. Users can either draw polygonal masks or paint over regions with a brush tool to define class labels for different areas (e.g., road, sidewalk, car, tree). The tool will generate a pixel-wise labeled mask for each image and export them as `.png` files. A `metadata.json` file will map colors in the mask to class names. This kind of tool is commonly used in preparing datasets for training deep learning models in segmentation tasks.

- **Goals**
  - Allow users to annotate different regions of an image for semantic segmentation.
  - Provide tools for both polygon and brush-based annotations.
  - Generate accurate, per-pixel labeled masks as output.
  - Make the annotation process easy and visual through a web interface.

- **Features**
  - **Image Upload**
    - Upload one or more images to be annotated.
    - Support `.jpg`, `.jpeg`, and `.png` formats.
    - Display images one by one with navigation.

  - **Annotation Tools**
    - **Polygon Tool**:
      - Click to add points and form a closed shape.
      - Fill polygon with a unique color corresponding to a class label.
    - **Brush Tool**:
      - Freehand painting with adjustable brush size.
      - Apply color to irregular regions.

  - **Class Label Management**
    - Add, rename, or delete class labels.
    - Assign a unique color to each label.
    - Show a class legend beside the canvas for reference.

  - **Navigation and Progress**
    - Navigate across multiple uploaded images.
    - Show which images have been labeled.
    - Option to skip or revisit images.

  - **Export Annotations**
    - Save each mask as a `.png` file (same resolution as input).
    - Generate a `metadata.json` file that maps RGB color values to class labels.
    - Export all files in a single `.zip` for download.

- **Technologies** (These are only suggestions; students may use any suitable tool or web technology)
  - **Frontend**: HTML, CSS, or any JavaScript library
  - **Canvas API**: For drawing polygons and brush-based painting
  - **Backend** (optional): Python with Flask (only needed for exporting zip)
  - **File Packaging**: JavaScript ZIP libraries like `JSZip` or Python `zipfile`

- **Deliverables**
  - A working browser-based tool for image region labeling.
  - Two modes of annotation: polygon and brush.
  - Class label manager with legend display.
  - Export function that produces:
    - Per-image `.png` mask with pixel labels.
    - `metadata.json` mapping label colors to class names.
  - Sample set of 3 to 5 test images for demonstration.
  - Documentation:
    - README with setup and usage instructions.
    - Visual guide for annotating and exporting.

- **Learning Outcomes**
  - Learn how semantic segmentation datasets are structured and labeled.
  - Understand per-pixel annotation formats used in computer vision.
  - Gain experience with drawing and user interaction using the HTML5 canvas.
  - Learn how to convert user interactions into valid dataset outputs.

```
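
For the segmentation labeling tool above, the export step has two parts: rasterizing each polygon into a per-pixel label map, and writing a color mask plus a `metadata.json` that maps colors to class names. The following is a minimal NumPy sketch of both. The function names, the even-odd fill rule evaluated at pixel centers, and the JSON layout are all assumptions for illustration; the project does not prescribe a particular metadata schema.

```python
import json
import numpy as np


def fill_polygon(label_ids, polygon, class_id):
    """Set label_ids (H, W) to class_id inside a polygon given as (x, y) vertices.

    Uses the even-odd rule, tested at pixel centers. Modifies label_ids in place.
    """
    h, w = label_ids.shape
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((h, w), dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if y1 == y2:
            continue  # horizontal edges never cross a horizontal ray
        crosses = (y1 > py) != (y2 > py)
        x_at_y = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_at_y)
    label_ids[inside] = class_id
    return label_ids


def build_mask_and_metadata(label_ids, classes):
    """classes: list of (name, (r, g, b)); the list index is the class id."""
    palette = np.array([color for _, color in classes], dtype=np.uint8)
    color_mask = palette[label_ids]  # (H, W, 3) RGB mask via palette lookup
    metadata = {"classes": [{"name": name, "color": list(color)}
                            for name, color in classes]}
    return color_mask, json.dumps(metadata, indent=2)
```

The resulting `color_mask` array can be saved as a `.png` at the input resolution with any image library, and the JSON string written alongside it before zipping.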

### Learning-based Models for Computer Vision applications

```{admonition} <span style="background-color:#28a745;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Beginner</span> Monocular Depth Estimation from a Single RGB Image
:class: tip, dropdown, closed

This project introduces students to the fundamentals of learning-based monocular depth estimation: the task of predicting a dense depth map from a single RGB image. The goal is not to achieve state-of-the-art results but to build a simple encoder-decoder depth estimation model from scratch, understand the training pipeline, and visualize predicted depth maps. This project helps students get hands-on experience with supervised regression in computer vision and builds intuition for 2D-to-3D reasoning from images.

- **Goals**
  - Learn how to design and train a neural network that predicts depth from a single RGB image.
  - Understand how pixel-level regression differs from classification or segmentation.
  - Visualize and interpret model predictions as grayscale or color-coded depth maps.
  - Develop an end-to-end training and evaluation pipeline on a standard dataset.

- **Features**
  - Build a fully supervised depth estimation pipeline using PyTorch or TensorFlow.
  - Implement a basic CNN-based encoder-decoder architecture.
  - Train the network on a subset of a publicly available dataset.
  - Normalize, visualize, and save predicted depth maps.
  - Compare ground truth and predicted depth maps using error metrics.

- **Suggested Architecture**
  - **Encoder**: Use a simple CNN or a pre-trained backbone like ResNet18 (optional).
  - **Decoder**: Use upsampling layers (e.g., bilinear or transpose convolutions) to produce full-resolution depth maps.
  - **Loss Function**: L1 loss or Scale-Invariant Log RMSE between predicted and ground truth depths.
  - **Post-Processing**: Apply normalization for visualization; convert to heatmaps using OpenCV or matplotlib.

- **Dataset**
  - **NYU Depth v2 (Indoor Scenes)**
    - URL: https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html
    - Contains aligned RGB and depth frames from indoor scenes captured with a Kinect sensor.
    - Provide a preprocessed subset (~1000 images) to keep training time low.
  
  - **KITTI Depth Dataset (Outdoor Driving Scenes)**
    - URL: http://www.cvlibs.net/datasets/kitti/
    - Contains RGB images and sparse LiDAR-based depth maps.
    - Recommended only for advanced students due to preprocessing requirements.

  - **Alternative Starter Datasets**
    - **Make3D**: Smaller, older dataset for quick experimentation.
    - **Eigen split**: A widely used train/test split of KITTI for monocular depth models (a split of the KITTI data rather than a separate dataset).

- **Technologies**
  - **Framework**: PyTorch (preferred) or TensorFlow/Keras
  - **Data Handling**: `torchvision`, custom Dataset classes, image transformations
  - **Visualization**: matplotlib, OpenCV, or PIL for heatmap overlays
  - **Training Setup**: GPU-accelerated training (optional), model checkpointing, basic training loop

- **Deliverables**
  - A working training script for monocular depth estimation.
  - Model definition (encoder-decoder or UNet-style) in a standalone file.
  - Sample visualizations: input RGB, predicted depth map, and ground truth side-by-side.
  - Evaluation script to compute RMSE, Abs Rel, and visual difference.
  - A README file with:
    - Setup instructions and environment dependencies.
    - Dataset download and preprocessing steps.
    - Training configuration (batch size, epochs, learning rate).
    - Explanation of architecture and loss function choices.

- **Learning Outcomes**
  - Understand the problem of monocular depth estimation and its applications in robotics, AR, and autonomous driving.
  - Learn how to design regression-based neural networks for dense prediction tasks.
  - Gain experience in handling and visualizing dense depth map data.
  - Practice implementing a full supervised training pipeline using modern deep learning frameworks.
```
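
For the depth estimation project above, the evaluation script mostly reduces to a few formulas over valid pixels. The sketch below computes RMSE, absolute relative error (Abs Rel), and the scale-invariant log error in NumPy. The function name `depth_metrics`, the convention that ground-truth values of 0 mark missing depth, and the small clamping constant are assumptions for illustration; a PyTorch training loss would follow the same arithmetic on tensors.

```python
import numpy as np


def depth_metrics(pred, gt, lam=1.0):
    """Compare predicted and ground-truth depth maps of the same shape.

    Pixels with gt <= 0 are treated as missing and ignored (assumed convention).
    lam=1.0 gives the fully scale-invariant log error; smaller values
    partially penalize global scale mistakes.
    """
    valid = gt > 0
    p = np.maximum(pred[valid], 1e-6)  # avoid log(0) for bad predictions
    g = gt[valid]

    rmse = np.sqrt(np.mean((p - g) ** 2))
    abs_rel = np.mean(np.abs(p - g) / g)

    d = np.log(p) - np.log(g)
    si_log = np.mean(d ** 2) - lam * np.mean(d) ** 2

    return {"rmse": rmse, "abs_rel": abs_rel, "si_log": si_log}
```

A quick sanity check is to pass a prediction that is exactly twice the ground truth: Abs Rel becomes 1, while the scale-invariant term with `lam=1.0` stays at zero because the error is a pure global scale factor.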

```{admonition} <span style="background-color:#28a745;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Beginner</span> Object Detection Benchmarking and Analysis
:class: tip, dropdown, closed

This project focuses on implementing and comparing **three popular object detection models** using a common dataset. The goal is to understand the strengths, limitations, and practical trade-offs of different architectures for object detection. This includes measuring speed, accuracy, and visual outputs across models.

Students will use pretrained models and fine-tune them on a moderately sized dataset to evaluate real-world performance. This project provides hands-on experience in working with bounding boxes, object categories, and COCO-style evaluation.

- **Goals**
  - Learn how modern object detection models work and differ.
  - Evaluate and compare multiple models using the same dataset and metrics.
  - Understand model trade-offs between accuracy, speed, and complexity.
  - Gain experience with annotation formats, evaluation pipelines, and visualization tools.

- **Models to Benchmark**
  1. **YOLOv5** (Fast and efficient; good balance of accuracy and speed)  
     - GitHub: [https://github.com/ultralytics/yolov5](https://github.com/ultralytics/yolov5)
  2. **SSD (Single Shot MultiBox Detector)**  
     - PyTorch implementation via `torchvision.models.detection.ssd300_vgg16`
  3. **Faster R-CNN** (High accuracy; slower inference)  
     - PyTorch implementation via `torchvision.models.detection.fasterrcnn_resnet50_fpn`

- **Dataset**
  - **Pascal VOC 2012**  
    - Public dataset for object detection with 20 classes.
    - Contains annotated bounding boxes in XML format.
    - [VOC 2012 Homepage](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/)
  - Optional: Convert to COCO-style JSON format using tools like `voc2coco` if needed for uniformity.

- **Tasks**
  - Load and prepare dataset, split into training and validation.
  - Train or fine-tune each model on the dataset.
  - Evaluate using standard object detection metrics:
    - **mAP@0.5**, **mAP@[0.5:0.95]**, precision, recall
    - Inference speed (FPS) and model size
  - Visualize predictions from all three models on the same images.
  - Log metrics and analyze results.

- **Technologies**
  - **Framework**: PyTorch
  - **Libraries**: OpenCV, Matplotlib, TorchVision
  - **Annotation Tools**: LabelImg (for manual annotation or correction)

- **Deliverables**
  - A `benchmarking/` folder containing:
    - Scripts for training and evaluation of each model.
    - Model checkpoints and logs.
    - Evaluation plots comparing accuracy, inference speed, and qualitative results.
  - A report or Jupyter notebook (`detection_analysis.ipynb`) that includes:
    - Overview of each model and architecture.
    - Training setup and hyperparameters.
    - Evaluation metrics across models.
    - Visual comparison of detections (side-by-side bounding boxes).
    - Reflections on model performance and trade-offs.

- **Learning Outcomes**
  - Gain practical experience working with bounding box detection.
  - Understand how detection architectures compare in real-world settings.
  - Learn to use and modify pretrained models for new datasets.
  - Develop analysis and reporting skills for model comparison.

- **Optional Extensions**
  - Test on a custom dataset (e.g., classroom object detection, product shelf detection).
  - Add a fourth model (e.g., DETR or YOLOv8).
  - Include mobile deployment using ONNX or TensorRT.
```
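
For the detection benchmarking project above, metrics such as mAP@0.5 are built on one primitive: the intersection over union (IoU) between a predicted box and a ground-truth box. The following pure-Python sketch shows IoU and a simplified single-image precision/recall computation at a fixed IoU threshold. The function names and the greedy matching strategy are assumptions for illustration; official VOC and COCO evaluators handle duplicates, difficult objects, and per-class averaging more carefully and should be used for reported numbers.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, iou_thr=0.5):
    """preds: list of (box, score); gts: list of boxes, all of one class.

    Predictions are matched greedily in descending score order; each
    ground-truth box can be matched at most once.
    """
    matched = set()
    true_positives = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(gts) if gts else 0.0
    return precision, recall
```

Running the same function on the outputs of all three models, with identical thresholds, keeps the comparison consistent across architectures.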

```{admonition} <span style="background-color:orange;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Intermediate</span> Unpaired Day-to-Night Image Translation
:class: none, dropdown, closed

This project is about learning to perform **unpaired image-to-image translation**. The goal is to translate daytime street images into realistic nighttime scenes and vice versa, even when the dataset contains no paired examples. Students will train a model (hint: CycleGAN) to perform this style transformation and explore how adversarial losses and cycle consistency enable learning without paired supervision. This project introduces key concepts in generative modeling, domain adaptation, and style transfer.

- **Goals**
  - Understand the concept of unpaired image-to-image translation.
  - Implement or fine-tune a model to perform domain transfer between day and night images.
  - Visualize and evaluate the quality of generated images.
  - Learn how adversarial training and cycle consistency work in practice.

- **Features**
  - Load and preprocess images from two domains: Day and Night.
  - Train a model to learn two mappings: Day → Night and Night → Day.
  - Use cycle-consistency loss to ensure that an image translated from one domain and back remains similar to the original.
  - Generate side-by-side comparisons of real and translated images.
  - Evaluate generated images qualitatively and with simple perceptual metrics.

- **Dataset**
  - **BDD100K (Berkeley DeepDrive 100K)**
    - URL: https://bdd-data.berkeley.edu
    - Use only the image data for training (no object annotations are needed; the `timeofday` attribute is used only to split images into domains).
    - Contains over 100,000 street-view driving images with diverse conditions.
    - Filter images by `timeofday=daytime` and `timeofday=night` for two separate domains.

  - **Alternative (if needed)**
    - Use custom datasets with clear visual distinction (e.g., Google Street View, webcams).
    - Minimum: 500+ images per domain, resized to 256×256.

- **Technologies** 
  - **Framework**: PyTorch (preferred) or TensorFlow
  - **Model Architecture**: CycleGAN (generator + discriminator for each domain)
  - **Image Processing**: torchvision transforms, PIL
  - **Visualization**: matplotlib, OpenCV

- **Key Components**
  - **Generators**: Two U-Net or ResNet-based generators (G: Day → Night, F: Night → Day)
  - **Discriminators**: Two PatchGAN discriminators for adversarial training (D_A for the Day domain, D_B for the Night domain)
  - **Losses**:
    - Adversarial Loss: to make translated images realistic
    - Cycle-Consistency Loss: to preserve structure (A → B → A ≈ A)
    - Identity Loss (optional): to preserve color/style when image already matches target domain

- **Deliverables**
  - A working CycleGAN training pipeline.
  - A small subset of day and night images prepared for training.
  - Trained models for Day→Night and Night→Day translation.
  - Sample outputs:
    - Day → Generated Night
    - Night → Generated Day
    - Original vs Reconstructed (Cycle consistency check)
  - Evaluation script for saving results and comparing visually.
  - README with:
    - Setup instructions
    - Dataset download and filtering guide
    - Training instructions
    - Example results

- **Learning Outcomes**
  - Learn how GANs work for unpaired translation tasks.
  - Understand the role of cycle-consistency and adversarial losses.
  - Explore domain adaptation and style transfer in real-world scenes.
  - Gain hands-on experience with training large models on real datasets.

```
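
For the day/night project above, the filtering step can be sketched as follows. This is a minimal sketch, not the official BDD100K tooling: it assumes the label file is a JSON list of frames, each with a `name` and an `attributes.timeofday` field, and the helper names `load_frames` and `group_by_timeofday` are hypothetical.

```python
import json


def load_frames(label_path):
    """Read a label file assumed to be a JSON list of per-image frames."""
    with open(label_path, "r", encoding="utf-8") as f:
        return json.load(f)


def group_by_timeofday(frames, wanted=("daytime", "night")):
    """Group image names by their `timeofday` attribute.

    Assumed frame layout (an assumption about the label format):
        {"name": "xxx.jpg", "attributes": {"timeofday": "night", ...}}
    Frames with a missing or unwanted attribute are skipped.
    """
    domains = {tod: [] for tod in wanted}
    for frame in frames:
        tod = frame.get("attributes", {}).get("timeofday")
        if tod in domains:
            domains[tod].append(frame["name"])
    return domains
```

Calling `group_by_timeofday(load_frames(path))` on a label file returns one list of file names per domain, which can then be copied or symlinked into two training folders.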

```{admonition} <span style="background-color:orange;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Intermediate</span> Automatic Image Colorization Using CNNs
:class: none, dropdown, closed

This project focuses on the task of automatic image colorization, where a model learns to generate plausible colors for grayscale images. The approach involves training a convolutional neural network to predict the **ab color channels** from the **L channel** of an image in the CIELAB color space. Unlike simple filters, this model must learn context and semantics to infer realistic colors (e.g., sky is usually blue, grass is green). The final goal is to produce colorized outputs that look natural and consistent, even though the model receives only a grayscale input at inference time.

- **Goals**
  - Learn how to build and train a CNN that maps grayscale (L channel) images to color channels (ab).
  - Understand the use of perceptual color spaces (Lab vs RGB) for regression tasks.
  - Evaluate the quality of generated images both visually and with perceptual metrics.
  - Gain practical experience with pixel-wise regression and image synthesis.

- **Features**
  - Convert RGB images to Lab color space and separate the L (lightness) channel, used as input, from the ab channels, used as targets.
  - Train a CNN to predict ab channels from the L channel using supervised learning.
  - Concatenate predicted ab channels with L to reconstruct a full-color image.
  - Convert back from Lab to RGB for visualization and evaluation.
  - Compare generated images to ground truth RGB and compute quantitative metrics.

- **Dataset**
  - **Places365 (Recommended)**
    - URL: http://places2.csail.mit.edu/
    - Large-scale dataset with natural scene diversity (ideal for color learning).
    - Resize to 256×256 and use a subset (~10k–50k images) for training.
  
  - **Alternative Datasets**
    - CIFAR-10 (small, good for quick experiments)
    - ImageNet subset (for more advanced results)
    - COCO images (for object-rich, diverse color contexts)

- **Technologies**
  - **Framework**: PyTorch (preferred) or TensorFlow
  - **Model Architecture**: 
    - U-Net style encoder-decoder
    - Or a custom CNN with downsampling and upsampling blocks
  - **Color Space Conversions**: `skimage.color.rgb2lab`, `lab2rgb`
  - **Loss Function**: Smooth L1 or MSE loss between predicted and ground truth ab channels

- **Key Components**
  - **DataLoader**: Convert RGB images to Lab, feed only the L channel as input
  - **Model**: Encoder-decoder CNN that takes L and predicts ab channels
  - **Training Loop**: Feed forward, compute loss on ab, backpropagate
  - **Postprocessing**: Reconstruct RGB image from predicted ab + original L

- **Deliverables**
  - A complete training pipeline for the CNN-based colorization model
  - Model architecture file (e.g., `colorization_net.py`)
  - Evaluation script for:
    - Side-by-side comparison of input grayscale, predicted color, and ground truth
    - Metrics like PSNR, SSIM, LPIPS (optional)
  - A set of generated examples saved to disk
  - A README with:
    - Instructions for setting up the environment and dependencies
    - Dataset download and preprocessing instructions
    - Training and inference commands
    - Notes on how the Lab color space is used

- **Learning Outcomes**
  - Understand the problem of pixel-wise regression and why Lab is used for color prediction.
  - Learn how to design convolutional architectures for generative vision tasks.
  - Develop intuition for training and evaluating generative models that synthesize realistic content.
  - Gain confidence in manipulating image formats, color spaces, and visualizations in a deep learning workflow.
```
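
For the colorization project above, the Lab split can be illustrated on a single pixel. In practice `skimage.color.rgb2lab` converts whole images; the pure-Python sketch below only shows what that conversion computes (sRGB, D65 white point). The scaling constants in `lab_training_pair` are one possible convention and are an assumption, not something the project prescribes.

```python
def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in [0, 1]) to CIELAB (D65)."""

    def to_linear(c):
        # Undo the sRGB gamma curve.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear RGB -> CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def lab_training_pair(r, g, b):
    """Return (input, target) for one pixel: scaled L and scaled (a, b).

    Scaling L by 1/100 and ab by 1/128 is an assumed convention.
    """
    L, a, b_ = srgb_to_lab(r, g, b)
    return L / 100.0, (a / 128.0, b_ / 128.0)
```

At inference time the predicted ab values are rescaled, stacked with the original L channel, and converted back to RGB.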

```{admonition} <span style="background-color:red;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Advanced</span> Image Super-Resolution Using CNNs (SRCNN/ESPCN)
:class: warning, dropdown, closed

This project focuses on the task of **image super-resolution**, which involves reconstructing a high-resolution image from a low-resolution input. Super-resolution is important in fields like medical imaging, satellite imaging, video enhancement, and forensics. Students will implement a supervised learning pipeline using a simple CNN-based model like SRCNN or ESPCN. The goal is to learn how deep learning can restore high-frequency visual details lost during downsampling.

- **Goals**
  - Learn how to build and train a convolutional neural network for super-resolution.
  - Understand the principles of upscaling using deep learning versus traditional interpolation.
  - Explore evaluation metrics that measure reconstruction fidelity and perceptual image quality.
  - Generate side-by-side comparisons of low-resolution, upscaled, and ground truth images.

- **Features**
  - Downsample high-resolution images to create training pairs (LR → HR).
  - Build a baseline CNN model (SRCNN or ESPCN) for upsampling images.
  - Train the model to minimize pixel-wise difference between predicted and true high-resolution images.
  - Visualize and save comparison grids of:
    - Low-resolution input
    - Bicubic interpolation baseline
    - Super-resolved output
    - Ground truth high-resolution image

- **Suggested Architectures**
  - **SRCNN (Super-Resolution CNN)**
    - A 3-layer CNN that refines an input already upsampled to the target size with bicubic interpolation.
  - **ESPCN (Efficient Sub-Pixel CNN)**
    - Upsamples images using a learned sub-pixel convolution layer (pixel shuffle).
  - Optional: Try deeper models like VDSR or SwinIR (if time and GPU allow)

- **Dataset**
  - **DIV2K**
    - URL: https://data.vision.ee.ethz.ch/cvl/DIV2K/
    - High-quality 2K resolution images for super-resolution benchmarking.
    - Use the first 800 images for training, 100 for validation.
    - Common upscaling factors: ×2, ×3, ×4

  - **CelebA (Face Images)**
    - URL: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
    - Center-cropped face dataset, good for domain-specific SR.
    - Resize and downsample to generate pairs.

  - **Custom Dataset (Optional)**
    - Students can create their own dataset by downsampling any high-quality image folder.

- **Technologies**
  - **Framework**: PyTorch (preferred) or TensorFlow/Keras
  - **Loss Function**: L1 or MSE loss on pixel values
  - **Upsampling Techniques**: Bicubic (for comparison), learned upsampling with `torch.nn.PixelShuffle`
  - **Visualization**: matplotlib, OpenCV

- **Evaluation Metrics**
  - **PSNR (Peak Signal-to-Noise Ratio)**: Measures overall pixel-level fidelity.
  - **SSIM (Structural Similarity Index)**: Measures perceptual similarity and structural quality.
  - Optional: **LPIPS** for learning-based perceptual distance (if interested in realism)

- **Deliverables**
  - A complete training pipeline for a super-resolution CNN.
  - Image preprocessing scripts for generating LR–HR training pairs.
  - Inference script that takes a low-resolution image and produces a high-resolution output.
  - Side-by-side comparison visuals for test images.
  - Evaluation results: PSNR/SSIM scores on validation/test sets.
  - README with:
    - Dataset download instructions
    - Training and inference commands
    - Notes on model architecture and loss functions

- **Learning Outcomes**
  - Understand how CNNs can learn to restore visual detail from low-resolution inputs.
  - Explore the difference between traditional upsampling methods and learned upscaling.
  - Learn how to measure image quality using PSNR (pixel fidelity), SSIM (structural similarity), and visual inspection.
  - Gain experience with preprocessing, training, and evaluating image-to-image regression models.
```
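
For the super-resolution project above, the sub-pixel (pixel shuffle) step used by ESPCN can be sketched in plain Python. In a real model `torch.nn.PixelShuffle` performs this rearrangement on tensors; the function below is an illustrative stand-in on nested lists, and its name and list-based layout are assumptions made for the example.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Each group of r*r input channels is spread over an r x r block of
    output pixels:
        out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w]
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])

    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out
```

Because the upscaling happens in the last layer, all earlier convolutions run at low resolution, which is what keeps this family of models cheap compared with refining a bicubic-upsampled input.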

```{admonition} <span style="background-color:red;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Advanced</span> Shadow Removal from Real-World Images Using Deep Learning
:class: warning, dropdown, closed

This project focuses on building a deep learning model to **remove shadows from natural images**. Shadows are often undesirable in computer vision applications such as autonomous driving, document scanning, and photo editing. The goal is to train a neural network to reconstruct the shadow-free version of an image, given a shadowed input. This task is commonly treated as an image-to-image translation problem, where paired datasets contain both shadowed and shadow-free versions of the same scene.

- **Goals**
  - Learn how to build a deep CNN model for shadow removal.
  - Understand challenges involved in learning to remove localized, structured visual obstructions.
  - Evaluate the perceptual and quantitative quality of de-shadowed results.
  - Gain hands-on experience with paired training data and pixel-level reconstruction loss.

- **Features**
  - Train a supervised image-to-image model using shadowed and shadow-free image pairs.
  - Predict clean, shadow-free versions of natural images.
  - Visualize the shadowed input, the shadow-free ground truth, and the model output side by side.
  - Compare the model’s output with traditional shadow correction methods.
  - Analyze edge cases where shadows are complex or partially occluded.

- **Dataset**
  - **ISTD Dataset (Image Shadow Triplets Dataset)**
    - URL: https://github.com/DeepInsight-PCALab/ST-CGAN
    - Contains 1870 triplets: (shadow image, shadow mask, shadow-free image)
    - High-quality images of outdoor and indoor scenes with manually aligned masks
    - Download includes:
      - Shadow images
      - Shadow masks (optional: for visualization or multi-task training)
      - Shadow-free ground truths

  - **Alternative Datasets (Optional/Extension)**
    - SRD (Shadow Removal Dataset): Larger but noisier annotations
    - SBU Shadow Dataset (for shadow detection if students want a multi-task setup)

- **Technologies** 
  - **Framework**: PyTorch (preferred) or TensorFlow
  - **Model Architecture Options**:
    - U-Net or ResNet-based encoder-decoder
    - Optional attention modules to improve performance in shadow regions
  - **Loss Functions**:
    - L1 or L2 reconstruction loss between prediction and ground truth
    - Perceptual loss (optional) for texture consistency
    - Shadow mask-guided weighted loss (optional)

- **Evaluation Metrics**
  - **PSNR** (Peak Signal-to-Noise Ratio)
  - **SSIM** (Structural Similarity Index)
  - Optional: LPIPS or FID if using a GAN-based extension

- **Deliverables**
  - A fully working training pipeline for shadow removal
  - Model file (e.g., `shadow_removal_net.py`) implementing the architecture
  - Training and evaluation scripts
  - Sample results:
    - Shadowed input
    - Model output
    - Ground truth shadow-free image
  - Evaluation results (PSNR/SSIM tables)
  - README including:
    - Dataset download link and preprocessing guide
    - Instructions for training, inference, and evaluation
    - Explanations of chosen architecture and loss functions

- **Learning Outcomes**
  - Understand how CNNs can reconstruct missing or altered visual content.
  - Learn to design, train, and debug an image-to-image translation network.
  - Develop insights into dealing with real-world, localized artifacts in images.
  - Build intuition about the tradeoffs between structural accuracy and visual quality.
```
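
For the shadow removal project above, the optional mask-guided loss and the PSNR metric can be sketched as below. The weighting scheme (weight 1 outside the shadow, `shadow_weight` inside) and the default value of 2.0 are assumptions chosen for illustration; images are represented as flat lists of values in [0, 1].

```python
import math


def masked_l1(pred, target, mask, shadow_weight=2.0):
    """Weighted mean absolute error over flat pixel lists.

    mask[i] is 1 for shadow pixels and 0 elsewhere. Shadow pixels count
    `shadow_weight` times as much as non-shadow pixels (assumed scheme).
    """
    weights = [1.0 + (shadow_weight - 1.0) * m for m in mask]
    total = sum(w * abs(p - t) for w, p, t in zip(weights, pred, target))
    return total / sum(weights)


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for values in [0, max_val]."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Reporting PSNR separately for shadow and non-shadow regions, using the same mask, is one way to check that the model fixes shadows without degrading the rest of the image.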


### Generative AI Projects

*Most GenAI projects need high-end GPUs, so please choose a project based on the GPU resources available to you. This course DOES NOT provide access to GPU compute.*

```{admonition} <span style="background-color:green;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Beginner</span> Benchmarking GenAI Models: GANs, VAEs, and Diffusion Models
:class: tip, dropdown, closed

*This can be run on Google Colab or Kaggle.*

This project aims to provide hands-on experience with three major classes of generative models — **Generative Adversarial Networks (GANs)**, **Variational Autoencoders (VAEs)**, and **Diffusion Models**. Students will implement simplified versions of each model, train them on image datasets (e.g., MNIST, CIFAR-10, CelebA), and compare their generated samples, training dynamics, and evaluation metrics.

The goal is not to beat SOTA but to **understand the mechanisms** behind each approach, gain intuition for their strengths and limitations, and benchmark their performance in a consistent setting.

- **Goals**
  - Implement foundational versions of VAE, GAN, and Denoising Diffusion Probabilistic Models (DDPM).
  - Compare their ability to model image data from simple to moderately complex datasets.
  - Study training behaviors: convergence, stability, sample quality.
  - Evaluate outputs using perceptual and statistical metrics.

- **Models to Implement**
  - **VAE** (Kingma & Welling)
    - Encoder + decoder architecture
    - Latent sampling with reparameterization
    - Gaussian likelihood and KL-divergence loss
  - **GAN** (Goodfellow et al.)
    - Simple DCGAN architecture
    - Generator vs Discriminator adversarial training
    - Use non-saturating loss for stability
  - **DDPM** (Ho et al., 2020)
    - Forward noising process (q)
    - Reverse denoising process (pθ)
    - Start with basic U-Net-based denoising model
    - Use fixed variance schedule for simplicity

- **Datasets**
  - **MNIST**
    - Handwritten digits, grayscale 28×28
    - Good for fast experimentation
  - **CIFAR-10**
    - 10 object classes, RGB 32×32
    - More colorful, diverse scenes
  - **CelebA (Optional/Advanced)**
    - Human face images, 64×64 or 128×128
    - For testing on structured data

- **Technologies**
  - **Framework**: PyTorch (strongly preferred for modularity)
  - **Architectures**:
    - VAE: MLP or CNN-based encoder/decoder
    - GAN: DCGAN-style generator/discriminator
    - DDPM: Simple U-Net with cosine or linear noise schedule
  - **Tools**:
    - `matplotlib`, `seaborn` for visualizations
    - `torchvision.utils.make_grid()` for grid image outputs

- **Evaluation Metrics**
  - **Inception Score (IS)**: For diversity and realism (on CIFAR-10)
  - **Fréchet Inception Distance (FID)**: Distance between generated and real distribution
  - **ELBO (for VAE)**: Evidence Lower Bound
  - **Sample Diversity**: Number of unique samples generated
  - **Qualitative Analysis**: Visual side-by-side comparisons

- **Deliverables**
  - Three implemented models in `vae.py`, `gan.py`, and `diffusion.py`
  - Dataset-specific training scripts: `train_mnist.py`, `train_cifar.py`, etc.
  - Output folders:
    - Checkpoints
    - Sample images from each model
    - Evaluation plots (loss curves, FID score over epochs)
  - A comparison notebook (`benchmark.ipynb`) with:
    - Training logs
    - Generated image grids
    - Evaluation summary table
  - README with:
    - Environment setup
    - Instructions for training and benchmarking
    - Notes on hyperparameters and model differences

- **Learning Outcomes**
  - Gain working knowledge of three generative modeling paradigms.
  - Understand the differences between latent variable models (VAEs), adversarial training (GANs), and iterative refinement (Diffusion Models).
  - Develop practical skills in evaluating image generation quality.
  - Learn how to design and run experiments fairly across models and datasets.
```
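
For the benchmarking project above, the DDPM forward noising process with a fixed linear variance schedule can be sketched without any deep learning framework. The defaults (1000 steps, betas from 1e-4 to 0.02) follow the linear schedule reported by Ho et al. (2020); the function names and the flat-list image representation are assumptions made for the example.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances beta_1..beta_T (needs num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng=random):
    """Draw x_t ~ q(x_t | x_0) in one shot.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    with eps ~ N(0, I). The returned eps is what the denoising network
    is trained to predict.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise
```

By the last step `alpha_bar_T` is close to zero, so `x_T` is almost pure Gaussian noise; that is what lets sampling start from random noise and run the learned reverse process.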

```{admonition} <span style="background-color:orange;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Intermediate</span> Face Sketch-to-Photo Generation with Pix2Pix
:class: none, dropdown, closed

This project focuses on building a **paired image-to-image translation model** that can generate realistic grayscale face photos from input face sketches. The task is framed as a supervised image translation problem, where the model learns to reconstruct real facial appearances based on edge-level sketch inputs. This problem has real-world applications in criminal identification, digital art, and restoration.

The core idea is to use a lightweight **Pix2Pix** architecture — a conditional GAN that takes a sketch as input and generates a realistic photo output. Students will train and evaluate their model using the **CUHK Face Sketch Dataset**.

- **Goals**
  - Understand how to perform image-to-image translation with paired supervision.
  - Learn to implement conditional GANs with encoder-decoder generators.
  - Generate realistic face images from edge sketches.
  - Evaluate reconstruction quality both quantitatively and visually.

- **Dataset**
  - **CUHK Face Sketch Dataset (CUFS)**
    - URL: [https://www.ee.cuhk.edu.hk/~xgwang/sketch.html](https://www.ee.cuhk.edu.hk/~xgwang/sketch.html)
    - Contains 606 image pairs: face sketch + corresponding grayscale photo.
    - Sketches are artist-drawn from real photos (aligned and cleaned).
    - Pre-aligned 250×200 images for easy model input.
    - No licensing restrictions for educational use.

- **Model Architecture**
  - **Generator**: U-Net (with skip connections)
  - **Discriminator**: PatchGAN (classifies image patches instead of whole image)
  - **Loss Functions**:
    - Adversarial Loss (from GAN framework)
    - L1 Reconstruction Loss between generated and ground-truth photo
    - Optional: Perceptual Loss for sharper outputs (if GPU permits)

- **Technologies** 
  - **Framework**: PyTorch
  - **Training Tools**: torchvision, matplotlib, tqdm
  - **Data Processing**: Resize to 256×256, normalize to [−1, 1]
  - Optional lightweight training on Google Colab

- **Features**
  - Upload and view face sketches
  - Train Pix2Pix model to learn photo reconstruction
  - Visualize outputs side-by-side with original sketches and real photos
  - Plot training curves: loss vs epoch
  - Generate new photo outputs from unseen sketches

- **Evaluation Metrics**
  - **L1 / MAE**: Pixel-level reconstruction error
  - **SSIM**: Structural similarity to ground truth
  - **PSNR**: Peak Signal-to-Noise Ratio
  - **Qualitative**: Visual realism of skin, eyes, hair, and facial outline

- **Deliverables**
  - Clean and modular codebase:
    - `train.py`: Handles dataset loading, model training, logging
    - `models/`: Contains generator and discriminator implementations
    - `datasets/`: Preprocessing scripts for CUHK data
    - `visualize.py`: Grid visualizations of sketch → generated → real
  - Output folder containing:
    - Saved model checkpoints
    - Sample predictions on test set
  - Evaluation report (`results.md`) with:
    - Visual examples
    - Metric tables
    - Observations about sketch-to-photo quality
  - README:
    - Dataset instructions
    - Training and inference commands
    - Hyperparameter tuning suggestions

- **Learning Outcomes**
  - Understand how conditional GANs work in supervised settings
  - Learn to process and train on paired image datasets
  - Develop practical skills in evaluating visual generation tasks
  - Build end-to-end pipelines for training, inference, and visualization
```
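
For the sketch-to-photo project above, the generator objective (adversarial term plus weighted L1 term) can be sketched as follows. The weight of 100 on the L1 term is the value used in the original Pix2Pix paper; treat it as a starting point rather than a requirement. Images and PatchGAN outputs are flat lists here, and the function names are hypothetical.

```python
import math


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def generator_loss(patch_logits, fake, real, l1_weight=100.0):
    """Pix2Pix-style generator loss.

    patch_logits: discriminator outputs for (sketch, generated photo),
                  one logit per image patch (PatchGAN).
    fake, real:   generated and ground-truth pixels, e.g. in [-1, 1].
    Returns (total, adversarial, l1).
    """
    # The generator wants every patch to be classified as real (label 1).
    adv = sum(bce_with_logits(l, 1.0) for l in patch_logits) / len(patch_logits)
    l1 = sum(abs(f - r) for f, r in zip(fake, real)) / len(fake)
    return adv + l1_weight * l1, adv, l1
```

The L1 term keeps the output close to the ground-truth photo, while the adversarial term pushes the generator away from the blurry averages that L1 alone tends to produce.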


```{admonition} <span style="background-color:orange;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Intermediate</span> Emoji Generation Using VAE or GAN
:class: none, dropdown, closed

This project focuses on building a **generative model for creating new emojis** by learning from a small dataset of existing emoji images. Students will implement either a **Variational Autoencoder (VAE)** or a **DCGAN** to learn the underlying distribution of emojis and sample novel, diverse outputs from the learned latent space. This project introduces students to core generative modeling concepts while being computationally lightweight and visually rewarding.

- **Goals**
  - Implement a simple VAE or GAN to model emoji distributions.
  - Learn to generate new, unseen emojis by sampling from the learned latent space.
  - Understand differences in latent sampling, reconstruction loss, and adversarial training.
  - Gain practical experience in visualizing and debugging generative models.

- **Dataset**
  - **Twemoji or EmojiOne (Open-Source Emojis)**
    - [Twemoji GitHub](https://github.com/twitter/twemoji)
    - Manually download or scrape 200–500 emojis.
    - Preprocess to grayscale or RGB and resize to **64×64** pixels.
    - Organize into a single folder with `.png` images.

  - **Optional Dataset Tooling**
    - Use `Pillow` or `OpenCV` to resize and normalize images.
    - Convert emojis to grayscale if you want simpler model training.

- **Model Options**
  - **VAE**:
    - Encoder-decoder architecture with reparameterization
    - Latent space sampling, trained with an L2 reconstruction loss plus a KL-divergence term
    - Stable training and an interpretable latent space
  - **DCGAN**:
    - Generator-discriminator adversarial setup
    - Can create sharper-looking emoji outputs
    - More sensitive to hyperparameters (optional for advanced students)

- **Technologies** 
  - **Framework**: PyTorch or TensorFlow
  - **Tools**: matplotlib for visualizing generations, torchvision for image grid rendering
  - **Training Setup**:
    - Batch size: 32
    - Resolution: 64×64
    - Epochs: 50–100 (can train in <30 mins on Colab)

- **Features**
  - Encode emojis into latent space and visualize latent vectors (for VAE).
  - Sample random latent points to generate new emoji images.
  - Interpolate between two emojis in latent space.
  - Visualize reconstructions vs original images.
  - Save generated emojis to a grid for creative viewing.

- **Evaluation Metrics**
  - **Reconstruction Loss (VAE)**: L2 or Binary Cross Entropy
  - **Latent Space Continuity**: Interpolation results
  - **Visual Diversity**: Number of unique-looking samples
  - **Optional**: t-SNE plot of latent space clusters

- **Deliverables**
  - A working VAE or GAN implementation in `emoji_vae.py` or `emoji_gan.py`
  - Training script `train.py` with config flags for model type
  - Folder with generated emoji samples over training epochs
  - Output grids for:
    - Random generations
    - Interpolations
    - Reconstructions (for VAE)
  - Notebook (`emoji_analysis.ipynb`) to visualize generations and latent space
  - README:
    - Dataset preparation steps
    - Model architecture diagram
    - Training commands
    - Sample outputs and how to evaluate them

- **Learning Outcomes**
  - Understand how latent space encoding enables creativity and diversity in generation.
  - Learn the differences between VAE (likelihood-based) and GAN (adversarial) modeling.
  - Get experience training lightweight generative models under constrained compute.
  - Build creative generative AI applications from scratch with real-world use cases.
```
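
For the emoji project above, the VAE-specific pieces (reparameterized sampling, the KL term, and latent interpolation) can be sketched in plain Python. Latent vectors are plain lists, and the function names are hypothetical; in a real model the same formulas are applied to tensors produced by the encoder.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def interpolate(z_a, z_b, steps=5):
    """Linearly interpolate between two latent vectors (needs steps >= 2)."""
    ts = [i / (steps - 1) for i in range(steps)]
    return [[(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)] for t in ts]
```

Decoding each vector returned by `interpolate` and placing the results in a row gives the interpolation grid listed in the deliverables; smooth transitions suggest a well-behaved latent space.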

### You tell me

```{admonition} <span style="background-color:#007bff;color:white;padding:1px 4px;border-radius:4px;text-decoration:none;">Open-Ended</span> Student Proposed Projects
:class: tip, dropdown, closed

This track invites students to design their own project from the ground up by identifying a **real-world problem**, formulating a clear **computer vision problem statement**, and proposing a **well-structured methodology** to solve it. The goal is not only to apply course concepts but to simulate the early stages of applied AI research or product development.

This is ideal for students who want to explore ideas beyond predefined datasets and go deeper into topics aligned with their interests (e.g., medical imaging, agriculture, autonomous systems, education, art).

- **Goals**
  - Encourage originality, creativity, and initiative.
  - Practice defining scope, constraints, and success metrics for real problems.
  - Gain experience in building end-to-end projects from ideation to implementation.
  - Learn to perform baselining, comparison, and evaluation like a research prototype.

- **Proposal Requirements**
  Students must submit a 1-page written proposal that clearly outlines:
  
  1. **Real-World Motivation**
     - Describe a real-world situation where the problem arises.
     - Explain why it matters and who it affects.
     - Example: “Farmers struggle to detect early signs of crop disease using just the naked eye, leading to delayed interventions.”

  2. **Problem Statement**
     - Frame the motivation into a specific, measurable CV problem.
     - Clearly define inputs, outputs, and the expected model behavior.
     - Example: “Given an RGB image of a crop leaf, classify whether the plant is healthy or affected by one of 5 known diseases.”

  3. **Baseline Benchmarking**
     - Identify existing methods that solve similar or related problems.
     - Describe one or more baseline models that will be used for comparison.
     - Provide links to relevant papers, GitHub repos, or dataset leaderboards.

  4. **Dataset and Preprocessing**
     - Describe what dataset will be used.
     - If it's not available publicly, describe the plan to collect or annotate it.
     - Describe the input format, image resolution, preprocessing, and splits.

  5. **Proposed Method**
     - High-level architecture (e.g., CNN, UNet, Transformer).
     - Justify why the proposed method is expected to work better or differently than the baseline.
     - Optional extensions like multi-modal inputs or data augmentations.

  6. **Evaluation Plan**
     - Define the metrics: accuracy, F1-score, PSNR, SSIM, IoU, etc.
     - Include how experiments will be logged, visualized, and interpreted.
     - Describe comparison with baselines and ablation studies (if applicable).

  7. **Deliverables**
     - Working code and trained models.
     - README and reproducible instructions.
     - Visualizations of results (e.g., confusion matrix, segmentation masks, GAN outputs).
     - Report/Notebook summarizing methodology, experiments, and learnings.

- **Examples of Past Student Projects**
  - *“Detecting potholes in road surfaces from dashcam footage”*  
  - *“Monocular 3D object detection in construction site scenes”*  
  - *“Fine-grained classification of rock types in geological samples”*  
  - *“Generating anime avatars from hand-drawn sketches”*

- **Learning Outcomes**
  - Learn to independently scope and define a real AI project.
  - Develop skills in literature review, baseline benchmarking, and critical analysis.
  - Understand the research process: identifying gaps, validating methods, and iterating.
  - Practice communicating a vision both technically and narratively.
```

## Step 3: Go through deliverables and submit in UBLearns