## Table of Contents

1. [Environment Setup & Random Seed Configuration](#1.-Environment-Setup-&-Random-Seed-Configuration)
2. [Import Libraries](#2.-Import-Libraries)
3. [Dataset Download & Verification](#3.-Dataset-Download-&-Verification)
4. [Exploratory Data Analysis (EDA)](#4.-Exploratory-Data-Analysis-(EDA))
5. [Data Preprocessing & Subset Creation](#5.-Data-Preprocessing-&-Subset-Creation)
6. [Data Augmentation](#6.-Data-Augmentation)
7. [YOLOv8 Object Detection](#7.-YOLOv8-Object-Detection)
8. [U-Net Semantic Segmentation](#8.-U-Net-Semantic-Segmentation)
9. [Model Comparison & Discussion](#9.-Model-Comparison-&-Discussion)
10. [Conclusions & Future Work](#10.-Conclusions-&-Future-Work)

---
## 1. Environment Setup & Random Seed Configuration

Setting all random seeds for reproducibility across:
- Python's built-in random module
- NumPy
- PyTorch (CPU and CUDA)
- Python hash seed (`PYTHONHASHSEED`, which takes effect for interpreters started after it is set)

In [None]:
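# Set the Python hash seed environment variable.
# Note: CPython reads PYTHONHASHSEED at interpreter startup, so this assignment affects
# subprocesses started from this kernel, not the hashing of the already-running session.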
import os
os.environ['PYTHONHASHSEED'] = '0'

# Import random libraries
import random
import numpy as np

# Set random seeds
RANDOM_SEED = 42

random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# PyTorch seeds (will be set after importing torch)
print("✓ Python hash seed set to 0")
print(f"✓ Random seed set to {RANDOM_SEED}")
print(f"✓ NumPy seed set to {RANDOM_SEED}")
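
A note on the hash seed: because `PYTHONHASHSEED` is read only when an interpreter starts, its effect is easiest to observe in a child process. The sketch below launches fresh interpreters with the variable set and compares the string hashes they report. `child_hash` is a hypothetical helper introduced only for this illustration; it is not part of the notebook.

```python
import os
import subprocess
import sys


def child_hash(text: str, hash_seed: str) -> int:
    """Return hash(text) computed by a fresh interpreter started with PYTHONHASHSEED=hash_seed."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Two separate interpreters started with the same seed agree on string hashes.
first = child_hash("taco", "0")
second = child_hash("taco", "0")
print(first == second)
```

To fix the hash seed for the notebook session itself, the variable would need to be set before the kernel starts, for example in the shell that launches Jupyter.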

In [None]:
# Import PyTorch and set its random seeds
import torch

torch.manual_seed(RANDOM_SEED)
torch.cuda.manual_seed(RANDOM_SEED)
torch.cuda.manual_seed_all(RANDOM_SEED)

# Additional PyTorch reproducibility settings
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

print(f"✓ PyTorch seed set to {RANDOM_SEED}")
print(f"✓ PyTorch CUDA seed set to {RANDOM_SEED}")
print("✓ CUDNN deterministic mode enabled")
print("✓ CUDNN benchmark disabled for reproducibility")

# Check GPU availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"\n✓ Using device: {device}")
if torch.cuda.is_available():
    print(f"  GPU: {torch.cuda.get_device_name(0)}")
    print(f"  CUDA Version: {torch.version.cuda}")
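
The seeding steps in this section could also be gathered into one reusable function, which is convenient when an experiment has to be re-seeded before each training run. The following is a minimal sketch, not the notebook's own code: `seed_everything` is a hypothetical name, and the guarded imports are an assumption made so that the sketch still runs where NumPy or PyTorch is not installed.

```python
import random


def seed_everything(seed: int) -> list:
    """Seed every available random number generator; return the names of those seeded."""
    seeded = ["random"]
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass  # NumPy not installed; nothing to seed

    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        seeded.append("torch")
    except ImportError:
        pass  # PyTorch not installed; nothing to seed

    return seeded
```

Calling `seed_everything(RANDOM_SEED)` at the top of a training cell would reset all generators to the same starting state.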

---
## 2. Import Libraries

Importing all required libraries for:
- Data manipulation and analysis
- Image processing
- Deep learning (PyTorch, YOLOv8)
- Visualization
- COCO dataset handling

In [None]:
# Data manipulation and analysis
import pandas as pd
from collections import Counter, defaultdict
import json
from pathlib import Path
import shutil
from tqdm import tqdm

# Image processing
import cv2
from PIL import Image
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Deep Learning - PyTorch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import torchvision
from torchvision import transforms

# YOLOv8
from ultralytics import YOLO

# Visualization
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import seaborn as sns

# COCO tools
from pycocotools.coco import COCO
from pycocotools import mask as coco_mask

# Metrics
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Warnings
import warnings
warnings.filterwarnings('ignore')

# Set matplotlib style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✓ All libraries imported successfully!")
print("\nLibrary Versions:")
print(f"  PyTorch: {torch.__version__}")
print(f"  Torchvision: {torchvision.__version__}")
print(f"  NumPy: {np.__version__}")
print(f"  Pandas: {pd.__version__}")
print(f"  OpenCV: {cv2.__version__}")
print(f"  Albumentations: {A.__version__}")

---
## 3. Dataset Download & Verification

### 3.1 Download TACO Dataset

The TACO dataset can be downloaded in one of two ways:
1. **Kaggle API** (recommended, automated)
2. **Manual download** from the Kaggle website

In [None]:
# Define project directories
PROJECT_ROOT = Path.cwd()
DATA_DIR = PROJECT_ROOT / 'data'
TACO_DIR = DATA_DIR / 'TACO'
RUNS_DIR = PROJECT_ROOT / 'runs'
WEIGHTS_DIR = PROJECT_ROOT / 'weights'

# Create directories
DATA_DIR.mkdir(exist_ok=True)
RUNS_DIR.mkdir(exist_ok=True)
WEIGHTS_DIR.mkdir(exist_ok=True)

print("Project directory structure:")
print(f"  Project Root: {PROJECT_ROOT}")
print(f"  Data Directory: {DATA_DIR}")
print(f"  TACO Directory: {TACO_DIR}")
print(f"  Runs Directory: {RUNS_DIR}")
print(f"  Weights Directory: {WEIGHTS_DIR}")

In [None]:
# Check if dataset already exists
if TACO_DIR.exists() and len(list(TACO_DIR.glob('*'))) > 0:
    print("✓ TACO dataset already exists!")
    print(f"  Location: {TACO_DIR}")
else:
    print("Dataset not found. Please download using one of the following methods:\n")
    
    print("Method 1: Kaggle API (Recommended)")
    print("-" * 50)
    print("1. Install Kaggle API: pip install kaggle")
    print("2. Setup Kaggle credentials (kaggle.json)")
    print("3. Run the following commands:\n")
    print("   kaggle datasets download -d kneroma/tacotrashdataset")
    print(f"   unzip tacotrashdataset.zip -d {DATA_DIR}")
    print("\nMethod 2: Manual Download")
    print("-" * 50)
    print("1. Visit: https://www.kaggle.com/datasets/kneroma/tacotrashdataset")
    print("2. Download the dataset")
    print(f"3. Extract to: {DATA_DIR}")
    print("\nNote: Uncomment and run the cell below to download via Kaggle API")

In [None]:
# Uncomment to download using Kaggle API
# !pip install kaggle
# !kaggle datasets download -d kneroma/tacotrashdataset
# import zipfile
# with zipfile.ZipFile('tacotrashdataset.zip', 'r') as zip_ref:
#     zip_ref.extractall(DATA_DIR)
# print("✓ Dataset downloaded and extracted successfully!")
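
The extraction step can be wrapped so that re-running the notebook does not unpack the archive a second time. This is a sketch under stated assumptions: `extract_if_missing` is a hypothetical helper, and skipping extraction whenever the target directory already has content is a design choice made here, not something the dataset requires.

```python
import zipfile
from pathlib import Path


def extract_if_missing(zip_path: Path, dest_dir: Path) -> bool:
    """Extract zip_path into dest_dir unless dest_dir already has content.

    Returns True if extraction ran, False if it was skipped.
    """
    if dest_dir.exists() and any(dest_dir.iterdir()):
        return False  # something is already there; do not overwrite it
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is missing or not a valid zip archive")
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return True
```

Which directory to pass as `dest_dir` depends on the archive's internal folder layout, so it is worth inspecting the result of the first extraction before relying on paths such as `TACO_DIR`.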

### 3.2 Verify Dataset Structure and COCO Annotations

In [None]:
# Verify dataset structure
if TACO_DIR.exists():
    print("Dataset Structure:")
    print("-" * 50)
    
    # List all subdirectories and files
    for item in sorted(TACO_DIR.glob('*')):
        if item.is_dir():
            file_count = len(list(item.glob('*')))
            print(f"📁 {item.name}/ ({file_count} items)")
        else:
            file_size = item.stat().st_size / (1024 * 1024)  # Convert to MB
            print(f"📄 {item.name} ({file_size:.2f} MB)")
    
    # Look for annotation files
    annotation_files = list(TACO_DIR.rglob('*.json'))
    print(f"\nFound {len(annotation_files)} annotation file(s):")
    for ann_file in annotation_files:
        print(f"  - {ann_file.relative_to(TACO_DIR)}")
else:
    print("⚠️ Dataset not found. Please download the dataset first.")

In [None]:
# Load and verify COCO annotations
# Note: Update the annotation file path based on actual dataset structure

# Common TACO annotation file paths
possible_ann_paths = [
    TACO_DIR / 'annotations.json',
    TACO_DIR / 'annotations' / 'instances_default.json',
    TACO_DIR / 'TACO' / 'annotations.json',
]

ANNOTATION_FILE = None
for path in possible_ann_paths:
    if path.exists():
        ANNOTATION_FILE = path
        break

if ANNOTATION_FILE:
    print(f"✓ Found annotation file: {ANNOTATION_FILE.name}")
    
    # Load COCO annotations
    with open(ANNOTATION_FILE, 'r') as f:
        coco_data = json.load(f)
    
    print("\nCOCO Format Verification:")
    print("-" * 50)
    print(f"  Images: {len(coco_data.get('images', []))}")
    print(f"  Annotations: {len(coco_data.get('annotations', []))}")
    print(f"  Categories: {len(coco_data.get('categories', []))}")
    
    # Display first few categories
    print("\nSample Categories:")
    for cat in coco_data.get('categories', [])[:10]:
        print(f"  ID {cat['id']}: {cat['name']}")
    
    if len(coco_data.get('categories', [])) > 10:
        print(f"  ... and {len(coco_data.get('categories', [])) - 10} more")
    
    print("\n✓ COCO format verified successfully!")
else:
    print("⚠️ Annotation file not found. Please verify dataset structure.")
    print("   Expected locations:")
    for path in possible_ann_paths:
        print(f"   - {path}")
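
Counting the top-level entries confirms that the file loads, but it does not show that the annotations are internally consistent. A stricter check could look roughly like the sketch below, which assumes the standard COCO fields (`id`, `image_id`, `category_id`, and `bbox` as `[x, y, width, height]`). `check_coco_consistency` is a hypothetical helper, not part of `pycocotools`.

```python
def check_coco_consistency(coco: dict) -> list:
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if key not in coco:
            problems.append(f"missing top-level key: {key}")
    if problems:
        return problems  # cannot cross-check without all three sections

    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}

    for ann in coco["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']} references unknown image {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']} references unknown category {ann['category_id']}")
        bbox = ann.get("bbox", [])
        # A valid box has four values and a positive width and height.
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann['id']} has an invalid bbox: {bbox}")
    return problems
```

In this notebook it would be called as `check_coco_consistency(coco_data)`; an empty list means none of these particular problems were found.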

### 3.3 Find Image Directory

In [None]:
# Locate image directory
possible_img_dirs = [
    TACO_DIR / 'images',
    TACO_DIR / 'data',
    TACO_DIR / 'TACO' / 'images',
    TACO_DIR,
]

IMAGE_DIR = None
for img_dir in possible_img_dirs:
    if img_dir.exists():
        # Check if directory contains image files
        img_files = list(img_dir.glob('*.jpg')) + list(img_dir.glob('*.png'))
        if len(img_files) > 0:
            IMAGE_DIR = img_dir
            print(f"✓ Found image directory: {IMAGE_DIR.name}")
            print(f"  Total images: {len(img_files)}")
            break

if not IMAGE_DIR:
    print("⚠️ Image directory not found. Please verify dataset structure.")
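
The search above looks only at the top level of each candidate directory and matches lowercase `.jpg` and `.png` names. If the extracted archive keeps its images in subfolders or uses uppercase extensions such as `.JPG`, those files would be missed. A recursive, case-insensitive variant might look like this sketch; `find_images` and `IMAGE_EXTENSIONS` are hypothetical names, and the extension set is an assumption.

```python
from pathlib import Path

# Assumed set of image extensions; extend as needed for the actual dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_images(root: Path) -> list:
    """Recursively collect image files under root, matching extensions case-insensitively."""
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `len(find_images(TACO_DIR))` would give the total number of images regardless of how deeply they are nested.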

---
## 4. Exploratory Data Analysis (EDA)

Coming next: Comprehensive analysis of the TACO dataset including:
- Dataset statistics
- Class distribution and frequency
- Object count per image
- Class imbalance visualization
- Sample images with annotations

In [None]:
# Placeholder for EDA section
print("EDA section will be implemented in the next milestone")
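
As a starting point for the class distribution and object-count items listed above, both statistics can be computed directly from the loaded COCO dictionary. This is a minimal sketch that assumes the standard COCO fields; the function names are hypothetical.

```python
from collections import Counter


def class_frequencies(coco: dict) -> Counter:
    """Count annotations per category name."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


def objects_per_image(coco: dict) -> Counter:
    """Count annotations per image id; images without annotations are kept with a count of 0."""
    counts = Counter({img["id"]: 0 for img in coco["images"]})
    counts.update(ann["image_id"] for ann in coco["annotations"])
    return counts
```

With the notebook's data, `class_frequencies(coco_data).most_common(10)` would list the ten most frequent classes, which is also a quick way to see class imbalance.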

---
## 5. Data Preprocessing & Subset Creation

Coming next: Filter dataset to top 5 most frequent classes

In [None]:
# Placeholder for preprocessing section
print("Preprocessing section will be implemented in the next milestone")
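
One possible shape for the top-5 filter is sketched below. It makes two assumptions that the final implementation may not share: kept category ids are remapped to a contiguous range starting at 0 (most frequent first), and images left without any annotation are dropped. `filter_top_k_classes` is a hypothetical name.

```python
from collections import Counter


def filter_top_k_classes(coco: dict, k: int = 5) -> dict:
    """Return a new COCO-style dict restricted to the k most frequent categories."""
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    keep_ids = [cat_id for cat_id, _ in counts.most_common(k)]
    # Remap kept category ids to 0..k-1, most frequent first (an assumption of this sketch).
    remap = {old: new for new, old in enumerate(keep_ids)}

    categories = sorted(
        ({**cat, "id": remap[cat["id"]]} for cat in coco["categories"] if cat["id"] in remap),
        key=lambda cat: cat["id"],
    )
    annotations = [
        {**ann, "category_id": remap[ann["category_id"]]}
        for ann in coco["annotations"] if ann["category_id"] in remap
    ]
    # Keep only images that still have at least one annotation.
    used_images = {ann["image_id"] for ann in annotations}
    images = [img for img in coco["images"] if img["id"] in used_images]
    return {**coco, "images": images, "annotations": annotations, "categories": categories}
```

The input dictionary is left unchanged, so the full dataset stays available for the EDA section.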

---
## 6. Data Augmentation

Coming next: Implement augmentation strategies

In [None]:
# Placeholder for augmentation section
print("Augmentation section will be implemented in the next milestone")

---
## 7. YOLOv8 Object Detection

Coming next: Train and evaluate YOLOv8 model

In [None]:
# Placeholder for YOLOv8 section
print("YOLOv8 section will be implemented in the next milestone")
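
One preparatory step this section will likely need is label conversion: COCO stores each box as `[x_min, y_min, width, height]` in pixels, while YOLO label files use one line per object with a class index followed by the box center and size, all normalized by the image dimensions. The sketch below shows that arithmetic; the function names are hypothetical and the six-decimal formatting is an arbitrary choice.

```python
def coco_bbox_to_yolo(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to
    YOLO format (x_center, y_center, width, height), normalized to [0, 1]."""
    x_min, y_min, box_w, box_h = bbox
    x_center = (x_min + box_w / 2) / image_width
    y_center = (y_min + box_h / 2) / image_height
    return (x_center, y_center, box_w / image_width, box_h / image_height)


def yolo_label_line(class_index, bbox, image_width, image_height):
    """Format one YOLO label line: class index followed by the normalized box."""
    values = coco_bbox_to_yolo(bbox, image_width, image_height)
    return f"{class_index} " + " ".join(f"{value:.6f}" for value in values)
```

For a 400 by 200 pixel image, the COCO box `[100, 50, 200, 100]` is centered in the image and covers half of each dimension, so all four normalized values are 0.5.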

---
## 8. U-Net Semantic Segmentation

Coming next: Build and train U-Net model

In [None]:
# Placeholder for U-Net section
print("U-Net section will be implemented in the next milestone")

---
## 9. Model Comparison & Discussion

Coming next: Compare YOLO vs U-Net performance

In [None]:
# Placeholder for comparison section
print("Comparison section will be implemented in the next milestone")

---
## 10. Conclusions & Future Work

Coming next: Final conclusions and recommendations

In [None]:
# Placeholder for conclusions section
print("Conclusions section will be implemented in the next milestone")

---
## End of Notebook

**Project:** YOLOv8 + U-Net Waste Detection and Segmentation  
**Course:** Deep Learning for Perception (CS4045)  
**Authors:** Minahil Ali (22i-0849), Ayaan Khan (22i-0832)  