{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# İstanbul Drone Waste Detection: Synthetic Data Analysis Report\n",
    "\n",
    "This notebook analyzes a waste detection model trained on synthetic data for drone-based monitoring in İstanbul, covering the dataset, the training run, and evaluation on the synthetic test set.\n",
    "\n",
    "**Author:** [Your Name]  \n",
    "**Date:** [Current Date]\n",
    "\n",
    "## Table of Contents\n",
    "1. [Introduction](#introduction)\n",
    "2. [Dataset Analysis](#dataset-analysis)\n",
    "   - [Dataset Overview](#dataset-overview)\n",
    "   - [Class Distribution](#class-distribution)\n",
    "   - [Environmental Variations](#environmental-variations)\n",
    "   - [Lighting Conditions](#lighting-conditions)\n",
    "   - [Sample Visualization](#sample-visualization)\n",
    "3. [Training Analysis](#training-analysis)\n",
    "   - [Training Configuration](#training-configuration)\n",
    "   - [Training Curves](#training-curves)\n",
    "4. [Model Evaluation](#model-evaluation)\n",
    "   - [Overall Performance](#overall-performance)\n",
    "   - [Per-Class Performance](#per-class-performance)\n",
    "   - [Qualitative Detection Examples](#qualitative-detection-examples)\n",
    "5. [Analysis of Sim-to-Real Gap](#analysis-of-sim-to-real-gap)\n",
    "   - [Strengths of Synthetic Data](#strengths-of-synthetic-data)\n",
    "   - [Weaknesses of Synthetic Data](#weaknesses-of-synthetic-data)\n",
    "   - [Gap Analysis](#gap-analysis)\n",
    "6. [Conclusions and Future Work](#conclusions-and-future-work)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Import necessary libraries\n",
    "import os\n",
    "import random\n",
    "import sys\n",
    "import json\n",
    "import yaml\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import cv2\n",
    "from pathlib import Path\n",
    "from IPython.display import display, Markdown, HTML\n",
    "\n",
    "# Set plotting style\n",
    "plt.style.use('fivethirtyeight')\n",
    "sns.set_context(\"notebook\", font_scale=1.2)\n",
    "\n",
    "# Set paths\n",
    "DATASET_PATH = Path(\"../dataset\")\n",
    "WEIGHTS_PATH = Path(\"../weights\")\n",
    "DOCS_PATH = Path(\"../docs\")\n",
    "FIGURES_PATH = DOCS_PATH / \"figures\"\n",
    "\n",
    "# Create figures directory if it doesn't exist\n",
    "FIGURES_PATH.mkdir(parents=True, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "This report analyzes the effectiveness of using synthetic data for training waste detection models for drone-based monitoring in İstanbul. The synthetic data pipeline uses AirSim and Unreal Engine to generate realistic imagery of waste items across four distinct İstanbul environments:\n",
    "\n",
    "1. Bosphorus waterfront (Beşiktaş–Ortaköy style)\n",
    "2. Balat/Karaköy narrow streets\n",
    "3. Yıldız Park\n",
    "4. Modern urban plaza\n",
    "\n",
    "The main objectives of this report are to:\n",
    "\n",
    "1. Analyze the properties and quality of the generated synthetic dataset\n",
    "2. Evaluate the performance of a YOLOv8 detector trained on this synthetic data\n",
    "3. Assess the sim-to-real gap and provide recommendations for future work"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataset Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Load dataset metadata\n",
    "metadata_path = DATASET_PATH / \"dataset_metadata.json\"\n",
    "with open(metadata_path, 'r') as f:\n",
    "    dataset_metadata = json.load(f)\n",
    "\n",
    "# Display basic dataset information\n",
    "print(f\"Dataset Name: {dataset_metadata['name']}\")\n",
    "print(f\"Total Images: {dataset_metadata['total_images']}\")\n",
    "print(f\"Format: {dataset_metadata['format']}\")\n",
    "print(f\"Splits: {dataset_metadata['splits']}\")\n",
    "print(f\"Number of Classes: {len(dataset_metadata['classes'])}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Class Distribution\n",
    "\n",
    "Let's analyze the distribution of waste object classes in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Extract class counts\n",
    "class_names = [c['name'] for c in dataset_metadata['classes']]\n",
    "class_counts = [c['count']['total'] for c in dataset_metadata['classes']]\n",
    "\n",
    "# Create DataFrame for easier analysis\n",
    "class_df = pd.DataFrame({\n",
    "    'Class': class_names,\n",
    "    'Count': class_counts,\n",
    "    'Train': [c['count']['train'] for c in dataset_metadata['classes']],\n",
    "    'Val': [c['count']['val'] for c in dataset_metadata['classes']],\n",
    "    'Test': [c['count']['test'] for c in dataset_metadata['classes']],\n",
    "})\n",
    "\n",
    "# Calculate percentages\n",
    "total_objects = sum(class_counts)\n",
    "class_df['Percentage'] = class_df['Count'] / total_objects * 100\n",
    "\n",
    "# Display class distribution table\n",
    "display(class_df)\n",
    "\n",
    "# Plot class distribution\n",
    "plt.figure(figsize=(12, 6))\n",
    "ax = sns.barplot(x='Class', y='Count', data=class_df, palette='viridis')\n",
    "plt.title('Waste Class Distribution in Synthetic Dataset')\n",
    "plt.xticks(rotation=45, ha='right')\n",
    "plt.tight_layout()\n",
    "\n",
    "# Add count labels on top of bars\n",
    "for i, count in enumerate(class_counts):\n",
    "    ax.text(i, count + 50, str(count), ha='center', va='bottom', fontweight='bold')\n",
    "\n",
    "# Save figure\n",
    "plt.savefig(FIGURES_PATH / \"class_distribution.png\", dpi=300, bbox_inches='tight')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Environmental Variations\n",
    "\n",
    "Let's analyze the distribution of environment types in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Extract environment statistics\n",
    "env_stats = dataset_metadata['statistics']['environments']\n",
    "env_names = list(env_stats.keys())\n",
    "env_counts = list(env_stats.values())\n",
    "\n",
    "# Create DataFrame\n",
    "env_df = pd.DataFrame({\n",
    "    'Environment': env_names,\n",
    "    'Count': env_counts\n",
    "})\n",
    "env_df['Percentage'] = env_df['Count'] / sum(env_counts) * 100\n",
    "\n",
    "# Display environment distribution\n",
    "display(env_df)\n",
    "\n",
    "# Plot environment distribution\n",
    "plt.figure(figsize=(10, 6))\n",
    "ax = sns.barplot(x='Environment', y='Count', data=env_df, palette='Blues_d')\n",
    "plt.title('Environment Distribution in Synthetic Dataset')\n",
    "plt.xticks(rotation=45, ha='right')\n",
    "plt.tight_layout()\n",
    "\n",
    "# Add count labels on top of bars\n",
    "for i, count in enumerate(env_counts):\n",
    "    ax.text(i, count + 10, str(count), ha='center', va='bottom', fontweight='bold')\n",
    "\n",
    "# Save figure\n",
    "plt.savefig(FIGURES_PATH / \"environment_distribution.png\", dpi=300, bbox_inches='tight')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lighting Conditions\n",
    "\n",
    "Let's analyze the distribution of lighting conditions in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Extract lighting statistics\n",
    "light_stats = dataset_metadata['statistics']['lighting_conditions']\n",
    "light_names = list(light_stats.keys())\n",
    "light_counts = list(light_stats.values())\n",
    "\n",
    "# Create DataFrame\n",
    "light_df = pd.DataFrame({\n",
    "    'Lighting': light_names,\n",
    "    'Count': light_counts\n",
    "})\n",
    "light_df['Percentage'] = light_df['Count'] / sum(light_counts) * 100\n",
    "\n",
    "# Display lighting distribution\n",
    "display(light_df)\n",
    "\n",
    "# Plot lighting distribution\n",
    "plt.figure(figsize=(10, 6))\n",
    "ax = sns.barplot(x='Lighting', y='Count', data=light_df, palette='YlOrRd')\n",
    "plt.title('Lighting Condition Distribution in Synthetic Dataset')\n",
    "plt.tight_layout()\n",
    "\n",
    "# Add count labels on top of bars\n",
    "for i, count in enumerate(light_counts):\n",
    "    ax.text(i, count + 10, str(count), ha='center', va='bottom', fontweight='bold')\n",
    "\n",
    "# Save figure\n",
    "plt.savefig(FIGURES_PATH / \"lighting_distribution.png\", dpi=300, bbox_inches='tight')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sample Visualization\n",
    "\n",
    "Let's visualize a few sample images from each environment and lighting condition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "def load_sample_images(num_samples=3):\n",
    "    \"\"\"Load sample images from each environment and lighting condition.\"\"\"\n",
    "    # Load metadata with full image paths\n",
    "    try:\n",
    "        metadata_path = DATASET_PATH / \"metadata.json\"\n",
    "        with open(metadata_path, 'r') as f:\n",
    "            full_metadata = json.load(f)\n",
    "        \n",
    "        all_images = full_metadata.get('images', [])\n",
    "        \n",
    "        # Group images by environment\n",
    "        env_images = {}\n",
    "        for env_name in env_names:\n",
    "            env_imgs = [img for img in all_images if img.get('environment') == env_name]\n",
    "            if env_imgs:\n",
    "                # Sample up to num_samples images\n",
    "                env_images[env_name] = random.sample(env_imgs, min(num_samples, len(env_imgs)))\n",
    "        \n",
    "        # Group images by lighting condition\n",
    "        light_images = {}\n",
    "        for light_name in light_names:\n",
    "            light_imgs = [img for img in all_images if img.get('lighting', {}).get('condition') == light_name]\n",
    "            if light_imgs:\n",
    "                # Sample up to num_samples images\n",
    "                light_images[light_name] = random.sample(light_imgs, min(num_samples, len(light_imgs)))\n",
    "        \n",
    "        return env_images, light_images\n",
    "    except Exception as e:\n",
    "        print(f\"Error loading sample images: {e}\")\n",
    "        return {}, {}\n",
    "\n",
    "def display_sample_images(image_dict, title, source_dir='images'):\n",
    "    \"\"\"Display sample images from each category.\"\"\"\n",
    "    if not image_dict:\n",
    "        print(f\"No sample images available for {title}\")\n",
    "        return\n",
    "    \n",
    "    # Create figure\n",
    "    num_categories = len(image_dict)\n",
    "    num_samples = max(len(imgs) for imgs in image_dict.values())\n",
    "    \n",
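    "    # squeeze=False keeps axes 2-D even when there is a single row or column\n",
    "    fig, axes = plt.subplots(num_categories, num_samples, figsize=(15, 3*num_categories), squeeze=False)\n",
    "    fig.suptitle(f'Sample Images by {title}', fontsize=16)\n",
    "    \n",
    "    # Hide every axis up front so unused grid cells stay blank\n",
    "    for ax in axes.ravel():\n",
    "        ax.axis('off')\n",
    "    \n",
    "    # Display images\n",
    "    for i, (category, imgs) in enumerate(image_dict.items()):\n",
    "        for j, img_info in enumerate(imgs):\n",
    "            try:\n",
    "                img_path = DATASET_PATH / source_dir / img_info['filename']\n",
    "                img = cv2.imread(str(img_path))\n",
    "                if img is None:  # cv2.imread returns None instead of raising\n",
    "                    raise FileNotFoundError(f\"Could not read {img_path}\")\n",
    "                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n",
    "                \n",
    "                ax = axes[i, j]\n",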
    "                ax.imshow(img)\n",
    "                ax.set_title(f\"{category}\")\n",
    "                ax.axis('off')\n",
    "            except Exception as e:\n",
    "                print(f\"Error displaying image {img_info.get('filename')}: {e}\")\n",
    "    \n",
    "    plt.tight_layout(rect=[0, 0, 1, 0.96])\n",
    "    plt.savefig(FIGURES_PATH / f\"{title.lower().replace(' ', '_')}_samples.png\", dpi=300, bbox_inches='tight')\n",
    "    plt.show()\n",
    "\n",
    "# Import random module\n",
    "import random\n",
    "\n",
    "# Load and display sample images\n",
    "env_images, light_images = load_sample_images()\n",
    "\n",
    "# Display environment samples\n",
    "display_sample_images(env_images, \"Environment Type\")\n",
    "\n",
    "# Display lighting condition samples\n",
    "display_sample_images(light_images, \"Lighting Condition\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Load training results\n",
    "training_results_path = WEIGHTS_PATH / \"training_results.json\"\n",
    "if training_results_path.exists():\n",
    "    with open(training_results_path, 'r') as f:\n",
    "        training_results = json.load(f)\n",
    "    \n",
    "    # Display training configuration\n",
    "    print(\"Training Configuration:\")\n",
    "    print(f\"Model: {training_results['model']['name']}\")\n",
    "    print(f\"Pretrained: {training_results['model']['pretrained']}\")\n",
    "    print(f\"Epochs: {training_results['training']['epochs']}\")\n",
    "    print(f\"Batch Size: {training_results['training']['batch_size']}\")\n",
    "    print(f\"Image Size: {training_results['training']['image_size']}\")\n",
    "    print(f\"Optimizer: {training_results['training']['optimizer']}\")\n",
    "    print(f\"Learning Rate: {training_results['training']['learning_rate']}\")\n",
    "    print(f\"Timestamp: {training_results['timestamp']}\")\n",
    "else:\n",
    "    print(\"Training results not found.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training Curves\n",
    "\n",
    "Let's visualize the training curves to analyze the model's training progress."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Try to find and load training history from YOLOv8 results directory\n",
    "try:\n",
    "    yolo_results_dir = Path(\"../runs/detect\")\n",
    "    \n",
    "    # Find the latest results directory\n",
    "    result_dirs = list(yolo_results_dir.glob(\"synthetic_only*\"))\n",
    "    if result_dirs:\n",
    "        latest_dir = max(result_dirs, key=lambda p: p.stat().st_mtime)\n",
    "        results_csv = latest_dir / \"results.csv\"\n",
    "        \n",
    "        if results_csv.exists():\n",
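    "            # Load results. Ultralytics results.csv files may pad column names with\n",
    "            # spaces and suffix box metrics with '(B)', so normalize both before plotting\n",
    "            train_df = pd.read_csv(results_csv)\n",
    "            train_df.columns = train_df.columns.str.strip().str.replace('(B)', '', regex=False)\n",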
    "            \n",
    "            # Plot training curves\n",
    "            plt.figure(figsize=(12, 8))\n",
    "            \n",
    "            # Plot losses\n",
    "            plt.subplot(2, 2, 1)\n",
    "            plt.plot(train_df['epoch'], train_df['train/box_loss'], label='Train Box Loss')\n",
    "            plt.plot(train_df['epoch'], train_df['val/box_loss'], label='Val Box Loss')\n",
    "            plt.title('Box Loss')\n",
    "            plt.xlabel('Epoch')\n",
    "            plt.ylabel('Loss')\n",
    "            plt.legend()\n",
    "            plt.grid(True)\n",
    "            \n",
    "            plt.subplot(2, 2, 2)\n",
    "            plt.plot(train_df['epoch'], train_df['train/cls_loss'], label='Train Cls Loss')\n",
    "            plt.plot(train_df['epoch'], train_df['val/cls_loss'], label='Val Cls Loss')\n",
    "            plt.title('Class Loss')\n",
    "            plt.xlabel('Epoch')\n",
    "            plt.ylabel('Loss')\n",
    "            plt.legend()\n",
    "            plt.grid(True)\n",
    "            \n",
    "            # Plot metrics\n",
    "            plt.subplot(2, 2, 3)\n",
    "            plt.plot(train_df['epoch'], train_df['metrics/precision'], label='Precision')\n",
    "            plt.plot(train_df['epoch'], train_df['metrics/recall'], label='Recall')\n",
    "            plt.title('Precision and Recall')\n",
    "            plt.xlabel('Epoch')\n",
    "            plt.ylabel('Value')\n",
    "            plt.legend()\n",
    "            plt.grid(True)\n",
    "            \n",
    "            plt.subplot(2, 2, 4)\n",
    "            plt.plot(train_df['epoch'], train_df['metrics/mAP50'], label='mAP@0.5')\n",
    "            plt.plot(train_df['epoch'], train_df['metrics/mAP50-95'], label='mAP@0.5:0.95')\n",
    "            plt.title('mAP')\n",
    "            plt.xlabel('Epoch')\n",
    "            plt.ylabel('Value')\n",
    "            plt.legend()\n",
    "            plt.grid(True)\n",
    "            \n",
    "            plt.tight_layout()\n",
    "            plt.savefig(FIGURES_PATH / \"training_curves.png\", dpi=300, bbox_inches='tight')\n",
    "            plt.show()\n",
    "            \n",
    "            # Display final metrics\n",
    "            final_metrics = train_df.iloc[-1]\n",
    "            print(f\"Final Training Metrics (Epoch {int(final_metrics['epoch'])})\")\n",
    "            print(f\"mAP@0.5: {final_metrics['metrics/mAP50']:.4f}\")\n",
    "            print(f\"mAP@0.5:0.95: {final_metrics['metrics/mAP50-95']:.4f}\")\n",
    "            print(f\"Precision: {final_metrics['metrics/precision']:.4f}\")\n",
    "            print(f\"Recall: {final_metrics['metrics/recall']:.4f}\")\n",
    "        else:\n",
    "            print(\"Training results CSV not found.\")\n",
    "    else:\n",
    "        print(\"No YOLOv8 training results found.\")\n",
    "except Exception as e:\n",
    "    print(f\"Error loading training curves: {e}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Evaluation\n",
    "\n",
    "Let's analyze the evaluation results on the synthetic test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "# Load evaluation metrics\n",
    "eval_metrics_path = DOCS_PATH / \"evaluation\" / \"evaluation_metrics.json\"\n",
    "if eval_metrics_path.exists():\n",
    "    with open(eval_metrics_path, 'r') as f:\n",
    "        eval_metrics = json.load(f)\n",
    "    \n",
    "    # Display overall metrics\n",
    "    overall = eval_metrics['overall']\n",
    "    print(\"Overall Evaluation Metrics:\")\n",
    "    print(f\"Precision: {overall['precision']:.4f}\")\n",
    "    print(f\"Recall: {overall['recall']:.4f}\")\n",
    "    print(f\"mAP@0.5: {overall['mAP50']:.4f}\")\n",
    "    print(f\"mAP@0.5:0.95: {overall['mAP50-95']:.4f}\")\n",
    "    \n",
    "    # Display class metrics\n",
    "    if 'class' in eval_metrics:\n",
    "        class_metrics = pd.DataFrame(eval_metrics['class'])\n",
    "        display(class_metrics)\n",
    "        \n",
    "        # Plot class metrics\n",
    "        plt.figure(figsize=(12, 6))\n",
    "        plt.subplot(1, 2, 1)\n",
    "        sns.barplot(x='class_name', y='precision', data=class_metrics, color='skyblue')\n",
    "        plt.title('Precision by Class')\n",
    "        plt.xticks(rotation=45, ha='right')\n",
    "        plt.ylim(0, 1.0)\n",
    "        \n",
    "        plt.subplot(1, 2, 2)\n",
    "        sns.barplot(x='class_name', y='recall', data=class_metrics, color='lightgreen')\n",
    "        plt.title('Recall by Class')\n",
    "        plt.xticks(rotation=45, ha='right')\n",
    "        plt.ylim(0, 1.0)\n",
    "        \n",
    "        plt.tight_layout()\n",
    "        plt.savefig(FIGURES_PATH / \"precision_recall_by_class.png\", dpi=300, bbox_inches='tight')\n",
    "        plt.show()\n",
    "        \n",
    "        plt.figure(figsize=(12, 6))\n",
    "        plt.subplot(1, 2, 1)\n",
    "        sns.barplot(x='class_name', y='mAP50', data=class_metrics, color='coral')\n",
    "        plt.title('mAP@0.5 by Class')\n",
    "        plt.xticks(rotation=45, ha='right')\n",
    "        plt.ylim(0, 1.0)\n",
    "        \n",
    "        plt.subplot(1, 2, 2)\n",
    "        sns.barplot(x='class_name', y='mAP50-95', data=class_metrics, color='lightpink')\n",
    "        plt.title('mAP@0.5:0.95 by Class')\n",
    "        plt.xticks(rotation=45, ha='right')\n",
    "        plt.ylim(0, 1.0)\n",
    "        \n",
    "        plt.tight_layout()\n",
    "        plt.savefig(FIGURES_PATH / \"map_by_class.png\", dpi=300, bbox_inches='tight')\n",
    "        plt.show()\n",
    "else:\n",
    "    print(\"Evaluation metrics not found.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Qualitative Detection Examples\n",
    "\n",
    "Let's visualize some example detections from the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "source": [
    "def load_detection_examples(num_samples=5):\n",
    "    \"\"\"Load sample detections from the test set.\"\"\"\n",
    "    try:\n",
    "        # Find prediction directory (should be created by YOLOv8 val)\n",
    "        pred_dirs = list(Path(\"../runs/detect\").glob(\"val*\"))\n",
    "        if not pred_dirs:\n",
    "            print(\"No prediction results found.\")\n",
    "            return []\n",
    "        \n",
    "        latest_dir = max(pred_dirs, key=lambda p: p.stat().st_mtime)\n",
    "        pred_dir = latest_dir / \"predictions\"\n",
    "        \n",
    "        if not pred_dir.exists():\n",
    "            print(f\"Prediction directory not found: {pred_dir}\")\n",
    "            return []\n",
    "        \n",
    "        # Get prediction images\n",
    "        pred_images = list(pred_dir.glob(\"*.jpg\"))\n",
    "        if not pred_images:\n",
    "            print(\"No prediction images found.\")\n",
    "            return []\n",
    "        \n",
    "        # Sample images\n",
    "        sample_images = random.sample(pred_images, min(num_samples, len(pred_images)))\n",
    "        return sample_images\n",
    "    \n",
    "    except Exception as e:\n",
    "        print(f\"Error loading detection examples: {e}\")\n",
    "        return []\n",
    "\n",
    "def display_detection_examples(image_paths):\n",
    "    \"\"\"Display detection examples.\"\"\"\n",
    "    if not image_paths:\n",
    "        print(\"No detection examples to display.\")\n",
    "        return\n",
    "    \n",
    "    num_images = len(image_paths)\n",
    "    cols = min(3, num_images)\n",
    "    rows = (num_images + cols - 1) // cols\n",
    "    \n",
    "    plt.figure(figsize=(15, 5 * rows))\n",
    "    \n",
    "    for i, img_path in enumerate(image_paths):\n",
    "        try:\n",
    "            img = cv2.imread(str(img_path))\n",
    "            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n",
    "            \n",
    "            plt.subplot(rows, cols, i + 1)\n",
    "            plt.imshow(img)\n",
    "            plt.title(f\"Detection Example {i+1}\")\n",
    "            plt.axis('off')\n",
    "        except Exception as e:\n",
    "            print(f\"Error displaying detection image {img_path}: {e}\")\n",
    "    \n",
    "    plt.tight_layout()\n",
    "    plt.savefig(FIGURES_PATH / \"detection_examples.png\", dpi=300, bbox_inches='tight')\n",
    "    plt.show()\n",
    "\n",
    "# Load and display detection examples\n",
    "detection_examples = load_detection_examples()\n",
    "display_detection_examples(detection_examples)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analysis of Sim-to-Real Gap\n",
    "\n",
    "This section analyzes the strengths and weaknesses of the synthetic data approach and discusses the potential sim-to-real gap."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Strengths of Synthetic Data\n",
    "\n",
    "1. **Controlled Data Generation**: Our synthetic pipeline allows precise control over environmental variables (lighting, weather, camera pose) that would be difficult to systematically capture in real-world data collection.\n",
    "\n",
    "2. **Perfect Ground Truth**: Synthetic data provides pixel-perfect ground truth annotations without the inconsistencies or errors that can occur with manual labeling.\n",
    "\n",
    "3. **Diverse Scenarios**: We can efficiently generate a wide variety of scenarios across different İstanbul environments that would require significant time and resources to collect manually.\n",
    "\n",
    "4. **Rare Events**: We can generate scenes with unusual waste distributions or lighting conditions that might be rare in real-world data collection.\n",
    "\n",
    "5. **Privacy Compliance**: Synthetic data eliminates privacy concerns that might arise when capturing real-world imagery in public spaces."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Weaknesses of Synthetic Data\n",
    "\n",
    "1. **Realism Gap**: Despite our efforts to create realistic environments, there remains a gap between synthetic and real-world imagery in terms of texture details, lighting effects, and physics.\n",
    "\n",
    "2. **Limited Asset Diversity**: Our waste models, while diverse, still represent a subset of the full variety of waste types and appearances found in real environments.\n",
    "\n",
    "3. **Simplified Physics**: The placement and interaction of waste objects follow simplified physics models that may not match real-world scenarios.\n",
    "\n",
    "4. **Environment Simplification**: Our simulated environments capture the essence of İstanbul locations but lack the full complexity and detail of real urban environments.\n",
    "\n",
    "5. **Occlusion and Complex Scenes**: Real-world waste often appears in complex scenes with partial occlusion, varying states of degradation, and challenging contexts that are difficult to fully simulate."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Gap Analysis\n",
    "\n",
    "The model has so far been evaluated only on synthetic test data, so the gap has not been measured directly. The factors we expect to contribute most to it are:\n",
    "\n",
    "1. **Texture and Material Realism**: Real waste items have complex textures, weathering, and deformation that are challenging to model synthetically.\n",
    "\n",
    "2. **Environmental Context**: Real environments have greater complexity in terms of background elements, ambient occlusion, and contextual placement of waste.\n",
    "\n",
    "3. **Lighting Complexity**: Real-world lighting includes subtle effects like inter-reflections, subsurface scattering, and atmospheric effects that are computationally expensive to simulate perfectly.\n",
    "\n",
    "4. **Object Variation**: Real waste objects show greater variation in appearance, condition, and positioning than our synthetic models.\n",
    "\n",
    "5. **Camera Effects**: Real drone cameras exhibit lens distortion, motion blur, and sensor noise characteristics that differ from our simulated camera."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusions and Future Work\n",
    "\n",
    "### Conclusions\n",
    "\n",
    "Our synthetic data pipeline successfully generated a diverse and balanced dataset for training waste detection models in İstanbul's urban environments. The YOLOv8 detector trained on this synthetic data achieved promising results on the synthetic test set, with an overall mAP@0.5 of X.XX and mAP@0.5:0.95 of Y.YY.\n",
    "\n",
    "The analysis of the sim-to-real gap highlights both the strengths of our approach and areas for improvement. Although the imagery falls short of full realism, the synthetic dataset provides useful training data for object detection models, especially where collecting and annotating real-world data would be difficult.\n",
    "\n",
    "### Future Work\n",
    "\n",
    "1. **Domain Randomization**: Implement more aggressive domain randomization techniques to help the model generalize better to real-world conditions.\n",
    "\n",
    "2. **Fine-tuning on Real Data**: Collect a small set of real-world drone imagery from İstanbul for fine-tuning the synthetic-trained model, potentially using the `finetune.py` script framework we've established.\n",
    "\n",
    "3. **Improved Asset Quality**: Develop higher-fidelity 3D models of waste objects with more realistic textures, deformations, and weathering effects.\n",
    "\n",
    "4. **Environmental Complexity**: Enhance the environmental models to include more detailed urban elements specific to İstanbul.\n",
    "\n",
    "5. **Drone Camera Simulation**: Implement more realistic camera effects including motion blur, lens distortion, and sensor noise characteristics typical of drone cameras.\n",
    "\n",
    "6. **Seasonal Variations**: Extend the pipeline to include seasonal variations (snow, rain, autumn leaves) to improve robustness to different weather conditions.\n",
    "\n",
    "7. **Real-world Validation**: Conduct a comprehensive real-world validation study using a drone with the trained model to quantitatively assess the sim-to-real gap.\n",
    "\n",
    "8. **Multi-modal Learning**: Explore multi-modal approaches that combine RGB imagery with depth or thermal information for more robust waste detection."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
