<a href="https://colab.research.google.com/github/polymatvericks/plant-watering-workshop/blob/main/Black_in_Robotics_Outreach_Computer_Vision_%26_Robotics_S.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Black in Robotics Outreach: Computer Vision & Robotics

### Welcome to Your Computer Vision Journey! 🤖👁️

**What you will learn in Section 1 (20 minutes):**
- Python programming fundamentals
- How to use Google Colab and Python for computer vision

**What you will learn in Section 2 (25 minutes):**
- That images are just data you can manipulate
- How to use the OpenCV library for image processing
- 3 different methods to read images (URL, camera, and Colab upload)
- Understanding the 3 color channels of RGB images
- Cropping and Annotating Images
- Splitting and Merging Images

**What you will learn in Section 3 (25 minutes):**
- Videos are just picture frames
- Basics of Object Detection and the Pipeline
- Using a pretrained YOLO model for object detection in uploaded images
- Using a pretrained YOLO model for object detection on live video footage from the webcam

---

**Important Notes:**
- Read each markdown section carefully before running code
- Follow the TO-DO lists in each section
- Fill in the missing code where you see `# TODO:` comments
- Ask for help if you get stuck!

---

## 📚 Setting Up Our Digital Lab

### Understanding Python Libraries (Modules)

Think of Python libraries like apps on your phone. Each library gives Python special abilities:

- **OpenCV (`cv2`)**: The computer vision powerhouse - reads, processes, and saves images
- **NumPy (`np`)**: Mathematical operations on image data (images are just arrays of numbers!)
- **Matplotlib (`plt`)**: Creates beautiful visualizations and displays our images
- **Google Colab tools**: Allows us to upload and download files

### TO-DO List for This Section:
- [ ] Run the import code cell below
- [ ] Make sure you see the success messages
- [ ] If you get errors, raise your hand for help
- [ ] Understand that `import` gives us access to pre-built tools

In [None]:
# LIBRARY IMPORTS - Your Computer Vision Toolkit

import cv2                       # OpenCV for computer vision
import numpy as np               # NumPy for numerical operations
import matplotlib.pyplot as plt  # Matplotlib for displaying images
from google.colab import files   # For uploading files
import urllib.request            # For downloading images from URLs
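import io                        # For handling image bytes in memory
from PIL import Image            # Pillow for decoding image data
import os                        # For file and path operations
from datetime import datetime    # For timestamps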

# Set up matplotlib for proper image display
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['font.size'] = 12

print("🎉 Computer Vision Toolkit Ready!")
print("✅ OpenCV imported - for image processing")
print("✅ NumPy imported - for numerical operations")
print("✅ Matplotlib imported - for visualization")
print("✅ Colab tools imported - for file operations")

### Helper Functions

Helper functions are here to make life easier for us. You don't need to understand every line yet: just run the cell and move on 😉

In [None]:
# HELPER FUNCTIONS - Making Image Display Easy

def show_image(image, title="Image", figsize=(10, 6), show_coords=False):
    """
    Display an image with proper color conversion and formatting

    Parameters:
    - image: the image array to display
    - title: title for the image
    - figsize: size of the display
    - show_coords: whether to show coordinate grid
    """
    plt.figure(figsize=figsize)

    # Check if image is grayscale (2D) or color (3D)
    if len(image.shape) == 2:  # Grayscale
        plt.imshow(image, cmap='gray')
    else:  # Color image
        # Convert BGR to RGB for proper display
        plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    plt.title(title, fontsize=16, fontweight='bold')

    if show_coords:
        plt.xlabel('X (Column) →', fontsize=12)
        plt.ylabel('Y (Row) ↓', fontsize=12)
        plt.grid(True, alpha=0.3)
    else:
        plt.axis('off')

    plt.tight_layout()
    plt.show()

def show_images_grid(images, titles, rows=1, cols=2, figsize=(15, 6)):
    """
    Display multiple images in a grid layout
    """
    fig, axes = plt.subplots(rows, cols, figsize=figsize)

    if rows * cols == 1:
        axes = [axes]
    elif rows == 1 or cols == 1:
        axes = axes.flatten() if hasattr(axes, 'flatten') else [axes]
    else:
        axes = axes.flatten()

    for i, (img, title) in enumerate(zip(images, titles)):
        if i < len(axes):
            if len(img.shape) == 2:  # Grayscale
                axes[i].imshow(img, cmap='gray')
            else:  # Color
                axes[i].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
            axes[i].set_title(title, fontsize=12, fontweight='bold')
            axes[i].axis('off')

    # Hide unused subplots
    for i in range(len(images), len(axes)):
        axes[i].axis('off')

    plt.tight_layout()
    plt.show()

def print_image_info(image, name="Image"):
    """Print detailed information about an image"""
    print(f"📊 {name} Information:")
    print(f"   Shape: {image.shape}")
    print(f"   Data type: {image.dtype}")
    if len(image.shape) == 3:
        print(f"   Channels: {image.shape[2]} (Color)")
    else:
        print(f"   Channels: 1 (Grayscale)")
    print(f"   Size: {image.shape[0] * image.shape[1]} pixels ({image.size} values in total)")
    print(f"   Min value: {image.min()}")
    print(f"   Max value: {image.max()}")
    print()

print("🛠️ Helper functions ready!")

---

## 🐍 Breakout 1: Python Fundamentals for Computer Vision (15 minutes)

### Welcome to Python Programming!

Before we can teach robots to see, we need to master the language they understand: **Python**!

**Why Python for Robotics?**
- 🤖 **Easy to learn**: Simple, readable syntax
- 🔬 **Powerful libraries**: Tons of pre-built tools for robotics
- 🌍 **Industry standard**: Used by major tech companies
- 🚀 **Great for beginners**: Perfect first programming language

**What We'll Cover:**
1. **Variables & Basic Operations** - Storing and manipulating data
2. **Lists & Slicing** - Working with collections of data
3. **Functions** - Creating reusable code blocks
4. **Importing Packages** - Using pre-built tools
5. **NumPy Arrays** - The foundation of image data

---

### 1️⃣ Variables and Basic Operations

**What are Variables?** Think of variables as labeled boxes or containers 🎁 that store information. This information can later be updated or changed.

**Python Data Types:**
- **Integers**: Whole numbers (42, -7, 0)
- **Floats**: Decimal numbers (3.14, -2.5, 0.0)
- **Strings**: Text ("Hello Robot!", 'Python')
- **Booleans**: True or False values

**Basic Operations:**
- **Math**: +, -, *, /, ** (power), % (remainder)
- **Comparison**: ==, !=, <, >, <=, >=
- **String**: + (concatenation), * (repetition)
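
As a quick illustration of the data types and operators listed above, here is a minimal sketch. The names `wheel_count`, `speed`, and `label` are made up for this example and are not used elsewhere in the notebook:

```python
# Hypothetical example values, chosen only for illustration
wheel_count = 4          # integer
speed = 2.5              # float
label = "Bot"            # string

# type() reports which data type a value has
print(type(wheel_count).__name__, type(speed).__name__, type(label).__name__)

# Math operators
distance = speed * 4          # multiplication
squared = wheel_count ** 2    # power
leftover = 17 % wheel_count   # remainder after dividing 17 by 4

# Comparison operators always produce a Boolean (True or False)
is_fast = speed >= 2.0

# String operators: * repeats a string, + joins strings together
shout = label + "!" * 3

print(distance, squared, leftover, is_fast, shout)
```

Try changing the values and predicting the output before you run the cell.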


In [None]:
# Variables and Basic Operations Examples

# Integer variables
robot_age = 5

# Float variables
battery_voltage = 3.7

# String variables
robot_name = "Vision Bot"

# Boolean variables
is_robot_active = True

### Accessing/Reading Values

In [None]:
# Accessing/Reading Values
print("Accessing just variable values\n")
print(robot_age)
print(robot_name)
print(battery_voltage)
print(is_robot_active)

### Or Displaying Them with Text

Using ```print(f"Some text {variable_name}")```

In [None]:
# Displaying values together with text
print("Accessing variable values with text\n")
print(f"Robot Age: {robot_age}")
print(f"Robot Name: {robot_name}")
print(f"Battery Voltage: {battery_voltage}")
print(f"Is Robot Active: {is_robot_active}")

### Updating the Values of Variables

In [None]:
# Basic math operations

# Integers
robot_age_updated = robot_age + 5   # Uses the current value of robot_age
age_in_months = robot_age * 12

# String operations
greeting = "Hello, I am " + robot_name

In [None]:
# Display results
print("Updating variable values\n")
print(f"Updated Robot Age: {robot_age_updated}")
print(f"Age in months: {age_in_months}")
print(f"Greeting: {greeting}")

### 🎯 Exercise 1.1: Your Turn with Variables!

**Your Mission:** Create variables about yourself and perform operations with them.

**TO-DO List:**
- [ ] Create variables for your personal information
- [ ] Perform mathematical operations
- [ ] Use string concatenation
- [ ] Display results with print statements
- [ ] Experiment with different data types

In [None]:
# 🎯 Exercise 1.1: Create Your Own Variables

# TODO: Create variables about yourself
your_name =           # Put your name here
your_age =             # Put your age here
favorite_subject =    # Put your favorite subject here
hours_of_sleep =     # Average hours you sleep per night

print("👋 About Me:\n")
print(f"___: {your_name}")              # TODO: Fill in "Name"
print(f"Age: {___} years old!")         # TODO: Fill in your age variable
print(f"___: {favorite_subject}")       # TODO: Fill in appropriate label

In [None]:
# TODO: Perform mathematical operations on some of the variables above
age_in_days = your_age * ___    # Calculate your age in days (hint: how many days are in a year)
sleep_per_week = hours_of_sleep * ___  # Hours of sleep per week (hint: how many days in a week)


# TODO: Create string combinations using the variables you created earlier or just an integer where necessary
introduction = "Hi! My name is " + ___  # Complete the introduction using your name variable
subject_love = "I love " + ___ + "!"     # Express your love for your subject using your favorite subject variable

# TODO: Display your results
print(f"Sleep: {hours_of_sleep} hours/night ({sleep_per_week} hours/week)")
print(f"Introduction: {introduction}")
print(f"Subject: {subject_love}")

### 2️⃣ Lists and Slicing

**What are Lists?** Collections of items stored in order. Perfect for storing multiple pieces of related data!

**List Basics:**
- **Creating**: `my_list = [1, 2, 3, 4, 5]`
- **Accessing**: `my_list[0]` (first item), `my_list[-1]` (last item)
- **Length**: `len(my_list)`
- **Adding**: `my_list.append(item)`

**Slicing Magic:**
- **Basic slice**: `my_list[start:end]`
- **From beginning**: `my_list[:3]` (first 3 items)
- **To end**: `my_list[2:]` (from index 2 to end)
- **Step**: `my_list[::2]` (every 2nd item)

**Why Important for Images?** Images are like 2D lists of pixel values!
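
To make that last point concrete, here is a minimal sketch of a tiny "image" stored as a list of lists. The name `tiny_image` and its values are invented for illustration:

```python
# A hypothetical 3x4 "image": each inner list is one row of pixel values
tiny_image = [
    [0,   50, 100, 150],
    [25,  75, 125, 175],
    [50, 100, 150, 200],
]

height = len(tiny_image)        # number of rows
width = len(tiny_image[0])      # number of values in one row

first_row = tiny_image[0]       # the whole first row
pixel = tiny_image[1][2]        # row 1, then column 2

# Slice the first 2 columns out of every row, one row at a time
left_side = [row[:2] for row in tiny_image]

print(f"Size: {height} rows x {width} columns")
print(f"First row: {first_row}")
print(f"Pixel at row 1, column 2: {pixel}")
print(f"Left side: {left_side}")
```

With plain lists, slicing columns means visiting each row in turn. NumPy arrays, covered later in this breakout, can do the same thing in a single step.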

**Lists and Slicing Examples:**

In [None]:
# Create lists of robot-related data
sensor_readings = [23, 45, 67, 89, 12, 34, 56, 78]
robot_names = ["R2D2", "WALL-E", "Optimus", "BB-8", "C3PO"]

# Basic list operations
print("📊 Basic List Information\n")
print(f"Sensor readings: {sensor_readings}")
print(f"Number of sensors: {len(sensor_readings)}")
print(f"First reading: {sensor_readings[0]}")
print(f"Last reading: {sensor_readings[-1]}")
print(f"Robot names: {robot_names}")

In [None]:
# List slicing examples
print("\n✂️ Slicing Examples:")
print(f"First 3 sensors: {sensor_readings[:3]}")
print(f"Last 3 sensors: {sensor_readings[-3:]}")
print(f"Middle readings: {sensor_readings[2:6]}")
print(f"Reversed list: {sensor_readings[::-1]}")

### 🎯 Exercise 1.2: Master Lists and Slicing!

**Your Mission:** Work with lists of data and practice slicing techniques.

**TO-DO List:**
- [ ] Create your own lists
- [ ] Practice accessing individual items
- [ ] Use slicing to get parts of lists
- [ ] Perform operations on list data
- [ ] Understand how this relates to image processing

In [None]:
# 🎯 Exercise 1.2: Practice with Lists and Slicing

# TODO: Create your own lists
favorite_foods = ["___", "___", "___"]  # Fill with your favorite foods
test_scores = [85, 92, 78, 91, 87]  # Pretend test scores
pixel_row = [0, 50, 100, 150, 100, 50]  # Simulated pixel values

# TODO: Basic list access
first_food = favorite_foods[___]     # Get the first food (index?)
last_food = favorite_foods[___]      # Get the last food (hint: -1)
middle_score = test_scores[___]      # Get the middle score using its index

print("🍕 Food Preferences:")
print(f"All foods: {favorite_foods}")
print(f"First favorite: {first_food}")
print(f"Last favorite: {last_food}")
print(f"Total favorites: {len(favorite_foods)}")

In [None]:
# TODO: Practice slicing
top_3_foods =     # Get first 3 foods
last_2_foods =       # Get last 2 foods
first_half_pixel_values =  # Get first half of pixel values

print(f"✂️ Slicing Practice\n")
print(f"Top 3 foods: {top_3_foods}")
print(f"Last 2 foods: {last_2_foods}")
print(f"First half pixel values: {first_half_pixel_values}")

### 3️⃣ Functions - Building Reusable Code

**What are Functions?** Reusable blocks of code that perform specific tasks. Like having a robot assistant that follows your instructions!

**Function Structure:**
```python
def function_name(parameters):
    """Description of what the function does"""
    # Code that does the work
    return result  # Optional: return a value
```

**Why Functions Are Amazing:**
- 🔄 **Reusability**: Write once, use many times
- 🧹 **Organization**: Keep code clean and readable
- 🐛 **Debugging**: Easier to find and fix problems
- 👥 **Teamwork**: Share code with others
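
As a small illustration of reusability, the sketch below defines one function and calls it twice with different inputs. The function `brighten_pixel` is a hypothetical example, not part of any library:

```python
def brighten_pixel(value, amount):
    """Return a brighter pixel value, never going above 255."""
    # min() picks the smaller of the two numbers, so the result is capped at 255
    return min(value + amount, 255)

# Write once, use many times
normal_result = brighten_pixel(100, 50)
capped_result = brighten_pixel(240, 50)

print(f"100 brightened by 50: {normal_result}")
print(f"240 brightened by 50: {capped_result}")
```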

**Example Code:**

In [None]:
# Functions Examples - Building Robot Helpers

# greeting
def greet_robot(robot_name, mission):
    """Create a personalized greeting for a robot"""
    greeting = f"Hello {robot_name}! Ready for mission: {mission}?"
    return greeting

# addition
def sum_numbers(a, b):
    """Add two numbers together"""
    result = a + b
    return result

In [None]:
# Test our functions
print("🤖 Testing Robot Functions:")

# Test greeting function
robot_name = "Vision Bot"
robot_mission = "Save the planet"

robot_greeting = greet_robot(robot_name, robot_mission)

# or pass in the arguments directly
# robot_greeting = greet_robot("Bir Bot", "Save the planet earth")

print(robot_greeting)

# Test addition function
sum_result = sum_numbers(5, 10)
print(f"Sum: {sum_result}")

### 🎯 Exercise 1.3: Build Your Own Functions!

**Your Mission:** Create your own functions for robotics and image processing.

**TO-DO List:**
- [ ] Complete the provided function templates
- [ ] Create your own custom function
- [ ] Test all functions with different inputs

In [None]:
# 🎯 Exercise 1.3: Build Your Own Robot Functions

def subtraction(a, b):
    """Subtract the 2nd argument from the first"""
    # TODO: Complete this function

    result = ___ - ___

    return result

def get_first(foodlist):
    """Get the first element in a list"""
    # TODO: Complete this function
    result = ___[___]  # Name the list variable first, then the index of its first element

    return result

# Testing out your functions
print("🤖 Testing Your Functions:")

birth_year =      # TODO: Put your birth year here
current_year =    # TODO: Put the current year here

age = subtraction(current_year, birth_year)

print(f"Age: {age}")

# Testing out the food list

# TODO: Create a list containing your favorite foods
fav_food_list =  # Remember to use square brackets around the whole list and commas "," to separate individual values

first_food = get_first(fav_food_list)
print(f"First food: {first_food}")

print(f"\n🎉 Great job creating functions! These are the building blocks of programming!")

### 4️⃣ Importing Packages - Using Pre-Built Tools

**What are Packages?** Collections of functions and tools that other programmers have created. Instead of reinventing the wheel, we use their work!

**Why Import Packages?**
- ⚡ **Speed**: Don't write everything from scratch
- 🛡️ **Reliability**: Well-tested, proven code
- 🌟 **Features**: Advanced capabilities beyond basic Python
- 🤝 **Community**: Benefit from thousands of developers' work

**Common Import Patterns:**
```python
import package_name
import package_name as nickname
from package_name import specific_function
from package_name import *  # Import everything (use carefully!)
```
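
Here is how the first three patterns look with a real package. This sketch uses Python's built-in `math` module; all three calls reach the same `sqrt` function:

```python
import math                 # Pattern 1: call it as math.sqrt(...)
import math as m            # Pattern 2: same module, shorter nickname
from math import sqrt       # Pattern 3: bring in one function by name

root_full = math.sqrt(16)
root_alias = m.sqrt(16)
root_direct = sqrt(16)

print(root_full, root_alias, root_direct)
```

The nickname pattern is the one used in this notebook for `numpy` (as `np`) and `matplotlib.pyplot` (as `plt`).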

**NumPy Introduction:**
NumPy (Numerical Python) is the foundation of scientific computing in Python. Perfect for handling image data!

**Example Code:**

In [None]:
# Importing Packages and NumPy Introduction

# Different ways to import
import math                    # Import entire math module
import numpy as np            # Import numpy with alias 'np'
from random import randint    # Import specific function

print("📦 Package Import Examples\n")

# Using random function
random_number = randint(1, 100)
print(f"Random function: Generated {random_number}")

### NumPy Exploration

In [None]:
# Create arrays (like lists, but much more powerful)
simple_list = [1, 2, 3, 4, 5]
numpy_array = np.array([1, 2, 3, 4, 5])

print(f"Python list: {simple_list}")
print(f"NumPy array: {numpy_array}")
print(f"Array type: {type(numpy_array)}")

# Array operations (much faster than lists for math!)
doubled = numpy_array * 2        # Multiply every element by 2
squared = numpy_array ** 2       # Square every element
added = numpy_array + 10         # Add 10 to every element

print(f"\n🔢 Array Operations:")
print(f"Original: {numpy_array}")
print(f"Doubled:  {doubled}")
print(f"Squared:  {squared}")
print(f"Plus 10:  {added}")

### More NumPy

In [None]:
# Creating special arrays (very useful for images!)
zeros = np.zeros(5)           # Array of zeros
ones = np.ones(4)             # Array of ones
full = np.full(6, 255)        # Array filled with specific value
range_array = np.arange(0, 10, 2)  # Array with range of values

print(f"\n🏗️ Special Arrays\n")
print(f"Zeros: {zeros}")
print(f"Ones: {ones}")
print(f"Full (255): {full}")
print(f"Range: {range_array}")

# 2D arrays - this is how images work!
image_like = np.array([[0, 128, 255],
                       [64, 192, 100],
                       [255, 0, 200]])

print(f"\n🖼️ 2D Array (like a tiny image)\n")
print(image_like)
print(f"Shape: {image_like.shape}") #  (3 rows, 3 columns)

print(f"\nThis could represent a 3x3 pixel image!")

### 5️⃣ NumPy Arrays and Slicing - Image Data Foundation

**NumPy Arrays vs Python Lists:**
- 🚀 **Speed**: Often 10-100x faster for mathematical operations on large arrays
- 🧮 **Vectorization**: Apply operations to entire arrays at once
- 📐 **Multi-dimensional**: Perfect for grayscale images (2D), color images (3D), and videos (a sequence of image frames)
- 🔧 **Specialized functions**: Built for scientific computing
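
A minimal sketch of what vectorization means in practice, using made-up values. Note how `* 2` behaves very differently on a list and on an array:

```python
import numpy as np

readings = [10, 20, 30, 40]            # hypothetical values

# Plain Python list: * 2 repeats the list instead of doing math
repeated = readings * 2

# To double each item in a list, we must visit the items one by one
doubled_list = [value * 2 for value in readings]

# NumPy array: one expression applies the math to every element at once
doubled_array = np.array(readings) * 2

print(f"List * 2:        {repeated}")
print(f"Doubled (list):  {doubled_list}")
print(f"Doubled (array): {doubled_array}")
```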

**Array Slicing:**
- **1D**: `array[start:end:step]`
- **2D**: `array[row_start:row_end, col_start:col_end]`
- **Advanced**: Boolean indexing, fancy indexing

**Why This Matters for Images:**
Images are just 2D or 3D NumPy arrays of pixel values!
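
The "Advanced" indexing styles listed above can be sketched as follows. The array `patch` and the threshold of 128 are arbitrary choices for illustration:

```python
import numpy as np

# Hypothetical 3x3 grayscale patch
patch = np.array([[ 10, 200,  30],
                  [220,  40, 250],
                  [ 50, 240,  60]])

# Boolean indexing: a True/False mask selects only the matching pixels
bright_mask = patch > 128
bright_pixels = patch[bright_mask]

# A mask can also be used to write values (a simple black/white threshold)
thresholded = patch.copy()
thresholded[bright_mask] = 255     # bright pixels become white
thresholded[~bright_mask] = 0      # all other pixels become black

# Fancy indexing: pick specific rows by listing their indices
outer_rows = patch[[0, 2]]

print(f"Bright pixels: {bright_pixels}")
print(f"Thresholded:\n{thresholded}")
print(f"Rows 0 and 2:\n{outer_rows}")
```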

**Example Code:**

In [None]:
# NumPy Arrays and Slicing - The Heart of Image Processing

print("🔢 Advanced NumPy for Image Processing:")

# Create a 2D array representing a small grayscale image
# Values from 0 (black) to 255 (white)
mini_image = np.array([[0,   50,  100, 150, 200],
                       [25,  75,  125, 175, 225],
                       [50,  100, 150, 200, 255],
                       [75,  125, 175, 225, 200],
                       [100, 150, 200, 255, 150]])

print(f"Mini image array\n")
print(mini_image)
print(f"\nShape: {mini_image.shape} (height, width)")
print(f"Data type: {mini_image.dtype}")

In [None]:
# Array slicing examples

# Get specific elements
top_left = mini_image[0, 0]           # First row, first column
bottom_right = mini_image[-1, -1]     # Last row, last column
center = mini_image[2, 2]             # Center pixel

print(f"\n✂️ Array Slicing Examples\n")
print(f"Top-left pixel: {top_left}")
print(f"Bottom-right pixel: {bottom_right}")
print(f"Center pixel: {center}")

In [None]:
# Get rows and columns
first_row = mini_image[0, :]          # First row, all columns
last_column = mini_image[:, -1]       # All rows, last column

print(f"\nFirst row: {first_row}")
print(f"Last column: {last_column}")

# Get sub-regions (like cropping an image!)
top_left_quad = mini_image[:2, :2]    # Top-left 2x2 region
center_region = mini_image[1:4, 1:4]  # 3x3 center region

print(f"\nTop-left quadrant:")
print(top_left_quad)
print(f"\nCenter region:")
print(center_region)

In [None]:
# Array operations (like image filters!)
print(f"\n🎛️ Image Operations\n")

# Brighten the image (add value to all pixels)
brightened = mini_image + 60
brightened_clip = np.clip(brightened, 0, 255)  # Keep values in valid range

# Darken the image (multiply all pixels)
darkened = mini_image * 0.7
darkened = darkened.astype(np.uint8)  # Convert back to integer

print(f"Original: \n{mini_image}\n")
print(f"Brightened: \n{brightened}\n")
print(f"Brightened clipped: \n{brightened_clip}\n")
print(f"Darkened: \n{darkened}")

### 🎯 Exercise 1.4: Master NumPy for Images!

**Your Mission:** Work with NumPy arrays like a real computer vision engineer.

**TO-DO List:**
- [ ] Create and manipulate 2D arrays
- [ ] Practice array slicing techniques
- [ ] Apply mathematical operations to arrays
- [ ] Understand how this connects to real image processing
- [ ] Experiment with different array operations

In [None]:
# 🎯 Exercise 1.4: NumPy Arrays for Image Processing

# TODO: Create your own "image" array
# Create a 4x4 array with values that create an interesting pattern
my_image = np.array([[___,  ___, ___, ___],   # Fill in values 0-255
                     [___,  ___, ___, ___],   # Try to create a pattern!
                     [___,  ___, ___, ___],   # Maybe a gradient or shape?
                     [___,  ___, ___, ___]])

print("🖼️ Your Image Array\n")

In [None]:
# TODO: Print out your image array


In [None]:
# TODO: Print out the shape of the image


In [None]:
# TODO: Practice array access

top_left_arr = # Top-left corner
top_right_arr = # Top-right corner
bottom_left_arr = # Bottom-left corner
bottom_right_arr = # Bottom-right corner


# TODO: Practice slicing
# Extract different regions of your image
top_half =        # Top half of image
left_half =       # Left half of image
center_square =   # Center 2x2 square

print(f"\n✂️ Image Regions\n")
print(f"Top-left corner: {top_left_arr}")
print(f"Top half shape: {top_half.shape}")
print(f"Left half array:\n {left_half}")
print(f"Left half shape: {left_half.shape}")
print(f"Center square\n")
print(center_square)

In [None]:
# TODO: Apply image operations
# Simulate different image processing effects

# Brighten the image
brightened = my_image + ___    # Add a value (try 50)
brightened = np.clip(brightened, 0, 255)  # Keep in valid range

# Create a negative image
negative = ___ - my_image      # Subtract from max value (255)

print(f"\n🎨 Image Processing Results\n")
print(f"Original image:\n {my_image}\n")
print(f"Brightened image:\n {brightened}\n")
print(f"Negative image:\n {negative}\n")

print(f"\n🏆 Congratulations! You're now manipulating images like a computer vision expert!")
print(f"Every operation you just did is used in real image processing applications!")

### Visualizing Image Data

In [None]:
# Create sample image data
sample_image = np.array([[0,   64,  128, 192, 0],
                        [32,  96,  160, 224, 200],
                        [64,  128, 192, 255, 150],
                        [96,  160, 224, 200, 100],
                        [128, 192, 255, 150, 50]])

# display this array as an image
show_image(sample_image, "Sample Image", show_coords=True)

### 🎯 Exercise 1.5: Create Your Canvas and Explore Shades!

**🧠 Your Mission:**  
Design different canvas patterns and explore how pixel values translate into visual effects.

**✨ What You'll Learn:**
- How to create arrays that represent image data  
- How varying pixel values (0–255) produce different shades  
- How to visualize single-channel (grayscale) images  
- The link between numerical values and what we see visually  

**✅ TO-DO List:**
- [ ] Create basic canvases with various pixel values, like the examples below: ![Canvas Example](https://drive.google.com/uc?export=view&id=1s68essuksR5E8zjBSMqLNeJJjGLd5LI5)
- [ ] Experiment with grayscale shades (0 = black, 255 = white)  
- [ ] Design patterns by combining different pixel values  
- [ ] Discover how computers interpret images as numbers
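
If you are unsure where to start, here is one possible approach, offered as a sketch rather than the expected answer: begin with an all-black canvas and write values into it with slicing. The pattern (a white border with a gray center) is an arbitrary example:

```python
import numpy as np

# Start with a 5x5 black canvas (every pixel is 0)
canvas = np.zeros((5, 5), dtype=np.uint8)

# Write white (255) into the outer border using slices
canvas[0, :] = 255     # top row
canvas[-1, :] = 255    # bottom row
canvas[:, 0] = 255     # left column
canvas[:, -1] = 255    # right column

# Write a mid-gray (128) pixel into the center
canvas[2, 2] = 128

print(canvas)
```

Passing `canvas` to the `show_image` helper from earlier in this notebook, for example `show_image(canvas, "Border", show_coords=True)`, should display the pattern as a picture.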



In [None]:
# 🎯 Exercise 1.5: Create Your Canvas and Explore Shades

print("🎨 Exercise 1.5: Create Your Canvas and Explore Shades")

# TODO: Create your own image that is similar to the one above
# Replace each ___ with a pixel value from 0 (black) to 255 (white)
my_robot_face = np.array([[___, ___, ___, ___, ___],
                          [___, ___, ___, ___, ___],
                          [___, ___, ___, ___, ___],
                          [___, ___, ___, ___, ___],
                          [___, ___, ___, ___, ___]])

# display this array as an image
show_image(my_robot_face, "My Robot Face", show_coords=True)

### Understanding Images as Data (Working with actual Images)

**🧠 Key Concept:** Images are just arrays of numbers!

**Grayscale Images:**
- 2D array of pixel values
- Each pixel: 0 (black) to 255 (white)
- Shape: (height, width)

**Color Images:**
- 3D array with color channels
- Each pixel has 3 values: Red, Green, Blue (RGB)
- Shape: (height, width, 3)
- Each channel: 0 to 255
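
The two shapes described above can be checked directly in NumPy. This is a minimal sketch with an arbitrary image size of 4 rows by 6 columns:

```python
import numpy as np

height, width = 4, 6   # hypothetical image size

gray = np.zeros((height, width), dtype=np.uint8)        # 2D: one value per pixel
color = np.zeros((height, width, 3), dtype=np.uint8)    # 3D: three values per pixel

print(f"Grayscale shape: {gray.shape}, dimensions: {gray.ndim}")
print(f"Color shape:     {color.shape}, dimensions: {color.ndim}")

# Both images have height x width pixels,
# but the color image stores 3 numbers for each pixel
print(f"Values stored: {gray.size} (grayscale) vs {color.size} (color)")
```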

**OpenCV Color Format:**
- Uses BGR (Blue, Green, Red) instead of RGB
- We need to convert for proper display with matplotlib
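
The `show_image` helper defined earlier handles this conversion with `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`. To see what that conversion amounts to, the sketch below does the same channel swap with plain NumPy slicing on a made-up two-pixel image:

```python
import numpy as np

# A hypothetical 1x2 image in OpenCV's BGR order:
# the first pixel is pure blue, the second is pure red
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the channel order from B,G,R to R,G,B
rgb = bgr[:, :, ::-1]

print(f"Blue pixel: BGR {bgr[0, 0]} -> RGB {rgb[0, 0]}")
print(f"Red pixel:  BGR {bgr[0, 1]} -> RGB {rgb[0, 1]}")
```

If the conversion is skipped, matplotlib reads the first channel as red, so blue objects appear red and red objects appear blue.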

---
## 🖼️ Breakout 2: Working with Actual Images in Computer Vision (15 minutes)

### Part 1: Three Ways to Load Images

### Method 1: Upload from Your Computer
Upload any image from your device to work with.

---

In [None]:
# ===============================================================================
# METHOD 1: UPLOAD FROM COMPUTER
# ===============================================================================

print("📂 Method 1: Upload Image from Your Computer")
print("Click 'Choose Files' and select an image from your device")

# Upload file
uploaded = files.upload()

if uploaded:
    # Get the first uploaded file
    image_filename = list(uploaded.keys())[0]

    # Read the image using OpenCV
    uploaded_image = cv2.imread(image_filename)

    if uploaded_image is not None:
        print(f"✅ Successfully loaded: {image_filename}")
        print_image_info(uploaded_image, "Uploaded Image")
        show_image(uploaded_image, f"Uploaded Image: {image_filename}", show_coords=True)
    else:
        print("❌ Error loading image. Please try a different file.")
else:
    print("No file uploaded. Let's continue with other methods.")

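### Method 2: Load from a Colab File Path or Download from a URL
Load an image that is already stored in your Colab session (2A), or download one directly from the internet using a web address (2B).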

In [None]:
# METHOD 2A: FROM GOOGLE COLAB STORAGE
# 1. Upload the image using the Files panel on the left
# 2. Copy the file's path and paste it below

image_path = "/content/obj_image2.png"

colab_image = cv2.imread(image_path)

if colab_image is not None:
    print("✅ Successfully loaded image from Colab storage")
    print_image_info(colab_image, "Colab Image")
    show_image(colab_image, "Image Loaded from Colab Storage")

else:
    print("❌ Failed to load image. Check that the file path is correct.")

In [None]:
# METHOD 2B: DOWNLOAD FROM URL

print("🌐 Method 2B: Download Image from URL")

# Single image URL to download
image_url = "https://drive.google.com/uc?export=download&id=1xLtag2U7vCf_YiNjifMdycd-1ofTBOFq"


# Download and load the image
url_image = None

print("📥 Downloading image from URL...")
try:
    urllib.request.urlretrieve(image_url, "downloaded_image.jpg")
except Exception as e:
    # Network problems or a bad URL end up here instead of crashing the cell
    print(f"❌ Download failed: {e}")
else:
    url_image = cv2.imread("downloaded_image.jpg")

    if url_image is not None:
        print("✅ Successfully downloaded and loaded image")
        print_image_info(url_image, "Downloaded Image")
        show_image(url_image, "Downloaded Image from URL")
    else:
        print("❌ Image downloaded but failed to load")

### Method 3: Capture from Webcam
Take a live photo using your device's camera.

In [None]:
# METHOD 3: WEBCAM CAPTURE

from IPython.display import HTML, display, Javascript
from google.colab.output import eval_js
from base64 import b64decode
import ipywidgets as widgets

print("📷 Method 3: Capture from Webcam")

# Create output areas for camera interface
camera_output = widgets.Output()
photo_output = widgets.Output()

display(widgets.VBox([camera_output, photo_output]))

# Webcam capture interface
with camera_output:
    display(HTML('''
    <div style="text-align: center; padding: 20px;">
        <h3>📷 Webcam Photo Capture</h3>
        <div id="camera-status" style="margin: 10px 0; color: #666;">
            Click "Start Camera" to begin
        </div>

        <div id="video-container" style="display:none; margin: 20px 0;">
            <video id="webcam-video" width="400" height="300" autoplay
                   style="border: 3px solid #4285F4; border-radius: 10px;"></video>
        </div>

        <div style="margin: 20px 0;">
            <button id="start-camera-btn" onclick="startWebcam()"
                    style="padding: 12px 24px; font-size: 16px; background: #4285F4;
                           color: white; border: none; border-radius: 6px; cursor: pointer;">
                🎥 Start Camera
            </button>

            <button id="capture-photo-btn" onclick="capturePhoto()"
                    style="padding: 12px 24px; font-size: 16px; background: #34A853;
                           color: white; border: none; border-radius: 6px; margin-left: 10px;
                           cursor: pointer; display: none;">
                📸 Capture Photo
            </button>

            <button id="retake-btn" onclick="retakePhoto()"
                    style="padding: 12px 24px; font-size: 16px; background: #EA4335;
                           color: white; border: none; border-radius: 6px; margin-left: 10px;
                           cursor: pointer; display: none;">
                🔄 Retake
            </button>
        </div>
    </div>

    <script>
    let webcamStream = null;
    const video = document.getElementById('webcam-video');
    const status = document.getElementById('camera-status');
    const videoContainer = document.getElementById('video-container');
    const startBtn = document.getElementById('start-camera-btn');
    const captureBtn = document.getElementById('capture-photo-btn');
    const retakeBtn = document.getElementById('retake-btn');

    async function startWebcam() {
        try {
            status.textContent = "🔄 Starting camera...";
            webcamStream = await navigator.mediaDevices.getUserMedia({
                video: { width: 400, height: 300 }
            });

            video.srcObject = webcamStream;
            videoContainer.style.display = 'block';
            startBtn.style.display = 'none';
            captureBtn.style.display = 'inline-block';
            status.textContent = "📹 Camera ready! Click 'Capture Photo' when ready.";

        } catch (error) {
            status.textContent = "❌ Camera access denied or not available.";
            console.error('Camera error:', error);
        }
    }

    function capturePhoto() {
        const canvas = document.createElement('canvas');
        canvas.width = video.videoWidth || 400;
        canvas.height = video.videoHeight || 300;

        const ctx = canvas.getContext('2d');
        ctx.drawImage(video, 0, 0);

        const imageData = canvas.toDataURL('image/jpeg', 0.8);

        // Stop camera
        if (webcamStream) {
            webcamStream.getTracks().forEach(track => track.stop());
        }

        videoContainer.style.display = 'none';
        captureBtn.style.display = 'none';
        retakeBtn.style.display = 'inline-block';
        status.textContent = "📸 Photo captured successfully!";

        // Send to Python
        google.colab.kernel.invokeFunction('process_webcam_photo', [imageData], {});
    }

    function retakePhoto() {
        retakeBtn.style.display = 'none';
        startBtn.style.display = 'inline-block';
        status.textContent = "Click 'Start Camera' to take another photo.";
    }
    </script>
    '''))

def process_webcam_photo(image_data):
    """Process the captured webcam photo"""
    global webcam_image  # Make the photo available to later cells (e.g. Exercise 2.1)
    try:
        # Decode the base64 image
        image_bytes = b64decode(image_data.split(',')[1])

        # Save the image
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f'webcam_capture_{timestamp}.jpg'

        with open(filename, 'wb') as f:
            f.write(image_bytes)

        # Convert to OpenCV format
        pil_image = Image.open(io.BytesIO(image_bytes))
        webcam_image = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)

        # Display the captured photo
        with photo_output:
            photo_output.clear_output()
            print(f"📸 Photo captured at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
            print_image_info(webcam_image, "Webcam Photo")
            show_image(webcam_image, "Your Webcam Photo")
            print(f"💾 Saved as: {filename}")

        return webcam_image

    except Exception as e:
        with photo_output:
            photo_output.clear_output()
            print(f"❌ Error processing photo: {str(e)}")
        return None

# Register the callback function
try:
    from google.colab import output
    output.register_callback('process_webcam_photo', process_webcam_photo)
except ImportError:
    print("Note: Webcam capture requires Colab environment")

### 🎯 Exercise 2.1: Choose Your Image and Explore!

**Your Mission:** Select one of the three methods above to load an image, then explore its properties.

**TO-DO List:**
- [ ] Successfully load an image using any method
- [ ] Examine the image information (shape, size, data type)
- [ ] Understand the difference between color and grayscale images
- [ ] Experiment with different images
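
One way to tell color from grayscale is to look at the array's shape. The sketch below uses small synthetic NumPy arrays instead of a loaded image; the array sizes and the helper name `describe` are assumptions made for illustration and are not part of the workshop's helper functions.

```python
import numpy as np

# Synthetic stand-ins for loaded images (sizes chosen only for illustration)
color_img = np.zeros((4, 6, 3), dtype=np.uint8)   # height=4, width=6, 3 channels
gray_img = np.zeros((4, 6), dtype=np.uint8)       # height=4, width=6, no channel axis

def describe(img):
    """Return (kind, total_pixels) for an image array (hypothetical helper)."""
    kind = "color" if img.ndim == 3 else "grayscale"
    height, width = img.shape[:2]
    return kind, height * width

print(describe(color_img))
print(describe(gray_img))
```

The total pixel count is height times width; the channel axis adds values per pixel, not more pixels.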


In [None]:
# 🎯 Exercise 2.1: Image Loading Practice

# TODO: Choose one of the images loaded above and assign it to 'my_image'
# You can use: uploaded_image, url_image, or webcam_image (if captured)

my_image = None  # TODO: Replace None with your chosen image variable name

# TODO: Check if you successfully selected an image
if my_image is not None:
    print("✅ Image selected successfully!")

    # TODO: Display your chosen image with your own title
    show_image(my_image, "")

    # TODO: Print detailed information about your image
    print_image_info(my_image, "My Image")

    # TODO: Answer these questions by examining the output above:
    print("🤔 Questions to Answer:")
    print("1. What is the shape of your image? (height, width, channels)")
    print("2. Is your image color or grayscale?")
    print("3. What are the minimum and maximum pixel values?")
    print("4. How many total pixels does your image contain?")

else:
    print("❌ No image selected. Please choose one from the methods above.")
    print("💡 Tip: Make sure to run one of the image loading methods first!")


---

### 🌈 Part 2: Understanding Color Channels (RGB)

### What are Color Channels?

Every color image is made up of three separate **channels**:
- **Red Channel**: Contains the red intensity for each pixel
- **Green Channel**: Contains the green intensity for each pixel  
- **Blue Channel**: Contains the blue intensity for each pixel
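
The channels sit on the last axis of the image array, so their order matters. OpenCV stores them as blue, green, red, while libraries such as Matplotlib expect red, green, blue. A minimal sketch with a single synthetic pixel (the variable names are illustrative assumptions):

```python
import numpy as np

# One pure-red pixel written in OpenCV's BGR order: blue=0, green=0, red=255
bgr_pixel = np.array([[[0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the channel order from BGR to RGB
rgb_pixel = bgr_pixel[..., ::-1]

blue, green, red = bgr_pixel[0, 0]
print("BGR:", bgr_pixel[0, 0].tolist())
print("RGB:", rgb_pixel[0, 0].tolist())
```

If a red object looks blue when displayed, a missing BGR-to-RGB conversion is a likely cause.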

**Key Points:**
- Each channel is a grayscale image (0-255 values)
- Combining all three channels creates the full color image
- OpenCV uses **BGR order** (Blue, Green, Red) instead of RGB


In [None]:
# WORKING WITH COLOR CHANNELS

# Work on a copy of the URL image so the original stays unchanged
work_image = url_image.copy()

print("🌈 Working with Color Channels")
print_image_info(work_image, "Working Image")
show_image(work_image, "Working Image", show_coords=True)

In [None]:
# Split the image into separate channels
b_channel, g_channel, r_channel = cv2.split(work_image)

print("📊 Channel Information:")
print(f"Blue channel shape: {b_channel.shape}")
print(f"Green channel shape: {g_channel.shape}")
print(f"Red channel shape: {r_channel.shape}")

In [None]:
# Display individual channels
show_images_grid(
    [work_image, b_channel, g_channel, r_channel],
    ["Original Image", "Blue Channel", "Green Channel", "Red Channel"],
    rows=1, cols=4, figsize=(18, 5)
)

In [None]:
# Merge the three channels back into a single BGR image
merged_channels = cv2.merge([b_channel, g_channel, r_channel])

# display the merged channels
show_image(merged_channels, "Merged Channels", figsize=(10, 6))

### 🎯 Exercise 2.2: Channel Exploration Challenge!

**Your Mission:** Experiment with color channels to understand how they work.

**TO-DO List:**
- [ ] Split your image into color channels
- [ ] Create custom channel manipulations
- [ ] Understand how channels combine to form colors
- [ ] Create artistic effects using channel operations

In [None]:
# 🎯 Exercise 2.2: Channel Manipulation Practice

print("🎯 Exercise 2.2: Your Turn with Color Channels!")

# TODO: Use your selected image for channel work
exercise_image =  # TODO: create a copy of the image you want to use for this exercise

show_image(exercise_image, "Exercise Image")

In [None]:
# TODO: Split the image into individual channels
b_chan, g_chan, r_chan =   # TODO: Complete this line. Which OpenCV function splits an image into channels?

print("📊 Your Channel Analysis:")
print(f"Original image shape: {exercise_image.shape}")
print(f"Blue channel shape: {b_chan.shape}")

In [None]:
# TODO: Display each channel individually
show_images_grid(
    [b_chan, g_chan, r_chan],
    ["Your Blue Channel", "Your Green Channel", "Your Red Channel"],
    rows=1, cols=3
)

In [None]:
# TODO: Merge the channels and create custom channel effects
merged_channels = cv2.merge([b_chan, g_chan, r_chan])


# TODO: display the merged channels
show_image(merged_channels, "Merged Channels", figsize=(10, 6))

# TODO: Answer these questions:
print("\n🤔 Channel Questions:")
print("1. What happens when you reorder the channels (e.g., BRG or RRB)?")

### ✂️ Part 3: Image Cropping and Region Selection


**Image Coordinate System:**
- **Origin (0,0)** is at the **top-left** corner
- **X-axis** goes **right** (columns)
- **Y-axis** goes **down** (rows)
- **Indexing**: `image[y:y+height, x:x+width]`
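
The convention above is easy to get backwards, because arrays are indexed row first (y) and column second (x). A minimal sketch on a synthetic array; the sizes and variable names are assumptions chosen for illustration:

```python
import numpy as np

# Synthetic image: height=300 rows, width=400 columns, 3 channels
image = np.zeros((300, 400, 3), dtype=np.uint8)

x, y = 50, 100            # top-left corner of the region (x = column, y = row)
width, height = 200, 80   # size of the region

# Rows (y) come first, then columns (x)
crop = image[y:y + height, x:x + width]
print(crop.shape)
```

The resulting shape lists height before width, matching the `(height, width, channels)` order reported for the full image.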

**Cropping Syntax:**
```python
# Basic crop: image[start_row:end_row, start_col:end_col]
cropped = image[100:300, 50:250]  # y=100-300, x=50-250

# With channels: image[start_row:end_row, start_col:end_col, :]
cropped_color = image[100:300, 50:250, :]
```

---

### 🎯 Exercise 2.3: Master Image Cropping!

**Your Mission:** Practice precise cropping and region selection by cropping the RGB image into its individual color stripes.

**TO-DO List:**
- [ ] Understand image coordinate system
- [ ] Practice basic cropping operations
- [ ] Use the crop function with different parameters
- [ ] Create interesting compositions from crops

In [None]:
# view the details of the rgb image
print_image_info(url_image, "Image for Cropping")
show_image(url_image, "Original Image for Cropping", show_coords=True)

In [None]:
# TODO: by looking at the shape and axis, fill in the below

# Correct boundaries based on image height = 600 (divide by 3 = 200 pixels per stripe)
blue_end =        # TODO: Blue stripe ends at row
green_start =     # TODO: Green stripe starts at row
green_end =       # TODO: Green stripe ends at row
red_start =       # TODO: Red stripe starts at row
red_end =         # TODO: Red stripe ends at row

# Correct slicing assignments
blue_strip = url_image[0:blue_end, :, :]                    # Rows
green_strip = url_image[green_start:green_end, :, :]        # Rows
red_strip = url_image[red_start:red_end, :, :]              # Rows

# Display all strips side by side for comparison
show_images_grid(
    [blue_strip, green_strip, red_strip],
    ["Blue", "Green", "Red"],
    rows=1, cols=3, figsize=(18, 6)
)

In [None]:
# TODO: Analyze your crops

print(f"\n📊 Your Crop Analysis\n")
print(f"Crop 1 shape: {}") # TODO: complete to get the shape of the blue strip
print(f"Crop 2 shape: {}") # TODO: complete to get the shape of the green strip
print(f"Crop 3 shape: {}") # TODO: complete to get the shape of the red strip

### 🎨 Part 4: Drawing and Annotation

OpenCV provides powerful drawing functions to add:
- **Lines and Shapes**: Rectangles, circles, lines
- **Text**: Labels and annotations  
- **Markers**: Points and crosshairs

**Key Drawing Functions:**
- `cv2.rectangle(img, pt1, pt2, color, thickness)`: Draw rectangles
- `cv2.circle(img, center, radius, color, thickness)`: Draw circles
- `cv2.line(img, pt1, pt2, color, thickness)`: Draw lines
- `cv2.putText(img, text, org, fontFace, fontScale, color, thickness, lineType)`: Add text

In [None]:
# DRAWING AND ANNOTATION

print("🎨 Drawing and Annotation on Images")

# Create a clean canvas for drawing
canvas = np.full((400, 600, 3), 255, dtype=np.uint8)

# show canvas
show_image(canvas, "Canvas for Drawing", show_coords=True)

In [None]:
# Drawing parameters
color_red = (0, 0, 255)      # BGR format: Red
color_green = (0, 255, 0)    # BGR format: Green
color_blue = (255, 0, 0)     # BGR format: Blue
color_white = (255, 255, 255) # BGR format: White
color_black = (0, 0, 0) # BGR format: Black
thickness = 3

print("🖊️ Drawing Basic Shapes:")

# Draw a rectangle
cv2.rectangle(canvas, (50, 50), (150, 120), color_red, thickness)

# Draw a circle
cv2.circle(canvas, (250, 85), 40, color_green, thickness)

# Draw a line
cv2.line(canvas, (300, 50), (350, 120), color_blue, thickness)

# Add text
cv2.putText(canvas, "Black in Robotics", (100, 200),
            cv2.FONT_HERSHEY_SIMPLEX, 1, color_black, 2)

show_image(canvas, "Image with Annotations")

### 🎯 Exercise 2.4: Create Your Annotated Masterpiece!

**Your Mission:** Practice drawing and annotation to create informative visuals.

![Drawing Example](https://drive.google.com/uc?export=view&id=1WyPxkPLgUuuCoix_Oiydghw2PILSbNJl)

**TO-DO List:**
- [ ] Draw basic shapes on your image
- [ ] Add meaningful text labels  
- [ ] Create a detection-style annotation
- [ ] Experiment with colors and styles

In [None]:
# YOUR CREATIVE DRAWING SPACE
# Create a fresh canvas
my_canvas = np.zeros((400, 600, 3), dtype=np.uint8) # A Black Canvas

# display the canvas
show_image(my_canvas, "My Creative Robot Art Canvas!", show_coords=True)

In [None]:
# TODO: Add your creative drawings here!

# Face outline (circle)
cv2.circle(__, __, __, __, __)

# Eyes (filled circles)
cv2.circle(__, __, __, __, __)  # TODO: Yellow left eye
cv2.circle(__, __, __, __, __)  # TODO: Yellow right eye

# Mouth (rectangle)
cv2.rectangle(__, __, __, __, __)  # TODO:  Red mouth

# Add text below the robot
cv2.putText(__ , __ , __ ,
            cv2.FONT_HERSHEY_SIMPLEX, __, color_white, __ )  # TODO: White subtitle

show_image(my_canvas, "My Creative Robot Art!", show_coords=True)

## 🎯 Break Out 2: Object Detection with YOLO (30 minutes)

### Welcome to Intelligent Vision! 🤖🔍

**What is Object Detection?**
Object detection is teaching computers to find and identify objects in images, just like humans do naturally!

**Key Concepts:**
- **Classification**: "What is this?" (e.g., "This is a cat")
- **Localization**: "Where is it?" (e.g., "The cat is at coordinates x,y")  
- **Detection**: Classification + Localization = "What is it and where?"

**YOLO (You Only Look Once):**
- A widely used family of object detection models that processes the whole image in a single pass
- Real-time performance
- Detects multiple objects simultaneously
- Provides bounding boxes and confidence scores

### What You'll Learn:
- How object detection works (the pipeline)
- Using YOLO for image detection
- Processing video frame by frame
- Real-time detection with webcam
- Understanding detection confidence and classes

### TO-DO List for This Section:
- [ ] Understand the object detection pipeline
- [ ] Install and setup YOLO
- [ ] Detect objects in static images
- [ ] Process video files frame by frame
- [ ] Implement real-time webcam detection

---

## 📖 Part 1: Understanding Object Detection Pipeline

### The Detection Process:

1. **Input**: Image or video frame
2. **Preprocessing**: Resize, normalize pixel values
3. **Model Inference**: YOLO processes the image
4. **Post-processing**: Filter detections, apply thresholds
5. **Output**: Bounding boxes, labels, confidence scores
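
The five steps above can be sketched without a real model. In the example below, step 3 is replaced by hard-coded, made-up detections, so the numbers and the dictionary layout are assumptions for illustration only. In the workshop code the YOLO call takes care of preprocessing and thresholding itself.

```python
import numpy as np

# 1. Input: a synthetic 8-bit frame standing in for a real image
frame = np.full((4, 4, 3), 255, dtype=np.uint8)

# 2. Preprocessing: scale pixel values from 0-255 to 0.0-1.0
normalized = frame.astype(np.float32) / 255.0

# 3. Model inference: replaced here by hard-coded, made-up detections
raw_detections = [
    {"class": "person", "confidence": 0.91, "bbox": (10, 20, 110, 220)},
    {"class": "dog", "confidence": 0.42, "bbox": (150, 80, 260, 200)},
    {"class": "cup", "confidence": 0.08, "bbox": (300, 40, 330, 90)},
]

# 4. Post-processing: keep only detections at or above the threshold
threshold = 0.5
kept = [d for d in raw_detections if d["confidence"] >= threshold]

# 5. Output: labels, scores, and boxes for the surviving detections
for d in kept:
    print(d["class"], d["confidence"], d["bbox"])
```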

In [None]:
# SETUP AND INSTALLATION

print("🔧 Setting up Object Detection Environment...")

# Install required packages
import subprocess
import sys

def install_package(package):
    """Install a package using pip"""
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"✅ {package} installed successfully")
    except subprocess.CalledProcessError:
        print(f"❌ Failed to install {package}")

# Install ultralytics (YOLO implementation)
print("Installing YOLO...")
install_package("ultralytics")

# Import necessary libraries
try:
    from ultralytics import YOLO
    import torch
    print("✅ YOLO imported successfully")
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("Please restart runtime and try again")

# Import other required libraries
import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import time

print("🎉 Object Detection setup complete!")

### Understanding YOLO Classes

The pretrained YOLO model used here detects the 80 object classes of the COCO dataset:

In [None]:
# YOLO COCO class names
COCO_CLASSES = [
    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck',
    'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench',
    'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
    'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
    'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
    'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
    'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
    'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse',
    'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
    'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
    'toothbrush'
]

print(f"🎯 YOLO can detect {len(COCO_CLASSES)} different object types:")
print("Common objects:", COCO_CLASSES[:20])
print("Household items:", COCO_CLASSES[56:75])

---

### 🖼️ Part 2: Object Detection on Images

#### Loading the YOLO Model

Let's load a pre-trained YOLO model:

In [None]:
# LOADING YOLO MODEL

print("🤖 Loading YOLO Model...")

try:
    # Load YOLOv8 nano model (fastest, good for learning)
    model = YOLO('yolov8n.pt')
    print("✅ YOLOv8 Nano model loaded successfully")
    print(f"📊 Model info: {len(COCO_CLASSES)} classes, optimized for speed")
except Exception as e:
    print(f"❌ Model loading failed: {e}")

### Detection Function

Let's create a function to detect objects and draw results:

In [None]:
def detect_and_draw(image, model, confidence_threshold=0.5):
    """
    Detect objects in an image and draw bounding boxes

    Parameters:
    - image: input image (BGR format)
    - model: YOLO model
    - confidence_threshold: minimum confidence for detection

    Returns:
    - annotated_image: image with detection boxes
    - detections: list of detection information
    """
    # Run YOLO inference
    results = model(image, conf=confidence_threshold)

    # Get the first result (single image)
    result = results[0]

    # Create a copy of the image for drawing
    annotated_image = image.copy()

    # Storage for detection information
    detections = []

    # Process each detection
    if result.boxes is not None:
        for box in result.boxes:
            # Get bounding box coordinates
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy().astype(int)

            # Get confidence and class
            confidence = float(box.conf[0].cpu().numpy())
            class_id = int(box.cls[0].cpu().numpy())
            class_name = COCO_CLASSES[class_id]

            # Store detection info
            detection_info = {
                'bbox': (x1, y1, x2, y2),
                'confidence': confidence,
                'class': class_name,
                'class_id': class_id
            }
            detections.append(detection_info)

            # Choose color based on class
            color = (0, 255, 0)  # Green default
            if class_name == 'person':
                color = (255, 0, 0)  # Blue for person
            elif class_name in ['car', 'truck', 'bus']:
                color = (0, 0, 255)  # Red for vehicles

            # Draw bounding box
            cv2.rectangle(annotated_image, (x1, y1), (x2, y2), color, 2)

            # Create label
            label = f"{class_name}: {confidence:.2f}"

            # Get text size for background
            (text_width, text_height), _ = cv2.getTextSize(
                label, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2
            )

            # Draw label background
            cv2.rectangle(annotated_image, (x1, y1 - text_height - 10),
                         (x1 + text_width, y1), color, -1)

            # Draw label text
            cv2.putText(annotated_image, label, (x1, y1 - 5),
                       cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)

    return annotated_image, detections

print("🔧 Detection function created!")
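
Each `bbox` stored by the function above holds two corners, `(x1, y1, x2, y2)`. When a width, height, or center point is needed instead, the conversion is simple arithmetic. A minimal sketch; the helper name `box_size_and_center` and the example numbers are hypothetical:

```python
def box_size_and_center(bbox):
    """Convert (x1, y1, x2, y2) corners to (width, height, center_x, center_y)."""
    x1, y1, x2, y2 = bbox
    width = x2 - x1
    height = y2 - y1
    return width, height, x1 + width / 2, y1 + height / 2

example_bbox = (50, 100, 250, 400)  # made-up box for illustration
print(box_size_and_center(example_bbox))
```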

### Testing Object Detection on Images

Let's test our detection on some images:

In [None]:
# ===============================================================================
# IMAGE OBJECT DETECTION
# ===============================================================================

print("🎯 Testing Object Detection on Images")

# Download your specific Google Drive image
test_images = []

print("📥 Downloading your Google Drive image...")
try:
    # Your Google Drive image URL (converted to direct download)
    google_drive_url = "https://drive.google.com/uc?export=download&id=1tozrXlV4YNs30pjwdjcKYVvPTx_QYM03"

    # Download the image
    urllib.request.urlretrieve(google_drive_url, "your_detection_image.jpg")
    your_image = cv2.imread("your_detection_image.jpg")

    if your_image is not None:
        print("✅ Successfully downloaded your Google Drive image")
        print_image_info(your_image, "Your Google Drive Image")
        test_images.append(('Your Google Drive Image', your_image))
    else:
        print("❌ Image downloaded but failed to load")
        raise Exception("Failed to load downloaded image")

except Exception as e:
    print(f"❌ Failed to download your image: {e}")
    print("💡 Make sure the Google Drive file is set to 'Anyone with the link can view'")
    print("🔧 Using fallback test image...")

    # Fallback to original test image
    try:
        test_url = "https://raw.githubusercontent.com/ultralytics/yolov5/master/data/images/bus.jpg"
        urllib.request.urlretrieve(test_url, "test_detection.jpg")
        test_img = cv2.imread("test_detection.jpg")
        if test_img is not None:
            test_images.append(('Fallback Test Image', test_img))
    except Exception:
        print("Could not download fallback image, creating synthetic scene...")
        # Create a simple scene for testing
        test_img = np.zeros((400, 600, 3), dtype=np.uint8)
        test_img[100:300, 200:400] = [100, 150, 200]  # Add some color
        test_images.append(('Synthetic Image', test_img))

# Run detection on available images
for name, image in test_images:
    print(f"\n🔍 Detecting objects in: {name}")

    # Run detection
    detected_image, detections = detect_and_draw(image, model, confidence_threshold=0.3)

    # Print detection results
    print(f"📊 Detection Results:")
    print(f"   Found {len(detections)} objects")

    for i, detection in enumerate(detections):
        bbox = detection['bbox']
        print(f"   {i+1}. {detection['class']}: {detection['confidence']:.3f} at ({bbox[0]}, {bbox[1]}) to ({bbox[2]}, {bbox[3]})")

    # Display results
    show_images_grid(
        [image, detected_image],
        [f"Original {name}", f"Detected Objects ({len(detections)} found)"],
        rows=1, cols=2, figsize=(16, 6)
    )

### 🎯 Exercise 2.1: Your Object Detection Challenge!

**Your Mission:** Practice object detection and understand the results.

**TO-DO List:**
- [ ] Upload your own image for detection
- [ ] Experiment with confidence thresholds
- [ ] Analyze detection results
- [ ] Understand false positives and false negatives
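
Before running the exercise, it may help to see how the threshold alone changes the number of detections kept. The confidence scores below are made up for illustration, and `count_kept` is a hypothetical helper:

```python
# Made-up confidence scores for six candidate detections in one image
scores = [0.95, 0.81, 0.62, 0.48, 0.33, 0.12]

def count_kept(scores, threshold):
    """Count how many scores are at or above the threshold."""
    return sum(1 for s in scores if s >= threshold)

for threshold in (0.3, 0.5, 0.7, 0.9):
    print(threshold, count_kept(scores, threshold))
```

A low threshold keeps more detections and risks false positives; a high threshold keeps fewer and risks missing real objects (false negatives).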

In [None]:
# 🎯 Exercise 2.1: Object Detection Practice

print("🎯 Exercise 2.1: Your Object Detection Workshop!")

# TODO: Upload an image for detection
print("📂 Upload an image with objects to detect:")
uploaded_detection = files.upload()

if uploaded_detection:
    detection_filename = list(uploaded_detection.keys())[0]
    your_detection_image = cv2.imread(detection_filename)

    if your_detection_image is not None:
        print(f"✅ Loaded: {detection_filename}")
        show_image(your_detection_image, "Your Image for Detection")

        # TODO: Set your confidence threshold
        confidence_threshold = 0.9  # TODO: Experiment with this value (try 0.3, 0.5, 0.7, 0.9)

        print(f"\n🎚️ Running Detection with Confidence Threshold: {confidence_threshold}")

        # Run detection with your chosen threshold
        detected_img, detections = detect_and_draw(
            your_detection_image, model, confidence_threshold=confidence_threshold
        )

        # Print detection results
        print(f"\n📊 Detection Results:")
        print(f"   Objects found: {len(detections)}")

        if len(detections) > 0:
            print(f"   Detected objects:")
            for i, detection in enumerate(detections):
                print(f"   {i+1}. {detection['class']}: {detection['confidence']:.3f}")
        else:
            print(f"   No objects detected at confidence threshold {confidence_threshold}")
            print(f"   💡 Try lowering the threshold (e.g., 0.3 or 0.5) to see more objects")

        # Display original vs detected
        show_images_grid(
            [your_detection_image, detected_img],
            ["Original Image", f"Detected Objects (Conf: {confidence_threshold})"],
            rows=1, cols=2, figsize=(16, 6)
        )

        # TODO: Analyze your results
        print(f"\n🤔 Analysis Questions:")
        print(f"1. How many objects were detected at confidence {confidence_threshold}?")
        print(f"2. Are the detections accurate? Any false positives or missed objects?")
        print(f"3. Would you increase or decrease the threshold for better results?")
        print(f"4. What happens if you change the threshold to 0.3 or 0.5?")

    else:
        print("❌ Could not load the uploaded image")
else:
    print("No image uploaded, using previous examples for analysis")

---

## 🎬 Part 3: Video Processing and Frame-by-Frame Detection

### Understanding Video as Image Sequences

**Key Concept:** Videos are just sequences of images (frames) displayed rapidly!

**Video Processing Pipeline:**
1. **Read Video**: Load video file or camera stream
2. **Extract Frames**: Get individual images from video
3. **Process Each Frame**: Apply detection to each image
4. **Combine Results**: Create output video or display real-time

**Frame Rate Considerations:**
- Standard video: 24-30 FPS (Frames Per Second)
- Real-time processing: each frame must be processed in less time than the gap between frames (about 33 ms at 30 FPS)
- Trade-offs: Speed vs accuracy
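
The frame-rate constraint can be expressed as a time budget per frame. A minimal sketch; the function names and the example processing times are assumptions for illustration, not measurements:

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, to keep up with a frame rate."""
    return 1000.0 / fps

def keeps_up(processing_ms, fps):
    """True if a frame is processed within the per-frame budget."""
    return processing_ms <= frame_budget_ms(fps)

print(round(frame_budget_ms(30), 1))  # budget at 30 FPS, in ms
print(keeps_up(25.0, 30))             # a 25 ms detector fits the budget
print(keeps_up(50.0, 30))             # a 50 ms detector falls behind
```

When detection is slower than the budget, common options are to skip frames, shrink the input, or use a smaller model.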

In [None]:
# VIDEO PROCESSING SETUP

print("🎬 Video Processing and Object Detection")

def process_video_file(video_path, model, output_path=None, conf_threshold=0.5, max_frames=None):
    """
    Process a video file and detect objects in each frame

    Parameters:
    - video_path: path to input video
    - model: YOLO model
    - output_path: path for output video (optional)
    - conf_threshold: confidence threshold for detection
    - max_frames: maximum frames to process (for testing)

    Returns:
    - processed_frames: list of processed frames
    - detection_stats: statistics about detections
    """
    # Open video file
    cap = cv2.VideoCapture(video_path)

    if not cap.isOpened():
        print(f"❌ Error: Could not open video {video_path}")
        return None, None

    # Get video properties
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # keep fractional rates; fall back to 30 if the file reports 0
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    print(f"📹 Video Info: {width}x{height}, {fps} FPS, {total_frames} frames")

    # Setup video writer if output path provided
    if output_path:
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    processed_frames = []
    detection_stats = {'total_detections': 0, 'frames_processed': 0, 'objects_per_frame': []}

    frame_count = 0
    start_time = time.time()

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        # Process frame
        detected_frame, detections = detect_and_draw(frame, model, conf_threshold)

        # Update statistics
        detection_stats['frames_processed'] += 1
        detection_stats['total_detections'] += len(detections)
        detection_stats['objects_per_frame'].append(len(detections))

        # Save processed frame
        processed_frames.append(detected_frame)

        # Write to output video if specified
        if output_path:
            out.write(detected_frame)

        frame_count += 1

        # Break if max_frames reached
        if max_frames and frame_count >= max_frames:
            break

        # Print progress
        if frame_count % 30 == 0:
            elapsed = time.time() - start_time
            fps_processed = frame_count / elapsed
            print(f"   Processed {frame_count} frames, {fps_processed:.1f} FPS")

    # Clean up
    cap.release()
    if output_path:
        out.release()

    # Calculate final statistics
    avg_objects = np.mean(detection_stats['objects_per_frame']) if detection_stats['objects_per_frame'] else 0
    processing_time = time.time() - start_time
    avg_fps = frame_count / processing_time

    print(f"✅ Video processing complete!")
    print(f"   Processed {frame_count} frames in {processing_time:.1f}s")
    print(f"   Average processing speed: {avg_fps:.1f} FPS")
    print(f"   Average objects per frame: {avg_objects:.1f}")

    return processed_frames, detection_stats

In [None]:
# TEST YOUR UPLOADED VIDEO

print("🎬 Testing Your Uploaded Video")

# TODO: Specify the path to your uploaded video file
video_path = "/content/Realistic_Object_Detection_Video_Generation.mp4"  # TODO: Replace with your actual video filename

# Process your video if file exists
if os.path.exists(video_path):
    print(f"\n🔍 Processing video: {video_path}")

    # TODO: Adjust these parameters for your video
    conf_threshold = 0.5    # TODO: Try different confidence values (0.3, 0.5, 0.7)
    max_frames = 120        # TODO: Limit frames for faster testing (or set to None for full video)
    output_path = "/content/detected_video_output.mp4"  # TODO: Set output path or None

    print(f"⚙️ Processing Settings:")
    print(f"   Confidence threshold: {conf_threshold}")
    print(f"   Max frames to process: {max_frames if max_frames else 'All frames'}")
    print(f"   Output video: {output_path if output_path else 'No output video'}")

    # Process the video
    processed_frames, stats = process_video_file(
        video_path=video_path,
        model=model,
        output_path=output_path,
        conf_threshold=conf_threshold,
        max_frames=max_frames
    )

    if processed_frames and stats:
        print(f"\n📊 Your Video Analysis Results:")
        print(f"   Total frames processed: {stats['frames_processed']}")
        print(f"   Total objects detected: {stats['total_detections']}")
        print(f"   Average objects per frame: {np.mean(stats['objects_per_frame']):.1f}")
        print(f"   Max objects in a frame: {max(stats['objects_per_frame']) if stats['objects_per_frame'] else 0}")
        print(f"   Min objects in a frame: {min(stats['objects_per_frame']) if stats['objects_per_frame'] else 0}")

        # Show sample frames from your video
        if len(processed_frames) >= 4:
            sample_indices = [0, len(processed_frames)//3, 2*len(processed_frames)//3, len(processed_frames)-1]
            sample_frames = [processed_frames[i] for i in sample_indices]
            sample_titles = [f"Frame {i+1}" for i in sample_indices]

            print(f"\n🖼️ Sample Frames from Your Video:")
            show_images_grid(sample_frames, sample_titles, rows=2, cols=2, figsize=(16, 12))

        # Download processed video if created
        if output_path and os.path.exists(output_path):
            print(f"\n💾 Processed video saved as: {output_path}")
            try:
                files.download(output_path)
                print("📥 Download started for processed video")
            except Exception:
                print("❌ Download failed, but file is saved in /content/")

    else:
        print("❌ Video processing failed")

else:
    print(f"❌ Video file not found: {video_path}")
    print("💡 Make sure you've uploaded a video file to /content/")

### Video Processing Example

Let's download and process a sample video:

### 🎯 Exercise 2.3: Video Object Detection Master!

**Your Mission:** Record your own test video and analyze object detection performance over time.

**TO-DO List:**
- [ ] Record a 30-second video with various objects
- [ ] Upload your video to Google Colab
- [ ] Process your video with object detection
- [ ] Analyze detection patterns and performance

## 📹 Step 1: Record Your Test Video

**Recording Instructions:**
1. **Use your laptop/phone** to record a 30-second video
2. **Include diverse objects** from the COCO classes:
   - **People**: Yourself, friends, family members
   - **Vehicles**: Cars, bicycles, motorcycles (from window/outside)
   - **Animals**: Pets (cats, dogs, birds)
   - **Household items**: Chairs, laptops, cups, bottles, books
   - **Food**: Bananas, apples, pizza, sandwiches

**Video Tips:**
- **Good lighting**: Record in well-lit areas
- **Stable camera**: Keep camera relatively steady
- **Multiple objects**: Show 2-5 objects at once when possible
- **Movement**: Include some object movement/camera panning
- **Various distances**: Show objects up close and far away

**File Format:** Save as `.mp4`, `.avi`, or `.mov` (max 50MB for Colab)

In [None]:
print("🎯 Exercise 2.3: Video Object Detection Analysis!")

# TODO: Upload your recorded video
print("📤 Upload your 30-second test video:")
uploaded_video = files.upload()

if uploaded_video:
    video_filename = list(uploaded_video.keys())[0]
    video_path = f"/content/{video_filename}"

    print(f"✅ Uploaded: {video_filename}")

    # TODO: Set your processing parameters
    conf_threshold = 0.5    # TODO: Try different values (0.3, 0.5, 0.7)
    max_frames = 150        # TODO: Process ~5 seconds worth (30fps * 5 = 150)
    output_path = "/content/my_detected_video.mp4"

    print(f"\n🎬 Processing your video...")
    print(f"   Confidence threshold: {conf_threshold}")
    print(f"   Processing first {max_frames} frames")

    # Process your video
    processed_frames, stats = process_video_file(
        video_path=video_path,
        model=model,
        output_path=output_path,
        conf_threshold=conf_threshold,
        max_frames=max_frames
    )

    if processed_frames and stats:
        # TODO: Analyze your video results
        objects_per_frame = stats['objects_per_frame']

        print(f"\n📊 Your Video Analysis:")
        print(f"   Frames processed: {stats['frames_processed']}")
        print(f"   Total detections: {stats['total_detections']}")
        print(f"   Average objects per frame: {np.mean(objects_per_frame):.1f}")
        print(f"   Peak objects in one frame: {max(objects_per_frame)}")
        print(f"   Minimum objects detected: {min(objects_per_frame)}")

        # Show key frames from your video
        if len(processed_frames) >= 4:
            # Find interesting frames
            max_detections_frame = np.argmax(objects_per_frame)
            min_detections_frame = np.argmin(objects_per_frame)
            mid_frame = len(processed_frames) // 2

            key_frames = [
                processed_frames[0],
                processed_frames[mid_frame],
                processed_frames[max_detections_frame],
                processed_frames[-1]
            ]

            frame_titles = [
                "First Frame",
                "Middle Frame",
                f"Peak Activity ({max(objects_per_frame)} objects)",
                "Final Frame"
            ]

            print(f"\n🖼️ Key Frames from Your Video:")
            show_images_grid(key_frames, frame_titles, rows=2, cols=2, figsize=(16, 12))

        # Download your processed video
        if os.path.exists(output_path):
            print(f"\n💾 Your processed video is ready!")
            try:
                files.download(output_path)
                print("📥 Download started - check your Downloads folder")
            except Exception:
                print(f"File saved as: {output_path}")

    else:
        print("❌ Video processing failed")
else:
    print("❌ No video uploaded")