# 📝 Homework Assignment: The "Build Your Own Dataset" Lab

**Module:** Data Preprocessing & Feature Engineering for CNNs

### **The Objective**
In class, we learned that a Neural Network is only as good as the data you feed it. We used generic images of dogs and shapes. Now, it is your turn to act as a **Data Engineer**.

Your goal is to acquire a raw, high-resolution image of your choice and pass it through a complete **Preprocessing Pipeline** to make it "Machine Learning Ready."

### **Grading Criteria**
1.  **Image Loading:** Successfully loaded a custom local image.
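2.  **Resizing:** Correctly resized to $224 \times 224$ pixels using bilinear interpolation.
3.  **Normalization:** Pixel values are verified to lie in the range $[0.0, 1.0]$.
4.  **Augmentation:** Three distinct variations are shown, and the reasoning behind your choice of augmentations is sound.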
5.  **One-Hot Encoding:** Correct vector generation.

In [None]:
# SETUP CELL
# Run this once to import your tools
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.utils import to_categorical

print(f"TensorFlow Version: {tf.__version__}")

## Step 1: Data Acquisition (The Creative Part)

**Task:**
1.  **Take a photo** with your smartphone OR find a high-resolution image online.
2.  **Subject:** Choose a distinct object (e.g., your pet, your favorite sneakers, a coffee mug, a specific car).
3.  **Constraint:** The image must be **high resolution** (at least $1000 \times 1000$ pixels) and non-square (its width and height must differ).
4.  Save the image in the same folder as this notebook (or upload it if using Colab).

In [None]:
# TODO: Enter the filename of your image here
filename = "YOUR_IMAGE_HERE.jpg" 

try:
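    # Load the image as a PIL object, then convert it to an H x W x 3 NumPy array.
    # img_to_array returns float32, so cast back to 8-bit integers (0-255).
    raw_image = tf.keras.utils.load_img(filename)
    raw_image = tf.keras.utils.img_to_array(raw_image).astype('uint8')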

    # Display Original
    plt.figure(figsize=(6,6))
    plt.imshow(raw_image)
    plt.title(f"Original Image\nShape: {raw_image.shape}")
    plt.axis('off')
    plt.show()
    
except Exception as e:
    print(f"Error loading image: {e}")
    print("Make sure the filename matches exactly and the file is in the correct folder.")

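Before moving on, it can help to confirm that your photo satisfies the Step 1 constraint. The helper below, `meets_step1_constraints`, is a hypothetical name (not part of TensorFlow or Keras); it is a minimal NumPy sketch that assumes an H x W x C array such as `raw_image` and checks only the shape, not the image quality.

```python
import numpy as np

def meets_step1_constraints(image: np.ndarray, min_side: int = 1000) -> bool:
    """Return True if an H x W x C image is at least `min_side` pixels on
    both sides and is not square (hypothetical helper for illustration)."""
    height, width = image.shape[:2]
    return height >= min_side and width >= min_side and height != width

# Synthetic arrays standing in for real photos.
phone_photo = np.zeros((1080, 1920, 3), dtype=np.uint8)
square_photo = np.zeros((1000, 1000, 3), dtype=np.uint8)
print(meets_step1_constraints(phone_photo))   # True
print(meets_step1_constraints(square_photo))  # False: width equals height
```

In the notebook you could call `meets_step1_constraints(raw_image)` right after the loading cell succeeds.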
## Step 2: Geometric Transformation (Resizing)

**Task:**
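1.  Resize the image to **$224 \times 224$ pixels**, the input size used by many ImageNet-trained CNNs (e.g., VGG16 and ResNet50).
2.  Use **bilinear interpolation**.
3.  Display the result. Does it look squashed? That is okay! Resizing straight to a square does not preserve the original aspect ratio, so a non-square photo is stretched or compressed along one axis.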

In [None]:
# TODO: Define the target size
target_size = (224, 224)

# TODO: Resize the raw_image using tf.image.resize
# Hint: Use method='bilinear'
resized_image = tf.image.resize(raw_image, target_size, method='bilinear')

# Display the resized image
plt.figure(figsize=(4,4))
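# tf.image.resize returns float32 values in 0-255, but imshow expects floats in [0, 1],
# so we cast to uint8 for display only (resized_image itself stays float32)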
plt.imshow(resized_image.numpy().astype("uint8"))
plt.title(f"Resized Input\nShape: {resized_image.shape}")
plt.axis('off')
plt.show()

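To see roughly what `method='bilinear'` does, here is an educational NumPy sketch of bilinear resizing: each output pixel is mapped back to a fractional position in the source image, and its value is a weighted average of the four nearest source pixels. The function name `bilinear_resize` is hypothetical, and the coordinate mapping assumes the half-pixel-centre convention; edge handling and antialiasing in `tf.image.resize` may differ, so treat this as an approximation rather than a reference implementation.

```python
import numpy as np

def bilinear_resize(image: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize an H x W x C array with bilinear interpolation (sketch).

    Returns a float64 array whose values stay within the input's range.
    """
    in_h, in_w = image.shape[:2]
    img = image.astype(np.float64)

    # Map each output pixel centre back to a fractional source coordinate.
    ys = np.clip((np.arange(out_h) + 0.5) * (in_h / out_h) - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * (in_w / out_w) - 0.5, 0, in_w - 1)

    # Integer neighbours (above/left and below/right) and fractional weights.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]  # shape (out_h, 1, 1) for broadcasting
    wx = (xs - x0)[None, :, None]  # shape (1, out_w, 1)

    # Blend the four neighbours: first horizontally, then vertically.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# A synthetic "tall" photo standing in for a real one.
rng = np.random.default_rng(0)
tall_photo = rng.integers(0, 256, size=(300, 200, 3)).astype(np.uint8)
resized = bilinear_resize(tall_photo, 224, 224)
print(resized.shape)  # (224, 224, 3)
```

The vertical scale factor here ($224/300$) differs from the horizontal one ($224/200$), so the height is compressed while the width is stretched; that mismatch is the "squashed" look described in Step 2.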
## Step 3: Numerical Transformation (Normalization)

**Task:**
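1.  Rescale the pixel values from the range $[0, 255]$ to floating-point values in $[0.0, 1.0]$. (`tf.image.resize` already returns `float32`, but the values still span $0$ to $255$.)
2.  **Proof:** Print the maximum pixel value **before** and **after** normalization to show that your code worked.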

In [None]:
# Check values before
print(f"Max value BEFORE: {np.max(resized_image)}")

# TODO: Normalize the resized_image
# Hint: Divide by the maximum possible pixel value
normalized_image = resized_image / 255.0

# Check values after
print(f"Max value AFTER: {np.max(normalized_image)}")

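The same rescaling can be written as a small NumPy function with the check from the **Proof** task built in. This is a sketch assuming 8-bit input, and the name `normalize_to_unit_range` is hypothetical. Casting to `float32` before dividing keeps the result in TensorFlow's default float type; dividing a `uint8` array by `255.0` directly in NumPy would produce `float64`.

```python
import numpy as np

def normalize_to_unit_range(image: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to float32 values in [0.0, 1.0]."""
    scaled = image.astype(np.float32) / 255.0
    # Sanity check mirroring the "proof" requested in Step 3.
    assert scaled.min() >= 0.0 and scaled.max() <= 1.0, "values outside [0, 1]"
    return scaled

pixels = np.array([[0, 128, 255]], dtype=np.uint8)
unit = normalize_to_unit_range(pixels)
print(unit.dtype, unit.max())  # float32 1.0
```

In the notebook cell above, the printed maximum AFTER should equal the maximum BEFORE divided by 255; it is exactly 1.0 only if the resized photo contains a pure-white channel value.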
## Step 4: Feature Engineering (Data Augmentation)

**Task:**
1.  Create an augmentation pipeline using `tf.keras.Sequential`.
2.  Include at least **two** augmentation layers (e.g., `RandomFlip`, `RandomRotation`, `RandomZoom`).
3.  Generate and display **3 distinct variations** of your image.

In [None]:
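# TODO: Create your data augmentation pipeline
# Example layers below; replace or extend them with choices that suit your object.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # mirror left-to-right at random
    tf.keras.layers.RandomRotation(0.1),       # rotate by up to 10% of a full turn (36 degrees) either way
    tf.keras.layers.RandomZoom(0.1),           # zoom in or out by up to 10%
])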

# Add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
batched_img = tf.expand_dims(normalized_image, 0)

# Display 3 variations
plt.figure(figsize=(10, 4))
for i in range(3):
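    # TODO: Pass the batched_img through your data_augmentation pipeline
    # Each call draws new random parameters; training=True keeps the random layers active.
    augmented_image = data_augmentation(batched_img, training=True)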
    
    plt.subplot(1, 3, i + 1)
    plt.imshow(augmented_image[0])
    plt.title(f"Variation {i+1}")
    plt.axis("off")
plt.show()

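Keras's random augmentation layers draw fresh random parameters on every call, which is why calling the pipeline inside the loop yields distinct variations. The NumPy sketch below imitates the idea behind `RandomFlip("horizontal")` on an `(N, H, W, C)` batch. It is a simplified illustration (hypothetical function name, one independent coin toss per image), not how Keras implements the layer.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_horizontal_flip(batch: np.ndarray, p: float = 0.5) -> np.ndarray:
    """Independently mirror each image in an (N, H, W, C) batch
    left-to-right with probability p (hypothetical helper)."""
    flip_mask = rng.random(batch.shape[0]) < p       # one coin toss per image
    out = batch.copy()
    out[flip_mask] = out[flip_mask][:, :, ::-1, :]   # reverse the width axis
    return out

image = np.arange(6, dtype=np.float32).reshape(2, 3, 1)  # tiny 2 x 3 single-channel "image"
batch = np.expand_dims(image, 0)                         # shape (1, 2, 3, 1), like tf.expand_dims(img, 0)
flipped = random_horizontal_flip(batch, p=1.0)
print(flipped[0, :, :, 0])                               # each row reversed: [[2, 1, 0], [5, 4, 3]]
```

Setting `p=1.0` makes the result deterministic, which is handy for checking that the width axis (axis 2 of the batch) is the one being reversed.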
### **Critical Thinking Question:**
*Double-click this cell to edit.*

**Why did you choose these specific augmentations for your object?**
(Example: "I chose horizontal flip because a car is still a car if it faces left. I did not choose vertical flip because cars don't drive upside down.")

**Your Answer:**
[TYPE YOUR ANSWER HERE]

## Step 5: Label Engineering (One-Hot Encoding)

**Task:**
1.  Invent a list of 3-5 categories, one of which your image belongs to.
2.  Manually set `correct_index` to the position of your image's category in that list (indexing starts at 0).
3.  Generate the **One-Hot Encoded Vector**.

In [None]:
# TODO: Define your list of class names
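# Example shown below; replace it with your own 3-5 categories.
# An empty list would make to_categorical and my_classes[correct_index] fail.
my_classes = ['Sneaker', 'Boot', 'Sandal']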

# TODO: Define which index your image belongs to
# Example: correct_index = 0
correct_index = 0

# TODO: Generate the vector using to_categorical
vector = to_categorical(correct_index, num_classes=len(my_classes))

print(f"Classes: {my_classes}")
print(f"Selected Class: {my_classes[correct_index]}")
print(f"One-Hot Vector: {vector}")
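
Under the hood, a one-hot vector is simply one row of the identity matrix. The sketch below builds it with NumPy and, unlike the cell above, raises a clear error when the index does not fit the class list. As far as we know `to_categorical` returns `float32` by default, so for valid inputs the two should agree; the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(index: int, num_classes: int) -> np.ndarray:
    """Return a float32 vector of zeros with a single 1.0 at `index`."""
    if not 0 <= index < num_classes:
        raise ValueError(f"index {index} is out of range for {num_classes} classes")
    return np.eye(num_classes, dtype=np.float32)[index]

classes = ['Sneaker', 'Boot', 'Sandal']
print(one_hot(classes.index('Boot'), len(classes)))  # [0. 1. 0.]
```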