# 📷 Chapter 14: Deep Computer Vision with CNNs — Practical Guide

---

This notebook provides a practical walkthrough of deep learning techniques for computer vision using Convolutional Neural Networks (CNNs). It covers architecture concepts, building models with Keras, transfer learning, object detection, segmentation, and more.

## I. 🤓 Architecture of the Visual Cortex

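CNNs are inspired by the visual cortex, where each neuron responds only to stimuli in a small region of the visual field (its local receptive field) and some neurons react to simple patterns such as edges. CNNs mimic this with convolutions over local receptive fields and with hierarchical feature extraction: early layers detect simple patterns, and deeper layers combine them into more complex ones.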

## II. 🧩 Convolutional Layers

* **Filters (Kernels)** slide across images to detect local features.
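* Each filter produces one **feature map**; a layer applies many filters, and the resulting feature maps are **stacked** along the channel axis.
* Because a filter's weights are shared across every image position, convolutional layers need far fewer parameters than fully connected layers.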
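
To make the sliding-filter idea concrete, here is a minimal NumPy sketch of a "valid" 2D convolution (no padding, stride 1) applied with a hand-written vertical-edge filter. The function name `conv2d_valid`, the toy image, and the filter values are illustrative assumptions; real layers learn their filters and run far faster implementations.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value only sees a small kh x kw patch of the input
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x6 image: dark left half (0), bright right half (1)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hand-written filter that responds to dark-to-bright vertical edges
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)
```

The feature map is non-zero only in the columns where the filter window straddles the boundary between the dark and bright halves, which is what "detecting a local feature" means here.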

### CNN in TensorFlow (Keras)

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

# Define a simple CNN with two convolutional layers
model = Sequential([
    Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=(64, 64, 3)),
    Conv2D(filters=64, kernel_size=3, activation='relu'),
])

model.summary()
```
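
The output shapes and parameter counts that `model.summary()` reports for these two layers can be reproduced by hand. The sketch below assumes the Keras defaults used above (`padding='valid'`, stride 1); the helper names are illustrative, not part of Keras.

```python
def conv2d_output_size(input_size, kernel_size, stride=1):
    """Spatial size after a convolution without padding."""
    return (input_size - kernel_size) // stride + 1

def conv2d_param_count(kernel_size, in_channels, filters):
    """One kernel_size x kernel_size x in_channels kernel plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

size1 = conv2d_output_size(64, 3)        # first Conv2D:  64 -> 62
size2 = conv2d_output_size(size1, 3)     # second Conv2D: 62 -> 60
params1 = conv2d_param_count(3, 3, 32)   # 3*3*3*32 + 32
params2 = conv2d_param_count(3, 32, 64)  # 3*3*32*64 + 64

# For comparison: a fully connected layer mapping the flattened 64x64x3 input
# to as many outputs as the first convolutional layer produces.
dense_outputs = size1 * size1 * 32
dense_params = (64 * 64 * 3) * dense_outputs + dense_outputs

print(f"Conv layer 1: {size1}x{size1}x32 output, {params1} parameters")
print(f"Conv layer 2: {size2}x{size2}x64 output, {params2} parameters")
print(f"Equivalent dense layer: {dense_params} parameters")
```

The first convolutional layer needs 896 parameters and the second 18,496, while the dense equivalent of the first layer alone would need over a billion. Weight sharing is what makes the difference.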


## III. ➖ Pooling Layers

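Pooling layers downsample feature maps: they shrink the spatial dimensions (and the computation in later layers) while keeping the strongest activation in each window.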

```python
from tensorflow.keras.layers import MaxPooling2D

model.add(MaxPooling2D(pool_size=2))  # Adds a max pooling layer
```
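
As a rough illustration of what `MaxPooling2D(pool_size=2)` computes on a single feature map, the NumPy sketch below takes the maximum of each non-overlapping 2×2 window. The helper name `max_pool_2x2` is an assumption for this example, and it handles only one 2D map with even height and width.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Max over non-overlapping 2x2 windows of a 2D array with even sides."""
    h, w = feature_map.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch only handles even dimensions"
    # Split rows and columns into (block index, position inside block), then
    # reduce over the two "position inside block" axes.
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
print(feature_map)
print(pooled)  # 4x4 input becomes 2x2: [[5, 7], [13, 15]]
```

Height and width are halved, so the number of activations drops by a factor of four while the largest value in each window survives.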

### Classic Architectures Overview

* **LeNet-5**: early handwritten-digit CNN.
* **AlexNet**: popularized ReLU activations and dropout; won the 2012 ImageNet challenge.
* **GoogLeNet (Inception)**: efficient inception modules.
* **VGGNet**: deep stacks of small (3×3) convolutions.
* **ResNet**: residual connections to train very deep nets.
* **Xception / SENet**: separable convolutions and dynamic channel weighting.
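
The residual connection behind ResNet can be sketched in a few lines of NumPy. This is a simplified illustration, not ResNet itself: two small dense transformations stand in for the convolutional layers, and the names `residual_block`, `w1`, and `w2` are assumptions made for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(x + F(x)), where F is a small two-layer transformation."""
    f = relu(x @ w1) @ w2   # the "residual" the block has to learn
    return relu(x + f)      # skip connection adds the input back

rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=(4, 8)))  # non-negative, as after a previous ReLU

# With all-zero weights, F(x) = 0 and the block passes its input through unchanged
w_zero = np.zeros((8, 8))
identity_out = residual_block(x, w_zero, w_zero)
print(np.allclose(identity_out, x))
```

Because a block whose weights are near zero behaves like the identity, a signal (and its gradient) can flow through many stacked blocks, which is one common explanation for why very deep ResNets remain trainable.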

## IV. 🧱 Building a ResNet-34 in Keras

Note: `keras.applications` does not ship a ResNet-34, so this section uses the pretrained `ResNet50` instead. The cell below loads it without its classification head, freezes its weights, and adds a new classifier on top:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

# Load ResNet50 pretrained on ImageNet, exclude top layers
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the pretrained weights
base_model.trainable = False

# Build a new model on top of ResNet
cnn_model = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(1000, activation='relu'),  # additional dense layer
    Dense(10, activation='softmax')  # assuming 10 classes
])

# Compile the model
cnn_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

cnn_model.summary()
```



## V. ✔️ Using Pretrained Models from Keras

```python
from tensorflow.keras.applications import VGG16, ResNet50

# Load VGG16 without top classification layer
vgg = VGG16(weights='imagenet', include_top=False)

# Load ResNet50 without top layer
resnet = ResNet50(weights='imagenet', include_top=False)
```

These models can be used as feature extractors or fine-tuned for your custom tasks.

## VI. 🔁 Transfer Learning

* Freeze early layers to preserve learned features.
* Add new layers (e.g., dense classifier) per your target dataset.
* Fine-tune top layers after initial training.
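
The last step, unfreezing only the top of the pretrained base, might look like the helper below. It is a sketch under assumptions: `unfreeze_top_layers` is a hypothetical name, and it relies only on the `layers` and `trainable` attributes that Keras models expose. A stand-in object is used so the example runs without downloading weights.

```python
from types import SimpleNamespace

def unfreeze_top_layers(base_model, n_layers):
    """Make only the last `n_layers` layers of `base_model` trainable."""
    base_model.trainable = True  # in Keras this also marks every sublayer trainable
    split = max(0, len(base_model.layers) - n_layers)
    for layer in base_model.layers[:split]:
        layer.trainable = False  # keep the early, generic features fixed
    for layer in base_model.layers[split:]:
        layer.trainable = True   # let the task-specific top layers adapt
    return base_model

# Stand-in for a pretrained Keras model with five layers (illustration only)
toy_base = SimpleNamespace(
    trainable=False,
    layers=[SimpleNamespace(name=f"layer_{i}", trainable=False) for i in range(5)],
)

unfreeze_top_layers(toy_base, n_layers=2)
print([layer.trainable for layer in toy_base.layers])
```

With a real Keras model, recompile after changing `trainable` so the change takes effect, and use a much lower learning rate than in the initial training phase (for example `Adam(learning_rate=1e-5)`) so the pretrained weights are not overwritten by large updates.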

## VII. 🧭 Classification & Localization

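Classification with localization predicts both an object's class and its bounding-box coordinates, typically by adding a box-regression output next to the class output. Detectors such as **Fast R-CNN** use the same pairing: a classification head and a box-regression head for each region proposal.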

## VIII. 🔍 Object Detection

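* **Fully Convolutional Networks (FCNs)** replace dense layers with convolutional ones, so they accept images of any size and output a grid (heatmap) of predictions instead of a single prediction.
* **YOLO** (You Only Look Once) divides the image into a grid and predicts bounding boxes and class probabilities for every cell in a single forward pass.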

```python
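# Pseudocode: load_yolo_model and decode_yolo_output are hypothetical placeholders,
# not real library functions. Substitute the loader and decoder of your YOLO implementation.
model = load_yolo_model()
preds = model.predict(image)  # image: a preprocessed input batch
boxes, scores, classes = decode_yolo_output(preds)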
```
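
To show what a decoding step has to do, here is a deliberately simplified NumPy sketch. It assumes a made-up output layout of shape `(S, S, 5 + C)` per image, where each grid cell holds `[x_offset, y_offset, width, height, objectness, class probabilities...]`, offsets are relative to the cell, and sizes are relative to the whole image. Real YOLO versions differ in their output format (anchor boxes, activation functions, multiple scales), so treat `decode_grid_predictions` as an illustration only.

```python
import numpy as np

def decode_grid_predictions(preds, score_threshold=0.5):
    """Turn an (S, S, 5 + C) grid of predictions into boxes, scores, and class ids."""
    grid_size = preds.shape[0]
    boxes, scores, classes = [], [], []
    for row in range(grid_size):
        for col in range(grid_size):
            x_off, y_off, w, h, objectness = preds[row, col, :5]
            class_probs = preds[row, col, 5:]
            class_id = int(np.argmax(class_probs))
            score = float(objectness * class_probs[class_id])
            if score < score_threshold:
                continue  # drop low-confidence cells
            # Convert cell-relative center to image-relative coordinates in [0, 1]
            cx = (col + x_off) / grid_size
            cy = (row + y_off) / grid_size
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
            scores.append(score)
            classes.append(class_id)
    return np.array(boxes).reshape(-1, 4), np.array(scores), np.array(classes, dtype=int)

# Toy prediction: a 2x2 grid, 3 classes, one confident detection in cell (row 1, col 0)
preds = np.zeros((2, 2, 5 + 3))
preds[1, 0] = [0.5, 0.5, 0.4, 0.2, 0.9, 0.1, 0.8, 0.1]

boxes, scores, classes = decode_grid_predictions(preds)
print(boxes, scores, classes)
```

Boxes come out as normalized `(x_min, y_min, x_max, y_max)`. A full pipeline would follow this with non-max suppression to remove overlapping detections of the same object.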

## IX. 🎨 Semantic Segmentation

Per-pixel classification with networks like **U-Net** and **SegNet**, which use encoder–decoder architectures to restore resolution and produce dense masks.
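
Whatever the architecture, the final step of per-pixel classification is usually the same: the network outputs one score per class for every pixel, and the mask is the highest-scoring class at each position. The sketch below uses a hand-made score array (an assumption standing in for real network output) to show that step.

```python
import numpy as np

# Hypothetical network output for a 2x3 image with 2 classes
# (class 0 = background, class 1 = object): shape (height, width, classes)
scores = np.zeros((2, 3, 2))
scores[..., 0] = 0.3      # background score everywhere
scores[0, 1, 1] = 0.9     # object is more likely at pixel (0, 1)
scores[1, 2, 1] = 0.7     # and at pixel (1, 2)

# The segmentation mask assigns each pixel its highest-scoring class
mask = np.argmax(scores, axis=-1)
print(mask)
```

The mask has the same height and width as the input, which is why segmentation networks need a decoder to restore the resolution lost to pooling and strided convolutions.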

## ✅ Practical Code Example (Transfer Learning)

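```python
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Prepare data generators. preprocess_input applies the ImageNet preprocessing
# that the pretrained ResNet50 weights expect.
train_gen = ImageDataGenerator(preprocessing_function=preprocess_input, horizontal_flip=True)
val_gen = ImageDataGenerator(preprocessing_function=preprocess_input)

# class_mode='sparse' yields integer labels, matching the
# sparse_categorical_crossentropy loss that cnn_model was compiled with.
train_data = train_gen.flow_from_directory('data/train', target_size=(224, 224), batch_size=32, class_mode='sparse')
val_data = val_gen.flow_from_directory('data/val', target_size=(224, 224), batch_size=32, class_mode='sparse')

# Train the model (only the new layers learn; the ResNet50 base is frozen)
history = cnn_model.fit(train_data, epochs=5, validation_data=val_data)
```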

## 🧠 Chapter Summary

* CNNs are loosely modeled on the visual cortex and combine convolutions with pooling.
* Deep architectures evolved from LeNet → ResNet → Xception/SENet.
* Keras includes powerful pretrained models for feature extraction or fine-tuning.
* Object detection and segmentation extend CNNs from labeling whole images to locating objects with bounding boxes and labeling individual pixels.

## 🧪 Exercises

1. Use ResNet-34 (or ResNet50) to classify a custom dataset (e.g., cats vs. dogs).
2. Fine-tune the top layers on a new dataset and monitor training vs. validation accuracy.
3. Implement YOLO's prediction decoder to visualize bounding boxes.
4. Build a simple FCN to perform semantic segmentation on a toy dataset (e.g., segment flowers vs background).
5. Compare model performance before and after **unfreezing and retraining top layers**.