# Geometric Transformations & Image Registration
## 1. Introduction
**Lecture objectives**
- Understand common geometric transforms and when to use them.
- Apply affine and perspective transforms with OpenCV.
- Recognize the role of registration in aligning images.

**Overview**
Geometric transformations are fundamental operations in computer vision that modify the spatial layout of images. They include:
- **Affine transforms**: Preserve parallelism (rotation, translation, scaling, shearing).
- **Perspective transforms**: General projective transforms that handle viewing angle changes.
- **Registration**: Aligning multiple images to a common coordinate system for comparison or fusion.
- **Stitching**: Combining overlapping images to create wide-angle panoramas.

These techniques are essential for applications like medical imaging alignment, panorama creation, document scanning, and video stabilization.

**Required libraries**
- `numpy` for array math
- `opencv-python` (`cv2`) for image transforms
- `matplotlib` for quick visualization

In [None]:
import numpy as np
import cv2
import matplotlib.pyplot as plt

**Image data structure (NumPy array basics)**
- Grayscale image: 2D array with shape `(H, W)`.
- Color image (BGR in OpenCV): 3D array with shape `(H, W, 3)`.
- Pixel values are typically `uint8` in `[0, 255]` for 8-bit images.
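
The conventions above can be checked on small synthetic arrays, so no image file is needed. This is a minimal sketch; the array names (`gray_demo`, `color_demo`) are illustrative and not part of the lecture data.

```python
import numpy as np

# Synthetic images (illustrative names, not from the lecture data)
gray_demo = np.zeros((4, 6), dtype=np.uint8)       # H=4, W=6, single channel
color_demo = np.zeros((4, 6, 3), dtype=np.uint8)   # H=4, W=6, three channels

# Indexing is [row, col] = [y, x]; OpenCV stores channels in B, G, R order
color_demo[1, 2] = (255, 0, 0)                     # pure blue at y=1, x=2

print(gray_demo.shape, color_demo.shape, color_demo.dtype)

# uint8 array arithmetic wraps around modulo 256 instead of saturating
a = np.array([250], dtype=np.uint8)
wrapped = a + np.uint8(10)                         # 260 wraps to 4
saturated = np.clip(a.astype(np.int16) + 10, 0, 255).astype(np.uint8)  # clipped to 255
print(wrapped, saturated)
```

The wrap-around is a common source of subtle bugs when adding or scaling 8-bit images; converting to a wider type first, as in the last step, avoids it.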

## 2. Image I/O and Visualization
**Key concepts**
- **imread**: Load images from disk (BGR format in OpenCV, unlike standard RGB).
- **cvtColor**: Convert between color spaces (BGR, RGB, HSV, Grayscale, LAB).
- **imwrite**: Save processed images to disk.
- **Camera capture**: Real-time image acquisition from webcams or video files.

**Why OpenCV uses BGR**
OpenCV defaults to **BGR (Blue-Green-Red)** channel ordering instead of the more common RGB. The choice is historical, dating from OpenCV's origins at Intel, when BGR ordering was common among camera manufacturers and imaging software. When passing images to libraries that expect RGB (e.g., Matplotlib), you must convert explicitly with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`. Skipping the conversion swaps red and blue on display and silently breaks channel-specific processing: a filter meant for the red channel would run on the blue channel instead.
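
To make the channel order concrete, here is a small sketch on a synthetic two-pixel image. It uses plain NumPy slicing, which for a 3-channel image produces the same channel swap as `cv2.COLOR_BGR2RGB`; the variable names are illustrative.

```python
import numpy as np

# Synthetic 1x2 BGR image: first pixel pure blue, second pixel pure red
img_bgr_demo = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB order
img_rgb_demo = img_bgr_demo[..., ::-1].copy()

blue_as_rgb = tuple(int(v) for v in img_rgb_demo[0, 0])
red_as_rgb = tuple(int(v) for v in img_rgb_demo[0, 1])
print("blue pixel in RGB order:", blue_as_rgb)
print("red pixel in RGB order:", red_as_rgb)
```

The `.copy()` makes the result a contiguous array, which some OpenCV functions require when given sliced views.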

**Color space selection guide**
- **BGR/RGB**: General-purpose color images; good for display.
- **Grayscale**: Reduces computation (1 channel vs 3); used when color is irrelevant (e.g., edge detection, feature matching).
- **HSV**: Hue-Saturation-Value; separates color (hue) from brightness (value), which makes it well suited to color-based segmentation (detecting 'redness' largely independent of brightness).
- **LAB**: Approximately perceptually uniform; used in medical imaging and cross-domain registration where color perception matters.

**Live camera input**
Capturing real-time video enables interactive applications: camera calibration, live effect demonstration, or augmented reality. Always release the camera with `cap.release()` to avoid resource leaks or conflicts with other applications.

**Application context**: Image I/O forms the pipeline foundation. Medical imaging software loads DICOM files and converts to standard color spaces before registration. Document scanners capture live frames and convert to grayscale for OCR processing.

In [None]:
# Goal: read an image to inspect size and dtype before any processing
# Why: many algorithms assume a consistent shape and 8-bit data
img_path = "img/cleon.jpg"
img_bgr = cv2.imread(img_path)
# Where: dataset inspection and preprocessing pipelines
if img_bgr is None:
    raise FileNotFoundError(f"Could not read {img_path}")

# Inspect array shape and dtype to confirm expectations
print("shape:", img_bgr.shape)
print("dtype:", img_bgr.dtype)

In [None]:
# Goal: convert BGR to RGB for correct notebook display
# Why: matplotlib expects RGB, but OpenCV loads BGR
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)

# Where: quick visualization during analysis and debugging
plt.figure(figsize=(5, 4))
plt.imshow(img_rgb)
plt.title("RGB display via matplotlib")
plt.axis("off")
plt.show()

# Script-based display (won't show inside most notebooks)
# cv2.imshow("BGR image", img_bgr)
# cv2.waitKey(0)
# cv2.destroyAllWindows()

In [None]:
# Goal: open camera and capture a frame
# Why: live image capture for real-time processing and testing
cap = cv2.VideoCapture(0)

if not cap.isOpened():
    raise RuntimeError("Could not open camera")

# Capture a single frame
ret, frame = cap.read()
if not ret:
    raise RuntimeError("Failed to capture frame from camera")

cap.release()

# Display the captured frame
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(6, 4))
plt.imshow(frame_rgb)
plt.title("Camera capture")
plt.axis("off")
plt.show()

In [None]:
# Goal: show multiple images in one figure for comparison
# Why: side-by-side viewing highlights differences quickly
img2_bgr = cv2.imread("img/millennium-falcon.jpg")
img3_bgr = cv2.imread("img/the-vault.jpg")
# Where: dataset review and reporting
if img2_bgr is None or img3_bgr is None:
    raise FileNotFoundError("Could not read one of the extra images")

# Convert to RGB for matplotlib
imgs_rgb = [img_rgb, cv2.cvtColor(img2_bgr, cv2.COLOR_BGR2RGB), cv2.cvtColor(img3_bgr, cv2.COLOR_BGR2RGB)]
titles = ["cleon", "millennium-falcon", "the-vault"]

# Plot side-by-side in one row
plt.figure(figsize=(10, 4))
for i in range(3):
    plt.subplot(1, 3, i + 1)
    plt.imshow(imgs_rgb[i])
    plt.title(titles[i])
    plt.axis("off")
plt.show()

In [None]:
# Goal: save a derived image to disk
# Why: share results or build a processed dataset
out_path = "img/cleon_gray.png"
img_gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
cv2.imwrite(out_path, img_gray)
print("saved:", out_path)

## 3. Drawing and Annotation
**Key functions**
- Drawing primitives: line, circle, rectangle, ellipse, polylines.
- Text overlay: putText with font selection and baseline adjustment.
- Region of Interest (ROI): Selecting sub-regions for processing or analysis.

**Why annotation matters**
Visualization is essential for debugging and communication in computer vision. Overlaying detected features, bounding boxes, or trajectory paths on images helps you verify algorithm correctness ("Did my edge detector work?"), communicate results to stakeholders, and prepare publication-quality figures. Color choice matters: use contrasting colors against the image background (e.g., green lines on complex scenes, blue on bright backgrounds).

**Practical uses**
- **Detection validation**: Draw bounding boxes around detected objects to confirm the detector is working.
- **Feature visualization**: Mark keypoints with circles to show where SIFT/ORB found interest points.
- **Interactive ROI selection**: Allow users to draw rectangles for region-based processing (e.g., histogram equalization in a selected area).
- **Augmented reality**: Draw computed graphics (transformed chess boards, directional arrows) on the frame to demonstrate geometric accuracy.
- **Medical imaging**: Annotate regions of interest, measurements, or pathological findings.

**Application context**: In panorama stitching, you might annotate matched feature pairs to verify matching quality before homography estimation. In document scanning, you draw the detected document outline before perspective correction to show the algorithm's perception of the page boundary.

In [None]:
# Goal: draw shapes and labels on top of an image
# Why: visualize ROIs, annotations, or measurement overlays
canvas = img_bgr.copy()

# Get image size for relative placement
h, w = canvas.shape[:2]

# Draw primitives for annotation and debugging
cv2.line(canvas, (20, 20), (w - 20, 20), (0, 255, 0), 2)
cv2.circle(canvas, (w // 4, h // 2), 40, (255, 0, 0), 2)
cv2.rectangle(canvas, (w // 2 - 60, h // 2 - 40), (w // 2 + 60, h // 2 + 40), (0, 0, 255), 2)
cv2.ellipse(canvas, (w - 100, h - 80), (60, 30), 30, 0, 360, (255, 255, 0), 2)

# Define polygon vertices and draw a closed shape
poly_pts = np.array([[50, h - 60], [120, h - 100], [180, h - 60], [120, h - 20]], dtype=np.int32)
cv2.polylines(canvas, [poly_pts], isClosed=True, color=(0, 255, 255), thickness=2)

# Add label text for clarity
cv2.putText(canvas, "OpenCV", (20, h - 20), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 255, 255), 2)

# Where: UI overlays, labeling tools, and demo visuals
plt.figure(figsize=(6, 4))
plt.imshow(cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB))
plt.title("Drawing primitives")
plt.axis("off")
plt.show()

In [None]:
# Goal: detect and visualize local features (keypoints)
# Why: keypoints support matching, tracking, and registration
orb = cv2.ORB_create(nfeatures=300)
# Detect keypoints and descriptors in the image
kps, desc = orb.detectAndCompute(img_bgr, None)
# Draw keypoints to see coverage and scale
kp_vis = cv2.drawKeypoints(img_bgr, kps, None, color=(0, 255, 0), flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

# Where: feature-based alignment and stitching
plt.figure(figsize=(6, 4))
plt.imshow(cv2.cvtColor(kp_vis, cv2.COLOR_BGR2RGB))
plt.title("Keypoints (ORB)")
plt.axis("off")
plt.show()

In [None]:
# Goal: match features between two images
# Why: correspondences are needed for registration and stitching
img2_bgr = cv2.imread("img/millennium-falcon.jpg")
if img2_bgr is None:
    raise FileNotFoundError("Could not read img/millennium-falcon.jpg")

# Detect keypoints and descriptors in the second image
kps2, desc2 = orb.detectAndCompute(img2_bgr, None)
# Guard against missing descriptors to avoid matcher errors
if desc is None or desc2 is None:
    raise ValueError("No descriptors found for matching")

# Match descriptors with brute-force Hamming distance (ORB uses binary descriptors)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(desc, desc2), key=lambda m: m.distance)
# Draw top matches for visual inspection
match_vis = cv2.drawMatches(img_bgr, kps, img2_bgr, kps2, matches[:20], None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

# Where: panorama stitching, pose estimation, and registration
plt.figure(figsize=(10, 4))
plt.imshow(cv2.cvtColor(match_vis, cv2.COLOR_BGR2RGB))
plt.title("Feature matches (ORB + BFMatcher)")
plt.axis("off")
plt.show()

## 4. Basic Geometric Transformations
**Mathematical foundation**
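Geometric transforms are expressed as matrix operations on image coordinates. In homogeneous coordinates, a 2D point $(x, y)$ becomes $(x, y, 1)$, allowing translation to be expressed as matrix multiplication. A 2×3 affine matrix $M$ transforms a point via:
$\begin{bmatrix} x' \\ y' \end{bmatrix} = M \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$
Appending the row $\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}$ to $M$ gives the equivalent 3×3 form, which returns $(x', y', 1)$ and lets several transforms be chained by matrix multiplication.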
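
A minimal numeric sketch of the homogeneous form, using a translation as the example. The offsets and the point are arbitrary illustrative values.

```python
import numpy as np

tx_demo, ty_demo = 60, 30

# 3x3 homogeneous translation matrix
T_demo = np.array([[1, 0, tx_demo],
                   [0, 1, ty_demo],
                   [0, 0, 1]], dtype=np.float64)

# Point (10, 20) written in homogeneous coordinates
p_demo = np.array([10, 20, 1], dtype=np.float64)
p_moved = T_demo @ p_demo
print(p_moved)

# The top two rows form the 2x3 matrix that cv2.warpAffine expects
M_demo = T_demo[:2, :]
p_moved_2d = M_demo @ p_demo
print(p_moved_2d)
```

The 3×3 form is convenient for composing transforms; the 2×3 form is what gets passed to `cv2.warpAffine`.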

**When to use each transform**
- **Translation**: Shifting images to align objects or correct misalignment from uncontrolled camera motion.
- **Rotation**: Correcting image orientation (camera tilted 30°), aligning rotated documents, or rotating feature-extracted patches for orientation-invariant analysis.
- **Scaling**: Changing resolution, zooming into regions of interest, or normalizing image sizes for batch processing.
- **Combined transforms**: Real-world scenarios rarely involve single operations; most camera motions combine rotation, translation, and scale.
- **Border handling**: The choice of constant fill, replication, or reflection determines what appears in areas the transform leaves uncovered, and affects how later algorithms process image edges (important for edge-aware filters).

**Interpolation quality trade-offs**
- **INTER_NEAREST**: Fast but blocky; use only when speed dominates quality (e.g., real-time preview).
- **INTER_LINEAR**: Fast and smooth; good general-purpose choice for most applications.
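- **INTER_CUBIC**: Slower but sharper than linear; use for publication-quality results when enlarging images. For significant downsampling, `INTER_AREA` is usually the better choice because it averages source pixels and reduces aliasing.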
- **INTER_LANCZOS4**: Highest quality but slowest; use for final output or small, precious datasets.

**Application context**: Correcting camera tilt in document scanning requires combined rotation + scaling. Augmented reality requires real-time transform updates as the camera moves.

In [None]:
# Goal: shift the image by a fixed offset (translation)
# Why: align objects or compensate for camera motion
tx, ty = 60, 30
M_trans = np.array([[1, 0, tx], [0, 1, ty]], dtype=np.float32)
# Apply translation; output size (w, h) is required
img_trans = cv2.warpAffine(img_bgr, M_trans, (w, h))

# Where: registration and tracking pipelines
plt.figure(figsize=(6, 4))
plt.imshow(cv2.cvtColor(img_trans, cv2.COLOR_BGR2RGB))
plt.title("Translation")
plt.axis("off")
plt.show()

### 4.1 Translation
Shift an image by a fixed offset. Useful for aligning images that were captured from slightly different positions. **Goal**: Demonstrate 2D translation matrix and pixel shifting logic. **Why**: Translation is the simplest transform; understanding it establishes the matrix-based approach used for all subsequent transforms. **Where**: Scene alignment, camera stabilization, object tracking (shifting template to match target).

In [None]:
# Goal: rotate the image around its center
# Why: normalize orientation differences
center = (w // 2, h // 2)
angle = 25
scale = 1.0
M_rot = cv2.getRotationMatrix2D(center, angle, scale)
# Apply rotation using the affine matrix
img_rot = cv2.warpAffine(img_bgr, M_rot, (w, h))

# Where: document alignment and robust matching
plt.figure(figsize=(6, 4))
plt.imshow(cv2.cvtColor(img_rot, cv2.COLOR_BGR2RGB))
plt.title("Rotation")
plt.axis("off")
plt.show()

### 4.2 Rotation
Rotate an image around a center point by an angle. Rotation changes both position and orientation of features. **Goal**: Show 2D rotation matrix composition and interpolation artifacts. **Why**: Rotation is fundamental for correcting tilted images and aligning patterns with specific orientations. **Where**: Correcting scanned document tilt, aligning aerial imagery, rotating face landmarks detected at arbitrary angles, panoramic image creation (aligning frames captured at different angles).

In [None]:
# Goal: resize with different interpolation strategies
# Why: speed/quality tradeoffs depend on interpolation choice
scale_x, scale_y = 0.6, 0.6
new_size = (int(w * scale_x), int(h * scale_y))

# Nearest is fastest, linear is common, cubic is smoother but slower
img_nearest = cv2.resize(img_bgr, new_size, interpolation=cv2.INTER_NEAREST)
img_linear = cv2.resize(img_bgr, new_size, interpolation=cv2.INTER_LINEAR)
img_cubic = cv2.resize(img_bgr, new_size, interpolation=cv2.INTER_CUBIC)

# Where: model input sizing and multi-scale analysis
plt.figure(figsize=(9, 3))
for i, (im, title) in enumerate([(img_nearest, "NEAREST"), (img_linear, "LINEAR"), (img_cubic, "CUBIC")], start=1):
    plt.subplot(1, 3, i)
    plt.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))
    plt.title(title)
    plt.axis("off")
plt.tight_layout()
plt.show()

### 4.3 Scaling
Enlarge or shrink an image uniformly or non-uniformly (aspect-change). **Goal**: Demonstrate uniform (isotropic) and non-uniform (anisotropic) scaling. **Why**: Scaling is essential for multi-scale analysis (detecting objects at multiple scales), normalizing input to fixed dimensions, and zooming into regions of interest. Losing aspect ratio can distort face proportions or document text. **Where**: Thumbnail generation, batch image normalization, multi-scale feature detection (SIFT pyramid), video frame resizing to match processing requirements.

In [None]:
# Goal: compose transforms to apply once
# Why: reduces repeated interpolation artifacts
tx2, ty2 = -40, 20
M_t = np.array([[1, 0, tx2], [0, 1, ty2]], dtype=np.float32)
M_r = cv2.getRotationMatrix2D(center, -15, 1.0)

# Convert 2x3 affine matrices to 3x3 for composition
M_t3 = np.vstack([M_t, [0, 0, 1]])
M_r3 = np.vstack([M_r, [0, 0, 1]])
# Compose: first rotate, then translate
M_comp = M_t3 @ M_r3

# Convert back to 2x3 for warpAffine
M_comp2x3 = M_comp[:2, :]
img_comp = cv2.warpAffine(img_bgr, M_comp2x3, (w, h))

# Where: stabilized video frames and aligned augmentation
plt.figure(figsize=(6, 4))
plt.imshow(cv2.cvtColor(img_comp, cv2.COLOR_BGR2RGB))
plt.title("Combined: rotation + translation")
plt.axis("off")
plt.show()

### 4.4 Combined Transformations
Apply multiple transforms sequentially: rotation, then scaling, then translation. **Goal**: Illustrate matrix composition where order matters ($M_{final} = M_{trans} \times M_{scale} \times M_{rot}$). **Why**: Real-world camera motions involve rotation AND scale AND translation simultaneously. Proper matrix composition ensures all transforms apply correctly. **Where**: Simulating camera movements in video, aligning images taken from different viewpoints and distances, augmented reality object placement.
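
A small numeric check shows why the order of composition matters. The helper functions below are illustrative; the rotation uses the standard mathematical convention about the origin, which differs from the center-based matrix that `cv2.getRotationMatrix2D` returns.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=np.float64)

def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=np.float64)

def rotation(angle_deg):
    # Rotation about the origin, standard math convention
    c, s = np.cos(np.deg2rad(angle_deg)), np.sin(np.deg2rad(angle_deg))
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float64)

R_demo, S_demo, T_demo2 = rotation(90), scaling(2, 2), translation(10, 0)

# The rightmost matrix is applied first
M_final = T_demo2 @ S_demo @ R_demo    # rotate, then scale, then translate
M_other = R_demo @ S_demo @ T_demo2    # translate, then scale, then rotate

p = np.array([1.0, 0.0, 1.0])
p_final = M_final @ p
p_other = M_other @ p
print(np.round(p_final, 6), np.round(p_other, 6))
```

The same three matrices send the point (1, 0) to (10, 2) in one order and to (0, 22) in the other.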

In [None]:
# Goal: show how border rules affect warped images
# Why: edges can introduce artifacts in filtering/warping
M_shift = np.array([[1, 0, 80], [0, 1, 0]], dtype=np.float32)

# Apply the same shift with different border modes
img_const = cv2.warpAffine(img_bgr, M_shift, (w, h), borderMode=cv2.BORDER_CONSTANT, borderValue=(0, 0, 0))
img_reflect = cv2.warpAffine(img_bgr, M_shift, (w, h), borderMode=cv2.BORDER_REFLECT)
img_replicate = cv2.warpAffine(img_bgr, M_shift, (w, h), borderMode=cv2.BORDER_REPLICATE)

# Where: warping, filtering, and registration pipelines
plt.figure(figsize=(9, 3))
for i, (im, title) in enumerate([(img_const, "CONSTANT"), (img_reflect, "REFLECT"), (img_replicate, "REPLICATE")], start=1):
    plt.subplot(1, 3, i)
    plt.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))
    plt.title(title)
    plt.axis("off")
plt.tight_layout()
plt.show()

### 4.5 Border Handling Modes
When a transform moves pixels outside the original image bounds, the uncovered areas must be filled somehow. OpenCV offers, among others: **BORDER_CONSTANT** (fill with a fixed value, black by default), **BORDER_REPLICATE** (repeat the edge pixels), and **BORDER_REFLECT** (mirror the image at the edge). **Goal**: Demonstrate how border choice affects edge processing. **Why**: Edge-aware algorithms may be sensitive to boundary values. Constant black borders can introduce false edges; replication preserves continuity. **Where**: Document scanning (replicate corners to maintain edges), removing letterboxes from video (constant), image blending where feathering is needed (reflect avoids sharp edges).

In [None]:
# Goal: estimate and apply an affine transform from 3 point pairs
# Why: affine captures rotation, translation, scale, and shear
src_pts = np.float32([[50, 50], [w - 60, 60], [60, h - 60]])
dst_pts = np.float32([[30, 80], [w - 80, 40], [80, h - 80]])

# Compute affine transform matrix (2x3)
M_aff = cv2.getAffineTransform(src_pts, dst_pts)
# Apply affine transform to the image
img_aff = cv2.warpAffine(img_bgr, M_aff, (w, h))

# Visualize mapping by marking source and target points
src_vis = img_bgr.copy()
dst_vis = img_aff.copy()
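# Convert coordinates to plain Python ints; some OpenCV builds reject NumPy scalar types here
for pt in src_pts:
    cv2.circle(src_vis, (int(pt[0]), int(pt[1])), 6, (0, 255, 0), -1)
for pt in dst_pts:
    cv2.circle(dst_vis, (int(pt[0]), int(pt[1])), 6, (0, 255, 0), -1)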

# Where: image registration and rectification tasks
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(src_vis, cv2.COLOR_BGR2RGB))
plt.title("Source points")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(dst_vis, cv2.COLOR_BGR2RGB))
plt.title("Transformed image + target points")
plt.axis("off")
plt.tight_layout()
plt.show()

## 5. Affine Transformation
**Theory**: An affine transformation preserves straight lines and parallelism and is fully determined by mapping 3 non-collinear source points to 3 destination points. The resulting 2×3 transformation matrix encodes rotation, scaling, translation, and shearing in a single operation. **Goal**: Estimate an affine matrix from point correspondences and apply it to an image. **Why**: Affine transforms approximate the effect of small viewpoint changes on a planar scene when perspective foreshortening is negligible, and they are cheaper to estimate and apply than perspective transforms. **Where**: Correcting mild document skew (strong perspective needs a homography instead), face alignment (warping detected landmarks to canonical positions), medical image registration (aligning scans acquired at different orientations). **Practical note**: Exactly 3 non-collinear point pairs determine the matrix uniquely, which is what `cv2.getAffineTransform` expects; with more pairs, fit the matrix with least squares or a robust estimator such as `cv2.estimateAffine2D`.
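
To show where the matrix comes from, the sketch below solves for it directly with NumPy. Each point pair contributes two linear equations, so three pairs give six equations for the six unknowns. The coordinates are arbitrary illustrative values, and the result should agree with `cv2.getAffineTransform` up to floating-point error.

```python
import numpy as np

src_demo = np.array([[50, 50], [200, 60], [60, 180]], dtype=np.float64)
dst_demo = np.array([[30, 80], [180, 40], [80, 160]], dtype=np.float64)

# Each row [x, y, 1] times M^T must equal [x', y']
A_sys = np.hstack([src_demo, np.ones((3, 1))])     # 3x3 system matrix
M_solved = np.linalg.solve(A_sys, dst_demo).T      # 2x3 affine matrix

# Check: the solved matrix maps every source point onto its target
mapped = (M_solved @ A_sys.T).T
print(np.round(mapped, 6))

# Collinear source points make the system rank-deficient (no unique solution)
collinear = np.array([[0, 0], [1, 1], [2, 2]], dtype=np.float64)
A_bad = np.hstack([collinear, np.ones((3, 1))])
rank_bad = int(np.linalg.matrix_rank(A_bad))
print("rank with collinear points:", rank_bad)
```

The rank check is why the three points should be well spread out: nearly collinear points give a badly conditioned system and an unstable matrix.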

In [None]:
# Goal: map a quadrilateral region to a rectangle (bird's-eye view)
# Why: correct perspective distortion on planar surfaces
# Select four source points in the original image
src4 = np.float32([[80, 80], [w - 80, 60], [w - 60, h - 80], [70, h - 60]])
# Define destination rectangle corners
dst_w, dst_h = w, h
dst4 = np.float32([[0, 0], [dst_w - 1, 0], [dst_w - 1, dst_h - 1], [0, dst_h - 1]])

# Compute homography (3x3) and warp perspective
M_persp = cv2.getPerspectiveTransform(src4, dst4)
img_persp = cv2.warpPerspective(img_bgr, M_persp, (dst_w, dst_h))

# Visualize mapping: draw source quad on original
src_quad_vis = img_bgr.copy()
# polylines requires 32-bit integer points; astype(int) yields int64 on most platforms
cv2.polylines(src_quad_vis, [src4.astype(np.int32)], isClosed=True, color=(0, 255, 0), thickness=2)

# Where: bird's-eye views for documents and roads
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(src_quad_vis, cv2.COLOR_BGR2RGB))
plt.title("Source quadrilateral")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(img_persp, cv2.COLOR_BGR2RGB))
plt.title("Warped bird's-eye view")
plt.axis("off")
plt.tight_layout()
plt.show()

## 7. Advanced Warping & Remapping
- `cv2.remap()` applies a per-pixel mapping for flexible warps.
- `cv2.invertAffineTransform()` computes the inverse of a $2 \times 3$ affine matrix.
**Why it matters**
- Remapping enables custom distortions and coordinate corrections.
**Where it applies**
- Lens correction, undistortion, and geometry-based alignment.

In [None]:
# Goal: use remap to apply a custom warp (horizontal ripple)
# Why: remap gives per-pixel control over where each output pixel samples
map_x, map_y = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))

# Create a small sinusoidal horizontal shift based on y
shift = 12.0 * np.sin(2 * np.pi * map_y / 120.0)
map_x_warp = map_x + shift
map_y_warp = map_y.copy()

# Apply remap (border reflects to avoid black edges)
img_remap = cv2.remap(img_bgr, map_x_warp, map_y_warp, interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT)

# Where: lens correction, creative effects, and calibration pipelines
plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
plt.title("Original")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(img_remap, cv2.COLOR_BGR2RGB))
plt.title("Remap (ripple)")
plt.axis("off")
plt.tight_layout()
plt.show()

In [None]:
# Goal: invert an affine transform and map back to original space
# Why: useful when you need to undo a transform or map coordinates back
M_inv = cv2.invertAffineTransform(M_trans)
img_unshift = cv2.warpAffine(img_trans, M_inv, (w, h))

# Where: registration pipelines and coordinate back-projection
plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(img_trans, cv2.COLOR_BGR2RGB))
plt.title("Translated")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(img_unshift, cv2.COLOR_BGR2RGB))
plt.title("After inverse transform")
plt.axis("off")
plt.tight_layout()
plt.show()

## 8. Feature Detection
**Theory**: Features (keypoints) are distinctive, repeatable image locations, often detected at multiple scales. Corner detectors (Harris, FAST) respond where intensity changes strongly in two directions; blob detectors (such as the difference-of-Gaussians detector in SIFT) respond to roughly circular regions at a characteristic scale. ORB combines FAST corners on an image pyramid with a rotation-aware binary descriptor. Each keypoint is described by a feature descriptor, a compact vector encoding local appearance. **Goal**: Detect ORB and SIFT features; compare keypoint counts and descriptor shapes. **Why**: Good features can be detected repeatably across image variations (rotation, scale, moderate illumination change). They form the foundation for image matching. **Where**: Structure-from-motion (aligning image sequences), object recognition (matching templates in scenes), image stitching (finding overlaps), content-based image retrieval, robot localization using loop closure detection. **Practical comparison**: ORB is fast (real-time capable) and rotation-invariant, but its binary descriptors are generally less distinctive than SIFT's, and like most detectors it finds few keypoints in low-texture regions. SIFT is slower but highly distinctive and scale-invariant. Its patent expired in 2020 and it has been part of the main OpenCV module since version 4.4; older builds may not include it.
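
ORB descriptors are binary strings packed into bytes (32 bytes by default), and they are compared with the Hamming distance: the number of differing bits. The sketch below computes it with NumPy on two synthetic descriptors; the `hamming` helper is written for this example and is not an OpenCV function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic 32-byte (256-bit) binary descriptors
d1 = rng.integers(0, 256, size=32, dtype=np.uint8)
d2 = d1.copy()
d2[0] ^= 0b00000111          # flip 3 bits in the first byte

def hamming(a, b):
    # XOR marks the differing bits; unpackbits expands bytes so they can be counted
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

dist_same = hamming(d1, d1)
dist_flipped = hamming(d1, d2)
print(dist_same, dist_flipped)
```

Float descriptors such as SIFT's are compared with the Euclidean (L2) distance instead, which is why the matcher norm must match the descriptor type.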

In [None]:
# Goal: extract keypoints/descriptors with ORB and (optionally) SIFT
# Why: descriptors enable matching between images for registration
orb_fd = cv2.ORB_create(nfeatures=500)
orb_kps, orb_desc = orb_fd.detectAndCompute(img_bgr, None)
print("ORB keypoints:", len(orb_kps))
print("ORB descriptor shape:", None if orb_desc is None else orb_desc.shape)

# Try SIFT if available in the OpenCV build
if hasattr(cv2, "SIFT_create"):
    sift = cv2.SIFT_create()
    sift_kps, sift_desc = sift.detectAndCompute(img_bgr, None)
    print("SIFT keypoints:", len(sift_kps))
    print("SIFT descriptor shape:", None if sift_desc is None else sift_desc.shape)
else:
    print("SIFT not available in this OpenCV build")

# Where: feature matching and geometric verification pipelines

## 7. Advanced Warping Techniques
**Theory**: Beyond standard transforms, arbitrary pixel-level remapping (using coordinate lookup tables) and transform inversion enable specialized effects and techniques. **Remapping** applies per-pixel displacement without global geometric structure. **Inversion** reverses a transform (useful for undoing detected distortions). **Goal**: Apply ripple effects via remapping; invert transforms to 'unwarp' images. **Why**: Remapping is flexible for non-rigid deformations (water ripples, lens distortion correction); inversion is crucial for undoing known camera calibration issues. **Where**: Correcting barrel distortion in fisheye lenses, ripple/water effects in filters, reversing geometric distortions caused by camera movement, medical image unwarping (unwrapping cylindrical vessel walls in CT scans for analysis).

In [None]:
# Goal: match ORB descriptors with brute-force matching
# Why: ORB uses binary descriptors suited to Hamming distance
if orb_desc is None or desc2 is None:
    raise ValueError("Descriptors missing for matching")

# BFMatcher with Hamming for ORB; match() gives the best match per descriptor
bf_orb = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
bf_matches = sorted(bf_orb.match(orb_desc, desc2), key=lambda m: m.distance)
print("BF matches:", len(bf_matches))

# Visualize a few matches
bf_vis = cv2.drawMatches(img_bgr, orb_kps, img2_bgr, kps2, bf_matches[:20], None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
plt.figure(figsize=(10, 4))
plt.imshow(cv2.cvtColor(bf_vis, cv2.COLOR_BGR2RGB))
plt.title("BFMatcher (ORB)")
plt.axis("off")
plt.show()

In [None]:
# Goal: use k-NN matching + ratio test for cleaner matches
# Why: ratio test removes ambiguous matches with similar distances
# Use BFMatcher without crossCheck to allow k-NN
bf_knn = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
knn_matches = bf_knn.knnMatch(orb_desc, desc2, k=2)

# Apply Lowe's ratio test
ratio = 0.75
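good = []
for pair in knn_matches:
    # knnMatch may return fewer than 2 neighbors for a query descriptor
    if len(pair) < 2:
        continue
    m, n = pair
    if m.distance < ratio * n.distance:
        good.append(m)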
print("KNN matches:", len(knn_matches), "Good after ratio:", len(good))

# Visualize ratio-test matches
knn_vis = cv2.drawMatches(img_bgr, orb_kps, img2_bgr, kps2, good[:30], None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
plt.figure(figsize=(10, 4))
plt.imshow(cv2.cvtColor(knn_vis, cv2.COLOR_BGR2RGB))
plt.title("KNN + ratio test (ORB)")
plt.axis("off")
plt.show()

In [None]:
# Goal: use FLANN-based matcher (fast approximate)
# Why: faster for large descriptor sets; works well with SIFT
if 'sift_desc' in locals() and sift_desc is not None:
    # Need SIFT descriptors for the second image too
    if hasattr(cv2, "SIFT_create"):
        sift2 = cv2.SIFT_create()
        sift_kps2, sift_desc2 = sift2.detectAndCompute(img2_bgr, None)
        
        if sift_desc2 is not None:
            # Create FLANN matcher for SIFT (float descriptors)
            FLANN_INDEX_KDTREE = 1
            index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
            search_params = dict(checks=50)
            flann = cv2.FlannBasedMatcher(index_params, search_params)
            
            # Match with k=2 for ratio test
            flann_knn = flann.knnMatch(sift_desc, sift_desc2, k=2)
            
            # Ratio test for SIFT matches
            good_flann = []
            for pair in flann_knn:
                # Skip queries that returned fewer than 2 neighbors
                if len(pair) < 2:
                    continue
                m, n = pair
                if m.distance < 0.75 * n.distance:
                    good_flann.append(m)
            print("FLANN good matches:", len(good_flann))

            # Visualize a subset
            flann_vis = cv2.drawMatches(img_bgr, sift_kps, img2_bgr, sift_kps2, good_flann[:30], None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
            plt.figure(figsize=(10, 4))
            plt.imshow(cv2.cvtColor(flann_vis, cv2.COLOR_BGR2RGB))
            plt.title("FLANN + ratio test (SIFT)")
            plt.axis("off")
            plt.show()
        else:
            print("SIFT descriptors not found for second image; skipping FLANN demo")
    else:
        print("SIFT not available in this OpenCV build; skipping FLANN demo")
else:
    print("SIFT not available or descriptors missing; skipping FLANN demo")

## 9. Feature Matching
**Theory**: Given keypoints and descriptors in two images, matching finds corresponding points. Simple matching returns the nearest neighbor by descriptor distance. **Lowe's ratio test** improves accuracy by rejecting a match when the distance to the nearest neighbor is not clearly smaller than the distance to the second-nearest (ambiguous); with a ratio of 0.75, the best distance must be less than 75% of the second-best. FLANN (Fast Library for Approximate Nearest Neighbors) uses approximate index structures for fast matching: KD-trees for float descriptors such as SIFT, with LSH the usual choice for binary descriptors such as ORB. **Goal**: Demonstrate basic BFMatcher, ratio-test filtering, and FLANN-based matching. **Why**: Robust matching reduces false positives (spurious correspondences leading to incorrect transforms). The ratio test is a simple, empirically effective heuristic: if the best match is only slightly better than the second-best, the match is unreliable. **Where**: Image stitching (finding overlapping regions), object recognition (matching templates), localization (finding where an object appears in a scene), medical image registration (aligning patient scans). **Trade-off**: The ratio test reduces false positives but may discard some true matches in ambiguous regions such as repetitive textures.
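
The ratio test itself is only a comparison of two distances per query descriptor. Here is a sketch on a toy distance matrix; the numbers are made up to show one clear match and two ambiguous ones.

```python
import numpy as np

# Toy distance matrix: rows = query descriptors, columns = train descriptors
dist = np.array([
    [10.0, 40.0, 55.0],   # clear winner: 10 vs 40
    [30.0, 32.0, 70.0],   # ambiguous: 30 vs 32
    [60.0, 20.0, 25.0],   # borderline: 20 vs 25
])
ratio_demo = 0.75

# Best and second-best distance for each query
order = np.argsort(dist, axis=1)
rows = np.arange(len(dist))
best = dist[rows, order[:, 0]]
second = dist[rows, order[:, 1]]

# Keep a match only if the best is clearly better than the runner-up
keep = best < ratio_demo * second
kept_pairs = [(int(q), int(order[q, 0])) for q in np.flatnonzero(keep)]
print(kept_pairs)
```

Only the first query survives. The third is rejected even though 20 is smaller than 25, because 20 is not below 0.75 × 25 = 18.75.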

In [None]:
# Goal: estimate a homography from matched keypoints using RANSAC
# Why: robustly fit a projective model while rejecting outliers
# Use ORB keypoints from Section 8 and good matches from Section 9

# Ensure we have good matches from the k-NN ratio-test cell above
if 'good' not in locals() or len(good) < 4:
    # Fallback: use brute-force matches if ratio test matches are insufficient
    if 'bf_matches' in locals() and len(bf_matches) >= 4:
        good = bf_matches
    else:
        raise ValueError("Need at least 4 good matches to compute homography")

if len(good) < 4:
    raise ValueError("Need at least 4 good matches to compute homography")

# Build point arrays for homography estimation
src_pts_h = np.float32([orb_kps[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst_pts_h = np.float32([kps2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# Estimate homography with RANSAC to separate inliers/outliers
H, mask = cv2.findHomography(src_pts_h, dst_pts_h, cv2.RANSAC, 5.0)
if H is None:
    raise ValueError("Homography estimation failed")

# Count inliers vs outliers
inlier_mask = mask.ravel().astype(bool)
num_inliers = int(inlier_mask.sum())
num_outliers = int(len(inlier_mask) - num_inliers)
print("Inliers:", num_inliers, "Outliers:", num_outliers)

# Visualize inlier matches only
inlier_matches = [m for m, keep in zip(good, inlier_mask) if keep]
H_vis = cv2.drawMatches(img_bgr, orb_kps, img2_bgr, kps2, inlier_matches[:30], None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

# Where: panorama stitching and planar alignment
plt.figure(figsize=(10, 4))
plt.imshow(cv2.cvtColor(H_vis, cv2.COLOR_BGR2RGB))
plt.title("Inlier matches after RANSAC")
plt.axis("off")
plt.show()

## 10. Homography Estimation with RANSAC
**Theory**: RANSAC (Random Sample Consensus) is a robust estimation algorithm that fits a geometric model in the presence of outliers. It repeatedly: (1) samples a minimal set (4 point pairs for a homography), (2) estimates a transform from that sample, (3) counts the correspondences that agree with it within a threshold (inliers), (4) keeps the model with the most inliers. This separates correct correspondences (inliers) from mismatches (outliers). **Goal**: Estimate homography; segment inliers/outliers; visualize them separately. **Why**: Feature matching produces false positives, especially on repetitive textures or near image boundaries. RANSAC tolerates a large fraction of outliers, but the number of iterations needed grows quickly as the inlier ratio falls, so very high outlier ratios (80% or more) may require many more iterations or cleaner matches. **Where**: Panorama stitching (handling matched features with some jitter), object recognition in cluttered scenes, video stabilization, document detection in natural photos. **Practical insight**: The inlier threshold controls sensitivity: too loose admits mismatches, too tight rejects valid but noisy matches. For `cv2.findHomography` the threshold is a reprojection error in pixels; values between 1 and 10 are typical, and the cell above uses 5.0.
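
How many random samples RANSAC needs can be estimated with a standard textbook formula: if a fraction $w$ of matches are inliers and each sample uses $s$ points, then $N = \lceil \log(1 - p) / \log(1 - w^s) \rceil$ samples give probability $p$ of drawing at least one all-inlier sample. The sketch below evaluates it for a homography ($s = 4$). The function name is illustrative, and OpenCV's internal iteration control may differ.

```python
import math

def ransac_iterations(inlier_ratio, sample_size=4, confidence=0.99):
    # Probability that one random minimal sample contains only inliers
    p_clean = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))

needed = {w_in: ransac_iterations(w_in) for w_in in (0.8, 0.5, 0.2)}
for w_in, n_iter in needed.items():
    print(f"inlier ratio {w_in:.1f}: about {n_iter} iterations")
```

Going from 80% inliers to 20% inliers raises the estimate from single digits to a few thousand, which is why pre-filtering matches with the ratio test pays off before running RANSAC.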

In [None]:
# Goal: align img_bgr to img2_bgr using the estimated homography
# Why: registration puts images into a common coordinate system
if 'H' not in locals() or H is None:
    raise ValueError("Homography H not available; run Section 10 first")

# Warp img_bgr into the coordinate frame of img2_bgr
h2, w2 = img2_bgr.shape[:2]
img_reg = cv2.warpPerspective(img_bgr, H, (w2, h2))

# Visual validation: overlay aligned image with the reference
overlay = cv2.addWeighted(img_reg, 0.5, img2_bgr, 0.5, 0)

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(img_reg, cv2.COLOR_BGR2RGB))
plt.title("Warped to reference")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(overlay, cv2.COLOR_BGR2RGB))
plt.title("Overlay (validation)")
plt.axis("off")
plt.tight_layout()
plt.show()

## 11. Image Registration
**Theory**: Registration aligns images to a common coordinate system, enabling comparison or fusion. Given matched features and a robustly estimated homography, the transform warps one image into the other's coordinate frame. **Goal**: Apply estimated homography to warp an image; overlay to visualize alignment. **Why**: Medical imaging requires aligning scans (CT, MRI) for disease tracking or surgical planning. Satellite imagery is registered to maps for monitoring. Multi-exposure photos are registered for HDR. **Where**: Medical image alignment (patient follow-up), remote sensing (change detection), multi-temporal analysis (tracking land-use changes), face morphing (warping between identities), video stabilization. **Quality check**: Overlay aligned images with partial transparency; features should align. Misalignment indicates poor feature matching, outlier rejection failure, or a scene that a single homography cannot model (for example, strong depth variation). **Extensions**: Non-rigid registration deforms one image elastically to match the other (used for anatomical atlas matching).
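
The visual overlay can be complemented with a number. The sketch below computes the re-projection error of matched points under a homography using only NumPy; the helper names `apply_homography` and `reprojection_errors` and the toy points are made up for illustration.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through a 3x3 homography."""
    pts = np.asarray(pts, dtype=np.float64)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = pts_h @ np.asarray(H, dtype=np.float64).T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by w to get back to pixels

def reprojection_errors(H, src_pts, dst_pts):
    """Euclidean distance between H-mapped source points and their matches."""
    diff = apply_homography(H, src_pts) - np.asarray(dst_pts, dtype=np.float64)
    return np.linalg.norm(diff, axis=1)

# Toy data (hypothetical): a pure translation by (10, -5) and one imperfect match
H_demo = np.array([[1.0, 0.0, 10.0],
                   [0.0, 1.0, -5.0],
                   [0.0, 0.0, 1.0]])
src_demo = np.array([[0.0, 0.0], [100.0, 50.0], [20.0, 80.0]])
dst_demo = np.array([[10.0, -5.0], [110.0, 45.0], [33.0, 79.0]])  # last pair is off by (3, 4)

errors = reprojection_errors(H_demo, src_demo, dst_demo)
print("per-match error (px):", errors)
print("mean error (px):", errors.mean())
```

In the notebook, the same function could be fed `src_pts_h.reshape(-1, 2)[inlier_mask]` and `dst_pts_h.reshape(-1, 2)[inlier_mask]` together with `H`; a mean inlier error well below the RANSAC threshold suggests a consistent fit.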

In [None]:
# Goal: manual stitching using matches and homography
# Why: exposes each step of the panorama pipeline
# Ensure homography is available (from Section 10)
if 'H' not in locals() or H is None:
    raise ValueError("Homography H not available; run Section 10 first")

# Create a canvas that can hold both images side-by-side
h1, w1 = img_bgr.shape[:2]
h2, w2 = img2_bgr.shape[:2]
canvas_w = w1 + w2
canvas_h = max(h1, h2)

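# Warp img_bgr into the canvas using the homography
# Note: pixels that H maps to negative coordinates fall outside the canvas and are clipped
warped = cv2.warpPerspective(img_bgr, H, (canvas_w, canvas_h))

# Paste the reference image at the top-left corner (it overwrites the overlap)
canvas = warped.copy()
canvas[0:h2, 0:w2] = img2_bgr

# Blend only the overlap region for a simple seam reduction
# (blending the whole canvas would darken reference pixels with no warped counterpart)
overlap = np.zeros((canvas_h, canvas_w), dtype=bool)
overlap[0:h2, 0:w2] = warped[0:h2, 0:w2].any(axis=2)
mixed = cv2.addWeighted(canvas, 0.5, warped, 0.5, 0)
blend = canvas.copy()
blend[overlap] = mixed[overlap]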

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB))
plt.title("Manual stitch (overlay)")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(blend, cv2.COLOR_BGR2RGB))
plt.title("Manual stitch (blended)")
plt.axis("off")
plt.tight_layout()
plt.show()

In [None]:
# Goal: automatic stitching using OpenCV's Stitcher
# Why: quick panorama without manual steps
stitcher = cv2.Stitcher_create()
status, pano = stitcher.stitch([img2_bgr, img_bgr])

if status == cv2.Stitcher_OK:
    plt.figure(figsize=(10, 4))
    plt.imshow(cv2.cvtColor(pano, cv2.COLOR_BGR2RGB))
    plt.title("Automatic stitch")
    plt.axis("off")
    plt.show()
else:
    print("Stitcher failed with status:", status)

## 12. Image Stitching
**Theory**: Panorama creation stitches overlapping images together into a wider field of view. Manual stitching: find overlaps, compute homographies, place the warped images onto a canvas. Automatic stitching: estimate (or load pre-calibrated) homographies, then blend the seams. **Goal**: Demonstrate manual canvas-based stitching and OpenCV's automated Stitcher. **Why**: Panoramas extend the field of view beyond what a single shot can capture. Useful for surveying, tourism, heritage documentation, and artistic effects. **Where**: Smartphone panorama mode, satellite map mosaicking (stitching adjacent orbital swaths), microscopy automontage (tiling large samples), augmented reality backgrounds, virtual tours. **Challenges**: Exposure variation (images taken at different times have different brightness), parallax (when the camera translates between shots, objects at different depths shift by different amounts, so a single homography cannot align them all), vignetting (brightness drop at image corners). Modern stitchers use blending, exposure compensation, and seam optimization. **Manual approach** shows the core concepts; auto-stitching adds robust feature detection and seam blending.
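
A fixed-size canvas clips whatever the homography maps to negative coordinates. One common remedy, sketched here with NumPy only, is to project the corners of the image being warped, take the bounding box together with the reference image, and prepend a translation that shifts everything into view. The function name `stitch_canvas` and the toy homography are hypothetical, and the sketch assumes a well-behaved homography (no corner mapped near infinity).

```python
import numpy as np

def stitch_canvas(H, src_size, ref_size):
    """Return (T, (canvas_w, canvas_h)).

    T is a 3x3 translation that moves both the warped source image and the
    reference image into non-negative canvas coordinates.
    src_size and ref_size are (width, height) tuples.
    """
    w1, h1 = src_size
    w2, h2 = ref_size
    # Corners of the source image in homogeneous coordinates
    corners = np.array([[0, 0, 1], [w1, 0, 1], [w1, h1, 1], [0, h1, 1]],
                       dtype=np.float64)
    mapped = corners @ np.asarray(H, dtype=np.float64).T
    mapped = mapped[:, :2] / mapped[:, 2:3]
    # Bounding box over the warped corners and the reference image extent
    all_pts = np.vstack([mapped, [[0, 0], [w2, h2]]])
    x_min, y_min = np.floor(all_pts.min(axis=0)).astype(int)
    x_max, y_max = np.ceil(all_pts.max(axis=0)).astype(int)
    T = np.array([[1.0, 0.0, -x_min],
                  [0.0, 1.0, -y_min],
                  [0.0, 0.0, 1.0]])
    return T, (int(x_max - x_min), int(y_max - y_min))

# Toy example: the source image lands 30 px left of and 20 px below the reference
H_shift = np.array([[1.0, 0.0, -30.0],
                    [0.0, 1.0, 20.0],
                    [0.0, 0.0, 1.0]])
T_demo, canvas_size = stitch_canvas(H_shift, (100, 80), (100, 80))
print("offset:", T_demo[0, 2], T_demo[1, 2], "canvas:", canvas_size)
```

With these values, the warp would use `T @ H` as the matrix and `canvas_size` as the output size in `cv2.warpPerspective`, and the reference image would be pasted at the offset stored in `T[0, 2]` and `T[1, 2]` rather than at the origin.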

## 13. Blending and Masking
**Theory**: After aligning images, blending creates seamless transitions between them. Weighted blending combines images with alpha values: $\text{result} = \alpha \cdot \text{img}_1 + (1-\alpha) \cdot \text{img}_2$. Masks isolate regions (e.g., circular regions, detected objects). Bitwise operations (AND, OR, NOT) enable selective composition and region extraction. **Goal**: Demonstrate weighted blending, mask-based compositing, and bitwise operations. **Why**: Seams appear when stitching images due to exposure differences or geometric misalignment. Blending smooths transitions. Masks enable selective processing (e.g., face-only augmented reality filters). **Where**: Panorama rendering (feathering seams), image fusion (combining complementary information from multiple sensors), portrait retouching (selective enhancement of face vs. background), augmented reality (compositing virtual objects onto backgrounds), medical image overlay (fusing lesion maps with patient scans). **Practical insight**: Gaussian or soft masks reduce visible stitching artifacts compared to hard rectangular masks. Exposure compensation prior to blending improves results.
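
The soft-mask idea can be illustrated by letting $\alpha$ vary per pixel instead of using one global value. The sketch below is a minimal NumPy version of linear feathering across the image width; `feather_blend` is a hypothetical helper, and it assumes the two inputs are already aligned, same-sized `uint8` images.

```python
import numpy as np

def feather_blend(img_1, img_2):
    """Blend two same-sized images with alpha falling linearly
    from 1 at the left edge (all img_1) to 0 at the right edge (all img_2)."""
    if img_1.shape != img_2.shape:
        raise ValueError("images must have the same shape")
    w = img_1.shape[1]
    alpha = np.linspace(1.0, 0.0, w, dtype=np.float32)  # one weight per column
    # Reshape so alpha broadcasts over rows (and channels, if present)
    alpha = alpha.reshape(1, w, *([1] * (img_1.ndim - 2)))
    out = alpha * img_1.astype(np.float32) + (1.0 - alpha) * img_2.astype(np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Two flat test images: the blend should ramp smoothly from 200 down to 100
left = np.full((4, 5, 3), 200, dtype=np.uint8)
right = np.full((4, 5, 3), 100, dtype=np.uint8)
blended = feather_blend(left, right)
print(blended[0, :, 0])
```

In a real panorama the ramp would span only the overlap region, and a Gaussian-blurred mask could replace the linear one; the per-pixel formula stays the same.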

In [None]:
# Goal: blend two images with a simple weighted sum
# Why: quick way to combine information from two aligned images
img_a = img_bgr
h, w = img_a.shape[:2]  # addWeighted and the bitwise ops need equal-sized inputs
img_b = cv2.resize(img2_bgr, (w, h))

blend_weighted = cv2.addWeighted(img_a, 0.6, img_b, 0.4, 0)

# Goal: create a mask to isolate a circular region
# Why: masks control where bitwise operations apply
mask = np.zeros((h, w), dtype=np.uint8)
cv2.circle(mask, (w // 2, h // 2), min(h, w) // 4, 255, -1)

# Use mask to extract region from img_a
masked_a = cv2.bitwise_and(img_a, img_a, mask=mask)

# Invert mask to extract the complement from img_b
mask_inv = cv2.bitwise_not(mask)
masked_b = cv2.bitwise_and(img_b, img_b, mask=mask_inv)

# Combine the two masked regions
combined = cv2.bitwise_or(masked_a, masked_b)

plt.figure(figsize=(12, 4))
plt.subplot(1, 3, 1)
plt.imshow(cv2.cvtColor(blend_weighted, cv2.COLOR_BGR2RGB))
plt.title("addWeighted blend")
plt.axis("off")
plt.subplot(1, 3, 2)
plt.imshow(mask, cmap="gray")
plt.title("Mask")
plt.axis("off")
plt.subplot(1, 3, 3)
plt.imshow(cv2.cvtColor(combined, cv2.COLOR_BGR2RGB))
plt.title("Mask + bitwise combine")
plt.axis("off")
plt.tight_layout()
plt.show()