# Machine Vision Fundamentals

**Author:** Grok AI
**Date:** September 14, 2025

## Overview

This notebook introduces the fundamentals of machine vision using Python, NumPy, OpenCV, scikit-image, and matplotlib. It is designed for interns or beginners, providing clear explanations, code examples, and visualizations. By the end, you will learn:

- Basic image handling and properties
- Histogram analysis and contrast enhancement
- Geometric transformations
- Filtering and denoising
- Edge detection
- Morphological operations
- Feature detection and matching
- Simple image segmentation
- (Optional) Introduction to deep learning for feature extraction

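The notebook uses sample images bundled with scikit-image for reproducibility, so the core sections need no external downloads; only the optional deep learning section fetches pretrained weights and a label file. Run the cells sequentially.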

In [None]:
import sys
import numpy as np
import cv2
import skimage
import matplotlib
import scipy

print(f"Python version: {sys.version}")
print(f"NumPy version: {np.__version__}")
print(f"OpenCV version: {cv2.__version__}")
print(f"scikit-image version: {skimage.__version__}")
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"SciPy version: {scipy.__version__}")

## Image Basics

Images in machine vision are represented as arrays of pixels. Key concepts:

- **Pixels**: The smallest unit of an image, holding intensity values.
- **Dimensions**: Height (rows), width (columns), and channels (e.g., 3 for RGB). NumPy reports them in that order: `image.shape` is `(height, width, channels)`.
- **Data Type (dtype)**: Often uint8 (0-255) for images.
- **Color Spaces**: RGB (Red-Green-Blue), BGR (OpenCV default), Grayscale (single channel).

We'll load images using OpenCV (cv2) and scikit-image, noting that cv2 uses BGR by default, while matplotlib expects RGB.
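
As a minimal sketch of the array layout described above (the tiny synthetic image and the variable names are illustrative assumptions, not part of the notebook's data), the following shows the `(rows, columns, channels)` shape and how reversing the last axis converts between RGB and BGR:

```python
import numpy as np

# A tiny 2x3 synthetic RGB image: shape is (rows, columns, channels).
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[..., 0] = 255  # fill the red channel (index 0 in RGB order)

# Reversing the last axis swaps RGB <-> BGR without moving any pixel.
bgr = rgb[..., ::-1]

height, width, channels = rgb.shape
print(height, width, channels)  # rows, columns, channels
print(rgb[0, 0], bgr[0, 0])     # the same pixel in both channel orders
```

This is the same slicing trick the next cell applies to the OpenCV image before handing it to matplotlib.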

In [None]:
from skimage import data
import matplotlib.pyplot as plt

# Load sample images from scikit-image
color_image = data.astronaut()  # RGB color image
gray_image = data.camera()  # Grayscale image

# Save temporarily and load with OpenCV to demonstrate I/O
from skimage.io import imsave, imread
imsave('temp_color.png', color_image)
imsave('temp_gray.png', gray_image)

cv_color = cv2.imread('temp_color.png')  # Loads as BGR
cv_gray = cv2.imread('temp_gray.png', cv2.IMREAD_GRAYSCALE)

# Display with matplotlib (convert BGR to RGB for cv_color)
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].imshow(cv_color[:,:,::-1])  # BGR to RGB
axs[0].set_title('Color Image (RGB)')
axs[1].imshow(cv_gray, cmap='gray')
axs[1].set_title('Grayscale Image')
plt.show()

# Clean up temp files
import os
os.remove('temp_color.png')
os.remove('temp_gray.png')

In [None]:
print("Color Image Shape:", color_image.shape)
print("Color Image Dtype:", color_image.dtype)
print("Color Image Min/Max:", color_image.min(), color_image.max())

print("\nGrayscale Image Shape:", gray_image.shape)
print("Grayscale Image Dtype:", gray_image.dtype)
print("Grayscale Image Min/Max:", gray_image.min(), gray_image.max())

In [None]:
# Convert RGB to BGR
bgr_image = cv2.cvtColor(color_image, cv2.COLOR_RGB2BGR)

# Convert color to grayscale
gray_from_color = cv2.cvtColor(color_image, cv2.COLOR_RGB2GRAY)

# Differences: Grayscale reduces channels to 1, losing color info. BGR vs RGB is channel order.
fig, axs = plt.subplots(1, 3, figsize=(15, 5))
axs[0].imshow(color_image)
axs[0].set_title('RGB')
axs[1].imshow(bgr_image[:,:,::-1])  # Show as RGB for consistency
axs[1].set_title('BGR (displayed as RGB)')
axs[2].imshow(gray_from_color, cmap='gray')
axs[2].set_title('Grayscale from Color')
plt.show()

## Histograms and Contrast

A histogram shows the distribution of pixel intensities. For grayscale, it's a plot of intensity (0-255) vs frequency.

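Contrast adjustment improves visibility: linear rescaling stretches the occupied intensity range onto the full output range, while histogram equalization remaps intensities so that they are spread more evenly across that range.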
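
To make the two ideas concrete, here is a small NumPy-only sketch. The helper names `linear_stretch` and `equalize` are hypothetical, and the equalization is a simplified CDF remapping rather than the exact scikit-image implementation:

```python
import numpy as np

def linear_stretch(img):
    """Map [img.min(), img.max()] linearly onto [0, 255]."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:  # constant image: nothing to stretch
        return np.zeros_like(img, dtype=np.uint8)
    scaled = (img.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return scaled.round().astype(np.uint8)

def equalize(img):
    """Simplified histogram equalization: remap through the normalized CDF."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size
    return (cdf[img] * 255).round().astype(np.uint8)

# A low-contrast toy image occupying only intensities 100..130.
low_contrast = np.array([[100, 110], [120, 130]], dtype=np.uint8)
stretched_demo = linear_stretch(low_contrast)
equalized_demo = equalize(low_contrast)
print(stretched_demo)
print(equalized_demo)
```

Both outputs use a much wider intensity range than the input; the stretch keeps relative spacing, while equalization depends only on the rank order of the intensities.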

In [None]:
from skimage.exposure import equalize_hist

# Grayscale histogram
hist, bins = np.histogram(gray_image.ravel(), bins=256, range=(0, 256))

plt.figure(figsize=(6, 4))
plt.bar(bins[:-1], hist, width=1)
plt.title('Grayscale Histogram')
plt.xlabel('Intensity')
plt.ylabel('Frequency')
plt.show()

# Interpretation: Peaks indicate common intensities; a histogram squeezed into a narrow range means low contrast.

In [None]:
from skimage.exposure import rescale_intensity, equalize_adapthist

# Linear rescale
rescaled = rescale_intensity(gray_image, in_range='image', out_range=(0, 255))

# Histogram equalization
equalized = (equalize_hist(gray_image) * 255).astype(np.uint8)

# CLAHE (optional)
clahe = (equalize_adapthist(gray_image) * 255).astype(np.uint8)

# Plot histograms before/after
fig, axs = plt.subplots(2, 2, figsize=(12, 8))
axs[0,0].hist(gray_image.ravel(), bins=256)
axs[0,0].set_title('Original')
axs[0,1].hist(rescaled.ravel(), bins=256)
axs[0,1].set_title('Rescaled')
axs[1,0].hist(equalized.ravel(), bins=256)
axs[1,0].set_title('Equalized')
axs[1,1].hist(clahe.ravel(), bins=256)
axs[1,1].set_title('CLAHE')
plt.show()

# Discussion: Equalization spreads histogram, revealing details in dark/bright areas.

## Geometric Transforms

These change image size or orientation: resize (scale), crop (slice), rotate.

Interpolation methods: nearest neighbor (fast, blocky), bilinear (smooth), bicubic (smoother still, but slower).
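
The difference between nearest-neighbor and linear interpolation is easiest to see in one dimension. The sketch below is an illustration on a made-up three-pixel row; bilinear interpolation applies the same linear weighting along both image axes:

```python
import numpy as np

row = np.array([0.0, 100.0, 200.0])      # three source pixels
# Sample positions for the enlarged row, in source coordinates.
x_new = np.linspace(0, len(row) - 1, 5)  # [0, 0.5, 1, 1.5, 2]

# Nearest neighbor: copy the closest source pixel (ties round up here).
nearest = row[np.floor(x_new + 0.5).astype(int)]

# Linear: weight the two neighboring pixels by distance.
linear = np.interp(x_new, np.arange(len(row)), row)

print(nearest)
print(linear)
```

Nearest-neighbor output only ever contains values already present in the source, which is what produces the blocky look; linear interpolation creates intermediate values.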

In [None]:
from skimage.transform import resize, rotate

# Resize
small = resize(color_image, (color_image.shape[0]//2, color_image.shape[1]//2), anti_aliasing=True)
large = resize(color_image, (color_image.shape[0]*2, color_image.shape[1]*2), order=3)  # Bicubic

# Crop (array slicing)
crop = color_image[100:400, 100:400]

# Rotate
rotated = rotate(color_image, 30, mode='wrap')

fig, axs = plt.subplots(2, 2, figsize=(12, 12))
axs[0,0].imshow(small)
axs[0,0].set_title('2x Smaller (Bilinear)')
axs[0,1].imshow(large)
axs[0,1].set_title('2x Larger (Bicubic)')
axs[1,0].imshow(crop)
axs[1,0].set_title('Cropped')
axs[1,1].imshow(rotated)
axs[1,1].set_title('30° Rotated')
plt.show()

# Interpolation differences: Nearest would show artifacts in enlargement.

## Filtering and Denoising

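Filters smooth or enhance an image: a mean (box) blur averages each neighborhood uniformly, a Gaussian blur weights neighbors by their distance from the center, and a median filter works well against salt-and-pepper noise because an isolated outlier rarely ends up as the median.

To test them, we add two kinds of noise: Gaussian (values drawn from a normal distribution and added to every pixel) and salt-and-pepper (random pixels forced to black or white).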
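
The following NumPy-only sketch illustrates why the median handles an isolated outlier better than the mean. The helper `filter3x3` is a hypothetical name, and the 5x5 test image with one "salt" pixel is made up for the demonstration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter3x3(img, reducer):
    """Apply `reducer` (e.g. np.mean or np.median) over each 3x3 neighborhood."""
    padded = np.pad(img, 1, mode="edge")           # replicate border pixels
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    return reducer(windows, axis=(-2, -1))

flat = np.full((5, 5), 100.0)
flat[2, 2] = 255.0  # a single "salt" pixel

mean_out = filter3x3(flat, np.mean)
median_out = filter3x3(flat, np.median)

print(mean_out[2, 2])    # outlier is smeared into its neighbors
print(median_out[2, 2])  # outlier is discarded
```

The mean spreads the outlier's energy over the neighborhood, whereas the median removes it outright as long as fewer than half of the window's pixels are corrupted.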

In [None]:
import numpy as np
from skimage.util import random_noise
from skimage.filters import gaussian, median
from skimage.morphology import disk
from skimage.metrics import peak_signal_noise_ratio as psnr

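# random_noise draws from its own generator, so seed it through `rng`
# (scikit-image >= 0.21; older releases name this argument `seed`)

# Add Gaussian noise
noisy_gauss = random_noise(gray_image, mode='gaussian', var=0.01, rng=42)

# Add salt-and-pepper noise
noisy_sp = random_noise(gray_image, mode='s&p', amount=0.05, rng=42)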

# Denoise
blur_mean = cv2.blur(noisy_gauss, (3,3))
blur_gauss = gaussian(noisy_gauss, sigma=1)
median_sp = median(noisy_sp, disk(1))

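# Compare PSNR; all images are floats in [0, 1], so state the data range explicitly
reference = gray_image / 255.0
psnr_gauss = psnr(reference, noisy_gauss, data_range=1.0)
psnr_denoised_gauss = psnr(reference, blur_gauss, data_range=1.0)
psnr_sp = psnr(reference, noisy_sp, data_range=1.0)
psnr_denoised_sp = psnr(reference, median_sp, data_range=1.0)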

print(f"Gaussian Noise PSNR: {psnr_gauss:.2f}, Denoised: {psnr_denoised_gauss:.2f}")
print(f"S&P Noise PSNR: {psnr_sp:.2f}, Denoised: {psnr_denoised_sp:.2f}")

# Viz
fig, axs = plt.subplots(2, 3, figsize=(15, 10))
axs[0,0].imshow(noisy_gauss, cmap='gray')
axs[0,0].set_title('Gaussian Noisy')
axs[0,1].imshow(blur_mean, cmap='gray')
axs[0,1].set_title('Mean Blur')
axs[0,2].imshow(blur_gauss, cmap='gray')
axs[0,2].set_title('Gaussian Blur')
axs[1,0].imshow(noisy_sp, cmap='gray')
axs[1,0].set_title('S&P Noisy')
axs[1,1].imshow(median_sp, cmap='gray')
axs[1,1].set_title('Median Filter')
axs[1,2].axis('off')  # Hide the unused sixth panel
plt.show()

## Edge Detection

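Edges are abrupt intensity changes. The Sobel operator estimates the intensity gradient (rate of change) along x and y; the gradient magnitude is large where an edge is present.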

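Canny is a multi-stage detector: Gaussian smoothing, gradient computation, non-maximum suppression, and hysteresis thresholding, which keeps strong edges plus the weak edges connected to them.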
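
As a rough sketch of the gradient idea, the code below applies the standard 3x3 Sobel kernels to a synthetic vertical step edge using NumPy only. The helper name `sobel_magnitude` is hypothetical, and the result is unnormalized, so its values are not expected to match the scale of `skimage.filters.sobel`:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Standard Sobel kernels for horizontal (x) and vertical (y) change.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel_magnitude(img):
    """Unnormalized Sobel gradient magnitude with replicated borders."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    gx = (windows * KX).sum(axis=(-2, -1))
    gy = (windows * KY).sum(axis=(-2, -1))
    return np.hypot(gx, gy)

# Synthetic image: dark left half, bright right half.
step = np.zeros((5, 6))
step[:, 3:] = 1.0

mag = sobel_magnitude(step)
print(mag[0])  # nonzero only in the two columns straddling the step
```

The response is zero in flat regions and peaks on either side of the intensity jump, which is the raw signal that Canny then thins and thresholds.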

In [None]:
from skimage.filters import sobel
from skimage.feature import canny

# Sobel magnitude
sobel_mag = sobel(gray_image / 255.0)

# Canny
canny_edges = canny(gray_image / 255.0, sigma=1)

fig, axs = plt.subplots(1, 3, figsize=(15, 5))
axs[0].imshow(gray_image, cmap='gray')
axs[0].set_title('Original')
axs[1].imshow(sobel_mag, cmap='gray')
axs[1].set_title('Sobel Magnitude')
axs[2].imshow(canny_edges, cmap='gray')
axs[2].set_title('Canny Edges')
plt.show()

## Morphological Operations

For binary images: erosion shrinks foreground regions, dilation expands them, opening (erosion then dilation) removes small specks, and closing (dilation then erosion) fills small holes.

Structuring element (kernel): shape/size affects result.
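
The definitions above can be sketched directly in NumPy for a fixed 3x3 square structuring element. The helper names `erode3x3` and `dilate3x3` are hypothetical, and the sketch treats everything outside the image as background, which is an assumption that may differ from a library's border handling:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def erode3x3(mask):
    """A pixel stays foreground only if its whole 3x3 neighborhood is foreground."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return sliding_window_view(padded, (3, 3)).all(axis=(-2, -1))

def dilate3x3(mask):
    """A pixel becomes foreground if any pixel in its 3x3 neighborhood is foreground."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return sliding_window_view(padded, (3, 3)).any(axis=(-2, -1))

mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True  # a 3x3 object
mask[0, 6] = True      # an isolated noise speck

# Opening = erosion followed by dilation.
opened_demo = dilate3x3(erode3x3(mask))
print(opened_demo.astype(int))
```

The speck is smaller than the structuring element, so erosion deletes it and dilation cannot bring it back, while the 3x3 object is restored to its original extent.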

In [None]:
from skimage.morphology import binary_erosion, binary_dilation, binary_opening, binary_closing, square

# Threshold to binary
_, binary = cv2.threshold(gray_image, 127, 255, cv2.THRESH_BINARY)

binary = binary.astype(bool)  # For skimage

# Operations with square kernel
eroded = binary_erosion(binary, square(3))
dilated = binary_dilation(binary, square(3))
opened = binary_opening(binary, square(3))
closed = binary_closing(binary, square(3))

# Show effects
fig, axs = plt.subplots(2, 3, figsize=(15, 10))
axs[0,0].imshow(binary, cmap='gray')
axs[0,0].set_title('Binary')
axs[0,1].imshow(eroded, cmap='gray')
axs[0,1].set_title('Eroded (3x3)')
axs[0,2].imshow(dilated, cmap='gray')
axs[0,2].set_title('Dilated (3x3)')
axs[1,0].imshow(opened, cmap='gray')
axs[1,0].set_title('Opened (3x3)')
axs[1,1].imshow(closed, cmap='gray')
axs[1,1].set_title('Closed (3x3)')
axs[1,2].imshow(binary_closing(binary, square(5)), cmap='gray')
axs[1,2].set_title('Closed (5x5)')
plt.show()

# A larger structuring element fills larger holes and smooths boundaries more.

## Feature Detection and Description

Features are distinctive points (keypoints) with descriptors.

ORB (Oriented FAST and Rotated BRIEF) produces binary descriptors that are rotation invariant and, through an image pyramid, tolerant of moderate scale changes.
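
Binary descriptors such as ORB's are compared with the Hamming distance (the number of differing bits). The sketch below shows brute-force matching with a cross-check on made-up one-byte descriptors; the helper names are hypothetical, and real ORB descriptors are 32 bytes long:

```python
import numpy as np

def hamming_matrix(a, b):
    """Pairwise Hamming distances between rows of two uint8 descriptor arrays."""
    xor = a[:, None, :] ^ b[None, :, :]               # differing bits per pair
    return np.unpackbits(xor, axis=-1).sum(axis=-1)   # count the set bits

def cross_check_matches(a, b):
    """Keep (i, j, distance) only when i and j are each other's nearest neighbor."""
    d = hamming_matrix(a, b)
    best_ab = d.argmin(axis=1)  # best match in b for each row of a
    best_ba = d.argmin(axis=0)  # best match in a for each row of b
    return [(i, int(j), int(d[i, j]))
            for i, j in enumerate(best_ab) if best_ba[j] == i]

# Toy descriptors: des_b holds reordered copies of des_a with one bit flipped.
des_a = np.array([[0b00000000], [0b11111111], [0b00001111]], dtype=np.uint8)
des_b = np.array([[0b11111110], [0b00000001], [0b01001111]], dtype=np.uint8)

matches_demo = cross_check_matches(des_a, des_b)
print(matches_demo)
```

The cross-check mirrors the `crossCheck=True` option used with the brute-force matcher in the next cell: a pair is kept only if the match is mutual.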

In [None]:
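# Create rotated image; rotate() returns floats in [0, 1] by default, so keep
# the original range and cast back to uint8, which OpenCV's ORB requires
rotated_color = rotate(color_image, 30, mode='wrap', preserve_range=True).astype(np.uint8)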

# ORB detector
orb = cv2.ORB_create()

# Detect keypoints and descriptors
kp1, des1 = orb.detectAndCompute(cv2.cvtColor(color_image, cv2.COLOR_RGB2GRAY), None)
kp2, des2 = orb.detectAndCompute(cv2.cvtColor(rotated_color, cv2.COLOR_RGB2GRAY), None)

# Matcher
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)

# Draw
img_kp1 = cv2.drawKeypoints(color_image, kp1, None, color=(0,255,0))
img_kp2 = cv2.drawKeypoints(rotated_color, kp2, None, color=(0,255,0))
img_matches = cv2.drawMatches(color_image, kp1, rotated_color, kp2, matches[:50], None, flags=2)

fig, axs = plt.subplots(1, 3, figsize=(20, 7))
axs[0].imshow(img_kp1)  # Inputs were RGB, so the drawn images are already RGB
axs[0].set_title('Keypoints Original')
axs[1].imshow(img_kp2)
axs[1].set_title('Keypoints Rotated')
axs[2].imshow(img_matches)
axs[2].set_title('Top 50 Matches')
plt.show()

# Observation: ORB finds matches despite rotation.

## Simple Segmentation

Segmentation separates foreground/background. Thresholding: pixels above threshold are foreground.
Otsu: automatic threshold selection.
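
Otsu's method picks the threshold that maximizes the variance between the two resulting classes. The following is a minimal NumPy sketch of that idea for uint8 images; the function name `otsu_threshold` is hypothetical, and the value it returns may differ slightly from `skimage.filters.threshold_otsu` because of binning conventions:

```python
import numpy as np

def otsu_threshold(img):
    """Return the intensity t maximizing between-class variance (classes <= t and > t)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                 # weight of the class <= t
    w1 = 1.0 - w0                        # weight of the class > t
    cum_mean = np.cumsum(prob * levels)  # unnormalized mean of the class <= t
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    # Thresholds that leave one class empty produce 0/0; score them as zero.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))

# Toy bimodal data: half the pixels dark, half bright.
bimodal = np.array([10] * 50 + [200] * 50, dtype=np.uint8)
t_demo = otsu_threshold(bimodal)
foreground = bimodal > t_demo
print(t_demo, int(foreground.sum()))
```

Any threshold between the two modes separates this toy data equally well; `argmax` returns the first such value.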

In [None]:
from skimage.filters import threshold_otsu

# Global fixed
_, thresh_fixed = cv2.threshold(gray_image, 127, 255, cv2.THRESH_BINARY)

# Otsu
thresh_otsu = threshold_otsu(gray_image)
thresh_otsu_img = gray_image > thresh_otsu

# Foreground ratio
fg_fixed = np.sum(thresh_fixed > 0) / thresh_fixed.size
fg_otsu = np.sum(thresh_otsu_img) / thresh_otsu_img.size

print(f"Fixed Threshold FG Ratio: {fg_fixed:.2f}")
print(f"Otsu FG Ratio: {fg_otsu:.2f}")

fig, axs = plt.subplots(1, 3, figsize=(15, 5))
axs[0].imshow(gray_image, cmap='gray')
axs[0].set_title('Original')
axs[1].imshow(thresh_fixed, cmap='gray')
axs[1].set_title('Fixed Threshold')
axs[2].imshow(thresh_otsu_img, cmap='gray')
axs[2].set_title('Otsu Threshold')
plt.show()

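# Compare: Otsu derives the threshold from the image histogram, so it adapts to overall brightness; it is still one global threshold and does not correct uneven lighting within an image.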

## Optional: Deep Learning Intro

Use a pretrained CNN to classify an image. This section requires torch and torchvision, plus network access to download the model weights and the ImageNet label file.

In [None]:
import torch
from torchvision import models, transforms
from torch.nn import functional as F

# Load pretrained ResNet18 (the `weights` argument replaces the deprecated `pretrained=True`)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Preprocess image
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

input_tensor = preprocess(color_image)
input_batch = input_tensor.unsqueeze(0)

# Predict
with torch.no_grad():
    output = model(input_batch)

probabilities = F.softmax(output[0], dim=0)

# Top 5 classes (load labels)
!wget -q https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]

top5_prob, top5_catid = torch.topk(probabilities, 5)
for i in range(top5_prob.size(0)):
    print(categories[top5_catid[i]], top5_prob[i].item())

# Visualize probabilities (bar plot); a Python list cannot be indexed with an array
top5_labels = [categories[i] for i in top5_catid.tolist()]
plt.barh(top5_labels, top5_prob.numpy())
plt.title('Top 5 Class Probabilities')
plt.show()

## Exercises

### Exercise 1
Implement a function that takes an image and returns a contrast-stretched version limited to the 2nd–98th percentile of intensities. Test on grayscale and color images; show before/after histograms.

In [None]:
# Your code here

### Exercise 2
Implement a mini pipeline: (a) grayscale, (b) median denoise, (c) Otsu threshold, (d) opening clean, (e) Canny on cleaned mask. Count edge pixels, display 2x3 grid intermediates. Discuss parameters.

In [None]:
# Your code here

## Solutions

(Toggle or scroll to view)

### Solution Exercise 1

In [None]:
def contrast_stretch(img, low_p=2, high_p=98):
    if len(img.shape) == 3:  # Color
        out = np.zeros_like(img)
        for c in range(3):
            p_low, p_high = np.percentile(img[:,:,c], [low_p, high_p])
            out[:,:,c] = rescale_intensity(img[:,:,c], in_range=(p_low, p_high), out_range=(0, 255))
        return out.astype(np.uint8)
    else:  # Grayscale
        p_low, p_high = np.percentile(img, [low_p, high_p])
        return rescale_intensity(img, in_range=(p_low, p_high), out_range=(0, 255)).astype(np.uint8)

# Test grayscale
stretched_gray = contrast_stretch(gray_image)
fig, axs = plt.subplots(1, 2, figsize=(10, 4))
axs[0].hist(gray_image.ravel(), bins=256)
axs[0].set_title('Before')
axs[1].hist(stretched_gray.ravel(), bins=256)
axs[1].set_title('After')
plt.show()

# Test color
stretched_color = contrast_stretch(color_image)
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].imshow(color_image)
axs[0].set_title('Before')
axs[1].imshow(stretched_color)
axs[1].set_title('After')
plt.show()

### Solution Exercise 2

In [None]:
def mini_pipeline(img, median_size=3, open_size=3, canny_sigma=1):
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    denoised = median(gray, disk(median_size))
    thresh_val = threshold_otsu(denoised)
    binary = denoised > thresh_val
    cleaned = binary_opening(binary, square(open_size))
    edges = canny(cleaned.astype(float), sigma=canny_sigma)
    
    edge_count = np.sum(edges)
    print(f"Edge pixels: {edge_count}")
    
    fig, axs = plt.subplots(2, 3, figsize=(15, 10))
    axs[0,0].imshow(gray, cmap='gray'); axs[0,0].set_title('Grayscale')
    axs[0,1].imshow(denoised, cmap='gray'); axs[0,1].set_title('Denoised')
    axs[0,2].imshow(binary, cmap='gray'); axs[0,2].set_title('Otsu Threshold')
    axs[1,0].imshow(cleaned, cmap='gray'); axs[1,0].set_title('Cleaned')
    axs[1,1].imshow(edges, cmap='gray'); axs[1,1].set_title('Canny Edges')
    axs[1,2].axis('off')
    plt.show()
    
    return edges

# Run on color_image
_ = mini_pipeline(color_image)

# Discussion: median_size is a disk radius, so 3 gives a 7-pixel-wide footprint that removes small noise; open_size=3 cleans specks; sigma=1 keeps fine edges. Adjust for image scale.