# Computer Vision Introduction

## Overview
- Computer vision problems involve interpreting visual data, similar to how humans perceive images.
- Machine learning is the standard approach to computer vision problems, and PyTorch is the framework used for the examples here.

## Inputs and Outputs in Computer Vision
- **Inputs**: Images are typically represented as tensors with dimensions for height, width, and color channels (RGB). PyTorch expects the channel dimension first: [color_channels, height, width].
    - For example, a 24x24 image with 3 color channels is stored as 24 x 24 x 3 = 1,728 numbers, each giving the intensity of red, green, or blue at one pixel.
- **Outputs**: The goal is often classification, where the model predicts the probability of the image belonging to certain classes (e.g., sushi, steak, pizza).
    - The output is structured as a vector of prediction probabilities, one for each class.
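The input and output formats described above can be sketched in a few lines of PyTorch. This is only an illustration: the random pixel values and the raw scores (`logits`) are made up, and the class names are borrowed from the example in the text.

```python
import torch

# Hypothetical 24x24 RGB image stored as height x width x channels (HWC).
# Random values stand in for real pixel intensities in [0, 1].
image_hwc = torch.rand(24, 24, 3)

# PyTorch layers expect channels first (CHW), so move the channel axis to the front.
image_chw = image_hwc.permute(2, 0, 1)
print(image_chw.shape)    # torch.Size([3, 24, 24])
print(image_chw.numel())  # 1728 values in total

# Made-up raw model scores (logits), one per class.
class_names = ["sushi", "steak", "pizza"]
logits = torch.tensor([2.0, 0.5, -1.0])

# Softmax converts the scores into probabilities that sum to 1.
probs = torch.softmax(logits, dim=0)
for name, p in zip(class_names, probs.tolist()):
    print(f"{name}: {p:.3f}")
```

The largest logit always receives the largest probability, so here "sushi" would be the predicted class.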

## Multi-class Classification Example
- An app that identifies the food in an image is an example of computer vision applied to multi-class classification: each image is assigned to exactly one of several classes.
- The process involves numerically encoding images into tensors and using a machine learning model (e.g., CNN) to classify the images into predefined categories.
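As a rough sketch of that flow, the snippet below pushes a batch of image tensors through a classifier and maps each prediction back to a class name. The model here is an assumption made for brevity: an untrained single linear layer stands in for a real CNN, so the predicted labels are arbitrary. The batch contents are random placeholders as well.

```python
import torch
import torch.nn as nn

class_names = ["sushi", "steak", "pizza"]

# Stand-in classifier (NOT a CNN): flatten each image, then one linear layer
# producing one score per class. It is untrained, so predictions are arbitrary.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 24 * 24, len(class_names)),
)

# Placeholder batch of two 24x24 RGB images: [batch_size, channels, height, width].
batch = torch.rand(2, 3, 24, 24)

model.eval()
with torch.no_grad():
    logits = model(batch)                 # shape [2, 3]: one row of scores per image
    probs = torch.softmax(logits, dim=1)  # each row sums to 1
    pred_idx = probs.argmax(dim=1)        # index of the most likely class per image

predictions = [class_names[i] for i in pred_idx.tolist()]
print(predictions)
```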

## Convolutional Neural Networks (CNNs)
- CNNs are the go-to model for image data, known for their ability to capture spatial hierarchies in images.
- They work by applying filters to the input images to extract features, which are then used for classification or other tasks.
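To make "applying filters" concrete, here is a small sketch that slides one hand-written 3x3 filter over a tiny synthetic image using `torch.nn.functional.conv2d`. The image and the Sobel-style kernel are chosen for illustration; in a trained CNN the filter values are learned rather than written by hand.

```python
import torch
import torch.nn.functional as F

# 6x6 single-channel image: left half dark (0), right half bright (1).
# Shape is [batch_size, channels, height, width].
image = torch.zeros(1, 1, 6, 6)
image[:, :, :, 3:] = 1.0

# Hand-written vertical-edge filter with shape [out_channels, in_channels, 3, 3].
kernel = torch.tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]]).reshape(1, 1, 3, 3)

# Without padding, a 3x3 filter shrinks a 6x6 input to 4x4.
feature_map = F.conv2d(image, kernel)
print(feature_map.shape)  # torch.Size([1, 1, 4, 4])
print(feature_map[0, 0])  # every row is [0., 4., 4., 0.]
```

The filter responds only where the window straddles the dark-to-bright boundary, which is the sense in which a convolution "extracts" a feature such as an edge.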
- Recent research also shows that transformer architectures, such as the Vision Transformer (ViT), are effective on image data.

## Practical Considerations
- **Tensor Shapes**: Each layer's output shape must match the input shape the next layer expects, and the final output must have one value per class. PyTorch image batches use the dimension order [batch_size, color_channels, height, width].
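One way to check shape compatibility is to pass a dummy batch through the layers one at a time and record the shape after each step. The sketch below uses the same layer sizes as the `SimpleCNN` in the code cell of this notebook and a random 32x32 batch in place of real CIFAR10 images; it shows where the `16 * 5 * 5` input size of the first fully connected layer comes from.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Dummy batch shaped like CIFAR10 input: [batch_size, channels, height, width].
x = torch.rand(4, 3, 32, 32)

conv1 = nn.Conv2d(3, 6, 5)   # 5x5 kernel, no padding: 32 -> 28
pool = nn.MaxPool2d(2, 2)    # halves height and width
conv2 = nn.Conv2d(6, 16, 5)  # 5x5 kernel, no padding: 14 -> 10

shapes = {"input": tuple(x.shape)}
x = pool(F.relu(conv1(x)))
shapes["after conv1 + pool"] = tuple(x.shape)  # (4, 6, 14, 14)
x = pool(F.relu(conv2(x)))
shapes["after conv2 + pool"] = tuple(x.shape)  # (4, 16, 5, 5)
x = torch.flatten(x, 1)                        # keep the batch dimension, flatten the rest
shapes["after flatten"] = tuple(x.shape)       # (4, 400), i.e. 16 * 5 * 5

for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

If the input resolution changes, the flattened size changes too, and the first `nn.Linear` layer must be resized to match.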
- **Model Training**: Model performance can often be improved by training on more varied examples or by adjusting the model architecture.

## Conclusion
- Computer vision encompasses a broad range of problems from classification to object detection and segmentation.
- Successful application requires understanding both the theoretical underpinnings and practical aspects of machine learning models and data representation.


In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader

# Define a simple CNN architecture
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # Input channels = 3 (RGB), Output channels = 6, Kernel size = 5
        self.pool = nn.MaxPool2d(2, 2)   # Pooling over a 2x2 window
        self.conv2 = nn.Conv2d(6, 16, 5) # Input channels = 6, Output channels = 16, Kernel size = 5
        self.fc1 = nn.Linear(16 * 5 * 5, 120) # Fully connected layer: 16 channels x 5 x 5 feature map -> 120 features
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)    # Output layer for 10 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)         # Flatten everything except the batch dimension: [batch_size, 16 * 5 * 5]
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                 # Return raw logits; nn.CrossEntropyLoss applies log-softmax internally
        return x

# Initialize the CNN
net = SimpleCNN()

# Load and transform the CIFAR10 dataset
transform = transforms.Compose([
    transforms.ToTensor(), # Convert images to PyTorch tensors
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) # Rescale each channel from [0, 1] to [-1, 1]
])

# Download the training and test sets
trainset = CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = CIFAR10(root='./data', train=False, download=True, transform=transform)

# Create data loaders
trainloader = DataLoader(trainset, batch_size=4, shuffle=True)
testloader = DataLoader(testset, batch_size=4, shuffle=False)

# Example of accessing a single batch of images
images, labels = next(iter(trainloader))
print(f'Image tensor shape: {images.shape}') # Shape format: [batch_size, color_channels, height, width]

# Simplified training loop. The loss function, optimizer, learning rate, and epoch count
# are illustrative assumptions, not tuned values.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
num_epochs = 1

for epoch in range(num_epochs):
    for images, labels in trainloader:
        optimizer.zero_grad()              # Clear gradients left over from the previous step
        outputs = net(images)              # Forward pass: logits of shape [batch_size, 10]
        loss = criterion(outputs, labels)  # Calculate loss
        loss.backward()                    # Backward pass
        optimizer.step()                   # Update the weights

# Note: This example is for demonstration purposes. It shows the model definition, dataset loading,
# input and output shapes, and a basic training loop; evaluation on testloader is omitted for brevity.


Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data\cifar-10-python.tar.gz
Extracting ./data\cifar-10-python.tar.gz to ./data
Files already downloaded and verified
Image tensor shape: torch.Size([4, 3, 32, 32])
