# Image Processing

Turning the images into a usable dataset for a CNN. 

In [76]:
import torch
from torchvision import transforms, datasets
from torch.utils.data import DataLoader
from PIL import Image
import os

## Single Image Transformation

*Manually, with a little bit of PyTorch*

**Input:** A raw PIL image object (here loaded from a PNG file)

**Steps:**
1. Make sure the image is RGB (this drops any alpha channel, e.g. from RGBA PNGs)
2. Resize to a consistent shape (224x224 pixels; this stretches the image if it is not already square)
3. Scale raw pixel values from 0-255 to 0-1
4. Convert to tensor
    - 3D tensor of shape (3 x Height x Width)
    - Each of the 3 channels holds the R, G, or B pixel intensities
5. Normalize each channel with the ImageNet means and standard deviations: (x - mean) / std
    - Pretrained CNNs (like ResNet) were trained on the ImageNet dataset with this normalization
    - Standardizing our pixels the same way gives the pretrained model inputs on the scale it expects


**Output**: An image tensor with the shape (3, 224, 224).
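Steps 3 to 5 are simple arithmetic. Below is a minimal NumPy sketch of what `ToTensor` followed by `Normalize` compute, assuming the image has already been converted to RGB, resized, and held as a `(H, W, 3)` uint8 array. The helper `manual_preprocess` and the tiny test image are hypothetical, not part of torchvision:

```python
import numpy as np

# ImageNet per-channel statistics (R, G, B), the same values used in the pipeline below
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def manual_preprocess(rgb_uint8):
    """Hypothetical helper: (H, W, 3) uint8 array -> (3, H, W) float32 array."""
    # step 3: scale 0-255 integers to 0-1 floats
    scaled = rgb_uint8.astype(np.float32) / 255.0
    # step 4: move channels first, (H, W, 3) -> (3, H, W), the layout ToTensor produces
    chw = scaled.transpose(2, 0, 1)
    # step 5: per-channel standardization, (x - mean) / std
    return (chw - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

# tiny 2x2 test "image": top-left pixel pure white, the rest black
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = [255, 255, 255]
out = manual_preprocess(img)
print(out.shape)     # (3, 2, 2)
print(out[:, 0, 0])  # white pixel, one value per channel
print(out[:, 1, 1])  # black pixel, one value per channel
```

A white pixel ends up at roughly 2.25, 2.43, 2.64 (R, G, B) and a black pixel at roughly -2.12, -2.04, -1.80, so normalized values typically fall in about [-2.2, 2.7] rather than [0, 1].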

In [77]:
# function to take an image object and return a tensor suitable for model input
def image_to_tensor(image):

    # convert to RGB
    image = image.convert('RGB')
    
    # rest of preprocessing pipeline
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)), # resize to 224x224 pixels
        transforms.ToTensor(), # convert to tensor (automatically scales pixel values to 0-1)
        transforms.Normalize(mean=[0.485, 0.456, 0.406], # normalize with ImageNet RGB distribution
                             std=[0.229, 0.224, 0.225]) 
    ])

    # apply preprocessing
    image_tensor = preprocess(image)

    return image_tensor

In [78]:
# test with one image
alex1_image = Image.open('Data/Alex/Alex-Image01.png')
alex1_tensor = image_to_tensor(alex1_image)
alex1_tensor.shape  # should be torch.Size([3, 224, 224])

torch.Size([3, 224, 224])

In [79]:
alex1_tensor

tensor([[[ 0.3652,  0.3309,  0.3481,  ...,  0.3652,  0.3309,  0.3481],
         [ 0.2796,  0.2796,  0.3652,  ...,  0.4337,  0.3823,  0.3652],
         [ 0.3994,  0.3994,  0.4337,  ...,  0.4851,  0.4508,  0.4851],
         ...,
         [-0.2856, -0.3369, -0.3198,  ...,  0.1768,  0.1254,  0.1426],
         [-0.3541, -0.3883, -0.4054,  ...,  0.0227,  0.0227,  0.0912],
         [-0.4911, -0.4397, -0.3027,  ..., -0.1143,  0.0741,  0.2282]],

        [[ 0.2752,  0.2577,  0.2577,  ...,  0.2927,  0.2752,  0.2927],
         [ 0.2227,  0.2227,  0.2752,  ...,  0.3803,  0.3102,  0.3102],
         [ 0.3277,  0.3277,  0.3452,  ...,  0.4153,  0.3803,  0.4153],
         ...,
         [-0.4426, -0.4951, -0.4601,  ...,  0.1176,  0.0826,  0.0826],
         [-0.4776, -0.5126, -0.5126,  ..., -0.0399, -0.0224,  0.0126],
         [-0.6001, -0.5476, -0.4251,  ..., -0.1275,  0.0476,  0.1702]],

        [[ 0.5311,  0.5136,  0.5136,  ...,  0.6356,  0.6182,  0.6182],
         [ 0.5659,  0.5659,  0.6008,  ...,  0
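The printed values are easier to read once step 5 is inverted: multiplying by the channel std and adding back the mean gives the 0-1 value, and multiplying by 255 gives the pixel intensity. A small worked check on the top-left value of each channel in the output above, assuming those values are rounded to four decimals (`denormalize` is a hypothetical helper):

```python
# ImageNet statistics used by Normalize, per channel (R, G, B)
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def denormalize(value, channel):
    """Hypothetical helper that inverts Normalize for one value: (x_norm * std + mean) * 255."""
    return (value * STD[channel] + MEAN[channel]) * 255

# top-left entry of the R, G and B channels in the alex1_tensor output above
top_left = (0.3652, 0.2752, 0.5311)
rgb = tuple(round(denormalize(v, c)) for c, v in enumerate(top_left))
print(rgb)  # (145, 132, 134)
```

So after resizing, the top-left pixel of `Alex-Image01.png` is roughly RGB (145, 132, 134). A negative normalized value just means that pixel is darker than the ImageNet mean for its channel.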

## Processing All Images

*Still mostly manual*

**Input:** All raw Alex and Kelly images

**Steps:**
1. Create a tensor for each image in each folder
2. Assign a true label (Alex=0, Kelly=1) based on the folder the image came from
3. Stack the image tensors together
    - 4D tensor (N x 3 x Height x Width)
    - N = number of images (485 in this dataset)
4. Stack the labels together
    - 1D tensor of length N

**Output:** 
- X: a stack of image tensors of shape (N, 3, 224, 224)
    - X[i] = ith processed image
- y: a stack of labels of shape (N,)
    - y[i] = label for image i
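These shapes follow from how stacking works: stacking N arrays of shape (3, 224, 224) adds a new leading axis of length N, and `torch.stack` behaves like NumPy's `np.stack` in this respect. A minimal NumPy sketch using three dummy images (random values, not real data) and made-up labels:

```python
import numpy as np

rng = np.random.default_rng(0)

# three dummy "processed images", each (3, 224, 224) like image_to_tensor's output
fake_images = [rng.standard_normal((3, 224, 224)).astype(np.float32) for _ in range(3)]
fake_labels = [0, 0, 1]  # Alex, Alex, Kelly

X = np.stack(fake_images)  # new axis 0 -> (N, 3, 224, 224)
y = np.array(fake_labels)  # -> (N,)

print(X.shape, y.shape)  # (3, 3, 224, 224) (3,)
```

Because images and labels are appended in the same order, `X[i]` and `y[i]` always describe the same image.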

In [80]:
# function to take many image objects and return a full 4D tensor
def all_images_to_tensor(images):
    tensor_list = [image_to_tensor(img) for img in images]
    batch_tensor = torch.stack(tensor_list)
    return batch_tensor

In [81]:
# test on a few images
images = [
    Image.open('Data/Alex/Alex-Image01.png'),
    Image.open('Data/Alex/Alex-Image02.png'),
    Image.open('Data/Alex/Alex-Image03.png')
]
batch_tensor = all_images_to_tensor(images)
batch_tensor.shape  # should be torch.Size([3, 3, 224, 224])

torch.Size([3, 3, 224, 224])

Now go through the Alex and Kelly folders to create one full tensor while assigning true labels:

In [82]:
images = []
labels = []

# loop through Alex images (sorted so the order is reproducible)
for filename in sorted(os.listdir('Data/Alex')):
    img = Image.open(os.path.join('Data/Alex', filename))
    images.append(img)
    labels.append(0) # Alex = 0

# loop through Kelly images
for filename in sorted(os.listdir('Data/Kelly')):
    img = Image.open(os.path.join('Data/Kelly', filename))
    images.append(img)
    labels.append(1) # Kelly = 1

# convert all images to a single 4D tensor (X) and the labels to a 1D tensor (y)
full_tensor = all_images_to_tensor(images)
label_tensor = torch.tensor(labels)
full_tensor.shape  # should be torch.Size([total_images, 3, 224, 224])

torch.Size([485, 3, 224, 224])

## Full PyTorch Transformation

You can also let PyTorch do the whole thing. Advantages:
- Much shorter code: `ImageFolder` handles loading, RGB conversion, and labeling by folder, and `DataLoader` handles shuffling and batching
- The resulting dataset is easy to work with: indexing it returns an `(image_tensor, label)` pair, and it exposes the class names and the class-to-index mapping
- `ImageFolder` only loads an image from disk when it is requested, so the full 4D tensor never has to sit in memory
    - not a big issue with only 485 images, but still nice

So use this approach for future modeling; the manual steps above are still useful for showing what actually happens, or if we need specific control over one step.
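The on-demand loading comes from the map-style dataset pattern: the dataset keeps only file paths and labels, and reads an image inside `__getitem__` when an index is requested. A minimal pure-Python sketch of that pattern, where the class `LazyImageDataset`, the stand-in loader, and the file paths are all hypothetical (a real version would call `Image.open` plus a transform and subclass `torch.utils.data.Dataset`):

```python
class LazyImageDataset:
    """Hypothetical map-style dataset: stores (path, label) pairs, loads on access."""

    def __init__(self, samples, loader, transform=None):
        self.samples = samples      # list of (path, label) pairs; no pixels held in memory
        self.loader = loader        # function path -> image, only called in __getitem__
        self.transform = transform  # optional function image -> tensor

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        image = self.loader(path)  # the file is only read here
        if self.transform is not None:
            image = self.transform(image)
        return image, label

# stand-in loader that records which paths were actually read
loaded = []

def fake_loader(path):
    loaded.append(path)
    return f"pixels of {path}"

ds = LazyImageDataset([("Alex/a.png", 0), ("Kelly/b.png", 1)], fake_loader)
print(len(ds), loaded)  # 2 []  (nothing read yet)
image, label = ds[1]
print(label, loaded)    # 1 ['Kelly/b.png']
```

Memory use then scales with the batch being processed rather than with the whole dataset.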

In [83]:
# define transformation pipeline
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

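# have PyTorch handle the rest
# (ImageFolder's default loader already converts each image to RGB, and each subfolder of Data becomes a class)
dataset = datasets.ImageFolder(root='Data', transform=transform)
dataloader = DataLoader(dataset, shuffle=True)  # batch_size defaults to 1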
len(dataset) # should be the total number of images in both folders
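With `shuffle=True`, each pass (epoch) over the `DataLoader` visits every index exactly once in a new random order, grouped into batches of `batch_size` (1 when not given, as in the cell above). A simplified stdlib sketch of that index logic, assuming the default `drop_last=False`; it is not the DataLoader's actual implementation, and `shuffled_batches` is a hypothetical helper:

```python
import random

def shuffled_batches(n_samples, batch_size=1, seed=None):
    """Hypothetical helper: yield lists of indices for one shuffled pass over the data."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # a fresh order each epoch
    for start in range(0, n_samples, batch_size):
        # the last batch is smaller when n_samples is not a multiple of batch_size
        yield indices[start:start + batch_size]

batches = list(shuffled_batches(485, batch_size=32, seed=0))
print(len(batches), len(batches[-1]))  # 16 5
```

The real `DataLoader` then stacks the sampled items of each batch into a (batch_size, 3, 224, 224) image tensor and a (batch_size,) label tensor, the same layout as X and y above.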

Some things you can do with `dataset`:

- Look at the classes (should be the names of the folders, Alex and Kelly)

In [92]:
dataset.classes

['Alex', 'Kelly']

- The mapping from class name to index:

In [93]:
dataset.class_to_idx

{'Alex': 0, 'Kelly': 1}

- Extract the tensor and label from a single image:

In [101]:
tensor, label = dataset[0]
print(f"Tensor shape: {tensor.shape}") # should be torch.Size([3, 224, 224])
print(f"Label: {label}") # should be 0 or 1 depending on the image

Tensor shape: torch.Size([3, 224, 224])
Label: 0
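`ImageFolder` derives the `class_to_idx` mapping shown above from the sorted subfolder names under `root`, which is why it matches the manual Alex=0, Kelly=1 convention. It also means any extra subfolder inside `Data` would become a class and could shift the indices. A stdlib-only sketch of that idea, using a temporary directory in place of `Data` (the helper `folder_class_to_idx` and the `Archive` folder are hypothetical, not torchvision internals):

```python
import os
import tempfile

def folder_class_to_idx(root):
    """Hypothetical helper: map each subfolder name under root to an index, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: i for i, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as root:
    # create the folders out of alphabetical order; sorting decides the index, not creation order
    for name in ["Kelly", "Alex"]:
        os.makedirs(os.path.join(root, name))
    mapping = folder_class_to_idx(root)
    print(mapping)  # {'Alex': 0, 'Kelly': 1}

    # an extra subfolder also becomes a class and shifts Kelly's index
    os.makedirs(os.path.join(root, "Archive"))
    mapping_extra = folder_class_to_idx(root)
    print(mapping_extra)  # {'Alex': 0, 'Archive': 1, 'Kelly': 2}
```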
