# Overview

In this project, we perform an image segmentation task using the [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/). We ignore the specific pet categories for now. The goal is to build a model that labels each pixel as either `foreground` (pet) or `background`.
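
To make the foreground/background target concrete, here is a minimal sketch of turning a mask into a binary label map. It assumes the masks are the dataset's trimaps, with pixel values 1 (foreground), 2 (background), and 3 (not classified, typically the pet's outline); please verify this against the README shipped with your copy of the annotations. The helper name `trimap_to_binary` and the `border_as_foreground` switch are hypothetical, and whether to count the outline as pet is a design choice rather than something the dataset prescribes.

```python
import numpy as np

# Assumed trimap labels (check the dataset's annotation README).
FOREGROUND, BACKGROUND, UNCLASSIFIED = 1, 2, 3


def trimap_to_binary(trimap: np.ndarray, border_as_foreground: bool = True) -> np.ndarray:
    """Map a trimap (values 1/2/3) to a binary mask: 1 = pet, 0 = background."""
    mask = trimap == FOREGROUND
    if border_as_foreground:
        # Treat the unclassified outline as part of the pet (an assumption).
        mask = mask | (trimap == UNCLASSIFIED)
    return mask.astype(np.uint8)


# Toy 4x4 trimap: one foreground pixel surrounded by an unclassified outline.
toy = np.array(
    [
        [2, 2, 2, 2],
        [2, 3, 3, 2],
        [2, 3, 1, 3],
        [2, 2, 3, 2],
    ],
    dtype=np.uint8,
)

binary = trimap_to_binary(toy)
print(binary)
```

Setting `border_as_foreground=False` keeps only the pixels labelled as foreground, which gives a tighter but slightly smaller pet region.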

# Import modules

In [83]:
import os, glob
import shutil
import random
import numpy as np
import matplotlib.pyplot as plt
import albumentations as A
import pytorch_lightning as pl
import torch
import torch.nn.functional as F
import torch.optim as optim

from PIL import Image
from clearml import Logger, Task
from datetime import datetime
from albumentations.pytorch import ToTensorV2
from simple_parsing import ArgumentParser
from torch.utils.data import Subset, Dataset, DataLoader

random.seed(42)
torch.manual_seed(42)

<torch._C.Generator at 0x2892dba70>

# Utility functions

In [84]:
# Function to check corrupted images

def test_files(image_names: list, images_root: str) -> list:
    """Return the names of files that are not images or that fail verification."""
    corrupted = []
    for filepath in image_names:
        full_path = os.path.join(images_root, filepath)
        filename = os.path.basename(full_path)
        if filename.endswith((".jpg", ".png")):  # endswith accepts a tuple of suffixes
            try:
                img = Image.open(full_path)  # open the file under images_root
                img.verify()  # verify that it is, in fact, an image
            except (IOError, SyntaxError):
                print("Bad file: ", filename)  # print the names of corrupt files
                corrupted.append(filename)
        else:
            print("Not an image: ", filename)  # report files that are not images
            corrupted.append(filename)

    return corrupted

# Data Preprocessing

We downloaded the dataset and removed the unnecessary files, keeping only the `images` and `masks` directories.

In [85]:
# Define paths

DATA_ROOT = "./data"
images_dir = os.path.join(DATA_ROOT, "images")
masks_dir = os.path.join(DATA_ROOT, "masks")

## Check images and masks

The total number of `images` and `masks` should be the same.

In [86]:
# Check total number of images and masks

images_paths = sorted(os.listdir(images_dir))
masks_paths = sorted(os.listdir(masks_dir))
print(f"Total images: {len(images_paths)}")
print(f"Total masks: {len(masks_paths)}")

Total images: 7393
Total masks: 7390


The numbers of images and masks do not match. Either some masks are missing, or some files in the images directory are not images.
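
One way to narrow down the cause of such a mismatch is to count the file extensions and compare filename stems between the two directories. The sketch below assumes images end in `.jpg` and masks end in `.png`, and that an image and its mask share the same stem; the helper name `summarize_mismatch` is hypothetical, and the filenames are toy data rather than files from the dataset.

```python
import os
from collections import Counter


def summarize_mismatch(image_names, mask_names):
    """Report image extensions and stems that appear on only one side."""
    # Count how many files of each extension sit in the images directory.
    image_exts = Counter(os.path.splitext(name)[1].lower() for name in image_names)
    # Compare stems, considering only .jpg images and .png masks (assumed formats).
    image_stems = {
        os.path.splitext(name)[0] for name in image_names if name.lower().endswith(".jpg")
    }
    mask_stems = {
        os.path.splitext(name)[0] for name in mask_names if name.lower().endswith(".png")
    }
    return {
        "image_extensions": dict(image_exts),
        "images_without_mask": sorted(image_stems - mask_stems),
        "masks_without_image": sorted(mask_stems - image_stems),
    }


# Toy filenames for illustration only.
report = summarize_mismatch(
    ["cat_1.jpg", "cat_2.jpg", "cat_3.mat", "dog_1.jpg"],
    ["cat_1.png", "cat_2.png"],
)
print(report)
```

With the real directory listings, the same call would be `summarize_mismatch(images_paths, masks_paths)`.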

In [88]:
# Check for corrupted images and masks
corrupted_images = test_files(images_paths, images_dir)
# corrupted_masks = test_files(masks_paths, masks_dir)

Not an image:  Abyssinian_100.mat
Not an image:  Abyssinian_101.mat
Not an image:  Abyssinian_102.mat


We can create new lists that keep only the image files with a matching mask, along with the corresponding mask filenames.

In [75]:
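mask_names = set(masks_paths)  # a set gives fast membership checks
valid_paths = [img_path for img_path in images_paths if os.path.exists(os.path.join(images_dir, img_path)) and img_path[:-3] + "png" in mask_names]
valid_mask_paths = [img_path[:-3] + "png" for img_path in valid_paths]  # mask name: image name with a "png" extension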

In [76]:
print(f"Total valid images: {len(valid_paths)}")
print(f"Total valid masks: {len(valid_mask_paths)}")

Total valid images: 7393
Total valid masks: 7393


In [40]:
# Check if there are any corrupted images or masks

corrupted_images = test_files(valid_paths, images_dir)
corrupted_masks = test_files(valid_mask_paths, masks_dir)