This is a data processing script. It converts each image file into a 3D tensor and returns its expression label as an int between 0 and 10. The resulting dataset can be fed into a PyTorch DataLoader for shuffling and batching. For now it processes a small subset of the data, chosen with a hand-picked list of folders from the manually annotated images. The inFolder argument can be dropped when using all of the data.

In [191]:
from __future__ import print_function, division
import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
from PIL import Image

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

plt.ion()   # interactive mode


This reads in the CSV and splits each file path to get the folder of every image, using a pandas DataFrame. It is needed only when building a small dataset from specific folders; skip it when using the full dataset listed in the CSV. Because we are working with the manually annotated data only, this step is necessary here.

In [183]:
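training_sheet = pd.read_csv('training.csv')
# Each path has the form "<folder>/<file>"; split it into its two parts
training_sheet_split = pd.DataFrame(training_sheet.subDirectory_filePath.str.split("/").tolist(), columns=['folder', 'subpath'])
folders = list(map(int, training_sheet_split.folder))
# Hand-picked subset of folders to include
folder_list = [1, 10, 100, 102, 103] + list(range(1000, 1030))
# Boolean mask over the csv rows: True where the row's folder is in folder_list
inFolder = np.isin(folders, folder_list)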
#print(np.where(inFolder)[0])
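
As a rough illustration of how the mask is built, the sketch below applies the same split-and-filter steps to a small made-up table. The file paths and the `wanted` folder list are invented for the example; only the column name `subDirectory_filePath` comes from the CSV used above.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for training.csv (paths are hypothetical)
toy_sheet = pd.DataFrame({
    "subDirectory_filePath": ["1/a.jpg", "2/b.jpg", "10/c.jpg", "1000/d.jpg", "7/e.jpg"],
})

# Take the folder part of each "<folder>/<file>" path and convert it to int
toy_folders = toy_sheet["subDirectory_filePath"].str.split("/").str[0].astype(int)

# Hypothetical subset of folders to keep
wanted = [1, 10] + list(range(1000, 1030))

# Boolean mask over rows, and the row positions where it is True
toy_mask = np.isin(toy_folders, wanted)
toy_rows = np.where(toy_mask)[0]

print(toy_mask.tolist())
print(toy_rows.tolist())
```

The row positions in `toy_rows` play the same role as the list the dataset class uses to translate a dataset index into a CSV row.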

This creates a dataset class that takes the CSV file, the transforms, and a mask of the image folders to include. The length function returns the number of selected images, and getitem retrieves an image as a 3D tensor together with its expression label as an int. Indexing the dataset object returns the two as a tuple.

In [192]:
class FaceDataset(Dataset):
    """Face dataset."""

    def __init__(self, csv_file, root_dir, transform=None, inFolder=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
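            transform (callable, optional): Optional transform to be applied
                on a sample.
            inFolder (array of bool, optional): Mask over the csv rows that
                selects which images to include. Defaults to all rows.
        """
        self.training_sheet = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform
        # With no mask given, include every row of the csv
        if inFolder is None:
            inFolder = np.full((len(self.training_sheet),), True)
        self.inFolder = np.asarray(inFolder, dtype=bool)

        # Row positions in the csv that belong to the selected folders
        self.loc_list = np.where(self.inFolder)[0]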
        

    def __len__(self):
        return len(self.loc_list)

    def __getitem__(self, idx):
        idx = self.loc_list[idx]  # map the dataset index to its row in the csv
        emotion = self.training_sheet.iloc[idx, 6]  # expression label (column 6)
        img_name = os.path.join(self.root_dir,
                                self.training_sheet.iloc[idx, 0])
        
        image = Image.open(img_name)
        sample = image
        
        if self.transform:
            sample = self.transform(sample)

        return sample, emotion
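
The indexing scheme can be seen in isolation with a stripped-down stand-in that needs neither images nor a CSV. `MaskedRows` is a hypothetical class written for this note, not part of the notebook; it keeps only the mask-to-row-position logic that the length and getitem functions rely on.

```python
import numpy as np


class MaskedRows:
    """Hypothetical stand-in: exposes only the rows selected by a boolean mask."""

    def __init__(self, rows, mask=None):
        self.rows = rows
        # No mask means every row is included
        if mask is None:
            mask = np.full((len(rows),), True)
        # Positions of the selected rows in the full table
        self.loc_list = np.where(mask)[0]

    def __len__(self):
        return len(self.loc_list)

    def __getitem__(self, idx):
        # Translate the dataset index into a position in the full table
        return self.rows[self.loc_list[idx]]


rows = ["r0", "r1", "r2", "r3"]
subset = MaskedRows(rows, mask=np.array([False, True, False, True]))
everything = MaskedRows(rows)

print(len(subset), subset[0], subset[1])
print(len(everything))
```

Index 0 of `subset` maps to the first selected row rather than the first row of the table, which is what lets a DataLoader treat the subset as a contiguous dataset.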

This loads the dataset with resizing, random cropping, and conversion to a tensor. More information can be found in the PyTorch data loading tutorial:
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html

In [186]:
face_dataset = FaceDataset(
    csv_file='training.csv',
    root_dir='Manually_Annotated_Images',
    transform=transforms.Compose([
        transforms.Resize(256), transforms.RandomCrop(size=128), transforms.ToTensor()
    ]),
    inFolder=inFolder)

In [190]:
# Testing

im, y = face_dataset[0]
im.shape

torch.Size([3, 128, 128])

In [None]:
dataloader = DataLoader(face_dataset, batch_size=4,
                        shuffle=True, num_workers=4)
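
To see what a loader like this yields, here is a small sketch that iterates over batches. It uses random tensors in a `TensorDataset` as a stand-in for the face dataset, so the images, labels, and the names `toy_dataset` and `toy_loader` are all made up for the example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten random "images" of shape (3, 128, 128) with made-up integer labels
images = torch.rand(10, 3, 128, 128)
labels = torch.arange(10) % 3
toy_dataset = TensorDataset(images, labels)

# num_workers=0 keeps loading in the main process for this small example
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = []
for batch_images, batch_labels in toy_loader:
    batch_shapes.append((tuple(batch_images.shape), tuple(batch_labels.shape)))

print(batch_shapes)
```

With ten samples and a batch size of 4, the last batch holds only two samples; passing `drop_last=True` to `DataLoader` would discard it instead.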