# Explore the Yale cropped faces dataset.
Stough, DIP

Here I'm exploring a part of the [Yale Extended Face Dataset B](http://vision.ucsd.edu/~iskwak/ExtYaleDatabase/ExtYaleB.html). This CroppedYale dataset consists of ~65 cropped face images for each of 38 individuals. The images are taken under different lighting conditions, but the face orientation stays the same. 

The dataset is organized very conveniently for PyTorch: a root directory whose subdirectories each represent one class and contain the images of that class. This is exactly the layout that torchvision's [ImageFolder](https://pytorch.org/docs/stable/torchvision/datasets.html#torchvision.datasets.ImageFolder) Dataset expects.
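
To make that layout concrete, here is a minimal sketch that builds a tiny throwaway directory tree and derives a name-to-index mapping the way `ImageFolder` is documented to: subdirectory names, sorted, then numbered from zero. The helper `find_classes_sketch` and the folder names are illustrative assumptions, not torchvision code or the real CroppedYale folder names.

```python
import os
import tempfile

def find_classes_sketch(root):
    """Map each subdirectory name under root to an integer index (sorted order)."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx

# Build a throwaway tree shaped like root/<class_name>/<image files>.
with tempfile.TemporaryDirectory() as root:
    for name in ["subjectB", "subjectA", "subjectC"]:  # hypothetical class folders
        os.makedirs(os.path.join(root, name))
        # Empty placeholder file standing in for an image.
        open(os.path.join(root, name, "face_000.pgm"), "w").close()
    classes, class_to_idx = find_classes_sketch(root)

print(classes)
print(class_to_idx)
```

The order in which the folders were created does not matter; the indices follow the sorted names.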

In [None]:
%matplotlib widget
# or inline
import matplotlib.pyplot as plt

import numpy as np
from random import shuffle
import os


# from keras.datasets import mnist
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from torchvision.utils import make_grid

from torchvision.datasets import ImageFolder
import torch

# For timing.
import time
tic, toc = (time.time, time.time)

&nbsp;

## Let's start by just getting the data, seeing what it is.
As a Dataset, it's indexable with `__getitem__`. Indexing it returns an (input, target) tuple, which is the pair you train on. Further, `.classes` lists the folder names, and `.class_to_idx` maps each folder name to the target index that appears in those tuples.

In [None]:
yaleData = ImageFolder('/home/dip365/data/CroppedYale/')
yaleData

In [None]:
yaleData[1000]

In [None]:
yaleData.class_to_idx

&nbsp;

## Now, let's load the same dataset, BUT
with a composition of transforms that gives us tensor data that our torch models can deal with. We'll turn it into grayscale, make sure all the images are the same size, and convert to Tensor, all through [built-in transforms](https://pytorch.org/docs/stable/torchvision/transforms.html).

In [None]:
yaleData = ImageFolder('/home/dip365/data/CroppedYale/', 
                       transform=transforms.Compose([
                           transforms.Grayscale(),
                           transforms.Resize((192,168), interpolation=transforms.InterpolationMode.BILINEAR),
                           transforms.ToTensor()
                       ]))
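
As a rough picture of what such a pipeline does, here is a small NumPy sketch: a composition simply calls each step in order, and the final step turns an 8-bit height-by-width image into a channel-first float array in [0, 1]. Everything here (`ComposeSketch`, `to_gray`, `to_float_chw`, the random stand-in image) is hypothetical illustration, not torchvision's implementation. In particular, the grayscale step uses a plain channel average as a simplification, and resizing is left out.

```python
import numpy as np

class ComposeSketch:
    """Apply a list of callables left to right, like a transform pipeline."""
    def __init__(self, steps):
        self.steps = steps

    def __call__(self, x):
        for step in self.steps:
            x = step(x)
        return x

def to_gray(rgb):
    """(H, W, 3) uint8 -> (H, W) uint8 by plain channel averaging (a simplification)."""
    return rgb.mean(axis=2).round().astype(np.uint8)

def to_float_chw(gray):
    """(H, W) uint8 in [0, 255] -> (1, H, W) float32 in [0, 1]."""
    return (gray.astype(np.float32) / 255.0)[np.newaxis, :, :]

pipeline = ComposeSketch([to_gray, to_float_chw])

# Random stand-in for one 192x168 RGB image.
fake_image = np.random.default_rng(0).integers(0, 256, size=(192, 168, 3), dtype=np.uint8)
out = pipeline(fake_image)
print(out.shape, out.dtype)
```

The leading channel axis is why a single transformed face has three dimensions even though it is grayscale.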

In [None]:
samples = torch.stack([yaleData[i][0] 
                       for i in np.random.choice(len(yaleData), 64)])
plt.figure(figsize=(5,5))
plt.imshow(make_grid(samples, nrow=8, pad_value=1.0).permute(1,2,0))
plt.tight_layout()
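
The `permute(1,2,0)` is there because the grid comes back channel-first, while `imshow` wants channels last. The NumPy sketch below tiles a batch in roughly the way I understand `make_grid` to do it, assuming a default padding of 2 pixels; `grid_sketch` is a hypothetical helper, not the library function. Note that `nrow` is the number of images per row, not the number of rows. torchvision's version also expands single-channel images to three channels, which this sketch does not.

```python
import numpy as np

def grid_sketch(batch, nrow=8, padding=2, pad_value=1.0):
    """Tile a (N, C, H, W) batch into one (C, H_grid, W_grid) image."""
    n, c, h, w = batch.shape
    ncol = min(nrow, n)                # images per row
    nrows = int(np.ceil(n / ncol))     # number of rows needed
    grid = np.full((c, nrows * (h + padding) + padding,
                    ncol * (w + padding) + padding),
                   pad_value, dtype=batch.dtype)
    for k in range(n):
        r, col = divmod(k, ncol)
        top = padding + r * (h + padding)
        left = padding + col * (w + padding)
        grid[:, top:top + h, left:left + w] = batch[k]
    return grid

# 64 blank stand-in faces, each 1 x 192 x 168.
batch = np.zeros((64, 1, 192, 168), dtype=np.float32)
grid = grid_sketch(batch)
hwc = np.transpose(grid, (1, 2, 0))   # channel-last layout for plotting
print(grid.shape, hwc.shape)
```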

&nbsp;

## We can even do some [data augmentation](https://nanonets.com/blog/data-augmentation-how-to-use-deep-learning-when-you-have-limited-data-part-2/) 
through the torch transforms available. Here we'll use [ColorJitter](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.ColorJitter) to randomly perturb the brightness and contrast in an image each time we load it from our Dataset. Notice in the below I load the very same image 64 times, but it looks a little different each time. 

In [None]:
yaleData = ImageFolder('/home/dip365/data/CroppedYale/', 
                       transform=transforms.Compose([
                           transforms.Grayscale(),
                           transforms.Resize((192,168), interpolation=transforms.InterpolationMode.BILINEAR),
                           transforms.ColorJitter(brightness=.5, contrast=.3), # Random recolorization every load.
                           transforms.ToTensor()
                       ]))

In [None]:
# Load the same face (index 100) 64 times; ColorJitter perturbs each load differently.
samples = torch.stack([yaleData[100][0] 
                       for i in range(64)])
plt.figure(figsize=(5,5))
plt.imshow(make_grid(samples, nrow=8, pad_value=1.0).permute(1,2,0))
plt.tight_layout()
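
To get a feel for what the jitter is doing numerically, here is a NumPy sketch under a few assumptions: each factor is drawn uniformly from `[max(0, 1 - value), 1 + value]`, brightness scales the intensities, and contrast stretches them about the image mean. `jitter_sketch` is a hypothetical helper and only approximates ColorJitter; for one thing it always applies brightness first, whereas the real transform randomizes the order.

```python
import numpy as np

def jitter_sketch(img, brightness=0.5, contrast=0.3, rng=None):
    """Randomly scale brightness and contrast of a float image in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    b = rng.uniform(max(0.0, 1.0 - brightness), 1.0 + brightness)
    c = rng.uniform(max(0.0, 1.0 - contrast), 1.0 + contrast)
    out = np.clip(img * b, 0.0, 1.0)                   # brightness: scale intensities
    mean = out.mean()
    out = np.clip((out - mean) * c + mean, 0.0, 1.0)   # contrast: stretch about the mean
    return out

rng = np.random.default_rng(0)
face = rng.random((192, 168))   # random stand-in for one grayscale face
copies = [jitter_sketch(face, rng=rng) for _ in range(4)]
print([round(float(cp.mean()), 3) for cp in copies])
```

Every copy comes from the same input, yet each gets its own random factors, which is the effect visible in the grid above.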

&nbsp;

## We can also randomly split the data into 
as many subsets (like training and testing) as we like using torch's [random_split](https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split) utility. It doesn't support stratified splits though, so we likely won't have equal proportions of the classes/subjects across splits.

Here we'll do a 90%-10% split, then look at the distribution of classes/subjects in the test set.

In [None]:
# Compute the split sizes so they always sum to len(yaleData), whatever the rounding.
n_train = int(round(.9 * len(yaleData)))
trainFaces, testFaces = torch.utils.data.random_split(yaleData, [n_train, len(yaleData) - n_train])

In [None]:
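import pandas
# Read labels from the stored targets rather than loading and transforming every test image.
testlabels = pandas.Series([yaleData.classes[yaleData.targets[i]] for i in testFaces.indices])
testlabels.value_counts()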
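
If equal class proportions matter, one option is to build the index lists by hand. The sketch below is a hypothetical helper (`stratified_split_sketch`, not part of torch) that shuffles the indices within each class and sends about the same fraction of every class to the test set. It runs on toy labels here, which stand in for the real targets.

```python
import random
from collections import defaultdict

def stratified_split_sketch(labels, test_fraction=0.1, seed=0):
    """Return (train_indices, test_indices) with ~test_fraction of each class in test."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)                       # random order within the class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Toy labels: 3 classes with 20 samples each.
labels = [c for c in range(3) for _ in range(20)]
train_idx, test_idx = stratified_split_sketch(labels, test_fraction=0.1)
print(len(train_idx), len(test_idx))
```

With the real data, `labels` could be `yaleData.targets`, and the two index lists could be wrapped as `torch.utils.data.Subset(yaleData, train_idx)` and `torch.utils.data.Subset(yaleData, test_idx)`.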

&nbsp;

## Lastly, here I take a quick look at the Extended 
(multiple face orientations and lighting conditions) dataset. This is much larger, with some 585 images for each of the 38 subjects. Each image is also larger, at 480x640. The cell below assumes you've made a symbolic link (`ln -s`) connecting your local `data/ExtendedYaleB` to `~dip365/data/ExtendedYaleB/`.

In [None]:
extData = ImageFolder('/home/dip365/data/ExtendedYaleB/', # or I guess, '~dip379/data/ExtendedYaleB/'
                       transform=transforms.Compose([
                           transforms.Grayscale(),
                           transforms.Resize((480,640)),
                           transforms.ToTensor()
                       ]))

In [None]:
samples = torch.stack([extData[i][0] 
                       for i in np.random.choice(len(extData), 64, replace=False)])
plt.figure(figsize=(6,6))
plt.imshow(make_grid(samples, nrow=8, padding=4,pad_value=1.0).permute(1,2,0))
plt.tight_layout()