## Loading image data

The easiest way to load image data is with `datasets.ImageFolder` from torchvision.
Example:
```python
# transform = transforms.Compose(...)
dataset = datasets.ImageFolder('path/to/data', transform=transform)
```
Each class should have its own directory, like so:
* root/dog/123.png
* root/cat/456.png
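
As a rough illustration of this layout, the sketch below builds a throwaway directory tree and derives a label mapping using the convention `ImageFolder` documents: class names are the sorted sub-directory names, indexed from 0. The helper name `find_classes_sketch` and the temporary paths are made up for this example. It uses only the standard library and does not load any images.

```python
import tempfile
from pathlib import Path


def find_classes_sketch(root):
    """Return (classes, class_to_idx) from the sub-directories of root.

    Hypothetical helper that mirrors the sorted-directory-name convention;
    it is not torchvision's own implementation.
    """
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


# Build root/dog/123.png and root/cat/456.png as empty placeholder files.
root = Path(tempfile.mkdtemp()) / "root"
for class_name, file_name in [("dog", "123.png"), ("cat", "456.png")]:
    (root / class_name).mkdir(parents=True)
    (root / class_name / file_name).touch()

classes, class_to_idx = find_classes_sketch(root)
print(classes)       # ['cat', 'dog']
print(class_to_idx)  # {'cat': 0, 'dog': 1}
```

On a real dataset the same information should be available as `dataset.classes` and `dataset.class_to_idx` once the `ImageFolder` has been created.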


#### Transforms
We need to define transforms when loading image data. Images may come in different sizes, so we resize and crop them to one standard size for training.
We also convert the images to PyTorch tensors with `transforms.ToTensor()`.
We combine these transforms into a pipeline with `transforms.Compose()`:
```python
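# Named `transform` so the pipeline does not shadow the `transforms` module
transform = transforms.Compose([
    transforms.Resize(255),      # scale the shorter edge to 255 pixels
    transforms.CenterCrop(224),  # keep the central 224x224 square
    transforms.ToTensor()        # convert the PIL image to a float tensor in [0, 1]
])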
```
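
To make the size arithmetic concrete, here is a minimal sketch of what `Resize(255)` followed by `CenterCrop(224)` does to an image's dimensions. It assumes the documented behaviour of passing a single integer (the shorter edge is matched to that number and the aspect ratio is kept). The function `resize_then_center_crop` is a hypothetical helper, and torchvision's exact rounding may differ by a pixel in some cases.

```python
def resize_then_center_crop(width, height, resize_to=255, crop_to=224):
    """Return ((w, h) after resizing, (w, h) after center cropping)."""
    # Resize with a single int scales the shorter edge to `resize_to`
    # and scales the longer edge by the same factor.
    if width <= height:
        resized = (resize_to, int(resize_to * height / width))
    else:
        resized = (int(resize_to * width / height), resize_to)
    # CenterCrop with a single int keeps a central square of that side.
    cropped = (crop_to, crop_to)
    return resized, cropped


for w, h in [(500, 375), (300, 400)]:
    resized, cropped = resize_then_center_crop(w, h)
    print((w, h), "->", resized, "->", cropped)
# (500, 375) -> (340, 255) -> (224, 224)
# (300, 400) -> (255, 340) -> (224, 224)
```

Whatever the input size, every image leaves the pipeline as 224x224, which is what lets the images be stacked into batches.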


#### Data Loaders
Once `ImageFolder` has built the dataset, you pass it to a `DataLoader`. The `DataLoader` takes a dataset and returns batches of images and the corresponding labels.
```python
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
```
The dataloader is an iterable, not a list you can index: to get data out of it, either loop over it or convert it to an iterator with `iter()` and call `next()`.
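
A small sketch of both access patterns, using a synthetic `TensorDataset` in place of the image folder so it runs without any files on disk. The tensor sizes, batch size and variable names are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake 3-channel 8x8 "images" with alternating 0/1 labels.
images = torch.zeros(10, 3, 8, 8)
labels = torch.arange(10) % 2
dataset = TensorDataset(images, labels)

dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

# Pattern 1: grab a single batch.
batch_images, batch_labels = next(iter(dataloader))
print(batch_images.shape, batch_labels.shape)

# Pattern 2: loop over every batch in one pass (one epoch).
batch_sizes = [len(y) for _, y in dataloader]
print(batch_sizes)  # [4, 4, 2]; the last batch holds the remainder
```

With `shuffle=True` the order of the samples changes on every pass, but the batch sizes stay the same.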


#### Data Augmentation
It's a good strategy to introduce randomness into the input data. We can randomly rotate, mirror, scale and crop images during training. This helps the network generalize, because it sees the same images at different locations, sizes and orientations.

```python
import torch
from torchvision import datasets, transforms

train_transforms = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.RandomResizedCrop(100),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # We also normalize images by passing a list of means and a list of standard deviations
    transforms.Normalize([0.5], [0.5])
])
```

The color channels are normalized like so:
```
input[channel] = (input[channel] - mean[channel]) / std[channel]
```
NOTE:
* Subtracting `mean` centers the data around zero.
* Dividing by `std` rescales the values; with a mean and std of 0.5, inputs in the range 0 to 1 end up between -1 and 1.
* Normalizing helps keep the network weights near zero, which makes backpropagation more stable. Without normalization, networks often train poorly or fail to learn.
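
A worked example of the per-channel arithmetic in plain Python, assuming pixel values already scaled to the range 0 to 1 (as `ToTensor()` produces). The `normalize` function is just a stand-in for the formula, not torchvision's implementation.

```python
def normalize(value, mean, std):
    """Apply (value - mean) / std to a single pixel value."""
    return (value - mean) / std


pixels = [0.0, 0.25, 0.5, 1.0]

# mean=0.5, std=0.5 maps the range [0, 1] onto [-1, 1].
normalized = [normalize(p, 0.5, 0.5) for p in pixels]
print(normalized)  # [-1.0, -0.5, 0.0, 1.0]

# A different std changes the output range: the result is not always in [-1, 1].
print(normalize(1.0, 0.5, 0.25))  # 2.0
```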


#### Loading the Cats and Dogs images and building a dataloader
We'll use the Cats and Dogs classification data from Kaggle.

```python
import torch
from torchvision import datasets, transforms

data_dir = 'dogs_vs_cats/'

# Random augmentation for training images
train_transforms = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.RandomResizedCrop(100),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()
])

# Deterministic resize and crop for test images
test_transforms = transforms.Compose([
    transforms.Resize(255),
    transforms.CenterCrop(224),
    transforms.ToTensor()
])

train_data = datasets.ImageFolder(data_dir + 'train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + 'test1', transform=test_transforms)

# Shuffle the training data so each batch mixes both classes
trainloader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=32)
```


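```
RuntimeError: Found 0 files in subfolders of: dogs_vs_cats/train
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp
```
This error means `ImageFolder` found no supported image files inside sub-directories of `dogs_vs_cats/train`. Typically this happens when the images sit directly in that folder rather than in one directory per class, or when the path is wrong.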
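
One possible way to get a flat folder of images into the one-directory-per-class layout is to move each file into a folder named after its label. The sketch below assumes the files are named like `cat.0.jpg` and `dog.0.jpg`, with the class name before the first dot. That naming pattern is an assumption, so adjust the parsing if your files differ. `sort_into_class_dirs` is a hypothetical helper, and the demo runs on empty placeholder files in a temporary directory rather than on the real data.

```python
import shutil
import tempfile
from pathlib import Path


def sort_into_class_dirs(flat_dir):
    """Move files named '<class>.<id>.<ext>' into flat_dir/<class>/.

    Returns the number of files moved. The naming pattern is an assumption.
    """
    flat_dir = Path(flat_dir)
    moved = 0
    # Snapshot the listing first so newly created folders are not revisited.
    for path in sorted(flat_dir.iterdir()):
        if not path.is_file():
            continue
        parts = path.name.split(".")
        if len(parts) < 3:
            continue  # does not match '<class>.<id>.<ext>'; leave it alone
        class_dir = flat_dir / parts[0]
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(class_dir / path.name))
        moved += 1
    return moved


# Demo on placeholder files.
demo_dir = Path(tempfile.mkdtemp())
for name in ["cat.0.jpg", "cat.1.jpg", "dog.0.jpg"]:
    (demo_dir / name).touch()

moved = sort_into_class_dirs(demo_dir)
layout = sorted(p.relative_to(demo_dir).as_posix() for p in demo_dir.rglob("*.jpg"))
print(moved)   # 3
print(layout)  # ['cat/cat.0.jpg', 'cat/cat.1.jpg', 'dog/dog.0.jpg']
```

Because the helper moves files in place, it is worth trying it on a copy of the data first.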