# Get the Flowers102 Dataset With Captions

Text-to-Image Synthesis resources:
- [Original repo](https://github1s.com/reedscot/icml2016)
- [Example repo](https://github1s.com/aelnouby/Text-to-Image-Synthesis/tree/master/images)
    - Download Flowers102 [here](https://drive.google.com/file/d/1EgnaTrlHGaqK5CCgHKLclZMT_AMSTyh8/view)

The annotated Flowers102 dataset was first published in this [paper](https://arxiv.org/pdf/1605.05395.pdf).

Follow these steps to download the dataset:

1. Create a `data` directory under the root directory of this repo and enter it: `mkdir data && cd ./data`
2. Go to [this repo](https://github1s.com/aelnouby/Text-to-Image-Synthesis/tree/master/images), read its README.md, then proceed to the [download link](https://drive.google.com/file/d/1EgnaTrlHGaqK5CCgHKLclZMT_AMSTyh8/view)
3. Verify that the file downloaded successfully and is located at `./data/flowers.hdf5` (relative to the repo root)
4. Install [h5py](https://pypi.org/project/h5py/) using `pip install h5py`
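
Step 3 can be checked without loading the whole file. The following is a minimal sketch, assuming the download is a standard HDF5 file whose 8-byte format signature sits at offset 0 (the common case). The helper name `looks_like_hdf5` is hypothetical and not part of this repo.

```python
from pathlib import Path

# The HDF5 format signature: the first 8 bytes of a standard HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` exists and starts with the HDF5 signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


if __name__ == "__main__":
    # Run from the repo root, matching the path in step 3.
    print(looks_like_hdf5("./data/flowers.hdf5"))
```

A `False` result may mean the file is missing, the download was interrupted, or something other than the HDF5 file was saved under that name.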

## 1 Examine Items in flowers.hdf5

In [13]:
import h5py

# 1. Get familiar with the HDF5 structure: each split is a top-level group
#    whose keys name the individual entries.

# Open read-only; the context manager closes the file when the block exits.
with h5py.File('./data/flowers.hdf5', 'r') as file:
    for split in ['train', 'valid', 'test']:
        ds_keys = [str(key) for key in file[split].keys()]
        print(f'The [{split}] set has [{len(ds_keys)}] images, e.g. {ds_keys[:3]} ...')

The [train] set has [29390] images, e.g. ['image_00001_0', 'image_00001_1', 'image_00001_2'] ...
The [valid] set has [5780] images, e.g. ['image_03369_0', 'image_03369_1', 'image_03369_2'] ...
The [test] set has [5775] images, e.g. ['image_03095_0', 'image_03095_1', 'image_03095_2'] ...
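
The key names in this output suggest that each image contributes several entries, distinguished by a trailing index (`image_00001_0`, `image_00001_1`, ...). The sketch below groups entry keys by image ID, assuming that naming pattern holds for every key. `group_by_image` is a hypothetical helper; in the notebook its input would be a key list such as `ds_keys`.

```python
from collections import defaultdict


def group_by_image(keys):
    """Map each image ID to the list of entry indices found for it.

    Assumes keys look like '<image_id>_<index>', e.g. 'image_00001_0'.
    """
    groups = defaultdict(list)
    for key in keys:
        # Split on the last underscore only, so the ID keeps its own underscore.
        image_id, _, index = key.rpartition("_")
        groups[image_id].append(int(index))
    return dict(groups)


# Sample keys in the format printed by the cell above.
sample = ["image_00001_0", "image_00001_1", "image_00001_2", "image_00002_0"]
groups = group_by_image(sample)
print(len(groups), "distinct images")
print(groups["image_00001"])
```

Comparing `len(groups)` with `len(ds_keys)` for a split would show how many entries each image has on average.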


In [3]:
from dataset import Flowers102

# 2. Get familiar with images, labels, captions
#    ref: https://github1s.com/aelnouby/Text-to-Image-Synthesis/blob/master/txt2image_dataset.py#L1-L11

dataset = Flowers102('./data/flowers.hdf5')

# Index the dataset directly instead of calling __getitem__ by hand.
item = dataset[0]
print(f'There are {len(item)} keys in each item: [{item.keys()}]')
item['txt']

There are 5 keys in each item: [dict_keys(['right_images', 'right_embed', 'wrong_images', 'inter_embed', 'txt'])]


'prominent purple stigma,petals are white inc olor\n'
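
To see what each of the five keys holds, it can help to print a compact summary per value. This is a sketch under the assumption that array-like values expose a `.shape` attribute, as NumPy arrays and PyTorch tensors do. `describe_item` is a hypothetical helper, and the stand-in item below is made up for illustration; it is not real dataset content.

```python
def describe_item(item):
    """Summarize each value in a dict-like item as 'TypeName (shape)' or 'TypeName'."""
    summary = {}
    for key, value in item.items():
        type_name = type(value).__name__
        shape = getattr(value, "shape", None)
        summary[key] = f"{type_name} {tuple(shape)}" if shape is not None else type_name
    return summary


class FakeArray:
    """Stand-in for an array with a shape, used only for this illustration."""

    def __init__(self, shape):
        self.shape = shape


# Made-up item: the shape and caption here are placeholders, not dataset values.
fake_item = {"right_images": FakeArray((3, 64, 64)), "txt": "a caption\n"}
for key, description in describe_item(fake_item).items():
    print(f"{key}: {description}")
```

In the notebook, the real call would be `describe_item(dataset[0])`.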

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.utils as vutils
import torchvision.datasets as dset
from torch.utils.data import DataLoader
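import torchvision.transforms as transforms
# torchvision's Flowers102 is reachable as dset.Flowers102; importing it by name
# here would shadow the Flowers102 class from the local dataset module.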

import numpy as np
import matplotlib.pyplot as plt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataroot = "./data/cvpr2016_flowers"
image_size = 64
batch_size = 64
noise_channels = 100
num_workers = 4

# Resize and center-crop to image_size x image_size, then scale pixels to [-1, 1].
transform = transforms.Compose([
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])


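# Alternative: torchvision's built-in loader (images and class labels only, no captions).
# dataset = dset.Flowers102(root=dataroot, download=True, transform=transform)
dataset = dset.ImageFolder(root=dataroot, transform=transform)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)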

for im, lab in dataloader:
    print(lab)
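
`transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` computes `(x - 0.5) / 0.5` per channel, so the [0, 1] pixel values produced by `ToTensor()` end up in [-1, 1]. Before plotting a batch, that mapping has to be inverted. Below is a minimal sketch of the inverse. `denormalize` is a hypothetical helper, shown here on plain floats, though the same arithmetic applies elementwise to a tensor such as `im`.

```python
def denormalize(x, mean=0.5, std=0.5):
    """Invert Normalize(mean, std): map values from [-1, 1] back to [0, 1]."""
    return x * std + mean


# Check the endpoints and midpoint of the normalized range.
for value in (-1.0, 0.0, 1.0):
    print(value, "->", denormalize(value))
```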