<a href="https://colab.research.google.com/github/marcelounb/tensorflow_udacity/blob/master/05_images_of_flowers_exercise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import tensorflow as tf
import matplotlib.pyplot as plt
import os
import numpy as np
import glob
import shutil
import math

In [2]:
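from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten

# Convolution2D is an alias of Conv2D in tf.keras.layers.
from tensorflow.keras.layers import Convolution2D
from tensorflow.keras.layers import MaxPooling2D

# keras.utils.np_utils may be unavailable in recent Keras releases;
# to_categorical covers its usual role.
from tensorflow.keras.utils import to_categorical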

# Data Loading
To build our image classifier, we begin by downloading the flowers dataset. `tf.keras.utils.get_file` downloads the archive and caches it under `~/.keras/datasets` (on Colab, `/root/.keras/datasets`).

Because we pass `extract=True`, the archive is also unpacked right after the download.

In [3]:
URL = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
zip_file = tf.keras.utils.get_file(fname="flower_photos.tgz", origin=URL, extract=True)

Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
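
For reference, the unpacking step that `extract=True` performs can be approximated by hand with the standard library. This is a minimal sketch, not what Keras does internally; `extract_tgz` is a hypothetical helper name, and the archive is assumed to be a gzip-compressed tar file like `flower_photos.tgz`.

```python
import os
import tarfile


def extract_tgz(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive into dest_dir and return dest_dir.

    Hypothetical helper for illustration; only use it on archives you trust.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
    return dest_dir
```

With the notebook's variables, a call might look like `extract_tgz(zip_file, os.path.dirname(zip_file))`, assuming `zip_file` points at the downloaded `.tgz` file.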


In [4]:
base_dir = os.path.join(os.path.dirname(zip_file), 'flower_photos')

In [6]:
base_dir

'/root/.keras/datasets/flower_photos'

The dataset we downloaded contains images of 5 types of flowers:

* Roses
* Daisy
* Dandelion
* Sunflowers
* Tulips
 
So, let's create the labels for these 5 classes:

In [7]:
classes = ['roses', 'daisy', 'dandelion', 'sunflowers', 'tulips']
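
If integer labels are needed later, one way to derive them is from the position of each name in `classes`. The sketch below is illustrative only: `class_to_index` and `index_to_class` are hypothetical names that do not appear in the notebook, and a data loader that infers labels from folder names may assign a different order.

```python
# Class names as defined in the notebook.
classes = ['roses', 'daisy', 'dandelion', 'sunflowers', 'tulips']

# Hypothetical lookup tables: class name -> integer label and back.
class_to_index = {name: idx for idx, name in enumerate(classes)}
index_to_class = {idx: name for name, idx in class_to_index.items()}

print(class_to_index)
```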

In [13]:
!find $base_dir -type d

/root/.keras/datasets/flower_photos
/root/.keras/datasets/flower_photos/sunflowers
/root/.keras/datasets/flower_photos/roses
/root/.keras/datasets/flower_photos/daisy
/root/.keras/datasets/flower_photos/tulips
/root/.keras/datasets/flower_photos/dandelion


In [17]:
num_sunflowers = len(os.listdir(os.path.join(base_dir, 'sunflowers')))
num_roses = len(os.listdir(os.path.join(base_dir, 'roses')))
num_daisy = len(os.listdir(os.path.join(base_dir, 'daisy')))
num_tulips = len(os.listdir(os.path.join(base_dir, 'tulips')))
num_dandelion = len(os.listdir(os.path.join(base_dir, 'dandelion')))
print(f'Directory Sunflowers: {num_sunflowers}')
print(f'Directory Roses: {num_roses}')
print(f'Directory Daisy: {num_daisy}')
print(f'Directory Tulips: {num_tulips}')
print(f'Directory Dandelion: {num_dandelion}')

Directory Sunflowers: 699
Directory Roses: 641
Directory Daisy: 633
Directory Tulips: 799
Directory Dandelion: 898
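
Counting each folder with its own line of code makes copy-paste slips easy, so the same count can be written as a loop over the class names. This is a sketch under the assumption that every class has its own subfolder directly under the dataset root; `count_files_per_class` is a hypothetical helper, not part of the notebook.

```python
import os


def count_files_per_class(root_dir, class_names):
    """Return {class_name: number of entries in root_dir/class_name}.

    Hypothetical helper; folders that do not exist are reported as 0.
    """
    counts = {}
    for name in class_names:
        class_dir = os.path.join(root_dir, name)
        counts[name] = len(os.listdir(class_dir)) if os.path.isdir(class_dir) else 0
    return counts
```

In the notebook this would be called as `count_files_per_class(base_dir, classes)`.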


As you can see, there are no folders containing training and validation data. Therefore, we will have to create our own training and validation sets. Let's write some code that will do this.

The code below creates a train and a val folder each containing 5 folders (one for each type of flower). It then moves the images from the original folders to these new folders such that 80% of the images go to the training set and 20% of the images go into the validation set. In the end our directory will have the following structure:

flower_photos

|__ daisy

|__ dandelion

|__ roses

|__ sunflowers

|__ tulips

|__ train

    |______ daisy: [1.jpg, 2.jpg, 3.jpg ....]

    |______ dandelion: [1.jpg, 2.jpg, 3.jpg ....]

    |______ roses: [1.jpg, 2.jpg, 3.jpg ....]

    |______ sunflowers: [1.jpg, 2.jpg, 3.jpg ....]

    |______ tulips: [1.jpg, 2.jpg, 3.jpg ....]

|__ val

    |______ daisy: [507.jpg, 508.jpg, 509.jpg ....]

    |______ dandelion: [719.jpg, 720.jpg, 721.jpg ....]

    |______ roses: [514.jpg, 515.jpg, 516.jpg ....]

    |______ sunflowers: [560.jpg, 561.jpg, 562.jpg .....]

    |______ tulips: [640.jpg, 641.jpg, 642.jpg ....]



Since we don't delete the original folders, they will still be in our flower_photos directory, but they will be empty. The code below also prints the total number of flower images we have for each type of flower.

In [18]:
for cl in classes:
  img_path = os.path.join(base_dir, cl)
  images = glob.glob(img_path + '/*.jpg')
  print("{}: {} Images".format(cl, len(images)))
  # The first 80% of the files go to training, the remaining 20% to validation.
  split = round(len(images) * 0.8)
  train, val = images[:split], images[split:]

  # Create each destination folder once per class, then move the files into it.
  for subset, files in (('train', train), ('val', val)):
    dest = os.path.join(base_dir, subset, cl)
    os.makedirs(dest, exist_ok=True)
    for f in files:
      shutil.move(f, dest)

roses: 641 Images
daisy: 633 Images
dandelion: 898 Images
sunflowers: 699 Images
tulips: 799 Images
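
The cell above splits the files in whatever order `glob.glob` returns them, and that order is not guaranteed. If a reproducible, shuffled split is preferred, the selection step could be sketched as follows. `split_train_val` and its `seed` parameter are hypothetical and not part of the notebook; the function only decides which items go where and does not move any files.

```python
import random


def split_train_val(items, train_fraction=0.8, seed=None):
    """Shuffle items reproducibly and split them into (train, val) lists.

    Hypothetical helper: sorts first so the result depends only on the
    seed, not on the order in which the items were listed.
    """
    items = sorted(items)
    rng = random.Random(seed)
    rng.shuffle(items)
    split = round(len(items) * train_fraction)
    return items[:split], items[split:]
```

For example, `train, val = split_train_val(images, seed=42)` could stand in for the slicing line inside the loop, with the seed value chosen arbitrarily.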


For convenience, let us set up the paths for the training and validation sets.

In [20]:
train_dir = os.path.join(base_dir, 'train')
val_dir = os.path.join(base_dir, 'val')

In [24]:
print("Train Directory:")
for cl in classes:
  img_path = os.path.join(train_dir, cl)
  images = glob.glob(img_path + '/*.jpg')
  print(" |_{}: {} Images".format(cl, len(images)))
print("")
print("Validation Directory:")
for cl in classes:
  img_path = os.path.join(val_dir, cl)
  images = glob.glob(img_path + '/*.jpg')
  print(" |_{}: {} Images".format(cl, len(images)))

Train Directory:
 |_roses: 513 Images
 |_daisy: 506 Images
 |_dandelion: 718 Images
 |_sunflowers: 559 Images
 |_tulips: 639 Images

Validation Directory:
 |_roses: 128 Images
 |_daisy: 127 Images
 |_dandelion: 180 Images
 |_sunflowers: 140 Images
 |_tulips: 160 Images
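
As a sanity check, the counts above can be reproduced from the per-class totals with the same `round(n * 0.8)` rule used in the split cell. The sketch below hard-codes the totals printed earlier; `expected_split_sizes` is a hypothetical helper name.

```python
# Per-class totals as printed by the split cell.
totals = {'roses': 641, 'daisy': 633, 'dandelion': 898,
          'sunflowers': 699, 'tulips': 799}


def expected_split_sizes(total, train_fraction=0.8):
    """Return (n_train, n_val) using the same rounding rule as the split cell."""
    n_train = round(total * train_fraction)
    return n_train, total - n_train


for name, total in totals.items():
    n_train, n_val = expected_split_sizes(total)
    print(f"{name}: train={n_train}, val={n_val}")
```

The printed sizes should agree with the directory listing, for instance 513 training and 128 validation images for roses.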
