# Data source

Data for download: https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz

Resources used as reference:
1) Training the image classifier to recognize different species of flowers:
https://www.kaggle.com/dtosidis/flower-classifier-tensorflow
  
2) Loading and preprocessing an image dataset
https://www.tensorflow.org/tutorials/load_data/images

3) Data augmentation
https://www.tensorflow.org/tutorials/images/data_augmentation

4) Image classification
https://www.tensorflow.org/tutorials/images/classification


## Once you have found the best number of epochs, use that epoch in this notebook

In [1]:
# !pip install keras

In [2]:
# !pip install scikit-learn

In [1]:
# Dependencies
import os
# Work around the Intel OpenMP "OMP: Error #15" (duplicate runtime) crash;
# set it before NumPy/TensorFlow are imported so it is in place when they load
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

import pathlib

import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np
import PIL
import PIL.Image
import tensorflow as tf
import tensorflow_hub as hub

from tensorflow import keras
from tensorflow.keras.preprocessing import image

# Import from tf.keras rather than the standalone `keras` package so that
# all Keras objects come from the same installation
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.models import Sequential

from tensorflow.keras.applications.vgg19 import (
    VGG19,
    preprocess_input,
    decode_predictions
)

# Train/test split utility
from sklearn.model_selection import train_test_split



In [2]:
!pwd

/c/ShankersDocs/EDUCATION/RICE_Bootcamp_DataAnalytics/FinalProject_Img_Recognition_Flowers/Final_RICEproject_ImageRecognition_flowers


### Total images in the dataset

In [3]:
data_dir = 'flower_photos'
data_dir = pathlib.Path(data_dir)
image_count = len(list(data_dir.glob('*/*.jpg')))
# print(image_count)
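To check how the images are spread across the classes, a small helper can count the `.jpg` files in each subdirectory. This is a minimal sketch assuming the same `flower_photos/<class_name>/*.jpg` layout that the glob above relies on; `count_images_per_class` is a hypothetical name, not part of the original notebook. Sorting the directory names mirrors the alphanumeric order that `image_dataset_from_directory` uses to assign label indices.

```python
import pathlib
import tempfile


def count_images_per_class(data_dir):
    """Return {class_name: number of .jpg files} for a <class>/<file>.jpg layout.

    Hypothetical helper; classes come back in sorted (alphanumeric) order.
    """
    data_dir = pathlib.Path(data_dir)
    class_dirs = sorted(p for p in data_dir.iterdir() if p.is_dir())
    return {d.name: len(list(d.glob('*.jpg'))) for d in class_dirs}


if __name__ == '__main__':
    # Tiny synthetic layout so the sketch runs without the real dataset
    with tempfile.TemporaryDirectory() as tmp:
        for name, n in [('roses', 2), ('daisy', 3)]:
            (pathlib.Path(tmp) / name).mkdir()
            for i in range(n):
                (pathlib.Path(tmp) / name / f'{i}.jpg').touch()
        print(count_images_per_class(tmp))  # {'daisy': 3, 'roses': 2}
```

In the notebook, `count_images_per_class(data_dir)` would return the per-class counts for the five flower folders.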

## Image Classification
https://www.tensorflow.org/tutorials/images/classification?hl=zh-tw

## Hyperparameters of the model

### Batch Size
* 32, 64, 128

### Epochs
* From 1 to 100
* We will use the accuracy score on the validation data to find the best epoch

* We will use the accuracy score on the validation data to find the best hyperparameters of the model
* Once we find the best hyperparameters, we train the model with them and then estimate its performance on the test data

# Change only this line to try different batch sizes

In [4]:
batch_size = 128
img_height = 180
img_width = 180

## Generating datasets
https://keras.io/examples/vision/image_classification_from_scratch/

### Generating a training dataset

In [5]:
# With subset="training" and validation_split=0.2, 80% of the files are used as the training set

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  'flower_photos',
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

Found 3670 files belonging to 5 classes.
Using 2936 files for training.


### Generating a validation dataset

In [6]:
# With subset="validation" and validation_split=0.1, the last 10% of the (seeded, shuffled) file list is used as the validation set.
# Together with the 80% training split above, this leaves the 10% in between unused.

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  'flower_photos',
  validation_split=0.1,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

Found 3670 files belonging to 5 classes.
Using 367 files for validation.
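The printed counts follow from how the split is computed. As far as we understand the Keras helper, it shuffles the file list once with `seed`, sets `n = int(validation_split * total)`, and keeps the first `total - n` files for `subset="training"` and the last `n` files for `subset="validation"`. The sketch below reproduces that arithmetic; `split_ranges` is a hypothetical name, and the slicing rule is our reading of the library rather than a documented guarantee. With `validation_split=0.2` for training and `0.1` for validation, the indices between the two subsets fall into neither.

```python
def split_ranges(total, validation_split):
    """Return (training, validation) index ranges for one validation_split value.

    Assumed rule: n = int(validation_split * total); training is the first
    total - n files, validation is the last n files.
    """
    n = int(validation_split * total)
    return range(0, total - n), range(total - n, total)


total = 3670
train_idx, _ = split_ranges(total, 0.2)   # the training cell uses 0.2
_, val_idx = split_ranges(total, 0.1)     # the validation cell uses 0.1
unused = range(train_idx.stop, val_idx.start)

print(len(train_idx), len(val_idx), len(unused))  # 2936 367 367
```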


### Generating a test dataset

In [7]:
# Note: these arguments (validation_split=0.1, subset="validation", seed=123) are identical to the validation cell above,
# so test_ds holds the same 367 files as val_ds rather than a separate 10% test set.

test_ds = tf.keras.preprocessing.image_dataset_from_directory(
  'flower_photos',
  validation_split=0.1,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

Found 3670 files belonging to 5 classes.
Using 367 files for validation.
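Because the test cell passes the same `validation_split`, `subset`, and `seed` as the validation cell, it selects the same 367 files as `val_ds`, so it is not an independent test set. One way to get three disjoint sets is to shuffle the file paths once and slice them 80/10/10 yourself. The sketch below does that with NumPy; `three_way_split` and its default fractions are assumptions for illustration. The resulting path lists could then be copied into separate `train/`, `val/`, and `test/` folders so that `image_dataset_from_directory` can read each one without a `validation_split`.

```python
import numpy as np


def three_way_split(paths, val_frac=0.1, test_frac=0.1, seed=123):
    """Shuffle `paths` once with a fixed seed and split into disjoint train/val/test lists."""
    paths = sorted(paths)  # fixed starting order, so the seed alone determines the split
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(paths))
    shuffled = [paths[i] for i in order]

    n_val = int(val_frac * len(paths))
    n_test = int(test_frac * len(paths))
    n_train = len(paths) - n_val - n_test

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder names; in the notebook these could be
# [str(p) for p in data_dir.glob('*/*.jpg')]
train, val, test = three_way_split([f'img_{i}.jpg' for i in range(3670)])
print(len(train), len(val), len(test))  # 2936 367 367
```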


### Class names

In [8]:
class_names = train_ds.class_names
print(class_names)

['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']


### Rescaling the data
https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/preprocessing/Rescaling


In [9]:
from tensorflow.keras import layers

normalization_layer = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)

### Normalizing the data (training and validation datasets)

In [10]:
normalized_train_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
normalized_val_ds = val_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_train_ds))
first_image = image_batch[0]
# Notice the pixel values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image)) 

0.0 1.0
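`Rescaling(1./255)` multiplies every input value by `scale` and adds `offset` (0 by default), so 8-bit pixel values in [0, 255] land in [0, 1], which is what the printed min and max show. The NumPy sketch below applies the same formula to a small hypothetical batch; it illustrates the arithmetic and is not the layer's implementation.

```python
import numpy as np

scale, offset = 1.0 / 255, 0.0  # the values Rescaling(1./255) uses (offset defaults to 0)

# Hypothetical batch of 2 images, each 2x2 pixels with 3 channels, holding 8-bit values
img = [[[0, 128, 255], [64, 32, 16]],
       [[255, 255, 255], [0, 0, 0]]]
batch = np.array([img, img], dtype=np.float32)  # shape (2, 2, 2, 3)

rescaled = batch * scale + offset  # element-wise, as the layer computes it
print(rescaled.min(), rescaled.max())  # expect 0 and 1, up to float rounding
```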


### Cache and prefetch the datasets for more efficient loading (AUTOTUNE lets tf.data choose the buffer size)

In [11]:
AUTOTUNE = tf.data.experimental.AUTOTUNE

train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

In [12]:
normalized_train_ds = normalized_train_ds.cache().prefetch(buffer_size=AUTOTUNE)
normalized_val_ds = normalized_val_ds.cache().prefetch(buffer_size=AUTOTUNE)
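`cache()` keeps the loaded images in memory after the first pass, and `prefetch()` lets the input pipeline prepare upcoming batches while the model trains on the current one; `AUTOTUNE` asks `tf.data` to pick the prefetch buffer size at runtime. The plain-Python sketch below only illustrates the prefetch idea, a background producer filling a bounded buffer while the consumer reads from it. It is an analogy, not how `tf.data` is implemented, and `prefetch` here is a hypothetical helper.

```python
import queue
import threading


def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable` while a background thread reads up to `buffer_size` ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for item in iterable:
                buffer.put(item)  # blocks while the buffer is full
        finally:
            buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item


# Items arrive in order; producing the next item overlaps with consuming the current one
print(list(prefetch(range(5), buffer_size=2)))  # [0, 1, 2, 3, 4]
```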

In [13]:
# num_classes = 5
num_classes = len(class_names)
num_classes

5

## Model 1 (Sequential Model)
https://www.tensorflow.org/guide/keras/sequential_model

## Model 2 

In [14]:
model_2 = tf.keras.Sequential([
  layers.experimental.preprocessing.Rescaling(1./255),
  layers.Conv2D(128, 3, activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, activation='relu'),
  layers.MaxPooling2D(),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(64, activation='relu'),
  layers.Dense(num_classes)
])

In [15]:
model_2.compile(
  optimizer='adam',
  loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
  metrics=['accuracy'])
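Because the last `Dense(num_classes)` layer has no activation and the loss is built with `from_logits=True`, the model outputs raw logits. To turn a prediction into a class name, apply softmax and take the arg-max. The NumPy sketch below shows that step on hypothetical logits; `logits_to_labels` is an illustrative name, and in the notebook the input would come from something like `model_2.predict(...)` (where `tf.nn.softmax` would do the same job).

```python
import numpy as np

class_names = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']  # as printed above


def logits_to_labels(logits, class_names):
    """Convert a (batch, num_classes) array of logits into (names, probabilities)."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    best = probs.argmax(axis=1)
    return [class_names[i] for i in best], probs[np.arange(len(best)), best]


# Hypothetical logits for two images
names, confidence = logits_to_labels([[2.0, 0.1, -1.0, 0.3, 0.0],
                                      [0.0, 0.0, 0.0, 0.0, 3.0]], class_names)
print(names)  # ['daisy', 'tulips']
```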

In [16]:
EPOCHS = 7

In [17]:
history = model_2.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)

Epoch 1/7
Epoch 2/7
Epoch 3/7
Epoch 4/7
Epoch 5/7
Epoch 6/7
Epoch 7/7
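`model.fit` returns a `History` object whose `history` attribute is a dict of per-epoch metric lists; with `metrics=['accuracy']` and `validation_data` set, TF 2.x records validation accuracy under `'val_accuracy'` (older Keras versions used `'val_acc'`). A minimal sketch for picking the best epoch from that dict, assuming the `'val_accuracy'` key; `best_epoch` is a hypothetical helper:

```python
def best_epoch(history_dict, metric='val_accuracy'):
    """Return (epoch_number, value) of the highest `metric`, with 1-based epochs like the fit log.

    Ties go to the earliest epoch, i.e. the shorter training run.
    """
    values = history_dict[metric]
    best_index = max(range(len(values)), key=lambda i: values[i])  # max() keeps the first of equal maxima
    return best_index + 1, values[best_index]


# In the notebook: best_epoch(history.history)
```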


In [18]:
model_2.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
rescaling_1 (Rescaling)      (None, 180, 180, 3)       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 178, 178, 128)     3584      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 89, 89, 128)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 87, 87, 64)        73792     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 43, 43, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 41, 41, 32)        18464     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 20, 20, 32)        0
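The shapes and parameter counts in the summary follow from simple arithmetic: a 3x3 `Conv2D` with the default `'valid'` padding shrinks each spatial side by 2 and has `(3*3*in_channels + 1) * filters` parameters (the +1 is the bias), and a default 2x2 `MaxPooling2D` halves each side, rounding down. The sketch below recomputes the convolution and pooling rows shown above for Model 2 and extends the same arithmetic to the `Flatten` and `Dense` layers; those later values are computed here, since the printed summary above does not show them. The helper names are hypothetical.

```python
def conv_pool_stack(size, in_channels, filters_list, kernel=3, pool=2):
    """Yield (layer, output_shape, params) for each Conv2D + MaxPooling2D pair.

    Assumes square inputs, 'valid' padding, stride-1 convolutions and
    pool x pool max pooling with stride equal to the pool size.
    """
    channels = in_channels
    for filters in filters_list:
        size = size - kernel + 1                             # 'valid' conv shrinks each side
        params = (kernel * kernel * channels + 1) * filters  # weights + one bias per filter
        yield 'conv', (size, size, filters), params
        size = size // pool                                  # pooling rounds down
        yield 'pool', (size, size, filters), 0
        channels = filters


def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


# Model 2: 180x180 RGB input, Conv2D filters 128 -> 64 -> 32, then Dense 128 -> 64 -> 5
rows = list(conv_pool_stack(180, 3, [128, 64, 32]))
for row in rows:
    print(row)  # should match the conv2d / max_pooling2d rows of the summary

h, w, c = rows[-1][1]
flat = h * w * c  # Flatten output size
dense = [dense_params(flat, 128), dense_params(128, 64), dense_params(64, 5)]
total = sum(params for _, _, params in rows) + sum(dense)
print(flat, dense, total)  # 12800 [1638528, 8256, 325] 1742949
```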

In [19]:
# Take the validation accuracy of the last epoch

# Create table
Batch size | Validation accuracy
-----------|--------------------
32         | 63.2%
64         | 63.2%
128        | 64.0%

Then take the batch size with the best validation accuracy.
* Now that you have the best epoch and batch size, train the final model with them.
* After training finishes, evaluate the model on the test data and save it for use in the API (optional).
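Choosing the batch size from the validation-accuracy table can be sketched in plain Python. The values below are copied from the table; the notebook does not say how to break the tie between 32 and 64, so preferring the smaller batch size on a tie is an arbitrary assumption here, and it does not affect this table because 128 is the unique best.

```python
# Validation accuracy (%) per batch size, copied from the table
val_accuracy = {32: 63.2, 64: 63.2, 128: 64.0}

# Highest accuracy wins; on a tie, the smaller batch size wins (an arbitrary choice)
best_batch_size = min(val_accuracy, key=lambda b: (-val_accuracy[b], b))
print(best_batch_size)  # 128
```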