# Plant recognition project

## 1 Introduction
 

---

Accurate identification of plant species is essential for a wide range of use cases starting from biodiversity and conservation project, agriculture project and simple nature explorations among others. So far, this task required a specialist knowledge, was time consuming and was often difficult even for professionals.

The image classification algorithms are considered to be as a promising directions in reducing the complexity in plan species classification and assisting the professionals whenever possible. Before the deep learning methods became
available, the task was primarily tackled by identifying the leaf shape patterns, however this required a clear picture of a leave against a white background and had limited accuracy. Deep learning methods allow to use “noisy” photo images and provide increased accuracy.

A range of academic studies has been conducted in the field [1],[2] There exist a number of mobile apps that aim to tackle the problem: noticeable progress in this way was achieved by several projects and apps like LeafSnap
(http://leafsnap.com/), PlantNet (http://identify.plantnet-project.org/) or Folia
(http://liris.univ-lyon2.fr/reves/content/en/index.php).

In addition, the CLEF evaluation forum (http://www.imageclef.org/) hosted a number of challenges over the last few years aiming to increase the accuracy of the identification, which is the main source of both the dataset and an inspiration for the project in general (http://www.imageclef.org/lifeclef/2015/plant)

<img src="example.jpg" alt="Example plant picture" style="width: 400px;"/>

---
### This notebook

In this notebook, I will be documenting my approach to solving the problem and showing the interim results.


### The Road Ahead

I have broke down the notebook into separate steps.  Feel free to use the links below to navigate the notebook.

* [Step 1](#step1): Import Datasets
* [Step 2](#step2): Detect Humans
* [Step 3](#step3): Detect Dogs

>**Note:** Code and Markdown cells can be executed using the **Shift + Enter** keyboard shortcut.  Markdown cells can be edited by double-clicking the cell to enter edit mode.

---
<a id='step1'></a>
## Step 1: Set up and Dataset import

### Import plant Dataset

In the code cell belows, we load the additional libraries for future code, set up global variables and import datasets of plant images. We populate a few variables through the use of the `load_files` function from the scikit-learn library:
- `train_files`, `valid_files`, `test_files` - numpy arrays containing file paths to images
- `train_targets`, `valid_targets`, `test_targets` - numpy arrays containing onehot-encoded classification labels 
- `dog_names` - list of string-valued dog breed names for translating labels

### Set up and globals

In the code cell below, we import all required modules used later on and set up a seed number for reproducibility

In [1]:
# load all required libraries
import numpy as np
import pandas as pd
# this is a library with various utils to help show the pictures etc
import project_utils
import random
import cv2 
import os               
import matplotlib.pyplot as plt  
import xml.etree.ElementTree as ET                
from tqdm import tqdm
from glob import glob
from sklearn.datasets import load_files       
from keras.utils import np_utils
from keras.preprocessing import image
from PIL import ImageFile 

%matplotlib inline 

random.seed(999) # this is required to ensure reproducibility

Using TensorFlow backend.


In [24]:
# Globals
ImageFile.LOAD_TRUNCATED_IMAGES = True 

BATCH_SIZE = 41  # tweak to your GPUs capacity
# the batch size needs to be in multiples of 2,3 and 41 as we have 91758 train files

IMG_HEIGHT = 224   # ResNetInceptionv2 & Xception like 299, ResNet50 & VGG like 224
IMG_WIDTH = IMG_HEIGHT
CHANNELS = 3
DIMENSIONS = (IMG_HEIGHT,IMG_WIDTH,CHANNELS)

### Import Datasets

In the code cell below, we import a dataset of images and corresponding XML files containing metadata, where the file paths are stored in the numpy arrays 'picture_files' and 'metadata_files'

This assumes we have the following structure:

├── data
│   ├── test
│   ├── train
│   ├── small_data_test
│   ├── test_tensor_file.npy
│   ├── train_tensor_file.npy
├── keras.best.h5
├── plant_recognition_project.ipynb

In [3]:
# define function to load function
# in both test and train folders we have a mixture or .jpg and .xml files
# we need to treat them separately

def load_dataset(path):
    path_pictures = path + "*.jpg"
    path_metadata = path + "*.xml"
    picture_files = np.array(glob(path_pictures))
    metadata_files = np.array(glob(path_metadata))
    return picture_files, metadata_files

In [4]:
# this is function to create a dictionary out of an individual xml file

def get_xml_metadata(file):
    pic_meta = {}
    pic_meta['file_name'] = os.path.basename(file)
    tree = ET.parse(file)
    root = tree.getroot()
    # create a dictionary with all metadata
    for child in root:
        pic_meta[child.tag] = child.text
    return pic_meta

In [5]:
#lets get all data from both the test and the train folders

train_path = './data/train/'
test_path = './data/test/'

train_images, train_metadata = load_dataset(train_path)
test_images, test_metadata = load_dataset(test_path)

# print statistics about the dataset
print('There are %d total train picture files.' % len(train_images))
print('There are %d total train metadata files. \n' % len(train_metadata))

print('There are %d total test picture files.' % len(test_images))
print('There are %d total test metadata files.' % len(test_metadata))

There are 91758 total train picture files.
There are 91758 total train metadata files. 

There are 21446 total test picture files.
There are 21446 total test metadata files.


In [6]:
# Let's take a look at the xml file structure
file_meta = get_xml_metadata(train_metadata[10])
file_meta

{'Author': 'liliane roubaudi',
 'ClassId': '30052',
 'Content': 'Flower',
 'Date': '2013-8-13',
 'Family': 'Convolvulaceae',
 'Genus': 'Convolvulus',
 'ImageId2014': '11652',
 'Latitude': None,
 'LearnTag': 'Train',
 'Location': 'Nantua',
 'Longitude': None,
 'MediaId': '37007',
 'ObservationId': '18801',
 'ObservationId2014': '16074',
 'Species': 'Convolvulus arvensis L.',
 'Vote': '4',
 'YearInCLEF': 'PlantCLEF2014',
 'file_name': '37007.xml'}

In [10]:
# So, the plant species is captured in the 'Species' field
#lets get all species names for both test and train metadata files

def get_train_species(train_metadata):
    
    try:
        train_species = np.load("./data/train_species.npy")
        print('loading from ./data/train_species.npy ')
        
    except:
        
        print('getting from train_metadata')
        train_species = []
        for file in train_metadata:
            metadata_inf = get_xml_metadata(file)
            train_species.append(metadata_inf['Species'])
        
            #it does take a fair amount of time to do the pre-processing, 
            #so I will save the files on the disk to save time should something go wrong
        train_species_file = './data/train_species'
        np.save(train_species_file,train_species,allow_pickle=True)
    return train_species

def get_test_species(test_metadata):
    
    try:
        
        train_species = np.load("./data/test_species.npy")
        print('loading from ./data/test_species.npy ')
        
    except:
        
        print('getting from test_metadata')
        test_species = []
        for file in test_metadata:
            metadata_inf = get_xml_metadata(file)
            test_species.append(metadata_inf['Species'])
        
            #it does take a fair amount of time to do the pre-processing, 
            #so I will save the files on the disk to save time should something go wrong
        test_species_file = './data/test_species'
        np.save(test_species_file,test_species,allow_pickle=True)
    return test_species

In [None]:
train_species = get_train_species(train_metadata)
test_species = get_test_species(test_metadata)

In [None]:
# let's see how many unique labels we have
test_species_unique = np.unique(test_species)
train_species_unique = np.unique(train_species)

print('There are %d unique species in the train files.' % len(train_species_unique))
print('There are %d unique species in the test files.' % len(test_species_unique))

In [None]:
#let's see if there are any species in the test data not presents in the train dataset
np.setdiff1d(test_species_unique,train_species_unique)

In [None]:
Great, all test species are present in the train data

In [12]:
# later we will need to have one-hot encoded values of the labels
# we will use sklearn to trasfer non-integer list to an integer array
# we can do an inverse transfer later as well
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(train_species)
train_to_integer = le.transform(train_species)
test_to_integer = le.transform(test_species)

In [13]:
# how we can do one hot encoding 
# we know there are 1000 different species in the data set
Y_train = np_utils.to_categorical(train_to_integer, 1000)
Y_test = np_utils.to_categorical(test_to_integer, 1000)

### Data preprocessing

In this project, I will be Keras as a main library to work with CNNs and will be using TensorFlow as backend.
The below info is a reminder for me to get the right data in Keras:

Keras CNNs require a 4D array (tensor) as input, with shape:
$$
(\text{nb_samples}, \text{rows}, \text{columns}, \text{channels}),
$$

where `nb_samples` corresponds to the total number of images (or samples), and `rows`, `columns`, and `channels` correspond to the number of rows, columns, and channels for each image, respectively.  

The `path_to_tensor` function below takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN.  The function first loads the image and resizes it to a square image that is $ IMG\_HEIGHT \times IMG\_HEIGHT $ pixels. where IMG\_HEIGHT is defined as a global variable and is dictated by the choise of the CNN. Next, the image is converted to an array, which is then resized to a 4D tensor.  In this case, since we are working with color images, each image has three channels.  Likewise, since we are processing a single image (or sample), the returned tensor will always have shape

$$
(1, IMG\_HEIGHT, IMG\_HEIGHT, 3).
$$

The `paths_to_tensor` function takes a numpy array of string-valued image paths as input and returns a 4D tensor with shape 

$$
(\text{nb_samples}, IMG\_HEIGHT, IMG\_HEIGHT, 3).
$$

Here, `nb_samples` is the number of samples, or number of images, in the supplied array of image paths.  It is best to think of `nb_samples` as the number of 3D tensors (where each 3D tensor corresponds to a different image) in your dataset!

In [14]:
from keras.preprocessing import image                  
from tqdm import tqdm

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(IMG_HEIGHT, IMG_HEIGHT))
    # convert PIL.Image.Image type to 3D tensor with shape (IMG_HEIGHT, IMG_HEIGHT, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, IMG_HEIGHT, IMG_HEIGHT, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)

In [15]:
# pre-process the test data for Keras
#but only if it hasn't been already done
def test_preproc():
    try:
        test_tensor = np.load("data/test_tensor_file.npy")
        print('load from data/test_tensor_file.npy ')
    except:
        test_tensors = paths_to_tensor(test_images).astype('float32')/255   
        #it does take a fair amount of time to do the pre-processing, 
        #so I will save the files on the disk to save time should something go wrong
        test_tensor_file = '/data/test_tensor_file'
        np.save(test_tensor_file,test_tensors,allow_pickle=True)
    return test_tensor

In [16]:
test_x = test_preproc()

---
<a id='step2'></a>
## Step 2: Initial data exploration

In the code cell below, we will take a look at the images we have, take a look at labels and at the metadata

In [None]:
#Let's see a couple of images

images = train_images[0:4]
labels = train_species[0:4]
project_utils.plot_images(images=images, cls_true=labels, smooth=False)

### Let's see how species are distributed
Here I will need to take a look at a number of counts/ distributions - TODO

In [None]:
# let's see how species are distributed
import pandas as pd

unique_train, counts_train = np.unique(train_species, return_counts=True)
unique_test, counts_test = np.unique(test_species, return_counts=True)

train_data_species = pd.DataFrame()
train_data_species['names'] = unique_train
train_data_species['counts'] = counts_train

test_data_species = pd.DataFrame()
test_data_species['names'] = unique_test
test_data_species['counts'] = counts_test


In [None]:
print(test_data_species.describe())
print(train_data_species.describe())
#import seaborn as sns
#sns.set(style="darkgrid")
#ax = sns.countplot(y="names", data=train_data_species)

In [None]:
# from the summary table we can see some species are more prevalent compared to others in both datasets
#there is a large difference between the 25% and 75% in both datasets

---
<a id='step2'></a>
## Step 2: Detect plants

In this section, we will use transfer learning based on ResNet50 CNN architecture.

In [17]:
import keras
from keras import  metrics, models, regularizers, optimizers, layers
from keras.applications import ResNet50 #, Xception, InceptionResNetV2
from keras.models import Sequential, Model 
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint , EarlyStopping

### Model Architecture

The model uses the the pre-trained ResNet50 model as a fixed feature extractor, where the last convolutional output of ResNet50 is fed as input to our model. We only add a global average pooling layer and a fully connected layer, where the latter contains one node for each dog category and is equipped with a softmax.

In [68]:
def get_model():
    
    # define the model
    base_model = ResNet50(input_shape=DIMENSIONS, weights='imagenet', include_top=False)

    # Freeze the layers which you don't want to train. Here I am freezing all of them
    for layer in base_model.layers[:4]:
        layer.trainable = False
    
    model_gen = models.Sequential()

    # Add the vgg convolutional base model
    model_gen.add(base_model)

    #Adding custom Layers 
    model_gen.add(layers.GlobalAveragePooling2D(name='avg_pool_2'))
    model_gen.add(layers.Dense(1000, activation='softmax'))
    
    # compile the model 
    model_gen.compile(
        loss='categorical_crossentropy',
        optimizer=optimizers.Adam(1e-3),
        metrics=['acc'])
    
    return model_gen

In [69]:
# the train dataset it too large to fit into the memory 
# so need to create a generator that will return batches of images instead
# also the generator need to perform online augmentation of the images

def generate_batches_from_train_folder(images_to_read, labels, batchsize = BATCH_SIZE):
    
    """
    Generator that returns batches of images ('xs') and labels ('ys') from the train folder
    :param string filepath: Full filepath of files to read - this needs to be a list of image files
    :param np.array: list of all labels for the images_to_read - those need to be one-hot-encoded
    :param int batchsize: Size of the batches that should be generated.
    :return: (ndarray, ndarray) (xs, ys): Yields a tuple which contains a full batch of images and labels.
    """
    
    dimensions = (BATCH_SIZE, IMG_HEIGHT, IMG_HEIGHT, 3) # pixels, three channels
    
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        rotation_range=20,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
 
    
    # needs to be on a infinite loop for the generator to work
    while 1:
        filesize = len(images_to_read)

        # count how many entries we have read
        n_entries = 0
        # as long as we haven't read all entries from the file: keep reading
        while n_entries < (filesize - batchsize):
            
            # start the next batch at index 0
            # create numpy arrays of input data (features) 
            # - this is already shaped as a tensor (output of the support function paths_to_tensor)
            xs = paths_to_tensor(images_to_read[n_entries : n_entries + batchsize])

            # and label info. Contains 1000 labels in my case for each possible plant species
            ys = labels[n_entries : n_entries + batchsize]

            # we have read one more batch from this file
            n_entries += batchsize
              
            #perform online augmentation on the xs and ys
            augmented_generator = train_datagen.flow(xs, ys, batch_size = batchsize)
            
            yield  augmented_generator.next()


In [78]:
from sklearn.model_selection import train_test_split, StratifiedKFold
fold_full_set = list(StratifiedKFold(n_splits=5, shuffle=True, random_state=1).split(train_images, train_to_integer))

In [88]:
fold_test_func = list(StratifiedKFold(n_splits=3, shuffle=True, random_state=1).split(train_images[5000:35000], train_to_integer[5000:35000]))

In [89]:
#train the model on each fold
folds = fold_test_func

for j, (train_idx, val_idx) in enumerate(folds):
    
    print('\nFold ',j)
    X_train_cv_path = train_images[train_idx]
    y_train_cv = Y_train[train_idx,:]
   
    X_valid_cv = train_images[val_idx]
    y_valid_cv= Y_train[val_idx]
    
    name_weights = "final_model_fold" + str(j) + "_weights.h5"
    
    #saves the model weights after each epoch if the validation loss decreased
    checkpoint = ModelCheckpoint(name_weights, monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
    early = EarlyStopping(monitor='val_acc', min_delta=0, patience=10, verbose=1, mode='auto')

    callbacks = [checkpoint,early]
    
    test_generator = generate_batches_from_train_folder(
        images_to_read = X_train_cv_path, 
        labels = y_train_cv, 
        batchsize = BATCH_SIZE)
    
    print('generating validation data...')
    x_valid_cv = paths_to_tensor(X_valid_cv) # this takes paths to images and return the np array
    valid_datagen = ImageDataGenerator(rescale=1./255)
    valid_generator = valid_datagen.flow(x_valid_cv, y_valid_cv, batch_size=BATCH_SIZE)
    
    model = get_model()
    
    model.fit_generator(
                test_generator,
                steps_per_epoch = len(X_train_cv_path)/BATCH_SIZE,
                epochs = 5,
                verbose = 1,
                validation_data = valid_generator,
                validation_steps= len(X_valid_cv)//BATCH_SIZE,
                callbacks = callbacks)
    
    print(model.evaluate(valid_x_cv, y_valid_cv))


Fold  0


  0%|          | 3/10330 [00:00<06:57, 24.74it/s]

generating validation data...


100%|██████████| 10330/10330 [03:44<00:00, 45.98it/s]
 12%|█▏        | 5/41 [00:00<00:00, 49.96it/s]

Epoch 1/5


100%|██████████| 41/41 [00:00<00:00, 41.54it/s]
100%|██████████| 41/41 [00:01<00:00, 18.28it/s]
100%|██████████| 41/41 [00:00<00:00, 47.86it/s]
100%|██████████| 41/41 [00:00<00:00, 42.63it/s]
100%|██████████| 41/41 [00:00<00:00, 42.07it/s]
100%|██████████| 41/41 [00:01<00:00, 23.09it/s]
100%|██████████| 41/41 [00:01<00:00, 23.05it/s]
100%|██████████| 41/41 [00:02<00:00, 17.36it/s]
100%|██████████| 41/41 [00:02<00:00, 15.90it/s]
100%|██████████| 41/41 [00:02<00:00, 20.51it/s]
100%|██████████| 41/41 [00:02<00:00, 19.90it/s]


  1/479 [..............................] - ETA: 19727s - loss: 7.1736 - acc: 0.0000e+00

100%|██████████| 41/41 [00:01<00:00, 21.90it/s]


  2/479 [..............................] - ETA: 16811s - loss: 7.3918 - acc: 0.0000e+00

100%|██████████| 41/41 [00:01<00:00, 22.48it/s]


  3/479 [..............................] - ETA: 15637s - loss: 7.6054 - acc: 0.0081    

100%|██████████| 41/41 [00:01<00:00, 23.20it/s]


  4/479 [..............................] - ETA: 15042s - loss: 7.6535 - acc: 0.0061

100%|██████████| 41/41 [00:01<00:00, 22.65it/s]


  5/479 [..............................] - ETA: 14619s - loss: 7.5854 - acc: 0.0049

100%|██████████| 41/41 [00:01<00:00, 24.04it/s]


  6/479 [..............................] - ETA: 14333s - loss: 7.6046 - acc: 0.0041

100%|██████████| 41/41 [00:01<00:00, 23.83it/s]


  7/479 [..............................] - ETA: 14106s - loss: 7.5482 - acc: 0.0035

100%|██████████| 41/41 [00:01<00:00, 23.70it/s]


  8/479 [..............................] - ETA: 13937s - loss: 7.5139 - acc: 0.0030

100%|██████████| 41/41 [00:01<00:00, 22.58it/s]


  9/479 [..............................] - ETA: 13803s - loss: 7.5089 - acc: 0.0054

100%|██████████| 41/41 [00:01<00:00, 23.47it/s]


 10/479 [..............................] - ETA: 13688s - loss: 7.4861 - acc: 0.0049

100%|██████████| 41/41 [00:01<00:00, 23.78it/s]


 11/479 [..............................] - ETA: 13586s - loss: 7.4829 - acc: 0.0044

100%|██████████| 41/41 [00:01<00:00, 20.20it/s]


 12/479 [..............................] - ETA: 13497s - loss: 7.4701 - acc: 0.0041

100%|██████████| 41/41 [00:01<00:00, 24.50it/s]


 13/479 [..............................] - ETA: 13420s - loss: 7.4450 - acc: 0.0038

100%|██████████| 41/41 [00:01<00:00, 24.27it/s]


 14/479 [..............................] - ETA: 13353s - loss: 7.4364 - acc: 0.0035

100%|██████████| 41/41 [00:01<00:00, 23.16it/s]


 15/479 [..............................] - ETA: 13290s - loss: 7.4351 - acc: 0.0033

100%|██████████| 41/41 [00:01<00:00, 23.92it/s]


 16/479 [>.............................] - ETA: 13231s - loss: 7.4298 - acc: 0.0030

100%|██████████| 41/41 [00:01<00:00, 23.56it/s]


 17/479 [>.............................] - ETA: 13174s - loss: 7.4154 - acc: 0.0029

100%|██████████| 41/41 [00:01<00:00, 26.26it/s]


 18/479 [>.............................] - ETA: 13119s - loss: 7.4342 - acc: 0.0027

100%|██████████| 41/41 [00:01<00:00, 25.61it/s]


 19/479 [>.............................] - ETA: 13066s - loss: 7.4315 - acc: 0.0026

100%|██████████| 41/41 [00:01<00:00, 23.31it/s]


 20/479 [>.............................] - ETA: 13021s - loss: 7.4539 - acc: 0.0024

100%|██████████| 41/41 [00:01<00:00, 22.56it/s]


 21/479 [>.............................] - ETA: 12975s - loss: 7.4553 - acc: 0.0023

100%|██████████| 41/41 [00:01<00:00, 22.62it/s]


 22/479 [>.............................] - ETA: 12935s - loss: 7.4524 - acc: 0.0022

100%|██████████| 41/41 [00:01<00:00, 22.74it/s]


 23/479 [>.............................] - ETA: 12896s - loss: 7.4436 - acc: 0.0021

100%|██████████| 41/41 [00:01<00:00, 24.25it/s]


 24/479 [>.............................] - ETA: 12854s - loss: 7.4597 - acc: 0.0020

100%|██████████| 41/41 [00:01<00:00, 23.09it/s]


 25/479 [>.............................] - ETA: 12815s - loss: 7.4569 - acc: 0.0020

100%|██████████| 41/41 [00:01<00:00, 25.24it/s]


 26/479 [>.............................] - ETA: 12778s - loss: 7.4543 - acc: 0.0019

100%|██████████| 41/41 [00:01<00:00, 23.91it/s]


 27/479 [>.............................] - ETA: 12741s - loss: 7.4477 - acc: 0.0018

100%|██████████| 41/41 [00:01<00:00, 23.42it/s]


 28/479 [>.............................] - ETA: 12703s - loss: 7.4469 - acc: 0.0017

100%|██████████| 41/41 [00:01<00:00, 21.86it/s]


 29/479 [>.............................] - ETA: 12665s - loss: 7.4429 - acc: 0.0017

100%|██████████| 41/41 [00:01<00:00, 23.39it/s]


 30/479 [>.............................] - ETA: 12629s - loss: 7.4301 - acc: 0.0024

100%|██████████| 41/41 [00:01<00:00, 22.60it/s]


 31/479 [>.............................] - ETA: 12595s - loss: 7.4279 - acc: 0.0024

100%|██████████| 41/41 [00:01<00:00, 22.37it/s]


 32/479 [=>............................] - ETA: 12591s - loss: 7.4209 - acc: 0.0023

100%|██████████| 41/41 [00:01<00:00, 21.91it/s]


 33/479 [=>............................] - ETA: 12556s - loss: 7.4144 - acc: 0.0022

100%|██████████| 41/41 [00:01<00:00, 25.37it/s]


 34/479 [=>............................] - ETA: 12521s - loss: 7.4177 - acc: 0.0022

100%|██████████| 41/41 [00:01<00:00, 22.43it/s]


 35/479 [=>............................] - ETA: 12486s - loss: 7.4115 - acc: 0.0021

100%|██████████| 41/41 [00:01<00:00, 23.58it/s]


 36/479 [=>............................] - ETA: 12453s - loss: 7.4103 - acc: 0.0020

100%|██████████| 41/41 [00:01<00:00, 24.53it/s]


 37/479 [=>............................] - ETA: 12421s - loss: 7.4098 - acc: 0.0020

100%|██████████| 41/41 [00:01<00:00, 22.69it/s]


 38/479 [=>............................] - ETA: 12388s - loss: 7.4083 - acc: 0.0019

100%|██████████| 41/41 [00:01<00:00, 25.48it/s]


 39/479 [=>............................] - ETA: 12357s - loss: 7.3994 - acc: 0.0019

100%|██████████| 41/41 [00:01<00:00, 24.01it/s]


 40/479 [=>............................] - ETA: 12324s - loss: 7.3931 - acc: 0.0018

100%|██████████| 41/41 [00:01<00:00, 23.39it/s]


 41/479 [=>............................] - ETA: 12292s - loss: 7.3888 - acc: 0.0018

100%|██████████| 41/41 [00:01<00:00, 24.38it/s]


 42/479 [=>............................] - ETA: 12285s - loss: 7.3892 - acc: 0.0017

100%|██████████| 41/41 [00:02<00:00, 17.80it/s]


 43/479 [=>............................] - ETA: 12272s - loss: 7.3839 - acc: 0.0017

100%|██████████| 41/41 [00:02<00:00, 17.49it/s]


 44/479 [=>............................] - ETA: 12277s - loss: 7.3860 - acc: 0.0017

100%|██████████| 41/41 [00:01<00:00, 23.01it/s]


 45/479 [=>............................] - ETA: 12276s - loss: 7.3854 - acc: 0.0016

100%|██████████| 41/41 [00:02<00:00, 19.65it/s]


 46/479 [=>............................] - ETA: 12266s - loss: 7.3821 - acc: 0.0021

100%|██████████| 41/41 [00:01<00:00, 24.14it/s]


 47/479 [=>............................] - ETA: 12257s - loss: 7.3843 - acc: 0.0021

100%|██████████| 41/41 [00:02<00:00, 20.37it/s]


 48/479 [==>...........................] - ETA: 12244s - loss: 7.3768 - acc: 0.0020

100%|██████████| 41/41 [00:02<00:00, 16.97it/s]


 49/479 [==>...........................] - ETA: 12263s - loss: 7.3780 - acc: 0.0020

100%|██████████| 41/41 [00:01<00:00, 22.15it/s]


 50/479 [==>...........................] - ETA: 12248s - loss: 7.3715 - acc: 0.0020

100%|██████████| 41/41 [00:02<00:00, 19.63it/s]


 51/479 [==>...........................] - ETA: 12260s - loss: 7.3727 - acc: 0.0024

100%|██████████| 41/41 [00:01<00:00, 22.57it/s]


 52/479 [==>...........................] - ETA: 12247s - loss: 7.3682 - acc: 0.0023

100%|██████████| 41/41 [00:02<00:00, 17.29it/s]


 53/479 [==>...........................] - ETA: 12253s - loss: 7.3591 - acc: 0.0023

100%|██████████| 41/41 [00:02<00:00, 16.36it/s]


 54/479 [==>...........................] - ETA: 12267s - loss: 7.3537 - acc: 0.0027

100%|██████████| 41/41 [00:02<00:00, 19.21it/s]


 55/479 [==>...........................] - ETA: 12274s - loss: 7.3476 - acc: 0.0027

100%|██████████| 41/41 [00:02<00:00, 20.35it/s]


 56/479 [==>...........................] - ETA: 12280s - loss: 7.3434 - acc: 0.0026

100%|██████████| 41/41 [00:02<00:00, 17.27it/s]


 57/479 [==>...........................] - ETA: 12286s - loss: 7.3387 - acc: 0.0026

100%|██████████| 41/41 [00:02<00:00, 16.47it/s]


 58/479 [==>...........................] - ETA: 12283s - loss: 7.3351 - acc: 0.0025

100%|██████████| 41/41 [00:02<00:00, 20.34it/s]


 59/479 [==>...........................] - ETA: 12273s - loss: 7.3336 - acc: 0.0025

100%|██████████| 41/41 [00:01<00:00, 21.25it/s]


 60/479 [==>...........................] - ETA: 12264s - loss: 7.3333 - acc: 0.0024

100%|██████████| 41/41 [00:01<00:00, 21.85it/s]


 61/479 [==>...........................] - ETA: 12254s - loss: 7.3267 - acc: 0.0024

100%|██████████| 41/41 [00:02<00:00, 19.65it/s]


 62/479 [==>...........................] - ETA: 12245s - loss: 7.3280 - acc: 0.0024

100%|██████████| 41/41 [00:02<00:00, 20.20it/s]


 63/479 [==>...........................] - ETA: 12237s - loss: 7.3272 - acc: 0.0023

100%|██████████| 41/41 [00:01<00:00, 21.39it/s]


 64/479 [===>..........................] - ETA: 12235s - loss: 7.3242 - acc: 0.0023

100%|██████████| 41/41 [00:02<00:00, 19.24it/s]


 65/479 [===>..........................] - ETA: 12234s - loss: 7.3219 - acc: 0.0023

100%|██████████| 41/41 [00:02<00:00, 17.05it/s]


 66/479 [===>..........................] - ETA: 12234s - loss: 7.3202 - acc: 0.0022

100%|██████████| 41/41 [00:02<00:00, 18.34it/s]


 67/479 [===>..........................] - ETA: 12231s - loss: 7.3152 - acc: 0.0025

100%|██████████| 41/41 [00:02<00:00, 19.00it/s]


 68/479 [===>..........................] - ETA: 12224s - loss: 7.3127 - acc: 0.0025

100%|██████████| 41/41 [00:01<00:00, 21.83it/s]


 69/479 [===>..........................] - ETA: 12210s - loss: 7.3112 - acc: 0.0025

100%|██████████| 41/41 [00:01<00:00, 23.55it/s]


 70/479 [===>..........................] - ETA: 12172s - loss: 7.3074 - acc: 0.0024

100%|██████████| 41/41 [00:01<00:00, 24.71it/s]


 71/479 [===>..........................] - ETA: 12132s - loss: 7.3024 - acc: 0.0024

100%|██████████| 41/41 [00:01<00:00, 22.45it/s]


 72/479 [===>..........................] - ETA: 12098s - loss: 7.3002 - acc: 0.0024

100%|██████████| 41/41 [00:01<00:00, 27.08it/s]


 73/479 [===>..........................] - ETA: 12071s - loss: 7.2971 - acc: 0.0023

100%|██████████| 41/41 [00:01<00:00, 22.03it/s]


 74/479 [===>..........................] - ETA: 12041s - loss: 7.2974 - acc: 0.0023

100%|██████████| 41/41 [00:01<00:00, 16.38it/s]


 75/479 [===>..........................] - ETA: 12011s - loss: 7.2955 - acc: 0.0023

100%|██████████| 41/41 [00:01<00:00, 23.88it/s]


 76/479 [===>..........................] - ETA: 11983s - loss: 7.2959 - acc: 0.0022

100%|██████████| 41/41 [00:01<00:00, 23.10it/s]


 77/479 [===>..........................] - ETA: 11949s - loss: 7.2940 - acc: 0.0022

100%|██████████| 41/41 [00:01<00:00, 21.97it/s]


 78/479 [===>..........................] - ETA: 11922s - loss: 7.2917 - acc: 0.0022

100%|██████████| 41/41 [00:01<00:00, 22.96it/s]


 79/479 [===>..........................] - ETA: 11896s - loss: 7.2886 - acc: 0.0022

100%|██████████| 41/41 [00:01<00:00, 20.90it/s]


 80/479 [====>.........................] - ETA: 11863s - loss: 7.2849 - acc: 0.0021

100%|██████████| 41/41 [00:01<00:00, 22.96it/s]


 81/479 [====>.........................] - ETA: 11833s - loss: 7.2815 - acc: 0.0021

100%|██████████| 41/41 [00:01<00:00, 18.33it/s]


KeyboardInterrupt: 

In [23]:
# settings for the model
n_of_train_samples = len(train_images)
epochs = 10
n_of_val_samples = len(test_images)

In [43]:
# Train the model 

#model_final.load_weights(BEST_MODEL)

"""
model_final.compile(
    optimizer=optimizers.Adam(lr=1e-3,),
    loss='categorical_crossentropy',
    metrics=['acc'])

"""

model_final = model_final.fit_generator(
    generate_batches_from_train_folder(images_to_read = train_images, labels = Y_train, batchsize = BATCH_SIZE),
    steps_per_epoch= n_of_train_samples//BATCH_SIZE,
    epochs= epochs,
    validation_data=test_generator,
    validation_steps= n_of_val_samples//BATCH_SIZE,
    verbose=1,
    callbacks = [checkpoint, early])


 10%|▉         | 4/41 [00:00<00:00, 38.76it/s]

Epoch 1/10


100%|██████████| 41/41 [00:00<00:00, 42.14it/s]
100%|██████████| 41/41 [00:01<00:00, 25.11it/s]
100%|██████████| 41/41 [00:01<00:00, 22.93it/s]
100%|██████████| 41/41 [00:02<00:00, 17.90it/s]
100%|██████████| 41/41 [00:02<00:00, 20.14it/s]
100%|██████████| 41/41 [00:01<00:00, 22.62it/s]
100%|██████████| 41/41 [00:01<00:00, 24.05it/s]
100%|██████████| 41/41 [00:01<00:00, 24.76it/s]
100%|██████████| 41/41 [00:01<00:00, 23.33it/s]
100%|██████████| 41/41 [00:01<00:00, 22.15it/s]
100%|██████████| 41/41 [00:01<00:00, 20.24it/s]


   1/2238 [..............................] - ETA: 72935s - loss: 7.2940 - acc: 0.0732

100%|██████████| 41/41 [00:01<00:00, 24.73it/s]


   2/2238 [..............................] - ETA: 67948s - loss: 7.1327 - acc: 0.0488

100%|██████████| 41/41 [00:01<00:00, 24.39it/s]


KeyboardInterrupt: 

In [None]:
model_final.save('cnn50_4_layers_1_epoch.h5')

Need to find how to do cross validations as well