<a id="header"></a>
<center><p style="background:#DFDFDF; font-family:Arial; font-weight:bold; font-size:250%; color:black; text-align:center; width:100%; padding:50px">Rice Classification with CNNs</p></center>

<a id="import"></a>
<center><p style="background:#DFDFDF url('pylogo.jpg') no-repeat; font-family:Courier; font-size:200%; color:black; text-align:center; width:80%; padding:30px">Importing Libraries</p></center>

In [8]:
# Linear Algebra
import numpy as np

# Model Building + Training
import tensorflow as tf

# Clearing Memory
from keras.backend import set_session
from keras.backend import clear_session
from keras.backend import get_session
import gc

# Operating system
import os

In [2]:
# Reset Keras Session
def reset_keras():
    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

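    # Drop the global model reference so its memory can be reclaimed.
    # `global` is required: without it, `del` targets a local name and always fails.
    global classifier  # adjust this name to match your own model variable
    try:
        del classifier
    except NameError:
        pass

    print(gc.collect())  # prints the number of unreachable objects that were collected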

    # use the same config as you used to create the session
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tf.compat.v1.Session(config=config))

<a id="import"></a>
<center><p style="background:#DFDFDF url('pylogo.jpg') no-repeat; font-family:Courier; font-size:200%; color:black; text-align:center; width:80%; padding:30px">Train-Val-Test Split</p></center>

The split-folders library splits a folder of per-class files (e.g. images) into train, validation and test folders. It expects the input in this format:
<pre>

input/
    class1/
        img1.jpg
        img2.jpg
        ...
    class2/
        imgWhatever.jpg
        ...
    ...

</pre>

Into this format:

<pre>

output/
    train/
        class1/
            img1.jpg
            ...
        class2/
            imga.jpg
            ...
    val/
        class1/
            img2.jpg
            ...
        class2/
            imgb.jpg
            ...
    test/
        class1/
            img3.jpg
            ...
        class2/
            imgc.jpg
            ...

</pre>
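To make the idea of a ratio split concrete, here is a minimal sketch of how the files of a single class folder could be divided. It is not the split-folders implementation: the helper `ratio_split` and the file names are hypothetical, and the sketch only assumes that the files are shuffled with a fixed seed and then cut according to the ratios.

```python
import random

def ratio_split(filenames, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Shuffle filenames reproducibly and cut them into train/val/test lists."""
    files = sorted(filenames)           # sort first so the result does not depend on input order
    random.Random(seed).shuffle(files)  # seeded shuffle: the same seed gives the same split
    n_train = int(len(files) * ratio[0])
    n_val = int(len(files) * ratio[1])
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],  # whatever remains goes to test
    }

# Hypothetical file names for one class folder
class1 = [f"img{i}.jpg" for i in range(1, 21)]
split = ratio_split(class1)
print({name: len(files) for name, files in split.items()})
```

Because the shuffle is seeded, running the sketch twice produces the same assignment of files to folders, which is what makes a split reproducible.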

In [3]:
# Adjust base directory according to your file path
base_dir = "Rice_Image_Dataset\\"

In [4]:
!pip install split-folders

Collecting split-folders
  Downloading split_folders-0.5.1-py3-none-any.whl (8.4 kB)
Installing collected packages: split-folders
Successfully installed split-folders-0.5.1


In [5]:
import splitfolders

# Split every class folder into train, validation and test sets (80% / 10% / 10%)
splitfolders.ratio(base_dir, output="output", seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False)

In [9]:
output_dir = "output\\"

# Setting the directories for our Train, val and test folders
train_dir = os.path.join(output_dir, "train")
print("Train directory: ", train_dir)

val_dir = os.path.join(output_dir, "val")
print("Validation directory: ", val_dir)

test_dir = os.path.join(output_dir, "test")
print("Test directory: ", test_dir)

Train directory:  output\train
Validation directory:  output\val
Test directory:  output\test
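Before loading the data, it can be worth checking that every split contains the expected number of images per class. The sketch below uses only the standard library; the helper name `count_images` is an assumption made for illustration, and it simply counts the entries in each class folder without checking file types.

```python
import os

def count_images(root):
    """Return {split: {class_name: file_count}} for a train/val/test folder tree."""
    counts = {}
    for split in sorted(os.listdir(root)):
        split_path = os.path.join(root, split)
        if not os.path.isdir(split_path):
            continue  # skip stray files next to the split folders
        counts[split] = {}
        for cls in sorted(os.listdir(split_path)):
            cls_path = os.path.join(split_path, cls)
            if os.path.isdir(cls_path):
                counts[split][cls] = len(os.listdir(cls_path))
    return counts

# Only runs if the split has already been created in ./output
if os.path.isdir("output"):
    for split, per_class in count_images("output").items():
        print(split, per_class)
```

With an 80/10/10 split, each class should show roughly eight times as many files under train as under val or test.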


<a id="import"></a>
<center><p style="background:#DFDFDF url('pylogo.jpg') no-repeat; font-family:Courier; font-size:200%; color:black; text-align:center; width:80%; padding:30px">Loading our Data</p></center>

To load our data, we will use TensorFlow's ImageDataGenerator. ImageDataGenerator lets you quickly set up Python generators that automatically turn image files on disk into batches of preprocessed tensors.

Together with its flow_from_directory method, ImageDataGenerator provides options for adjusting your data, including data augmentation, image resizing, pixel-value rescaling, and more.

Read more [here](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator).

Initially, we will assess the model's performance on the original data, without data augmentation. The only preprocessing is therefore rescaling the pixel values from the 0-255 range to the 0-1 range.
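As a small illustration of what a rescale factor of 1/255 does, the sketch below applies it by hand to a made-up 2x2 RGB image with NumPy. The pixel values are invented for the example; this only mimics the arithmetic and is not how ImageDataGenerator is implemented internally.

```python
import numpy as np

# A made-up 2x2 RGB image with 8-bit pixel values (0..255)
image = np.array([[[0, 128, 255], [64, 64, 64]],
                  [[255, 255, 255], [10, 20, 30]]], dtype=np.uint8)

# Multiply by the same factor that is passed as rescale=1./255
rescaled = image.astype(np.float32) * (1.0 / 255)

print(rescaled.shape)                  # the shape is unchanged, only the values are scaled
print(rescaled.min(), rescaled.max())  # values now lie between 0 and (approximately) 1
```

Keeping inputs in a small, consistent range generally makes neural network training better behaved than feeding raw 0-255 integers.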

In [17]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create two ImageDataGenerator instances: one for the train set and one for the test set.
# The validation set reuses the test generator, since both need the same preprocessing.
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

# flow_from_directory takes a directory path and yields batches of images and labels.
# Its options include the target image size, the color mode, the class_mode
# (e.g. binary or categorical), and the batch size.
train_generator = train_datagen.flow_from_directory(train_dir, # Train Directory
                                                    target_size=(250, 250), # Image size 250x250px
                                                    color_mode='rgb', # 3 channels: R, G, B
                                                    class_mode='categorical', # 5 classes, so categorical
                                                    batch_size=32) # Set 32 as our batch size

# Repeat process for validation set
validation_generator = test_datagen.flow_from_directory(val_dir,
                                                        target_size=(250, 250), 
                                                        color_mode='rgb', 
                                                        class_mode='categorical',
                                                        batch_size=32) 

Found 60000 images belonging to 5 classes.
Found 7500 images belonging to 5 classes.


In [18]:
for data_batch, labels_batch in train_generator:
    print('data batch shape:', data_batch.shape)
    print('labels batch shape:', labels_batch.shape)
    break  # the generator yields batches indefinitely, so stop after the first one

data batch shape: (32, 250, 250, 3)
labels batch shape: (32, 5)
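The label shape (32, 5) follows from class_mode='categorical': each of the 32 labels is a one-hot vector over the 5 classes. Because the generator cycles through the data indefinitely, it also helps to know how many batches make up one full pass. The sketch below works through both points with plain Python; the helper names `batches_per_epoch` and `one_hot` are hypothetical and exist only for this illustration.

```python
import math

def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)

def one_hot(class_index, num_classes):
    """Encode a class index as a one-hot list, as categorical labels are."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]

print(batches_per_epoch(60000, 32))  # 1875 batches for the train set
print(batches_per_epoch(7500, 32))   # 235 batches for validation (the last one holds 12 images)
print(one_hot(2, 5))                 # [0.0, 0.0, 1.0, 0.0, 0.0]
```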
