## Problem Description

Food security is threatened by a number of factors, including:

1. Climate change (Tai et al., 2014)
2. The decline in pollinators (Report of the Plenary of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services on the work of its fourth session, 2016)
3. Plant diseases (Strange and Scott, 2005)

Plant diseases often have disastrous consequences for smallholder farmers, whose livelihoods depend on healthy crops. More than 80 percent of agricultural production is generated by smallholder farmers (UNEP, 2013), and reports of yield losses of more than 50% due to pests and diseases are common (Harvey et al., 2014).


## Dataset

https://github.com/spMohanty/PlantVillage-Dataset

The `raw` directory contains three versions of the dataset:

- color: original RGB images
- grayscale: grayscale versions of the raw images
- segmented: RGB images with only the leaf segmented and color corrected

We will use the tomato plant diseases for our case.


### Tomato Diseases and their descriptions

https://www.thespruce.com/tomato-leaf-diseases-1403409

## Load libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import cv2
import os

## Download Data

In [None]:
#!pip install --upgrade gdown

In [None]:
!gdown 1JFZDEqRcReAjJiqCxz-W4707ipDEFe6i -O trainset.zip

In [None]:
# Uncomment the following lines if you have already downloaded older versions of the data
#!rm tomatoes.zip
#!rm -r tomatoes

In [None]:
!ls -al

In [None]:
!unzip -q trainset.zip -d trainset/

In [None]:
!ls -al trainset/

## Load data

In [None]:
image_dir = '/content/trainset/MultiClassDataDemo/'

In [None]:
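def loadImageFiles(directory):
    # Collect the paths of all JPEG files in `directory`.
    # The extension check is case-insensitive ('.jpg' and '.JPG' both match),
    # and sorting keeps the order reproducible between runs.
    files = [os.path.join(directory, f)
             for f in sorted(os.listdir(directory))
             if f.lower().endswith('.jpg')]
    return files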

In [None]:
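def loadImages(files, n = 10):
    # cv2.imread returns BGR; convert to RGB so plt.imshow shows true colors
    images = [cv2.cvtColor(cv2.imread(file), cv2.COLOR_BGR2RGB)
              for file in files[0:n]]
    return images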

In [None]:
healthy_images_dir = image_dir + 'Healthy/'
bacterial_spot_images_dir = image_dir + 'Bacterial_spot/'
healthy_images_files = loadImageFiles(healthy_images_dir)
bacterial_spot_images_files = loadImageFiles(bacterial_spot_images_dir)

In [None]:
# load the first 10 images of each class (loadImages defaults to n = 10)
healthy_images = loadImages(healthy_images_files)
bacterial_spot_images = loadImages(bacterial_spot_images_files)
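
Before training, it can help to check how many images each class folder holds, since a strong imbalance affects how accuracy should be read. The following is a minimal sketch, assuming one subdirectory per class under the dataset root. The helper name `count_images_per_class` and the list of extensions are illustrative and not part of the original notebook.

```python
import os

def count_images_per_class(root, extensions=('.jpg', '.jpeg', '.png')):
    """Return {class_name: image_count} for each subdirectory of `root`."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files that sit next to the class folders
        # Case-insensitive extension check, so 'leaf.JPG' is counted too
        counts[name] = sum(
            1 for f in os.listdir(class_dir)
            if f.lower().endswith(extensions)
        )
    return counts

# Possible usage in this notebook (image_dir is defined above):
# count_images_per_class(image_dir)
```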

## Data Pre-processing

### Healthy Leaf

In [None]:
fig, ax = plt.subplots(1, 5, figsize=(15, 8))

for i in range(5):
    ax[i].imshow(healthy_images[i])

### Bacterial Spot Leaf

In [None]:
fig, ax = plt.subplots(1, 5, figsize=(15, 8))

for i in range(5):
    ax[i].imshow(bacterial_spot_images[i+5])

## Splitting Train and Test

In [None]:
from keras.preprocessing.image import ImageDataGenerator

In [None]:
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(image_dir,
                                                    target_size=(256, 256),
                                                    batch_size=64,
                                                    class_mode='categorical')

In [None]:
train_generator.class_indices
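
`class_indices` maps each class folder name to an integer label. Keras documents that `flow_from_directory` infers classes from the subdirectory names and assigns indices in alphanumeric order. The sketch below approximates that mapping with Python's `sorted`. Only `Healthy` and `Bacterial_spot` appear in this notebook; the other three class names are placeholders chosen for illustration.

```python
def build_class_indices(class_names):
    """Approximate the name -> index mapping using sorted folder names."""
    return {name: index for index, name in enumerate(sorted(class_names))}

# 'Healthy' and 'Bacterial_spot' come from the notebook; the rest are hypothetical
class_names = ['Healthy', 'Bacterial_spot', 'Late_blight',
               'Early_blight', 'Leaf_Mold']

class_indices = build_class_indices(class_names)

# Inverse mapping, handy for turning predicted indices back into names
index_to_name = {index: name for name, index in class_indices.items()}

print(class_indices)
```

Because the order depends only on the folder names, a test directory with the same folder names should receive the same indices.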

In [None]:
image_shape = train_generator.image_shape

## Read the first batch

In [None]:
x_batch, y_batch = next(train_generator)

In [None]:
x_batch.shape

In [None]:
plt.imshow(x_batch[0])
plt.grid(False)
plt.show()

## Build the model for Classifying all tomato categories 

In [None]:
from tensorflow import keras

In [None]:
# import necessary building blocks
from keras.models import Sequential, Model
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Activation, Dropout, Input, ReLU

### Model: Convolutional Neural Network

**Architecture**

- Conv(16) -> MaxPool -> Conv(32) -> MaxPool -> Conv(64) -> MaxPool -> Flatten -> Dense(256) -> Dense(64) -> Dense(5) -> Softmax

**Optimizer**

- Adam
- Batch size = 64
- Epochs = 10

In [None]:
#tf.keras.backend.clear_session()

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(3,3), strides=1, padding='same', input_shape=image_shape))
model.add(ReLU())
                            
model.add(MaxPooling2D(pool_size=(3, 3)))

model.add(Conv2D(filters=32, kernel_size=(3,3), strides=1, padding='same'))
model.add(ReLU())

model.add(MaxPooling2D(pool_size=(3, 3)))

model.add(Conv2D(filters=64, kernel_size=(3,3), strides=1, padding='same'))
model.add(ReLU())

model.add(MaxPooling2D(pool_size=(3, 3)))

model.add(Flatten())
    
model.add(Dense(256))
model.add(ReLU())

model.add(Dense(64))
model.add(ReLU())

model.add(Dense(5))
model.add(Activation('softmax'))

In [None]:
model.summary()
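
The numbers reported by `model.summary()` can be cross-checked by hand. The sketch below redoes the arithmetic in plain Python, assuming a 256 x 256 RGB input (from `target_size=(256, 256)` and the generator's default color mode), `'same'` padding in the convolutions, and the Keras default for `MaxPooling2D`, where the stride equals the pool size and padding is `'valid'`. The helper names are illustrative.

```python
def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block per filter, plus one bias each
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # Full weight matrix plus one bias per unit
    return inputs * units + units

size, channels = 256, 3   # assumed input: 256 x 256 RGB
pool = 3
total = 0

for filters in (16, 32, 64):
    total += conv_params(3, channels, filters)  # 'same' padding keeps the spatial size
    channels = filters
    size = (size - pool) // pool + 1            # 'valid' pooling with stride = pool size

flatten_units = size * size * channels

inputs = flatten_units
for units in (256, 64, 5):
    total += dense_params(inputs, units)
    inputs = units

print(size, flatten_units, total)  # 9 5184 1367717
```

Under these assumptions, the first dense layer holds nearly all of the parameters (5184 x 256 weights), which is typical when a large feature map is flattened directly.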

In [None]:
model.compile(optimizer='adam',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])

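EPOCHS = 10

# Model.fit accepts generators directly; fit_generator is deprecated.
# Note: validation_data reuses the training generator, so the
# validation metrics are not a held-out estimate.
history = model.fit(train_generator,
                    steps_per_epoch=30,
                    epochs=EPOCHS,
                    validation_data=train_generator,
                    validation_steps=5)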
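
With `steps_per_epoch=30` and a batch size of 64, each epoch draws 30 x 64 = 1,920 images from the generator. The step count that covers a dataset exactly once is the ceiling of the sample count divided by the batch size. A minimal sketch follows; the sample count of 1,900 is a made-up figure, not the size of this dataset.

```python
import math

def steps_for_one_pass(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

# Hypothetical dataset size, for illustration only
print(steps_for_one_pass(1900, 64))  # 30

# In the notebook, the real count would come from the generator, e.g.
# steps_for_one_pass(train_generator.samples, train_generator.batch_size)
```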

### Function: train and validation accuracy plot

In [None]:
def plot_train_val_accuracy(hist):
    plt.plot(hist['accuracy'])
    plt.plot(hist['val_accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()

### Function: train and validation loss plot

In [None]:
def plot_train_val_loss(hist):
    plt.plot(hist['loss'])
    plt.plot(hist['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()

In [None]:
plot_train_val_accuracy(history.history)

In [None]:
plot_train_val_loss(history.history)
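
Besides reading the curves by eye, the history dictionary can be queried for the epoch with the best validation score. The sketch below assumes the same dictionary layout as `history.history`. The helper name and the numbers in `example_hist` are invented for illustration and are not results from this notebook.

```python
def best_epoch(hist, metric='val_accuracy'):
    """Return (1-based epoch number, value) of the highest entry for `metric`."""
    values = hist[metric]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

# Made-up history values, for illustration only
example_hist = {
    'accuracy':     [0.52, 0.71, 0.80, 0.86],
    'val_accuracy': [0.55, 0.70, 0.78, 0.76],
}

epoch, value = best_epoch(example_hist)
print(f"best val_accuracy {value:.2f} at epoch {epoch}")
```

A validation score that peaks and then falls while training accuracy keeps rising, as in the invented numbers above, is a common sign of overfitting.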

### Making Model Predictions

In [None]:
# Uncomment if you have already downloaded testset before
#!rm testset.zip
#!rm -r testset

In [None]:
!gdown 1T1r6If53dbS98Y5q1I2M7Q4kdVruTCcP -O testset.zip

In [None]:
!unzip -q testset.zip -d testset/

In [None]:
from keras.utils import image_dataset_from_directory

In [None]:
test_dir = '/content/testset/testset'

testdata = image_dataset_from_directory(test_dir,
                                        shuffle = True,
                                        image_size=(256, 256),
                                        labels="inferred",
                                        batch_size = 512,
                                        label_mode="int")

In [None]:
testdata.class_names

In [None]:
for image_batch, labels_batch in testdata:
  print(image_batch.shape)
  print(labels_batch.shape)
  break

In [None]:
image_batch = image_batch/255.0

In [None]:
y_prob_test = model.predict(image_batch)

In [None]:
y_prob_test[0:10]

In [None]:
y_test = labels_batch.numpy()

In [None]:
y_pred = np.argmax(y_prob_test, axis=1)
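
`np.argmax(..., axis=1)` picks, for each image, the column with the highest softmax probability. The small sketch below shows the step on two invented probability rows and maps the indices back to names. The probabilities and three of the five class names are placeholders, not output from this model.

```python
import numpy as np

# Two made-up softmax rows (each sums to 1), one row per image
probs = np.array([[0.05, 0.10, 0.70, 0.10, 0.05],
                  [0.60, 0.10, 0.10, 0.10, 0.10]])

# Placeholder class names in index order; only two of them appear in the notebook
label_names = ['Bacterial_spot', 'Early_blight', 'Healthy',
               'Late_blight', 'Leaf_Mold']

pred_idx = np.argmax(probs, axis=1)              # index of the largest value in each row
pred_labels = [label_names[i] for i in pred_idx]
confidence = probs.max(axis=1)                   # probability assigned to the predicted class

print(pred_idx, pred_labels, confidence)
```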

In [None]:
np_label_names = list(train_generator.class_indices.keys())

In [None]:
np_label_names

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

In [None]:
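# Pass all class indices so the matrix keeps one row/column per class, even if a class is absent from this batch
cm = confusion_matrix(y_test, y_pred, labels=np.arange(len(np_label_names)))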

In [None]:
fig, ax = plt.subplots(figsize=(8,6))
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=np_label_names)
# Draw on the prepared axes; rotate tick labels so long class names stay readable
disp.plot(ax=ax, colorbar=True, xticks_rotation=45);
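
The confusion matrix also yields per-class precision and recall. In scikit-learn's convention, rows are true classes and columns are predicted classes, so the diagonal holds the correct predictions. The following is a minimal sketch on a made-up 2 x 2 matrix; the helper name and the numbers are illustrative only.

```python
import numpy as np

def per_class_metrics(cm):
    """Return (precision, recall) arrays from a confusion matrix (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how many samples truly belong to each class
    # Classes that were never predicted (or never present) get 0 instead of a division error
    precision = np.divide(true_pos, predicted,
                          out=np.zeros_like(true_pos), where=predicted > 0)
    recall = np.divide(true_pos, actual,
                       out=np.zeros_like(true_pos), where=actual > 0)
    return precision, recall

# Made-up matrix, for illustration only
example_cm = [[8, 2],
              [1, 9]]
precision, recall = per_class_metrics(example_cm)
accuracy = np.trace(np.asarray(example_cm)) / np.sum(example_cm)
print(precision, recall, accuracy)
```

In the notebook, `per_class_metrics(cm)` could be paired with `np_label_names` to see which diseases the model confuses most often.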