<h1> Cassava Leaf Disease Classification </h1>
<h2>Identify the type of disease present on a Cassava Leaf image </h2>
<img src="https://cookingwithoutborders.files.wordpress.com/2011/12/cassava-leaves.jpg"> <br>

<i> Cassava is a root vegetable. It is the underground part of the cassava shrub, which has the Latin name Manihot esculenta. Like potatoes and yams, it is a tuber crop. Cassava roots have a similar shape to sweet potatoes.

People can also eat the leaves of the cassava plant. Humans living along the banks of the Amazon River in South America grew and consumed cassava hundreds of years before Christopher Columbus first voyaged there.

Today, more than 80 countries throughout the tropics grow cassava, and it is a primary component of the diet of more than 800 million people around the world. It is popular because it is a hardy crop that is resistant to drought and does not require much fertilizer, although it is vulnerable to bacterial and viral diseases. </i>

<h3> Usage </h3>
<i>Cassava is a rich, affordable source of carbohydrates. It can provide more calories per acre than cereal grain crops, which makes it a very useful crop in the developing world.

People prepare and eat cassava in various ways in different parts of the world, with baking and boiling being the most common methods. In some places, people ferment cassava before using it.

It is essential to peel cassava and never eat it raw. It contains dangerous levels of cyanide unless a person cooks it thoroughly before eating it.</i>

Dishes that people can make using cassava include:

* bread, which can contain cassava flour only, or both cassava and wheat flour
* French fries
* mashed cassava
* cassava chips
* cassava bread soaked in coconut milk
* cassava cake
* cassava in coconut sauce
* yuca con mojo, a Cuban dish that combines cassava with a sauce comprising citrus juices, garlic, onion, cilantro, cumin, and oregano

In addition to eating cassava, people also use it for:

* making tapioca, which is a common dessert food
* making starch and flour products, which people can use to make gluten-free bread
* feeding animals
* making medications, fabrics, paper, and building materials, such as plywood

Scientists may eventually be able to replace high-fructose corn syrup with cassava starch. Researchers are also hoping that cassava could be a source of the alcohol that manufacturers use to make polystyrene, PVC, and other industrial products. <br>
Source: [Medical News Today](https://www.medicalnewstoday.com/articles/323756)

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import os
import json
import cv2
import itertools

import matplotlib.pyplot as plt
%matplotlib inline
from tqdm import tqdm
from PIL import Image

In [None]:
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

# Import everything from the tf.keras namespace so that layers, optimizers,
# and callbacks come from the same Keras implementation as the model below.
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Dense, Dropout, Input, Flatten,
                                     BatchNormalization, Activation, GlobalMaxPooling2D)
from tensorflow.keras.optimizers import Adam, SGD, RMSprop
from tensorflow.keras.callbacks import ModelCheckpoint, Callback, EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.utils import to_categorical, plot_model

In [None]:
# Defining data path
IMAGE_PATH = "../input/cassava-leaf-disease-classification/"
WORK_DIR = './'

train_df = pd.read_csv(os.path.join(IMAGE_PATH, "train.csv"))
submission_csv = pd.read_csv(os.path.join(IMAGE_PATH, "sample_submission.csv"))


#Training data
print('Training data shape: ', train_df.shape)
train_df.head(5)


In [None]:
with open(os.path.join('../input/cassava-leaf-disease-classification', "label_num_to_disease_map.json")) as file:
    map_classes = json.loads(file.read())
    
print(json.dumps(map_classes))

In [None]:
train_df['Classes'] = train_df['label'].astype(str).map(map_classes)
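
The `.astype(str)` cast matters because JSON object keys are always strings, so the integer labels from `train.csv` would not match the map directly. A minimal sketch with a made-up two-entry map (the class names below are placeholders, not the competition's labels):

```python
import json

# Hypothetical two-class map standing in for label_num_to_disease_map.json
raw_json = '{"0": "class_a", "1": "class_b"}'
label_map = json.loads(raw_json)

labels = [0, 1, 0]  # integer labels, as stored in the CSV

# Integer keys miss because the JSON keys are strings ...
missed = [label_map.get(label) for label in labels]
# ... while string keys hit.
named = [label_map.get(str(label)) for label in labels]

print(missed)
print(named)
```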

In [None]:
# Total number of images in the training set
print("Total images in Train set: ",train_df['image_id'].count())

In [None]:
# Null values and Data types
print('Train Set')
train_df.info()  # info() prints its report itself and returns None
print('-------------')

In [None]:
plt.figure(figsize=(17, 5))
sns.countplot(x="Classes", data=train_df);
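
The bar chart shows the class balance visually; the same information as fractions can be easier to compare. The sketch below is one possible way to compute it. The helper name `class_proportions` and the toy labels are assumptions for illustration, not part of the notebook's data.

```python
from collections import Counter

def class_proportions(labels):
    """Return {label: fraction of samples}, ordered from most to least frequent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}

# Toy labels for illustration only; in the notebook this would be train_df['Classes'].
toy_labels = ["CMD"] * 6 + ["Healthy"] * 3 + ["CBB"] * 1
proportions = class_proportions(toy_labels)
for name, share in proportions.items():
    print(f"{name:10s} {share:.1%}")
```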

In [None]:
images = train_df['image_id'].values

# Extract 9 random images from it
random_images = [np.random.choice(images) for i in range(9)]

# Location of the image dir
img_dir = os.path.join(IMAGE_PATH, 'train_images')

print('Display Random Images')

# Adjust the size of your images
plt.figure(figsize=(10,8))

# Iterate and plot random images
for i in range(9):
    plt.subplot(3, 3, i + 1)
    img = plt.imread(os.path.join(img_dir, random_images[i]))
    plt.imshow(img, cmap='gray')
    plt.axis('off')
    
# Adjust subplot parameters to give specified padding
plt.tight_layout()

In [None]:
CBB = train_df[train_df['Classes']=='Cassava Bacterial Blight (CBB)']
healthy = train_df[train_df['Classes']=='Healthy']
CBSD = train_df[train_df['Classes']=='Cassava Brown Streak Disease (CBSD)']
CMD = train_df[train_df['Classes']=='Cassava Mosaic Disease (CMD)']
CGM = train_df[train_df['Classes']=='Cassava Green Mottle (CGM)']


In [None]:
images = CBB['image_id'].values

# Extract 9 random images from it
random_images = [np.random.choice(images) for i in range(9)]

# Location of the image dir
img_dir = os.path.join(IMAGE_PATH, 'train_images')

print('Display CBB Images')

# Adjust the size of your images
plt.figure(figsize=(10,8))

# Iterate and plot random images
for i in range(9):
    plt.subplot(3, 3, i + 1)
    img = plt.imread(os.path.join(img_dir, random_images[i]))
    plt.imshow(img, cmap='gray')
    plt.axis('off')
    
# Adjust subplot parameters to give specified padding
plt.tight_layout()

In [None]:
images = healthy['image_id'].values

# Extract 9 random images from it
random_images = [np.random.choice(images) for i in range(9)]

# Location of the image dir
img_dir = os.path.join(IMAGE_PATH, 'train_images')

print('Display Healthy Images')

# Adjust the size of your images
plt.figure(figsize=(10,8))

# Iterate and plot random images
for i in range(9):
    plt.subplot(3, 3, i + 1)
    img = plt.imread(os.path.join(img_dir, random_images[i]))
    plt.imshow(img, cmap='gray')
    plt.axis('off')
    
# Adjust subplot parameters to give specified padding
plt.tight_layout()

In [None]:
images = CBSD['image_id'].values

# Extract 9 random images from it
random_images = [np.random.choice(images) for i in range(9)]

# Location of the image dir
img_dir = os.path.join(IMAGE_PATH, 'train_images')

print('Display CBSD Images')

# Adjust the size of your images
plt.figure(figsize=(10,8))

# Iterate and plot random images
for i in range(9):
    plt.subplot(3, 3, i + 1)
    img = plt.imread(os.path.join(img_dir, random_images[i]))
    plt.imshow(img, cmap='gray')
    plt.axis('off')
    
# Adjust subplot parameters to give specified padding
plt.tight_layout()

In [None]:
images = CMD['image_id'].values

# Extract 9 random images from it
random_images = [np.random.choice(images) for i in range(9)]

# Location of the image dir
img_dir = os.path.join(IMAGE_PATH, 'train_images')

print('Display CMD Images')

# Adjust the size of your images
plt.figure(figsize=(10,8))

# Iterate and plot random images
for i in range(9):
    plt.subplot(3, 3, i + 1)
    img = plt.imread(os.path.join(img_dir, random_images[i]))
    plt.imshow(img, cmap='gray')
    plt.axis('off')
    
# Adjust subplot parameters to give specified padding
plt.tight_layout()

In [None]:
images = CGM['image_id'].values

# Extract 9 random images from it
random_images = [np.random.choice(images) for i in range(9)]

# Location of the image dir
img_dir = os.path.join(IMAGE_PATH, 'train_images')

print('Display CGM Images')

# Adjust the size of your images
plt.figure(figsize=(10,8))

# Iterate and plot random images
for i in range(9):
    plt.subplot(3, 3, i + 1)
    img = plt.imread(os.path.join(img_dir, random_images[i]))
    plt.imshow(img, cmap='gray')
    plt.axis('off')
    
# Adjust subplot parameters to give specified padding
plt.tight_layout()
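
The grid cells above draw each image with an independent `np.random.choice` call, so the same image can appear more than once in a grid. A minimal sketch of a sampling helper that avoids repeats is shown below; `sample_image_ids` is a hypothetical name, and the ids are toy values.

```python
import random

def sample_image_ids(image_ids, n=9, seed=None):
    """Pick up to n distinct ids, so no image repeats within one grid."""
    ids = list(image_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(n, len(ids)))

# Toy ids for illustration; in the notebook this would be e.g. CBB['image_id'].values
toy_ids = [f"img_{i}.jpg" for i in range(20)]
picked = sample_image_ids(toy_ids, n=9, seed=0)
print(picked)
```

Passing a `seed` makes the selection repeatable between runs, which helps when comparing classes side by side.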

In [None]:
f = plt.figure(figsize=(16,8))
f.add_subplot(1,2, 1)

sample_img = train_df['image_id'][0]
raw_image = plt.imread(os.path.join(img_dir, sample_img))
plt.imshow(raw_image, cmap='gray')
plt.colorbar()
plt.title('Image')
print(f"Image dimensions:  {raw_image.shape[0],raw_image.shape[1]}")
print(f"Maximum pixel value : {raw_image.max():.1f} ; Minimum pixel value:{raw_image.min():.1f}")
print(f"Mean value of the pixels : {raw_image.mean():.1f} ; Standard deviation : {raw_image.std():.1f}")

f.add_subplot(1,2, 2)

#_ = plt.hist(raw_image.ravel(),bins = 256, color = 'orange',)
_ = plt.hist(raw_image[:, :, 0].ravel(), bins = 256, color = 'red', alpha = 0.5)
_ = plt.hist(raw_image[:, :, 1].ravel(), bins = 256, color = 'Green', alpha = 0.5)
_ = plt.hist(raw_image[:, :, 2].ravel(), bins = 256, color = 'Blue', alpha = 0.5)
_ = plt.xlabel('Intensity Value')
_ = plt.ylabel('Count')
_ = plt.legend(['Red_Channel', 'Green_Channel', 'Blue_Channel'])
plt.show()

In [None]:
f = plt.figure(figsize=(16,8))
f.add_subplot(1,2, 1)

sample_img = train_df['image_id'][1]
raw_image = plt.imread(os.path.join(img_dir, sample_img))
plt.imshow(raw_image, cmap='gray')
plt.colorbar()
plt.title('Image')
print(f"Image dimensions:  {raw_image.shape[0],raw_image.shape[1]}")
print(f"Maximum pixel value : {raw_image.max():.1f} ; Minimum pixel value:{raw_image.min():.1f}")
print(f"Mean value of the pixels : {raw_image.mean():.1f} ; Standard deviation : {raw_image.std():.1f}")

f.add_subplot(1,2, 2)

#_ = plt.hist(raw_image.ravel(),bins = 256, color = 'orange',)
_ = plt.hist(raw_image[:, :, 0].ravel(), bins = 256, color = 'red', alpha = 0.5)
_ = plt.hist(raw_image[:, :, 1].ravel(), bins = 256, color = 'Green', alpha = 0.5)
_ = plt.hist(raw_image[:, :, 2].ravel(), bins = 256, color = 'Blue', alpha = 0.5)
_ = plt.xlabel('Intensity Value')
_ = plt.ylabel('Count')
_ = plt.legend(['Red_Channel', 'Green_Channel', 'Blue_Channel'])
plt.show()
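
The two cells above report one mean and one standard deviation over all pixels, while the histograms are drawn per channel. A per-channel summary could be sketched as follows; `channel_stats` is a hypothetical helper and the image is synthetic.

```python
import numpy as np

def channel_stats(image):
    """Return per-channel (mean, std) arrays for an H x W x 3 image."""
    pixels = np.asarray(image, dtype=np.float64).reshape(-1, 3)
    return pixels.mean(axis=0), pixels.std(axis=0)

# Synthetic 4x4 image: red fixed at 10, green at 20, blue alternating 0 / 255 by row
toy = np.zeros((4, 4, 3), dtype=np.uint8)
toy[..., 0] = 10
toy[..., 1] = 20
toy[::2, :, 2] = 255

means, stds = channel_stats(toy)
print("mean per channel:", means)
print("std per channel: ", stds)
```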

In [None]:
IMG_SIZE = 320

# Build a DataFrame of full image paths and their integer labels
df = pd.DataFrame({
    'images': img_dir + '/' + train_df['image_id'],
    'target': train_df['label'],
})

In [None]:
X_train, X_val, y_train, y_val = train_test_split(df['images'],df['target'], test_size=0.2, random_state=1234)

train=pd.DataFrame(X_train)
train.columns=['images']
train['target']=y_train
train['target'] = train['target'].astype('string')


validation=pd.DataFrame(X_val)
validation.columns=['images']
validation['target']=y_val
validation['target'] = validation['target'].astype('string')


In [None]:
# The train/validation split is already done by train_test_split above, so the
# generators do not need validation_split (it only takes effect with `subset`).
train_datagen = ImageDataGenerator(rotation_range=360,
                                width_shift_range=0.1,
                                height_shift_range=0.1,
                                brightness_range=[0.2,1.5],
                                shear_range=25,
                                zoom_range=0.3,
                                channel_shift_range=0.1,
                                horizontal_flip=True,
                                vertical_flip=True,
                                rescale=1/255)
# Validation images are only rescaled, never augmented
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_dataframe(
    train, x_col='images', y_col='target',
    target_size=(IMG_SIZE, IMG_SIZE), batch_size=32, class_mode='categorical')
validation_generator = test_datagen.flow_from_dataframe(
    validation, x_col='images', y_col='target',
    target_size=(IMG_SIZE, IMG_SIZE), batch_size=32, class_mode='categorical')

In [None]:
# Initialising the CNN
model = tf.keras.models.Sequential()

# Step 1 - Convolution
model.add(tf.keras.layers.Conv2D(filters=32,padding="same",kernel_size=3, activation='relu', input_shape=[IMG_SIZE, IMG_SIZE, 3]))

# Step 2 - Pooling
model.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# Adding a second convolutional layer
model.add(tf.keras.layers.Conv2D(filters=64,padding='same',kernel_size=3, activation='relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# Adding a third convolutional layer
model.add(tf.keras.layers.Conv2D(filters=128,padding='same',kernel_size=3, activation='relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))


# Adding a fourth convolutional layer
model.add(tf.keras.layers.Conv2D(filters=256,padding='same',kernel_size=3, activation='relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# Adding a fifth convolutional layer
model.add(tf.keras.layers.Conv2D(filters=256,padding='same',kernel_size=3, activation='relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# Adding a sixth convolutional layer
model.add(tf.keras.layers.Conv2D(filters=256,padding='same',kernel_size=3, activation='relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# Adding a seventh convolutional layer
model.add(tf.keras.layers.Conv2D(filters=512,padding='same',kernel_size=3, activation='relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

model.add(tf.keras.layers.Dropout(0.5))



# Step 3 - Flattening
model.add(tf.keras.layers.Flatten())

# Step 4 - Full Connection
model.add(tf.keras.layers.Dense(units=512, activation='relu'))

model.add(tf.keras.layers.Dropout(0.5))

model.add(tf.keras.layers.Dense(units=1024, activation='relu'))

model.add(tf.keras.layers.Dropout(0.5))


# Step 5 - Output Layer
model.add(tf.keras.layers.Dense(units=5, activation='softmax')) 
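
To see what reaches the `Flatten` layer, the spatial size can be traced by hand: each `'same'`-padded convolution keeps height and width, and each 2x2 pool with stride 2 halves them (rounding down). The sketch below does that arithmetic for the filter counts used above; `trace_feature_maps` is a hypothetical helper, not a Keras function.

```python
def trace_feature_maps(input_size, filters_per_block, pool_size=2):
    """Follow the spatial size through conv('same') + max-pool blocks."""
    size = input_size
    shapes = []
    for filters in filters_per_block:
        # The 'same'-padded stride-1 convolution leaves the size unchanged;
        # the stride-2 pool with default 'valid' padding floors the division.
        size = size // pool_size
        shapes.append((size, size, filters))
    return shapes

shapes = trace_feature_maps(320, [32, 64, 128, 256, 256, 256, 512])
for shape in shapes:
    print(shape)

height, width, channels = shapes[-1]
flattened_units = height * width * channels
print("units entering the first Dense layer:", flattened_units)
```

The result should agree with the output shapes that `model.summary()` reports for the pooling layers.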

In [None]:
model.summary()
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

In [None]:
opt = Adam(learning_rate=0.001)  # `lr` is a deprecated alias
model.compile(loss = 'categorical_crossentropy', metrics=['acc'],optimizer=opt)

In [None]:
nb_epochs = 100
batch_size=32
nb_train_steps = train.shape[0]//batch_size
nb_val_steps=validation.shape[0]//batch_size
print("Number of training and validation steps: {} and {}".format(nb_train_steps,nb_val_steps))
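
Floor division means the final partial batch is not counted, so up to `batch_size - 1` samples are left out of each pass. A small sketch of the two options, with an illustrative sample count rather than the real split size (`steps_per_epoch` here is a hypothetical helper):

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_remainder=True):
    """Number of batches used to cover num_samples once."""
    if drop_remainder:
        return num_samples // batch_size        # skips the final partial batch
    return math.ceil(num_samples / batch_size)  # includes it

# Illustrative sample count, not the real split size
floor_steps = steps_per_epoch(1000, 32)
ceil_steps = steps_per_epoch(1000, 32, drop_remainder=False)
print(floor_steps, ceil_steps)
```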

In [None]:
# Adapted  from - https://gist.github.com/swanandM/260f73ec7c89a2fb540e37169ba728bc
def plot_history(history):
    loss_list = [s for s in history.history.keys() if 'loss' in s and 'val' not in s]
    val_loss_list = [s for s in history.history.keys() if 'loss' in s and 'val' in s]
    acc_list = [s for s in history.history.keys() if 'acc' in s and 'val' not in s]
    val_acc_list = [s for s in history.history.keys() if 'acc' in s and 'val' in s]
    
    if len(loss_list) == 0:
        print('Loss is missing in history')
        return 
    
    ## As loss always exists
    epochs = range(1,len(history.history[loss_list[0]]) + 1)
    
    ## Loss
    plt.figure(1)
    for l in loss_list:
        plt.plot(epochs, history.history[l], 'b', label='Training loss (' + str(format(history.history[l][-1],'.5f'))+')')
    for l in val_loss_list:
        plt.plot(epochs, history.history[l], 'g', label='Validation loss (' + str(format(history.history[l][-1],'.5f'))+')')
    
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    
    ## Accuracy
    plt.figure(2)
    for l in acc_list:
        plt.plot(epochs, history.history[l], 'b', label='Training accuracy (' + str(format(history.history[l][-1],'.5f'))+')')
    for l in val_acc_list:    
        plt.plot(epochs, history.history[l], 'g', label='Validation accuracy (' + str(format(history.history[l][-1],'.5f'))+')')

    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.show()


In [None]:
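cb = [
    EarlyStopping(patience=5, verbose=1, restore_best_weights=True),
    ReduceLROnPlateau(patience=2, verbose=1),
    # Save to an explicit file rather than the bare working directory;
    # the file name is an arbitrary choice.
    ModelCheckpoint(filepath=os.path.join(WORK_DIR, 'best_model.h5'), save_best_only=True),
]
# model.fit accepts generators directly; fit_generator is deprecated
history = model.fit(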
    train_generator,
    steps_per_epoch=nb_train_steps,
    epochs=nb_epochs,
    validation_data=validation_generator,
    callbacks=cb,
    validation_steps=nb_val_steps)

In [None]:
plot_history(history)

In [None]:
target = []
for image_id in submission_csv.image_id:
    img = cv2.imread(os.path.join(IMAGE_PATH, "test_images", str(image_id)))
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Apply the same 1/255 rescaling that the generators used during training
    img = img.astype(np.float32) / 255.0
    img = np.reshape(img, (1, IMG_SIZE, IMG_SIZE, 3))
    target.append(np.argmax(model.predict(img)))


submission_csv['label'] = target
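
The training and validation generators rescale pixels by 1/255, so images passed to `model.predict` need the same scaling and a leading batch dimension. A minimal sketch of that step on a synthetic array is shown below; `to_model_input` is a hypothetical helper name.

```python
import numpy as np

def to_model_input(image_rgb, scale=1.0 / 255.0):
    """Convert an H x W x 3 uint8 RGB image to a float32 batch of one in [0, 1]."""
    batch = np.asarray(image_rgb, dtype=np.float32) * scale
    return np.expand_dims(batch, axis=0)

# Synthetic 320x320 image containing both extreme pixel values
toy = np.zeros((320, 320, 3), dtype=np.uint8)
toy[0, 0] = 255

batch = to_model_input(toy)
print(batch.shape, batch.dtype, batch.min(), batch.max())
```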

In [None]:
submission_csv.to_csv('submission.csv', index = False)