Overview:

***As the second-largest provider of carbohydrates in Africa, cassava is a key food security crop grown by smallholder farmers because it can withstand harsh conditions. At least 80% of household farms in Sub-Saharan Africa grow this starchy root, but viral diseases are major sources of poor yields. With the help of data science, it may be possible to identify common diseases so they can be treated.

Existing methods of disease detection require farmers to solicit the help of government-funded agricultural experts to visually inspect and diagnose the plants. This approach is labor-intensive, in short supply, and costly. As an added challenge, effective solutions for farmers must perform well under significant constraints, since African farmers may only have access to mobile-quality cameras and low-bandwidth connections.

In this competition, we introduce a dataset of 21,367 labeled images collected during a regular survey in Uganda. Most images were crowdsourced from farmers taking photos of their gardens, and annotated by experts at the National Crops Resources Research Institute (NaCRRI) in collaboration with the AI lab at Makerere University, Kampala. This is in a format that most realistically represents what farmers would need to diagnose in real life.

Your task is to classify each cassava image into one of four disease categories or a fifth category indicating a healthy leaf. With your help, farmers may be able to quickly identify diseased plants, potentially saving their crops before the diseases inflict irreparable damage.***

In [None]:
# Let's check which GPU is provided
!nvidia-smi 

In [None]:
# from kaggle_secrets import UserSecretsClient
# user_secrets = UserSecretsClient()
# user_credential = user_secrets.get_gcloud_credential()
# user_secrets.set_tensorflow_credential(user_credential)

In [None]:
!ls /kaggle/input

In [None]:
# Import all the libraries
import os
import numpy as np 
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import applications
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import warnings
import json
import cv2

warnings.filterwarnings('ignore')
%matplotlib inline

IMAGE_SIZE=128

In [None]:
# Let's check the TensorFlow version
tf.__version__

In [None]:
# Make sure a GPU is available before going any further
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

In [None]:
# Let's set the parent directory of the competition data
PARENT_DIR = '../input/cassava-leaf-disease-classification'

In [None]:
# List the folders and files
print(os.listdir(PARENT_DIR))

In [None]:
# Load the training and sample-submission CSVs
train_df = pd.read_csv(os.path.join(PARENT_DIR,'train.csv'))
sample_df = pd.read_csv(os.path.join(PARENT_DIR,'sample_submission.csv'))

In [None]:
# Read the JSON file and check the mapping from label numbers to disease names
with open(os.path.join(PARENT_DIR, "label_num_to_disease_map.json")) as jfile:
    map_classes = json.load(jfile)

print(map_classes)

In [None]:
# Take a look at the training CSV
train_df.head()

In [None]:
# Let's take a look at the images: collect the first 10 image IDs of each class
train_df_0 = train_df[train_df['label']==0].head(10).image_id
train_df_1 = train_df[train_df['label']==1].head(10).image_id
train_df_2 = train_df[train_df['label']==2].head(10).image_id
train_df_3 = train_df[train_df['label']==3].head(10).image_id
train_df_4 = train_df[train_df['label']==4].head(10).image_id

In [None]:
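def show_image(img_dir):
    """Plot up to 10 training images (2 rows x 5 columns) given their file names."""
    train_dir = os.path.join(PARENT_DIR, 'train_images')
    plt.figure(figsize=(20,10))
    for i, img_name in enumerate(img_dir, start=1):
        # cv2.imread returns BGR data; convert to RGB so matplotlib shows true colours
        img = cv2.imread(os.path.join(train_dir, img_name))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE), interpolation=cv2.INTER_NEAREST)
        plt.subplot(2, 5, i)
        plt.imshow(img)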

In [None]:
# Images for --Cassava Bacterial Blight (CBB)
show_image(train_df_0)

In [None]:
# Images for --Cassava Brown Streak Disease (CBSD)
show_image(train_df_1)

In [None]:
# Images for --Cassava Green Mottle (CGM)
show_image(train_df_2)

In [None]:
# Images for --Cassava Mosaic Disease (CMD)
show_image(train_df_3)

In [None]:
# Images for --Healthy
show_image(train_df_4)

In [None]:
# Mapping numbers with labels
train_df['label'] = train_df.label.map({0: 'Cassava Bacterial Blight (CBB)',
                    1: 'Cassava Brown Streak Disease (CBSD)',
                    2: 'Cassava Green Mottle (CGM)',
                    3: 'Cassava Mosaic Disease (CMD)', 
                    4: 'Healthy'})

In [None]:
# Check for class imbalance; it will be handled with augmentation in ImageDataGenerator
plt.figure(figsize=(20,10))
sns.countplot(x=train_df['label'])
plt.show()
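
The count plot makes the imbalance visible. Besides augmentation, one common alternative is to weight the loss per class. The sketch below is only an illustration and not part of the original pipeline: it uses the "balanced" heuristic n_samples / (n_classes * class_count), and the helper name `balanced_class_weights` and the toy label list are made up for the example.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Toy labels with hypothetical counts, NOT the real dataset distribution
toy_labels = ['CMD'] * 6 + ['Healthy'] * 3 + ['CBB'] * 1
toy_weights = balanced_class_weights(toy_labels)

# Rare classes receive larger weights than frequent ones
for label, weight in sorted(toy_weights.items()):
    print(f"{label}: {weight:.3f}")
```

In Keras, a dictionary of this kind (keyed by class index rather than by name) could be passed to `model.fit` through its `class_weight` argument.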

In [None]:
# Create the training and validation generators (25% of the data is held out for validation)
datagen = ImageDataGenerator(
                    rotation_range = 40,
                    width_shift_range = 0.2,
                    height_shift_range = 0.2,
                    shear_range = 0.2,
                    zoom_range = 0.2,
                    horizontal_flip = True,
                    vertical_flip = True,
                    fill_mode = 'nearest',
                    validation_split=0.25
                    )

train_generator=datagen.flow_from_dataframe(
                    dataframe=train_df,
                    directory="../input/cassava-leaf-disease-classification/train_images/",
                    x_col="image_id",
                    y_col="label",
                    subset="training",
                    batch_size=32,
                    seed=42,
                    shuffle=True,
                    class_mode = 'categorical',
                    color_mode='rgb',
                    target_size=(IMAGE_SIZE,IMAGE_SIZE)
                    )


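# Validation images are read through a generator without augmentation;
# the same validation_split keeps the train/validation partition consistent
val_datagen = ImageDataGenerator(validation_split=0.25)
val_generator=val_datagen.flow_from_dataframe(
                    dataframe=train_df,
                    directory="../input/cassava-leaf-disease-classification/train_images/",
                    x_col="image_id",
                    y_col="label",
                    subset="validation",
                    batch_size=32,
                    seed=42,
                    shuffle=True,
                    class_mode="categorical",
                    color_mode='rgb',
                    target_size=(IMAGE_SIZE,IMAGE_SIZE)
                    )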

In [None]:
# Mapping numbers with test labels
sample_df['label'] = sample_df.label.map({0: 'Cassava Bacterial Blight (CBB)',
                    1: 'Cassava Brown Streak Disease (CBSD)',
                    2: 'Cassava Green Mottle (CGM)',
                    3: 'Cassava Mosaic Disease (CMD)', 
                    4: 'Healthy'})

In [None]:
'''
test_datagen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255.)
test_generator=test_datagen.flow_from_dataframe(
    dataframe=sample_df,
    directory='../input/cassava-leaf-disease-classification/test_images/',
    x_col="image_id",
    y_col=None,
    batch_size=32,
    seed=42,
    shuffle=False,
    class_mode=None,
    target_size=(IMAGE_SIZE,IMAGE_SIZE)
)
'''

In [None]:
# Let's check the shape of one batch
for img,lab in train_generator:
#     print(lab)
    print(img.shape)
    break

In [None]:
# Define callbacks: keep the best checkpoint (lowest validation loss)
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    "EfficientNetB7_Model.h5",
    save_best_only=True,
    monitor = 'val_loss',
    mode='min'
)
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor = 'val_loss',
                                  factor = 0.3,
                                  patience = 3,
                                  min_lr = 1e-5,
                                  mode = 'min',
                                  verbose = 1)

early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    mode='min', 
    patience=5,
    restore_best_weights=True, 
    verbose=1
)
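
To make the interplay of `factor`, `patience` and `min_lr` in `ReduceLROnPlateau` more concrete, here is a simplified, pure-Python simulation. It is a sketch under assumptions, not the Keras implementation: it ignores `min_delta` and `cooldown`, the function name `simulate_reduce_lr` is made up, and the validation losses are invented for illustration.

```python
def simulate_reduce_lr(val_losses, lr=1e-3, factor=0.3, patience=3, min_lr=1e-5):
    """Return the learning rate used in each epoch for a list of validation losses."""
    best = float("inf")
    wait = 0
    lr_per_epoch = []
    for loss in val_losses:
        lr_per_epoch.append(lr)  # learning rate in effect during this epoch
        if loss < best:
            # Validation loss improved: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the rate, but never below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
    return lr_per_epoch

# Invented losses: two improving epochs followed by a plateau
example_lrs = simulate_reduce_lr([1.0, 0.9, 0.9, 0.9, 0.9, 0.9])
print(example_lrs)

# A very long plateau shows that the rate is clamped at min_lr
long_plateau_lrs = simulate_reduce_lr([1.0] * 40)
print(min(long_plateau_lrs))
```

With `patience=3` here and `patience=5` for early stopping, a single stalled stretch would typically trigger one learning-rate reduction before training stops.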

In [None]:
# Perform training
BATCH_SIZE = 32  # must match the batch_size used by the generators
with tf.device('/gpu:0'):
    model = tf.keras.Sequential([
        tf.keras.applications.EfficientNetB7(
            input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
            # Load the pretrained "no top" weights from the attached dataset
            weights='../input/tfkerasefficientnetimagenetnotop/efficientnetb7_notop.h5',
            include_top=False
        #    drop_connect_rate=0.7
        ),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512,
                              activation = 'relu', 
                              bias_regularizer=tf.keras.regularizers.l1_l2(l1=0.01,
                                                                           l2=0.001)),
        tf.keras.layers.Dropout(0.7),
        tf.keras.layers.Dense(5, activation='softmax')
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate = 1e-3),
        loss='categorical_crossentropy',
        metrics=['categorical_accuracy'])
    # The generators already yield batches, so batch_size is not passed to fit();
    # the step counts must be whole numbers, hence the ceiling
    history = model.fit(
            train_generator,
            steps_per_epoch = int(np.ceil(train_generator.n / BATCH_SIZE)),
            epochs=20,
            validation_data=val_generator,
            validation_steps = int(np.ceil(val_generator.n / BATCH_SIZE)),
            callbacks=[checkpoint_cb,reduce_lr,early_stopping_cb])
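
The number of steps per epoch is the number of batches needed to pass over the data once, that is, the sample count divided by the batch size, rounded up to a whole number. A minimal sketch of that arithmetic follows; the helper name `steps_for` and the sample count of 16,050 are hypothetical and chosen only for illustration.

```python
import math

def steps_for(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)

# Hypothetical sample count, not the actual size of the training subset
example_steps = steps_for(16050, 32)
print(example_steps)
```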

In [None]:
# Plotting accuracy history
plt.figure(figsize= (15,10))
plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])
plt.title('Accuracy Tracker', fontsize=15)
plt.xlabel('Epochs', fontsize=15)
plt.ylabel('Accuracy', fontsize=15)
plt.legend(['training', 'validation'])

In [None]:
# Plotting loss history
plt.figure(figsize= (15,10))
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Loss Tracker', fontsize=15)
plt.xlabel('Epochs', fontsize=15)
plt.ylabel('Loss', fontsize=15)
plt.legend(['training', 'validation'])

In [None]:
# Predict a label for every test image and collect the rows for the submission file
test_dir = os.path.join(PARENT_DIR, 'test_images')
rows = []
for image_name in os.listdir(test_dir):
    image_path = os.path.join(test_dir, image_name)
    # Load at the training resolution; pixel values stay in [0, 255] as during training
    image = tf.keras.preprocessing.image.load_img(image_path, target_size=(IMAGE_SIZE, IMAGE_SIZE))
    batch = np.expand_dims(tf.keras.preprocessing.image.img_to_array(image), 0)
    # predict_classes is gone from recent TensorFlow releases; argmax over the softmax output is equivalent
    label = int(np.argmax(model.predict(batch), axis=-1)[0])
    rows.append({'image_id': image_name, 'label': label})
submission = pd.DataFrame(rows, columns=['image_id', 'label'])
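
The model ends in a 5-way softmax, so each prediction is a vector of five probabilities, and the submitted label is the index of the largest one. The sketch below shows that decoding step in plain Python. The probability vector is invented for the example, `decode_prediction` is a hypothetical helper, and the label names are the ones mapped earlier in this notebook.

```python
LABELS = {
    0: 'Cassava Bacterial Blight (CBB)',
    1: 'Cassava Brown Streak Disease (CBSD)',
    2: 'Cassava Green Mottle (CGM)',
    3: 'Cassava Mosaic Disease (CMD)',
    4: 'Healthy',
}

def decode_prediction(probabilities):
    """Return (class_index, class_name) for the most probable class."""
    class_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_index, LABELS[class_index]

# Invented softmax output for a single image
example_probs = [0.05, 0.10, 0.05, 0.70, 0.10]
example_index, example_name = decode_prediction(example_probs)
print(example_index, example_name)
```

This assumes the generator's class indices follow the same order as the numeric labels. `flow_from_dataframe` orders class names alphabetically, which for these five names should coincide with the original 0 to 4 numbering; checking `train_generator.class_indices` would confirm it.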


In [None]:
submission.head()

In [None]:
# Saving CSV to output folder
submission.to_csv('submission.csv',index=False)