# **Modelling and Evaluation Notebook**

## Objectives

* Answer business requirement 2:
    * Deliver an ML system that is capable of predicting whether a cherry leaf is healthy or contains powdery mildew. 

## Inputs

* inputs/mildew_dataset/leaves_images/train
* inputs/mildew_dataset/leaves_images/test
* inputs/mildew_dataset/leaves_images/validation
* image shape embeddings.

## Outputs

* Images distribution plot in train, validation, and test set.
* Image augmentation.
* Class indices to map prediction inference to labels.
* Machine learning model creation and training.
* Save model.
* Learning curve plot for model performance.
* Model evaluation saved as a pickle file.
* Prediction on a random image file.


## Additional Comments

* No additional comments.



---

## Import regular packages

In [24]:
# Import libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Conv2D, Dropout, Flatten, Dense, BatchNormalization, MaxPooling2D
import cv2
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

import os
import joblib  # used below to save class indices and the evaluation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Set Working Directory

In [25]:
cwd = os.getcwd()

In [26]:
os.chdir('/workspace/mildew-detection')
print("You set a new current directory")

You set a new current directory


In [27]:
work_dir = os.getcwd()
work_dir

'/workspace/mildew-detection'

## Import Dataframes

In [6]:
import pandas as pd

# Paths to the CSV files previously created
train_csv_path = '/workspace/mildew-detection/jupyter_notebooks/train_dataframe.csv'
validation_csv_path = '/workspace/mildew-detection/jupyter_notebooks/validation_dataframe.csv'
test_csv_path = '/workspace/mildew-detection/jupyter_notebooks/test_dataframe.csv'

# Read the CSV files into dataframes
train_df = pd.read_csv(train_csv_path)
validation_df = pd.read_csv(validation_csv_path)
test_df = pd.read_csv(test_csv_path)

## Set input directories

In [7]:
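# Define input directories (listed under Inputs, relative to the working directory).
# flow_from_directory expects directory paths as strings, not the 'file' column of a DataFrame.
train_path = 'inputs/mildew_dataset/leaves_images/train'
validation_path = 'inputs/mildew_dataset/leaves_images/validation'
test_path = 'inputs/mildew_dataset/leaves_images/test'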

## Set output directory

In [8]:
version = 'v1'
file_path = f'outputs/{version}'

if 'outputs' in os.listdir(work_dir) and version in os.listdir(work_dir + '/outputs'):
    print('Old version is already available; create a new version.')
else:
    os.makedirs(name=file_path)

Old version is already available; create a new version.


## Set labels

In [9]:
# Extract class labels from the train DataFrame.
# Assumes each image sits in a folder named after its class,
# which is the layout flow_from_directory also expects.
labels = sorted(train_df['file'].apply(lambda p: os.path.basename(os.path.dirname(p))).unique())
print('Labels for the images are:', labels)


## Set image shape

In [11]:
from tensorflow.keras.preprocessing.image import load_img

# Path of the first image in the train DataFrame (loaded under "Import Dataframes")
sample_image_path = train_df['file'].iloc[0]

# Load the sample image to get its shape.
# PIL's .size is (width, height); Keras expects (height, width, channels).
sample_image = load_img(sample_image_path)
image_shape = (sample_image.height, sample_image.width, 3)

print("Image Shape:", image_shape)

Image Shape: (256, 256, 3)
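
The square 256 x 256 images hide an ordering detail: PIL reports an image size as (width, height), while the model input shape is (height, width, channels). A minimal sketch of the conversion follows; `pil_size_to_array_shape` is an illustrative helper, not part of the notebook, and the 640 x 480 size is a made-up example.

```python
def pil_size_to_array_shape(size, channels=3):
    """Convert a PIL-style (width, height) size to (height, width, channels)."""
    width, height = size
    return (height, width, channels)


# Square images look the same either way, which can mask the swap.
square_shape = pil_size_to_array_shape((256, 256))

# A hypothetical non-square image makes the reordering visible.
landscape_shape = pil_size_to_array_shape((640, 480))

print(square_shape)
print(landscape_shape)
```

For this dataset the result is unchanged, but the explicit ordering avoids a silent shape mismatch if the images are ever resized to a non-square format.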


---

## Number of images in train, test and validation DataFrames

Count how many images are in the train, validation and test DataFrames.

In [13]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.preprocessing.image import load_img

# Set Working Directory
cwd = os.getcwd()
os.chdir('/workspace/mildew-detection')
print("You set a new current directory")
work_dir = os.getcwd()
print("Current directory:", work_dir)

# Import Dataframes
train_csv_path = 'jupyter_notebooks/train_dataframe.csv'
validation_csv_path = 'jupyter_notebooks/validation_dataframe.csv'
test_csv_path = 'jupyter_notebooks/test_dataframe.csv'

train_df = pd.read_csv(train_csv_path)
validation_df = pd.read_csv(validation_csv_path)
test_df = pd.read_csv(test_csv_path)

# Print the number of images in each DataFrame
print("Number of images in train DataFrame:", len(train_df))
print("Number of images in validation DataFrame:", len(validation_df))
print("Number of images in test DataFrame:", len(test_df))

You set a new current directory
Current directory: /workspace/mildew-detection
Number of images in train DataFrame: 2944
Number of images in validation DataFrame: 420
Number of images in test DataFrame: 844
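
As a quick check of the split, the counts printed above can be turned into proportions. The sketch below hard-codes those three numbers; the variable names are illustrative only.

```python
# Image counts taken from the output above
split_counts = {"train": 2944, "validation": 420, "test": 844}

total = sum(split_counts.values())

# Share of the full dataset held by each split
split_share = {name: count / total for name, count in split_counts.items()}

for name, share in split_share.items():
    print(f"{name}: {split_counts[name]} images ({share:.1%})")
```

The shares come out close to a 70/10/20 split across train, validation and test.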


---

## Image data augmentation

---

### ImageDataGenerator

In [14]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
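
The generator in this section combines random geometric transformations with a 1/255 rescale. To make two of those settings concrete, the sketch below applies a horizontal flip, a vertical flip and the rescale by hand to a tiny made-up 2 x 2 grayscale image stored as nested lists. It is only an illustration under those assumptions: `ImageDataGenerator` applies its flips at random and works on full image arrays, and the helper names here are not part of Keras.

```python
def rescale(image, factor=1 / 255):
    """Multiply every pixel by factor, mapping 0..255 to roughly 0..1."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror the image top to bottom by reversing the row order."""
    return image[::-1]


# Hypothetical 2 x 2 image with pixel values in 0..255
tiny = [[0, 255],
        [128, 64]]

flipped_lr = horizontal_flip(tiny)
flipped_tb = vertical_flip(tiny)
scaled = rescale(tiny)

print(flipped_lr)
print(flipped_tb)
print(scaled)
```

Only the training generator needs the random transformations; the validation and test generators apply the rescale alone so that evaluation sees unmodified images.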

### Initialize ImageDataGenerator

In [22]:
augmented_image_data = ImageDataGenerator(rotation_range=20,
                                          width_shift_range=0.10,
                                          height_shift_range=0.10,
                                          shear_range=0.1,
                                          zoom_range=0.1,
                                          horizontal_flip=True,
                                          vertical_flip=True,
                                          fill_mode='nearest',
                                          rescale=1./255
                                          )

batch_size = 20  # Set batch size
train_set = augmented_image_data.flow_from_directory(train_path,
                                                     target_size=image_shape[:2],
                                                     color_mode='rgb',
                                                     batch_size=batch_size,
                                                     class_mode='binary',
                                                     shuffle=True
                                                     )

validation_set = ImageDataGenerator(rescale=1./255).flow_from_directory(validation_path,
                                                                        target_size=image_shape[:2],
                                                                        color_mode='rgb',
                                                                        batch_size=batch_size,
                                                                        class_mode='binary',
                                                                        shuffle=False
                                                                        )

test_set = ImageDataGenerator(rescale=1./255).flow_from_directory(test_path,
                                                                  target_size=image_shape[:2],
                                                                  color_mode='rgb',
                                                                  batch_size=batch_size,
                                                                  class_mode='binary',
                                                                  shuffle=False
                                                                  )

# Plot augmented training images
for _ in range(3):
    img, label = next(train_set)  # next() works whether or not the iterator has a .next() method
    print(img.shape)  # (20, 256, 256, 3): one batch of batch_size images
    plt.imshow(img[0])
    plt.show()
# Plot validation and test images (rescaled only, not augmented)
for _ in range(3):
    img, label = next(validation_set)
    print(img.shape)  # (20, 256, 256, 3)
    plt.imshow(img[0])
    plt.show()
for _ in range(3):
    img, label = next(test_set)
    print(img.shape)  # (20, 256, 256, 3)
    plt.imshow(img[0])
    plt.show()
# Save class_indices
joblib.dump(value=train_set.class_indices,
            filename=f"{file_path}/class_indices.pkl")

---

# Model creation

---

## ML Model

### Import model packages

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D

## Model

In [None]:
def create_tf_model():
    model = Sequential()

    model.add(Conv2D(filters=32, kernel_size=(3, 3),
              input_shape=image_shape, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # input_shape is only needed on the first layer
    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    model.add(Dense(128, activation='relu'))

    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    return model

### Model Summary

In [None]:
create_tf_model().summary()
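
The layer shapes and parameter counts reported by the summary can be cross-checked by hand. The sketch below assumes the Keras defaults the model above relies on ('valid' padding and stride 1 for `Conv2D`, a stride equal to the pool size for `MaxPooling2D`) and a 256 x 256 x 3 input. The helper names are illustrative and not part of the notebook.

```python
def conv_output(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


size, channels = 256, 3
total_params = 0

# Three Conv2D + MaxPooling2D blocks with 32, 64 and 64 filters
for filters in (32, 64, 64):
    total_params += conv_params(3, channels, filters)
    size = pool_output(conv_output(size))
    channels = filters

flattened = size * size * channels

# Dense(128) followed by Dense(1); Dropout adds no parameters
total_params += dense_params(flattened, 128) + dense_params(128, 1)

print("Feature map after last pooling:", (size, size, channels))
print("Flattened features:", flattened)
print("Total trainable parameters:", total_params)
```

Under these assumptions almost all of the parameters sit in the first dense layer, which is why the flattened feature map size matters more for model size than the number of convolutional filters.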

### Early stopping

In [None]:
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=3)
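
With `patience=3`, training stops once the validation loss has gone three epochs in a row without beating its best value. The following is a simplified sketch of that rule in plain Python, not the Keras implementation; the loss values are hypothetical and `epochs_run` is an illustrative name.

```python
def epochs_run(val_losses, patience=3):
    """Return how many epochs run before the patience rule stops training."""
    best = float("inf")
    wait = 0  # consecutive epochs without a new best validation loss
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses: the best value is reached at epoch 3
hypothetical_val_losses = [0.60, 0.45, 0.40, 0.42, 0.41, 0.43, 0.39]

stopped_at = epochs_run(hypothetical_val_losses, patience=3)
print("Training stops after epoch", stopped_at)
```

In this made-up run the lower loss at epoch 7 is never reached, which is the usual trade-off of a small patience value. Note also that `EarlyStopping` keeps the weights from the final epoch unless `restore_best_weights=True` is set.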

## Fit model for model training

In [None]:
model = create_tf_model()
model.fit(train_set,
          epochs=25,
          steps_per_epoch=len(train_set.classes) // batch_size,
          validation_data=validation_set,
          callbacks=[early_stop],
          verbose=1
          )
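
The `steps_per_epoch` expression uses integer division, so a few images are left out of each epoch when the dataset size is not a multiple of the batch size. The arithmetic below assumes the training directory holds the same 2944 images counted in the train DataFrame earlier.

```python
import math

n_train_images = 2944  # assumed equal to len(train_set.classes)
batch_size = 20

# Floor division, as in the fit call above
steps_per_epoch = n_train_images // batch_size
images_per_epoch = steps_per_epoch * batch_size
left_over = n_train_images - images_per_epoch

# Rounding up instead would cover every image, with a smaller final batch
steps_to_cover_all = math.ceil(n_train_images / batch_size)

print(steps_per_epoch, images_per_epoch, left_over, steps_to_cover_all)
```

Because the training generator shuffles, the handful of images skipped differs from epoch to epoch, so the effect on training is likely small.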

## Plot augmented training image

In [None]:
for _ in range(3):
    img, label = next(train_set)  # next() works whether or not the iterator has a .next() method
    print(img.shape)  # (20, 256, 256, 3): one batch of batch_size images
    plt.imshow(img[0])
    plt.show()

## Save model

In [None]:

model.save('outputs/v1/mildew_detection_model.h5')

---

# Model Performance

---

## Model learning curve

In [None]:
losses = pd.DataFrame(model.history.history)

sns.set_style("whitegrid")
losses[['loss', 'val_loss']].plot(style='.-')
plt.title("Loss")
plt.savefig(f'{file_path}/model_training_losses.png',
            bbox_inches='tight', dpi=150)
plt.show()

print("\n")
losses[['accuracy', 'val_accuracy']].plot(style='.-')
plt.title("Accuracy")
plt.savefig(f'{file_path}/model_training_acc.png',
            bbox_inches='tight', dpi=150)
plt.show()

## Model Evaluation

In [None]:
from tensorflow.keras.models import load_model
model = load_model('outputs/v1/mildew_detection_model.h5')

Evaluate model on test set

In [None]:
evaluation = model.evaluate(test_set)

### Save evaluation pickle

In [None]:
joblib.dump(value=evaluation,
            filename="outputs/v1/evaluation.pkl")

## Predict on new data

Load a random image as PIL

In [None]:
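from tensorflow.keras.preprocessing import image

pointer = 66
test_dir = 'inputs/mildew_dataset/leaves_images/test'  # test directory listed under Inputs
label = sorted(os.listdir(test_dir))[0]  # class folder to sample from (healthy or powdery mildew)

# Pick one image from the chosen class folder
image_files = sorted(os.listdir(os.path.join(test_dir, label)))
pil_image = image.load_img(os.path.join(test_dir, label, image_files[pointer]),
                           target_size=image_shape[:2], color_mode='rgb')
print(f'Image shape: {pil_image.size}, Image mode: {pil_image.mode}')
pil_image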

Convert image to array and prepare for prediction

In [None]:
my_image = image.img_to_array(pil_image)
my_image = np.expand_dims(my_image, axis=0)/255
print(my_image.shape)

Predict class probabilities

In [None]:
pred_proba = model.predict(my_image)[0, 0]

target_map = {v: k for k, v in train_set.class_indices.items()}
pred_class = target_map[pred_proba > 0.5]

if pred_class == target_map[0]:
    pred_proba = 1 - pred_proba

print(pred_proba)
print(pred_class)
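
The mapping from a single sigmoid output to a class name and a confidence can be isolated in a small function. The sketch below mirrors the logic of the cell above without needing a trained model; the class names in `hypothetical_class_indices` are an assumption about the folder names, and `interpret_sigmoid` is an illustrative helper.

```python
def interpret_sigmoid(pred_proba, class_indices, threshold=0.5):
    """Map a sigmoid output to (class name, confidence in that class)."""
    # Invert {name: index} into {index: name}
    target_map = {index: name for name, index in class_indices.items()}
    pred_index = int(pred_proba > threshold)
    # The sigmoid output is the probability of class 1,
    # so the probability of class 0 is its complement.
    confidence = pred_proba if pred_index == 1 else 1 - pred_proba
    return target_map[pred_index], confidence


# Assumed class folders; the real mapping comes from train_set.class_indices
hypothetical_class_indices = {"healthy": 0, "powdery_mildew": 1}

low_name, low_conf = interpret_sigmoid(0.08, hypothetical_class_indices)
high_name, high_conf = interpret_sigmoid(0.97, hypothetical_class_indices)

print(low_name, round(low_conf, 2))
print(high_name, round(high_conf, 2))
```

Converting the comparison to `int` makes the dictionary lookup explicit rather than relying on a boolean being treated as 0 or 1.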

# Push files to Repo

* If you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
    # create your folder here
    # os.makedirs(name='')
    pass  # a try block needs at least one statement
except Exception as e:
    print(e)
