# Modelling and Evaluating Notebook

## Objectives

* Answer business requirement 2:
    * The client is interested in telling whether a given cherry leaf contains powdery mildew or not.

## Inputs

* inputs/cherry_mildew_dataset/cherry-leaves/train
* inputs/cherry_mildew_dataset/cherry-leaves/test
* inputs/cherry_mildew_dataset/cherry-leaves/validation
* Image shape embedding saved in outputs/v1/image_shape.pkl.

## Outputs

* Image distribution plot for the train, validation, and test sets.
* Image augmentation.
* Class indices to map prediction outputs to labels.
* Machine learning model creation and training.
* Saved model.
* Learning curve plots of model performance.
* Model evaluation saved as a pickle file.
* Prediction on a random image file.

## Additional Comments

* None.



---

## Import regular packages

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.image import imread

# Change working directory

***

## Set working directory

In [None]:
current_dir = os.getcwd()

In [None]:
os.chdir('/workspaces/milestone-project-cherry-leaves-mildew-detection')
print("You set a new current directory")

In [None]:
current_dir = os.getcwd()
current_dir

***

## Set input directories

Set train, validation and test paths

In [None]:
# This code snippet was adapted/updated from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

my_data_dir = os.path.abspath('inputs/cherry_mildew_dataset/cherry-leaves')
train_path = os.path.join(my_data_dir, 'train')
val_path = os.path.join(my_data_dir, 'validation')
test_path = os.path.join(my_data_dir, 'test')

## Set output directory

In [None]:
# This code snippet was adapted/updated from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

version = 'v1'
file_path = f'outputs/{version}'

if 'outputs' in os.listdir(current_dir) and version in os.listdir(os.path.join(current_dir, 'outputs')):
    print('This version already exists; create a new version to avoid overwriting its files.')
else:
    os.makedirs(name=file_path)
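The cell above only prints a message when the version folder already exists. Below is a minimal sketch of picking the next unused version name automatically. It assumes versions are named `v1`, `v2`, and so on, under `outputs/`. `next_version` is a hypothetical helper and is not part of the original project.

```python
import os
import re
import tempfile


def next_version(outputs_dir: str = "outputs") -> str:
    """Return 'v<N+1>' where N is the highest existing 'v<N>' folder (hypothetical helper)."""
    numbers = []
    if os.path.isdir(outputs_dir):
        for name in os.listdir(outputs_dir):
            match = re.fullmatch(r"v(\d+)", name)
            if match and os.path.isdir(os.path.join(outputs_dir, name)):
                numbers.append(int(match.group(1)))
    return f"v{max(numbers, default=0) + 1}"


if __name__ == "__main__":
    # Demo in a throwaway folder instead of the real outputs/ directory
    with tempfile.TemporaryDirectory() as outputs_dir:
        os.makedirs(os.path.join(outputs_dir, "v1"))
        print(next_version(outputs_dir))  # v2
```

In the notebook this could set `version = next_version()` before `file_path` is built. Later cells read `image_shape.pkl` from `outputs/v1`, so switching versions would also mean copying or regenerating that file.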

## Set labels

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

# Sort so the order matches the class indices that flow_from_directory assigns
label_list = sorted(os.listdir(train_path))

print(
    f"Project Labels: {label_list}"
)
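`os.listdir` returns folder names in an arbitrary order. Keras' `flow_from_directory` instead assigns class indices by sorting the subfolder names when no `classes` argument is given. Below is a minimal sketch that reproduces that mapping for a directory, so the label order used in this notebook can be checked against `train_set.class_indices`. `infer_class_indices` is a hypothetical helper, and the folder names in the demo are assumptions.

```python
import os
import tempfile


def infer_class_indices(directory: str) -> dict:
    """Mimic the class -> index mapping flow_from_directory builds (hypothetical helper)."""
    classes = [
        name for name in sorted(os.listdir(directory))
        if os.path.isdir(os.path.join(directory, name))  # files are ignored, only folders count
    ]
    return {name: index for index, name in enumerate(classes)}


if __name__ == "__main__":
    # Demo on a throwaway folder; 'powdery_mildew' and 'healthy' are assumed folder names
    with tempfile.TemporaryDirectory() as root:
        for name in ("powdery_mildew", "healthy"):
            os.makedirs(os.path.join(root, name))
        print(infer_class_indices(root))  # {'healthy': 0, 'powdery_mildew': 1}
```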

## Set image shape

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

# Import saved image shape embedding
import joblib

image_shape = joblib.load(filename=f"{file_path}/image_shape.pkl")
image_shape

# Number of images in the train, validation and test sets

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

folders = ['train', 'validation', 'test']

# Count the images in each set/label folder
rows = []
for folder in folders:
    for label in label_list:
        folder_path = os.path.join(my_data_dir, folder, label)
        num_images = len(os.listdir(folder_path))
        rows.append({'Set': folder, 'Label': label, 'Frequency': num_images})
        print(f"* {folder} - {label}: {num_images} images")

# DataFrame.append was removed in pandas 2.0, so build the DataFrame from the collected rows
df_freq = pd.DataFrame(rows, columns=['Set', 'Label', 'Frequency'])

# Plot the distribution of labels
sns.set_style("whitegrid")
plt.figure(figsize=(8, 5))
sns.barplot(data=df_freq, x='Set', y='Frequency', hue='Label')
plt.savefig(f'{file_path}/labels_distribution.png', bbox_inches='tight', dpi=150)
plt.show()
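The bar plot shows absolute counts. Below is a minimal pure-Python sketch that turns the same per-set, per-label counts into proportions. This makes class imbalance or an unexpected train/validation/test ratio easier to spot. The counts in the demo are made-up placeholders, not this dataset's numbers.

```python
from collections import defaultdict


def split_summary(counts: dict) -> dict:
    """counts maps (set_name, label) -> number of images.

    Returns, for each set, its share of all images and the share of each label within it.
    """
    total = sum(counts.values())
    set_totals = defaultdict(int)
    for (set_name, _), n in counts.items():
        set_totals[set_name] += n
    per_set = defaultdict(dict)
    for (set_name, label), n in counts.items():
        per_set[set_name][label] = n / set_totals[set_name]
    return {
        set_name: {"share_of_all": set_totals[set_name] / total, "labels": dict(per_set[set_name])}
        for set_name in set_totals
    }


if __name__ == "__main__":
    # Placeholder counts for illustration only
    example = {
        ("train", "healthy"): 70, ("train", "powdery_mildew"): 70,
        ("validation", "healthy"): 10, ("validation", "powdery_mildew"): 10,
        ("test", "healthy"): 20, ("test", "powdery_mildew"): 20,
    }
    for set_name, info in split_summary(example).items():
        print(set_name, round(info["share_of_all"], 2), info["labels"])
```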


***

## Image data augmentation

***

### ImageDataGenerator

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

from tensorflow.keras.preprocessing.image import ImageDataGenerator

* ### Initialize ImageDataGenerator

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

augmented_image_data = ImageDataGenerator(rotation_range=20,
                                          width_shift_range=0.10,
                                          height_shift_range=0.10,
                                          shear_range=0.1,
                                          zoom_range=0.1,
                                          horizontal_flip=True,
                                          vertical_flip=True,
                                          fill_mode='nearest',
                                          rescale=1./255
                                          )
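`ImageDataGenerator` applies these transforms randomly to each image as batches are drawn. For example, with `horizontal_flip=True` each image is mirrored left-right with probability 0.5. `rescale=1./255` multiplies the pixel values so that 0 to 255 becomes 0 to 1. Below is a minimal NumPy sketch of just the flip and rescale steps, assuming an `(H, W, C)` image array. It is not Keras' implementation and ignores rotation, shift, shear and zoom.

```python
import numpy as np


def flip_and_rescale(img: np.ndarray, rng: np.random.Generator,
                     horizontal_flip: bool = True, vertical_flip: bool = True,
                     rescale: float = 1.0 / 255) -> np.ndarray:
    """Randomly mirror an (H, W, C) image and scale its pixel values (simplified sketch)."""
    out = img.astype(np.float32)
    if horizontal_flip and rng.random() < 0.5:
        out = out[:, ::-1, :]  # mirror left-right (width axis)
    if vertical_flip and rng.random() < 0.5:
        out = out[::-1, :, :]  # mirror top-bottom (height axis)
    return out * rescale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    leaf = rng.integers(0, 256, size=(4, 4, 3)).astype(np.uint8)  # stand-in for a real leaf image
    augmented = flip_and_rescale(leaf, rng)
    print(augmented.shape, float(augmented.min()), float(augmented.max()))
```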

* ## Augment training image dataset

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

batch_size = 20  # Set batch size
train_set = augmented_image_data.flow_from_directory(train_path,
                                                     target_size=image_shape[:2],
                                                     color_mode='rgb',
                                                     batch_size=batch_size,
                                                     class_mode='binary',
                                                     shuffle=True
                                                     )

train_set.class_indices

* ## Load validation image dataset (rescaling only, no augmentation)

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

validation_set = ImageDataGenerator(rescale=1./255).flow_from_directory(val_path,
                                                                        target_size=image_shape[:2],
                                                                        color_mode='rgb',
                                                                        batch_size=batch_size,
                                                                        class_mode='binary',
                                                                        shuffle=False
                                                                        )

validation_set.class_indices

* ## Load test image dataset (rescaling only, no augmentation)

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

test_set = ImageDataGenerator(rescale=1./255).flow_from_directory(test_path,
                                                                  target_size=image_shape[:2],
                                                                  color_mode='rgb',
                                                                  batch_size=batch_size,
                                                                  class_mode='binary',
                                                                  shuffle=False
                                                                  )

test_set.class_indices

## Plot augmented training images

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

for _ in range(3):
    img, label = train_set.next()
    print(img.shape)
    plt.imshow(img[0])
    plt.show()

## Plot validation and test images (rescaled, not augmented)

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

for _ in range(3):
    img, label = validation_set.next()
    print(img.shape)
    plt.imshow(img[0])
    plt.show()

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

for _ in range(3):
    img, label = test_set.next()
    print(img.shape)
    plt.imshow(img[0])
    plt.show()

## Save class_indices

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

joblib.dump(value=train_set.class_indices,
            filename=f"{file_path}/class_indices.pkl")

***

# **Model creation**

***

## ML model

* ## Import model packages

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D

* ## Model

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

def create_tf_model(input_shape, n_labels):
    model = Sequential([
        Conv2D(16, (3, 3), input_shape=input_shape, activation='relu'),
        MaxPooling2D((2, 2)),
        Conv2D(32, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation='relu'),
        Dense(128, activation='relu'),  
        Dropout(0.5),
        Dense(1, activation='sigmoid')  
    ])

    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    return model
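The layer sizes reported by `model.summary()` follow from simple arithmetic:

* A 3×3 `Conv2D` with the default stride 1 and `padding='valid'` shrinks each spatial side by 2 and has `(3*3*in_channels + 1) * filters` parameters.
* `MaxPooling2D((2, 2))` halves each side, rounding down.
* A `Dense` layer has `(inputs + 1) * units` parameters.

Below is a minimal sketch that reproduces these numbers for the architecture above. It assumes a 256×256×3 input purely for illustration; the real value comes from `image_shape.pkl`.

```python
def conv_valid(side: int, kernel: int = 3) -> int:
    """Output side of a stride-1 convolution with 'valid' padding."""
    return side - kernel + 1


def pool(side: int, size: int = 2) -> int:
    """Output side of a non-overlapping max-pool (floor division)."""
    return side // size


def count_params(input_shape=(256, 256, 3)):
    """Walk the create_tf_model architecture and return (flattened size, per-layer params)."""
    h, w, c = input_shape
    params = {}
    params["conv1"] = (3 * 3 * c + 1) * 16
    h, w, c = pool(conv_valid(h)), pool(conv_valid(w)), 16
    params["conv2"] = (3 * 3 * c + 1) * 32
    h, w, c = pool(conv_valid(h)), pool(conv_valid(w)), 32
    flat = h * w * c  # size of the Flatten() output
    params["dense64"] = (flat + 1) * 64
    params["dense128"] = (64 + 1) * 128
    params["output"] = (128 + 1) * 1  # Dropout adds no parameters
    return flat, params


if __name__ == "__main__":
    flat, params = count_params((256, 256, 3))  # assumed input shape
    print(flat, params, sum(params.values()))
```

Under this assumed input, the first `Dense(64)` layer holds over 99% of the parameters. The size of the flattened feature map, rather than the number of convolution filters, therefore dominates the model's size.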

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

n_labels = len(label_list)  
model = create_tf_model(input_shape=image_shape, n_labels=n_labels)
model.summary()

* ### Early Stopping

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb


from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=7)
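`EarlyStopping(monitor='val_loss', patience=7)` tracks the best validation loss seen so far. It stops training once that value has failed to improve for `patience` consecutive epochs. Because `restore_best_weights` defaults to `False`, the model keeps the weights from the final epoch, not the best one. Below is a minimal pure-Python sketch of the stopping rule with the default `min_delta=0`, assuming a list of per-epoch validation losses. It is a simplification, not Keras' callback code.

```python
import math
from typing import Optional, Sequence


def early_stop_epoch(val_losses: Sequence[float], patience: int) -> Optional[int]:
    """Return the 0-based epoch after which training would stop, or None if it runs to the end."""
    best = math.inf
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:  # any decrease counts as an improvement when min_delta=0
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


if __name__ == "__main__":
    # Hypothetical loss curve: best value at epoch 1, then no improvement
    losses = [0.60, 0.40, 0.45, 0.42, 0.41, 0.50]
    print(early_stop_epoch(losses, patience=3))  # 4
```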

# Train the model

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

model.fit(train_set,
          epochs=25,
          steps_per_epoch=len(train_set.classes) // batch_size,
          validation_data=validation_set,
          callbacks=[early_stop],
          verbose=1
          )
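`steps_per_epoch=len(train_set.classes) // batch_size` uses floor division. When the number of training images is not a multiple of `batch_size`, each epoch therefore requests one batch fewer than `len(train_set)`, which rounds up. At most `steps_per_epoch * batch_size` images are drawn per epoch. Below is a minimal sketch of that arithmetic using a hypothetical image count.

```python
def batches_per_epoch(n_images: int, batch_size: int) -> dict:
    """Compare floor (as in the fit call) with ceil (len of a Keras DirectoryIterator)."""
    floor_steps = n_images // batch_size
    ceil_steps = (n_images + batch_size - 1) // batch_size  # integer ceiling division
    return {
        "floor_steps": floor_steps,
        "ceil_steps": ceil_steps,
        "images_per_epoch_floor": floor_steps * batch_size,
        "leftover_images": n_images - floor_steps * batch_size,
    }


if __name__ == "__main__":
    print(batches_per_epoch(n_images=1010, batch_size=20))  # hypothetical count
```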

* ## Save Model

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/b90e9193974b30d5a20dcf3a013c8e819cefebfd/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

model.save(f'{file_path}/cherry_mildew_detector_model.h5')

***

# **Model Performance**

***

## Model Learning Curve

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/main/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

losses = pd.DataFrame(model.history.history)

sns.set_style("whitegrid")
losses[['loss', 'val_loss']].plot(style='.-')
plt.title("Loss")
plt.savefig(f'{file_path}/model_training_losses.png',
            bbox_inches='tight', dpi=150)
plt.show()

print("\n")
losses[['accuracy', 'val_accuracy']].plot(style='.-')
plt.title("Accuracy")
plt.savefig(f'{file_path}/model_training_acc.png',
            bbox_inches='tight', dpi=150)
plt.show()
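`model.history.history` is a dict of per-epoch lists keyed by metric name; here those are `loss`, `accuracy`, `val_loss` and `val_accuracy`. Below is a minimal sketch of reading the curves numerically. It finds the epoch with the lowest validation loss and the train/validation accuracy gap at that epoch, a rough overfitting signal. The history values in the demo are invented placeholders.

```python
def summarize_history(history: dict) -> dict:
    """Return the best-val_loss epoch and the accuracy gap at that epoch."""
    val_loss = history["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    gap = history["accuracy"][best_epoch] - history["val_accuracy"][best_epoch]
    return {
        "best_epoch": best_epoch,  # 0-based index, matching the x-axis of the plots above
        "best_val_loss": val_loss[best_epoch],
        "train_minus_val_accuracy": gap,
    }


if __name__ == "__main__":
    # Placeholder values; in the notebook use summarize_history(model.history.history)
    fake_history = {
        "loss": [0.6, 0.3, 0.2],
        "val_loss": [0.5, 0.25, 0.3],
        "accuracy": [0.70, 0.90, 0.95],
        "val_accuracy": [0.75, 0.91, 0.90],
    }
    print(summarize_history(fake_history))
```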

### Model Evaluation

Load the saved model from Google Drive

In [None]:
import requests
from tempfile import NamedTemporaryFile
from tensorflow.keras.models import load_model

# URL of the trained model file
model_url = 'https://drive.google.com/uc?id=1jMOU1eHCgkZsEHF5_VBiPm916xVl2gqn'

# Download the model file and fail early on an HTTP error
response = requests.get(model_url, timeout=60)
response.raise_for_status()

# Save the downloaded bytes to a temporary .h5 file
with NamedTemporaryFile(delete=False, suffix=".h5") as tmp_file:
    tmp_file.write(response.content)
    tmp_file_path = tmp_file.name

# Load the Keras model and recompile it with the training settings
model = load_model(tmp_file_path, compile=False)
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

In [None]:
mod_evaluation = model.evaluate(test_set)

### Save Test Evaluation as a Pickle File

In [None]:
joblib.dump(value=mod_evaluation,
            filename=f"{file_path}/evaluation.pkl")
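The model was compiled with `metrics=['accuracy']`, so `model.evaluate` returns a two-element list `[loss, accuracy]`, and that list is what is stored in `evaluation.pkl`. Below is a minimal sketch of labelling those values when reporting them, assuming that order. `format_evaluation` is a hypothetical helper.

```python
from typing import Sequence


def format_evaluation(values: Sequence[float], names: Sequence[str] = ("loss", "accuracy")) -> dict:
    """Pair evaluate() outputs with metric names and round them for display."""
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return {name: round(float(value), 4) for name, value in zip(names, values)}


if __name__ == "__main__":
    # In the notebook: format_evaluation(joblib.load(f"{file_path}/evaluation.pkl"))
    print(format_evaluation([0.12345678, 0.98765432]))  # placeholder numbers
```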

## Predict on New Data

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/main/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

from tensorflow.keras.preprocessing import image

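pointer = 66  # index of the image within the chosen test folder
label = label_list[0]  # use label_list[0] or label_list[1] to select healthy or mildew

# load_img expects target_size as (height, width)
label_dir = os.path.join(test_path, label)
leaf_image = image.load_img(os.path.join(label_dir, os.listdir(label_dir)[pointer]),
                            target_size=image_shape[:2], color_mode='rgb')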
print(f'Image shape: {leaf_image.size}, Image mode: {leaf_image.mode}')
leaf_image

### Convert image to array and prepare for prediction

In [None]:

my_image = image.img_to_array(leaf_image)
my_image = np.expand_dims(my_image, axis=0)/255
print(my_image.shape)
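The model expects a batch, so the single image is turned into a 4-D array of shape `(1, height, width, 3)` with `np.expand_dims`. It is also divided by 255 to match the `rescale=1./255` applied to the training, validation and test generators. Below is a minimal NumPy sketch of those two steps on a stand-in array, assuming a channels-last `(H, W, 3)` input. `to_model_input` is a hypothetical helper.

```python
import numpy as np


def to_model_input(image_array: np.ndarray) -> np.ndarray:
    """Add a leading batch axis and scale pixel values from [0, 255] to [0, 1]."""
    if image_array.ndim != 3:
        raise ValueError("expected an (H, W, C) array")
    batch = np.expand_dims(image_array.astype(np.float32), axis=0)
    return batch / 255.0


if __name__ == "__main__":
    fake_leaf = np.full((8, 8, 3), 255, dtype=np.uint8)  # stand-in for img_to_array(leaf_image)
    x = to_model_input(fake_leaf)
    print(x.shape, x.max())  # (1, 8, 8, 3) 1.0
```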

### Predict class probabilities

In [None]:
# This code snippet was adapted from Code Institute Malaria Detector Walkthrough Sample Project
# https://github.com/Code-Institute-Solutions/WalkthroughProject01/blob/main/jupyter_notebooks/03%20-%20Modelling%20and%20Evaluating.ipynb

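# The sigmoid output is the probability of the class with index 1
pred_proba = model.predict(my_image)[0, 0]

target_map = {v: k for k, v in train_set.class_indices.items()}
pred_class = target_map[int(pred_proba > 0.5)]

# For the class with index 0, report 1 - p as its probability
if pred_class == target_map[0]:
    pred_proba = 1 - pred_proba

print(pred_proba)
print(pred_class)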
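The final `Dense(1, activation='sigmoid')` layer outputs a single number: the predicted probability of the class with index 1 in `class_indices`. Thresholding at 0.5 picks the class, and for class 0 the reported confidence is `1 - p`. Below is a minimal standalone sketch of that decoding. It assumes the folder names `healthy` and `powdery_mildew`; check these against `class_indices.pkl`. `decode_prediction` is a hypothetical helper.

```python
def decode_prediction(p: float, class_indices: dict) -> tuple:
    """Map a sigmoid output p = P(class 1) to (label, confidence)."""
    index_to_label = {index: label for label, index in class_indices.items()}
    predicted_index = int(p > 0.5)  # exactly 0.5 falls to class 0
    confidence = p if predicted_index == 1 else 1.0 - p
    return index_to_label[predicted_index], confidence


if __name__ == "__main__":
    indices = {"healthy": 0, "powdery_mildew": 1}  # assumed folder names
    print(decode_prediction(0.9, indices))  # ('powdery_mildew', 0.9)
    print(decode_prediction(0.2, indices))  # label 'healthy', confidence about 0.8
```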

## Push Files To Repo

### Push new/generated files from this session to github repo

* .gitignore

In [None]:
!cat .gitignore

* Git Status

In [None]:
!git status

* Git add

In [None]:
!git add .

* Git Commit

In [None]:
!git commit -am "Add new plots"

* Git push

In [None]:
!git push origin main

# Conclusions and Next Steps

## Conclusion

* ### The training images were augmented to provide the large number of varied images a convolutional neural network (CNN) needs.

* ### A CNN was built and trained on the augmented images, with early stopping on the validation loss.

* ### The model was evaluated on the unseen test set, and its accuracy met the client's business requirement.

* ### The trained model was saved as an .h5 file and its test evaluation as a pickle file.

## Next Steps

* ### Build the dashboard interface to meet the client's business requirement.