### Table of Contents

<i><p style="border-width:2px; border-style:solid; border-color:#808080; padding: 0.5em;text-align:left;">PART I: Project Overview</p></i>

> **1. [Introduction](#intro)**
	> > 1.1. [Literature Review](#1.1)
>
<i><p style="border-width:2px; border-style:solid; border-color:#808080; padding: 0.5em;text-align:left;">PART II: Data Analysis</p></i>
> 
> **2. [Methodology](#methodology)**
	> > 2.1 [Approach](#2.1)   
	> > 2.2 [Dataset](#2.2)     
	> > 2.3 [Dependencies](#2.3)   
	> > 2.4 [Getting Started](#2.4)
>
> **3. [Data Preprocessing](#preprocessing)**
	> > 3.1 [A First Look At The Data](#3.1)  
	> > 3.2 [Identifying Image Imbalance](#3.2)  
	> > 3.3 [Plotting Image Size](#3.3)     
>
> **4. [Exploratory Data Analysis](#eda)**   
> > 4.1. [Orientation](#4.1)  
> > 4.2. [RGB Channels](#4.2)  
>
> **5. [Modeling and Evaluation](#model_eval)**
>
<i><p style="border-width:2px; border-style:solid; border-color:#808080; padding: 0.5em;text-align:left;">PART III: Findings</p></i>
> 
> **6. [Discussion](#discussion)**  
    </font>

In [1]:
## IMPORT LIBRARIES ##

import numpy as np # numerical arrays and histograms

import pandas as pd # tabular data

from matplotlib import pyplot as plt # plotting
import seaborn as sns # statistical plots

In [1]:
## IMPORT LIBRARIES ##

import os  
from matplotlib import image as mpimg
from random import randint
from PIL import Image
from skimage import io, img_as_float, img_as_ubyte
from skimage.io import imread, imshow
import cv2
from glob import glob
from sklearn.decomposition import PCA
import warnings 
warnings.filterwarnings('ignore')

##########

In [2]:
## Load the image directories ##

dir = 'dataset' # paste your folder directory

# returns a list containing the names of the images in the `healthy` folder
training_healthy_data = os.listdir(dir + '/training/healthy') # training dataset
testing_healthy_data = os.listdir(dir + '/testing/healthy') # testing dataset

# returns a list containing the names of the images in the `aculus olearius` folder
training_aculus_data = os.listdir(dir + '/training/aculus_olearius') # training dataset
testing_aculus_data = os.listdir(dir + '/testing/aculus_olearius') # testing dataset

# returns a list containing the names of the images in the `peacock spot` folder
training_peacock_data = os.listdir(dir + '/training/peacock_spot') # training dataset
testing_peacock_data = os.listdir(dir + '/testing/peacock_spot') # testing dataset

#all_data = [healthy_data, aculus_olearius_data, peacock_disease_data]

In [3]:
training_dir = {
    'Healthy Leaves': os.path.join(dir, 'training/healthy'), # Path to the directory containing the Healthy Leaves dataset
    'Aculus Olearius Leaves': os.path.join(dir, 'training/aculus_olearius'), # Path to the directory containing images with Aculus Olearius Leaves
    'Peacock Spot Leaves': os.path.join(dir, 'training/peacock_spot') # Path to the directory containing images with Peacock Spot Leaves
}

testing_dir = {
    'Healthy Leaves': os.path.join(dir, 'testing/healthy'), # Path to the directory containing the Healthy Leaves dataset
    'Aculus Olearius Leaves': os.path.join(dir, 'testing/aculus_olearius'), # Path to the directory containing images with Aculus Olearius Leaves
    'Peacock Spot Leaves': os.path.join(dir, 'testing/peacock_spot') # Path to the directory containing images with Peacock Spot Leaves
}

training_data = {
    'Healthy Leaves': training_healthy_data, # File names of the Healthy Leaves training images
    'Aculus Olearius Leaves': training_aculus_data, # File names of the Aculus Olearius training images
    'Peacock Disease Leaves': training_peacock_data # File names of the Peacock Spot training images
}

testing_data = {
    'Healthy Leaves': testing_healthy_data, # File names of the Healthy Leaves testing images
    'Aculus Olearius Leaves': testing_aculus_data, # File names of the Aculus Olearius testing images
    'Peacock Disease Leaves': testing_peacock_data # File names of the Peacock Spot testing images
}
##########

Now that the image file names have been loaded from their corresponding directories, we can proceed with the preprocessing stage.
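
A quick count of the files in each folder can catch path mistakes early, for example a testing list that accidentally points at a training folder. The sketch below is only an illustration: `count_images` is a hypothetical helper, the extension list is an assumption, and the folder tree is a throwaway one built with `tempfile` so the code runs without the real dataset.

```python
import os
import tempfile

def count_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Count files in `folder` whose extension looks like an image."""
    return sum(name.lower().endswith(extensions) for name in os.listdir(folder))

# Build a throwaway folder tree so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as root:
    layout = {"training/healthy": 3, "testing/healthy": 1}
    for sub, n in layout.items():
        os.makedirs(os.path.join(root, sub))
        for k in range(n):
            open(os.path.join(root, sub, f"leaf_{k}.jpg"), "w").close()
    # A stray non-image file, which os.listdir would also return.
    open(os.path.join(root, "training/healthy", "notes.txt"), "w").close()
    counts = {sub: count_images(os.path.join(root, sub)) for sub in layout}

print(counts)
```

With the real data, the same helper could be pointed at the paths stored in `training_dir` and `testing_dir`.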
<br><br>

<hr>

*PART II: Data Analysis* 

<a id='model_eval'></a>
## Baseline Models and Evaluation

A baseline model is developed here to set a benchmark for classification performance. It is evaluated on accuracy, precision, recall, and F1 score to assess how well it classifies the leaf images. Initial results are promising. Particular attention is paid to false negatives, since a diseased or infested leaf that is classified as healthy is a missed detection.
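
To make the four metrics concrete, here is a minimal sketch on made-up labels (not results from this project). The label encoding 0 = healthy, 1 = aculus olearius, 2 = peacock spot matches the one used later in the notebook; the `y_true` and `y_pred` values are invented for illustration.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Invented labels: 0 = healthy, 1 = aculus olearius, 2 = peacock spot
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 0, 0, 1, 2, 2, 2]

accuracy = accuracy_score(y_true, y_pred)

# One value per class, in the order given by `labels`
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], zero_division=0
)

print("accuracy:", accuracy)
for k, name in enumerate(["healthy", "aculus olearius", "peacock spot"]):
    print(f"{name}: precision={precision[k]:.2f} recall={recall[k]:.2f} f1={f1[k]:.2f}")
```

In this toy example two of the three aculus olearius leaves are predicted as healthy, so the recall for class 1 is low even though the overall accuracy is 0.7. Recall on the diseased classes is the number to watch when false negatives matter most.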


### Logistic Regression

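The first model is a Logistic Regression classifier. Each image is summarized by its color histograms: 50 bins for each of the red, green, and blue channels, concatenated into a single vector of 150 counts. The model is then fitted on these vectors with one label per class (0 for healthy, 1 for aculus olearius, 2 for peacock spot).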

#### Import Libraries

For the Logistic Regression model, I import the following from the `scikit-learn` library:

In [6]:
# Import Libraries

from sklearn.metrics import classification_report, ConfusionMatrixDisplay
from sklearn.linear_model import LogisticRegression

#### Instantiate and Fit the Model

STEP 1:
Iterate through the images of each class to build a vector of 150 RGB histogram counts per image:

In [9]:
# HEALTHY 
# create empty list for the concatenated counts of the healthy images

healthy_list = []

for i in training_healthy_data:
    print("Working on: ", i, end="\r")
    try:
        img = mpimg.imread(dir + '/training/healthy/' + i)
    except Exception:
        # skip files that cannot be read as images
        continue

    # create empty list for the 3 counts
    img_rgb = []
    channels = [0,1,2]
    
    # loop through the 3 color channels
    for channel in channels:
        
        # get histogram and counts for the channel (50 bins between 0 and 255)
        counts, bins = np.histogram(img[:,:,channel].ravel(), bins=np.linspace(0, 255, 51))
        
        # put counts into the list
        img_rgb.append(counts)
        
    # concat the 3 counts into a single array of length 150
    large_counts = np.concatenate(img_rgb)
    
    # save the large array into the healthy list
    healthy_list.append(large_counts)
    
# we get a list with as many arrays as readable images, each array is length 150

In [8]:
# ACULUS OLEARIUS
# create empty list for the concatenated counts of the aculus olearius images

aculus_list = []

for i in training_aculus_data:
    print("Working on: ", i, end="\r")
    try:
        img = mpimg.imread(dir + '/training/aculus_olearius/' + i)
    except Exception:
        # skip files that cannot be read as images
        continue

    # create empty list for the 3 counts
    img_rgb = []
    channels = [0,1,2]
    # loop through the 3 color channels
    for channel in channels:
        # get histogram and counts for the channel (50 bins between 0 and 255)
        counts, bins = np.histogram(img[:,:,channel].ravel(), bins=np.linspace(0, 255, 51))
        
        # put counts into the list
        img_rgb.append(counts)
        
    # concat the 3 counts into a single array of length 150
    large_counts = np.concatenate(img_rgb)
        
    # save the large array into the aculus olearius list
    aculus_list.append(large_counts)
    
# we get a list with as many arrays as readable images, each array is length 150

In [10]:
# PEACOCK DISEASE 
# create empty list for the concatenated counts of the peacock spot images

peacock_list = []

for i in training_peacock_data:
    print("Working on: ", i, end="\r")
    try:
        img = mpimg.imread(dir + '/training/peacock_spot/' + i)
    except Exception:
        # skip files that cannot be read as images
        continue

    # create empty list for the 3 counts
    img_rgb = []
    channels = [0,1,2]
    # loop through the 3 color channels
    for channel in channels:
        
        # get histogram and counts for the channel (50 bins between 0 and 255)
        counts, bins = np.histogram(img[:,:,channel].ravel(), bins=np.linspace(0, 255, 51))
        
        # put counts into the list
        img_rgb.append(counts)
        
    # concat the 3 counts into a single array of length 150
    large_counts = np.concatenate(img_rgb)
        
    # save the large array into the peacock spot list
    peacock_list.append(large_counts)
    
# we get a list with as many arrays as readable images, each array is length 150

STEP 2: Stack the histograms of each class into a feature matrix and create the matching labels:

In [None]:
# JOIN THE LARGE ARRAY

# one row per image, 150 columns; labels: 0 = healthy, 1 = aculus olearius, 2 = peacock spot
healthy_hist = np.stack(healthy_list)
healthy_label = np.full(healthy_hist.shape[0], 0)

aculus_olearius_hist = np.stack(aculus_list)
aculus_olearius_label = np.full(aculus_olearius_hist.shape[0], 1)

peacock_disease_hist = np.stack(peacock_list)
peacock_disease_label = np.full(peacock_disease_hist.shape[0], 2)


In [None]:
# SANITY CHECK

display(healthy_hist.shape, aculus_olearius_hist.shape, peacock_disease_hist.shape)

In [None]:
# Concatenate 

x = np.concatenate([healthy_hist, aculus_olearius_hist, peacock_disease_hist])
y = np.concatenate([healthy_label,aculus_olearius_label,peacock_disease_label])


# SANITY CHECK
x.shape,y.shape
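
The feature extraction in the steps above can be condensed into a single function. This is a minimal sketch rather than the notebook's own code: `rgb_histogram_features` is a hypothetical helper, and it is run on a random synthetic image so that it works without the dataset.

```python
import numpy as np

def rgb_histogram_features(img, n_bins=50):
    """Concatenate the per-channel histograms of an H x W x 3 uint8 image."""
    edges = np.linspace(0, 255, n_bins + 1)  # 51 edges give 50 bins
    counts = [np.histogram(img[:, :, c].ravel(), bins=edges)[0] for c in range(3)]
    return np.concatenate(counts)

# Synthetic stand-in for a leaf photo: 32 x 48 pixels, 3 color channels
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8)

features = rgb_histogram_features(fake_img)
print(features.shape)  # one vector of 150 counts
print(features.sum())  # every pixel is counted once per channel
```

Because the counts grow with the number of pixels, images of different resolutions produce vectors on different scales. Dividing each vector by the pixel count is one possible way to make them comparable, though the notebook itself uses raw counts.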

STEP 3: Instantiate the model and fit it on the features and labels:

In [None]:
# Logistic Regression

model = LogisticRegression()
model.fit(x,y)

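The model is fitted on all of the histogram features `x` and their labels `y`, using scikit-learn's default settings for `LogisticRegression`.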
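
With raw histogram counts as features, the default solver may stop before it has converged, and the earlier `warnings.filterwarnings('ignore')` call would hide the resulting warning. One common remedy, shown here as an assumption and not as the configuration used in this project, is to standardize the features and raise `max_iter`. The data below is synthetic (Poisson-distributed counts), and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the histogram features: 3 classes, 60 rows each, 150 columns
x_demo = np.vstack([rng.poisson(lam=20 + 10 * k, size=(60, 150)) for k in range(3)])
y_demo = np.repeat([0, 1, 2], 60)

# Scale each feature to zero mean and unit variance, then fit the classifier
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(x_demo, y_demo)

train_acc = clf.score(x_demo, y_demo)
print("training accuracy on synthetic data:", train_acc)
```

Wrapping the scaler and the classifier in one pipeline means the same scaling is applied automatically whenever `clf.predict` is called on new images.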

#### Model Evaluation

For evaluation, I compute the precision, recall, and F1 score for each class.

In [None]:
# Classification Report

report_initial = classification_report(y, model.predict(x))
print(report_initial)

This model achieves an accuracy of 76.6% on the images it was trained on.
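
The report above is computed on the same data that was used to fit the model, so it can overstate how well the model generalizes. A minimal sketch of evaluating on held-out data follows. It uses synthetic features and a random split purely for illustration; in this project the images in the `testing` folders could play the role of the held-out set once they are converted to histograms in the same way. All `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic features: 3 classes, 100 rows each, 150 columns
x_demo = np.vstack([rng.normal(loc=k, scale=1.0, size=(100, 150)) for k in range(3)])
y_demo = np.repeat([0, 1, 2], 100)

# Hold out 25% of the rows, keeping the class proportions the same in both parts
x_train, x_test, y_train, y_test = train_test_split(
    x_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

model_demo = LogisticRegression(max_iter=1000).fit(x_train, y_train)

# The report is computed only on rows the model has not seen during fitting
report = classification_report(y_test, model_demo.predict(x_test))
test_acc = model_demo.score(x_test, y_test)
print(report)
```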

In [None]:
# Confusion Matrix

ConfusionMatrixDisplay.from_estimator(model, x, y)

The confusion matrix shows how many images from each class were predicted correctly or incorrectly. The model performs fairly well on the first and third classes (healthy and peacock spot) but poorly on leaves with aculus olearius, most of which it misclassifies as healthy. This is expected: as we saw in the preprocessing stage, leaves with aculus olearius look a lot like healthy leaves.
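
For readers less familiar with confusion matrices, the sketch below shows how the numbers are laid out, using invented labels and not this project's predictions. In scikit-learn's convention, row `i` is the true class and column `j` is the predicted class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Invented labels: 0 = healthy, 1 = aculus olearius, 2 = peacock spot
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 2, 2, 1]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# cm[i, j] = number of images of true class i that were predicted as class j
aculus_as_healthy = cm[1, 0]

# Recall per class: correct predictions on the diagonal divided by the row totals
per_class_recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
print("aculus olearius predicted as healthy:", aculus_as_healthy)
print("recall per class:", per_class_recall)
```

The off-diagonal entry `cm[1, 0]` is the kind of error discussed above: diseased leaves that the model labels as healthy.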

<hr>

*PART III: Findings* 

<a id='results'></a>
## Results

<a id='discussion'></a>
## Discussion

The project's findings showcase the potential of image processing and machine learning to improve agricultural practices through better disease detection. Future work will focus on improving classification accuracy by exploring transfer learning and deep learning techniques.
