<a href="https://colab.research.google.com/github/reehambasheer/opencv-image-classifier/blob/main/Plant_Disease_Classifier_OpenCV_Sklearn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🌱 Plant Disease Classification
---
This notebook demonstrates how to use **OpenCV** for image preprocessing and **scikit-learn** for training a classifier on the [Plant Diseases Dataset](https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset).

### Steps:
1. Download the dataset from Kaggle with `kagglehub`
2. Preprocess images with OpenCV (resize, convert to grayscale, flatten into feature vectors)
3. Train a scikit-learn classifier
4. Evaluate performance
5. Save the model for deployment
6. Test on a sample image


In [15]:
# Install dependencies (run only in Colab)
!pip install opencv-python scikit-learn matplotlib tqdm kagglehub



## 📂 Step 1: Download Dataset with kagglehub

In [16]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("vipoooool/new-plant-diseases-dataset")

print("Path to dataset files:", path)

Using Colab cache for faster access to the 'new-plant-diseases-dataset' dataset.
Path to dataset files: /kaggle/input/new-plant-diseases-dataset


## 🖼 Step 2: Preprocess Images with OpenCV

In [17]:
import cv2, os
import numpy as np
from sklearn.model_selection import train_test_split
from tqdm import tqdm

IMG_SIZE = 64        # each image is resized to IMG_SIZE x IMG_SIZE
MAX_PER_CLASS = 200  # cap images per class for faster execution
X, y = [], []

# Search the downloaded dataset for the first directory named 'train'
train_data_path = None
for root, dirs, files in os.walk(path):
    if 'train' in dirs:
        train_data_path = os.path.join(root, 'train')
        print(f"Found training data path: {train_data_path}")
        break

if train_data_path:
    print(f"Contents of {train_data_path}: {os.listdir(train_data_path)}")
    # Keep only sub-directories (one per class); skip hidden or stray files
    classes = [cls for cls in os.listdir(train_data_path)
               if os.path.isdir(os.path.join(train_data_path, cls))]
    label_map = {cls: i for i, cls in enumerate(classes)}
    print(f"Found {len(classes)} classes after filtering.")

    if not classes:
        print(f"Error: No class directories found in {train_data_path}")
    else:
        for cls in tqdm(classes):
            cls_path = os.path.join(train_data_path, cls)
            img_list = os.listdir(cls_path)
            print(f"Processing class {cls} with {len(img_list)} images found.")
            if not img_list:
                print(f"Warning: No images found in directory {cls_path}")
            for img_name in img_list[:MAX_PER_CLASS]:
                img_path = os.path.join(cls_path, img_name)
                if not os.path.isfile(img_path):
                    continue
                img = cv2.imread(img_path)  # returns None for unreadable files
                if img is None:
                    continue
                # Resize, convert to grayscale, then flatten into a 1-D feature vector
                img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
                gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                X.append(gray.flatten())
                y.append(label_map[cls])


X = np.array(X)
y = np.array(y)

# Check if X and y are not empty before splitting
if len(X) > 0 and len(y) > 0:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    print("Dataset prepared:", X_train.shape, X_test.shape)
else:
    print("Error: No image data was loaded. X and y are empty.")

Found training data path: /kaggle/input/new-plant-diseases-dataset/New Plant Diseases Dataset(Augmented)/New Plant Diseases Dataset(Augmented)/train
Contents of /kaggle/input/new-plant-diseases-dataset/New Plant Diseases Dataset(Augmented)/New Plant Diseases Dataset(Augmented)/train: ['Tomato___Late_blight', 'Tomato___healthy', 'Grape___healthy', 'Orange___Haunglongbing_(Citrus_greening)', 'Soybean___healthy', 'Squash___Powdery_mildew', 'Potato___healthy', 'Corn_(maize)___Northern_Leaf_Blight', 'Tomato___Early_blight', 'Tomato___Septoria_leaf_spot', 'Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot', 'Strawberry___Leaf_scorch', 'Peach___healthy', 'Apple___Apple_scab', 'Tomato___Tomato_Yellow_Leaf_Curl_Virus', 'Tomato___Bacterial_spot', 'Apple___Black_rot', 'Blueberry___healthy', 'Cherry_(including_sour)___Powdery_mildew', 'Peach___Bacterial_spot', 'Apple___Cedar_apple_rust', 'Tomato___Target_Spot', 'Pepper,_bell___healthy', 'Grape___Leaf_blight_(Isariopsis_Leaf_Spot)', 'Potato___Late

Processing class Tomato___Late_blight with 1851 images found.
Processing class Tomato___healthy with 1926 images found.
Processing class Grape___healthy with 1692 images found.
Processing class Orange___Haunglongbing_(Citrus_greening) with 2010 images found.
Processing class Soybean___healthy with 2022 images found.
Processing class Squash___Powdery_mildew with 1736 images found.
Processing class Potato___healthy with 1824 images found.
Processing class Corn_(maize)___Northern_Leaf_Blight with 1908 images found.
Processing class Tomato___Early_blight with 1920 images found.
Processing class Tomato___Septoria_leaf_spot with 1745 images found.
Processing class Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot with 1642 images found.
Processing class Strawberry___Leaf_scorch with 1774 images found.
Processing class Peach___healthy with 1728 images found.
Processing class Apple___Apple_scab with 2016 images found.
Processing class Tomato___Tomato_Yellow_Leaf_Curl_Virus with 1961 images found.
Processing class Tomato___Bacterial_spot with 1702 images found.
Processing class Apple___Black_rot with 1987 images found.
Processing class Blueberry___healthy with 1816 images found.
Processing class Cherry_(including_sour)___Powdery_mildew with 1683 images found.
Processing class Peach___Bacterial_spot with 1838 images found.
Processing class Apple___Cedar_apple_rust with 1760 images found.
Processing class Tomato___Target_Spot with 1827 images found.
Processing class Pepper,_bell___healthy with 1988 images found.
Processing class Grape___Leaf_blight_(Isariopsis_Leaf_Spot) with 1722 images found.
Processing class Potato___Late_blight with 1939 images found.
Processing class Tomato___Tomato_mosaic_virus with 1790 images found.
Processing class Strawberry___healthy with 1824 images found.
Processing class Apple___healthy with 2008 images found.
Processing class Grape___Black_rot with 1888 images found.
Processing class Potato___Early_blight with 1939 images found.
Processing class Cherry_(including_sour)___healthy with 1826 images found.
Processing class Corn_(maize)___Common_rust_ with 1907 images found.
Processing class Grape___Esca_(Black_Measles) with 1920 images found.
Processing class Raspberry___healthy with 1781 images found.
Processing class Tomato___Leaf_Mold with 1882 images found.
Processing class Tomato___Spider_mites Two-spotted_spider_mite with 1741 images found.
Processing class Pepper,_bell___Bacterial_spot with 1913 images found.
Processing class Corn_(maize)___healthy with 1859 images found.

100%|██████████| 38/38 [00:25<00:00,  1.51it/s]

Dataset prepared: (6080, 4096) (1520, 4096)
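
The split above is purely random, so the number of test samples per class can vary from class to class. One possible refinement is to pass `stratify=y` to `train_test_split`, which keeps the class proportions the same in both splits. The sketch below uses synthetic stand-in arrays (`X_demo`, `y_demo` are hypothetical names, not the notebook's data) with the same 38 classes and 200 samples per class:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real arrays: 38 classes x 200 samples, 16 features each
rng = np.random.default_rng(0)
n_classes, per_class = 38, 200
X_demo = rng.integers(0, 256, size=(n_classes * per_class, 16), dtype=np.uint8)
y_demo = np.repeat(np.arange(n_classes), per_class)

# stratify=y_demo keeps the class proportions identical in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

test_counts = np.bincount(y_te, minlength=n_classes)
print(X_tr.shape, X_te.shape)
print("Test samples per class:", test_counts.min(), "to", test_counts.max())
```

In the notebook itself, the equivalent change would be adding `stratify=y` to the existing `train_test_split` call.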





## 🤖 Step 3: Train scikit-learn Classifier

In [18]:
from sklearn.svm import SVC
clf = SVC(kernel='linear', probability=True)
clf.fit(X_train, y_train)
print("Model trained successfully!")

Model trained successfully!
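
The classifier above is trained directly on raw pixel values in the range 0 to 255. SVMs are generally sensitive to feature scale, so standardizing the inputs is a common step that may speed up training and improve accuracy. The following is a minimal sketch on synthetic data, not the notebook's actual pipeline; `X_demo`, `y_demo`, and `scaled_clf` are hypothetical names:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for flattened grayscale images: two classes with different brightness
dark = rng.integers(0, 128, size=(50, 64)).astype(np.float64)
bright = rng.integers(128, 256, size=(50, 64)).astype(np.float64)
X_demo = np.vstack([dark, bright])
y_demo = np.array([0] * 50 + [1] * 50)

# The scaler is fitted inside the pipeline, so the same transform is reused at predict time
scaled_clf = make_pipeline(StandardScaler(), SVC(kernel="linear", probability=True))
scaled_clf.fit(X_demo, y_demo)
print("Training accuracy on synthetic data:", scaled_clf.score(X_demo, y_demo))
```

Because the scaler lives inside the pipeline, saving the pipeline object also saves the scaling parameters, which avoids a mismatch between training and inference preprocessing.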


## 📊 Step 4: Evaluate Model

In [19]:
from sklearn.metrics import classification_report, accuracy_score

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Accuracy: 0.3190789473684211
              precision    recall  f1-score   support

           0       0.17      0.19      0.18        48
           1       0.28      0.32      0.30        41
           2       0.42      0.57      0.49        49
           3       0.24      0.51      0.33        37
           4       0.33      0.55      0.41        31
           5       0.36      0.40      0.38        52
           6       0.35      0.23      0.28        35
           7       0.35      0.27      0.31        55
           8       0.23      0.17      0.20        47
           9       0.09      0.07      0.08        42
          10       0.50      0.36      0.42        33
          11       0.17      0.12      0.14        42
          12       0.30      0.19      0.23        47
          13       0.22      0.26      0.24        35
          14       0.20      0.31      0.24        36
          15       0.32      0.45      0.38        40
          16       0.34      0.35      0.35        3
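
For context, a uniform random guess over 38 balanced classes would score about 1/38 ≈ 2.6%, so roughly 32% accuracy is well above chance but still leaves most test images misclassified. The report above shows integer labels only; passing `target_names` makes it easier to see which diseases are confused. A small sketch with a hypothetical three-class example (`demo_classes`, `y_true`, and `y_hat` are made up for illustration):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical three-class example; in the notebook, `classes` would supply the names
demo_classes = ["Apple___healthy", "Grape___healthy", "Tomato___healthy"]
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_hat = np.array([0, 1, 1, 1, 2, 0, 2, 1])

labels = list(range(len(demo_classes)))
report = classification_report(
    y_true, y_hat, labels=labels, target_names=demo_classes, zero_division=0
)
print(report)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat, labels=labels)
print(cm)
```

In the notebook, the corresponding call would presumably be `classification_report(y_test, y_pred, target_names=classes)`, since `label_map` assigns indices in the order of `classes`.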

## 💾 Step 5: Save Model

In [20]:
import pickle
# Use a context manager so the file is flushed and closed after writing
with open("plant_disease_clf.pkl", "wb") as f:
    pickle.dump(clf, f)
print("Model saved as plant_disease_clf.pkl")

Model saved as plant_disease_clf.pkl
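
The pickle file holds only the classifier. At inference time, the class names and the image size used during training are also needed to turn a prediction back into a label. One option, sketched below with a tiny stand-in model, is to save them together in a single dictionary. The dictionary keys and the file name `demo_bundle.pkl` are assumptions chosen for illustration:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.svm import SVC

# Tiny stand-in model; in the notebook this would be `clf`, `classes`, and `IMG_SIZE`
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_demo = np.array([0, 0, 1, 1])
demo_clf = SVC(kernel="linear").fit(X_demo, y_demo)

bundle = {"model": demo_clf, "classes": ["class_a", "class_b"], "img_size": 64}

bundle_path = os.path.join(tempfile.mkdtemp(), "demo_bundle.pkl")
with open(bundle_path, "wb") as f:
    pickle.dump(bundle, f)

# Later, e.g. in a deployment script: only unpickle files from a trusted source
with open(bundle_path, "rb") as f:
    restored = pickle.load(f)

pred = restored["model"].predict([[5.0, 5.5]])[0]
print("Predicted:", restored["classes"][pred])
```

Pickled scikit-learn models are generally only safe to load with the same library version that saved them, so it can help to record the version alongside the file.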


## 🔍 Step 6: Test on a Sample Image

In [21]:
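# Sample taken from the training folder, so it is not truly unseen data
sample_dir = os.path.join(train_data_path, classes[0])
test_img_path = os.path.join(sample_dir, os.listdir(sample_dir)[0])
img = cv2.imread(test_img_path)
assert img is not None, f"Could not read {test_img_path}"
# Same order as in training: resize first, then convert to grayscale
img_resized = cv2.cvtColor(cv2.resize(img, (IMG_SIZE, IMG_SIZE)), cv2.COLOR_BGR2GRAY)
prediction = clf.predict([img_resized.flatten()])[0]
print("Predicted class:", classes[prediction])  # label_map index == position in classes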

Predicted class: Raspberry___healthy
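
The sample comes from `classes[0]`, which the directory listing in Step 2 shows first as `Tomato___Late_blight`, so the prediction above appears to be a misclassification. Because the classifier was created with `probability=True`, `predict_proba` can show how confident the model was and which classes came next. The sketch below uses synthetic clusters rather than image features, and `top_k` is a hypothetical helper, not part of scikit-learn:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in: three well-separated clusters instead of real image features
demo_classes = ["class_a", "class_b", "class_c"]
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X_demo = np.vstack([c + rng.normal(scale=1.0, size=(30, 2)) for c in centers])
y_demo = np.repeat(np.arange(3), 30)

demo_clf = SVC(kernel="linear", probability=True, random_state=0).fit(X_demo, y_demo)

def top_k(model, features, names, k=3):
    """Return the k most probable (class name, probability) pairs for one sample."""
    proba = model.predict_proba([features])[0]
    order = np.argsort(proba)[::-1][:k]
    # model.classes_ holds the label that corresponds to each probability column
    return [(names[model.classes_[i]], float(proba[i])) for i in order]

ranked = top_k(demo_clf, [5.5, 0.5], demo_classes, k=2)
for name, p in ranked:
    print(f"{name}: {p:.2f}")
```

Note that for `SVC` the probabilities come from a separate calibration step, so on borderline samples the top entry of `predict_proba` may differ from the result of `predict`.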
