<a href="https://colab.research.google.com/github/jegadeesh17/Event-1/blob/main/capstone_Copy1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Jegadeesh 20MIS1173 Capstone Project.

# Rice Leaf Image Classification and Recommendation

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

This project demonstrates how to classify images of rice leaf diseases and design a recommendation system. It builds a Keras model with a cascading architecture on top of a pre-trained ResNet50 base and loads data using `ImageDataGenerator.flow_from_directory`. It demonstrates the following concepts:

*   Splitting data into a 70:30 ratio for training and validation.
*   Efficiently loading a dataset off disk.
*   Data augmentation to expand the dataset size and improve model generalization.
*   Implementing a cascading model architecture.
*   Utilizing a pre-trained ResNet50 model as a base within the cascading model.
*   Building a custom CNN model to complement the ResNet50 base.
*   Designing a recommender system based on image classification results.

This tutorial follows an advanced machine learning workflow:

1.  Examine and understand the data.
2.  Split the data into a 70:30 ratio.
3.  Build an input pipeline with data augmentation.
4.  Build a cascading model:
    *   ResNet50 (base model)
    *   Custom CNN model
5.  Train the model.
6.  Test the model.
7.  Design a recommendation system:
    *   Use sample images to predict the class.
    *   Provide recommendations for plant disease conditions.
8.  Improve the model and the recommendation system.

This tutorial uses a dataset of about 5,392 rice leaf images. The dataset contains four sub-directories, one per class:

```
Bacterialblight
Blast
Brownspot
Tungro
```
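
The notebook loads images from pre-split `train`, `val`, and `test` folders but does not show how the split was produced. As a minimal sketch, assuming the file names of one class are available as a list, a reproducible 70:30 split could look like this (`split_files` and the example file names are hypothetical and not part of the original project):

```python
import random

def split_files(files, train_frac=0.7, seed=42):
    """Shuffle a list of file names and split it into train and validation parts."""
    files = sorted(files)               # sort first so the shuffle is reproducible
    rng = random.Random(seed)           # local RNG, does not touch global state
    rng.shuffle(files)
    cut = int(len(files) * train_frac)  # index where the validation part starts
    return files[:cut], files[cut:]

# Hypothetical file names for one class directory
blast_files = [f"blast_{i:03d}.jpg" for i in range(10)]
train_files, val_files = split_files(blast_files)
print(len(train_files), len(val_files))
```

Applying the split separately to each class directory keeps the class proportions similar in both parts.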

### Create a dataset

Define some parameters for the loader:

In [None]:
batch_size = 32
img_height = 180
img_width = 180

It's good practice to hold out validation data when developing your model. Here the images are already split into `train`, `val`, and `test` directories, and each directory is loaded with its own generator.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each generator only rescales pixel values from [0, 255] to [0, 1]
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

# Update the paths below to match your dataset location
train_ds = train_datagen.flow_from_directory(
    '/content/drive/MyDrive/output/train',
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='sparse',
    color_mode='rgb',
    shuffle=True
)

val_ds = val_datagen.flow_from_directory(
    '/content/drive/MyDrive/output/val',
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='sparse',
    shuffle=False
)

test_ds = test_datagen.flow_from_directory(
    '/content/drive/MyDrive/output/test',
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='sparse',
    shuffle=False
)

You can find the class names in the `class_indices` attribute of these generators, which maps each class name to its integer label. The names correspond to the directory names in alphabetical order.
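
As a small illustration of that ordering (a sketch that mirrors the alphabetical mapping, not the Keras internals), sorting the directory names and enumerating them gives the label indices. The variable names `example_indices` and `example_names` are chosen here only for the illustration:

```python
# Directory names from the dataset, deliberately listed out of order
directories = ["Tungro", "Blast", "Bacterialblight", "Brownspot"]

# Sort alphabetically, then assign consecutive integer labels
example_indices = {name: idx for idx, name in enumerate(sorted(directories))}
example_names = list(example_indices)

print(example_indices)
```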

In [None]:
class_indices = train_ds.class_indices
class_names = list(class_indices.keys())
print(class_names)


## Visualize the data

Here are up to ten images from the first training batch:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(14, 5))
images, labels = next(train_ds)  # Get the next batch of images and labels

# Show up to 10 images from the batch in a 2 x 5 grid
for i in range(min(10, len(images))):
    ax = plt.subplot(2, 5, i + 1)
    plt.imshow((images[i] * 255).astype("uint8"))  # Undo the 1/255 rescaling for display
    plt.title(class_names[int(labels[i])])  # Look up the class name for the label
    plt.axis("off")
plt.show()


You will pass these datasets to the Keras `Model.fit` method for training later in this tutorial. If you like, you can also manually iterate over the dataset and retrieve batches of images:

In [None]:
for image_batch, labels_batch in train_ds:
  print(image_batch.shape)
  print(labels_batch.shape)
  break

The `image_batch` is an array of shape `(32, 180, 180, 3)`: a batch of 32 images of shape `180x180x3` (the last dimension refers to the RGB color channels). The `labels_batch` is an array of shape `(32,)` holding the corresponding labels for the 32 images.



## Configure the dataset for performance

Make sure to use buffered prefetching, so you can yield data from disk without having I/O become blocking. These are two important methods you should use when loading data:

- `Dataset.cache` keeps the images in memory after they're loaded off disk during the first epoch. This will ensure the dataset does not become a bottleneck while training your model. If your dataset is too large to fit into memory, you can also use this method to create a performant on-disk cache.
- `Dataset.prefetch` overlaps data preprocessing and model execution while training.



In [None]:
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def to_dataset(gen, shuffle=False):
    """Wrap a Keras directory iterator in a cached, prefetched tf.data.Dataset."""
    def one_pass():
        # Index the iterator so one pass is finite; iterating it directly can loop indefinitely
        for i in range(len(gen)):
            images, labels = gen[i]
            yield images, labels.astype("int32")  # cast labels to match the int32 signature

    ds = tf.data.Dataset.from_generator(
        one_pass,
        output_signature=(
            tf.TensorSpec(shape=(None, 180, 180, 3), dtype=tf.float32),
            tf.TensorSpec(shape=(None,), dtype=tf.int32),
        ),
    )
    ds = ds.cache()  # cache before shuffling so each epoch sees a new batch order
    if shuffle:
        ds = ds.shuffle(1000)
    return ds.prefetch(buffer_size=AUTOTUNE)

# Passing each iterator as an argument keeps the generator bound to the original
# object, even though the names train_ds, val_ds and test_ds are reassigned here
train_ds = to_dataset(train_ds, shuffle=True)
val_ds = to_dataset(val_ds)
test_ds = to_dataset(test_ds)


## Standardize the data


The raw RGB channel values are in the `[0, 255]` range. This is not ideal for a neural network; in general you should seek to make your input values small.

The generators above already standardize values to the `[0, 1]` range with `rescale=1./255`. The equivalent in-model option is `tf.keras.layers.Rescaling`:

In [None]:
normalization_layer = layers.Rescaling(1./255)

Note: The images were resized earlier using the `target_size` argument of `flow_from_directory`. To include the resizing logic in the model as well, use the `tf.keras.layers.Resizing` layer.
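
To make the arithmetic concrete, here is a minimal NumPy sketch of the `[0, 255]` to `[0, 1]` rescaling. The pixel values are made up for illustration:

```python
import numpy as np

# A made-up 1x2 RGB image with 8-bit channel values
pixels = np.array([[[0, 128, 255], [64, 32, 16]]], dtype=np.uint8)

# Same arithmetic as rescale=1./255: multiply every channel value by 1/255
scaled = pixels.astype("float32") * (1.0 / 255)

print(scaled.min(), scaled.max())
```

Note that the training cell later in this notebook uses ResNet50's `preprocess_input` instead, which applies a different normalization (channel reordering and ImageNet mean subtraction), so the two should not be combined on the same input.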

## Visualize training results

Creating plots of the loss and accuracy on the training and validation sets:

The plots show that training accuracy and validation accuracy are off by large margins, and the model has achieved only around 60% accuracy on the validation set.



## Overfitting

In the plots above, the training accuracy is increasing linearly over time, whereas validation accuracy stalls around 60% in the training process. Also, the difference in accuracy between training and validation accuracy is noticeable—a sign of [overfitting](https://www.tensorflow.org/tutorials/keras/overfit_and_underfit).

When there are a small number of training examples, the model sometimes learns from noise or unwanted details in the training examples, to an extent that it negatively impacts the performance of the model on new examples. This phenomenon is known as overfitting. It means that the model will have a difficult time generalizing to a new dataset.
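
The gap between training and validation accuracy can be quantified per epoch. The sketch below uses made-up accuracy values arranged like a Keras `History.history` dictionary; the numbers are illustrative and are not results from this project:

```python
# Made-up per-epoch accuracies, shaped like history.history from Model.fit
example_history = {
    "accuracy":     [0.55, 0.70, 0.82, 0.91, 0.97],
    "val_accuracy": [0.52, 0.58, 0.60, 0.61, 0.60],
}

train_acc = example_history["accuracy"]
val_acc = example_history["val_accuracy"]

# Gap between training and validation accuracy at each epoch
gaps = [round(t - v, 2) for t, v in zip(train_acc, val_acc)]

# Epoch (1-based) with the highest validation accuracy
best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1

print(gaps, best_epoch)
```

A gap that keeps widening while validation accuracy stays flat is the pattern described above.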

There are multiple ways to fight overfitting in the training process. In this tutorial, we'll use *data augmentation* and add *dropout* to our model.

## Data augmentation

Overfitting generally occurs when there are a small number of training examples. [Data augmentation](./data_augmentation.ipynb) takes the approach of generating additional training data from your existing examples by augmenting them using random transformations that yield believable-looking images. This helps expose the model to more aspects of the data and generalize better.

You will implement data augmentation using the following Keras preprocessing layers: `tf.keras.layers.RandomFlip`, `tf.keras.layers.RandomRotation`, and `tf.keras.layers.RandomZoom`. These can be included inside our model like other layers, and run on the GPU.

In [None]:

# Image size, matching the loader parameters defined earlier
img_height = 180
img_width = 180

# Define the data augmentation pipeline
data_augmentation = keras.Sequential([
    keras.Input(shape=(img_height, img_width, 3)),
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])


Applying data augmentation to the same image several times yields a different random variant on each call.
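
As a library-free illustration of that idea, the sketch below uses NumPy instead of the Keras layers and implements only a random horizontal flip. The helper `random_flip` and the tiny example image are hypothetical:

```python
import numpy as np

def random_flip(image, rng):
    """Flip an H x W x C image left-to-right with probability 0.5."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]   # reverse the width axis
    return image

rng = np.random.default_rng(0)

# Made-up 2x3 single-channel "image" whose columns are easy to tell apart
image = np.arange(6).reshape(2, 3, 1)

# Apply the same augmentation several times to the same image
variants = [random_flip(image, rng) for _ in range(8)]
for v in variants:
    print(v[0, :, 0])
```

Each call returns either the original or its mirror image; the Keras layers add random rotation and zoom on top of this.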

Train the model with the Keras `Model.fit` method (the training cell below runs for up to 25 epochs with early stopping).

## Dropout

Another technique to reduce overfitting is to introduce [dropout](https://developers.google.com/machine-learning/glossary#dropout_regularization) regularization to the network.

When we apply dropout to a layer, it randomly drops out (by setting the activation to zero) a number of output units from the layer during the training process. Dropout takes a fractional number as its input value, such as 0.1, 0.2, or 0.4. This means dropping out 10%, 20%, or 40% of the output units randomly from the applied layer.
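
The mechanism can be sketched in a few lines of NumPy. This is an illustrative re-implementation of inverted dropout at training time, not the Keras layer itself; the function name `dropout` and the all-ones activations are assumptions made for the example:

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where a unit is kept
    return activations * mask / keep_prob             # rescale so the expected sum is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1, 1000), dtype="float32")
dropped = dropout(activations, rate=0.4, rng=rng)

print(f"fraction zeroed: {np.mean(dropped == 0):.2f}")
```

With `rate=0.4`, roughly 40% of the units are zeroed and the surviving units are scaled by `1 / 0.6`. At inference time dropout is disabled and activations pass through unchanged.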

Create a new neural network with `tf.keras.layers.Dropout` before training it using the augmented images:

### Train the model

In [None]:
# Import necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow import keras
from tensorflow.keras import layers, Input
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define class names
class_names = ['Bacterialblight', 'Blast', 'Brownspot', 'Tungro']
num_classes = len(class_names)

# ================== Enhanced Data Pipeline ==================
# Data augmentation and preprocessing
train_datagen = ImageDataGenerator(
    preprocessing_function=tf.keras.applications.resnet50.preprocess_input,
    rotation_range=30,
    width_shift_range=0.15,
    height_shift_range=0.15,
    shear_range=0.15,
    zoom_range=0.15,
    horizontal_flip=True,
    fill_mode='nearest'
)

val_datagen = ImageDataGenerator(
    preprocessing_function=tf.keras.applications.resnet50.preprocess_input
)

# Load datasets (ensure paths are correct)
train_ds = train_datagen.flow_from_directory(
   '/content/drive/MyDrive/output/train',
    target_size=(180, 180),
    batch_size=16,  # Reduced batch size for better stability
    class_mode='sparse'
)

val_ds = val_datagen.flow_from_directory(
    '/content/drive/MyDrive/output/val',
    target_size=(180, 180),
    batch_size=16,
    class_mode='sparse'
)

test_ds = val_datagen.flow_from_directory(
    '/content/drive/MyDrive/output/test',
    target_size=(180, 180),
    batch_size=16,
    class_mode='sparse',
    shuffle=False
)


# Enhanced Model Architecture with Regularization
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(180, 180, 3))

# Strategic Fine-tuning (reduced trainable layers)
for layer in base_model.layers:
    layer.trainable = False
for layer in base_model.layers[-8:]:  # Reduced from 10 to 8 layers
    layer.trainable = True

# Added L2 Regularization and increased Dropout
inputs = Input(shape=(180, 180, 3))
x = base_model(inputs)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(512, kernel_regularizer='l2')(x)  # Added L2 regularization
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.Dropout(0.5)(x)  # Increased dropout
x = layers.Dense(256, kernel_regularizer='l2')(x)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.Dropout(0.4)(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)  # One output unit per class

model = Model(inputs, outputs)

# Optimized Training Configuration
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)  # Reduced initial LR
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Enhanced Callbacks
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_accuracy',
        patience=5,  # Original was 6
        min_delta=0.001,
        restore_best_weights=True
    ),
     keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,  # More aggressive reduction
        patience=2,
        min_lr=1e-6
    )
]

# Model Training with Visualization
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=25,  # Increased ceiling but early stopping will intervene
    callbacks=callbacks
)

# Visualization of Training Metrics
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Loss Curves')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Accuracy Curves')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()

plt.tight_layout()
plt.savefig('training_metrics.png')
plt.show()



# Save the model directly to a folder in Google Drive
model.save('/content/drive/My Drive/trained_model.keras')


## Evaluation and testing

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix

# Load model and test data
model = tf.keras.models.load_model('/content/drive/My Drive/trained_model.keras')

# Assuming 'val_datagen' is your ImageDataGenerator for the test set
test_ds = val_datagen.flow_from_directory(
    '/content/drive/MyDrive/output/test',  # Replace with the correct path
    target_size=(180, 180),
    batch_size=16,
    class_mode='sparse',
    shuffle=False
)
# Evaluation
test_loss, test_acc = model.evaluate(test_ds)
print(f"\nTest Accuracy: {test_acc:.2%}")

# Detailed Reporting
y_true = test_ds.classes
y_pred = np.argmax(model.predict(test_ds), axis=1)

print("\nClassification Report:")
print(classification_report(y_true, y_pred, target_names=class_names))

# Confusion Matrix Visualization
cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(8,6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names)
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
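
The per-class figures in the classification report follow directly from the confusion matrix. As a minimal sketch, using a made-up matrix rather than this model's results, precision, recall, and accuracy can be computed with NumPy like this (rows are true classes and columns are predicted classes, as in scikit-learn):

```python
import numpy as np

# Made-up confusion matrix: rows are true classes, columns are predicted classes
example_cm = np.array([
    [48,  1,  1,  0],
    [ 2, 45,  3,  0],
    [ 0,  4, 46,  0],
    [ 0,  0,  0, 50],
])
names = ["Bacterialblight", "Blast", "Brownspot", "Tungro"]

correct = np.diag(example_cm)
recall = correct / example_cm.sum(axis=1)     # correct / all samples of that true class
precision = correct / example_cm.sum(axis=0)  # correct / all samples predicted as that class
accuracy = correct.sum() / example_cm.sum()

for name, p, r in zip(names, precision, recall):
    print(f"{name:16s} precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.3f}")
```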

We will add data augmentation to your model before training in the next step.

### Compile the model

For this tutorial, choose the `tf.keras.optimizers.Adam` optimizer and `tf.keras.losses.SparseCategoricalCrossentropy` loss function. To view training and validation accuracy for each training epoch, pass the `metrics` argument to `Model.compile`.
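
To show what the sparse categorical cross-entropy loss measures, here is a plain-Python sketch with made-up softmax outputs. It omits the numerical clipping that Keras applies, and the function below is an illustration rather than the library implementation:

```python
import math

def sparse_categorical_crossentropy(labels, probabilities):
    """Mean negative log-probability assigned to the true class of each sample."""
    losses = [-math.log(probs[label]) for label, probs in zip(labels, probabilities)]
    return sum(losses) / len(losses)

# Made-up softmax outputs for two samples over the four classes
probabilities = [
    [0.70, 0.10, 0.10, 0.10],  # true class 0: fairly confident and correct
    [0.25, 0.25, 0.25, 0.25],  # true class 3: maximally uncertain
]
labels = [0, 3]                 # integer ("sparse") labels, not one-hot vectors

loss = sparse_categorical_crossentropy(labels, probabilities)
print(f"{loss:.4f}")
```

The "sparse" variant takes integer labels directly, which matches `class_mode='sparse'` in the data generators.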

## Results



## Classification and Recommendation

In [None]:
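import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
# Use the same preprocessing the model was trained with (ResNet50, not MobileNetV2)
from tensorflow.keras.applications.resnet50 import preprocess_input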

def preprocess_image(img_path):
    """Preprocess the image for classification by resizing and normalizing it."""
    img = image.load_img(img_path, target_size=(180, 180))
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    return preprocess_input(img_array)

def classify_disease(model, img_path, class_names):
    """Classify the disease using the provided pre-trained model."""
    img_array = preprocess_image(img_path)
    predictions = model.predict(img_array)
    confidence = np.max(predictions) * 100  # Convert to percentage
    predicted_class = class_names[np.argmax(predictions)]
    return predicted_class, confidence

def analyze_severity(img_path):
    """Analyze disease severity based on color analysis in HSV space."""
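    # Read and resize the image (cv2.imread returns None if the file cannot be read)
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    img = cv2.resize(img, (180, 180))
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)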

    # Define color ranges for infected areas (adjust these as needed)
    lower_brown = np.array([10, 50, 50])
    upper_brown = np.array([30, 255, 255])
    lower_black = np.array([0, 0, 0])
    upper_black = np.array([180, 255, 50])
    lower_yellow = np.array([20, 100, 100])
    upper_yellow = np.array([40, 255, 255])

    # Create masks for infected areas
    brown_mask = cv2.inRange(hsv, lower_brown, upper_brown)
    black_mask = cv2.inRange(hsv, lower_black, upper_black)
    yellow_mask = cv2.inRange(hsv, lower_yellow, upper_yellow)

    # Combine masks (bitwise OR avoids uint8 wrap-around from adding 255-valued masks)
    disease_mask = cv2.bitwise_or(cv2.bitwise_or(brown_mask, black_mask), yellow_mask)

    # Calculate severity percentage
    total_pixels = img.shape[0] * img.shape[1]
    infected_pixels = np.sum(disease_mask > 0)
    severity_percentage = (infected_pixels / total_pixels) * 100

    # Categorize severity
    if severity_percentage < 20:
        severity = "Low"
    elif severity_percentage < 50:
        severity = "Medium"
    else:
        severity = "High"

    return severity, severity_percentage

def get_disease_recommendations(disease, severity):
    """Provide recommendations based on disease and severity."""
    recommendations = {
        "Tungro": {
            "Low": ["High humidity (>80%)", "Monitor fields regularly", "Early/Potential"],
            "Medium": ["Temperature: 25-28°C", "Remove infected plants", "Developing"],
            "High": ["Asynchronous planting", "Implement strict vector control", "Advanced"]
        },
        "Bacterialblight": {
            "Low": ["High humidity (>80%)", "Use disease-free seeds", "Early/Potential"],
            "Medium": ["Rainfall or irrigation", "Apply balanced fertilizers", "Developing"],
            "High": ["Flooding conditions", "Use resistant varieties", "Advanced"]
        },
        "Blast": {
            "Low": ["High humidity (>90%)", "Monitor fields regularly", "Early/Potential"],
            "Medium": ["Cloudy skies", "Avoid excessive nitrogen", "Developing"],
            "High": ["Drought stress", "Apply fungicides at boot stage", "Advanced"]
        },
        "Brownspot": {
            "Low": ["High humidity (86-100%)", "Use certified seeds", "Early/Potential"],
            "Medium": ["Prolonged leaf wetness", "Apply balanced fertilizers", "Developing"],
            "High": ["Water stress", "Apply fungicides", "Advanced"]
        }
    }
    # Default response if disease or severity isn’t found
    return recommendations.get(disease, {}).get(severity, ["No data", "No data", "No data"])

# Main execution block
if __name__ == "__main__":
    # Define class names for your model
    class_names = ['Bacterialblight', 'Blast', 'Brownspot', 'Tungro']

    # Path to the image to classify; replace this placeholder with a rice leaf photo
    img_path = '/content/training_metrics.png'

    model = tf.keras.models.load_model('/content/drive/My Drive/trained_model.keras')
    # Classify the disease
    disease, confidence = classify_disease(model, img_path, class_names)

    # Analyze severity
    severity, severity_percentage = analyze_severity(img_path)

    # Get recommendations
    recommendations = get_disease_recommendations(disease, severity)

    # Display results
    print(f"Predicted Disease: {disease} with confidence {confidence:.2f}%")
    print(f"Severity: {severity}")
    print(f"Affected Area: {severity_percentage:.2f}%")
    print(f"Conditions favoring the disease: {recommendations[0]}")
    print(f"Disease Stage: {recommendations[2]}")
    print(f"Recommended Measures: {recommendations[1]}")