# Healthy Grape Identifier
Completed by Peiqian Chen, Ian Brown and Danny DeMelfi

This project is a supervised machine learning model built on Mask-RCNN, a convolutional neural network, and trained from scratch on data that we labelled ourselves. It finds individual grapes in an image of a grape bunch and determines whether each one is healthy based on its color and shape. Grapes that are too light, too dark, or misshapen beyond recognition are considered bad and should be removed from the bunch. The model should tell us which grapes are bad based on the training and validation sets that we made.

This project makes use of a pre-trained model for object detection and Mask-RCNN for the creation of bounding boxes. Performance is measured on the validation images using Intersection over Union (IoU).
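
Intersection over Union compares a predicted box with a ground-truth box: the area the two share divided by the area they cover together. A minimal sketch, assuming boxes are `(xmin, ymin, xmax, ymax)` tuples in pixel coordinates; the function name `box_iou` is illustrative and is not part of Mask-RCNN.

```python
def box_iou(box_a, box_b):
    """Return the Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter_w = max(0, ix_max - ix_min)
    inter_h = max(0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes that share a 5x5 corner: 25 / (100 + 100 - 25).
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

A value of 1.0 means the boxes coincide and 0.0 means they do not overlap at all.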

We chose green grapes since healthy and unhealthy ones are easy to tell apart and there are many images to choose from for training purposes. To run the notebook, make sure you are using the GPU (Runtime -> Change runtime type -> GPU).

*The following sections were inspired by, and use some of the code and text from:

- Géron, A. (2019). Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems (2nd ed.). O'Reilly Media, Inc. (ISBN-10: 1491962291). Chapter 10.
- Christian Lopez's GitHub, [NN Regularization with Keras Python Notebook](https://colab.research.google.com/github/lopezbec/intro_python_notebooks/blob/master/NN_Regularization_with_Keras.ipynb?authuser=1#scrollTo=_HuUEdPgNCSo)

#Setup#

In [1]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
    !pip install tensorflow
except Exception:
    pass

# TensorFlow ≥2.0 is required
import tensorflow as tf
assert tf.__version__ >= "2.0"

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)

import time

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)


# Ignore useless warnings (see SciPy issue #5998)
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")
import logging

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # FATAL
logging.getLogger('tensorflow').setLevel(logging.FATAL)

from IPython.display import clear_output

!pip install Pillow
from PIL import Image
import imageio

# GPU memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize
import psutil
import humanize
import os
import GPUtil as GPU
clear_output()

#More Setup#
The reason we have the next few code blocks is to confirm that we have the right versions, imports, and downloads of:
- Keras 
- Coco
- Mask-RCNN

We ran into a few errors where TensorFlow and Keras had conflicting versions that would not work with Mask-RCNN, so we had to manually adjust the neural network code to allow some exceptions while making sure that the versions were correct.

In [2]:
import os
import tensorflow as tf
!git clone https://github.com/ianlikescoding/Mask_RCNN
os.chdir('./Mask_RCNN')
!python3 setup.py install
!pip install mrcnn
os.chdir('../')
clear_output()

In [3]:
#All to make Keras work with MRCNN 
#!tf_upgrade_v2 --intree Mask_RCNN --inplace --reportfile report.txt
!pip install keras==2.12.0
clear_output()

In [4]:
#Download and install Coco
!git clone https://github.com/waleedka/coco
!make install -C coco/PythonAPI
clear_output()

In [5]:
os.chdir('./Mask_RCNN/')
from Mask_RCNN.mrcnn.config import Config
from Mask_RCNN.mrcnn import model as modellib, utils

#Grape Configuration Class#
Here, we finally start setting up the training. 

First, since we are using a custom data set, we have to define a `GrapeConfig` Class that can be used as a subclass of the R-CNN library `Config` Class. 

The settings are made to correctly configure a sample dataset by making use of a class object to separate the training and validation sets of images for the model.

We made this decision since we are training the model from scratch with data that we labeled and processed manually, so this should help the loading and training process.

In [6]:
import os
import sys
import random
import math
import numpy as np
import cv2
import xml.etree.ElementTree as ET
import skimage.draw
from mrcnn.config import Config
from mrcnn import model as modellib, utils

class GrapeConfig(Config):
    """Configuration for training on the grape dataset."""
    # Give the configuration a recognizable name
    NAME = "grape"

    # Train on 1 GPU and 2 images per GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 2  # Background + grape (healthy and unhealthy)

    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 100

    # Use small validation steps since the epoch is small
    VALIDATION_STEPS = 5
    
    # Set the path to the pretrained weights if any
    WEIGHTS_PATH = 'mask_rcnn_coco.h5'

    # We specify the path directories for the training and validation as well. 
    VAL_DATASET_DIR = os.path.join(os.getcwd(), "grape_data/val")
    TRAIN_DATASET_DIR = os.path.join(os.getcwd(), "grape_data/train")
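
In the Matterport Mask-RCNN `Config` class, the effective batch size is derived as `IMAGES_PER_GPU * GPU_COUNT`, so the settings above work out to roughly 100 × 2 = 200 images per epoch. The sketch below reproduces that arithmetic with a stand-in base class. `BaseConfig` and `DemoGrapeConfig` are hypothetical names used only so the example runs without the library, and the fork cloned earlier is assumed to behave the same way.

```python
class BaseConfig:
    """Hypothetical stand-in for mrcnn.config.Config (assumption)."""
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    STEPS_PER_EPOCH = 1000

    def __init__(self):
        # Derived value: one batch holds every image on every GPU.
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT


class DemoGrapeConfig(BaseConfig):
    """Overrides class attributes the same way GrapeConfig does."""
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    STEPS_PER_EPOCH = 100


config = DemoGrapeConfig()
images_per_epoch = config.BATCH_SIZE * config.STEPS_PER_EPOCH
print(config.BATCH_SIZE, images_per_epoch)
```

Overriding class attributes in a subclass keeps every setting in one place, and derived values such as the batch size follow automatically when the object is created.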

#Grape Dataset Class#

This class defines how to load and manipulate data for the grape detection model.

The first method is `load_dataset`, which is responsible for loading a dataset directory, including images and annotations. It adds two classes for the dataset, "Healthy" and "Unhealthy", as a way to label the data. Then it adds each image and the path to its annotation file to the dataset.

In [7]:
class GrapeDataset(utils.Dataset):
    def load_dataset(self, dataset_dir, subset):
        #Load a subset of the grape dataset
        # Add classes. We have only two classes: healthy and unhealthy grapes
        self.add_class("grape", 1, "Healthy")
        self.add_class("grape", 2, "Unhealthy")

        # Train or validation dataset?
        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)

        # Load annotations
        # print(dataset_dir)
        annotations = os.listdir(os.path.join(dataset_dir, 'annotations'))
        annotations = [file for file in annotations if file.endswith('.xml')]

        # Add images and annotations to the dataset
        for i, file in enumerate(annotations):
            image_path = os.path.join(dataset_dir, 'images', file.replace('.xml', '.jpg'))
            annotation_path = os.path.join(dataset_dir, 'annotations', file)
            # print(image_path)
            self.add_image(
                "grape",
                image_id=i,
                path=image_path,
                annotation=annotation_path)
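
`load_dataset` stores only the path of each XML file, while `load_mask` reads a list of parsed annotations together with the image height and width. A minimal sketch of the parsing step, assuming the files follow a Pascal VOC-style layout (a `<size>` element with `<width>` and `<height>`, and one `<object>` per grape holding a `<name>` and a `<bndbox>`). That layout and the helper name `parse_annotation` are assumptions and are not confirmed by the notebook.

```python
import xml.etree.ElementTree as ET


def parse_annotation(xml_text):
    """Parse a Pascal VOC-style annotation string (assumed format)."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))

    annotations = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        annotations.append({
            "name": obj.findtext("name"),
            # int(float(...)) also accepts coordinates written as "12.0".
            "bbox": {key: int(float(box.findtext(key)))
                     for key in ("xmin", "ymin", "xmax", "ymax")},
        })
    return width, height, annotations


# Made-up annotation used only to exercise the parser.
sample = """
<annotation>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>Healthy</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
  </object>
  <object>
    <name>Unhealthy</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>150</xmax><ymax>170</ymax></bndbox>
  </object>
</annotation>
"""

width, height, annotations = parse_annotation(sample)
print(width, height, [a["name"] for a in annotations])
```

For a file on disk, `ET.parse(path).getroot()` gives the same root element. The returned values could then be passed to `add_image` as `width`, `height`, and `annotations`, which are the keys `load_mask` looks up.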


---


Next, we have the `load_mask` method. Given an `image_id`, it reads the annotations for the image. These annotations should specify the location and shape of the bounding box of each grape in the image, and the method creates a binary mask for each grape instance.

In [8]:
    def load_mask(self, image_id):
        # If not a grape dataset image, delegate to parent class.
        image_info = self.image_info[image_id]
        if image_info["source"] != "grape":
            # super() avoids the infinite recursion that
            # super(self.__class__, self) causes if this class is subclassed.
            return super().load_mask(image_id)

        # Get annotations of the image.
        annotations = self.image_info[image_id]['annotations']
        # Convert XML annotations to bounding boxes
        boxes = []
        for annot in annotations:
            xmin = int(annot['bbox']['xmin'])
            ymin = int(annot['bbox']['ymin'])
            xmax = int(annot['bbox']['xmax'])
            ymax = int(annot['bbox']['ymax'])
            boxes.append([xmin, ymin, xmax, ymax])

        # Load masks of the image
        mask = np.zeros([image_info["height"], image_info["width"], len(annotations)], dtype=np.uint8)
        class_ids = np.zeros(len(annotations), dtype=np.int32)
        for i, annot in enumerate(annotations):
            # Get label of the annotation
            class_name = annot['name']
            if class_name == 'Healthy':
                class_ids[i] = 1
            elif class_name == 'Unhealthy':
                class_ids[i] = 2
            # Draw a binary mask for the annotation
            mask[:, :, i:i+1] = self.draw_mask(image_info["height"], image_info["width"], boxes[i])

        return mask, class_ids
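
The `draw_mask` helper is not shown in the notebook. One way such a helper could work is sketched below, assuming boxes are `[xmin, ymin, xmax, ymax]` in pixel coordinates and that a filled rectangle is an acceptable approximation of a grape's outline. `draw_box_mask` is a hypothetical name.

```python
import numpy as np


def draw_box_mask(height, width, box):
    """Return a (height, width, 1) uint8 mask with 1s inside the box."""
    xmin, ymin, xmax, ymax = box
    # Clip to the image so out-of-range annotations do not wrap around.
    xmin, xmax = max(0, xmin), min(width, xmax)
    ymin, ymax = max(0, ymin), min(height, ymax)

    mask = np.zeros((height, width, 1), dtype=np.uint8)
    # Rows are indexed by y and columns by x; xmax and ymax are exclusive here.
    mask[ymin:ymax, xmin:xmax, 0] = 1
    return mask


mask = draw_box_mask(4, 6, [1, 1, 4, 3])
print(mask[:, :, 0])
```

The trailing axis of length 1 matches the `mask[:, :, i:i+1]` slice that the result is assigned to in `load_mask`.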

---
Finally, we have the `split_dataset` method, which splits the loaded dataset into training and validation sets based on a ratio (80/20 for our purposes).

Also, the `load_image` method is self-explanatory and is used to load the image given an image_id. 

In [9]:
    def split_dataset(self, split_ratio=0.8):
        # Split the image IDs into training and validation sets
        image_ids = self.image_ids
        split = int(len(image_ids) * split_ratio)
        train_ids = image_ids[:split]
        val_ids = image_ids[split:]

        # Assign the images to the corresponding sets
        self.train_ids = train_ids
        self.val_ids = val_ids

    def load_image(self, image_id):
        # Return the file path of the specified image.
        return self.image_info[image_id]['path']
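
The 80/20 split is a plain slice of the image IDs in the order they were loaded. A small standalone illustration with ten made-up IDs (`split_ids` is an illustrative helper and is not part of the notebook):

```python
def split_ids(image_ids, split_ratio=0.8):
    """Split a sequence of IDs into (train, validation) by position."""
    split = int(len(image_ids) * split_ratio)
    return image_ids[:split], image_ids[split:]


train_ids, val_ids = split_ids(list(range(10)))
print(train_ids)  # the first eight IDs
print(val_ids)    # the last two IDs
```

Because the IDs are not shuffled, the result depends on the order in which the files were listed. Shuffling the IDs with a fixed seed before slicing is a common alternative.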

#Training the Model#

Now we train the model for the grape detection and segmentation task. 

In [10]:
# Root directory of the project
ROOT_DIR = os.path.abspath("/content/")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library

# Initialize the model with the GrapeConfig configuration
model = modellib.MaskRCNN(mode="training", config=GrapeConfig(), model_dir=os.getcwd())

# Load the weights for the model if available
model_path = tf.train.latest_checkpoint('Mask_RCNN/')
if model_path is not None:
    model.load_weights(model_path, by_name=True)

directory = os.path.join(os.getcwd(), "grape_data/")

# Load the dataset
trainingSet = GrapeDataset()
valSet = GrapeDataset()
valSet.load_dataset(directory, "val")
# Both datasets must be prepared before they are passed to model.train
valSet.prepare()
#dataset.load_dataset(GrapeConfig.VAL_DATASET_DIR,'CSV2.csv')
trainingSet.load_dataset(directory, "train")
trainingSet.prepare()

# Train the model
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)
model.train(trainingSet, valSet,
            learning_rate=GrapeConfig.LEARNING_RATE,
            epochs=5,
            layers='heads',
            )

# Evaluate the model on the validation set
val_file_paths = [valSet.image_info[id]['path'] for id in valSet.image_ids]
mAP = model.evaluate(val_file_paths, verbose=0)
# print("mAP: %.2f" % (mAP))

# Save the weights of the trained model
model_path = os.path.join(model.model_dir, "mask_rcnn_grape.h5")
model.keras_model.save_weights(model_path)




Starting at epoch 0. LR=0.001

Checkpoint Path: /content/Mask_RCNN/grape20230511T0250/mask_rcnn_grape_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
rpn_model              (Functional)
mrcnn_mask_conv1       (TimeDistributed)
mrcnn_mask_bn1         (TimeDistributed)
mrcnn_mask_conv2       (TimeDistributed)
mrcnn_mask_bn2         (TimeDistributed)
mrcnn_class_conv1      (TimeDistributed)
mrcnn_class_bn1        (TimeDistributed)
mrcnn_mask_conv3       (TimeDistributed)
mrcnn_mask_bn3         (TimeDistributed)
mrcnn_class_conv2      (TimeDistributed)
mrcnn_class_bn2        (TimeDistributed)
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDis

KeyboardInterrupt: ignored