# Fruit Classifier Model Training

This notebook trains a machine learning model to classify fruits from their images. It loads images from a directory in which each subfolder is named after a fruit, uses the mean RGB color of each image as its feature vector, and labels each vector with the name of the folder the image came from. A Random Forest classifier is trained on 80% of these samples, its accuracy is checked on the remaining 20%, and the trained model is saved as `fruit_classifier_model.pkl` so that applications can load it and recognize fruit without retraining.

In [11]:
# Importing libraries

import numpy as np  # NumPy works with arrays of numbers; we use it to average pixel colors.
import os  # This library helps us look through folders and files on our computer.
import cv2  # OpenCV reads image files and converts between color orders.
import joblib  # This library saves our trained model to disk so we can use it later without training again.
from sklearn.ensemble import RandomForestClassifier  # The classifier we will train to predict which fruit is in a picture.
from sklearn.model_selection import train_test_split  # This splits our data into training and testing groups.
from sklearn.metrics import accuracy_score  # This measures the fraction of test images the model labels correctly.

In [7]:
# Directory containing the fruit images (one subfolder per fruit, named after the fruit)
basepath = '../data'  # Here we tell the code where to look for our fruit images on the computer.
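
The feature-extraction function in this notebook takes each subfolder name under `basepath` as the fruit label. As a rough sketch of that layout, the snippet below builds a throwaway directory with empty placeholder files and walks it with `os.scandir`. The fruit names and file names are hypothetical and are not taken from the actual dataset.

```python
import os
import tempfile

# Hypothetical fruit names and file names, used only to illustrate the layout.
layout = {
    "apple": ["apple_01.jpg", "apple_02.png"],
    "banana": ["banana_01.jpeg"],
}

found = {}
with tempfile.TemporaryDirectory() as basepath_demo:
    # Build <basepath>/<fruit_name>/<image file> with empty placeholder files.
    for fruit_name, file_names in layout.items():
        fruit_dir = os.path.join(basepath_demo, fruit_name)
        os.makedirs(fruit_dir)
        for file_name in file_names:
            open(os.path.join(fruit_dir, file_name), "wb").close()

    # Walk the tree the way the notebook does: the folder name becomes the label.
    for folder in os.scandir(basepath_demo):
        if folder.is_dir():
            found[folder.name] = sorted(
                f.name for f in os.scandir(folder.path) if f.is_file()
            )

print(found)
```

Any file placed directly in the base directory, rather than inside a fruit subfolder, would be ignored by this kind of walk.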

In [8]:
# Function to extract features from images (mean RGB)
def extract_features_from_images(basepath):  # We are creating a function to get important information from each image.
    data = []  # This list will hold the information (features) from each image.
    labels = []  # This list will hold the name of the fruit for each image.
    
    for folder in os.scandir(basepath):  # We go through each folder in the main folder.
        if folder.is_dir():  # We check if this is a folder.
            fruit_name = folder.name  # We get the folder's name, which tells us the fruit's name.
            for file in os.scandir(folder.path):  # Now, we go through each file in this fruit’s folder.
                # We check if the file is a picture by its name ending (like .jpg).
                if file.is_file() and file.name.lower().endswith(('.png', '.jpg', '.jpeg')):
                    # Read image
                    img = cv2.imread(file.path)  # Load the picture as an array of pixel values (OpenCV uses BGR order).
                    if img is None:  # cv2.imread returns None for unreadable or corrupt files, so we skip them.
                        continue
                    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Change the colors from BGR (Blue-Green-Red) to RGB (Red-Green-Blue) for consistency.
                    
                    # Extract mean RGB values as features
                    features = np.mean(img, axis=(0, 1)).tolist()  # We find the average color values (R, G, B) across the whole image.
                    data.append(features)  # Add this color information to our list.
                    labels.append(fruit_name)  # Add the fruit name to our labels list.
    
    return np.array(data), np.array(labels)  # Finally, we return the lists of color data and labels as arrays.
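
To make the mean-RGB feature concrete, here is a minimal sketch on a tiny synthetic image instead of a real photo. The pixel values are made up for illustration; only the `np.mean(..., axis=(0, 1))` call mirrors the function above.

```python
import numpy as np

# Synthetic 2x2 RGB image with shape (height, width, channels); values are made up.
img = np.array(
    [
        [[255, 0, 0], [255, 0, 0]],
        [[255, 0, 0], [0, 0, 255]],
    ],
    dtype=np.uint8,
)

# Average over the height and width axes, leaving one value per color channel.
features = np.mean(img, axis=(0, 1)).tolist()
print(features)  # [191.25, 0.0, 63.75]
```

Three red pixels and one blue pixel give a mostly red average. Because the whole image is reduced to three numbers, shape and texture are discarded, so fruits with similar overall color may be hard to tell apart with this feature alone.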

In [12]:
def train_and_save_model():
    # Extract features and labels from images
    data, labels = extract_features_from_images(basepath)
    
    # Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)
    
    # Train the RandomForest model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)  # Train the model with the training data
    
    # Save the trained model
    joblib.dump(model, 'fruit_classifier_model.pkl')
    print("Model training complete and saved as 'fruit_classifier_model.pkl'.")
    
    # Predict the labels for the test data
    y_pred = model.predict(X_test)  # Get the predicted labels for the test data
    
    # Calculate the accuracy of the model
    accuracy = accuracy_score(y_test, y_pred)  # Compare the predicted labels with the true labels
    print(f"Model accuracy on the test data: {accuracy * 100:.2f}%")  # Display the accuracy as a percentage
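
The accuracy reported by `accuracy_score` is the fraction of test samples whose predicted label matches the true label. The arithmetic can be sketched with plain NumPy; the labels below are made up for illustration and are not the notebook's actual test set.

```python
import numpy as np

# Made-up labels for illustration only.
y_true = np.array(["apple", "banana", "apple", "orange", "banana"])
y_pred = np.array(["apple", "banana", "orange", "orange", "banana"])

# Element-wise comparison gives True/False per sample; the mean is the hit rate.
accuracy = float(np.mean(y_true == y_pred))
print(f"Accuracy: {accuracy * 100:.2f}%")  # Accuracy: 80.00%
```

With a small test set, each misclassified image moves this percentage by a large step, so the figure is best read as a rough indication.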


In [16]:
# Train and save the model
train_and_save_model()  # Finally, we run our function to start the training and saving process.



Model training complete and saved as 'fruit_classifier_model.pkl'.
Model accuracy on the test data: 75.00%
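
Once the model file exists, it can be reloaded with `joblib.load` and used for prediction without retraining. The sketch below is self-contained, so it trains a small stand-in model on synthetic mean-RGB values and saves it to a temporary directory; the color values and the two fruit names are assumptions for illustration, not the notebook's data.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for mean-RGB features (made-up values, not real images).
rng = np.random.default_rng(0)
reddish = rng.normal(loc=[200, 40, 40], scale=5, size=(20, 3))
yellowish = rng.normal(loc=[220, 210, 50], scale=5, size=(20, 3))
X = np.vstack([reddish, yellowish])
y = np.array(["apple"] * 20 + ["banana"] * 20)

model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    # Save and reload, as an application would do with the notebook's file.
    model_path = os.path.join(tmp, "fruit_classifier_model.pkl")
    joblib.dump(model, model_path)
    loaded = joblib.load(model_path)

# One new sample: the mean RGB of a hypothetical reddish image.
new_features = np.array([[205.0, 38.0, 42.0]])
original_prediction = model.predict(new_features)[0]
loaded_prediction = loaded.predict(new_features)[0]
print(loaded_prediction)
```

In a real application the features for a new image would need to be computed the same way as during training (BGR to RGB conversion, then the per-channel mean), otherwise the channels would be in the wrong order for the model.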
