## Load and Inspect Apple Image Dataset
### Step 1: Define the Goal

In this step, we load a small, controlled image dataset containing only apples.
The aim is to:

- Verify the dataset structure
- Ensure Keras can infer labels correctly
- Prepare the pipeline for incremental expansion (more fruits, more categories)

At this stage, no model is trained yet.
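One way to verify the dataset structure before handing it to Keras is to count the image files in each class subfolder. The sketch below is a minimal, standard-library-only example. The helper name `count_images_per_class` and the extension set are assumptions made for illustration; the set mirrors the formats that `image_dataset_from_directory` is documented to support.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to match your own files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def count_images_per_class(data_dir):
    """Return {class_folder_name: image_count} for each subfolder of data_dir."""
    counts = {}
    for class_dir in sorted(Path(data_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # stray files at the top level are not classes
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Calling `count_images_per_class(DATA_DIR)` on the dataset folder should list one entry per class. A misspelled or unexpected folder name shows up here before it becomes a label.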

## Setup and Prepare Dataset

We will train a CNN to classify **Apple vs Banana** using Keras. First, we need to load the images from the directories and normalize them for training. Each subfolder represents a class.

In [16]:
# Step 1: Imports and setup
import tensorflow as tf
from tensorflow import keras
import os

# Dataset paths
DATA_DIR = r"C:\coding5final\coding5\data\images\fruit_recognition"
IMAGE_SIZE = (150, 150)
BATCH_SIZE = 8


# Step 2: Load and Normalize Images

Use `image_dataset_from_directory` to load images automatically. Keras will infer the labels from folder names. We will normalize the images to the `[0, 1]` range.


In [18]:
# Step 2: Load datasets
train_dataset = keras.preprocessing.image_dataset_from_directory(
    DATA_DIR,
    labels="inferred",
    label_mode="categorical",  # categorical for multi-class
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    validation_split=0.2,
    subset="training",
    seed=42
)

validation_dataset = keras.preprocessing.image_dataset_from_directory(
    DATA_DIR,
    labels="inferred",
    label_mode="categorical",
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    validation_split=0.2,
    subset="validation",
    seed=42
)

# Save class names and number of classes
class_names = train_dataset.class_names
NUM_CLASSES = len(class_names)
print(f"Classes: {class_names}, Number of classes: {NUM_CLASSES}")

# Normalize datasets
normalization_layer = keras.layers.Rescaling(1.0 / 255)
train_dataset = train_dataset.map(lambda x, y: (normalization_layer(x), y))
validation_dataset = validation_dataset.map(lambda x, y: (normalization_layer(x), y))


Found 200 files belonging to 2 classes.
Using 160 files for training.
Found 200 files belonging to 2 classes.
Using 40 files for validation.
Classes: ['apple', 'bannana'], Number of classes: 2
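To make the two preprocessing choices above concrete, here is a small NumPy sketch of what `label_mode="categorical"` and `Rescaling(1.0 / 255)` amount to. It does not call Keras; `one_hot`, `rescale` and `example_class_names` are illustrative names, and the class order is taken from the printed output.

```python
import numpy as np

# Class order copied from the printed output above (illustrative variable name).
example_class_names = ["apple", "bannana"]

def one_hot(label_index, num_classes):
    """Encode an integer class index as a one-hot float vector."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[label_index] = 1.0
    return vec

def rescale(pixels):
    """Map pixel values in [0, 255] to floats in [0, 1]."""
    return np.asarray(pixels, dtype=np.float32) / 255.0

# The label for the second class becomes a vector with a 1 in position 1.
print(one_hot(example_class_names.index("bannana"), len(example_class_names)))  # [0. 1.]
print(rescale([0, 51, 255]))
```

The one-hot vector has the same length as the softmax output, which is why `categorical_crossentropy` is the matching loss for this label mode.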


# Step 3: Define the CNN Model

We define a simple CNN for 2-class classification. The network consists of convolutional layers, max-pooling layers, dropout for regularization, and a dense softmax output layer.


In [19]:
# Step 3: Define CNN
IMAGE_SHAPE = (150, 150, 3)

model = keras.Sequential([
    keras.Input(shape=IMAGE_SHAPE),
    
    keras.layers.Conv2D(32, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    
    keras.layers.Conv2D(128, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    
    keras.layers.Dense(NUM_CLASSES, activation='softmax')  # softmax for multi-class
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Model summary
model.summary()
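The shapes and parameter counts that `model.summary()` reports can be checked by hand. The sketch below redoes the arithmetic in plain Python, assuming the Keras defaults used above: stride-1 convolutions with `'valid'` padding and non-overlapping 2x2 pooling. The helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(in_channels, out_channels, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * out_channels + out_channels

size, channels, total_params = 150, 3, 0
for filters in (32, 64, 128):
    total_params += conv_params(channels, filters)
    size = pool_out(conv_out(size))
    channels = filters
    print(f"after Conv2D({filters}) + MaxPooling2D: {size}x{size}x{channels}")

flat_features = size * size * channels
total_params += flat_features * 2 + 2  # Dense(2): weights plus biases
print(f"Flatten -> {flat_features} features, total parameters: {total_params}")
```

Under these assumptions the final feature map is 17x17x128, so the dense layer sees 36,992 features. That single layer holds a large share of the roughly 167,000 parameters, which is a lot of capacity for a dataset of a few hundred images.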


# Step 4: Train the CNN Model

We train the model using the Apple + Banana dataset. Validation accuracy will give insight into how well the model generalizes.


In [20]:
# Step 4: Train the model
EPOCHS = 15

history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=EPOCHS
)

# Evaluate model
val_loss, val_accuracy = model.evaluate(validation_dataset)
print(f"Validation Accuracy: {val_accuracy*100:.2f}%")

# Save the trained model
model_save_path = r"C:\coding5final\coding5\data\processed\apple_banana_cnn_model.keras"
model.save(model_save_path)
print(f"Trained model saved at: {model_save_path}")


Epoch 1/15
20/20 ━━━━━━━━━━━━━━━━━━━━ 3s 78ms/step - accuracy: 0.6750 - loss: 0.5788 - val_accuracy: 0.8750 - val_loss: 0.3214
Epoch 2/15
20/20 ━━━━━━━━━━━━━━━━━━━━ 2s 79ms/step - accuracy: 0.9187 - loss: 0.1865 - val_accuracy: 0.9500 - val_loss: 0.1465
Epoch 3/15
20/20 ━━━━━━━━━━━━━━━━━━━━ 3s 75ms/step - accuracy: 0.9688 - loss: 0.0755 - val_accuracy: 1.0000 - val_loss: 0.0424
Epoch 4/15
20/20 ━━━━━━━━━━━━━━━━━━━━ 2s 87ms/step - accuracy: 0.9937 - loss: 0.0231 - val_accuracy: 1.0000 - val_loss: 0.0279
Epoch 5/15
20/20 ━━━━━━━━━━━━━━━━━━━━ 2s 78ms/step - accuracy: 1.0000 - loss: 0.0125 - val_accuracy: 1.0000 - val_loss: 0.0124
Epoch 6/15
20/20 ━━━━━━━━━━━━━━━━━━━━ 2s 71ms/step - accuracy: 1.0000 - loss: 0.0090 - val_accuracy: 1.0000 - val_loss: 0.0059
Epoch 7/15
20/20 ━━━━

## Trained on 40 images per class
Epoch 1/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 2s 94ms/step - accuracy: 0.5156 - loss: 0.7120 - val_accuracy: 0.5000 - val_loss: 0.6479
Epoch 2/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 97ms/step - accuracy: 0.8750 - loss: 0.5585 - val_accuracy: 0.7500 - val_loss: 0.4844
Epoch 3/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 84ms/step - accuracy: 0.8594 - loss: 0.3385 - val_accuracy: 1.0000 - val_loss: 0.1396
Epoch 4/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 88ms/step - accuracy: 0.9688 - loss: 0.1534 - val_accuracy: 1.0000 - val_loss: 0.0477
Epoch 5/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 80ms/step - accuracy: 0.9844 - loss: 0.0710 - val_accuracy: 1.0000 - val_loss: 0.0460
Epoch 6/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 92ms/step - accuracy: 0.9844 - loss: 0.0482 - val_accuracy: 1.0000 - val_loss: 0.0058
Epoch 7/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 69ms/step - accuracy: 1.0000 - loss: 0.0103 - val_accuracy: 1.0000 - val_loss: 0.0017
Epoch 8/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 65ms/step - accuracy: 1.0000 - loss: 0.0012 - val_accuracy: 1.0000 - val_loss: 0.0035
Epoch 9/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 67ms/step - accuracy: 1.0000 - loss: 0.0015 - val_accuracy: 1.0000 - val_loss: 0.0018
Epoch 10/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 94ms/step - accuracy: 1.0000 - loss: 3.3300e-04 - val_accuracy: 1.0000 - val_loss: 6.7298e-04
Epoch 11/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 92ms/step - accuracy: 1.0000 - loss: 2.6332e-04 - val_accuracy: 1.0000 - val_loss: 4.7121e-04
Epoch 12/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 77ms/step - accuracy: 1.0000 - loss: 1.8015e-04 - val_accuracy: 1.0000 - val_loss: 3.5924e-04
Epoch 13/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 89ms/step - accuracy: 1.0000 - loss: 1.7802e-04 - val_accuracy: 1.0000 - val_loss: 2.7370e-04
Epoch 14/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 67ms/step - accuracy: 1.0000 - loss: 1.7219e-04 - val_accuracy: 1.0000 - val_loss: 2.6784e-04
Epoch 15/15
8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 85ms/step - accuracy: 1.0000 - loss: 1.1774e-04 - val_accuracy: 1.0000 - val_loss: 3.0617e-04
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 1.0000 - loss: 3.0617e-04
Validation Accuracy: 100.00%
Trained model saved at: C:\coding5final\coding5\data\processed\apple_banana_cnn_model.keras
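The step counters in the logs above (`20/20` and `8/8`) follow from the file count, the validation split and the batch size. The sketch below reproduces that arithmetic. It assumes the validation count is the split fraction times the file count, truncated to an integer, and that the smaller run used 80 files in total (40 per class, as its heading states). `split_and_steps` is an illustrative helper, not a Keras function.

```python
import math

def split_and_steps(num_files, validation_split, batch_size):
    """Estimate train/validation file counts and the batches per epoch for each."""
    num_val = int(validation_split * num_files)
    num_train = num_files - num_val
    return {
        "train_files": num_train,
        "val_files": num_val,
        "train_steps": math.ceil(num_train / batch_size),
        "val_steps": math.ceil(num_val / batch_size),
    }

print(split_and_steps(200, 0.2, 8))  # the 100-images-per-class run
print(split_and_steps(80, 0.2, 8))   # the 40-images-per-class run (assumed total)
```

For 200 files this gives 160 training files in 20 batches, matching the `Using 160 files for training` line and the `20/20` counter. For 80 files it gives 8 training batches and 2 validation batches, matching `8/8` and `2/2`.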

## Observations

**Training accuracy reaches 100% quickly**

Validation accuracy is already 1.0 by epoch 3, and training accuracy reaches 1.0 by epoch 7. This happens because the dataset is very small (only 40 images per class) and simple, so the model can memorize the training images.

**Loss drops to almost zero**

This is the same effect: the model has fit the training images almost perfectly, which is a sign of overfitting.

**Validation accuracy = 100%**

This looks great, but with such a small dataset it is not a reliable measure. The validation set here is at most 16 images (two batches of 8), so the model may still fail on new, unseen images.

**What this means**

The training process works technically, but the model is overfitting heavily: it memorizes the tiny dataset rather than learning general features.
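To put a number on how little a perfect score on a tiny validation set proves, one option is a one-sided lower confidence bound on the true accuracy. If all `n` images are classified correctly, the bound is `alpha ** (1 / n)`. The sketch below is a rough illustration; it assumes the validation images are independent samples from the same distribution as future inputs, which may not hold for photos taken in different conditions. The function name is illustrative.

```python
def perfect_score_lower_bound(n_images, confidence=0.95):
    """Lower bound on true accuracy when all n_images were classified correctly.

    Solves p ** n_images = alpha for p, where alpha = 1 - confidence.
    """
    alpha = 1.0 - confidence
    return alpha ** (1.0 / n_images)

for n in (16, 40, 1000):
    bound = perfect_score_lower_bound(n)
    print(f"{n} validation images, all correct: true accuracy >= {bound:.1%}")
```

With 16 validation images, a 100% score is still consistent with a true accuracy of only about 83%, and with 40 images about 93%. Many more held-out images are needed before a perfect score becomes strong evidence.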

### Dataset Expansion

The dataset has now been expanded to include **100 images per class**, more than doubling the size of the training data compared to the initial setup.  

This increase in data should help the model:

- Learn more robust features for each fruit type  
- Reduce overfitting to the small initial sample  
- Improve generalization to unseen images during inference  

With more varied examples, the model should now be better equipped to classify new images accurately.


### Step 5: Test Model with New Images

We can input any image path and the model will predict its class. This allows testing on images outside the dataset.


In [27]:
import numpy as np
from tensorflow import keras

# Make sure IMAGE_SIZE matches what your model was trained on
IMAGE_SIZE = (150, 150)

def predict_image(model, img_path, class_names):
    # Load and preprocess the image
    img = keras.preprocessing.image.load_img(img_path, target_size=IMAGE_SIZE)
    img_array = keras.preprocessing.image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)  # make batch
    img_array = img_array / 255.0  # normalize

    # Get predictions
    predictions = model.predict(img_array)
    class_index = np.argmax(predictions, axis=1)[0]
    predicted_class = class_names[class_index]

    print(f"Predicted class: {predicted_class}")

# Example usage
img_path = input("Enter image path to classify (any fruit or vegetable): ")
predict_image(model, img_path, class_names)



Enter image path to classify (any fruit or vegetable):  C:\coding5final\coding5\data\outsidedata\appleredyellowgreen.jpg


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step
Predicted class: bannana
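The prediction function prints only the winning class. To see how strongly the model favours it, the softmax output can be turned into a label plus a score. The sketch below is a minimal NumPy-only helper; `top_prediction` is a hypothetical name and the probabilities in the example are made up for illustration, not real model output.

```python
import numpy as np

def top_prediction(probabilities, class_names):
    """Return (label, confidence) from a single softmax output row."""
    probs = np.asarray(probabilities, dtype=float).ravel()
    if probs.shape[0] != len(class_names):
        raise ValueError("expected one probability per class")
    idx = int(np.argmax(probs))
    return class_names[idx], float(probs[idx])

# Made-up softmax output for illustration only
label, confidence = top_prediction([[0.12, 0.88]], ["apple", "bannana"])
print(f"Predicted class: {label} ({confidence:.1%} confidence)")
```

Inside `predict_image`, the same helper could be called as `top_prediction(predictions[0], class_names)`. Keep in mind that a two-class softmax always sums to 1, so an image of something that is neither fruit still gets assigned to one of the two classes, sometimes with a high score.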


## Banana test
Given a picture of a banana on a wooden board (unlike the images in the dataset), the model classified it correctly. The label prints as `bannana` because that is how the class folder is spelled.

Enter image path to classify (any fruit or vegetable):  C:\coding5final\coding5\data\outsidedata\banana-7h4m9.webp
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 88ms/step
Predicted class: bannana

## Apple test
Given a picture of an apple on a white background (again, unlike the images in the dataset), the model classified it correctly.

Enter image path to classify (any fruit or vegetable):  C:\coding5final\coding5\data\outsidedata\Apple.webp

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step

Predicted class: apple

## Yellow, green and red apple test
Given a more confusing image (yellow, green and red apple), the model got it wrong and predicted `bannana`. This suggests it may be basing its decision on colour.


Enter image path to classify (any fruit or vegetable):  C:\coding5final\coding5\data\outsidedata\appleredyellowgreen.jpg

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step

Predicted class: bannana
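One way to probe the colour hypothesis is to remove colour from the test image and see whether the prediction changes. The sketch below converts an RGB array to grayscale while keeping the three-channel shape the model expects. It is an illustrative experiment rather than part of the original notebook; `to_grayscale_rgb` is a hypothetical helper and the weights are the standard ITU-R BT.601 luma coefficients.

```python
import numpy as np

def to_grayscale_rgb(img_array):
    """Replace colour with luminance but keep the trailing channel axis of size 3."""
    img = np.asarray(img_array, dtype=np.float32)
    # ITU-R BT.601 luma weights for the R, G and B channels
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = img @ weights  # collapses the channel axis
    return np.repeat(gray[..., np.newaxis], 3, axis=-1)

# Tiny synthetic image: pure red everywhere
red = np.zeros((2, 2, 3), dtype=np.float32)
red[..., 0] = 255.0
print(to_grayscale_rgb(red).shape)
```

In `predict_image`, this could be applied to `img_array` before the division by 255. If the colour and grayscale versions of the same photo get different labels, that would support the idea that the model leans on colour more than shape. The model was trained only on colour images, so the result should be read as a hint, not proof.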
