### Problem Definition

In this project, we build an image classifier that predicts the ripeness level of bananas from a single RGB image.

The model predicts one of three classes:
- **Unripe** - mostly green peel
- **Ripe** - mostly yellow peel, suitable for eating
- **Overripe** - dark-spotted or brown peel, very soft, at or past the ideal eating point

#### Real-world motivation

Banana ripeness is a practical problem in agriculture, in retail, and at home. Automatically estimating ripeness from images can help with:

- **Quality control** in supermarkets (sorting bananas into “ready to sell today” vs “too green” vs “too late”).
- **Food waste reduction**, by detecting overripe fruit earlier and discounting or redirecting it to other uses (e.g. baking, smoothies).
- **Assisting consumers** in choosing bananas according to their preference (some people like greener bananas, some prefer very ripe).

This project does not try to solve the full industrial problem. Instead, we build a small, reproducible prototype that shows how a deep learning model can classify banana ripeness from images using transfer learning.

#### Expected Challenges

- **Ambiguous boundaries between classes**  
  There is no sharp line between “ripe” and “overripe” – some bananas are in-between, which makes labels somewhat subjective.

- **Lighting and background conditions**  
  Images may be taken under different illumination (indoor, outdoor, shadows, warm/cold light) and on different backgrounds, which changes the perceived color.

- **Multiple bananas in one image**  
  Some images may contain more than one banana with slightly different ripeness levels. The dataset label is still a single class for the whole image.

- **Color similarity between classes**  
  Slightly yellow-green bananas can look similar to ripe ones, and dark-spotted ripe bananas can look similar to overripe ones, which may cause confusion for the model.

- **Dataset shift to the real world**  
  The training images are relatively clean and focused on bananas. On real supermarket shelves, bananas might be partially occluded, far from the camera, or mixed with other fruit. The model may not generalize well to such settings.
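
The ambiguity between neighbouring classes described above can be made visible at prediction time. The following is a minimal sketch, not part of the original notebook: it flags a prediction as ambiguous when the two highest class probabilities are close together. The function names `top2_margin` and `is_ambiguous`, the threshold `min_margin=0.2`, and the probability vectors are all illustrative assumptions.

```python
from typing import Sequence


def top2_margin(probs: Sequence[float]) -> float:
    """Difference between the highest and second-highest class probability."""
    if len(probs) < 2:
        raise ValueError("need at least two class probabilities")
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def is_ambiguous(probs: Sequence[float], min_margin: float = 0.2) -> bool:
    """True when the top two classes are closer than min_margin.

    min_margin is a hypothetical threshold; it would need tuning on real data.
    """
    return top2_margin(probs) < min_margin


# Made-up probability vectors for illustration (not real model output)
confident = [0.05, 0.90, 0.05]
borderline = [0.02, 0.52, 0.46]

print(is_ambiguous(confident))   # False: margin is 0.85
print(is_ambiguous(borderline))  # True: margin is 0.06
```

Images flagged this way could be reviewed by hand, which is one possible way to handle bananas that sit between "ripe" and "overripe".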


### Dataset source

We used the public **Banana Classification** dataset from Kaggle, which contains 4 categories. The code below expects three of them as folders: *Unripe*, *Ripe*, and *Rotten*.

Dataset citation:
Jorgusheska, I. (2022). *Banana Ripeness Level Recognition Dataset*.  
GitHub repository: https://github.com/IvaJorgusheska/Banana_Ripeness_Level_Recognition

In [None]:
from fastai.vision.all import *
from pathlib import Path

DATA_ROOT = Path("data")
BANANA_PATH = DATA_ROOT / "bananas"

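# A bare expression in the middle of a cell is not displayed, so print the paths explicitly
print(f"Data root: {DATA_ROOT} | Banana path: {BANANA_PATH}")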

if not BANANA_PATH.exists():
    raise FileNotFoundError(
        f"Didn't find {BANANA_PATH}. "
        "Make sure you have data/bananas/unripe, ripe, rotten inside the repo."
    )

# Count images per class folder; get_image_files skips non-image files
for cls_dir in sorted(BANANA_PATH.iterdir()):
    if cls_dir.is_dir():
        n = len(get_image_files(cls_dir))
        print(f"{cls_dir.name}: {n} images")
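
The per-class counts printed above give a first look at class balance. As a rough sketch under stated assumptions (the helper names `count_images_per_class` and `imbalance_ratio` and the `IMAGE_SUFFIXES` set are hypothetical, not part of fastai or the original notebook), the same counts can be collected with the standard library and reduced to a single imbalance ratio:

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual dataset
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(root: Path) -> dict:
    """Map each class folder name under root to its number of image files."""
    counts = {}
    for cls_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[cls_dir.name] = sum(
            1
            for f in cls_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def imbalance_ratio(counts: dict) -> float:
    """Largest class size divided by smallest class size (1.0 means balanced)."""
    if not counts or min(counts.values()) == 0:
        raise ValueError("every class needs at least one image")
    return max(counts.values()) / min(counts.values())


# Only runs when the dataset folder from this notebook is present
root = Path("data") / "bananas"
if root.exists():
    counts = count_images_per_class(root)
    print(counts, f"imbalance ratio: {imbalance_ratio(counts):.2f}")
```

A ratio well above 1 would suggest that plain accuracy may hide weak performance on the smallest class.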

In [None]:
path = BANANA_PATH

dls = ImageDataLoaders.from_folder(
    path,
    valid_pct=0.2,   # 80% train, 20% validation
    seed=42,
    item_tfms=Resize(224)
)

dls.show_batch(max_n=9, figsize=(6,6))
print("Classes:", dls.vocab)

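# Transfer learning: start from an ImageNet-pretrained ResNet-18
learn = vision_learner(dls, resnet18, metrics=accuracy)
# fine_tune trains the new head for 1 epoch with the body frozen, then all layers for 5 epochs
learn.fine_tune(5)

# validate() returns [loss, *metrics]; index 1 is accuracy here
acc = learn.validate()[1]
print(f"Validation accuracy: {acc:.4f}")

# In the plot, rows are actual classes and columns are predicted classes
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(4,4))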

from random import choice

valid_items = dls.valid.items

for _ in range(5):
    img_path = choice(valid_items)
    img = PILImage.create(img_path)
    pred, pred_idx, probs = learn.predict(img)

    display(img.to_thumb(256, 256))
    print(f"True label    : {img_path.parent.name}")
    print(f"Predicted     : {pred}")
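    # Pair each class name with its probability instead of printing the raw tensor
    print("Probabilities : " + ", ".join(f"{c}={p:.3f}" for c, p in zip(dls.vocab, probs.tolist())))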
    print("-" * 50)

# learn.export("banana_ripeness_resnet18.pkl")
# print("Model exported to banana_ripeness_resnet18.pkl")
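
The confusion matrix plotted above can also be summarized as per-class precision and recall, which makes confusions between neighbouring classes easier to quantify than overall accuracy alone. The sketch below assumes rows are actual classes and columns are predicted classes (the layout fastai uses in its plot). The helper name `per_class_metrics` and the example matrix are made up for illustration and are not results from this model.

```python
def per_class_metrics(cm, labels):
    """Return {label: (precision, recall)} from a square confusion matrix.

    cm[i][j] is the number of items with actual class i predicted as class j.
    """
    n = len(labels)
    metrics = {}
    for k, label in enumerate(labels):
        true_pos = cm[k][k]
        actual_total = sum(cm[k])                          # row sum
        predicted_total = sum(cm[i][k] for i in range(n))  # column sum
        precision = true_pos / predicted_total if predicted_total else 0.0
        recall = true_pos / actual_total if actual_total else 0.0
        metrics[label] = (precision, recall)
    return metrics


# Made-up counts for illustration only (not output of the notebook)
labels = ["ripe", "rotten", "unripe"]
cm = [
    [8, 2, 0],  # actual ripe
    [1, 9, 0],  # actual rotten
    [1, 0, 9],  # actual unripe
]

for label, (precision, recall) in per_class_metrics(cm, labels).items():
    print(f"{label:>7}: precision={precision:.2f} recall={recall:.2f}")
```

With the learner from this notebook, `interp.confusion_matrix()` should provide the matrix and `dls.vocab` the labels. Low recall for one class alongside low precision for its neighbour would point to the ripe/overripe ambiguity discussed earlier.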