# Classification Notebook: Detecting Grapes and Classifying by Sugar Level Using Living Optics Export Reader
This example experiments with several machine learning models on .lo data and uses them for classification.

## Goals
- **Classification**: Distinguishing *grapes* from *non-grape* objects based on their spectral features.
- **Clustering & Class-based Classification**: Grouping sugar levels into classes using unsupervised learning and classifying spectral data accordingly.

## Steps
- Read the dataset exported from the data analysis tool
- Train classifiers on the train/test split of features and labels
- Compare model performance using classification reports and visualisations
- Check generalisation with cross-validation

## 1. Import Libraries and Setup
Import all required libraries and modules for classification and clustering.

```python
# Basic tools
import numpy as np
import matplotlib.pyplot as plt

# Machine learning models and evaluation
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.cluster import KMeans

# lo-sdk
from lo.sdk.api.acquisition.data.coordinates import SceneToSpectralIndex

# Dataset loader
from lo_dataset_reader import DatasetReader, rle_to_mask
```

## 2. Load Dataset
Load the grape dataset which contains annotations, spectral data, and metadata such as sugar content and position.

```python
# Load the exported dataset with DatasetReader from lo_dataset_reader
path = "/path/to/Grapes-Dataset.zip"
reader = DatasetReader(dataset_path=path)
```

## 3. Metadata Extraction: Sugar Content
Extract sugar content and define a classification system with 4 sugar levels.

```python
sugar_contents = []

for (info, scene, spectra, _), converted_spectra, annotations, *_ in reader:
    for ann in annotations:
        if ann['metadata'] and ann['category_name'] == 'grapes':
            meta = {item['field']: item['value'] for item in ann['metadata']}
            sugar_contents.append(float(meta['sugar-content']))

def sugar_to_class(sugar: float) -> int:
    """
    Convert sugar content (Brix) to one of 4 discrete class labels:
    0: < 15, 1: 15 to < 17, 2: 17 to < 19, 3: >= 19.
    """
    if sugar < 15:
        return 0
    elif sugar < 17:
        return 1
    elif sugar < 19:
        return 2
    return 3
```
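
If the sugar values are held in a NumPy array, the same thresholds can be applied in one vectorised call. This is a sketch assuming the bin edges 15, 17 and 19 Brix taken from `sugar_to_class`; `SUGAR_BIN_EDGES` and `sugar_to_class_vectorised` are illustrative names, not part of the original notebook.

```python
import numpy as np

# Hypothetical bin edges (Brix) mirroring the thresholds in sugar_to_class
SUGAR_BIN_EDGES = np.array([15.0, 17.0, 19.0])

def sugar_to_class_vectorised(sugar_values) -> np.ndarray:
    """Map an array of sugar readings to classes 0-3.

    np.digitize returns i such that edges[i-1] <= x < edges[i], which matches
    the strict '<' comparisons used in sugar_to_class.
    """
    return np.digitize(np.asarray(sugar_values, dtype=float), SUGAR_BIN_EDGES)

print(sugar_to_class_vectorised([14.2, 15.0, 16.9, 18.5, 21.0]))  # [0 1 1 2 3]
```

A value exactly on an edge (for example 15.0) falls into the upper class, just as it does in `sugar_to_class`.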

## 4. Helper Functions
Utilities for mapping an annotation mask to the indexes of the spectral samples it contains, and for converting category labels to binary targets.

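```python
def spectra_to_scene(mask, sc):
    """
    Return the indexes of the spectral samples whose sampling coordinates
    fall inside a binary annotation mask.

    mask: binary scene mask for one annotation (e.g. from rle_to_mask)
    sc:   sampling coordinates of the spectra as an integer (N, 2) array
    """
    # Build the lookup from the coordinates passed in, not from a global `info`
    spectral_indexer = SceneToSpectralIndex(sc)
    sampling = np.zeros_like(mask)
    sampling[sc[:, 0], sc[:, 1]] = 1
    # Pixels that are both inside the mask and sampled, mapped to spectral indexes
    indexes = spectral_indexer(np.argwhere(mask & sampling))
    return indexes

def is_grape(label: str) -> int:
    """
    Convert label to binary class (1 if grape, 0 otherwise).
    """
    return 1 if label == 'grapes' else 0
```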
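
The helper relies on `SceneToSpectralIndex` to map masked pixels back to rows of the spectra array. As a rough illustration of the idea, the pure-NumPy sketch below assumes that row *i* of the spectra array was sampled at pixel `sampling_coordinates[i] = (row, col)`; under that assumption the wanted indexes are simply the samples whose pixel lies inside the mask. `mask_to_spectral_indices` and the toy arrays are hypothetical and not part of lo-sdk.

```python
import numpy as np

def mask_to_spectral_indices(mask: np.ndarray, sampling_coordinates) -> np.ndarray:
    """Return indices of spectral samples whose (row, col) pixel is inside the mask.

    Assumes sampling_coordinates[i] is the pixel of spectrum i.
    """
    # Truncate to integer pixels, as the notebook does with dtype=int
    sc = np.asarray(sampling_coordinates).astype(int)
    rows, cols = sc[:, 0], sc[:, 1]
    # Ignore any coordinates that fall outside the image
    in_bounds = (rows >= 0) & (rows < mask.shape[0]) & (cols >= 0) & (cols < mask.shape[1])
    inside = np.zeros(len(sc), dtype=bool)
    inside[in_bounds] = mask[rows[in_bounds], cols[in_bounds]].astype(bool)
    return np.flatnonzero(inside)

# Toy scene: 4x4 pixels with a 2x2 object in the top-left corner
toy_mask = np.zeros((4, 4), dtype=np.uint8)
toy_mask[:2, :2] = 1
toy_coords = np.array([[0, 0], [1, 1], [2, 2], [3, 0], [0, 1]])
toy_spectra = np.arange(5 * 3, dtype=float).reshape(5, 3)  # 5 samples, 3 bands

toy_idx = mask_to_spectral_indices(toy_mask, toy_coords)
print(toy_idx)                          # [0 1 4]
print(toy_spectra[toy_idx].mean(axis=0))  # [5. 6. 7.]
```

The mean over the selected rows is the same per-annotation feature the training cell builds with `spectra[indexes].mean(axis=0)`.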


## 5. Train Grape Detection Model (Binary Classification)
Train a binary classifier to detect whether a sample represents grapes or not.

```python
X, y = [], []

# Unpack each item the same way as the sugar-extraction cell
for (info, scene, spectra, _), _converted, annotations, *_ in reader:
    sc = np.array(info.sampling_coordinates, dtype=int)
    for ann in annotations:
        mask = rle_to_mask(ann['segmentation'], scene.shape)
        indexes = spectra_to_scene(mask, sc)
        if len(indexes) == 0:
            continue  # skip annotations that contain no spectral samples
        # Use the mean spectrum of the annotated region as the feature vector
        X.append(spectra[indexes].mean(axis=0))
        y.append(is_grape(ann['extern']['category']))

X, y = np.array(X), np.array(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
grape_model = RandomForestClassifier(random_state=42)
grape_model.fit(X_train, y_train)
y_pred = grape_model.predict(X_test)
print(classification_report(y_test, y_pred))
```
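
A classification report is easier to interpret next to a trivial baseline, especially when grape and non-grape annotations are unevenly represented. This minimal NumPy sketch computes the class counts and the accuracy obtained by always predicting the most frequent training class; `majority_baseline_accuracy` and the toy labels are illustrative assumptions, not part of the original workflow.

```python
import numpy as np

def majority_baseline_accuracy(y_train, y_test) -> float:
    """Accuracy of always predicting the most frequent class in y_train."""
    y_train = np.asarray(y_train, dtype=int)
    y_test = np.asarray(y_test, dtype=int)
    majority = np.bincount(y_train).argmax()
    return float(np.mean(y_test == majority))

# Toy labels: 1 = grape, 0 = non-grape
toy_y_train = [1, 1, 1, 0, 1, 0, 1, 1]
toy_y_test = [1, 0, 1, 1]
print(np.bincount(toy_y_train))                             # [2 6]
print(majority_baseline_accuracy(toy_y_train, toy_y_test))  # 0.75
```

In the notebook, `majority_baseline_accuracy(y_train, y_test)` can be compared with the accuracy row of `classification_report`; a model that barely beats it has learned little from the spectra.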

## 6. Evaluate via Cross-Validation
Use 5-fold cross-validation to evaluate generalization performance.

```python
# cross_val_score clones the model and refits it on each of the 5 folds
scores = cross_val_score(grape_model, X, y, cv=5)
print(f"Cross-validation scores: {scores}")
print(f"Mean accuracy: {scores.mean():.4f}")
```
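
For a classifier, passing an integer `cv` to `cross_val_score` uses stratified k-fold splitting, so each fold keeps roughly the same grape / non-grape ratio as the full dataset. The NumPy sketch below illustrates that idea by dealing the shuffled indices of each class round-robin into `k` folds; it is a simplified illustration rather than scikit-learn's exact `StratifiedKFold` algorithm, and `stratified_fold_ids` is a hypothetical name.

```python
import numpy as np

def stratified_fold_ids(y, k: int = 5, seed: int = 42) -> np.ndarray:
    """Assign each sample a fold id in [0, k) so every class is spread evenly."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_ids = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        cls_idx = rng.permutation(np.flatnonzero(y == cls))
        # Deal this class's samples round-robin across the k folds
        fold_ids[cls_idx] = np.arange(len(cls_idx)) % k
    return fold_ids

toy_y = np.array([1] * 20 + [0] * 10)  # 2/3 grapes
folds = stratified_fold_ids(toy_y, k=5)
for f in range(5):
    in_fold = folds == f
    # Each fold holds 6 samples, 4 of them grapes
    print(f, in_fold.sum(), toy_y[in_fold].mean())
```

Each fold serves once as the test set while the other four are used for training, which is why the mean of the five scores is a steadier estimate than a single train/test split.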

## 7. Cluster Sugar Content and Classify
Use KMeans clustering on sugar values to generate pseudo-classes and evaluate a classifier's ability to distinguish between sugar levels.

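```python
# Visualise distribution of sugar content
plt.hist(sugar_contents, bins=5)
plt.title('Distribution of Sugar Content')
plt.xlabel('Sugar (Brix)')
plt.ylabel('Count')
plt.show()

# Collect a mean spectrum and sugar value for every grape annotation, so the
# classifier learns from spectral features rather than from the sugar value itself
X_sugar, grape_sugar = [], []
for (info, scene, spectra, _), _converted, annotations, *_ in reader:
    sc = np.array(info.sampling_coordinates, dtype=int)
    for ann in annotations:
        if not ann['metadata'] or ann['category_name'] != 'grapes':
            continue
        meta = {item['field']: item['value'] for item in ann['metadata']}
        mask = rle_to_mask(ann['segmentation'], scene.shape)
        indexes = spectra_to_scene(mask, sc)
        if len(indexes) == 0:
            continue  # no spectral samples inside this annotation
        X_sugar.append(spectra[indexes].mean(axis=0))
        grape_sugar.append(float(meta['sugar-content']))
X_sugar = np.array(X_sugar)

# Apply k-means to the sugar values to define pseudo-classes, then renumber
# the clusters so that class 0 is the lowest-sugar group
sugar_array = np.array(grape_sugar).reshape(-1, 1)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(sugar_array)
rank = np.argsort(np.argsort(kmeans.cluster_centers_.ravel()))
labels = rank[cluster_ids]

# Train a classifier to predict the sugar class from the spectra
X_train, X_test, y_train, y_test = train_test_split(
    X_sugar, labels, test_size=0.2, stratify=labels, random_state=42)
sugar_model = RandomForestClassifier(random_state=42)
sugar_model.fit(X_train, y_train)
y_pred = sugar_model.predict(X_test)
print(classification_report(y_test, y_pred))

# Cross-validation
scores = cross_val_score(sugar_model, X_sugar, labels, cv=5)
print(f"Cross-validation scores: {scores}")
print(f"Mean accuracy: {scores.mean():.4f}")
```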
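
K-means cluster ids carry no order: with scikit-learn's `KMeans`, cluster 0 is whichever centre came out of initialisation first, not necessarily the lowest sugar level. The NumPy sketch below runs Lloyd's algorithm on 1-D sugar values and then renumbers the clusters by centre so that label 0 is always the lowest-sugar group. The quantile initialisation and the name `kmeans_1d_ordered` are assumptions for illustration; scikit-learn uses k-means++ initialisation by default, which is exactly why the renumbering step matters there.

```python
import numpy as np

def kmeans_1d_ordered(values, k: int = 4, n_iter: int = 100):
    """Lloyd's k-means on 1-D values; returns (labels, centres), centres ascending."""
    x = np.asarray(values, dtype=float)
    # Deterministic initialisation at evenly spaced quantiles of the data
    centres = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centre for every value
        labels = np.abs(x[:, None] - centres[None, :]).argmin(axis=1)
        # Update step: mean of each cluster (an empty cluster keeps its old centre)
        new_centres = np.array([
            x[labels == j].mean() if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centres, centres)
        centres = new_centres
        if converged:
            break
    # Final assignment with the converged centres
    labels = np.abs(x[:, None] - centres[None, :]).argmin(axis=1)
    # Renumber clusters so that label 0 has the lowest centre
    rank = np.argsort(np.argsort(centres))
    return rank[labels], np.sort(centres)

toy_sugar = [14.0, 20.1, 16.0, 14.2, 18.0, 16.1, 14.1, 18.2, 20.0]
toy_labels, toy_centres = kmeans_1d_ordered(toy_sugar, k=4)
print(toy_labels)  # [0 3 1 0 2 1 0 2 3]
```

With scikit-learn, the equivalent renumbering of fitted labels is `np.argsort(np.argsort(kmeans.cluster_centers_.ravel()))[labels]`.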


## Summary
- Extracted mean spectral features and sugar-content metadata from *.lo* data
- Trained a binary classifier to distinguish *grapes* from *non-grapes*
- Clustered sugar content into pseudo-classes and evaluated a classifier on them
- Reported classification metrics and cross-validation scores, and visualised the sugar-content distribution