# 🧠 Chapter 14: Semantic Segmentation with Machine Learning

We can assign each point of a point cloud to a semantic category (Ground, Building, Vegetation, Car) using supervised machine learning. We describe every point with a feature vector (geometric and color features) and train a classifier on labeled examples.

**Workflow:**
1.  **Feature Extraction**: Compute features like Planarity, Sphericity, Verticality, etc.
2.  **Training**: Train a Random Forest classifier on a labeled dataset.
3.  **Prediction**: Predict classes on a test set.
4.  **Evaluation**: Check performance (Accuracy, IoU).
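
The dataset used in this chapter ships with pre-computed features, so step 1 is not run here. As a rough illustration of what such features look like, the sketch below derives linearity, planarity, sphericity, and verticality from the eigenvalues of each point's local covariance matrix. The function name `eigen_features`, the neighborhood size `k`, and the exact feature formulas are assumptions made for this example (several variants of these definitions exist); they are not necessarily how the dataset's columns were produced.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def eigen_features(points, k=10):
    """Hypothetical helper: eigenvalue-based features from k-nearest-neighbor covariance."""
    # Find the k nearest neighbors of every point (each point is its own first neighbor)
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    _, idx = nn.kneighbors(points)

    # Covariance matrix of every neighborhood, shape (n, 3, 3)
    neighborhoods = points[idx]
    centered = neighborhoods - neighborhoods.mean(axis=1, keepdims=True)
    cov = np.einsum('nki,nkj->nij', centered, centered) / k

    # eigh returns eigenvalues in ascending order: l3 <= l2 <= l1
    evals, evecs = np.linalg.eigh(cov)
    l3, l2, l1 = evals[:, 0], evals[:, 1], evals[:, 2]
    eps = 1e-12  # avoids division by zero for degenerate neighborhoods

    linearity = (l1 - l2) / (l1 + eps)
    planarity = (l2 - l3) / (l1 + eps)
    sphericity = l3 / (l1 + eps)

    # The eigenvector of the smallest eigenvalue approximates the surface normal;
    # verticality is 0 for horizontal surfaces and 1 for vertical ones
    normal = evecs[:, :, 0]
    verticality = 1.0 - np.abs(normal[:, 2])

    return np.column_stack([linearity, planarity, sphericity, verticality])


# Toy input: random points on a flat, horizontal plane (z = 0)
rng = np.random.default_rng(0)
demo_points = np.column_stack([rng.random(200), rng.random(200), np.zeros(200)])
feats = eigen_features(demo_points, k=10)
print(feats[:3])
```

On a flat horizontal patch like this toy input, sphericity and verticality should both be close to zero, which is the kind of signal that helps separate ground from facades or vegetation.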

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import MinMaxScaler
import seaborn as sns

# Set seeds for reproducibility
np.random.seed(42)

## 1. Load Data

We use a pre-computed dataset in which the features (geometric descriptors) are already calculated.
The columns likely include `X`, `Y`, `Z`, `R`, `G`, `B`, a set of geometric features, and a `Classification` label.

In [None]:
data_path = "../DATA/3DML_urban_point_cloud.xyz"

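# Read the space-delimited, CSV-like file
try:
    df = pd.read_csv(data_path, delimiter=' ')
    print("Original Data:")
    print(df.head())
    print(f"Total points: {len(df)}")

    # Drop rows that contain NaNs and report how many were removed
    n_before = len(df)
    df.dropna(inplace=True)
    print(f"Dropped {n_before - len(df)} rows containing NaNs")

except Exception as e:
    print(f"Error loading data: {e}")
    # Fallback: random dummy dataframe so the rest of the notebook still runs
    cols = ['X', 'Y', 'Z', 'R', 'G', 'B', 'Linearity', 'Planarity', 'Scattering']
    df = pd.DataFrame(np.random.rand(100, len(cols)), columns=cols)
    # Labels must be discrete classes; continuous values would make the classifier fail
    df['Classification'] = np.random.randint(0, 4, size=100)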

## 2. Prepare Training Data

We select input features (X) and target labels (y).

In [None]:
# Define the features to use. Raw X and Y are usually poor features (the classifier
# should be invariant to horizontal position), but Z (relative height) can be useful.
feature_cols = ['R', 'G', 'B', 'Z']
# Ideally, also add geometric features if the file provides them (e.g., 'Linearity', 'Planarity').
# Check which columns are available.
available_cols = df.columns.tolist()
print("Available columns:", available_cols)

# Automatically select likely geometric features
geo_features = [c for c in available_cols if c not in ['X', 'Y', 'Z', 'R', 'G', 'B', 'Classification', 'Label']]
feature_cols += geo_features

print("Using features:", feature_cols)

X = df[feature_cols]
y = df['Classification']

# Split into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
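
The split above is purely random. If the classes turn out to be imbalanced, a rare class can end up under-represented in one of the two sets. A possible variant, sketched here on synthetic data, passes `stratify=` to `train_test_split` so that both sets keep the same class proportions. The toy dataframe `demo` and its class frequencies are made up for illustration and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, deliberately imbalanced labels (illustrative only)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Z": rng.random(1000),
    "Classification": rng.choice([0, 1, 2], size=1000, p=[0.80, 0.15, 0.05]),
})
X_demo = demo[["Z"]]
y_demo = demo["Classification"]

# stratify=y_demo preserves the class proportions in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

train_share = y_tr.value_counts(normalize=True).sort_index()
test_share = y_te.value_counts(normalize=True).sort_index()
print(pd.DataFrame({"train": train_share, "test": test_share}))
```

Note that stratification requires at least two samples of every class; otherwise `train_test_split` raises an error.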

## 3. Train Classifier (Random Forest)

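Random Forest is an ensemble of decision trees. It is robust to noisy features, captures non-linear relationships, and does not require feature scaling, because tree splits depend only on the ordering of values within each feature.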

In [None]:
clf = RandomForestClassifier(n_estimators=50, random_state=42, n_jobs=-1)
print("Training Random Forest...")
clf.fit(X_train, y_train)
print("Training done.")
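
To make the claim about non-linear relationships concrete, here is a small, self-contained comparison on synthetic data, unrelated to the point cloud. The labels follow an XOR-like rule (the class depends on the sign of the product of two features), which no single linear boundary can separate. The toy data and variable names are assumptions made for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# XOR-like toy problem: class 1 when both features share the same sign
rng = np.random.default_rng(42)
X_toy = rng.uniform(-1, 1, size=(2000, 2))
y_toy = (X_toy[:, 0] * X_toy[:, 1] > 0).astype(int)

Xa, Xb, ya, yb = train_test_split(X_toy, y_toy, test_size=0.3, random_state=42)

# Same forest settings as in the cell above, compared against a linear model
rf_acc = RandomForestClassifier(n_estimators=50, random_state=42).fit(Xa, ya).score(Xb, yb)
lin_acc = LogisticRegression().fit(Xa, ya).score(Xb, yb)

print(f"Random Forest accuracy:       {rf_acc:.3f}")
print(f"Logistic Regression accuracy: {lin_acc:.3f}")
```

The linear model should land near chance level on this problem, while the forest should recover the decision rule almost perfectly.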

## 4. Evaluation

We predict on the test set and calculate metrics.

In [None]:
y_pred = clf.predict(X_test)

print(classification_report(y_test, y_pred))

# Feature Importance
importances = clf.feature_importances_
indices = np.argsort(importances)[::-1]

plt.figure(figsize=(10, 6))
plt.title("Feature Importances")
plt.bar(range(X.shape[1]), importances[indices], align="center")
plt.xticks(range(X.shape[1]), [feature_cols[i] for i in indices], rotation=90)
plt.tight_layout()
plt.show()
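
The workflow lists IoU as an evaluation metric, but `classification_report` does not print it. A minimal sketch of per-class IoU, computed from the confusion matrix as TP / (TP + FP + FN), is shown below on a tiny hand-made example. The helper name `per_class_iou` is hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def per_class_iou(y_true, y_pred):
    """Hypothetical helper: return (labels, IoU per label) from a confusion matrix."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    cm = confusion_matrix(y_true, y_pred, labels=labels)

    tp = np.diag(cm)              # correctly predicted points per class
    fp = cm.sum(axis=0) - tp      # predicted as the class, but belonging to another
    fn = cm.sum(axis=1) - tp      # belonging to the class, but predicted as another

    # Every label occurs in y_true or y_pred, so the denominator is never zero
    return labels, tp / (tp + fp + fn)


# Tiny worked example
y_true_demo = np.array([0, 0, 1, 1, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0])
labels_demo, iou_demo = per_class_iou(y_true_demo, y_pred_demo)

for label, iou in zip(labels_demo, iou_demo):
    print(f"class {label}: IoU = {iou:.3f}")
print(f"mean IoU = {iou_demo.mean():.3f}")
```

In this notebook, the same helper could be applied to the real predictions with `per_class_iou(y_test.to_numpy(), y_pred)`.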

## 5. Visualize Results in 3D

We visualize the test set with predicted labels.

In [None]:
import open3d as o3d

# Reconstruct the test point cloud: X_test keeps the original dataframe index,
# so it can be used to look up the X, Y, Z coordinates of the test points
test_indices = X_test.index
pcd_test = o3d.geometry.PointCloud()
pcd_test.points = o3d.utility.Vector3dVector(df.loc[test_indices, ['X', 'Y', 'Z']].values)

# Color by predicted label: map each distinct label to an integer index,
# then to one of the 10 colors of the qualitative "tab10" colormap
# (with more than 10 classes, colors repeat)
unique_labels, label_idx = np.unique(y_pred, return_inverse=True)
colors = plt.get_cmap("tab10")(label_idx % 10)

# Drop the alpha channel; Open3D expects RGB values in [0, 1]
pcd_test.colors = o3d.utility.Vector3dVector(colors[:, :3])
o3d.visualization.draw_geometries([pcd_test], window_name="Predicted Labels")