# 📊 Chapter 6: Precision 3D - Part 1: PCA Analysis

Welcome to Chapter 6! In this chapter, we dive into the analytical side of 3D Data Science. We will use **Principal Component Analysis (PCA)** to describe the geometric properties of groups of points, here the labeled segments of a LiDAR scan.

**Objectives:**
1.  **Load Labeled Data**: Using Pandas to handle `.xyz` files with classification labels.
2.  **Segment Processing**: Grouping points by their label.
3.  **PCA Computation**: Calculating Eigenvalues and Eigenvectors to extract features (the Normal vector, plus the eigenvalues from which Planarity and Linearity are derived).
4.  **Visualizing Vectors**: Plotting the principal directions of 3D clusters.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time

# Adjust figure resolution for high-quality plots
plt.rcParams['figure.dpi'] = 150

## 1. Load and Inspect Data

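We use a dataset (`velodyne_pca.xyz`) that contains labeled LiDAR segments. Each line holds the X, Y, Z coordinates of one point and the integer label of the segment it belongs to, separated by semicolons.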
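A quick way to see what the loader produces without the real file is to parse a few rows from an in-memory string. The sample values below are made up for illustration and are not taken from `velodyne_pca.xyz`. The only assumption is the semicolon-separated `X;Y;Z;LABEL` layout used by the loading cell.

```python
import io

import pandas as pd

# Hypothetical sample rows in the assumed "X;Y;Z;LABEL" layout (illustrative values only)
sample_text = """1.20;-0.35;0.10;3
1.25;-0.30;0.12;3
4.80;2.10;-1.50;7
"""

# Same read_csv arguments as the loading cell, but reading from a string buffer
sample = pd.read_csv(io.StringIO(sample_text), delimiter=";",
                     names=["X", "Y", "Z", "LABEL"], header=None)
sample["LABEL"] = sample["LABEL"].astype(int)

print(sample.dtypes)
print(sample)
```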

In [None]:
data_path = "../DATA/velodyne_pca.xyz"

# Load data. Columns: X, Y, Z, Label
try:
    pcd = pd.read_csv(data_path, delimiter=";", names=['X', 'Y', 'Z', 'LABEL'], header=None)
    pcd['LABEL'] = pcd['LABEL'].astype(int)
    print(f"Loaded {len(pcd)} points.")
    print(f"Unique Clusters: {pcd['LABEL'].nunique()}")
except FileNotFoundError:
    print(f"⚠️ Error: {data_path} not found.")

## 2. Grouping by Segment

We can split the dataframe into groups based on the `LABEL` column. This allows us to process each object (car, pedestrian, sign) independently.

In [None]:
# Group by the column name (not a one-element list) so each group key is a plain int
segments = pcd.groupby('LABEL')

# Example: Get points for Cluster 3
cluster_id = 3
cluster_data = segments.get_group(cluster_id)[['X', 'Y', 'Z']]
print(f"Cluster {cluster_id} has {len(cluster_data)} points.")
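Before running PCA on every group, check how many points each segment contains. At least three non-collinear points are needed before the covariance matrix can define a plane, and therefore a meaningful normal. The sketch below uses a hypothetical toy frame in place of `pcd`. The `min_points` threshold is an illustrative assumption, not a value from this chapter.

```python
import pandas as pd

# Hypothetical toy frame standing in for `pcd`
toy = pd.DataFrame({
    "X": [0.0, 1.0, 0.0, 1.0, 5.0, 5.1],
    "Y": [0.0, 0.0, 1.0, 1.0, 5.0, 5.1],
    "Z": [0.0, 0.0, 0.0, 0.0, 2.0, 2.1],
    "LABEL": [1, 1, 1, 1, 2, 2],
})

# Group by the column name (a string, not a one-element list) so keys are plain ints
groups = toy.groupby("LABEL")
sizes = groups.size()  # number of points per label
print(sizes)

# Keep only segments with enough points for a usable covariance estimate
min_points = 3  # illustrative threshold (assumption)
valid_labels = sizes[sizes >= min_points].index.tolist()
print("Labels kept:", valid_labels)
```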

## 3. Principal Component Analysis (PCA)

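PCA finds the main directions of variance in the data. For the 3×3 covariance matrix of a segment's coordinates:
*   **Eigenvalues ($\lambda_1 \ge \lambda_2 \ge \lambda_3$)**: the variance of the points along each principal axis.
*   **Eigenvectors ($v_1, v_2, v_3$)**: the unit directions of those axes.

The eigenvector of the smallest eigenvalue ($v_3$) is the direction in which the points vary least. For a roughly planar segment, it approximates the surface **Normal Vector**.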
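In symbols, take a segment with $n$ points $p_i \in \mathbb{R}^3$ and centroid $\bar{p}$. The quantities above are defined as follows; the $n-1$ denominator matches NumPy's default in `np.cov`:

$$
\begin{aligned}
\bar{p} &= \frac{1}{n}\sum_{i=1}^{n} p_i, \\
C &= \frac{1}{n-1}\sum_{i=1}^{n} (p_i - \bar{p})(p_i - \bar{p})^{\top}, \\
C\,v_k &= \lambda_k v_k, \quad \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge 0, \quad \|v_k\| = 1 .
\end{aligned}
$$

Because $C$ is symmetric positive semi-definite, its eigenvalues are real and non-negative and its eigenvectors can be chosen orthonormal. Each $\lambda_k$ is the variance of the points projected onto $v_k$.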

In [None]:
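def compute_pca(points):
    """Return the centroid, eigenvalues (descending) and eigenvectors (as columns) of an (N, 3) point set."""
    # 0. Work on a plain float array so positional indexing (mean[0]) works downstream
    points = np.asarray(points, dtype=float)

    # 1. Center the data
    mean = points.mean(axis=0)
    centered = points - mean
    
    # 2. Compute the 3x3 covariance matrix (rows of centered.T are X, Y, Z)
    cov_matrix = np.cov(centered.T)
    
    # 3. Eigen-decomposition: eigh is meant for symmetric matrices and returns real values
    eig_val, eig_vec = np.linalg.eigh(cov_matrix)
    
    # 4. Sort from largest to smallest variance (eigh returns ascending order)
    sorted_indices = np.argsort(eig_val)[::-1]
    eig_val = eig_val[sorted_indices]
    eig_vec = eig_vec[:, sorted_indices]
    
    return mean, eig_val, eig_vec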

# Test on our cluster
mean, vals, vecs = compute_pca(cluster_data)
print("Eigenvalues:\n", vals)
print("Eigenvectors:\n", vecs)
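A quick sanity check, independent of the LiDAR file: sample a synthetic, slightly noisy horizontal plane and confirm that the third principal direction comes out close to the $z$-axis (up to sign). The helper `pca_sorted` is a hypothetical stand-alone version that repeats the centering and covariance steps of `compute_pca`. It uses `np.linalg.eigh`, the symmetric-matrix solver, for the decomposition. The plane extents and noise level are arbitrary choices for illustration. The variable names avoid `mean`, `vals` and `vecs` so the plotting cell still uses the cluster results.

```python
import numpy as np

def pca_sorted(points):
    """Centroid, eigenvalues (descending) and eigenvectors (columns) of a point set."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)      # 3x3 covariance, n-1 denominator
    vals, vecs = np.linalg.eigh(cov)      # ascending order for symmetric matrices
    order = np.argsort(vals)[::-1]        # largest variance first
    return centroid, vals[order], vecs[:, order]

rng = np.random.default_rng(0)
n = 500
# Synthetic plane: wide in x, narrower in y, almost flat in z
plane = np.column_stack([
    rng.uniform(-2.0, 2.0, n),
    rng.uniform(-1.0, 1.0, n),
    rng.normal(0.0, 0.01, n),
])

plane_mean, plane_vals, plane_vecs = pca_sorted(plane)
normal = plane_vecs[:, 2]
print("Eigenvalues:", plane_vals)
print("Normal:", normal)
print("|cos(normal, z)| =", abs(normal @ np.array([0.0, 0.0, 1.0])))
```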

## 4. Visualizing Eigenvectors

Let's create a 3D plot to see how these vectors align with the point cloud.

In [None]:
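def plot_pca(points, mean, vecs, title):
    """Scatter a DataFrame with columns X, Y, Z and draw its principal axes from the centroid."""
    mean = np.asarray(mean, dtype=float)  # accept a pandas Series or a NumPy array
    fig = plt.figure(figsize=(8, 8))
    ax = fig.add_subplot(111, projection='3d')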
    
    # Plot points
    ax.scatter(points['X'], points['Y'], points['Z'], color='steelblue', alpha=0.2, s=5)
    
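    # Plot principal axes from the centroid. Eigenvectors have unit length,
    # so every arrow is drawn with the same fixed display length `scale`.
    colors = ['r', 'g', 'b']  # 1st, 2nd, 3rd component
    scale = 1.5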
    for i in range(3):
        vec = vecs[:, i]
        ax.quiver(mean[0], mean[1], mean[2], vec[0], vec[1], vec[2], color=colors[i], length=scale, linewidth=2, label=f'PC{i+1}')

    # Formatting
    ax.set_title(title)
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')
    plt.legend()
    plt.show()

plot_pca(cluster_data, mean, vecs, f"PCA of Cluster {cluster_id}")
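The fixed `scale = 1.5` draws all three arrows with the same length, which hides how elongated or flat a segment is. One common alternative is to scale each arrow by the standard deviation $\sqrt{\lambda_k}$ along its axis, times a display factor. This is sketched here as an option, not the chapter's method. The helper `pca_arrow_endpoints` and its `n_std` parameter are hypothetical. The helper only computes the arrow tips.

```python
import numpy as np

def pca_arrow_endpoints(mean, vals, vecs, n_std=2.0):
    """Hypothetical helper: tips of PCA arrows scaled by n_std standard deviations.

    mean: (3,) centroid; vals: (3,) eigenvalues; vecs: (3, 3) eigenvectors as columns.
    Returns a (3, 3) array whose row k is the tip of the k-th principal arrow.
    """
    mean = np.asarray(mean, dtype=float)
    vals = np.clip(np.asarray(vals, dtype=float), 0.0, None)  # guard tiny negative round-off
    lengths = n_std * np.sqrt(vals)                            # one length per component
    # Column k of vecs is scaled by lengths[k]; transpose so each row is one arrow
    return mean + (np.asarray(vecs, dtype=float) * lengths).T

demo_mean = np.array([1.0, 2.0, 0.5])
demo_vals = np.array([4.0, 1.0, 0.01])
demo_vecs = np.eye(3)  # principal axes aligned with x, y, z for this toy example
tips = pca_arrow_endpoints(demo_mean, demo_vals, demo_vecs)
print(tips)
```

In `plot_pca`, each arrow could then be drawn with `ax.quiver`, using `tips[k] - mean` as the direction and leaving `length` at its default of 1.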

## 5. Feature Extraction Loop

We can now iterate through all clusters and compute the eigenvalues and normal of each one automatically. Per-segment features like these are a common input for machine learning classifiers on 3D data.
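The loop below stores eigenvalues and normals. Planarity and Linearity, listed in the objectives, are usually derived from the sorted eigenvalues. This minimal sketch uses one common convention: eigenvalue differences normalized by $\lambda_1$, plus a sphericity term $\lambda_3/\lambda_1$. Other sources compute the same ratios on $\sqrt{\lambda_k}$, so treat the exact formulas as an assumption. The function name `shape_features` and the example eigenvalues are hypothetical.

```python
import numpy as np

def shape_features(eig_vals, eps=1e-12):
    """Hypothetical helper: linearity, planarity, sphericity from eigenvalues.

    eig_vals: array of shape (..., 3) sorted so that l1 >= l2 >= l3.
    The three features are non-negative and sum to 1 for each row.
    """
    ev = np.asarray(eig_vals, dtype=float)
    l1, l2, l3 = ev[..., 0], ev[..., 1], ev[..., 2]
    denom = np.maximum(l1, eps)          # avoid division by zero for degenerate segments
    linearity = (l1 - l2) / denom
    planarity = (l2 - l3) / denom
    sphericity = l3 / denom
    return np.stack([linearity, planarity, sphericity], axis=-1)

# Made-up eigenvalue triples: a pole-like, a wall-like and a blob-like segment
examples = np.array([
    [1.00, 0.01, 0.01],
    [1.00, 0.90, 0.01],
    [1.00, 0.95, 0.90],
])
features = shape_features(examples)
print(np.round(features, 3))
```

After the loop has filled `eig_1`, `eig_2` and `eig_3`, the same function could be applied to `pcd[['eig_1', 'eig_2', 'eig_3']].to_numpy()`.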

In [None]:
# Initialize new feature columns
pcd['eig_1'], pcd['eig_2'], pcd['eig_3'] = 0.0, 0.0, 0.0
pcd['nx'], pcd['ny'], pcd['nz'] = 0.0, 0.0, 0.0

start_time = time.time()

for label, group in segments:
    points = group[['X', 'Y', 'Z']]
    try:
        m, val, vec = compute_pca(points)
        
        # Store results back into the main dataframe
        idx = group.index
        pcd.loc[idx, ['eig_1', 'eig_2', 'eig_3']] = val
        
        # Store Normal (3rd eigenvector)
        pcd.loc[idx, ['nx', 'ny', 'nz']] = vec[:, 2]
    except Exception as e:
        print(f"Skipping Cluster {label}: {e}")

print(f"Processed {len(segments)} clusters in {time.time() - start_time:.2f} seconds.")
pcd.head()
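PCA normals have an arbitrary sign: $v_3$ and $-v_3$ are equally valid eigenvectors, so neighbouring segments can end up with normals facing opposite ways. A common fix is to flip each normal so it points toward the sensor. This sketch assumes the points are still in the sensor's own coordinate frame, with the sensor at the origin, so check this for your dataset. The function `orient_toward` and its `viewpoint` parameter are hypothetical.

```python
import numpy as np

def orient_toward(normals, centroids, viewpoint=(0.0, 0.0, 0.0)):
    """Hypothetical helper: flip normals so each points from its centroid toward viewpoint."""
    normals = np.asarray(normals, dtype=float)
    to_view = np.asarray(viewpoint, dtype=float) - np.asarray(centroids, dtype=float)
    # A negative dot product means the normal faces away from the viewpoint
    flip = np.einsum("ij,ij->i", normals, to_view) < 0
    oriented = normals.copy()
    oriented[flip] *= -1.0
    return oriented

# Two toy segments: one above the sensor (normal pointing up), one below (normal pointing down)
demo_normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
demo_centroids = np.array([[5.0, 0.0, 2.0], [5.0, 0.0, -2.0]])
oriented_normals = orient_toward(demo_normals, demo_centroids)
print(oriented_normals)
```

Inside the loop, this could be applied per segment as `orient_toward(np.atleast_2d(vec[:, 2]), np.atleast_2d(m))[0]` before storing the normal.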