# 📊 Chapter 8: Exploratory Data Analysis (EDA) for 3D

Before diving into machine learning, we must understand our data. EDA involves analyzing distributions, correlations, and geometric properties to detect patterns and anomalies.

**Objectives:**
1.  **Statistical Profiling**: Using Pandas to summarize large 3D feature sets.
2.  **Geometric Analysis**: Inspecting bounding boxes and convex hulls.
3.  **Visualization**: Creating Histograms, Boxplots, and Correlation Heatmaps for 3D attributes.

In [None]:
import numpy as np
import pandas as pd
import open3d as o3d
import matplotlib.pyplot as plt
import seaborn as sns

# Styling
plt.style.use('bmh')
plt.rcParams['figure.dpi'] = 100

## 1. Loading Feature Data

We often work with CSV files in which each row is a point and each column is a pre-computed feature (geometric, color, etc.).

In [None]:
file_path = "../DATA/verviers_features.csv"

try:
    # Load CSV (the source file is space-delimited)
    df = pd.read_csv(file_path, delimiter=' ')
except FileNotFoundError:
    print(f"⚠️ {file_path} not found; falling back to random demo data.")
    # Dummy data so the rest of the notebook still runs (values carry no meaning)
    df = pd.DataFrame(np.random.rand(100, 6), columns=['X', 'Y', 'Z', 'R', 'G', 'B'])

# Summarize whichever table was loaded
print("Dataset Shape:", df.shape)
print("\nFirst 5 rows:")
display(df.head())

print("\nStatistical Summary:")
display(df.describe())
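
`describe()` covers numeric columns only and reveals missing values only indirectly, through the `count` row. As a complement, the sketch below builds a per-column profile (dtype, missing count, unique count, min, max). The helper name `profile_features` and the small `demo` table are illustrative assumptions and are not part of the original dataset.

```python
import numpy as np
import pandas as pd


def profile_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, unique count, min, max."""
    numeric = frame.select_dtypes(include="number")
    summary = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(),
    })
    # min/max are only computed for numeric columns; other columns stay NaN
    summary["min"] = numeric.min()
    summary["max"] = numeric.max()
    return summary


# Hypothetical stand-in for the feature table
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((100, 3)), columns=["X", "Y", "Z"])
demo.loc[::10, "Z"] = np.nan  # inject 10 missing heights (rows 0, 10, ..., 90)
demo["label"] = "ground"      # a non-numeric column

profile = profile_features(demo)
print(profile)
```

Calling `profile_features(df)` on the loaded table would give the same overview for the real features.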

## 2. Visualizing Distributions

Let's inspect the distribution of a specific feature using Histograms and Boxplots.

In [None]:
def plot_distribution(data, column_name):
    plt.figure(figsize=(10, 4))
    
    # Histogram
    plt.subplot(1, 2, 1)
    sns.histplot(data[column_name], kde=True, color='skyblue')
    plt.title(f'Distribution of {column_name}')

    # Boxplot
    plt.subplot(1, 2, 2)
    sns.boxplot(y=data[column_name], color='lightgreen')
    plt.title(f'Boxplot of {column_name}')
    
    plt.tight_layout()
    plt.show()

# Check 'Z' distribution (Height)
plot_distribution(df, 'Z')

# Check 'Planarity' if it exists, otherwise R
target_col = 'Planarity_(0.1)' if 'Planarity_(0.1)' in df.columns else 'R'
plot_distribution(df, target_col)
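
The points a boxplot draws beyond its whiskers follow, by default, the 1.5 × IQR rule. To get those points as rows rather than as dots on a plot, the same rule can be applied directly. This is a minimal sketch; the helper name `iqr_outlier_mask` and the synthetic `heights` series are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask that is True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Synthetic heights: a regular ramp from 0 to 10 plus two obvious outliers
heights = pd.Series(np.concatenate([np.linspace(0.0, 10.0, 101), [50.0, -40.0]]))

mask = iqr_outlier_mask(heights)
print("Flagged values:", sorted(heights[mask].tolist()))
```

On the notebook's data, `df[iqr_outlier_mask(df['Z'])]` would return the rows whose height falls outside the whiskers.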

## 3. Correlation Analysis

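Are features correlated? For example, does the red channel correlate with height? A heatmap of the pairwise Pearson coefficients (the default for `DataFrame.corr()`) gives a quick overview.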

In [None]:
# Select a subset of numeric columns, keeping only those present in the table
candidate_cols = ['X', 'Y', 'Z', 'R', 'G', 'B', 'Planarity_(0.1)']
cols = [c for c in candidate_cols if c in df.columns]

corr_matrix = df[cols].corr()

plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Feature Correlation Matrix")
plt.show()
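
A heatmap becomes hard to read once there are many features. One option is to flatten the upper triangle of the matrix into a ranked list of feature pairs. The sketch below assumes a hypothetical helper `top_correlations` and a synthetic table in which `R` is built to depend on `Z`.

```python
import numpy as np
import pandas as pd


def top_correlations(frame: pd.DataFrame, n: int = 5, method: str = "pearson") -> pd.DataFrame:
    """Return the n feature pairs with the largest absolute correlation."""
    corr = frame.select_dtypes(include="number").corr(method=method)
    # Indices of the upper triangle, excluding the diagonal, so each pair appears once
    rows, cols = np.triu_indices(len(corr), k=1)
    pairs = pd.DataFrame({
        "feature_a": corr.index[rows],
        "feature_b": corr.columns[cols],
        "r": corr.to_numpy()[rows, cols],
    }).dropna()
    # Rank by strength, regardless of sign
    order = pairs["r"].abs().sort_values(ascending=False).index
    return pairs.loc[order].head(n).reset_index(drop=True)


# Synthetic example: R is a noisy linear function of Z, the rest is independent
rng = np.random.default_rng(42)
z = rng.random(200)
demo = pd.DataFrame({
    "X": rng.random(200),
    "Z": z,
    "R": 2.0 * z + rng.normal(0.0, 0.05, 200),
    "G": rng.random(200),
})

top = top_correlations(demo, n=3)
print(top)
```

With the notebook's variables, `top_correlations(df[cols])` would rank the same pairs shown in the heatmap.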

## 4. Geometric Inspection

We can convert the dataframe back to an Open3D point cloud to visualize Bounding Boxes and Hulls.

In [None]:
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(df[['X', 'Y', 'Z']].values)
if {'R', 'G', 'B'}.issubset(df.columns):
    colors = df[['R', 'G', 'B']].to_numpy(dtype=float)
    # Open3D expects colors in [0, 1]; rescale if any channel looks 8-bit (0-255)
    if colors.max() > 1.0:
        colors = colors / 255.0
    pcd.colors = o3d.utility.Vector3dVector(colors)

# 1. Axis Aligned Bounding Box (AABB)
aabb = pcd.get_axis_aligned_bounding_box()
aabb.color = (1, 0, 0) # Red

# 2. Minimal Oriented Bounding Box (OBB)
obb = pcd.get_minimal_oriented_bounding_box()
obb.color = (0, 1, 0) # Green

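print(f"AABB Volume: {aabb.volume():.2f}")
print(f"OBB Volume: {obb.volume():.2f}")

# 3. Convex hull, drawn as a wireframe so the points stay visible
hull_mesh, _ = pcd.compute_convex_hull()
hull = o3d.geometry.LineSet.create_from_triangle_mesh(hull_mesh)
hull.paint_uniform_color((0, 0, 1)) # Blue

# Visualize
o3d.visualization.draw_geometries([pcd, aabb, obb, hull], window_name="Geometric Analysis")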
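
Why can the two volumes differ so much? An axis-aligned box has to follow the coordinate axes, so an elongated object lying diagonally wastes a lot of space. The NumPy-only sketch below illustrates this with a PCA-aligned box. Note that this is a simple approximation chosen for illustration and is not the algorithm behind Open3D's `get_minimal_oriented_bounding_box()`; the function names and the synthetic slab are assumptions.

```python
import numpy as np


def aabb_volume(points: np.ndarray) -> float:
    """Volume of the axis-aligned bounding box of an (N, 3) array."""
    extent = points.max(axis=0) - points.min(axis=0)
    return float(np.prod(extent))


def pca_box_volume(points: np.ndarray) -> float:
    """Volume of a bounding box aligned with the principal axes of the points."""
    centered = points - points.mean(axis=0)
    # Eigenvectors of the covariance matrix give the principal axes (as columns)
    _, axes = np.linalg.eigh(np.cov(centered, rowvar=False))
    projected = centered @ axes  # coordinates of each point in the PCA frame
    extent = projected.max(axis=0) - projected.min(axis=0)
    return float(np.prod(extent))


# Synthetic slab of size 10 x 2 x 0.5, rotated 45 degrees about the Z axis
rng = np.random.default_rng(1)
box = rng.random((5000, 3)) * np.array([10.0, 2.0, 0.5])
theta = np.deg2rad(45.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
rotated = box @ rot_z.T

aabb_vol = aabb_volume(rotated)
pca_vol = pca_box_volume(rotated)
print(f"AABB volume:    {aabb_vol:.2f}")
print(f"PCA box volume: {pca_vol:.2f}")
```

A large gap between the two values suggests that the object is not aligned with the coordinate axes, which is useful to know before computing axis-dependent features such as height.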