# 📈 Chapter 3: 3D Data Processing with Python

Welcome to Chapter 3! In this notebook, we will establish the foundation for 3D Data Science: **NumPy**. We will learn how to create, load, manipulate, and visualize 3D point clouds efficiently.

**Core Concepts:**
1.  **NumPy Arrays**: The standard for 3D coordinate storage $(x, y, z)$.
2.  **I/O**: Reading `.xyz` files.
3.  **Spatial Queries**: Filtering points based on geometric properties.
4.  **Visualization**: Examining point clouds.

## 1. Setting up the Environment

In [None]:
import numpy as np
import matplotlib.pyplot as plt
# Registers the '3d' projection; only required for Matplotlib < 3.2
from mpl_toolkits import mplot3d  # noqa: F401

# Optional: Pandas can be useful for tabular data
import pandas as pd

%matplotlib inline

## 2. NumPy for 3D Points

A point cloud is essentially a list of $(x, y, z)$ coordinates. In NumPy, it is stored as a 2D array of shape $(N, 3)$, where $N$ is the number of points: one row per point and one column per axis.

In [None]:
# Manually defining a small point cloud
point_cloud = np.array([
    [1, 2, 3], 
    [4, 5, 6], 
    [7, 8, 9]
])

print("Point Cloud:\n", point_cloud)
print("Shape:", point_cloud.shape)
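Because the cloud is an $(N, 3)$ array, per-axis statistics and whole-cloud transforms need no explicit loops. The following is a minimal sketch on the same three points; the names `centroid`, `bbox_min`, `bbox_max`, and `centered` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

point_cloud = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# Column slicing: every row, one column per axis
x, y, z = point_cloud[:, 0], point_cloud[:, 1], point_cloud[:, 2]

# Reducing along axis 0 (across points) yields one value per axis
centroid = point_cloud.mean(axis=0)
bbox_min = point_cloud.min(axis=0)
bbox_max = point_cloud.max(axis=0)

# Broadcasting subtracts the (3,) centroid from each of the N rows
centered = point_cloud - centroid

print("Centroid:", centroid)
print("Bounding box:", bbox_min, bbox_max)
```

The `axis=0` argument is the key design choice here: it collapses the point dimension and keeps the coordinate dimension, so every result has shape `(3,)`.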

## 3. Generating Synthetic Data

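Before we touch real data, let's generate points sampled uniformly at random inside the unit cube. Synthetic data like this is useful for testing algorithms, because its extent and distribution are known in advance.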

In [None]:
# Generate 1000 points with coordinates drawn uniformly from [0, 1)
n_points = 1000
random_points = np.random.rand(n_points, 3)

print(f"Generated {n_points} random points.")

# Visualize
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(random_points[:,0], random_points[:,1], random_points[:,2], s=5)
ax.set_title("Random Point Cloud")
plt.show()
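For repeatable tests it can help to seed the generator and to sample inside a box other than the unit cube. The sketch below uses NumPy's `default_rng` for this; the seed value and the box dimensions (`box_min`, `box_max`) are arbitrary assumptions chosen for illustration.

```python
import numpy as np

# A seeded generator returns the same points on every run
rng = np.random.default_rng(seed=42)  # seed value is arbitrary

n_points = 1000

# Hypothetical bounding box: 10 x 5 x 2 units with one corner at the origin
box_min = np.array([0.0, 0.0, 0.0])
box_max = np.array([10.0, 5.0, 2.0])

# low/high of shape (3,) broadcast against the requested (N, 3) output
box_points = rng.uniform(box_min, box_max, size=(n_points, 3))

print("Shape:", box_points.shape)
print("Per-axis max:", box_points.max(axis=0))
```

The resulting array can be passed to the same `ax.scatter` call as `random_points`.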

## 4. Inspecting Real 3D Data

We will load a sample `.xyz` file. XYZ files are simple text files where each line represents a point. The loading code below assumes whitespace-separated columns and a single header line.

In [None]:
# Define path to data
file_data_path = "../DATA/sample.xyz"

# Load data using NumPy (simple for plain-text files)
# skiprows=1 skips the header line; max_rows caps the read at
# 100,000 points to keep things fast for this demo
point_cloud_data = np.loadtxt(file_data_path, skiprows=1, max_rows=100000)

print(f"Loaded point cloud with {point_cloud_data.shape[0]} points.")
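To see what `skiprows` and `max_rows` do without depending on the sample file, the sketch below writes a tiny stand-in file to a temporary directory and reads it back. The file contents and the name `demo.xyz` are invented for illustration. Real `.xyz` files vary in delimiter, header, and column count, so this layout is an assumption.

```python
import os
import tempfile

import numpy as np

# Hypothetical file contents: one header line, then X Y Z R G B per row
sample_text = """X Y Z R G B
0.0 0.0 1.5 255 0 0
1.0 0.5 2.0 0 255 0
2.0 1.0 2.5 0 0 255
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.xyz")
    with open(demo_path, "w") as f:
        f.write(sample_text)

    # skiprows=1 drops the header; max_rows=2 reads only the first two points
    demo_cloud = np.loadtxt(demo_path, skiprows=1, max_rows=2)

print(demo_cloud.shape)  # (2, 6)
```

If a file uses commas instead of whitespace, `np.loadtxt` accepts a `delimiter=","` argument.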

### 4.1 Data Structure Analysis
The loaded data usually contains spatial coordinates $(X, Y, Z)$ and often color information $(R, G, B)$.

In [None]:
# Separate coordinates and colors
# Assuming the columns are X, Y, Z, R, G, B
xyz = point_cloud_data[:, :3]
# Stop at column 6 so any extra columns (e.g. intensity) are not treated as color
rgb = point_cloud_data[:, 3:6]

# Calculate the mean height (Z axis)
mean_z = np.mean(xyz[:, 2])
print(f"Mean Height (Z): {mean_z:.2f}")
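Since the column layout is an assumption, it is worth checking before relying on it. The sketch below validates the column count and guesses whether colors are stored as 8-bit values (0 to 255) or as floats in $[0, 1]$. The stand-in array and the names `colors_are_8bit` and `rgb_unit` are hypothetical, and the range test is a heuristic rather than a guarantee.

```python
import numpy as np

# Stand-in for point_cloud_data: 4 points with X, Y, Z, R, G, B columns
point_cloud_data = np.array([
    [0.0, 0.0, 1.0, 255, 0, 0],
    [1.0, 2.0, 3.0, 0, 255, 0],
    [2.0, 4.0, 5.0, 0, 0, 255],
    [3.0, 6.0, 7.0, 128, 128, 128],
])

n_rows, n_cols = point_cloud_data.shape
if n_cols < 6:
    raise ValueError(f"Expected at least 6 columns (X Y Z R G B), found {n_cols}")

xyz = point_cloud_data[:, :3]
rgb = point_cloud_data[:, 3:6]

# Heuristic: any value above 1 suggests 8-bit colors rather than unit floats
colors_are_8bit = rgb.max() > 1.0
rgb_unit = rgb / 255.0 if colors_are_8bit else rgb

print("Extent per axis:", xyz.max(axis=0) - xyz.min(axis=0))
print("8-bit colors:", colors_are_8bit)
```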

## 5. Spatial Querying

One of NumPy's most useful features is **boolean indexing**: a comparison on an array yields a boolean mask, and indexing with that mask keeps only the rows where it is `True`.

**Goal:** Extract a horizontal slice of the point cloud within 1 unit of the mean height.

In [None]:
# Create a mask: True for points where Z is within 1 unit of the mean
mask = np.abs(xyz[:, 2] - mean_z) < 1.0

# Apply the mask to get filtered points
filtered_xyz = xyz[mask]
filtered_rgb = rgb[mask]

print(f"Filtered down to {filtered_xyz.shape[0]} points (from {xyz.shape[0]}).")
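Masks can also be combined to query several axes at once. As a sketch, the code below selects the points inside an axis-aligned box. The synthetic cloud and the `lower` and `upper` corners are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
xyz = rng.random((1000, 3)) * 10.0  # synthetic cloud inside a 10-unit cube

# Hypothetical query box, given by its two opposite corners
lower = np.array([2.0, 2.0, 0.0])
upper = np.array([6.0, 8.0, 5.0])

# Compare all three axes at once (broadcasting), then require every axis to pass
in_box = np.all((xyz >= lower) & (xyz <= upper), axis=1)
box_xyz = xyz[in_box]

print(f"{box_xyz.shape[0]} of {xyz.shape[0]} points fall inside the box.")
```

Note the element-wise `&` operator: Python's `and` does not work on arrays, and each comparison needs its own parentheses because `&` binds more tightly than `>=`.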

## 6. Visualization

Let's visualize the slice we just extracted.

In [None]:
fig = plt.figure(figsize=(10, 8))
ax = plt.axes(projection='3d')

# Scatter plot
# Matplotlib expects RGB values in the 0-1 range; dividing by 255
# assumes the file stores 8-bit colors (0-255)
ax.scatter(filtered_xyz[:,0], filtered_xyz[:,1], filtered_xyz[:,2], c=filtered_rgb/255.0, s=0.1)

ax.set_title("Sliced Point Cloud (Mean Z ± 1.0)")
plt.show()
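Matplotlib's 3D scatter can become slow on large slices, so one common workaround is to plot a random subset. The sketch below draws that subset with NumPy only; the stand-in arrays and the `max_plot_points` budget are assumptions, not values from the sample data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for filtered_xyz / filtered_rgb from the slicing step
filtered_xyz = rng.random((50_000, 3))
filtered_rgb = rng.integers(0, 256, size=(50_000, 3))

max_plot_points = 10_000  # hypothetical plotting budget
n = filtered_xyz.shape[0]

if n > max_plot_points:
    # Sample row indices without replacement so no point is drawn twice
    keep = rng.choice(n, size=max_plot_points, replace=False)
else:
    keep = np.arange(n)

# Index both arrays with the same rows so colors stay aligned with points
plot_xyz = filtered_xyz[keep]
plot_rgb = filtered_rgb[keep] / 255.0

print(f"Plotting {plot_xyz.shape[0]} of {n} points.")
```

`plot_xyz` and `plot_rgb` can then replace `filtered_xyz` and `filtered_rgb/255.0` in the `ax.scatter` call.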