# LiDAR Panoptic Segmentation Demo

This notebook demonstrates how to use the LiDAR Panoptic Segmentation system for:
1. Installing MinkowskiEngine on Databricks
2. Configuration and setup
3. Running inference on LiDAR point clouds
4. Extracting tree polygons
5. MLflow experiment tracking

**Prerequisites:**
- Azure Databricks 15.4 LTS GPU cluster (single-node)
- Unity Catalog access for data storage
- Azure DevOps personal access token (PAT), needed only if MinkowskiEngine is installed from a private repository

## 1. Environment Setup

First, let's set up the environment and install MinkowskiEngine.

In [None]:
# Check Python and PyTorch versions
import sys
import torch

print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA capability: {torch.cuda.get_device_capability(0)}")

### 1.1 Install MinkowskiEngine (Databricks Only)

Run this section only on Databricks GPU clusters. This installs MinkowskiEngine from source with CUDA support.
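
Before running the install cells, it can help to check whether the notebook is actually on Databricks and whether MinkowskiEngine is already importable. The following is a minimal sketch, not part of the project code: it relies on the `DATABRICKS_RUNTIME_VERSION` environment variable that Databricks clusters set, and the helper names `running_on_databricks` and `package_available` are hypothetical.

```python
import importlib.util
import os


def running_on_databricks(environ=os.environ) -> bool:
    """Return True when the Databricks runtime version variable is set."""
    return "DATABRICKS_RUNTIME_VERSION" in environ


def package_available(name: str) -> bool:
    """Check whether a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if running_on_databricks() and not package_available("MinkowskiEngine"):
    print("MinkowskiEngine is missing: run the install cells below.")
else:
    print("Skipping the MinkowskiEngine install step.")
```

Using `find_spec` avoids importing a large compiled extension just to test for its presence.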

In [None]:
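# Get the PAT from Azure Key Vault (via Databricks secrets)
# Uncomment and adjust the scope and key for your secret scope

# import os
# ADO_PAT = dbutils.secrets.get(scope="azure", key="ado_pat")
# with open("/tmp/ado_pat", "w") as f:
#     f.write(ADO_PAT)
# os.chmod("/tmp/ado_pat", 0o600)  # keep the token readable by the current user only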

In [None]:
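# Install MinkowskiEngine from source
# This will take several minutes on first run
# Note: %sh must be the first line of its own cell, so copy ONE of the
# blocks below into a separate cell and uncomment it there.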

# For private Azure DevOps repo:
# %sh
# export ADO_PAT=$(cat /tmp/ado_pat)
# export ME_REPO="https://dev.azure.com/org/project/_git/MinkowskiEngine"
# export ME_REF="main"
# bash ./scripts/install_minkowski.sh

# For public GitHub repo:
# %sh
# export ME_REPO="https://github.com/NVIDIA/MinkowskiEngine.git"
# export ME_REF="master"
# bash ./scripts/install_minkowski.sh

In [None]:
# Verify MinkowskiEngine installation
try:
    import MinkowskiEngine as ME
    ME.print_diagnostics()
    print("\nMinkowskiEngine installed successfully!")
except ImportError as e:
    print(f"MinkowskiEngine not available: {e}")
    print("The system will use fallback mode (not recommended for production)")

## 2. Configuration

Load and configure the system using the unified YAML configuration.
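
Nested overrides, such as the `overrides` dictionary defined later in this section, are usually applied by merging them recursively into the base configuration so that untouched keys keep their values. How `load_config` merges overrides internally is not shown here, so the following is only a sketch of that general technique; `deep_merge` and the base values are hypothetical.

```python
from copy import deepcopy


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: descend instead of replacing wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base values, for illustration only.
base = {"env": {"name": "local", "debug": True}, "paths": {"data_root": "./data"}}
overrides = {"env": {"name": "databricks"}}

merged = deep_merge(base, overrides)
print(merged["env"])    # {'name': 'databricks', 'debug': True}
print(merged["paths"])  # {'data_root': './data'}
```

A shallow `dict.update` would instead replace the whole `env` block and silently drop `debug`.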

In [None]:
from lidar_panoptic_segmentation.config import load_config, validate_config

# Load configuration
config = load_config("../config.yaml")

# Validate configuration
warnings = validate_config(config)
if warnings:
    print("Configuration warnings:")
    for w in warnings:
        print(f"  - {w}")
else:
    print("Configuration validated successfully!")

print(f"\nEnvironment: {config.env.name}")
print(f"Debug mode: {config.env.debug}")

In [None]:
# Override configuration for Databricks
overrides = {
    "env": {
        "name": "databricks",
        "debug": False,
    },
    "paths": {
        "data_root": "abfss://forest-data@yourstorageaccount.dfs.core.windows.net/",
    },
}

# Reload with overrides
# config = load_config("../config.yaml", overrides=overrides)

## 3. Load Model

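Load a trained model from the MLflow Model Registry or a local checkpoint. The first cell below creates a new model for demonstration; the loading calls in the second cell are commented out.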

In [None]:
from lidar_panoptic_segmentation.model import create_model, load_model

# Create a new model (for demonstration)
model = create_model(config)
print(f"Model created: {type(model).__name__}")
print(f"Number of classes: {model.num_classes}")
print(f"Embedding dimension: {model.embed_dim}")

# Count parameters
n_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {n_params:,}")

In [None]:
# Load from MLflow (when available)
# model = load_model("models:/LidarPanopticSegmentation/latest")

# Or load from local checkpoint
# model = load_model("./models/checkpoint_best.pt", config=config)

## 4. Data Loading

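Load point cloud data from local files or Unity Catalog. The read example below is commented out; the cell after it builds a synthetic point cloud so the rest of the notebook can run without input files.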

In [None]:
from lidar_panoptic_segmentation.dataset import (
    read_point_cloud,
    CloudStorageHandler,
    create_dataloader,
)

# Example: Read a local point cloud
# data = read_point_cloud("./sample_data/sample.las")
# print(f"Loaded {len(data.points)} points")
# print(f"Bounds: {data.bounds}")

In [None]:
# Create synthetic sample data for demonstration
import numpy as np

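np.random.seed(42)  # fixed seed so the demo is reproducible
n_points = 10000

# Create two clusters of points (simulating trees)
cluster1 = np.random.randn(n_points // 2, 3).astype(np.float32)
cluster1[:, 2] = np.abs(cluster1[:, 2]) * 15  # Trees are tall

# Shift the second cluster; a float32 offset keeps the array float32
# (adding a plain Python list of ints would promote it to float64)
cluster2 = np.random.randn(n_points // 2, 3).astype(np.float32)
cluster2 += np.array([10, 10, 0], dtype=np.float32)
cluster2[:, 2] = np.abs(cluster2[:, 2]) * 12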

sample_points = np.vstack([cluster1, cluster2])
sample_semantic = np.ones(n_points, dtype=np.int64)  # All tree class

print(f"Created sample data: {sample_points.shape}")
print(f"Point cloud bounds: X=[{sample_points[:, 0].min():.1f}, {sample_points[:, 0].max():.1f}]")
print(f"                    Y=[{sample_points[:, 1].min():.1f}, {sample_points[:, 1].max():.1f}]")
print(f"                    Z=[{sample_points[:, 2].min():.1f}, {sample_points[:, 2].max():.1f}]")

## 5. Run Inference

Pass the point cloud through the model.
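
Sparse convolution backends such as MinkowskiEngine operate on integer voxel coordinates rather than raw metric positions. Whether `model(...)` quantizes its input internally is not shown in this notebook, so the following is only a minimal NumPy sketch of the idea; the `voxelize` helper and the 0.5 m voxel size are assumptions. MinkowskiEngine also provides `ME.utils.sparse_quantize` for the same purpose.

```python
import numpy as np


def voxelize(points: np.ndarray, voxel_size: float):
    """Quantize XYZ points to integer voxel indices and deduplicate them.

    Returns the unique voxel coordinates and, for every input point, the
    row index of its voxel in that array.
    """
    voxel_coords = np.floor(points / voxel_size).astype(np.int32)
    unique_coords, inverse = np.unique(voxel_coords, axis=0, return_inverse=True)
    # reshape(-1) keeps the inverse 1-D across NumPy versions.
    return unique_coords, inverse.reshape(-1)


rng = np.random.default_rng(0)
points = rng.uniform(0.0, 5.0, size=(1000, 3)).astype(np.float32)

voxels, point_to_voxel = voxelize(points, voxel_size=0.5)  # assumed voxel size
print(f"{len(points)} points -> {len(voxels)} occupied voxels")
```

The `point_to_voxel` mapping is what lets per-voxel predictions be copied back to every original point after inference.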

In [None]:
import torch

# Prepare input tensors
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

# Create batched coordinates (batch_idx, x, y, z)
batch_indices = np.zeros((n_points, 1), dtype=np.float32)
coords = np.hstack([batch_indices, sample_points]).astype(np.float32)

# Create features (xyz + intensity placeholder)
intensity = np.random.rand(n_points, 1).astype(np.float32)
features = np.hstack([sample_points, intensity]).astype(np.float32)

# Convert to tensors
coords_tensor = torch.from_numpy(coords).to(device)
features_tensor = torch.from_numpy(features).to(device)

print(f"Input shapes: coords={coords_tensor.shape}, features={features_tensor.shape}")

In [None]:
# Run inference
with torch.no_grad():
    output = model(coords_tensor, features_tensor)

print(f"Semantic predictions: {output.semantic_pred.shape}")
print(f"Unique classes: {torch.unique(output.semantic_pred).cpu().numpy()}")

if output.embedding is not None:
    print(f"Embeddings: {output.embedding.shape}")

if output.offset_pred is not None:
    print(f"Offset predictions: {output.offset_pred.shape}")

## 6. Postprocessing and Polygon Extraction

Convert predictions to tree instances and extract polygons.
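
The visualization code later in this notebook reads `inst.polygon.exterior.xy`, which suggests Shapely-style crown polygons, but the method `postprocess_predictions` uses to build them is not shown. As an illustration only, the sketch below computes a convex hull of one instance's XY footprint with Andrew's monotone chain algorithm; the real pipeline may use a different technique (for example a concave hull), and the sample coordinates are made up.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points_xy):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points_xy))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # Drop the last point of each chain: it repeats the start of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# XY footprint of one hypothetical tree instance (Z dropped).
crown_xy = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (3, 1)]
hull = convex_hull(crown_xy)
print(hull)                # [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_area(hull))  # 12.0
```

A convex hull tends to overestimate crown area for irregular crowns, which is one reason a production pipeline might prefer a tighter outline.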

In [None]:
from lidar_panoptic_segmentation.postprocess import (
    postprocess_predictions,
    save_geojson,
)

# Get predictions as numpy arrays
semantic_pred = output.semantic_pred.cpu().numpy()
embeddings = output.embedding.cpu().numpy() if output.embedding is not None else None
offset_pred = output.offset_pred.cpu().numpy() if output.offset_pred is not None else None

# Run postprocessing
result = postprocess_predictions(
    points=sample_points,
    semantic_pred=semantic_pred,
    embeddings=embeddings,
    offset_pred=offset_pred,
    config=config,
)

print(f"Found {len(result.instances)} tree instances")
for inst in result.instances:
    print(f"  Tree {inst.instance_id}: {len(inst.points)} points, height={inst.height:.1f}m")

In [None]:
# Save results as GeoJSON
import json
import os
import tempfile

output_dir = tempfile.mkdtemp()
geojson_path = os.path.join(output_dir, "trees.geojson")

save_geojson(result.instances, geojson_path)
print(f"Saved GeoJSON to: {geojson_path}")

# Display GeoJSON content
with open(geojson_path) as f:
    geojson_data = json.load(f)

print(f"\nGeoJSON contains {len(geojson_data['features'])} features")
if geojson_data['features']:
    print("\nSample feature properties:")
    print(json.dumps(geojson_data['features'][0]['properties'], indent=2))

## 7. Visualization

In [None]:
import matplotlib.pyplot as plt
from lidar_panoptic_segmentation.utils import colorize_labels

# Colorize instance predictions
colors = colorize_labels(result.instance_pred)

# Create visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# XY view with instance colors
ax1 = axes[0]
ax1.scatter(
    result.points[::10, 0],
    result.points[::10, 1],
    c=colors[::10] / 255.0,
    s=1,
    alpha=0.5,
)
ax1.set_xlabel('X')
ax1.set_ylabel('Y')
ax1.set_title('Point Cloud - XY View (Colored by Instance)')
ax1.set_aspect('equal')

# XZ view (side view)
ax2 = axes[1]
ax2.scatter(
    result.points[::10, 0],
    result.points[::10, 2],
    c=colors[::10] / 255.0,
    s=1,
    alpha=0.5,
)
ax2.set_xlabel('X')
ax2.set_ylabel('Z (Height)')
ax2.set_title('Point Cloud - XZ View (Side)')

plt.tight_layout()
plt.show()

In [None]:
# Visualize polygons
fig, ax = plt.subplots(figsize=(10, 10))

for inst in result.instances:
    if inst.polygon is not None:
        x, y = inst.polygon.exterior.xy
        ax.fill(x, y, alpha=0.3, label=f'Tree {inst.instance_id}')
        ax.plot(x, y, linewidth=2)
        ax.scatter(*inst.center[:2], marker='x', s=100, c='red')

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_title('Extracted Tree Crown Polygons')
ax.set_aspect('equal')
if ax.get_legend_handles_labels()[0]:  # skip the legend when no polygon was drawn
    ax.legend()
plt.tight_layout()
plt.show()

## 8. MLflow Integration

Track experiments and log models to MLflow.
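
MLflow stores run parameters as flat key-value pairs, so a nested configuration is easier to search and compare when it is flattened before logging. Whether `exp_logger.log_params` already does this is not shown in this notebook; the sketch below uses a hypothetical `flatten_params` helper and a made-up config fragment.

```python
def flatten_params(nested: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level with dot-separated keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Hypothetical config fragment, for illustration only.
config_dict = {
    "env": {"name": "databricks", "debug": False},
    "training": {"optimizer": {"lr": 0.001}},
}

params = flatten_params(config_dict)
print(params)
# {'env.name': 'databricks', 'env.debug': False, 'training.optimizer.lr': 0.001}
```

The resulting dictionary could then be passed to a call such as `mlflow.log_params(params)`.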

In [None]:
from lidar_panoptic_segmentation.logging_utils import (
    MLflowLogger,
    experiment_context,
)

# Example: Log experiment metrics
# with experiment_context(config, run_name="demo_run") as exp_logger:
#     exp_logger.log_params(config.to_dict())
#     exp_logger.log_metrics({"n_instances": len(result.instances)})
#     exp_logger.log_artifact(geojson_path)
#     exp_logger.log_model(model, "model")

print("MLflow integration ready!")
print(f"Experiment name: {config.logging.mlflow.experiment_name}")

## 9. Batch Processing

Process multiple files from Unity Catalog.
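
Batch processing boils down to enumerating the input point clouds and deciding where each result should be written. What `InferencePipeline.process_directory` does internally is not shown here, so the following is only a sketch with a hypothetical `plan_batch` helper. It works on POSIX-style paths (local directories or Unity Catalog volumes mounted under `/Volumes/...`), not on `abfss://` URIs, and the output directory is a placeholder.

```python
import tempfile
from pathlib import Path


def plan_batch(input_dir: str, output_dir: str, suffixes=(".las", ".laz")):
    """Pair every point cloud under `input_dir` with a GeoJSON output path."""
    in_root, out_root = Path(input_dir), Path(output_dir)
    jobs = []
    for path in sorted(in_root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            # Mirror the input folder layout under the output root.
            relative = path.relative_to(in_root)
            jobs.append((path, out_root / relative.with_suffix(".geojson")))
    return jobs


# Build a throwaway directory tree to demonstrate the pairing.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tiles").mkdir()
    for name in ("tiles/a.las", "tiles/b.LAZ", "notes.txt"):
        (Path(tmp) / name).touch()
    jobs = plan_batch(tmp, "/tmp/predictions")
    for src, dst in jobs:
        print(src.name, "->", dst)
```

Planning the jobs up front makes it easy to skip outputs that already exist or to distribute the list across workers.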

In [None]:
from lidar_panoptic_segmentation.infer import InferencePipeline

# Create inference pipeline
# pipeline = InferencePipeline(config, model=model)

# Process a directory
# results = pipeline.process_directory(
#     input_dir="abfss://container@account.dfs.core.windows.net/lidar/",
#     output_dir="abfss://container@account.dfs.core.windows.net/predictions/",
# )

print("Batch processing ready!")

## 10. Cleanup

In [None]:
import shutil

# Clean up temporary files
shutil.rmtree(output_dir, ignore_errors=True)

# Clear GPU cache
from lidar_panoptic_segmentation.utils import clear_gpu_cache
clear_gpu_cache()

print("Cleanup complete!")

---

## Summary

This notebook demonstrated:
1. **Environment Setup**: Installing MinkowskiEngine on Databricks
2. **Configuration**: Loading YAML config with Pydantic validation
3. **Model Loading**: Creating and loading panoptic segmentation models
4. **Data Loading**: Reading point clouds (local or Unity Catalog) and generating synthetic sample data
5. **Inference**: Running predictions on LiDAR data
6. **Postprocessing**: Extracting tree instances and polygons
7. **Output Formats**: Saving results as GeoJSON
8. **MLflow Integration**: Experiment tracking and model registry
9. **Batch Processing**: Processing multiple files

For more information, see:
- `README.md`: Project overview and setup
- `cluster_config_guidance.md`: Databricks cluster configuration
- `README_Model_Improvements.md`: Model optimization tips