# F3RM Pointcloud Tools Demo

This notebook demonstrates how to load exported F3RM pointcloud data and query it with the semantic similarity tools.

## Prerequisites
1. Export pointcloud data using `export_feature_pointcloud.py`
2. Optionally align using `align_pointcloud.py`


In [None]:
import numpy as np
import open3d as o3d
import json
import pickle
from pathlib import Path
from f3rm.semantic_similarity_utils import SemanticPointcloudAnalyzer
from f3rm.visualize_feature_pointcloud import FeaturePointcloudData, SemanticSimilarityUtils

# Set your data directory
data_dir = Path("exports/your_pointcloud_data")  # Change this to your exported data
print(f"Data directory: {data_dir}")


## 1. Understanding the Data Structure

The F3RM pointcloud export writes several files to the output directory:


In [None]:
# List all files in the exported directory
if data_dir.exists():
    files = list(data_dir.glob("*"))
    for file in sorted(files):
        size_mb = file.stat().st_size / (1024 * 1024)
        print(f"{file.name:<30} {size_mb:>8.1f} MB")
else:
    print(f"Data directory {data_dir} not found. Please export pointcloud data first.")
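
Judging by the files this notebook loads in later sections, a quick completeness check could be sketched as follows. The helper name `missing_export_files` is hypothetical, and the file list is inferred from this notebook rather than taken from the exporter. The features filename is looked up in `metadata['files']['features']` because it is not fixed.

```python
import json
from pathlib import Path

# File names this notebook reads in later sections (inferred, not an official list).
EXPECTED_FILES = [
    "metadata.json",
    "points.npy",
    "pca_params.pkl",
    "pointcloud_rgb.ply",
    "pointcloud_feature_pca.ply",
]


def missing_export_files(export_dir):
    """Return the sorted names of expected files that are absent from export_dir."""
    export_dir = Path(export_dir)
    missing = [name for name in EXPECTED_FILES if not (export_dir / name).is_file()]

    # The features filename is stored in the metadata, so resolve it from there.
    metadata_path = export_dir / "metadata.json"
    if metadata_path.is_file():
        with open(metadata_path, "r") as f:
            features_name = json.load(f).get("files", {}).get("features")
        if features_name and not (export_dir / features_name).is_file():
            missing.append(features_name)
    return sorted(missing)
```

Calling `missing_export_files(data_dir)` before running the remaining cells gives a single list of what still needs to be exported, instead of a `FileNotFoundError` partway through.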


## 2. Loading Metadata

The metadata contains export information and data structure details:


In [None]:
# Load and examine metadata
with open(data_dir / "metadata.json", 'r') as f:
    metadata = json.load(f)

print("Metadata structure:")
for key, value in metadata.items():
    if isinstance(value, (list, dict)):
        print(f"{key}: {type(value).__name__} with {len(value)} items")
    else:
        print(f"{key}: {value}")


## 3. Loading Point Coordinates

The `points.npy` file contains 3D coordinates:


In [None]:
# Load point coordinates
points = np.load(data_dir / "points.npy")

print(f"Points shape: {points.shape}")
print(f"Points dtype: {points.dtype}")
print(f"Bounding box: {points.min(axis=0)} to {points.max(axis=0)}")
print(f"Mean position: {points.mean(axis=0)}")


## 4. Loading Feature Vectors

By default, features are stored in half precision (float16) to reduce file size:


In [None]:
# Load feature vectors
features_file = metadata['files']['features']
features = np.load(data_dir / features_file)

print(f"Features shape: {features.shape}")
print(f"Features dtype: {features.dtype}")
print(f"Feature range: [{features.min():.3f}, {features.max():.3f}]")
print(f"Memory usage: {features.nbytes / (1024**2):.1f} MB")

# Convert to float32 if needed for processing
if features.dtype == np.float16:
    features = features.astype(np.float32)
    print("Converted to float32 for processing")
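
To make the float16 trade-off concrete, the sketch below compares storage size and round-trip error. It uses random stand-in data, not real F3RM features, and the 768-dimensional width is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in features: random values, not real F3RM output.
full = rng.standard_normal((1000, 768)).astype(np.float32)

# Storing as float16 halves the memory footprint...
half = full.astype(np.float16)
print(f"float32: {full.nbytes / 1024**2:.2f} MB, float16: {half.nbytes / 1024**2:.2f} MB")

# ...at the cost of a small rounding error when converted back.
restored = half.astype(np.float32)
max_abs_error = np.abs(full - restored).max()
print(f"Max absolute round-trip error: {max_abs_error:.5f}")
```

Converting back to float32 does not recover the lost precision. It only avoids further rounding in later arithmetic.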


## 5. Loading PCA Parameters

PCA parameters ensure consistent feature visualization:


In [None]:
# Load PCA parameters
# Note: unpickling can execute arbitrary code, so only load files you exported yourself or trust
with open(data_dir / "pca_params.pkl", 'rb') as f:
    pca_params = pickle.load(f)

print("PCA parameters:")
for key, value in pca_params.items():
    if isinstance(value, np.ndarray):
        print(f"{key}: shape {value.shape}, dtype {value.dtype}")
    else:
        print(f"{key}: {value}")
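
The sketch below illustrates why storing PCA parameters keeps colors consistent. It is a generic illustration, not the exporter's implementation. The dictionary keys (`mean`, `components`, `proj_min`, `proj_max`) and both function names are hypothetical, and the real `pca_params.pkl` may be laid out differently.

```python
import numpy as np


def fit_pca_params(features, n_components=3):
    """Fit a PCA projection and record the range of the projected values."""
    mean = features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    components = vt[:n_components]
    projected = (features - mean) @ components.T
    return {
        "mean": mean,
        "components": components,
        "proj_min": projected.min(axis=0),
        "proj_max": projected.max(axis=0),
    }


def features_to_colors(features, params):
    """Map features to RGB in [0, 1] using previously fitted parameters."""
    projected = (features - params["mean"]) @ params["components"].T
    span = np.maximum(params["proj_max"] - params["proj_min"], 1e-12)
    return np.clip((projected - params["proj_min"]) / span, 0.0, 1.0)


rng = np.random.default_rng(0)
features = rng.standard_normal((500, 64))  # random stand-in features
params = fit_pca_params(features)

all_colors = features_to_colors(features, params)
subset_colors = features_to_colors(features[:50], params)
```

Because the subset reuses the stored parameters, its colors match the corresponding rows of the full result. Refitting PCA on the subset alone would generally produce different colors for the same points.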


## 6. Loading PLY Pointclouds

RGB and PCA pointclouds are saved as PLY files for visualization:


In [None]:
# Load RGB pointcloud
rgb_pcd = o3d.io.read_point_cloud(str(data_dir / "pointcloud_rgb.ply"))
rgb_points = np.asarray(rgb_pcd.points)
rgb_colors = np.asarray(rgb_pcd.colors)

print(f"RGB pointcloud: {len(rgb_points)} points")
print(f"RGB colors shape: {rgb_colors.shape}, range: [{rgb_colors.min():.3f}, {rgb_colors.max():.3f}]")

# Load PCA pointcloud
pca_pcd = o3d.io.read_point_cloud(str(data_dir / "pointcloud_feature_pca.ply"))
pca_points = np.asarray(pca_pcd.points)
pca_colors = np.asarray(pca_pcd.colors)

print(f"PCA pointcloud: {len(pca_points)} points")
print(f"PCA colors shape: {pca_colors.shape}, range: [{pca_colors.min():.3f}, {pca_colors.max():.3f}]")

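# Verify consistency (check shapes first: np.allclose cannot compare clouds with different point counts)
print(f"Points match: {points.shape == rgb_points.shape and np.allclose(points, rgb_points)}")
print(f"RGB/PCA points match: {rgb_points.shape == pca_points.shape and np.allclose(rgb_points, pca_points)}")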


## 7. Using the High-Level Data Loader

The `FeaturePointcloudData` class provides convenient access:


In [None]:
# Use the high-level data loader
data = FeaturePointcloudData(data_dir)

print(data.get_info())

# Access data through properties (lazy loading)
print(f"Points shape: {data.points.shape}")
print(f"Features shape: {data.features.shape}")
print(f"RGB pointcloud: {len(data.rgb_pointcloud.points)} points")
print(f"PCA pointcloud: {len(data.pca_pointcloud.points)} points")
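
The comment above mentions lazy loading. As a minimal sketch of that pattern, and not the actual `FeaturePointcloudData` code, the hypothetical class below defers loading until a property is first read and caches the result afterwards.

```python
from functools import cached_property


class LazyArrayHolder:
    """Hypothetical illustration of lazy loading; not the F3RM class."""

    def __init__(self, loader):
        self._loader = loader
        self.load_count = 0

    @cached_property
    def points(self):
        # Runs only on first access; the result is then cached on the instance.
        self.load_count += 1
        return self._loader()


holder = LazyArrayHolder(lambda: [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(holder.load_count)  # nothing has been loaded yet
first = holder.points
second = holder.points
print(holder.load_count)  # the loader ran exactly once
```

With this pattern, constructing the object is cheap, and large arrays such as features are read from disk only if a cell actually uses them.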


## 8. Basic Semantic Similarity

Using the semantic similarity utilities for queries:


In [None]:
# Initialize semantic analyzer
analyzer = SemanticPointcloudAnalyzer(data.features, data.points)

# Query for objects (following opt.py approach)
query = "chair"
similarities = analyzer.query_similarity(query, negatives=["object"], softmax_temp=1.0)

print(f"Query: '{query}'")
print(f"Similarities shape: {similarities.shape}")
print(f"Similarity range: [{similarities.min():.3f}, {similarities.max():.3f}]")
print(f"Mean similarity: {similarities.mean():.3f} ± {similarities.std():.3f}")

# Apply threshold (same as opt.py)
threshold = 0.502
above_threshold = similarities > threshold
print(f"Points above threshold {threshold}: {above_threshold.sum():,} / {len(similarities):,} ({100*above_threshold.mean():.1f}%)")
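
The sketch below shows one plausible reading of a query scored against negatives with a softmax temperature. It is an assumption, not the confirmed behavior of `query_similarity` or `opt.py`: the logits are taken to be cosine similarities divided by `softmax_temp`, and the score is the lowest positive-class probability across the negatives. Random vectors stand in for the text embeddings, and the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def paired_softmax_similarity(features, pos_embed, neg_embeds, softmax_temp=1.0):
    """Score each point by the softmax of a positive query against each negative.

    Assumption: logits are cosine similarities divided by softmax_temp, and the
    score is the lowest positive-class probability over all negatives.
    """
    features = l2_normalize(features)
    pos_sim = features @ l2_normalize(pos_embed)      # shape (N,)
    neg_sims = features @ l2_normalize(neg_embeds).T  # shape (N, K)

    # For two classes, softmax reduces to a sigmoid of the logit difference.
    logit_diff = (pos_sim[:, None] - neg_sims) / softmax_temp
    pos_prob = 1.0 / (1.0 + np.exp(-logit_diff))      # shape (N, K)
    return pos_prob.min(axis=1)


rng = np.random.default_rng(0)
dim = 32
pos_embed = rng.standard_normal(dim)        # stand-in for the "chair" embedding
neg_embeds = rng.standard_normal((1, dim))  # stand-in for the "object" embedding

# Two synthetic points: one aligned with the positive, one with the negative.
features = np.stack([pos_embed, neg_embeds[0]])
scores = paired_softmax_similarity(features, pos_embed, neg_embeds)
print(scores > 0.502)
```

Under this reading, a score of 0.5 means a point is equally similar to the query and to the negative, which would be consistent with a threshold set just above 0.5.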


## 9. Object Instance Detection

Finding discrete object instances using spatial clustering:


In [None]:
# Find object instances
clusters = analyzer.find_object_instances(query, threshold=0.502, min_cluster_size=10, eps=0.05)

print(f"Found {len(clusters)} {query} instances:")
for i, cluster in enumerate(clusters):
    cluster_points = data.points[cluster]
    cluster_sims = similarities[cluster]
    center = cluster_points.mean(axis=0)
    
    print(f"  {query} {i+1}: {len(cluster)} points at [{center[0]:.2f}, {center[1]:.2f}, {center[2]:.2f}]")
    print(f"    Similarity: {cluster_sims.mean():.3f} ± {cluster_sims.std():.3f}")
    print(f"    Bbox: {cluster_points.min(axis=0)} to {cluster_points.max(axis=0)}")
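
To give a feel for what `eps` and `min_cluster_size` control, the sketch below groups points into connected components, linking any two points closer than `eps` and discarding groups smaller than `min_cluster_size`. It is a simplified stand-in written for this notebook, not the algorithm `find_object_instances` uses, and the function name `radius_clusters` is hypothetical.

```python
from collections import deque

import numpy as np


def radius_clusters(points, eps=0.05, min_cluster_size=10):
    """Group points into connected components of the eps-neighborhood graph.

    Simplified stand-in for density-based clustering. Builds a dense N x N
    distance matrix, so it is only suitable for small inputs.
    """
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = dists <= eps
    visited = np.zeros(n, dtype=bool)
    clusters = []

    for seed in range(n):
        if visited[seed]:
            continue
        # Breadth-first search over points reachable via eps-sized hops.
        visited[seed] = True
        queue = deque([seed])
        members = []
        while queue:
            idx = queue.popleft()
            members.append(int(idx))
            for nxt in np.flatnonzero(neighbors[idx] & ~visited):
                visited[nxt] = True
                queue.append(nxt)
        if len(members) >= min_cluster_size:
            clusters.append(np.array(sorted(members)))
    return clusters


rng = np.random.default_rng(0)
blob_a = rng.uniform(0.0, 0.02, size=(30, 3))        # tight blob near the origin
blob_b = rng.uniform(0.0, 0.02, size=(30, 3)) + 1.0  # second blob, far away
outlier = np.array([[5.0, 5.0, 5.0]])                # isolated point
points_demo = np.vstack([blob_a, blob_b, outlier])

clusters_demo = radius_clusters(points_demo, eps=0.05, min_cluster_size=10)
print(f"Found {len(clusters_demo)} clusters")
```

Each returned cluster is an index array, so it can be used to index a points array the same way `data.points[cluster]` is used above. A larger `eps` merges nearby groups, and a larger `min_cluster_size` drops small, noisy ones.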


## 10. Multi-Query Comparison

Comparing multiple semantic queries:


In [None]:
# Compare multiple queries
queries = ["chair", "table", "book", "magazine", "wall", "floor"]
results = analyzer.compare_queries(queries, threshold=0.502, softmax_temp=1.0)

print("Query comparison:")
print(f"{'Query':<12} {'Above Thresh':<12} {'Clusters':<8} {'Max Sim':<8} {'Mean Sim':<8}")
print("-" * 60)

# Use distinct loop names so `query` and `similarities` from earlier cells are not overwritten
for name, result in results.items():
    sims = result['similarities']
    above_thresh = result['above_threshold']
    num_clusters = len(result['clusters'])

    print(f"{name:<12} {above_thresh:<12,} {num_clusters:<8} {sims.max():<8.3f} {sims.mean():<8.3f}")


## 11. Spatial Analysis

Analyzing spatial distribution of semantic matches:


In [None]:
# Spatial analysis for a query
spatial_info = analyzer.spatial_analysis("chair", threshold=0.502, softmax_temp=1.0, grid_resolution=0.1)

if not spatial_info.get('empty', False):
    print(f"Spatial analysis for '{spatial_info['query']}':")
    print(f"  Points: {spatial_info['num_points']:,}")
    print(f"  Bounding box: {spatial_info['bbox_min']} to {spatial_info['bbox_max']}")
    print(f"  Volume: {spatial_info['volume']:.3f} cubic units")
    print(f"  Density: {spatial_info['density']:.3f} points per cubic unit")
    print(f"  Grid cells: {len(spatial_info['grid_stats'])}")
    print(f"  Score range: [{spatial_info['score_stats']['min']:.3f}, {spatial_info['score_stats']['max']:.3f}]")
else:
    print("No points found above threshold for spatial analysis")
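
The fields printed above (bounding box, volume, density, grid cells) can be approximated with a few NumPy operations. The sketch below is an illustration under assumptions, not the implementation of `spatial_analysis`. The function name `grid_summary` and the per-cell fields `count` and `mean_score` are hypothetical.

```python
import numpy as np


def grid_summary(points, scores, grid_resolution=0.1):
    """Summarize matched points: bounding box, density, and per-cell statistics."""
    bbox_min = points.min(axis=0)
    bbox_max = points.max(axis=0)
    volume = float(np.prod(bbox_max - bbox_min))

    # Assign each point to an integer grid cell of side grid_resolution.
    cells = np.floor((points - bbox_min) / grid_resolution).astype(int)
    unique_cells, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # the inverse shape differs across NumPy versions

    counts = np.bincount(inverse, minlength=len(unique_cells))
    score_sums = np.bincount(inverse, weights=scores, minlength=len(unique_cells))
    grid_stats = {
        tuple(int(c) for c in cell): {"count": int(n), "mean_score": float(s / n)}
        for cell, n, s in zip(unique_cells, counts, score_sums)
    }
    return {
        "bbox_min": bbox_min,
        "bbox_max": bbox_max,
        "volume": volume,
        "density": len(points) / volume if volume > 0 else float("inf"),
        "grid_stats": grid_stats,
    }


demo_points = np.array([[0.0, 0.0, 0.0], [0.05, 0.05, 0.05], [1.0, 1.0, 1.0]])
demo_scores = np.array([0.6, 0.8, 0.7])
summary = grid_summary(demo_points, demo_scores, grid_resolution=0.1)
print(f"Occupied cells: {len(summary['grid_stats'])}")
```

In this toy input, the first two points fall into the same 0.1-unit cell and the third lands in a separate one, so two cells are occupied.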


## 12. Direct Semantic Similarity Utils

Using the lower-level semantic similarity utilities:


In [None]:
# Direct usage of SemanticSimilarityUtils
semantic_utils = SemanticSimilarityUtils()

# Compute similarities directly
text_queries = ["magazine", "object"]  # positive + negatives
similarities = semantic_utils.compute_text_similarities(
    data.features, text_queries, 
    has_negatives=True, 
    softmax_temp=1.0
)

print("Direct semantic similarity computation:")
print(f"  Query: {text_queries[0]} vs {text_queries[1:]}")
print(f"  Shape: {similarities.shape}")
print(f"  Range: [{similarities.min():.3f}, {similarities.max():.3f}]")

# Create similarity pointcloud
sim_pcd = semantic_utils.create_similarity_pointcloud(
    data.points, similarities, threshold=0.502, colormap="turbo"
)

print(f"  Similarity pointcloud: {len(sim_pcd.points)} points above threshold")


## 13. Creating Custom Pointclouds

Building custom pointclouds with your own colors/filters:


In [None]:
# Create custom pointcloud with semantic coloring
query = "table"
similarities = analyzer.query_similarity(query, negatives=["object"])

# Create mask and colors
threshold = 0.502
mask = similarities > threshold

# Custom coloring: red for matches, blue for non-matches
colors = np.zeros((len(data.points), 3))
colors[mask] = [1.0, 0.0, 0.0]  # Red for matches
colors[~mask] = [0.0, 0.0, 1.0]  # Blue for non-matches

# Create pointcloud
custom_pcd = o3d.geometry.PointCloud()
custom_pcd.points = o3d.utility.Vector3dVector(data.points)
custom_pcd.colors = o3d.utility.Vector3dVector(colors)

print("Custom pointcloud created:")
print(f"  Total points: {len(custom_pcd.points):,}")
print(f"  Matches (red): {mask.sum():,}")
print(f"  Non-matches (blue): {(~mask).sum():,}")

# Save custom pointcloud
output_path = data_dir / f"custom_{query}_semantic.ply"
o3d.io.write_point_cloud(str(output_path), custom_pcd)
print(f"  Saved to: {output_path}")
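
A binary red/blue mask discards how strongly each point matched. As an alternative, the sketch below blends colors continuously by score. The function name `scores_to_blue_red` is hypothetical, and the scores are made-up values.

```python
import numpy as np


def scores_to_blue_red(scores):
    """Linearly blend from blue (lowest score) to red (highest score)."""
    scores = np.asarray(scores, dtype=np.float64)
    span = scores.max() - scores.min()
    # Guard against a constant score array, which would divide by zero.
    t = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)
    colors = np.zeros((len(scores), 3))
    colors[:, 0] = t        # red channel grows with the score
    colors[:, 2] = 1.0 - t  # blue channel shrinks with the score
    return colors


gradient_colors = scores_to_blue_red([0.40, 0.50, 0.60])
print(gradient_colors)
```

The resulting `(N, 3)` array is in the `[0, 1]` range, so it can be passed to `o3d.utility.Vector3dVector` in the same way as `colors` in the cell above.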


## 14. Export Analysis Results

Saving comprehensive analysis for external use:


In [None]:
# Export comprehensive analysis
analysis_queries = ["chair", "table", "book", "magazine"]
output_file = data_dir / "semantic_analysis.json"

analyzer.export_semantic_analysis(
    analysis_queries, 
    str(output_file), 
    threshold=0.502, 
    softmax_temp=1.0
)

# Load and examine the exported analysis
with open(output_file, 'r') as f:
    analysis = json.load(f)

print("Exported analysis structure:")
print(f"  Metadata keys: {list(analysis['metadata'].keys())}")
print(f"  Query results: {list(analysis['query_results'].keys())}")

# Show example query result structure
first_query = next(iter(analysis['query_results']))
query_result = analysis['query_results'][first_query]
print(f"\nExample query '{first_query}' result structure:")
for key, value in query_result.items():
    if isinstance(value, dict):
        print(f"  {key}: {list(value.keys())}")
    else:
        print(f"  {key}: {type(value).__name__}")
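
If you extend the exported JSON with your own results, note that `json.dumps` cannot serialize NumPy arrays or NumPy integer scalars directly. The sketch below shows one common workaround using the `default` hook. The helper name `to_jsonable` and the example fields are hypothetical and do not reflect the schema written by `export_semantic_analysis`.

```python
import json

import numpy as np


def to_jsonable(obj):
    """Fallback for json.dumps: convert NumPy arrays and scalars to plain Python."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


custom_result = {
    "query": "chair",
    "num_points": np.int64(3),
    "center": np.array([0.1, 0.2, 0.3]),
}
text = json.dumps(custom_result, default=to_jsonable)
restored_result = json.loads(text)
print(restored_result)
```

After the round trip, arrays come back as plain lists, so wrap them in `np.asarray` if you need array operations again.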


## Summary

This notebook demonstrated:

1. **Data Structure**: Understanding exported files and their contents
2. **Loading Data**: Points, features, PLY files, metadata, and PCA parameters
3. **High-Level API**: Using `FeaturePointcloudData` for convenient access
4. **Semantic Queries**: Computing similarities following the `opt.py` approach
5. **Object Detection**: Finding discrete instances with spatial clustering
6. **Multi-Query Analysis**: Comparing multiple semantic concepts
7. **Spatial Analysis**: Understanding spatial distribution of matches
8. **Custom Processing**: Building your own pointclouds and analysis
9. **Export Results**: Saving analysis for external algorithms

All tools follow the same approach as `opt.py`: the threshold you pass is applied directly, with no adaptive adjustment.
