# Testing the Scaling Limits of GTSAM on the 1DSfM Dataset

![image](media/1dsfmRef.png)

The dataset can be found here: https://www.cs.cornell.edu/projects/1dsfm/

## Preprocessing

### Preprocessing Pipeline

Before running Bundle Adjustment (BA) scaling experiments in GTSAM, the raw, unordered image collections from the **1DSfM dataset** must be transformed into a structured factor graph. We use **COLMAP**, a state-of-the-art Structure-from-Motion (SfM) pipeline, as the frontend to achieve this.

### Why is this step necessary?

GTSAM is a backend optimization library; it requires a defined **NonlinearFactorGraph** and good **Initial Values** to converge. COLMAP solves three critical problems:

1.  **Data Association (Feature Matching):** We must determine which 2D pixel $(u,v)$ in Image $A$ corresponds to the same 3D landmark as pixel $(u',v')$ in Image $B$. COLMAP uses SIFT descriptors and geometric verification to generate these feature tracks.
2.  **Initialization (Sparse Reconstruction):** Bundle Adjustment is a non-convex optimization problem. If initialized with identity poses and all 3D points at the origin, the optimizer will typically diverge or get stuck in a poor local minimum. COLMAP provides a robust initial estimate for Camera Poses $(R, t)$ and 3D Points $(X, Y, Z)$.
3.  **Graph Topology:** The reconstruction determines the sparsity pattern of the graph: it defines exactly which cameras observe which landmarks.
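
To make the initialization point concrete, the residual that Bundle Adjustment minimizes can be written out for a single observation. The sketch below assumes the Bundler camera convention (the camera looks down the negative z-axis, with radial distortion $r(p) = 1 + k_1 \|p\|^2 + k_2 \|p\|^4$). The function names are illustrative and are not part of GTSAM or COLMAP.

```python
import math

def project_bundler(R, t, f, k1, k2, X):
    """Project world point X with a Bundler-style camera.

    R is a 3x3 rotation given as a list of rows, t is a 3-vector.
    The camera looks down -z, so points in front of it have P[2] < 0.
    """
    # World frame -> camera frame: P = R * X + t
    P = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division (the minus sign comes from the -z convention)
    px, py = -P[0] / P[2], -P[1] / P[2]
    # Radial distortion factor r(p) = 1 + k1*|p|^2 + k2*|p|^4
    n2 = px * px + py * py
    r = 1.0 + k1 * n2 + k2 * n2 * n2
    return f * r * px, f * r * py

def reprojection_error(R, t, f, k1, k2, X, uv):
    """Euclidean pixel distance between the projection of X and the measurement uv."""
    u, v = project_bundler(R, t, f, k1, k2, X)
    return math.hypot(u - uv[0], v - uv[1])

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
measurement = (100.0, 0.0)

# A landmark 5 units in front of the camera that explains the measurement exactly.
err_exact = reprojection_error(IDENTITY, [0.0, 0.0, 0.0], 500.0, 0.0, 0.0,
                               [1.0, 0.0, -5.0], measurement)
# The same landmark displaced by one unit in x: the residual grows to about 100 px.
err_shifted = reprojection_error(IDENTITY, [0.0, 0.0, 0.0], 500.0, 0.0, 0.0,
                                 [2.0, 0.0, -5.0], measurement)
print(err_exact, err_shifted)
```

Note that a landmark placed at the origin with an identity pose gives `P[2] == 0`, so the projection is undefined (Python raises `ZeroDivisionError` here). This is one reason a naive all-zero initialization is unusable.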

---

### The Execution Pipeline

The preprocessing was executed via a shell script using a Dockerized version of COLMAP (for GPU acceleration). The pipeline consists of four sequential stages:

```bash
# 1. Feature Extraction
# Detects SIFT features. 'single_camera 0' gives each image its own camera model,
# which allows for varying image dimensions and intrinsics.
colmap feature_extractor \
    --database_path database.db \
    --image_path /images \
    --ImageReader.single_camera 0

# 2. Feature Matching
# Associates features across images. We utilize a Vocabulary Tree (FAISS) 
# for O(N) efficiency on large datasets (Montreal/Piccadilly).
colmap vocab_tree_matcher \
    --database_path database.db \
    --VocabTreeMatching.vocab_tree_path vocab_tree_faiss.bin

# 3. Sparse Reconstruction (The SfM Frontend)
# Incrementally registers images to build the initial 3D model.
colmap mapper \
    --database_path database.db \
    --image_path /images \
    --output_path sparse

# 4. Conversion to Bundler Format
# Exports the binary COLMAP model to a text-based format parseable by our scripts.
colmap model_converter \
    --input_path sparse/0 \
    --output_path colmap_bundle.out \
    --output_type Bundler
```

### 📂 Output Artifacts

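The pipeline generates two files that serve as the interface between the COLMAP frontend and the GTSAM backend. COLMAP's Bundler export appends `.bundle.out` and `.list.txt` to the `--output_path` value, which is why the names below repeat `.out`: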


| File Name | Content Description | Role in GTSAM |
| :--- | :--- | :--- |
| **`colmap_bundle.out.bundle.out`** | **The Geometry File** (Bundler Format)<br>• **Header:** Camera count & Point count.<br>• **Cameras:** Focal length ($f$), Distortion ($k_1, k_2$), Rotation ($R$), Translation ($t$).<br>• **Structure:** 3D landmarks $(X, Y, Z)$, colors, and the list of 2D feature observations per point. | **Core Input:** Parsed to create the initial `gtsam.Values` (Pose3, Point3) and the `gtsam.NonlinearFactorGraph`. |
| **`colmap_bundle.out.list.txt`** | **The Association Index**<br>• A plain text list mapping the internal Bundle index (e.g., Camera 0) to the original image filename (e.g., `image_0005.jpg`). | **Metadata:** Used to verify the ordering of cameras and associate optimized poses back to the source images for visualization. |
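
The layout of the geometry file can be illustrated with a small parser. This is a minimal sketch of reading the Bundler v0.3 text format, not the `helper.read_bundler_file` used later in this notebook (whose implementation is not shown). The function name, the dictionary keys, and the two-camera sample are assumptions made for illustration.

```python
def parse_bundler_text(text):
    """Parse Bundler v0.3 text into (cameras, points, observations)."""
    # Drop the '# Bundle file v0.3' comment line and tokenize everything else.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    tokens = iter(" ".join(lines).split())

    def nxt(cast=float):
        return cast(next(tokens))

    n_cams, n_pts = nxt(int), nxt(int)

    cameras = []
    for _ in range(n_cams):
        f, k1, k2 = nxt(), nxt(), nxt()                    # focal length, distortion
        R = [[nxt() for _ in range(3)] for _ in range(3)]  # rotation, row by row
        t = [nxt() for _ in range(3)]                      # translation
        cameras.append({"f": f, "k1": k1, "k2": k2, "R": R, "t": t})

    points, observations = [], []
    for pt_idx in range(n_pts):
        xyz = [nxt() for _ in range(3)]
        rgb = [nxt(int) for _ in range(3)]
        points.append({"xyz": xyz, "rgb": rgb})
        # View list: a count followed by (camera, keypoint, x, y) quadruplets.
        for _ in range(nxt(int)):
            cam_idx, _key = nxt(int), nxt(int)
            x, y = nxt(), nxt()
            observations.append((cam_idx, pt_idx, x, y))
    return cameras, points, observations

# Hypothetical two-camera, one-point file used only to exercise the parser.
SAMPLE = """# Bundle file v0.3
2 1
500 0 0
1 0 0
0 1 0
0 0 1
0 0 0
500 0 0
1 0 0
0 1 0
0 0 1
-1 0 0
0.5 0 -5
255 255 255
2 0 0 50.0 0.0 1 0 -50.0 0.0
"""

cameras, points, observations = parse_bundler_text(SAMPLE)
print(len(cameras), len(points), observations)
```

Each observation tuple `(cam_idx, pt_idx, x, y)` corresponds to one projection factor in the eventual factor graph, so the length of `observations` gives the factor count before any priors are added.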

## GTSAM test bench

In [None]:
import gtsam
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import helper 

In [None]:

# Cell 2: Load the Full Data
bundle_file = "colmap_bundle.out.bundle.out"
list_file = "colmap_bundle.out.list.txt"

print("Parsing Bundler file...")
cameras, points, observations = helper.read_bundler_file(bundle_file, list_file)
print(f"Total Cameras: {len(cameras)}")
print(f"Total Points: {len(points)}")

# Build the full GTSAM objects once; subgraphs are sliced from them later (cheaper than re-parsing per step)
full_graph, full_estimate = helper.build_bundler_gtsam_graph(cameras, points, observations)

In [None]:
# Cell 3: Define Scaling Steps
# Number of images (cameras) used in each test run:
# five evenly spaced counts from 10 up to the full dataset.
# Replace with an explicit list (e.g. [10, 50, 100, 200, 500]) for custom steps.
step_sizes = np.linspace(10, len(cameras), num=5, dtype=int)
results = []

print(f"Running scaling tests on steps: {step_sizes}")
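
The next cell relies on `helper.create_subgraph`, whose implementation is not shown here. One plausible selection rule, sketched on plain observation tuples rather than GTSAM objects, keeps the first `n_cams` cameras and only those landmarks still seen by at least two of them, since a landmark seen by a single camera has unconstrained depth. The function name and the `(cam_idx, pt_idx, x, y)` tuple layout are assumptions for illustration.

```python
from collections import defaultdict

def select_subproblem(observations, n_cams, min_views=2):
    """Keep cameras 0..n_cams-1 and landmarks seen by >= min_views of them.

    observations: list of (cam_idx, pt_idx, x, y) tuples.
    Returns (kept_point_ids, kept_observations).
    """
    # Collect, per landmark, the retained cameras that observe it.
    views = defaultdict(set)
    for cam_idx, pt_idx, _x, _y in observations:
        if cam_idx < n_cams:
            views[pt_idx].add(cam_idx)
    kept_points = {p for p, cams in views.items() if len(cams) >= min_views}
    # Keep only measurements that link a retained camera to a retained landmark.
    kept_obs = [o for o in observations if o[0] < n_cams and o[1] in kept_points]
    return kept_points, kept_obs

toy_obs = [
    (0, 0, 1.0, 2.0), (1, 0, 1.5, 2.0), (2, 0, 2.0, 2.0),  # landmark 0: cameras 0, 1, 2
    (0, 1, 3.0, 1.0), (2, 1, 3.5, 1.0),                    # landmark 1: cameras 0, 2
    (1, 2, 0.0, 0.0),                                      # landmark 2: camera 1 only
]
pts_2, obs_2 = select_subproblem(toy_obs, 2)
pts_3, obs_3 = select_subproblem(toy_obs, 3)
print(sorted(pts_2), len(obs_2), sorted(pts_3), len(obs_3))
```

With this rule the number of landmarks and factors grows with `n_cams` in a data-dependent way, so it is worth recording both alongside the camera count for every step.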

In [None]:
# Cell 4: Run Incremental Tests
for n_cams in step_sizes:
    print(f"\n--- Testing with {n_cams} cameras ---")
    
    # 1. Create Subgraph
    sub_graph, sub_estimate = helper.create_subgraph(full_graph, full_estimate, n_cams)
    num_params = sub_estimate.size()
    num_factors = sub_graph.size()
    
    # 2. Get Sparsity Metric
    # The variable count of the linearized system is used as a cheap proxy for problem size.
    # A dense augmentedHessian() is deliberately not built: it needs O(n^2) memory and,
    # if kept alive, could inflate the RAM measured during optimization.
    try:
        linear_graph = sub_graph.linearize(sub_estimate)
        sparsity_metric = linear_graph.keys().size()  # simple node count
        del linear_graph  # release before the monitored optimization
    except Exception as e:
        sparsity_metric = 0
        print(f"Skipping sparsity metric: {e}")

    # 3. Optimize with Resource Monitoring
    params = gtsam.LevenbergMarquardtParams()
    params.setVerbosityLM("SUMMARY")
    optimizer = gtsam.LevenbergMarquardtOptimizer(sub_graph, sub_estimate, params)

    try:
        with helper.ResourceMonitor() as monitor:
            result = optimizer.optimize()
        
        stats = monitor.get_stats()
        
        results.append({
            "num_images": n_cams,
            "num_points": num_params - int(n_cams),  # assumes one pose variable per camera; the rest are landmarks
            "num_factors": num_factors,
            "ram_max_mb": stats['max_ram_mb'],
            "cpu_avg": stats['avg_cpu_percent'],
            "time_sec": stats['time_sec'],
            "sparsity_proxy": sparsity_metric
        })
        print(f"Completed in {stats['time_sec']:.2f}s, Max RAM: {stats['max_ram_mb']:.2f}MB")
        
    except (RuntimeError, MemoryError) as e:  # GTSAM errors surface as RuntimeError, allocation failures as MemoryError
        print(f"Optimization FAILED/OOM for {n_cams} cameras: {e}")
        break
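
The loop above depends on `helper.ResourceMonitor`, which is not shown in this notebook. The following is a minimal sketch of a context manager exposing the same three statistics, built only on the standard library. The class name is hypothetical, and the `resource` module is available on Unix-like systems only.

```python
import resource
import sys
import time

class SimpleResourceMonitor:
    """Context manager reporting wall time, average CPU use and peak RSS."""

    def __enter__(self):
        self._wall0 = time.perf_counter()
        self._cpu0 = time.process_time()  # user + system CPU time of the whole process
        return self

    def __exit__(self, exc_type, exc, tb):
        self._wall = time.perf_counter() - self._wall0
        self._cpu = time.process_time() - self._cpu0
        # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        self._peak_mb = peak / (1024 ** 2 if sys.platform == "darwin" else 1024)
        return False  # never swallow exceptions raised inside the block

    def get_stats(self):
        wall = max(self._wall, 1e-9)  # guard against division by zero
        return {
            "time_sec": self._wall,
            "avg_cpu_percent": 100.0 * self._cpu / wall,
            "max_ram_mb": self._peak_mb,
        }

with SimpleResourceMonitor() as monitor:
    total = sum(i * i for i in range(200_000))  # stand-in for optimizer.optimize()

stats = monitor.get_stats()
print(f"{stats['time_sec']:.4f}s, {stats['max_ram_mb']:.1f} MB peak")
```

Two caveats apply to this sketch. `ru_maxrss` is the peak over the lifetime of the process, so it never decreases between steps and includes memory held by the full graph; running each step in a fresh subprocess would isolate the measurements. The CPU percentage can also exceed 100 when the optimizer runs multi-threaded, because `time.process_time` sums CPU time across threads.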

In [None]:
# Cell 5: Visualize Results
df = pd.DataFrame(results)

fig, ax1 = plt.subplots()

color = 'tab:red'
ax1.set_xlabel('Number of Images')
ax1.set_ylabel('Time (s)', color=color)
ax1.plot(df['num_images'], df['time_sec'], color=color, marker='o', label='Time')
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()  
color = 'tab:blue'
ax2.set_ylabel('Max RAM (MB)', color=color)  
ax2.plot(df['num_images'], df['ram_max_mb'], color=color, marker='x', linestyle='--', label='RAM')
ax2.tick_params(axis='y', labelcolor=color)

plt.title("Scaling Limits: 1DSfM Dataset")
fig.tight_layout()
plt.show()

# Problem-size plot (variable count, used here as the sparsity proxy)
plt.figure()
plt.plot(df['num_images'], df['sparsity_proxy'], marker='o')
plt.title("Problem Size (Graph Variables) vs Images")
plt.xlabel("Number of Images")
plt.ylabel("System Size (Variables)")
plt.show()
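
The variable count plotted above tracks problem size but says little about sparsity. A closer proxy for the cost of the Schur complement is the number of camera pairs that co-observe at least one landmark, because each such pair contributes a nonzero off-diagonal block to the reduced camera system. The sketch below computes this on plain `(cam_idx, pt_idx, x, y)` observation tuples. The function name and tuple layout are assumptions, and it does not use any GTSAM API.

```python
from collections import defaultdict
from itertools import combinations

def covisibility_fill(observations, n_cams):
    """Return (nonzero off-diagonal camera blocks, fill ratio) for the first n_cams cameras."""
    # Group the retained cameras by the landmark they observe.
    cams_of_point = defaultdict(set)
    for cam_idx, pt_idx, *_ in observations:
        if cam_idx < n_cams:
            cams_of_point[pt_idx].add(cam_idx)
    # Every pair of cameras sharing a landmark is coupled after eliminating that landmark.
    pairs = set()
    for cams in cams_of_point.values():
        pairs.update(combinations(sorted(cams), 2))
    possible = n_cams * (n_cams - 1) // 2
    return len(pairs), (len(pairs) / possible if possible else 0.0)

toy_obs = [
    (0, 0, 0.0, 0.0), (1, 0, 0.0, 0.0),  # landmark 0 couples cameras 0 and 1
    (1, 1, 0.0, 0.0), (2, 1, 0.0, 0.0),  # landmark 1 couples cameras 1 and 2
]
blocks_3, fill_3 = covisibility_fill(toy_obs, 3)
blocks_2, fill_2 = covisibility_fill(toy_obs, 2)
print(blocks_3, fill_3, blocks_2, fill_2)
```

A fill ratio near 1 means the reduced camera system is effectively dense, which is typical of landmark-style Internet photo collections where many images view the same structure.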