# Overview

This notebook demonstrates how we triangulate predicted manhole detections from multi-view street imagery to obtain robust 3D candidate locations.

## What this notebook covers
- Load COCO-style predictions and image metadata
- Project per-detection centroids to world-space rays
- Compute pairwise closest-approach intersections between rays
- Apply peeling clustering to suppress ambiguous crossings and consolidate consistent hypotheses
- Export final 3D candidates as a GeoDataFrame

## Inputs
- COCO predictions: polygons, score, area
- Camera metadata: per-frame pose and intrinsics (cylindrical pano or cube-map)
- Parameters: intersection threshold, clustering threshold, update radius, max missing frames

## Key functions (from `triangulation.py` and `projection.py`)
- `cylin_pano_proj_ray` / `cubemap_pano_proj_ray`: build rays per frame
- `compute_ray_intersection`: closest-approach test and filtering
- `triangulation_peeling`: iterative clustering and candidate maintenance

## Expected outputs
- GeoDataFrame of candidate points with attributes:
  - elevation, number of supporting intersections
  - intersection length statistics (diagnostics)

Tip: Start by running the parameter/config cells, then proceed section-by-section. Adjust thresholds if results look too sparse/dense.


In [4]:
import os
import sys

# Add the project root to the Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root)

import numpy as np
import geopandas as gpd
from src.geomatic.projection import *
from src.geomatic.triangulation import *

# Metadata Pre-process

Load the camera trajectory and the detected object mask polygons, then prepare a GeoPandas DataFrame for spatio-temporal iteration, with one camera (image) frame as the unit.

Cubemap panoramas have six images (cube faces) per frame, and camera models differ in resolution and intrinsic parameters. Here we assume that the file name of each image follows the rule below:
 
 | Extract '102-35582' as frame_id, 'lb4' as camera_model, and '0' as cube_idx from '102-lb4-0-35582.jpg'
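
As an illustration of the naming rule above, here is a minimal sketch of a parser. The notebook itself uses the project's `parse_file_name`, whose implementation is not shown here; the function name `parse_file_name_sketch`, the regular expression, and the choice to keep `cube_idx` as a string are assumptions made for this example only.

```python
import re

# Assumed pattern: <run>-<camera_model>-<cube_idx>-<frame_number>.<ext>
# e.g. '102-lb4-0-35582.jpg'
_NAME_RE = re.compile(
    r"^(?P<run>\d+)-(?P<model>[A-Za-z0-9]+)-(?P<cube>\d+)-(?P<frame>\d+)\.\w+$"
)


def parse_file_name_sketch(file_name):
    """Return (frame_id, camera_model, cube_idx) parsed from an image file name."""
    match = _NAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"Unexpected file name: {file_name!r}")
    # frame_id joins the first and last numeric fields, e.g. '102-35582'
    frame_id = f"{match['run']}-{match['frame']}"
    return frame_id, match["model"], match["cube"]


print(parse_file_name_sketch("102-lb4-0-35582.jpg"))
# → ('102-35582', 'lb4', '0')
```

To fill three DataFrame columns at once with `apply`, as done later in this notebook, such a function would need to return a `pd.Series` rather than a tuple.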



The trajectory dataframe contains the following information for each spherical panoramic image:
 - Position (x, y, z in meters)
 - Orientation (Euler angles rx/ry/rz in radians)
 - Time (gps acquisition time)
 - Frame_id (unique string index for a frame)
 - Camera model (one of 'lb4', 'lb7', 'lb8', and 'lb10')
 - Image resolution (Cubeface size in pixels)

In [None]:
# set file path
gpkg_traject = '../data/zurich/zh_gt_traject.gpkg'
coco_file_path = '../data/zurich/mix_panoptic_detections.json'

In [None]:
# Read the trajectory GeoPackage and drop the columns not needed below
traject = gpd.read_file(gpkg_traject)
traject.drop(columns=['imagePath', 'geometry'], inplace=True) 
traject.tail()

Unnamed: 0,dataSourceId,gpsWeekSeconds,x,y,z,rx,ry,rz,size,frame_id
9090,lb8,124606.116242,2682875.662,1247820.278,410.868,1.537087,1.188867,0.052404,4016,109-2059
9091,lb8,124606.769299,2682872.021,1247821.895,410.92,1.534365,1.154501,0.050934,4016,109-2060
9092,lb8,124607.466692,2682868.424,1247823.643,410.977,1.543542,1.116923,0.043316,4016,109-2061
9093,lb8,124608.184877,2682864.851,1247825.581,411.036,1.549519,1.070756,0.040241,4016,109-2062
9094,lb8,124608.872861,2682861.411,1247827.628,411.089,1.549821,1.034544,0.040708,4016,109-2063


The image dataframe is built from the inference JSON file. If the image detections follow the COCO format, this dataframe should contain:
 - file_name
 - id (numerical image id used as the annotation reference)
 - width
 - height

This lets each image file name (a string) be associated with its numerical image id.


In [None]:
images_df, ann_gdf = load_coco_inferences(coco_file_path, t_score=0.8)
images_df[['frame_id', 'camera_model', 'cube_idx']] = images_df['file_name'].apply(parse_file_name)
images_df.tail()

Unnamed: 0,file_name,width,height,AOI,id,basename,frame_id,camera_model,cube_idx
15171,109-lb8-2-2062.jpg,4016,4016,SZH,36374,109-lb8-2-2062.jpg,109-2062,lb8,2
15172,109-lb8-3-2062.jpg,4016,4016,SZH,36375,109-lb8-3-2062.jpg,109-2062,lb8,3
15173,109-lb8-0-2063.jpg,4016,4016,SZH,36376,109-lb8-0-2063.jpg,109-2063,lb8,0
15174,109-lb8-2-2063.jpg,4016,4016,SZH,36378,109-lb8-2-2063.jpg,109-2063,lb8,2
15175,109-lb8-3-2063.jpg,4016,4016,SZH,36379,109-lb8-3-2063.jpg,109-2063,lb8,3


The annotation dataframe also comes from the inference JSON file. In COCO format, an annotation block holds the polygon mask of a detection or ground truth in an image. This dataframe should contain:
 - image_id: numerical id refer to image dataframe
 - annotation_id: unique numerical id for image detection
 - category_id: numerical id of detected object category
 - score: confidence score for image detection
 - area: detection mask area in pixel unit
 - geometry: polygon in pixel coordinates of the detection

Each row is one detected object in the corresponding image.
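
To make the fields above concrete, the following sketch flattens a COCO-style dictionary into one record per detection. It is not the project's `load_coco_inferences`, whose internals are not shown in this notebook. It assumes the JSON has an `annotations` list with a `score` per entry and polygon segmentations stored as flat `[x0, y0, x1, y1, ...]` lists, and that detections with a score equal to the threshold are kept. The function name `coco_to_rows` and the toy data are made up for illustration.

```python
def coco_to_rows(coco, t_score=0.8):
    """Flatten COCO-style detections into one dict per detection (score >= t_score)."""
    rows = []
    for ann in coco["annotations"]:
        if ann.get("score", 0.0) < t_score:
            continue  # drop low-confidence detections
        # COCO polygons are flat coordinate lists; use the first ring only.
        flat = ann["segmentation"][0]
        vertices = list(zip(flat[0::2], flat[1::2]))
        rows.append(
            {
                "image_id": ann["image_id"],
                "annotation_id": ann["id"],
                "category_id": ann["category_id"],
                "score": ann["score"],
                "area": ann["area"],
                "vertices": vertices,  # pixel coordinates of the mask outline
            }
        )
    return rows


# Toy input with made-up values, for illustration only
coco = {
    "images": [{"id": 10, "file_name": "a.jpg", "width": 2048, "height": 2048}],
    "annotations": [
        {"id": 1, "image_id": 10, "category_id": 0, "score": 0.9,
         "area": 12.0, "segmentation": [[0, 0, 4, 0, 4, 3, 0, 3]]},
        {"id": 2, "image_id": 10, "category_id": 0, "score": 0.4,
         "area": 6.0, "segmentation": [[0, 0, 3, 0, 3, 2, 0, 2]]},
    ],
}
rows = coco_to_rows(coco)
print(len(rows), rows[0]["vertices"])
```

The vertex list can then be turned into a polygon geometry, which is what the `geometry` column of the annotation dataframe holds.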

In [4]:
# Merge ann_gdf with images_df (via the COCO image id) to attach frame and camera attributes
ann_gdf = ann_gdf.merge(
    images_df[['id', 'frame_id', 'camera_model', 'cube_idx', 'width', 'height']],
    left_on='image_id',
    right_on='id',
    how='left'
)
ann_gdf.drop(columns=['id'], inplace=True)
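# Keep detections whose mask area (px^2) exceeds half the longer image side (px).
# Note: `ann_gdf.area` resolves to the GeoPandas geometry-area property, not the COCO 'area' column.
ann_gdf = ann_gdf[ann_gdf.area.values > (np.max([ann_gdf.height.values, ann_gdf.width.values], axis=0) / 2)]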
ann_gdf.tail()

Unnamed: 0,image_id,category_id,geometry,annotation_id,score,area,frame_id,camera_model,cube_idx,width,height
17479,7331,0,"POLYGON ((684.001 1300.001, 684 1300.025, 684 ...",182406,0.815,1402.508525,99-91846,lb4,3,2048,2048
17480,7335,0,"POLYGON ((813.001 1303.999, 813.025 1304, 813....",182410,0.83,1391.964233,99-91847,lb4,3,2048,2048
17481,7339,0,"POLYGON ((942.001 1294, 942 1294.025, 942 1300...",182414,0.815,1311.008495,99-91848,lb4,3,2048,2048
17482,7343,0,"POLYGON ((1067.001 1292, 1067 1292.025, 1067 1...",182416,0.808,1309.964233,99-91849,lb4,3,2048,2048
17483,7347,0,"POLYGON ((1183.001 1289.001, 1183 1289.025, 11...",182420,0.818,1323.986379,99-91850,lb4,3,2048,2048


Merge and group all dataframes to perform geo-localization with a dynamic pool (sliding window) along the whole acquisition trajectory.

You might need to change the column names in the following blocks to match your customized dataset.

Adjustable argument:
 - radius in **spatial_temporal_group_sort**: the minimum spatial distance in meters that separates two nearby trajectories. Images acquired within this radius are processed by the triangulation algorithm at roughly the same time.

In [None]:
# Merge traject with ann_gdf to attach the camera pose to each prediction mask
gdf = traject.merge(
    ann_gdf[['geometry', 'score', 'area', 'frame_id','cube_idx', 'camera_model']],
    left_on='frame_id',
    right_on='frame_id',
    how='right'
)

In [6]:
grouped = spatial_temporal_group_sort(
    gdf,
    groupby_col='frame_id',
    time_column="gpsWeekSeconds",
    x_col="x",
    y_col="y",
    z_col="z",
    radius=10.0)

Parameters to fine-tune for localization:
 - pano_proj_ray: 2D-to-3D projection function for the image type.
 - offset: constant offset applied to the camera position and orientation during projection
 - intersection_threshold: maximum distance between two rays for them to form an intersection
 - clustering_threshold: DBSCAN eps used to create candidates from intersection clusters in the dynamic pool
 - candidate_update_threshold: DBSCAN eps used to merge duplicate candidates after the iteration finishes
 - candidate_missing_limit: lifetime of an active candidate without a new detection, in number of images
 - radius: maximum distance from a valid intersection to the corresponding camera position. Detections farther away are unreliable and are therefore discarded.
 - mask_area_control: bool, whether to use the mask area from image detection to filter potential false-positive intersections.
 - height_control: bool, whether to use the camera height and hard-coded height constraints to filter potential false-positive intersections.
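
To show how `intersection_threshold` and `radius` could interact, here is a minimal sketch of a closest-approach test between two rays. It is not the project's `compute_ray_intersection`. The function name `closest_approach`, the returned dictionary keys, and the use of the midpoint of the closest points as the intersection are all assumptions for illustration.

```python
import numpy as np


def closest_approach(p1, d1, p2, d2, intersection_threshold=0.2, radius=20.0):
    """Return the near-intersection of rays p1 + t1*d1 and p2 + t2*d2, or None."""
    p1, d1, p2, d2 = (np.asarray(v, dtype=float) for v in (p1, d1, p2, d2))
    d1 = d1 / np.linalg.norm(d1)  # unit directions, so t1 and t2 are distances
    d2 = d2 / np.linalg.norm(d2)
    w0 = p1 - p2
    b = d1 @ d2
    d = d1 @ w0
    e = d2 @ w0
    denom = 1.0 - b * b
    if denom < 1e-12:
        return None  # (nearly) parallel rays have no well-defined crossing
    t1 = (b * e - d) / denom
    t2 = (e - b * d) / denom
    # Reject crossings behind a camera or farther away than `radius`
    if t1 <= 0 or t2 <= 0 or t1 > radius or t2 > radius:
        return None
    q1 = p1 + t1 * d1  # closest point on ray 1
    q2 = p2 + t2 * d2  # closest point on ray 2
    gap = float(np.linalg.norm(q1 - q2))
    if gap > intersection_threshold:
        return None  # rays pass too far apart to count as an intersection
    return {"point": (q1 + q2) / 2, "dist": gap, "length": (float(t1), float(t2))}


hit = closest_approach((0, 0, 0), (1, 0, 0), (5, -5, 0.1), (0, 1, 0))
print(hit["point"], round(hit["dist"], 3), hit["length"])
```

In this toy case the two rays pass 0.1 m apart, 5 m from each camera, so the pair is accepted with the default thresholds and would be rejected with `intersection_threshold=0.05`.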

In [7]:
out_gdf, candidate_list, ray_list, intersection_objs = triangulation_peeling(
    grouped, 
    cubemap_pano_proj_ray,
    intersection_threshold=0.2,
    clustering_threshold=0.5,
    candidate_update_threshold=1.0,
    candidate_missing_limit=20,
    radius=20,
    mask_area_control=False,
    height_control=True)

Processing frames: 100%|██████████| 5902/5902 [02:36<00:00, 37.81it/s]
100%|██████████| 1233/1233 [00:00<00:00, 7330.05it/s]


In [None]:
out_gdf.to_file("../output/triangulation_peeling_ZH_20m.gpkg", driver="GPKG")

You can inspect the intersections of each candidate to find the best parameters for your dataset.

In [None]:
import geopandas as gpd
from shapely.geometry import Point

# Convert intersection_objs to GeoDataFrame
intersection_gdf = gpd.GeoDataFrame(
    [
        {
            "geometry": Point(inter.point),
            "ray_pair": inter.ray_pair,
            "dist": inter.dist,
            "length": inter.length
        }
        for inter in intersection_objs
    ],
    geometry="geometry",
    crs="epsg:2056"
)

intersection_gdf.to_file("../output/triangulation_peeling_intersections_ZH_20m.gpkg", driver="GPKG")

# Metrics

Definition: An annotated ground truth counts as detected if a candidate lies within 2 meters of its centroid.

The following blocks count the detected objects (True Positives, TP), wrong detections (False Positives, FP), and missed objects (False Negatives, FN), and compute precision, recall, and F1-score.

For detected objects, distance statistics between candidate center and ground truth polygon center are calculated as well.

Parameters:
 - range_limit: distance limit in meters. Ground truth is kept only if at least 3 trajectory points lie within it, and predictions only if their mean_intersection_length does not exceed it.
 - min_intersects: minimum number of intersections for a detection. Usually, min_intersects = 1 keeps all candidates and gives the best recall. Increasing the value improves precision but reduces recall.
 - max_intersection_dist: maximum allowed value of a candidate's mean_intersection_dist.
 - res_file: output file path.


In [None]:
# parameters
range_limit = 15
min_intersects = 2
max_intersection_dist = 0.2
res_file = '../output/triangulation_peeling_ZH_rec.gpkg'


pred_path = '../output/triangulation_peeling_ZH_20m.gpkg'
traject = gpd.read_file(gpkg_traject)
gt_path = '../data/zurich/zh_GT_3D.gpkg'

# Also create a 2D geometry column for XY plane
traject['geometry_xy'] = traject.apply(lambda row: Point(row['x'], row['y']), axis=1)

gt_gdf = gpd.read_file(gt_path, layer='zh_GT_3D')

# --- Matrix calculation for filtering gt_gdf by nearby traject points in XY ---

# Get centroid XY coordinates for all GT geometries
gt_centroids = gt_gdf.geometry.centroid
gt_centroids_xy = np.array([[pt.x, pt.y] for pt in gt_centroids])

# Get all traject XY coordinates
traject_xy = np.array([[pt.x, pt.y] for pt in traject['geometry_xy']])

# Compute distance matrix: shape (n_gt, n_traject)
dists_matrix = np.linalg.norm(gt_centroids_xy[:, None, :] - traject_xy[None, :, :], axis=2)

# Count how many traject points are within range_limit meters of each GT centroid
nearby_counts = (dists_matrix <= range_limit).sum(axis=1)

# Filter gt_gdf: keep only rows where at least 3 traject points are within range_limit
gt_gdf = gt_gdf[nearby_counts >= 3].reset_index(drop=True)

pred_gdf = gpd.read_file(pred_path)
count_flag = np.array(pred_gdf.intersections) >= min_intersects
pred_gdf = pred_gdf[
    (pred_gdf.mean_intersection_length <= range_limit).values 
    & (pred_gdf.mean_intersection_dist <= max_intersection_dist).values 
    & count_flag]
pred_gdf.reset_index(inplace=True)
pred_gdf.to_file(res_file, driver="GPKG")


In [39]:
# --- Additional code block: metrics based on match distance between prediction 2D coordinates and centroid of 2D gt ---
# Only match distance below 2m can be defined as true positive.

# Prepare 2D coordinates for predictions and GT centroids
pred_points_2d = np.array([[geom.x, geom.y] for geom in pred_gdf.geometry])
gt_centroids_2d = np.array([[pt.x, pt.y] for pt in gt_gdf.geometry.centroid])

n_pred = len(pred_points_2d)
n_gt = len(gt_centroids_2d)

# Compute distance matrix: shape (n_gt, n_pred)
dist_matrix = np.linalg.norm(gt_centroids_2d[:, None, :] - pred_points_2d[None, :, :], axis=2)

# For each GT, find the closest prediction (greedy matching, each pred at most one GT)
gt_to_pred = {}
pred_matched = set()
match_distances = []

distance_threshold = 2.0  # Only matches below 2m are considered TP

for gt_idx in range(n_gt):
    # For this GT, get all pred indices sorted by distance
    pred_indices_sorted = np.argsort(dist_matrix[gt_idx])
    for pred_idx in pred_indices_sorted:
        if pred_idx not in pred_matched:
            dist = dist_matrix[gt_idx, pred_idx]
            if dist < distance_threshold:
                gt_to_pred[gt_idx] = pred_idx
                pred_matched.add(pred_idx)
                match_distances.append(dist)
            break  # Only the closest still-unmatched pred is tested for each GT

# Now, metrics:
TP = len(gt_to_pred)  # Each GT matched to a unique pred within 2m
FP = n_pred - len(pred_matched)  # Predictions not matched to any GT (within 2m)
FN = n_gt - len(gt_to_pred)      # GTs not matched to any pred (within 2m)

precision = TP / (TP + FP) if (TP + FP) > 0 else 0
recall = TP / (TP + FN) if (TP + FN) > 0 else 0
f1_score = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

# Distance statistics for matches
if match_distances:
    match_distances_np = np.array(match_distances)
    match_dist_stats = {
        'mean': np.mean(match_distances_np),
        'std': np.std(match_distances_np),
        'min': np.min(match_distances_np),
        'max': np.max(match_distances_np),
        'median': np.median(match_distances_np),
        'count': len(match_distances_np)
    }
else:
    match_dist_stats = {}

print("=== Greedy GT-to-Pred Closest Matching Metrics (2m threshold) ===")
print(f"TP: {TP}, FP: {FP}, FN: {FN}")
print(f"Precision: {precision:.3f}, \nRecall: {recall:.3f}, \nF1: {f1_score:.3f}")
print("Match distance stats:")
for k, v in match_dist_stats.items():
    print(f"  {k}: {v}")

=== Greedy GT-to-Pred Closest Matching Metrics (2m threshold) ===
TP: 862, FP: 110, FN: 230
Precision: 0.887, 
Recall: 0.789, 
F1: 0.835
Match distance stats:
  mean: 0.06536831443368625
  std: 0.13751320861764327
  min: 0.001892349416238157
  max: 1.6430085506413148
  median: 0.040443444932066946
  count: 862
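
The reported precision, recall, and F1 can be cross-checked directly from the TP/FP/FN counts printed above. The short sketch below redoes that arithmetic, using the equivalent form F1 = 2·TP / (2·TP + FP + FN).

```python
# Counts copied from the printed output above
tp, fp, fn = 862, 110, 230

precision = tp / (tp + fp)        # 862 / 972
recall = tp / (tp + fn)           # 862 / 1092
f1 = 2 * tp / (2 * tp + fp + fn)  # 1724 / 2064

print(f"{precision:.3f} {recall:.3f} {f1:.3f}")
# → 0.887 0.789 0.835
```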


In [None]:
# Get indices for false positives (pred points not matched to any GT polygon)
all_pred_indices = set(pred_gdf.index)
matched_pred_indices = pred_matched
fp_indices = np.array(sorted(list(all_pred_indices - matched_pred_indices)))

# Get indices for false negatives (GT polygons not matched to any pred point)
all_gt_indices = set(gt_gdf.index)
matched_gt_indices = set(gt_to_pred.keys())
fn_indices = np.array(sorted(list(all_gt_indices - matched_gt_indices)))

# print(f"{len(fp_indices)} False Positive indices (pred points not matched): {fp_indices}")
# print(f"{len(fn_indices)} False Negative indices (GT polygons not matched): {fn_indices}")

In [None]:
pred_gdf.iloc[fp_indices]