# Overview

This notebook demonstrates how we triangulate predicted manhole detections from multi-view street imagery to obtain robust 3D candidate locations.

## What this notebook covers
- Load COCO-style predictions and image metadata
- Project per-detection centroids to world-space rays
- Compute pairwise closest-approach intersections between rays
- Apply peeling clustering to suppress ambiguous crossings and consolidate consistent hypotheses
- Export final 3D candidates as a GeoDataFrame

## Inputs
- COCO predictions: polygons, score, area
- Camera metadata: per-frame pose and intrinsics (cylindrical pano or cube-map)
- Parameters: intersection threshold, clustering threshold, update radius, max missing frames

## Key functions (from `triangulation.py` and `projection.py`)
- `cylin_pano_proj_ray` / `cubemap_pano_proj_ray`: build rays per frame
- `compute_ray_intersection`: closest-approach test and filtering
- `triangulation_peeling`: iterative clustering and candidate maintenance
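
The closest-approach test between two rays can be sketched as follows. This is a minimal illustration and not the code in `triangulation.py`: the function name `closest_approach`, its return values, and the choice to reject crossings that fall behind a camera are assumptions made for this example. The notebook's intersection objects expose `dist` and `length`; treating them as the gap between the rays and the distance along a ray is likewise an assumption.

```python
import numpy as np

def closest_approach(o1, d1, o2, d2, eps=1e-9):
    """Closest approach of rays o1 + t1*d1 and o2 + t2*d2 (hypothetical helper).

    Returns (midpoint, gap, t1, t2), or None when the rays are near-parallel
    or the closest point lies behind either origin.
    """
    o1, d1, o2, d2 = (np.asarray(v, dtype=float) for v in (o1, d1, o2, d2))
    d1 = d1 / np.linalg.norm(d1)  # unit directions, so t1 and t2 are metric distances
    d2 = d2 / np.linalg.norm(d2)
    w = o1 - o2
    b = d1 @ d2
    d = d1 @ w
    e = d2 @ w
    denom = 1.0 - b * b           # zero when the rays are parallel
    if denom < eps:
        return None
    t1 = (b * e - d) / denom      # distance along ray 1 to its closest point
    t2 = (e - b * d) / denom      # distance along ray 2 to its closest point
    if t1 < 0 or t2 < 0:
        return None               # crossing would be behind a camera
    p1 = o1 + t1 * d1
    p2 = o2 + t2 * d2
    return (p1 + p2) / 2, float(np.linalg.norm(p1 - p2)), float(t1), float(t2)

# Two rays that pass 1 m apart near (5, 0, 0.5)
result = closest_approach([0, 0, 0], [1, 0, 0], [5, -3, 1], [0, 1, 0])
midpoint, gap, t1, t2 = result
print(midpoint, gap, t1, t2)
```

A pair would then be kept as an intersection only if `gap` is below the intersection threshold and both `t1` and `t2` are within the accepted range from the camera.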

## Expected outputs
- GeoDataFrame of candidate points with attributes:
  - elevation, number of supporting intersections
  - intersection length statistics (diagnostics)

Tip: Start by running the parameter/config cells, then proceed section-by-section. Adjust thresholds if results look too sparse/dense.


In [1]:
import os
import sys

# Add the project root to the Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), '../..'))
sys.path.append(project_root)

import numpy as np
import pandas as pd  # needed for pd.read_csv in the cells below
import geopandas as gpd
from scripts.utils.projection import *
from scripts.utils.triangulation import *

from functools import partial
from shapely.geometry import Point

# Metadata Pre-processing

Load the camera trajectory and the detected object mask polygons, then prepare a GeoDataFrame that is iterated in spatio-temporal order, one camera frame (image) at a time.


In [None]:
# set file path
csv_traject = '../../data/neuchatel/ne_traject.csv'
coco_file_path = '../../data/infer/run02/infer_full/RCNE/oth_COCO_panoptic_detections.json'

In [None]:
# Read the trajectory CSV as a DataFrame
traject = pd.read_csv(csv_traject)

images_df, ann_gdf = load_coco_inferences(coco_file_path, t_score=0.7)

traject['file_name'] = traject['file_name'].str.replace(r'\.\w+$', '', regex=True)
images_df['file_name'] = images_df['file_name'].str.replace(r'\.\w+$', '', regex=True)

The trajectory DataFrame must contain the following information for each spherical panoramic image:
 - Position (x, y, z in meters)
 - Orientation (yaw/heading, pitch, roll in degrees)
 - Time (GPS acquisition time)
 - File name (image file name, with or without the file extension)

In [3]:
traject.tail()

Unnamed: 0,gpsimgdirection,gpspitch,gpsroll,datetimeoriginal,gpslatitude,gpslongitude,gps_sec_s_,file_name,x_m_,y_m_,z_m_,x,y,z,URL
10833,62.717771,-3.818139,-1.482756,305978.99593,6.526556,46.874336,305979,20200408_135016_001091,2530469.497,1191873.376,803.424,2530469,1191873,803,https://sitn.ne.ch/web/images/photos360/assets...
10834,62.541726,-3.690216,-1.252783,305978.64595,6.526614,46.874358,305979,20200408_135016_001090,2530473.956,1191875.766,803.324,2530473,1191875,803,https://sitn.ne.ch/web/images/photos360/assets...
10835,242.095375,-1.169486,1.924816,306497.47495,6.526644,46.874342,306497,20200408_135016_002386,2530476.199,1191873.894,803.364,2530476,1191873,803,https://sitn.ne.ch/web/images/photos360/assets...
10836,242.005249,-1.289378,1.989834,306497.09495,6.526586,46.874319,306497,20200408_135016_002385,2530471.777,1191871.458,803.464,2530471,1191871,803,https://sitn.ne.ch/web/images/photos360/assets...
10837,241.937913,-1.651241,2.100855,306496.33496,6.526471,46.874274,306496,20200408_135016_002383,2530462.945,1191866.565,803.664,2530462,1191866,803,https://sitn.ne.ch/web/images/photos360/assets...


The image DataFrame is consolidated from the inference JSON file. As long as the detections follow the COCO format, this DataFrame should contain:
 - file_name
 - id (numerical image id referenced by the annotations)
 - width
 - height

This lets the image file name (a string) be matched to the numerical image id.


In [4]:
images_df.tail()

Unnamed: 0,file_name,width,height,AOI,id,basename
5304,20200408_135016_001083,8000,4000,RCNE,10818,20200408_135016_001083.jpg
5305,20200408_135016_001084,8000,4000,RCNE,10819,20200408_135016_001084.jpg
5306,20200408_135016_001085,8000,4000,RCNE,10820,20200408_135016_001085.jpg
5307,20200408_135016_002391,8000,4000,RCNE,10821,20200408_135016_002391.jpg
5308,20200408_135016_002390,8000,4000,RCNE,10824,20200408_135016_002390.jpg


The annotation DataFrame also comes from the inference JSON file. In COCO format, an annotation block holds the polygon mask of a detection or ground truth in an image. This DataFrame should contain:
 - image_id: numerical id referring to the image DataFrame
 - annotation_id: unique numerical id of the detection
 - category_id: numerical id of the detected object category
 - score: confidence score of the detection
 - area: detection mask area in pixels
 - geometry: polygon of the detection in pixel coordinates

Each row is one detected object in the corresponding image.
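
To connect pixel coordinates to the projection step, the sketch below maps a pixel of a 2:1 spherical panorama to a unit ray direction. It is an illustration under stated assumptions, not the implementation of `cylin_pano_proj_ray`: it assumes an equirectangular layout, an image center that looks along the heading, a heading measured clockwise from north, an east/north/up world frame, and it ignores pitch, roll, and any offset. The function name `equirect_pixel_to_ray` is hypothetical.

```python
import numpy as np

def equirect_pixel_to_ray(u, v, width, height, heading_deg=0.0):
    """Unit ray direction (east, north, up) for pixel (u, v) of an equirectangular image.

    Assumptions: columns map linearly to azimuth, rows map linearly to elevation,
    the image center looks along the heading, pitch and roll are ignored.
    """
    azimuth = (u / width - 0.5) * 2.0 * np.pi + np.radians(heading_deg)
    elevation = (0.5 - v / height) * np.pi  # +pi/2 at the top row, -pi/2 at the bottom
    return np.array([
        np.sin(azimuth) * np.cos(elevation),  # east
        np.cos(azimuth) * np.cos(elevation),  # north
        np.sin(elevation),                    # up
    ])

# Center pixel of an 8000 x 4000 panorama, camera heading north
center_ray = equirect_pixel_to_ray(4000, 2000, 8000, 4000)
# A pixel below the horizon, such as a detection on the road surface
ground_ray = equirect_pixel_to_ray(4000, 3000, 8000, 4000)
print(center_ray, ground_ray)
```

In practice the pixel would be the centroid of a detection polygon, for example `ann_gdf.geometry.centroid`, and the ray origin would be the camera position from the trajectory.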

In [5]:
ann_gdf.tail()

Unnamed: 0,image_id,category_id,geometry,annotation_id,score,area
10183,10089,0,"POLYGON ((3690.001 2535.001, 3690 2535.025, 36...",36711,0.8895,8736.175326
10184,10089,0,"POLYGON ((4944.001 2359.999, 4944.025 2360, 49...",36713,0.837,2375.840265
10185,10089,0,"POLYGON ((6689.001 2529, 6689 2529.025, 6689 2...",36716,0.853,3798.583481
10186,10088,0,"POLYGON ((5644.001 2573.001, 5644 2573.025, 56...",36725,0.86,5846.848873
10187,10092,0,"POLYGON ((4872.001 2362.001, 4872 2362.025, 48...",36727,0.83,2268.906643


Merge and group all DataFrames to perform geo-localization with a dynamic pool (sliding window) that follows the acquisition trajectory.

You might need to change the column names in the following blocks to match your customized dataset.

Adjustable argument:
 - radius in **spatial_temporal_group_sort**: the minimum spatial distance in meters that separates two nearby trajectories. Images acquired within this radius are processed by the triangulation algorithm at roughly the same time.

In [None]:
gdf = gpd.GeoDataFrame(traject)
if 'file_name' not in gdf.columns:
    raise ValueError("gdf must have a 'file_name' column to match with COCO images.")

# Merge gdf with images_df to get the COCO image id
gdf = gdf.merge(
    images_df[['id', 'file_name', 'width', 'height']],
    left_on='file_name',
    right_on='file_name',
    how='right'
)

# Merge gdf with ann_gdf to get prediction mask
gdf = gdf.merge(
    ann_gdf,
    left_on='id',
    right_on='image_id',
    how='right'
)

In [None]:
grouped = spatial_temporal_group_sort(
    gdf,
    groupby_col='image_id',
    time_column="gps_sec_s_",
    x_col="x_m_",
    y_col="y_m_",
    z_col="z_m_",
    radius=10.0)

Parameters to fine-tune for localization:
 - pano_proj_ray: 2D-to-3D projection function, chosen according to the image type.
 - offset: constant offset applied to the camera position and orientation during projection.
 - intersection_threshold: maximum distance between two rays for them to form an intersection.
 - clustering_threshold: DBSCAN eps used to create candidates from intersection clusters in the dynamic pool.
 - candidate_update_threshold: DBSCAN eps used to merge duplicate candidates after the iteration finishes.
 - candidate_missing_limit: lifetime of an active candidate without a supporting detection, in number of images.
 - radius: maximum distance from a valid intersection to the corresponding camera position. Detections farther away are unreliable and are discarded.
 - mask_area_control: bool, whether to use the detection mask area to filter potential false-positive intersections.
 - height_control: bool, whether to use the camera height and hard-coded height constraints to filter potential false-positive intersections.
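
To give a feel for what an eps threshold does to a set of intersection points, here is a NumPy-only sketch. It groups points that are chained together by gaps of at most `eps`, which is what DBSCAN produces when `min_samples` is 1. Whether `triangulation_peeling` uses that setting is an assumption, and the function name `eps_connected_clusters` and the sample points are made up for this illustration.

```python
import numpy as np

def eps_connected_clusters(points, eps):
    """Label points so that neighbors within eps (directly or via a chain) share a label."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    labels = np.full(n, -1, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue  # already assigned to an earlier cluster
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            # Unlabeled points within eps of point i join the current cluster
            for j in np.flatnonzero((dists[i] <= eps) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

# Hypothetical intersection points (x, y, z in meters)
pts = np.array([[0.0, 0.0, 0.0], [0.4, 0.0, 0.0], [0.8, 0.0, 0.0],
                [5.0, 5.0, 0.0], [5.3, 5.0, 0.0]])
labels = eps_connected_clusters(pts, eps=1.0)
# One candidate per cluster, placed at the mean of its intersections
candidates = np.array([pts[labels == k].mean(axis=0) for k in np.unique(labels)])
print(labels, candidates)
```

A larger eps merges nearby objects into one candidate, while a smaller eps splits the intersections of a single object into several candidates.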

In [30]:
# To pass an offset to cylin_pano_proj_ray while keeping the (frame_id, frame) signature,
# use a lambda or functools.partial to "bind" the offset argument:
out_gdf, candidate_list, ray_list, intersection_objs = triangulation_peeling(
    grouped, 
    partial(cylin_pano_proj_ray, offset=[0,0,0,0.5,0,-0.3]),
    intersection_threshold=0.5,
    clustering_threshold=1,
    candidate_update_threshold=1,
    candidate_missing_limit=5,
    radius=20,
    mask_area_control=False,
    height_control=True)

Processing frames: 100%|██████████| 4697/4697 [00:33<00:00, 141.65it/s]
100%|██████████| 1496/1496 [00:00<00:00, 217019.29it/s]


In [None]:
out_gdf.to_file("../../output/triangulation_pred_peeling_20m.gpkg", driver="GPKG")

You can inspect the intersections of each candidate to find the best parameters for your dataset.

In [None]:
# Convert intersection_objs to GeoDataFrame
intersection_gdf = gpd.GeoDataFrame(
    [
        {
            "geometry": Point(inter.point),
            "ray_pair": inter.ray_pair,
            "dist": inter.dist,
            "length": inter.length
        }
        for inter in intersection_objs
    ],
    geometry="geometry",
    crs="epsg:2056"
)

intersection_gdf.to_file("../../output/triangulation_peeling_intersections_NE.gpkg", driver="GPKG")

# Metrics

Definition: an annotated ground-truth object counts as detected if a candidate lies within 2 meters of its centroid.

The following blocks count detected objects (True Positives, TP), wrong detections (False Positives, FP), and missed objects (False Negatives, FN), and compute precision, recall, and F1-score.

For detected objects, distance statistics between the candidate and the ground-truth polygon centroid are calculated as well.

Parameters:
 - range_limit: distance limit in meters. A ground-truth object is kept only if at least 3 trajectory points lie within this distance of its centroid, and a prediction is kept only if its mean_intersection_length does not exceed it.
 - min_intersects: minimum number of intersections required for a candidate. min_intersects = 1 usually keeps all candidates and gives the best recall; increasing it improves precision but reduces recall.
 - max_intersection_dist: maximum allowed mean_intersection_dist of a candidate.
 - res_file: output file path.
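
As a worked example of the formulas, the snippet below recomputes precision, recall, and F1 from the counts printed by the evaluation cell in this notebook's saved run (TP 1225, FP 77, FN 163). The helper name `detection_metrics` is introduced here for illustration only.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from raw counts, returning 0.0 for empty denominators."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return precision, recall, f1

precision, recall, f1 = detection_metrics(tp=1225, fp=77, fn=163)
print(f"Precision: {precision:.3f}, Recall: {recall:.3f}, F1: {f1:.3f}")
# Precision: 0.941, Recall: 0.883, F1: 0.911
```

Precision is 1225 / 1302 and recall is 1225 / 1388, so a stricter `min_intersects` that removes some of the 77 false positives raises precision only as long as it does not also remove true positives.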


In [None]:
# parameters
range_limit = 15
min_intersects = 2
max_intersection_dist = 0.2
res_file = '../../output/triangulation_peeling_acc.gpkg'


pred_path = '../../output/triangulation_pred_peeling_20m.gpkg'
traject = pd.read_csv(csv_traject)
# Construct geometry from x_m_, y_m_, z_m_
traject['geometry'] = traject.apply(lambda row: Point(row['x_m_'], row['y_m_'], row['z_m_']), axis=1)

# Also create a 2D geometry column for XY plane
traject['geometry_xy'] = traject.apply(lambda row: Point(row['x_m_'], row['y_m_']), axis=1)

gt_path = '../../data/neuchatel/NE_GT_3D.gpkg'
gt_gdf = gpd.read_file(gt_path, layer='ne_gt_3d')

# --- Matrix calculation for filtering gt_gdf by nearby traject points in XY ---

# Get centroid XY coordinates for all GT geometries
gt_centroids = gt_gdf.geometry.centroid
gt_centroids_xy = np.array([[pt.x, pt.y] for pt in gt_centroids])

# Get all traject XY coordinates
traject_xy = np.array([[pt.x, pt.y] for pt in traject['geometry_xy']])

# Compute distance matrix: shape (n_gt, n_traject)
dists_matrix = np.linalg.norm(gt_centroids_xy[:, None, :] - traject_xy[None, :, :], axis=2)

# Count how many traject points are within range_limit meters of each GT centroid
nearby_counts = (dists_matrix <= range_limit).sum(axis=1)

# Filter gt_gdf: keep only rows where at least 3 traject points are within range_limit
gt_gdf = gt_gdf[nearby_counts >= 3].reset_index(drop=True)

pred_gdf = gpd.read_file(pred_path)
count_flag = np.array(pred_gdf.intersections) >= min_intersects
pred_gdf = pred_gdf[
    (pred_gdf.mean_intersection_length <= range_limit).values 
    & (pred_gdf.mean_intersection_dist <= max_intersection_dist).values 
    & count_flag]
pred_gdf.reset_index(inplace=True)
pred_gdf.to_file(res_file, driver="GPKG")


In [20]:
# --- Metrics based on the 2D distance between each prediction and the centroid of each GT polygon ---
# Only matches closer than 2 m count as true positives.

# Prepare 2D coordinates for predictions and GT centroids
pred_points_2d = np.array([[geom.x, geom.y] for geom in pred_gdf.geometry])
gt_centroids_2d = np.array([[pt.x, pt.y] for pt in gt_gdf.geometry.centroid])

n_pred = len(pred_points_2d)
n_gt = len(gt_centroids_2d)

# Compute distance matrix: shape (n_gt, n_pred)
dist_matrix = np.linalg.norm(gt_centroids_2d[:, None, :] - pred_points_2d[None, :, :], axis=2)

# For each GT (in index order), take the closest still-unmatched prediction (greedy matching, each pred matches at most one GT)
gt_to_pred = {}
pred_matched = set()
match_distances = []

distance_threshold = 2.0  # Only matches below 2m are considered TP

for gt_idx in range(n_gt):
    # For this GT, get all pred indices sorted by distance
    pred_indices_sorted = np.argsort(dist_matrix[gt_idx])
    for pred_idx in pred_indices_sorted:
        if pred_idx not in pred_matched:
            dist = dist_matrix[gt_idx, pred_idx]
            if dist < distance_threshold:
                gt_to_pred[gt_idx] = pred_idx
                pred_matched.add(pred_idx)
                match_distances.append(dist)
            break  # Only the closest unmatched pred is considered for each GT

# Now, metrics:
TP = len(gt_to_pred)  # Each GT matched to a unique pred within 2m
FP = n_pred - len(pred_matched)  # Predictions not matched to any GT (within 2m)
FN = n_gt - len(gt_to_pred)      # GTs not matched to any pred (within 2m)

precision = TP / (TP + FP) if (TP + FP) > 0 else 0
recall = TP / (TP + FN) if (TP + FN) > 0 else 0
f1_score = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

# Distance statistics for matches
if match_distances:
    match_distances_np = np.array(match_distances)
    match_dist_stats = {
        'mean': np.mean(match_distances_np),
        'std': np.std(match_distances_np),
        'min': np.min(match_distances_np),
        'max': np.max(match_distances_np),
        'median': np.median(match_distances_np),
        'count': len(match_distances_np)
    }
else:
    match_dist_stats = {}

print("=== Greedy GT-to-Pred Closest Matching Metrics (2m threshold) ===")
print(f"TP: {TP}, FP: {FP}, FN: {FN}")
print(f"Precision: {precision:.3f}, \nRecall: {recall:.3f}, \nF1: {f1_score:.3f}")
print("Match distance stats:")
for k, v in match_dist_stats.items():
    print(f"  {k}: {v}")

=== Greedy GT-to-Pred Closest Matching Metrics (2m threshold) ===
TP: 1225, FP: 77, FN: 163
Precision: 0.941, 
Recall: 0.883, 
F1: 0.911
Match distance stats:
  mean: 0.06755311947685445
  std: 0.13089793192964758
  min: 0.0018754955624827358
  max: 1.8708154224398337
  median: 0.04838566485435597
  count: 1225


In [None]:
# Get indices for false positives (pred points not matched to any GT polygon)
all_pred_indices = set(pred_gdf.index)
matched_pred_indices = pred_matched
fp_indices = np.array(sorted(list(all_pred_indices - matched_pred_indices)))

# Get indices for false negatives (GT polygons not matched to any pred point)
all_gt_indices = set(gt_gdf.index)
matched_gt_indices = set(gt_to_pred.keys())
fn_indices = np.array(sorted(list(all_gt_indices - matched_gt_indices)))

# print(f"{len(fp_indices)} False Positive indices (pred points not matched): {fp_indices}")
# print(f"{len(fn_indices)} False Negative indices (GT polygons not matched): {fn_indices}")

In [None]:
pred_gdf.iloc[fp_indices]