# Robust Brute-Force Multi-view Multi-point Triangulation

## Goal

Suppose $N_c$ calibrated cameras observe $N_p$ moving 3D points $P_i \; (i=1,\dots,N_p)$ over $N_f$ frames.  The points are not always visible from every camera, and the observations are noisy.  That is, we assume

- **occlusion**: each 3D point is randomly occluded from some cameras at each frame,
- **observation noise**: the 2D projections of $P_i$ to each camera are affected by Gaussian noise $\mathcal{N}(0, \sigma)$, and
- **outliers**: some of the 2D projections are totally unrelated to the ground truth.
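
As a rough summary, these assumptions can be written as the following observation model.  The symbols $x_{ic}$ (observed 2D position of $P_i$ in camera $c$), $\pi_c$ (projection by camera $c$), and $\varepsilon_{ic}$ (noise) are notation introduced here for illustration, and the frame index is omitted for brevity.

$$
x_{ic} = \pi_c(P_i) + \varepsilon_{ic}, \qquad \varepsilon_{ic} \sim \mathcal{N}(0, \sigma)
$$

An occluded pair $(i, c)$ simply has no $x_{ic}$, and an outlier is an $x_{ic}$ that does not follow this model at all.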

The goal of this example is to triangulate the 3D points from such noisy 2D observations based on consensus between views.  A possible scenario is

1. Capture some targets with calibrated cameras,
   - For static multi-view cameras, see [example_gopro_step3_ba.ipynb](./example_gopro_step3_ba.ipynb) for example.
   - For dynamic / moving cameras, run visual SLAM to estimate the camera poses.
2. Detect keypoints on the targets somehow, and
   - The detection can be noisy: it may fail to detect the target, or it may return totally wrong results (outliers) for some frames.
3. Use this notebook to reconstruct the 3D structure of the targets.

## Note

- The function `pycalib.robust.triangulate_consensus()` used in this notebook performs a *brute-force* search over camera pairs, not RANSAC.
- This notebook uses a synthetic dataset.  To use your own data, prepare a `pandas.DataFrame` of the same format. For example, you can prepare a CSV file and load it by `pd.read_csv(...).dropna()`.
  - The input CSV must have `frame`, `label`, `camera`, `x`, and `y` columns or equivalent.
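
As a minimal sketch of the CSV route, the snippet below loads a small in-memory CSV and checks that the expected columns are present.  The CSV contents and the `required` list are made up for illustration; a missed detection is assumed to be stored as empty `x` / `y` fields, which `dropna()` then removes.

```python
import io
import pandas as pd

# Hypothetical CSV contents; the last row is a missed detection (empty x, y).
csv_text = """frame,label,camera,x,y
0,A,0,2724.5,1795.9
0,A,1,2610.2,1801.3
0,B,0,2739.2,1828.2
0,B,1,,
"""

required = ['frame', 'label', 'camera', 'x', 'y']

# Rows with missing values are dropped, as suggested in the note above.
df = pd.read_csv(io.StringIO(csv_text)).dropna()

# Fail early if a column is missing (or rename your columns accordingly).
missing = [c for c in required if c not in df.columns]
if missing:
    raise ValueError(f'missing columns: {missing}')

# Camera IDs are used as integer indices.
df['camera'] = df['camera'].astype(int)
print(df)
```

For a real file, replace `io.StringIO(csv_text)` with the path to your CSV.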

## Libraries

In [1]:
import sys, os, cv2
import numpy as np
import pandas as pd

module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.insert(0, module_path)

import pycalib


## Synthetic data

This cell generates a ground-truth synthetic dataset as a `pd.DataFrame` of the form

```:txt
               x            y  camera  frame label
0    2724.499909  1795.906886       0      0     A
1    2739.246765  1828.197799       0      0     B
2    2737.870194  1809.752993       0      0     C
3    2714.909853  1730.028589       0      1     A
4    2486.017394  1802.378535       0      1     B
..           ...          ...     ...    ...   ...
445  2860.263791  2604.285873      14      8     B
446  2748.469759  2547.504431      14      8     C
447  2727.006665  2548.685086      14      9     A
448  2520.319586  2580.829330      14      9     B
449  2847.109580  2636.002221      14      9     C
```

where

- `x`, `y`: 2D position in the image,
- `camera`: Camera ID (int, $[0:N_c-1]$),
- `frame` : Frame ID (int), and
- `label`: Label of the point (arbitrary strings).

Each set of 2D points sharing the same pair of `frame` and `label` consists of projections of a single 3D point.  The column names can be changed to any strings, except for some special keywords used inside `pycalib.robust`. See the `primary_key`, `key_cid`, `key_x`, and `key_y` options of `pycalib.robust.triangulate_consensus()`.
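
To make the grouping concrete, the sketch below lists which cameras observe each (`frame`, `label`) pair in a tiny hand-made dataframe.  The values are illustrative only, and this is not how `pycalib.robust` groups the data internally.

```python
import pandas as pd

# Tiny illustrative dataframe in the format described above.
df_demo = pd.DataFrame({
    'x':      [2724.5, 2739.2, 2610.2, 2633.8, 2501.7],
    'y':      [1795.9, 1828.2, 1801.3, 1840.6, 1790.4],
    'camera': [0, 0, 1, 1, 2],
    'frame':  [0, 0, 0, 0, 0],
    'label':  ['A', 'B', 'A', 'B', 'A'],
})

# Each (frame, label) group holds the projections of one 3D point.
views = {
    (int(frame), str(label)): g['camera'].tolist()
    for (frame, label), g in df_demo.groupby(['frame', 'label'])
}

for key, cams in views.items():
    # At least two views are needed to triangulate a point.
    print(key, 'cameras:', cams, 'triangulable:', len(cams) >= 2)
```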


In [10]:
# calibration
K, D, R, T, _ = pycalib.util.load_calib('../data/ba/output.json')
P = []
for k, r, t in zip(K, R, T):
    p = k @ np.hstack([r, t])
    P.append(p)
P = np.array(P)
Nc = len(P)

# frames
Nf = 10

# keypoints
Np = 3
LABELS = [chr(ord('A') + i) for i in range(Np)]

# 3D positions
X_gt = ((np.random.random(Nf*Np*3)-0.5)*100).reshape((-1, 3))
X_gt[:,2] = np.abs(X_gt[:,2])

# 2D projections
df_gt = []
for c, (r, t, k, d) in enumerate(zip(R, T, K, D)):
    x, _ = cv2.projectPoints(X_gt.reshape((-1, 1, 3)), cv2.Rodrigues(r)[0], t, k, d)
    df_gt.append(x)
df_gt = pd.DataFrame(np.array(df_gt).reshape((-1, 2)), columns=['x', 'y'])
df_gt['camera'] = np.repeat(np.arange(Nc), Nf*Np)
df_gt['frame'] = np.tile(np.repeat(np.arange(Nf), Np), Nc)
df_gt['label'] = np.tile(LABELS, (Nf, Nc)).flatten()

print(df_gt)

assert df_gt['camera'].min() == 0
assert df_gt['camera'].max() == Nc-1


               x            y  camera  frame label
0    2772.516536  1787.419612       0      0     A
1    2441.345304  1970.267781       0      0     B
2    2598.919581  1878.666770       0      0     C
3    2790.010323  1919.305741       0      1     A
4    2432.601207  1791.026260       0      1     B
..           ...          ...     ...    ...   ...
445  2637.268239  2498.420926      14      8     B
446  2822.201354  2690.787005      14      8     C
447  2402.089874  2446.498896      14      9     A
448  2796.040765  2444.238744      14      9     B
449  2899.838088  2547.079736      14      9     C

[450 rows x 5 columns]


## Noisy data

This cell generates three datasets by injecting noise into the ground-truth dataset.

1. w/ occlusion,
2. w/ occlusion and 2D Gaussian noise, and
3. w/ occlusion, 2D Gaussian noise, and outliers.

The format (column names) is identical to the ground-truth dataset.

In [3]:
# Prepare noisy data
occlusion_ratio = 0.5
outlier_ratio = 0.1
outlier_min = 100
outlier_max = 1000
noise_px = 5

## drop some observations
df_subset = df_gt.drop(np.random.choice(df_gt.index, int(len(df_gt)*occlusion_ratio), replace=False)).reset_index()
df_occluded = df_subset.copy()

## inject gaussian noise
df_subset['x'] += np.random.normal(scale=noise_px, size=len(df_subset))
df_subset['y'] += np.random.normal(scale=noise_px, size=len(df_subset))
df_occluded_noisy = df_subset.copy()

## inject outliers
idx = np.random.choice(df_subset.index, int(len(df_subset)*outlier_ratio), replace=False)
df_subset.loc[idx,'x'] += np.random.uniform(low=outlier_min, high=outlier_max, size=len(idx)) * np.random.choice([-1,1], size=len(idx))
df_subset.loc[idx,'y'] += np.random.uniform(low=outlier_min, high=outlier_max, size=len(idx)) * np.random.choice([-1,1], size=len(idx))
df_occluded_noisy_outlier = df_subset.copy()


## Triangulation

This cell verifies that `pycalib.robust.triangulate_consensus()` can recover the ground-truth 3D points from the ground-truth 2D points.  The output is a pair of `pd.DataFrame`s.  The first one holds the result of the triangulation with the following columns.

- `frame`: frame ID,
- `label`: label of the point,
- `X`, `Y`, `Z`: 3D points with the highest consensus (== initial guess by binocular triangulation),
- `reproj`: tuple of length $N_c$ representing the reprojection errors,
- `outliers`: tuple of outlier camera IDs,
- `inliers`: tuple of inlier camera IDs,
- `n_outliers`: number of outliers,
- `n_inliers`: number of inliers, and
- `X_in`, `Y_in`, `Z_in`: 3D points triangulated w/ all inliers (== final output).

```:txt
    frame label          X          Y          Z   
0       0     A  -15.04173  45.726018  47.712414  \
1       0     B -11.975937  31.341524  48.409684   
2       0     C -42.903489   9.868657  33.510149   
3       1     A  39.137465 -22.033629  36.751023   
...
                                               reproj outliers   
0   [2.0114642751440117e-10, 1.237490539709991e-09...       ()  \
1   [2.5480223396404e-09, 4.211610054670847e-09, 5...       ()   
2   [6.382931077027202e-10, 3.2222949335949804e-10...       ()   
3   [9.037467216177659e-11, 3.0293490293876065e-09...       ()   
...
                                              inliers  n_outliers  n_inliers   
0   (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...           0         15  \
1   (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...           0         15   
2   (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...           0         15   
3   (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...           0         15   
...
         X_in       Y_in       Z_in  
0  -15.041730  45.726018  47.712414  
1  -11.975937  31.341524  48.409684  
2  -42.903489   9.868657  33.510149  
3   39.137465 -22.033629  36.751023  
```

The second one is a copy of the input dataframe `df` with lens undistortion applied.  Note that when `distorted=False`, it is identical to `df`.

- options `primary_key`, `key_cid`, `key_x`, and `key_y` allow changing the column names.
- option `nproc` specifies the number of concurrent processes (default: number of CPUs).
- option `reproj_th` specifies the maximum reprojection error in pixels for an observation to be accepted as an inlier.
- option `show_pbar` toggles the progress bar.
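
To illustrate the idea behind the columns and the `reproj_th` option, here is a minimal NumPy sketch of brute-force consensus triangulation for a single 3D point: every camera pair is triangulated by DLT, the pair whose result agrees with the most cameras wins, and the point is then re-triangulated from all inliers.  This is an assumption-laden toy, not the implementation in `pycalib.robust`; the function names, the toy cameras, and the threshold of 10 px are all hypothetical.

```python
from itertools import combinations
import numpy as np

def triangulate_dlt(Ps, xs):
    """Linear (DLT) triangulation of one 3D point from two or more views."""
    A = []
    for P, (u, v) in zip(Ps, xs):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                      # null vector of A (homogeneous point)
    return X[:3] / X[3]

def reproj_errors(Ps, xs, X):
    """Reprojection error (px) of X in every camera."""
    proj = Ps @ np.append(X, 1.0)   # (N, 3)
    proj = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(proj - xs, axis=1)

def consensus_triangulate(Ps, xs, reproj_th=10.0):
    """Try all camera pairs and keep the one with the most inliers."""
    Ps, xs = np.asarray(Ps, float), np.asarray(xs, float)
    best_X, best_inliers = None, np.zeros(len(Ps), dtype=bool)
    for i, j in combinations(range(len(Ps)), 2):
        X = triangulate_dlt(Ps[[i, j]], xs[[i, j]])
        inliers = reproj_errors(Ps, xs, X) < reproj_th
        if inliers.sum() > best_inliers.sum():
            best_X, best_inliers = X, inliers
    if best_X is None:
        raise ValueError('no camera pair produced any inlier')
    # Re-triangulate with all inliers of the best pair.
    X_in = triangulate_dlt(Ps[best_inliers], xs[best_inliers])
    return best_X, X_in, best_inliers

# Toy setup: 5 cameras on a line, all looking along +Z (hypothetical values).
K = np.array([[1000.0, 0, 500], [0, 1000.0, 500], [0, 0, 1]])
Ps = np.array([K @ np.hstack([np.eye(3), [[c - 2.0], [0.0], [0.0]]])
               for c in range(5)])
X_true = np.array([0.2, -0.1, 5.0])
proj = Ps @ np.append(X_true, 1.0)
xs = proj[:, :2] / proj[:, 2:3]
xs[3] += 200.0                      # turn camera 3 into an outlier

best_X, X_in, inliers = consensus_triangulate(Ps, xs)
print('inlier cameras :', np.flatnonzero(inliers))
print('outlier cameras:', np.flatnonzero(~inliers))
print('X_in           :', X_in)
```

In this noise-free toy setup, camera 3 should be the only rejected view, and `best_X` and `X_in` should both coincide with `X_true`.  With noisy observations the two generally differ, which is the difference between the `X`, `Y`, `Z` and `X_in`, `Y_in`, `Z_in` columns.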


In [4]:
X, df_undistorted = pycalib.robust.triangulate_consensus(df_gt, P, distorted=True, camera_matrix=K, dist_coeffs=D)
np.testing.assert_equal(X['n_outliers'].to_numpy(), 0, err_msg="no outliers")
np.testing.assert_allclose(X[['X','Y','Z']].to_numpy().astype(float), X_gt, err_msg="should return GT values", rtol=1e-6)

  0%|          | 0/105 [00:00<?, ?it/s]

## Triangulation (w/ occlusion)

This cell verifies that

- occlusion does not introduce outliers, and
- the triangulated 3D points are identical to the ground truth.


In [5]:
X, df_occluded_undistorted = pycalib.robust.triangulate_consensus(df_occluded, P, distorted=True, camera_matrix=K, dist_coeffs=D)
np.testing.assert_equal(X['n_outliers'].to_numpy(), 0, err_msg="no outliers")
np.testing.assert_allclose(X[['X','Y','Z']].to_numpy().astype(float), X_gt, err_msg="should return GT values", rtol=1e-6)

  0%|          | 0/105 [00:00<?, ?it/s]

## Triangulation (w/ occlusion + Gaussian noise)

This cell verifies that

- noisy 2D input can reconstruct 3D points close enough to GT, and
- triangulation using all the inliers (`X_in`, `Y_in`, `Z_in`) may improve the reconstruction compared to the binocular reconstruction (`X`, `Y`, `Z`) with the highest consensus.
  - the improvement can be very small, so `re_triangulate=False` can be used to disable this step.


In [6]:
X, df_occluded_noisy_undistorted = pycalib.robust.triangulate_consensus(df_occluded_noisy, P, distorted=True, camera_matrix=K, dist_coeffs=D)
#np.testing.assert_equal(X['n_outliers'].to_numpy(), 0)

Y = X[['X','Y','Z']].to_numpy().astype(float)
e = np.mean(np.linalg.norm(X_gt - Y, axis=1))
l = np.mean(np.abs(X_gt))
print(f'Mean 3D displacement error = {e}, mean |X| = {l}, ratio={e/l}')
#np.testing.assert_allclose(X[['X','Y','Z']].to_numpy().astype(float), X_gt, rtol=1e-6)

Y = X[['X_in','Y_in','Z_in']].to_numpy().astype(float)
e = np.mean(np.linalg.norm(X_gt - Y, axis=1))
l = np.mean(np.abs(X_gt))
print(f'Mean 3D displacement error (w/ all inliers) = {e}, mean |X| = {l}, ratio={e/l}')


  0%|          | 0/105 [00:00<?, ?it/s]

Mean 3D displacement error = 1.0933263213558138, mean |X| = 25.10845093559927, ratio=0.04354415667298987
Mean 3D displacement error (w/ all inliers) = 0.8851353217124375, mean |X| = 25.10845093559927, ratio=0.03525248626379713


## Triangulation (w/ occlusion + Gaussian noise + outliers)

This cell verifies that

- noisy 2D input containing outliers can still reconstruct 3D points close enough to GT, and
- triangulation using all the inliers (`X_in`, `Y_in`, `Z_in`) may improve the reconstruction compared to the binocular reconstruction (`X`, `Y`, `Z`) with the highest consensus.
  - the improvement can be very small, so `re_triangulate=False` can be used to disable this step.


In [7]:
X, df_occluded_noisy_outlier_undistorted = pycalib.robust.triangulate_consensus(df_occluded_noisy_outlier, P, distorted=True, camera_matrix=K, dist_coeffs=D)
Y = X[['X','Y','Z']].to_numpy().astype(float)
e = np.mean(np.linalg.norm(X_gt - Y, axis=1))
l = np.mean(np.abs(X_gt))
print(f'Mean 3D displacement error = {e}, mean |X| = {l}, ratio={e/l}')

Y = X[['X_in','Y_in','Z_in']].to_numpy().astype(float)
e = np.mean(np.linalg.norm(X_gt - Y, axis=1))
l = np.mean(np.abs(X_gt))
print(f'Mean 3D displacement error (w/ all inliers) = {e}, mean |X| = {l}, ratio={e/l}')


  0%|          | 0/105 [00:00<?, ?it/s]

Mean 3D displacement error = 1.175635735943928, mean |X| = 25.10845093559927, ratio=0.04682231249388181
Mean 3D displacement error (w/ all inliers) = 0.9319267337224175, mean |X| = 25.10845093559927, ratio=0.03711605849810165


### False-positive / false-negative outliers

`df_occluded_noisy_outlier` is synthesized by injecting outliers with large displacements (between `outlier_min` and `outlier_max` px along each axis) from the ground-truth 2D positions.  This cell checks whether these outliers are correctly identified by `triangulate_consensus`.

- `TP` line: an injected outlier is correctly detected as an outlier.
- `FP` line: the camera is falsely labeled as an outlier, due to the injected Gaussian noise.
- `FN` line: an injected outlier is not detected as an outlier.  This should not happen and would indicate a bug.
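
The classification itself reduces to set operations on camera IDs.  The sketch below shows this for a single (`frame`, `label`) pair; `classify_outliers` is a hypothetical helper, not part of `pycalib`, and the camera IDs are illustrative.

```python
def classify_outliers(gt, est):
    """Compare ground-truth and estimated outlier camera IDs for one point."""
    gt, est = set(gt), set(est)
    return {
        'TP': sorted(gt & est),   # injected and detected
        'FP': sorted(est - gt),   # detected but not injected
        'FN': sorted(gt - est),   # injected but missed
    }

# Camera 8 was injected as an outlier; cameras 7 and 8 were flagged.
result = classify_outliers(gt=[8], est=[7, 8])
print(result)  # {'TP': [8], 'FP': [7], 'FN': []}
```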

In [8]:
# GT outliers = diff between df_occluded_noisy and df_occluded_noisy_outlier
e = df_occluded_noisy[['x', 'y']] - df_occluded_noisy_outlier[['x','y']]
outlier_gt = np.linalg.norm(e.to_numpy(), axis=1) > 0
outlier_gt = df_occluded_noisy_outlier.loc[outlier_gt].sort_values(by=['frame', 'label'])

# Estimated outliers
outlier_est = X[['frame', 'label', 'outliers', 'reproj']]

# Show differences between GT and estimated outliers
df = pd.merge(outlier_gt, outlier_est, how='outer', on=['frame', 'label'], suffixes=['', '_est'])
for g, d in df.groupby(by=['frame', 'label']):
    gt_is_empty = d['camera'].isnull().all()
    o_est = d['outliers'].tolist()[0]
    reproj = d['reproj'].tolist()[0]
    if gt_is_empty:
        o_gt = []
    else:
        o_gt = d['camera'].astype(int).tolist()

    for i in o_est:
        if i not in o_gt:
            print(f'FP: Frame {g[0]}, Label {g[1]}, Camera {i}, reproj={reproj[i]:.2f}px')
    for i in o_gt:
        if i in o_est:
            print(f'TP: Frame {g[0]}, Label {g[1]}, Camera {i}, reproj={reproj[i]:.2f}px')
        else:
            print(f'FN: Frame {g[0]}, Label {g[1]}, Camera {i}, reproj={reproj[i]:.2f}px')


TP: Frame 0, Label B, Camera 10, reproj=746.61px
FP: Frame 0, Label C, Camera 7, reproj=10.70px
TP: Frame 1, Label A, Camera 7, reproj=880.39px
FP: Frame 1, Label B, Camera 7, reproj=13.10px
TP: Frame 1, Label B, Camera 8, reproj=1406.81px
FP: Frame 1, Label C, Camera 2, reproj=12.92px
TP: Frame 1, Label C, Camera 10, reproj=1140.10px
TP: Frame 2, Label B, Camera 6, reproj=1119.07px
TP: Frame 2, Label B, Camera 11, reproj=1088.76px
FP: Frame 2, Label C, Camera 4, reproj=13.20px
TP: Frame 3, Label A, Camera 14, reproj=1253.99px
FP: Frame 3, Label B, Camera 7, reproj=13.70px
TP: Frame 3, Label B, Camera 2, reproj=897.40px
TP: Frame 3, Label B, Camera 5, reproj=613.04px
FP: Frame 3, Label C, Camera 10, reproj=11.57px
TP: Frame 3, Label C, Camera 9, reproj=762.22px
TP: Frame 3, Label C, Camera 12, reproj=776.71px
FP: Frame 4, Label A, Camera 2, reproj=11.07px
FP: Frame 4, Label A, Camera 8, reproj=16.88px
FP: Frame 4, Label A, Camera 9, reproj=17.14px
TP: Frame 4, Label B, Camera 12, repro

## Exercises

1. Implement RANSAC.  `triangulate_consensus()` currently triangulates 3D points using all possible pairs of the input cameras, and the number of pairs grows quadratically with the number of cameras.
1. Implement bundle adjustment, i.e., non-linear optimization of the reprojection errors.  `triangulate_consensus()` currently triangulates 3D points by DLT, which is a linear method.