# Bundle adjustment of $n$ cameras

## Goal

Suppose $n$ cameras have already been calibrated linearly, for example from 2D-2D correspondences.  We assume an initial guess of $K$, $R$, and $t$ for each camera satisfying $x \sim K (R X + t)$, together with the 2D observations $x$ and the 3D points $X$.

* The initial guess need not come from a linear method; it can even be set by hand.
* The 3D point $X$ can be triangulated from $x$ using the initial $K, R, t$.
* The $n$ cameras do not necessarily observe all the points.
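Triangulating $X$ from its observations with the initial $K, R, t$ can be done linearly. The sketch below uses the standard DLT (direct linear transform) on the projection matrices $P_i = K_i [R_i \mid t_i]$ of the cameras that see the point. The name `triangulate_dlt` is illustrative. It is not `pycalib.calib.triangulate`, whose exact behavior is not shown in this notebook.

```python
import numpy as np

def triangulate_dlt(xs, Ps):
    """Linear (DLT) triangulation of a single 3D point.

    xs: (m, 2) pixel observations of the point in m >= 2 cameras
    Ps: (m, 3, 4) projection matrices P_i = K_i [R_i | t_i]
    Returns the point as a length-3 array.
    """
    rows = []
    for (u, v), P in zip(xs, Ps):
        # x ~ P X gives two linear equations per view in the homogeneous X:
        # u * (P[2] @ X) - P[0] @ X = 0  and  v * (P[2] @ X) - P[1] @ X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.array(rows)
    # Minimize ||A X|| subject to ||X|| = 1: right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]

def project(P, X):
    """Pinhole projection of one 3D point with a 3x4 matrix P."""
    p = P @ np.append(X, 1.0)
    return p[:2] / p[2]

# Usage: two VGA-like cameras, the second one shifted along x
K = np.array([[600.0, 0, 320], [0, 600, 240], [0, 0, 1]])
Ps = np.array([K @ np.hstack((np.eye(3), np.array([[0.0], [0.0], [5.0]]))),
               K @ np.hstack((np.eye(3), np.array([[-1.0], [0.0], [5.0]])))])
X_true = np.array([0.2, -0.1, 0.3])
xs = np.array([project(P, X_true) for P in Ps])
X_rec = triangulate_dlt(xs, Ps)  # equals X_true up to floating-point error
```

With noisy observations the DLT point is only an algebraic least-squares estimate. This is one reason it serves as an initial guess that bundle adjustment then refines.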

Given $K, d, R, t, x, X$, where $d$ denotes the lens distortion coefficients, this notebook optimizes $K, d, R, t, X$ to minimize the reprojection error.
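Written out in the usual least-squares form, and assuming a plain squared loss, the objective is the sum of squared reprojection errors over all observed camera-point pairs. Here $x_{ij}$ is the observation of point $j$ in camera $i$. $\mathcal{V}$ is the set of pairs $(i, j)$ such that camera $i$ observes point $j$. $\pi(K, d, Y)$ projects a camera-frame point $Y$ by dividing by its depth, applying the distortion $d$, and then applying $K$. The symbols $\mathcal{V}$ and $\pi$ are introduced here for illustration.

$$
\min_{\{K_i,\, d_i,\, R_i,\, t_i\}_{i=1}^{n},\ \{X_j\}} \;
\sum_{(i, j) \in \mathcal{V}} \left\| x_{ij} - \pi\!\left(K_i, d_i, R_i X_j + t_i\right) \right\|^2
$$

For $d_i = 0$, $\pi$ reduces to the pinhole model $x_{ij} \sim K_i (R_i X_j + t_i)$ above.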

* Input:
  * initial guess of $K, d, R, t, X$ and 2D observations $x$
  * (optional) mask specifying the parameters to optimize
* Output:
  * optimal $K, d, R, t, X$
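The reprojection error can be sketched as one flat residual vector with one entry pair per 2D observation. The sketch below makes two assumptions. First, it uses a distortion-free pinhole model. Second, it uses the index-array layout passed to `bundle_adjustment` later in this notebook: one `(camera, point)` index pair per observation. That layout also covers cameras that do not observe every point. `reprojection_residuals` is a hypothetical helper, not part of pycalib.

```python
import numpy as np

def reprojection_residuals(Ks, Rs, ts, X, cam_idx, pt_idx, x_obs):
    """Residuals x_obs - proj(K_i (R_i X_j + t_i)) for every observation.

    Ks, Rs: (n, 3, 3) intrinsics and rotations; ts: (n, 3) translations
    X: (N, 3) 3D points
    cam_idx, pt_idx: (M,) camera and point index of each observation
    x_obs: (M, 2) observed pixel coordinates
    Returns a flat (2M,) residual vector.
    """
    # Move each observed point into the frame of the camera that observes it
    Xc = np.einsum('mij,mj->mi', Rs[cam_idx], X[pt_idx]) + ts[cam_idx]
    # Apply the intrinsics and dehomogenize: x ~ K (R X + t)
    p = np.einsum('mij,mj->mi', Ks[cam_idx], Xc)
    x_proj = p[:, :2] / p[:, 2:3]
    return (x_obs - x_proj).ravel()

# Usage: camera 0 sees points 0, 1, 2; camera 1 sees only points 0 and 2
K = np.array([[600.0, 0, 320], [0, 600, 240], [0, 0, 1]])
Ks = np.stack([K, K])
Rs = np.stack([np.eye(3), np.eye(3)])
ts = np.array([[0.0, 0, 5], [-1.0, 0, 5]])
X = np.array([[0.0, 0, 0], [0.5, 0, 0], [0, 0.5, 0]])
cam_idx = np.array([0, 0, 0, 1, 1])
pt_idx = np.array([0, 1, 2, 0, 2])
# Exact projections, except that the last observation is off by +1 pixel in u
x_obs = np.array([[320, 240], [380, 240], [320, 300], [200, 240], [201, 300]], dtype=float)
r = reprojection_residuals(Ks, Rs, ts, X, cam_idx, pt_idx, x_obs)
```

A nonlinear least-squares solver minimizes the squared norm of such a vector over the free parameters.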
 

## Libraries

In [5]:
%matplotlib notebook
import sys, os, cv2
import numpy as np
from glob import glob
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from pycalib.plot import plotCamera
from pycalib.ba import bundle_adjustment, encode_camera_param, decode_camera_param, make_mask
from pycalib.calib import lookat, triangulate

np.random.seed(0)

## Synthetic data

In [6]:
# 3D points
# X_gt = (np.random.rand(16, 3) - 0.5)*5 # random points centered at [0, 0, 0]
X_gt = np.array(np.meshgrid(np.linspace(-1, 1, 3), np.linspace(-1, 1, 3), np.linspace(-1, 1, 3))).reshape((3, -1)).T  # 3D grid points
Np = X_gt.shape[0]
print('X_gt:', X_gt.shape)

# Camera intrinsics
K = np.array([[600, 0, 320], [0, 600, 240], [0, 0, 1]]).astype(np.float64)  # VGA camera

# Camera poses: 5 cameras at the vertices of a regular pentagon of radius 10 in the z = 0 plane
theta = 2 * np.pi / 5 * np.arange(5)
v_gt = np.vstack((10*np.cos(theta), 10*np.sin(theta), np.zeros(theta.shape))).T
Nc = v_gt.shape[0]
R_gt = []
t_gt = []
P_gt = []
rvec_gt = []
for i in range(Nc):
    eye = v_gt[i, :]  # camera center
    R, t = lookat(eye, np.zeros(3), np.array([0, 1, 0]))  # look at the origin
    R_gt.append(R)
    t_gt.append(t)
    P_gt.append(K @ np.hstack((R, t)))
    rvec_gt.append(cv2.Rodrigues(R)[0])
R_gt = np.array(R_gt)
t_gt = np.array(t_gt)
P_gt = np.array(P_gt)
rvec_gt = np.array(rvec_gt)
print('R_gt:', R_gt.shape)
print('t_gt:', t_gt.shape)
print('P_gt:', P_gt.shape)
print('rvec_gt:', rvec_gt.shape)

# 2D observations points
x_gt = []
for i in range(Nc):
    xt = cv2.projectPoints(X_gt.reshape((-1, 1, 3)), rvec_gt[i], t_gt[i], K, None)[0].reshape((-1, 2))
    x_gt.append(xt)
x_gt = np.array(x_gt)
print('x_gt:', x_gt.shape)

# Verify triangulation
Y = []
for i in range(Np):
    y = triangulate(x_gt[:,i,:].reshape((-1,2)), P_gt)
    Y.append(y)
Y = np.array(Y).T
Y = Y[:3,:] / Y[3,:]
assert np.allclose(X_gt, Y.T), 'triangulation does not recover X_gt'

# Verify z > 0 at each camera
for i in range(Nc):
    Xc = R_gt[i] @ X_gt.T + t_gt[i]
    assert np.all(Xc[2, :] > 0)

    
# Inject Gaussian noise into the initial guess
R_est = R_gt.copy()
t_est = t_gt.copy()
K_est = np.array([K for c in range(Nc)])
X_est = X_gt.copy()
x_est = x_gt.copy()

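for i in range(Nc):
    # Perturb the rotation in axis-angle (Rodrigues) space, and the translation
    R_est[i] = cv2.Rodrigues(cv2.Rodrigues(R_est[i])[0] + np.random.normal(0, 0.01, (3, 1)))[0]
    t_est[i] += np.random.normal(0, 0.01, (3, 1))
    # Perturb the focal length (fx = fy) with a standard deviation of 10% of its value
    K_est[i][0, 0] = K_est[i][1, 1] = K_est[i][0, 0] + np.random.normal(0, K_est[i][0, 0] / 10)

# Perturb the 3D points and the 2D observations (in pixels)
X_est += np.random.normal(0, 0.01, X_est.shape)
x_est += np.random.normal(0, 0.1, x_est.shape)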

X_gt: (27, 3)
R_gt: (5, 3, 3)
t_gt: (5, 3, 1)
P_gt: (5, 3, 4)
rvec_gt: (5, 3, 1)
x_gt: (5, 27, 2)
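The synthetic cameras are placed with pycalib's `lookat(eye, target, up)`, whose axis conventions are not shown in this notebook. As an illustration only, the sketch below gives one common look-at construction under the OpenCV camera convention (x right, y down, z forward). `look_at_cv` is a hypothetical name and may differ from pycalib's implementation. For example, it may interpret `up` differently, and it returns `t` with shape `(3,)` rather than `(3, 1)`.

```python
import numpy as np

def look_at_cv(eye, target, up):
    """Return (R, t) with X_cam = R @ X_world + t for a camera at `eye` looking at `target`.

    Assumes the OpenCV convention: z forward, x right, y down (pointing away from `up`).
    """
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    z = target - eye
    z /= np.linalg.norm(z)          # optical axis
    x = np.cross(z, up)
    x /= np.linalg.norm(x)          # image x axis, orthogonal to both z and up
    y = np.cross(z, x)              # completes a right-handed frame; points away from `up`
    R = np.vstack((x, y, z))        # rows are the camera axes expressed in world coordinates
    t = -R @ eye                    # maps the camera center to the camera-frame origin
    return R, t

# Camera on the +X axis at distance 10, looking at the origin as in the synthetic setup
R, t = look_at_cv([10, 0, 0], [0, 0, 0], [0, 1, 0])
origin_in_cam = R @ np.zeros(3) + t  # equals [0, 0, 10]: the origin lies 10 units in front
```

Checking that every point has positive depth, as the synthetic-data cell does with `Xc[2, :] > 0`, is the natural sanity test for any such construction.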


## Bundle adjustment

The initial camera parameters `camera_params` form an $n \times 17$ matrix, each row of which holds the parameters of one camera:

- `0:3`: $R$ (Rodrigues vector),
- `3:6`: $t$,
- `6`: $f$,
- `7`: $u_0$,
- `8`: $v_0$,
- `9`: $k_1$,
- `10`: $k_2$,
- `11`: $p_1$,
- `12`: $p_2$,
- `13`: $k_3$,
- `14`: $k_4$,
- `15`: $k_5$, and
- `16`: $k_6$.
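To make the layout concrete, the sketch below unpacks one 17-element row and projects 3D points with it. It assumes OpenCV's distortion model: radial $k_1, k_2, k_3$ with the rational denominator $k_4, k_5, k_6$, and tangential $p_1, p_2$. It also assumes a single focal length $f$ for both axes. The function names are hypothetical. pycalib's own `encode_camera_param` / `decode_camera_param` are the way this notebook converts parameters. A small Rodrigues conversion is written out so that the example needs only NumPy.

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector (axis * angle) -> 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * Kx @ Kx

def project_with_params(c, X):
    """Project (N, 3) world points with one 17-element camera row `c`."""
    R = rodrigues(c[0:3])
    t = c[3:6]
    f, u0, v0 = c[6:9]
    k1, k2, p1, p2, k3, k4, k5, k6 = c[9:17]
    Xc = X @ R.T + t                                   # world -> camera
    a, b = Xc[:, 0] / Xc[:, 2], Xc[:, 1] / Xc[:, 2]    # normalized image coordinates
    r2 = a**2 + b**2
    radial = (1 + k1*r2 + k2*r2**2 + k3*r2**3) / (1 + k4*r2 + k5*r2**2 + k6*r2**3)
    xd = a*radial + 2*p1*a*b + p2*(r2 + 2*a**2)        # radial + tangential distortion
    yd = b*radial + p1*(r2 + 2*b**2) + 2*p2*a*b
    return np.column_stack((f*xd + u0, f*yd + v0))

# R = identity, t = (0, 0, 5), f = 600, (u0, v0) = (320, 240), no distortion
c = np.zeros(17)
c[3:6] = [0, 0, 5]
c[6:9] = [600, 320, 240]
uv = project_with_params(c, np.array([[0.0, 0, 0], [1.0, 0, 0]]))
# uv is [[320, 240], [440, 240]]
```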

To select which camera parameters are optimized, pass the `mask` argument.  `mask` can be one of

- a 17-dim `bool` vector,
  - which specifies the parameters to be optimized for all the cameras at once,
- an $n \times 17$ `bool` matrix,
  - which specifies the parameters to be optimized for each camera individually, or
- an $n \times 17$ `int` matrix,
  - which marks the parameters to be optimized for each camera with positive integers (0 means fixed); parameters with the same mask value share the same value.

That is, specifying a 17-dim `bool` vector `m` is equivalent to specifying an $n \times 17$ `bool` matrix `M` each of whose rows is `m`. It is also equivalent to specifying an $n \times 17$ `int` matrix that is zero where `M` is `False` and holds a distinct positive value at each entry where `M` is `True`.
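The equivalence above can be sketched in NumPy. First expand a 1D `bool` mask to $n$ rows. Then number the `True` entries to obtain an `int` mask. Finally, gather one free value per distinct positive label, and scatter it back after optimization. This illustrates the idea only. The helper names are hypothetical, and pycalib's `bundle_adjustment` may implement the mapping differently, for example in which occurrence supplies the initial value of a shared parameter. Only the distinctness of labels matters. This sketch numbers the free entries consecutively, whereas the notebook's cell numbers them by flat position.

```python
import numpy as np

def to_int_mask(mask, n):
    """Normalize a 1D bool, 2D bool, or 2D int mask to an (n, 17) int mask (0 = fixed)."""
    mask = np.asarray(mask)
    if mask.ndim == 1:
        mask = np.tile(mask, (n, 1))                # same mask for every camera
    if mask.dtype == bool:
        ids = np.zeros(mask.shape, dtype=int)
        ids[mask] = np.arange(1, mask.sum() + 1)    # distinct positive label per True entry
        return ids
    return mask.astype(int)

def gather_shared(params, ids):
    """Return (labels, values): one free value per distinct positive label."""
    labels = np.unique(ids[ids > 0])
    # Initial value of each label: its first occurrence in row-major order
    first = [np.flatnonzero(ids.ravel() == lab)[0] for lab in labels]
    return labels, params.ravel()[first]

def scatter_shared(values, labels, params, ids):
    """Write the free values back into a full (n, 17) parameter matrix."""
    out = np.array(params, dtype=float)
    for lab, val in zip(labels, values):
        out[ids == lab] = val                       # entries sharing a label get the same value
    return out

# Usage: 3 cameras, per-camera R and t, and a single focal length f shared by all
m = np.zeros(17, dtype=bool)
m[:7] = True                                        # R, t, and f are free
ids = to_int_mask(m, 3)                             # labels 1..21, nothing shared yet
ids[:, 6] = ids[0, 6]                               # share f: every camera uses label 7
params = np.zeros((3, 17))
params[:, 6] = [600.0, 610.0, 590.0]                # per-camera initial focal lengths
labels, values = gather_shared(params, ids)         # 19 free values; f starts at 600
full = scatter_shared(values, labels, params, ids)  # every camera now has f = 600
```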

Possible use cases of this mask include:

1. Moving camera: optimize $R, t$ of each camera (= frame), while optimizing a single $K, d$ shared by all the cameras.
2. Pan-tilt camera: optimize $R$ of each camera (= frame), while optimizing a single $t, K, d$ shared by all the cameras.
3. Pan-tilt-zoom camera: optimize $R, K, d$ of each camera (= frame), while optimizing a single $t$ shared by all the cameras.
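Each scenario translates directly into an `int` mask. Give each per-frame parameter its own label in every row, and give each shared parameter a single label used in all rows. The hypothetical helper below assumes the 17-column layout above, with $k_4$, $k_5$, $k_6$ left fixed (label 0).

```python
import numpy as np

R_COLS = [0, 1, 2]
T_COLS = [3, 4, 5]
K_COLS = [6, 7, 8]                  # f, u0, v0
D_COLS = [9, 10, 11, 12, 13]        # k1, k2, p1, p2, k3 (k4, k5, k6 stay fixed)

def scenario_mask(n, per_frame, shared, n_params=17):
    """Int mask: a fresh label per camera for each `per_frame` column,
    and a single label across all cameras for each `shared` column."""
    ids = np.zeros((n, n_params), dtype=int)
    next_label = 1
    for i in range(n):
        for col in per_frame:
            ids[i, col] = next_label
            next_label += 1
    for col in shared:
        ids[:, col] = next_label
        next_label += 1
    return ids

n = 5
moving = scenario_mask(n, R_COLS + T_COLS, K_COLS + D_COLS)      # 1. moving camera
pan_tilt = scenario_mask(n, R_COLS, T_COLS + K_COLS + D_COLS)    # 2. pan-tilt camera
ptz = scenario_mask(n, R_COLS + K_COLS + D_COLS, T_COLS)         # 3. pan-tilt-zoom camera
```

For $n = 5$ these masks leave 38, 26, and 58 free parameters, respectively. Up to label numbering, `moving` has the same structure as the mask printed by the cell that follows.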

The following cell demonstrates the first scenario in which a single $K, d$ is shared by all the cameras.

In [None]:
# Initial camera parameters
camera_params = []
for i in range(Nc):
    c = encode_camera_param(R_est[i], t_est[i], K_est[i], np.zeros(5))
    camera_params.append(c)
camera_params = np.array(camera_params)

# camera_indices[i] == the camera observes point_2d[i,:]
camera_indices = np.repeat(np.arange(Nc), Np)

# point_indices[i] == the 3D point behind point_2d[i,:]
point_indices = np.tile(np.arange(Np), Nc)

# Optimization target: 1D mask == all cameras use the same mask
# Optimize R, t, f, u0, v0, k1, k2, p1, p2, k3; keep k4, k5, k6 (rational lens distortion model) fixed
mask = make_mask(True, True, f=True, u0=True, v0=True, k1=True, k2=True, p1=True, p2=True, k3=True, k4=False, k5=False, k6=False)

# (Optional) 2D bool mask == Nc x mask == camera-wise mask
mask = np.tile(mask, (Nc, 1))
mask[1, 6:] = False  # fix the intrinsics of CAM1 (overridden below, where the intrinsics become shared)

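# (Optional) 2D int mask to indicate shared params: a unique positive label per free parameter, 0 = fixed
m = np.arange(mask.size).reshape(mask.shape) + 1
m[np.logical_not(mask)] = 0
mask = m
# Share the intrinsic params: every camera reuses CAM0's labels for f, u0, v0, and the distortion coefficients
mask[:, 6:] = mask[0, 6:]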

print('Mask = Nc x 17 (R:3, t:3, f, u0, v0, k1, k2, p1, p2, k3, k4, k5, k6)')
print('0 = fixed, same int = shared')
print(mask)

# bundle_adjustment accepts both 1D and 2D masks
cam_opt, X_opt, e, ret = bundle_adjustment(camera_params, X_est, camera_indices, point_indices, x_est.reshape((-1, 2)), mask=mask)


print('reproj error = ', e)
print(X_gt)
print(X_opt)

for i in range(Nc):
    print(f'\nCamera {i}')
    print('before:')
    print(camera_params[i])
    print('after:')
    print(cam_opt[i])

    assert np.allclose(cam_opt[i, 6:], cam_opt[0, 6:]), 'Camera intrinsics are not shared after optimization'

assert e < 0.3014606499808387 # shared intrinsics


Mask = Nc x 17 (R:3, t:3, f, u0, v0, v1, k2, p1, p2, k3, k4, k5, k6)
0 = fixed, same int = shared
[[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14  0  0  0]
 [18 19 20 21 22 23  7  8  9 10 11 12 13 14  0  0  0]
 [35 36 37 38 39 40  7  8  9 10 11 12 13 14  0  0  0]
 [52 53 54 55 56 57  7  8  9 10 11 12 13 14  0  0  0]
 [69 70 71 72 73 74  7  8  9 10 11 12 13 14  0  0  0]]
   Iteration     Total nfev        Cost      Cost reduction    Step norm     Optimality   
       0              1         3.3163e+03                                    2.97e+03    
       1              2         2.1626e+00      3.31e+03       6.40e+01       1.67e+02    
       2              3         6.8656e-01      1.48e+00       3.40e+02       1.07e+00    
       3              4         6.8588e-01      6.77e-04       1.77e+01       9.12e-01    
       4              5         6.8545e-01      4.24e-04       3.14e+01       3.97e-02    
       5              6         6.8545e-01      3.47e-07       5.18e-01       4.13e-0