# Capturing Light: The Pinhole Camera Model 
To create images from our 3D representations, we need to understand how cameras capture light. The pinhole camera model is a simple yet powerful way to describe this process. Imagine a box with a tiny hole on one side (the pinhole) and a light-sensitive surface (the image plane) on the opposite side. Light rays from the 3D world pass through the pinhole and project an inverted image onto the image plane.

| ![2d-train](https://imgur.com/4hoTcNA.png) |
| :---: |
| **Figure 1**: Pinhole camera. |



## Camera Model
The model relies on two coordinate systems and three geometric elements:

1. World Coordinates: $(x_w, y_w, z_w)$ to describe points in the 3D world.
2. Image Coordinates: $(x, y)$ to describe points on the 2D image plane.
3. Pinhole (Center of Projection): Located at the origin $(0, 0, 0)$. For now, the camera sits at the origin of the world coordinates.
4. Image Plane: Located at a distance $f$ (the focal length) along the Z-axis, with its center at $(0, 0, f)$.
5. Projection: A 3D point $P = (x_w, y_w, z_w)$ is projected through the pinhole onto the image plane at point $p = (x, y)$.


The goal is to understand how to transform coordinates from one of these frames to another.

| ![2d-train](https://imgur.com/EvLTR2w.png) |
| :---: |
| **Figure 2**: Camera model. |


## From World to Camera

### World Coordinates
We usually represent the **world** coordinates using a vector $\mathbf{X}_w\in\mathbb{R}^3$ defined as:
$$
\mathbf{X}_w=\begin{bmatrix}
x_w \\
y_w \\
z_w
\end{bmatrix}
$$

### Camera Coordinates
This is a dynamic reference frame that moves as we move the camera to take pictures. If a point $P$ has coordinates $\mathbf{X}_w$ in the *world frame*, in the **camera frame** we assign the coordinates:
$$
\mathbf{X}_c=\begin{bmatrix}
x_c \\
y_c \\
z_c
\end{bmatrix}
$$

**The location of point $P$ does not change. Only the way we look at the point changes with the change in the camera's reference frame.**

### Coordinate Transformation
It is possible to transform the coordinates of point $P$ from world to camera using the following equation:
$$
\mathbf{X}_c=R\,(\mathbf{X}_w-\mathbf{C}_w),
$$
where:
1. $R$ is a matrix representing the **orientation** of the camera with respect to the *world* coordinates.
2. $\mathbf{C}_w$ is a vector representing the **position** of the camera with respect to the *world* coordinates.

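Expanding the product gives $\mathbf{X}_c=R\,\mathbf{X}_w-R\,\mathbf{C}_w$. One common way to express this is as a single matrix acting on the homogeneous world coordinates:
$$
\mathbf{X}_c=C_{ext}\begin{bmatrix}
\mathbf{X}_w \\
1
\end{bmatrix},
\qquad
C_{ext}=\begin{bmatrix}
R & -R\,\mathbf{C}_w
\end{bmatrix}
$$
The matrix $C_{ext}$ is known as the **extrinsic parameter matrix** because it holds the rotation and translation values, which are external properties of the camera.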
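The transformation above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original notebook: the helper names `world_to_camera` and `extrinsic_matrix`, the example pose, and the $[R \mid -R\,\mathbf{C}_w]$ layout of the extrinsic matrix are all assumptions made for the example.

```python
import numpy as np


def world_to_camera(X_w, R, C_w):
    """Apply X_c = R (X_w - C_w) to a single 3D point."""
    X_w = np.asarray(X_w, dtype=float)
    C_w = np.asarray(C_w, dtype=float)
    return R @ (X_w - C_w)


def extrinsic_matrix(R, C_w):
    """Stack R and -R C_w into a 3x4 matrix (assumed layout)."""
    t = -R @ np.asarray(C_w, dtype=float)
    return np.hstack([R, t.reshape(3, 1)])


# Hypothetical pose: a rotation about the Z-axis and a camera at (1, 0, 0).
theta = np.pi / 2
R = np.array([
    [np.cos(theta), np.sin(theta), 0.0],
    [-np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
C_w = np.array([1.0, 0.0, 0.0])
X_w = np.array([1.0, 2.0, 5.0])

# Direct form of the equation.
X_c = world_to_camera(X_w, R, C_w)

# Same result through the 3x4 matrix and homogeneous world coordinates.
X_c_h = extrinsic_matrix(R, C_w) @ np.append(X_w, 1.0)

print(X_c, X_c_h)
```

Both routes should give the same camera-frame coordinates. The matrix form is convenient because it can later be chained with the projection step as one product.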


## Image Formation

Let's return to the reference frames of the camera and the image.

| ![2d-train](https://imgur.com/GCMmOLa.png) |
| :---: |
| **Figure 3**: Projective model. |


### Projection to Image Plane
The projection to the image plane is done using the intrinsic parameters of the camera. If a point $P$ has coordinates $\mathbf{X}_c$ in the camera frame, its projection onto the image plane is given by:
$$
\begin{align}
x_i=f\frac{x_c}{z_c} \\
y_i=f\frac{y_c}{z_c}
\end{align}
$$
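The equations above place the image origin at the center of the image plane. In practice the origin is usually the top-left corner of the image, so it is convenient to add offsets $(\delta_x, \delta_y)$ that give the position of the image plane center relative to that corner: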
$$
\begin{align}
x_i=f\frac{x_c}{z_c}+\delta_x \\
y_i=f\frac{y_c}{z_c}+\delta_y
\end{align}
$$
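As a quick numerical check of the projection equations, here is a minimal sketch in plain Python. The function name `project_point` and the sample values are hypothetical. The guard on $z_c$ is an added assumption: the formula divides by $z_c$, so it only makes sense for points in front of the camera.

```python
def project_point(X_c, f, delta_x=0.0, delta_y=0.0):
    """Project a camera-frame point onto the image plane.

    Implements x_i = f * x_c / z_c + delta_x and
               y_i = f * y_c / z_c + delta_y.
    """
    x_c, y_c, z_c = X_c
    if z_c <= 0:
        # Assumption for this sketch: only points in front of the camera project.
        raise ValueError("point must have z_c > 0 to be projected")
    x_i = f * x_c / z_c + delta_x
    y_i = f * y_c / z_c + delta_y
    return x_i, y_i


# Without offsets the origin is the image plane center.
print(project_point((2.0, 1.0, 4.0), f=10.0))  # (5.0, 2.5)

# With offsets the origin moves to the image corner.
print(project_point((2.0, 1.0, 4.0), f=10.0, delta_x=20.0, delta_y=10.0))  # (25.0, 12.5)
```

Doubling $z_c$ halves the distance of the projected point from the image center. This is the familiar perspective effect of distant objects appearing smaller.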
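Multiplying both projection equations by $z_c$ lets us rewrite them as a single matrix product in homogeneous coordinates:
\begin{equation}
\begin{bmatrix}
fx_c+ z_c\delta_x\\
fy_c+ z_c\delta_y\\
~z_c\\
\end{bmatrix}=
\begin{bmatrix}
f&0&\delta_x&0\\
0&f&\delta_y&0\\
0&0&1&0\\
\end{bmatrix} \begin{bmatrix}
x_{c}\\
y_{c}\\
z_{c}\\
1\\
\end{bmatrix}
\tag{3}
\end{equation}
Dividing the first two components of the left-hand side by the third, $z_c$, recovers $x_i$ and $y_i$. The first matrix on the right-hand side of the equation is known as the intrinsic parameter matrix $K$. It is termed intrinsic because it contains the focal length $f$ and the image plane center $(\delta_x, \delta_y)$, which are internal properties of the camera. In practice the two diagonal entries are often allowed to differ, written $f_x$ and $f_y$, to account for non-square pixels.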
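The matrix form can be checked numerically with a short NumPy sketch. The helper names `intrinsic_matrix` and `project_homogeneous` are hypothetical. The sample point is made up, while the focal lengths and offsets reuse the values from the notebook cell that follows.

```python
import numpy as np


def intrinsic_matrix(f_x, f_y, delta_x, delta_y):
    """Build the 3x4 intrinsic matrix used in the equation above."""
    return np.array([
        [f_x, 0.0, delta_x, 0.0],
        [0.0, f_y, delta_y, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ])


def project_homogeneous(K, X_c):
    """Multiply K by the homogeneous camera point, then divide by z_c."""
    u = K @ np.append(np.asarray(X_c, dtype=float), 1.0)
    return u[:2] / u[2]


K = intrinsic_matrix(f_x=10.0, f_y=20.0, delta_x=20.0, delta_y=10.0)
X_c = np.array([2.0, 1.0, 4.0])  # made-up point in the camera frame

pixel = project_homogeneous(K, X_c)

# Cross-check against the per-coordinate projection equations.
expected = np.array([
    10.0 * X_c[0] / X_c[2] + 20.0,
    20.0 * X_c[1] / X_c[2] + 10.0,
])
print(pixel, expected)
```

The division by the third component is what makes the mapping nonlinear in $\mathbf{X}_c$. The matrix product on its own only produces coordinates that are scaled by $z_c$.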


In [None]:
# COLLAPSED
import torch

from nerfstudio.cameras.cameras import Cameras, CameraType
from nerfstudio.utils import plotly_utils as vis

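# Principal point (image plane center offsets) and focal lengths.
cx = 20.0
cy = 10.0
fx = 10.0
fy = 20.0

PERSPECTIVE = CameraType.PERSPECTIVE
FISHEYE = CameraType.FISHEYE

# Identity camera-to-world pose (first three rows), with a leading batch dimension.
c2w = torch.eye(4)[None, :3, :]

# Note: this cell draws the rays of a FISHEYE camera. Pass camera_type=PERSPECTIVE
# to visualize the pinhole model described above.
camera = Cameras(fx=fx, fy=fy, cx=cx, cy=cy, camera_to_worlds=c2w, camera_type=FISHEYE)
fig = vis.vis_camera_rays(camera)
fig.show()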

In [None]:
import plotly.graph_objects as go
import torch
from plotly.subplots import make_subplots

from nerfstudio.cameras import camera_utils
from nerfstudio.cameras.cameras import Cameras, CameraType
from nerfstudio.utils import plotly_utils as vis

# OUTPUT_ONLY
# Resolution of the sample grid that gets warped by the distortion.
height = 15
width = 15

# One subplot per distortion coefficient, each swept between its min and max value.
distortion_names = ("k1", "k2", "p1", "k3", "k4", "p2")
distortion_min = (0.01, 0.001, 0.01, 0.001, -0.05, -0.05)
distortion_max = (0.05, 0.05, 0.05, 0.05, 0.05, 0.05)
fig = make_subplots(rows=2, cols=3, vertical_spacing=0.1, subplot_titles=distortion_names)

num_steps = 19
all_steps = []
for i in range(len(distortion_names)):
    for step in torch.linspace(distortion_min[i], distortion_max[i], num_steps):
        # Enable a single coefficient and leave the others at zero.
        distortion_params = torch.zeros(6)
        distortion_params[i] = step

        # Regular grid of coordinates covering [-1, 1] x [-1, 1].
        coords = torch.meshgrid(torch.linspace(-1, 1, height), torch.linspace(-1, 1, width), indexing="ij")
        coords = torch.stack(coords, dim=-1)

        coords = camera_utils.radial_and_tangential_undistort(coords, distortion_params)

        fig.add_trace(
            go.Scatter(
                x=coords[..., 0].flatten(),
                y=coords[..., 1].flatten(),
                mode="markers",
                marker=dict(
                    size=10,
                ),
                visible=False,
            ),
            row=i % 2 + 1,
            col=i // 2 + 1,
        )

    fig.data[num_steps * i + num_steps // 2].visible = True

# Create and add slider
steps = []
for i in range(num_steps):
    step = dict(method="update", args=[{"visible": [False] * len(fig.data)}], label="")
    for d in range(len(distortion_names)):
        step["args"][0]["visible"][num_steps * d + i] = True
    steps.append(step)

sliders = [
    dict(
        active=num_steps // 2,
        pad={"t": 10},
        steps=steps,
        len=0.5,
        x=0.5,
        xanchor="center",
    )
]

webdocs_layout = go.Layout(
    height=600,
    sliders=sliders,
    margin=dict(r=0, b=0, l=0, t=20),
    hovermode=False,
    showlegend=False,
    paper_bgcolor="rgba(0,0,0,0)",
    plot_bgcolor="rgba(0,0,0,0)",
    font=dict(color="#a3a3a3", size=18),
)
fig.for_each_yaxis(lambda x: x.update(range=[-1.3, 1.3], constrain="domain", showgrid=False, visible=False))
fig.for_each_xaxis(lambda x: x.update(scaleanchor="y", scaleratio=1, showgrid=False, visible=False))
fig.update_layout(webdocs_layout)

fig.show()