# Day 4: Lesson: Camera Model Demo

CSSE 461 / Rose-Hulman Institute of Technology / Winter 2025-26

Kyle Wilson

## Plan for today: 

- There's a camera on a tripod at the front of the room
- We'll define a room (world) coordinate system
- We'll write code that takes 3D points in room coordinates and returns 2D pixel coordinates

In [1]:
# I like to group imports at the top of the file
import numpy as np

## Step 1: Camera 3D to Pixels

Let's take this one part at a time: intrinsics first, extrinsics later.

With no rotation and no translation, the extrinsics are $\mathbf{R} = \mathbf{I}_3$ and $\mathbf{t} = [0,0,0]^\top$. Write the placeholder $3 \times 4$ extrinsics matrix $[\mathbf{R} \mid \mathbf{t}]$ that represents this identity transformation:

In [2]:
E = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0]
])
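
The same $3 \times 4$ layout works for any rotation and translation. A minimal sketch, assuming a hypothetical helper `make_extrinsics` (not part of the lesson code) that stacks $\mathbf{R}$ and $\mathbf{t}$ side by side:

```python
import numpy as np

def make_extrinsics(R, t):
    """Hypothetical helper: stack a 3x3 rotation R and a length-3 translation t into [R | t]."""
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3, 1)  # column vector so hstack lines up
    return np.hstack([R, t])

# Identity rotation and zero translation reproduce the placeholder matrix
E_identity = make_extrinsics(np.eye(3), np.zeros(3))
print(E_identity)
```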

### Camera Intrinsics

Our camera is my beloved Gen 1 Fujifilm X100. Here are some online [specs](https://www.dpreview.com/products/Fujifilm/compacts/fujifilm_x100/specifications). Build the camera intrinsic matrix:

In [None]:
width = 4288   # px
height = 2848   # px
focal_length = 23e-3   # m
sensor_width = 23.6e-3   # m
sensor_height = 15.8e-3  # m

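# Focal length in pixels: metric focal length times pixels per meter of sensor
fx = focal_length * width / sensor_width
fy = focal_length * height / sensor_height
# Assume the principal point sits at the image center
cx = width / 2
cy = height / 2

K = np.array([
    [fx, 0, cx],
    [0, fy, cy],
    [0,  0,  1]
])
K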

array([[4.17898305e+03, 0.00000000e+00, 2.14400000e+03],
       [0.00000000e+00, 4.14582278e+03, 1.42400000e+03],
       [0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])
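
One way to sanity-check a focal length in pixels is to convert it to a field of view. A minimal sketch, assuming the pinhole model and the spec values used above; the names `hfov_deg` and `vfov_deg` are illustrative:

```python
import numpy as np

width, height = 4288, 2848                       # px
focal_length = 23e-3                             # m
sensor_width, sensor_height = 23.6e-3, 15.8e-3   # m

fx = focal_length * width / sensor_width
fy = focal_length * height / sensor_height

# Full angle subtended by the image: 2 * atan(half the image size / focal length), both in pixels
hfov_deg = np.degrees(2 * np.arctan(width / (2 * fx)))
vfov_deg = np.degrees(2 * np.arctan(height / (2 * fy)))
print(f"horizontal FOV: {hfov_deg:.1f} deg, vertical FOV: {vfov_deg:.1f} deg")
```

With these numbers the horizontal field of view comes out a little over 54 degrees, which is plausible for a 23 mm lens on an APS-C sensor.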

In [4]:
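# Compare the aspect ratio in pixels with the aspect ratio of the physical sensor.
# They differ slightly, which is why fx and fy above are not equal.
aspect_px = width / height
print(aspect_px)
aspect_mm = sensor_width / sensor_height
print(aspect_mm)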

1.5056179775280898
1.4936708860759491
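
The two ratios differ by a little under 1%, so the specs imply slightly non-square pixels. This may just be rounding in the published sensor dimensions (an assumption, not something the spec sheet states). A sketch of the implied pixel pitch, plus a square-pixel alternative that reuses one focal length for both axes (`K_square` is an illustrative name):

```python
import numpy as np

width, height = 4288, 2848                       # px
focal_length = 23e-3                             # m
sensor_width, sensor_height = 23.6e-3, 15.8e-3   # m

# Physical size of one pixel along each axis, in micrometers
pitch_x_um = sensor_width / width * 1e6
pitch_y_um = sensor_height / height * 1e6
print(f"pixel pitch: {pitch_x_um:.3f} um (x), {pitch_y_um:.3f} um (y)")

# Alternative under the square-pixel assumption: one focal length for both axes
f_px = focal_length * width / sensor_width
K_square = np.array([
    [f_px, 0, width / 2],
    [0, f_px, height / 2],
    [0, 0, 1],
])
print(K_square)
```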


### Projection

Write a function that takes a 3D point, $\mathbf{K}$, and $\mathbf{E}$ and projects that point to a pixel coordinate.

In [7]:
def project_point(point_3d, K, E):
    """Project a 3D point to a pixel using intrinsics K (3x3) and extrinsics E (3x4)."""
    point_3d_homogeneous = np.append(point_3d, 1)  # Convert to homogeneous coordinates
    point_camera = E @ point_3d_homogeneous  # 3D point in camera coordinates
    pixel_2d_homogeneous = K @ point_camera  # Homogeneous pixel coordinates
    pixel_2d = pixel_2d_homogeneous[:2] / pixel_2d_homogeneous[2]  # Divide by depth to get Cartesian pixels
    return pixel_2d


project_point(np.array([1, 0, 5]), K, E)

array([2979.79661017, 1424.        ])
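
The same projection can be applied to many points at once. A minimal sketch, assuming a hypothetical batch function `project_points` (not part of the lesson code) and rebuilding the same $\mathbf{K}$ and identity $\mathbf{E}$ so the block runs on its own:

```python
import numpy as np

def project_points(points_3d, K, E):
    """Hypothetical batch version: project an (N, 3) array of points to an (N, 2) array of pixels."""
    points_3d = np.asarray(points_3d, dtype=float)
    ones = np.ones((points_3d.shape[0], 1))
    points_h = np.hstack([points_3d, ones])      # (N, 4) homogeneous points
    pixels_h = (K @ E @ points_h.T).T            # (N, 3) homogeneous pixels
    return pixels_h[:, :2] / pixels_h[:, 2:3]    # divide each row by its third component

# Same intrinsics and identity extrinsics as in the cells above
fx = 23e-3 * 4288 / 23.6e-3
fy = 23e-3 * 2848 / 15.8e-3
K = np.array([[fx, 0, 2144], [0, fy, 1424], [0, 0, 1]])
E = np.hstack([np.eye(3), np.zeros((3, 1))])

pixels = project_points([[1, 0, 5], [0, 0, 1]], K, E)
print(pixels)
```

The first row should match the single-point result above, and the second point, which lies on the optical axis, should land on the principal point.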

In [9]:
# Sanity check: the projected x coordinate as a fraction of the image width
(4288 / 2979) ** -1


0.6947294776119403
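
This ratio says the point lands about 69% of the way across the image. A sketch of the same check done without pixels, assuming the identity extrinsics from Step 1: the offset from the image center, as a fraction of the width, is (focal length / sensor width) times (X / Z). The name `fraction_across` is illustrative.

```python
focal_length = 23e-3    # m
sensor_width = 23.6e-3  # m
X, Z = 1.0, 5.0         # the test point's camera-frame x coordinate and depth

# Start at the image center (0.5) and add the horizontal offset as a fraction of the width
fraction_across = 0.5 + (focal_length / sensor_width) * (X / Z)
print(fraction_across)
```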

## Step 2: Real Extrinsics

Choose a room coordinate system. Describe it here:

The origin is the NW corner of the classroom, on the floor.
- X axis: toward the main door
- Y axis: toward the front of the classroom
- Z axis: up

Now write an $\mathbf{R}$ and $\mathbf{t}$ that map world coordinates to camera coordinates:

In [10]:
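# Each row of R is a camera axis written in room coordinates:
# camera x = -X, camera y = -Z (image y points down), camera z = -Y
R = np.array([
    [-1, 0, 0],
    [0, 0, -1],
    [0, -1, 0]
])
t = np.array([4.25, 1.7, 8])
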

# Construct the extrinsic matrix E from R and t
E = np.zeros((3, 4))
E[:, :3] = R
E[:, 3] = t
print(E)

[[-1.    0.    0.    4.25]
 [ 0.    0.   -1.    1.7 ]
 [ 0.   -1.    0.    8.  ]]
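
Before trusting a hand-written $\mathbf{R}$ and $\mathbf{t}$, it can help to check that $\mathbf{R}$ is a valid rotation and to recover where the camera sits in the room. A minimal sketch using the values above; the names `is_rotation`, `camera_center`, and `view_direction` are illustrative:

```python
import numpy as np

R = np.array([[-1, 0, 0], [0, 0, -1], [0, -1, 0]], dtype=float)
t = np.array([4.25, 1.7, 8.0])

# A valid rotation is orthonormal with determinant +1
is_rotation = np.allclose(R.T @ R, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)

# Since X_cam = R @ X_world + t, the camera center (X_cam = 0) is at -R^T t
camera_center = -R.T @ t

# The camera's z axis (viewing direction) in room coordinates is the third row of R
view_direction = R[2]

print(is_rotation, camera_center, view_direction)
```

Under these values the camera sits at (4.25, 8, 1.7) in room coordinates and looks along the negative Y direction.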


In [14]:
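# Now test it! This point maps to x = y = 0 in camera coordinates,
# so it should land on the principal point (cx, cy).
project_point(np.array([4.25, 9, 1.7]), K, E)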

array([2144., 1424.])
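
Landing on the principal point only shows that the test point lies on the line through the camera center along the optical axis; the perspective divide gives the same pixel for a point behind the camera. A minimal sketch of a depth check, using a hypothetical helper `camera_depth` (not part of the lesson code). Positive depth means the point is in front of the camera:

```python
import numpy as np

R = np.array([[-1, 0, 0], [0, 0, -1], [0, -1, 0]], dtype=float)
t = np.array([4.25, 1.7, 8.0])

def camera_depth(point_world, R, t):
    """Hypothetical helper: z coordinate of a room point in the camera frame."""
    return (R @ np.asarray(point_world, dtype=float) + t)[2]

depth_test_point = camera_depth([4.25, 9, 1.7], R, t)   # the point tested above
depth_other_side = camera_depth([4.25, 7, 1.7], R, t)   # same axis, one meter the other way
print(depth_test_point, depth_other_side)
```

With the $\mathbf{R}$ and $\mathbf{t}$ above, the point `[4.25, 9, 1.7]` has depth -1 and `[4.25, 7, 1.7]` has depth +1, so it is worth confirming which side of the camera the test point was meant to be on.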