Math Cookbook
The MVE camera conventions use common textbook notation, e.g. from the book "Multiple View Geometry in Computer Vision" by Hartley and Zisserman. The projection of a 3D point X in world coordinates to a 2D point x on the image plane in homogeneous coordinates is computed as follows:

x = K * (R * X + t)

Here, K is the calibration matrix, R is the world-to-camera rotation matrix, and t is the camera translation vector. R and t are referred to as the extrinsic camera parameters. The calibration matrix K is assembled from quantities referred to as the intrinsic camera parameters, described below. The inverse projection from a 2D image coordinate x in homogeneous coordinates to a 3D point X in world coordinates with respect to a depth d is computed as:

X = R^T * (K^-1 * x * d - t)
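The projection and its inverse can be sketched as follows. This is a minimal illustration with numpy, not MVE's API; the example values for K, R, t and X are hypothetical.

```python
import numpy as np

def project(K, R, t, X):
    # x = K * (R * X + t), a homogeneous 2D image point.
    return K @ (R @ X + t)

def unproject(K, R, t, x, d):
    # X = R^T * (K^-1 * x * d - t), where x is a normalized
    # homogeneous image point (third coordinate 1) and d is the depth.
    return R.T @ (np.linalg.inv(K) @ x * d - t)

# Hypothetical example: identity intrinsics and rotation, translation along z.
K = np.eye(3)
R = np.eye(3)
t = np.array([0.0, 0.0, 1.0])
X = np.array([0.5, -0.5, 2.0])

x = project(K, R, t, X)
d = x[2]           # depth of the point in camera coordinates
x = x / x[2]       # normalize the homogeneous coordinates
X_back = unproject(K, R, t, x, d)  # recovers the original world point X
```

Note that the depth d is exactly the third coordinate of K * (R * X + t) before normalization, which is why projecting and then unprojecting with that depth recovers the original point.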
The extrinsic parameters transform 3D points X in world coordinates into 3D points X' in camera coordinates:

X' = R * X + t

This transformation can also be applied using homogeneous coordinates, X' = (R|t) * X, where (R|t) is a 3x4 matrix. The translation vector is computed from the known camera center as t = -R * c. The camera center is computed from the known translation as c = -R^-1 * t. The inverse of R can be obtained by transposing R (only if R is a proper rotation matrix, i.e. R^-1 = R^T). To transform a point in camera coordinates to world coordinates, the inverse of the world-to-camera transformation, i.e. the camera-to-world transformation, is applied:

X = R^-1 * (X' - t)
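These relations between rotation, translation, and camera center can be verified numerically. A small sketch, assuming numpy and a hypothetical rotation and camera center:

```python
import numpy as np

# A proper rotation (90 degrees about the z-axis) and a hypothetical camera center.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
c = np.array([1.0, 2.0, 3.0])

# Translation from the known camera center: t = -R * c.
t = -R @ c

# Camera center from the known translation: c = -R^-1 * t; for a proper
# rotation matrix, the inverse is the transpose, R^-1 = R^T.
c_back = -R.T @ t

# World-to-camera, then camera-to-world: X = R^T * (X' - t).
X = np.array([4.0, 5.0, 6.0])
X_cam = R @ X + t            # world -> camera
X_world = R.T @ (X_cam - t)  # camera -> world, recovers X
```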
The extrinsic parameters perform a transformation into the camera coordinate system. The camera coordinate system conventions are those of Hartley and Zisserman: The camera is looking along the positive z-axis, the x-axis points to the right and the y-axis points downwards.
The calibration matrix is composed of the focal length of the camera, the principal point of the image plane, and the pixel aspect ratio. The focal length is normalized in the following way: Suppose the longer side of the image plane has length 1 in 3D space. Then the normalized focal length is the orthogonal distance from the camera center to the image plane. For example, the normalized focal length of a 70mm lens projecting on a 35mm sensor is 2.
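The normalization amounts to expressing the focal length in units of the longer sensor side. A one-line sketch of the arithmetic (the function name is made up for illustration):

```python
def normalized_focal_length(focal_mm, sensor_long_side_mm):
    # The longer side of the image plane is scaled to length 1, so the
    # focal length is expressed in units of that side.
    return focal_mm / sensor_long_side_mm

# The text's example: a 70mm lens with a 35mm sensor side gives 2.
print(normalized_focal_length(70.0, 35.0))  # -> 2.0
```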
For a 3D point X' in camera coordinates, the projection onto the image plane is computed as x = p(K * X'), where p(x') is a function that performs the central projection, i.e. it divides by the third coordinate in order to get a point on the image plane at distance 1 from the camera center.

The calibration matrix K can directly be defined such that image coordinates are obtained. This is done by scaling the focal length with the largest image dimension, i.e. with max(width, height), and setting the principal point to width / 2 and height / 2, respectively. This yields continuous coordinates on the image plane between (0, 0) and (width, height). The center of pixel (0, 0) is at (0.5, 0.5), i.e. (0.5, 0.5) must be subtracted from the obtained image plane coordinates to get pixel coordinates.
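Assembling K this way and applying the central projection can be sketched as follows. This is a minimal illustration assuming numpy and a square pixel aspect ratio of 1; the exact placement of the aspect ratio in K is an assumption, not taken from the text.

```python
import numpy as np

def calibration_matrix(flen, width, height, aspect=1.0):
    # Scale the normalized focal length flen by the largest image
    # dimension and put the principal point at the image center.
    # The pixel aspect ratio scales the y focal length (an assumption).
    dim = max(width, height)
    return np.array([[flen * dim, 0.0,                  width / 2.0],
                     [0.0,        flen * dim * aspect,  height / 2.0],
                     [0.0,        0.0,                  1.0]])

def project_to_pixel(K, X_cam):
    # Central projection p(K * X'): divide by the third coordinate,
    # then subtract (0.5, 0.5) to convert to pixel coordinates.
    x = K @ X_cam
    return x[0] / x[2] - 0.5, x[1] / x[2] - 0.5

K = calibration_matrix(flen=1.0, width=640, height=480)
# A point on the optical axis lands at the image center, (320, 240)
# in image coordinates, i.e. pixel (319.5, 239.5).
print(project_to_pixel(K, np.array([0.0, 0.0, 2.0])))  # -> (319.5, 239.5)
```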