Simon Fuhrmann edited this page May 29, 2013 · 14 revisions


Camera Conventions

The MVE camera conventions follow common textbook notation, e.g. that of the book "Multiple View Geometry in Computer Vision" by Hartley and Zisserman. A 3D point X in world space is projected onto the image plane as follows:

x = K * (R * X + t)

where K is the calibration matrix, R is the world-to-camera rotation matrix, and t is the camera translation vector. The result x is a homogeneous 2D image point; dividing x by its third component yields the image coordinates. R and t are the extrinsic camera parameters. The calibration matrix is assembled from the intrinsic camera parameters.
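The projection can be sketched in a few lines of plain Python. This is a minimal illustration, not MVE's own (C++) implementation; it assumes K and R are 3x3 row-major nested lists and t and X are 3-vectors, and the helper names `mat_vec` and `project` are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def project(K, R, t, X):
    """Project world point X to image coordinates via x = K * (R * X + t)."""
    # Extrinsics: world to camera coordinates, X' = R * X + t.
    Xc = [a + b for a, b in zip(mat_vec(R, X), t)]
    # Intrinsics: homogeneous image point x = K * X'.
    x = mat_vec(K, Xc)
    # Dehomogenize by dividing by the third component.
    return (x[0] / x[2], x[1] / x[2])


# Example intrinsics (illustrative values) and an identity rotation.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(project(K, R, [0.0, 0.0, 0.0], [0.5, 0.25, 2.0]))  # (520.0, 340.0)
```

Shifting the point to X = [0.5, 0.25, 0.0] and setting t = [0, 0, 2] gives the same camera-space point and therefore the same image coordinates.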

Extrinsic Parameters

The extrinsic parameters transform a 3D point X in world coordinates into a 3D point X' in camera coordinates:

X' = R * X + t

In homogeneous coordinates the same transformation reads X' = (R|t) * X, where (R|t) is the 3x4 matrix formed by appending t to R as a fourth column, and X is extended to the 4-vector (X, 1).

The translation vector is computed from a known camera center c as t = -R * c. Conversely, the camera center is computed from a known translation as c = -R^-1 * t. Because R is a rotation matrix, and therefore orthonormal, its inverse is simply its transpose: R^-1 = R^T.

To transform a point in camera coordinates back to world coordinates, the inverse of the world-to-camera transformation (the camera-to-world transformation) is applied:

X = R^-1 * X' - R^-1 * t = R^T * (X' - t)
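The conversions between camera center and translation, together with the forward and inverse mappings, can be sketched as follows. This is an illustrative sketch rather than MVE code: all function names are hypothetical, and the code assumes R is a rotation matrix so that its transpose serves as its inverse.

```python
def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]


def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def translation_from_center(R, c):
    # t = -R * c
    return [-a for a in mat_vec(R, c)]


def center_from_translation(R, t):
    # c = -R^-1 * t = -R^T * t  (valid because R is orthonormal)
    return [-a for a in mat_vec(transpose(R), t)]


def world_to_camera(R, t, X):
    # X' = R * X + t
    return [a + b for a, b in zip(mat_vec(R, X), t)]


def camera_to_world(R, t, Xc):
    # X = R^-1 * X' - R^-1 * t = R^T * (X' - t)
    return mat_vec(transpose(R), [a - b for a, b in zip(Xc, t)])


def extrinsic_matrix(R, t):
    # (R|t): the 3x4 matrix with t appended to R as a fourth column.
    return [list(R[i]) + [t[i]] for i in range(3)]


def apply_3x4(P, X):
    # Multiply a 3x4 matrix by the homogeneous point (X, 1).
    Xh = list(X) + [1]
    return [sum(P[i][j] * Xh[j] for j in range(4)) for i in range(3)]
```

For example, with R a 90-degree rotation about the z axis, R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]], and c = [1, 2, 3], `translation_from_center` gives t = [2, -1, -3], `center_from_translation` recovers c, and `world_to_camera` maps c to the origin of the camera frame.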

Intrinsic Parameters

The calibration matrix is composed of the focal length of the camera, the principal point of the image plane, and the pixel aspect ratio. The focal length is normalized as follows: suppose the longer side of the image plane has length 1 in 3D space; the normalized focal length is then the orthogonal distance from the camera center to the image plane. This makes the focal length independent of the image resolution.
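Under this normalization, multiplying the normalized focal length by the length of the longer image side in pixels gives the focal length in pixels. The sketch below also assembles a simple K from it. It assumes square pixels (aspect ratio 1) and a principal point given in pixels that defaults to the image center; these are illustrative assumptions, not the MVE parameterization, and the function names are hypothetical.

```python
def focal_length_pixels(flen_norm, width, height):
    """Convert a normalized focal length to pixels."""
    # The longer image side has length 1 in normalized units, so one
    # normalized unit corresponds to max(width, height) pixels.
    return flen_norm * max(width, height)


def calibration_matrix(flen_norm, width, height, ppoint_px=None):
    """Assemble a 3x3 K as nested lists (assumes square pixels)."""
    f = focal_length_pixels(flen_norm, width, height)
    if ppoint_px is None:
        # Assumption: principal point at the image center.
        ppoint_px = (width / 2.0, height / 2.0)
    cx, cy = ppoint_px
    # Assumption: aspect ratio 1, so the same focal length on both axes.
    return [[f, 0.0, cx],
            [0.0, f, cy],
            [0.0, 0.0, 1.0]]
```

A 3000 x 2000 landscape image and a 2000 x 3000 portrait image with normalized focal length 1.5 both have a focal length of 4500 pixels, since only the longer side enters the conversion.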