<a href="https://colab.research.google.com/github/ggmeiner22/Estimating-Camera-Pose-from-a-Planar-Object/blob/main/EstimatingCameraPoseFromAPlanarObject.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Goal

Given a calibrated camera (i.e., known intrinsics **K**) and a single image of a planar object, estimate the camera **pose** (i.e., extrinsics **R**, **t**).

I will:

1. build a Gradio UI to click 2D features and manage point order,
2. estimate pose **from a homography** (explicit derivation),
3. estimate pose using **OpenCV** functions, and
4. compare the two (short notes only).

Install dependencies

In [2]:
%pip -q install opencv-python numpy gradio matplotlib

Imports

In [3]:
import json, io, math
import numpy as np
import cv2 as cv
import gradio as gr
import matplotlib.pyplot as plt

from matplotlib.figure import Figure

Load intrinsics from JSON

In [4]:
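def load_intrinsics_from_json(json_bytes):
    """Parse {"K": 3x3 list, "distCoeffs": [k1, k2, p1, p2, k3]} from JSON bytes.
    distCoeffs is optional and defaults to zeros. Returns K (3x3), dist (Nx1)."""
    d = json.loads(json_bytes.decode("utf-8"))
    K = np.array(d["K"], dtype=np.float64)
    if K.shape != (3, 3):
        raise ValueError("'K' must be a 3x3 matrix.")
    dist = np.array(d.get("distCoeffs", [0,0,0,0,0]), dtype=np.float64).reshape(-1,1)
    return K, dist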
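For reference, a minimal sketch of the intrinsics file layout that `load_intrinsics_from_json` expects, assuming a zero-skew pinhole **K**. `make_intrinsics_json` is a hypothetical helper, the focal length and principal point are made-up example values, and `parse_intrinsics` repeats the loader's parsing so the cell runs on its own.

```python
import json
import numpy as np

def make_intrinsics_json(fx, fy, cx, cy, dist=(0.0, 0.0, 0.0, 0.0, 0.0)):
    """Hypothetical helper: serialize pinhole intrinsics into the
    {"K": [[...]], "distCoeffs": [...]} layout read by load_intrinsics_from_json."""
    K = [[fx, 0.0, cx],
         [0.0, fy, cy],
         [0.0, 0.0, 1.0]]
    return json.dumps({"K": K, "distCoeffs": list(dist)}).encode("utf-8")

def parse_intrinsics(json_bytes):
    # Same parsing logic as load_intrinsics_from_json.
    d = json.loads(json_bytes.decode("utf-8"))
    K = np.array(d["K"], dtype=np.float64)
    dist = np.array(d.get("distCoeffs", [0, 0, 0, 0, 0]), dtype=np.float64).reshape(-1, 1)
    return K, dist

# Made-up example values (roughly a 640x480 image with an 800 px focal length).
blob = make_intrinsics_json(800.0, 800.0, 320.0, 240.0)
K, dist = parse_intrinsics(blob)
print(K.shape, dist.shape)   # (3, 3) (5, 1)
```

The distortion vector is kept as a column (Nx1), which is a shape OpenCV functions such as `cv.solvePnP` accept for `distCoeffs`.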

Load planar model points (X, Y, Z with Z = 0)

In [5]:
def load_model_points_json(json_bytes):
    """
    Load model points from a JSON file with structure:
    {
      "units": "meters",
      "square_size_meters": 0.02177,
      "rows": 6,
      "cols": 9,
      "ordering": "row-major from (0,0)",
      "points": [
        { "X": 0.0, "Y": 0.0, "Z": 0.0 },
        ...
      ]
    }
    Returns: Nx3 numpy array [X,Y,Z]
    """

    # Parse JSON
    data = json.loads(json_bytes.decode("utf-8"))

    # Must have "points" array
    if "points" not in data:
        raise ValueError("JSON must have a 'points' array with X,Y,Z fields.")

    pts = data["points"]
    if not all("X" in p and "Y" in p for p in pts):
        raise ValueError("Each point must have X and Y (Z optional).")

    # Extract X,Y,Z into array
    X = [float(p["X"]) for p in pts]
    Y = [float(p["Y"]) for p in pts]
    Z = [float(p.get("Z", 0.0)) for p in pts]

    arr = np.column_stack([X, Y, Z]).astype(np.float64)

    # Points are returned in file order (the "ordering" field documents it, e.g.
    # "row-major from (0,0)"); this order must match the order of the clicked image points.

    return arr
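A sketch of how a model-points file in the documented schema could be generated for a 6x9 grid with 0.02177 m squares. `make_grid_points_json` is hypothetical, and it assumes rows run along Y and columns along X, with Z = 0 everywhere; check this against how points are clicked in the UI, since the model and image orderings must match.

```python
import json
import numpy as np

def make_grid_points_json(rows, cols, square_size):
    """Hypothetical helper: build a model-points JSON for a planar grid.
    Assumes rows run along Y and columns along X, listed row-major from (0,0),
    with Z = 0 for every point (the plane of the target)."""
    points = [{"X": c * square_size, "Y": r * square_size, "Z": 0.0}
              for r in range(rows) for c in range(cols)]
    data = {
        "units": "meters",
        "square_size_meters": square_size,
        "rows": rows,
        "cols": cols,
        "ordering": "row-major from (0,0)",
        "points": points,
    }
    return json.dumps(data).encode("utf-8")

blob = make_grid_points_json(6, 9, 0.02177)
pts = json.loads(blob.decode("utf-8"))["points"]
arr = np.array([[p["X"], p["Y"], p["Z"]] for p in pts], dtype=np.float64)
print(arr.shape)   # (54, 3)
print(arr[:2])     # first two points of the first row
```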


Pose from a homography (explicit derivation)

In [6]:
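def dlt_homography(model2d, image2d):
    """DLT (no RANSAC, no normalization): model2d Nx2 (plane coords, Z=0), image2d Nx2 (pixels).
    Solves A h = 0 in the least-squares sense; needs N >= 4 correspondences."""
    model2d = np.asarray(model2d, dtype=np.float64)
    image2d = np.asarray(image2d, dtype=np.float64)
    N = model2d.shape[0]
    if N < 4 or image2d.shape[0] != N:
        raise ValueError("Need the same number (>= 4) of model and image points.")
    A = []
    for i in range(N):
        X, Y = model2d[i]
        x, y = image2d[i]
        # Two rows per correspondence, from x ~ H [X, Y, 1]^T
        A.append([0,0,0, -X,-Y,-1,  y*X, y*Y, y])
        A.append([X,Y,1,  0, 0, 0, -x*X,-x*Y,-x])
    A = np.asarray(A, dtype=np.float64)
    # h is the right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3,3)
    return H / H[2,2]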
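The DLT above works directly on raw pixel coordinates, which can leave the linear system poorly conditioned (pixel values in the hundreds mixed with a constant 1). A common remedy is Hartley normalization: translate and scale each point set before building A, then undo both transforms. The sketch below is a swapped-in variant, not the notebook's method; `normalize_points` and `dlt_homography_normalized` are hypothetical names, and the homography `H_true` is made up purely to check that the estimate recovers it.

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: move the centroid to the origin and scale so the
    mean distance from it is sqrt(2). Returns the 3x3 transform T and the normalized points."""
    pts = np.asarray(pts, dtype=np.float64)
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    scale = np.sqrt(2.0) / (mean_dist + 1e-12)
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    normed = (T @ pts_h.T).T[:, :2]
    return T, normed

def dlt_homography_normalized(model2d, image2d):
    """DLT with Hartley normalization (no RANSAC); needs N >= 4 point pairs."""
    if len(model2d) < 4:
        raise ValueError("At least 4 correspondences are required.")
    T_m, m = normalize_points(model2d)
    T_i, p = normalize_points(image2d)
    A = []
    for (X, Y), (x, y) in zip(m, p):
        A.append([0, 0, 0, -X, -Y, -1, y * X, y * Y, y])
        A.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y, -x])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H_norm = Vt[-1].reshape(3, 3)
    # Undo the normalizations: x_img ~ T_i^{-1} H_norm T_m X_model
    H = np.linalg.inv(T_i) @ H_norm @ T_m
    return H / H[2, 2]

# Synthetic check: map a 6x9 grid through a made-up homography and recover it.
H_true = np.array([[900.0, 40.0, 300.0],
                   [-30.0, 950.0, 200.0],
                   [0.2, 0.1, 1.0]])
gx, gy = np.meshgrid(np.arange(9) * 0.02177, np.arange(6) * 0.02177)
model = np.column_stack([gx.ravel(), gy.ravel()])
proj = (H_true @ np.column_stack([model, np.ones(len(model))]).T).T
image = proj[:, :2] / proj[:, 2:3]
H_est = dlt_homography_normalized(model, image)
print(np.allclose(H_est, H_true, rtol=1e-6, atol=1e-6))   # expected: True
```

On noise-free data both versions should agree; the normalized one mainly helps once clicked points carry pixel noise.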

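def pose_from_homography_explicit(K, model2d, image2d):
    """
    Explicit H -> (R, t), with X_cam = R @ X_model + t:
    1) H via DLT (H ~ K [r1 r2 t] for points on the Z=0 plane)
    2) Remove K: B = K^{-1} H
    3) Scale by s = 2 / (||b1|| + ||b2||) so that ||r1|| ≈ ||r2|| ≈ 1
    4) r1 = s*b1, r2 = s*b2, t = s*b3; flip all three if t_z < 0 (plane must be in front)
    5) r3 = r1 × r2, then orthonormalize [r1 r2 r3] with SVD and enforce det(R) = +1
    """
    H = dlt_homography(model2d, image2d)
    Kinv = np.linalg.inv(K)
    B = Kinv @ H
    b1, b2, b3 = B[:,0], B[:,1], B[:,2]

    s = 1.0 / ((np.linalg.norm(b1) + np.linalg.norm(b2)) / 2.0 + 1e-12)
    r1 = b1 * s
    r2 = b2 * s
    t  = b3 * s

    # H is only defined up to sign: pick the sign that puts the plane in front of the camera
    if t[2] < 0:
        r1, r2, t = -r1, -r2, -t

    # Nearest rotation (Frobenius norm) to [r1 r2 r1×r2]
    R_approx = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R_approx)
    R = U @ Vt
    if np.linalg.det(R) < 0:
        # Safety net: flip the last singular direction so det(R) = +1 (t is unaffected)
        U[:,-1] *= -1
        R = U @ Vt

    return R, t.reshape(3,1), H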
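The steps in `pose_from_homography_explicit` follow from the plane-induced homography. A short derivation in the notebook's own symbols (**K**, $R = [\,r_1\ r_2\ r_3\,]$, **t**), for a point on the target plane ($Z = 0$):

$$
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix}
= K \begin{bmatrix} r_1 & r_2 & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix},
\qquad H = \mu\, K \begin{bmatrix} r_1 & r_2 & t \end{bmatrix}
$$

for some unknown scalar $\mu$, so $B = K^{-1} H = \mu \begin{bmatrix} r_1 & r_2 & t \end{bmatrix}$. Because $\|r_1\| = \|r_2\| = 1$, we have $\|b_1\| \approx \|b_2\| \approx |\mu|$, which gives

$$
s = \frac{2}{\|b_1\| + \|b_2\|} \approx \frac{1}{|\mu|}, \qquad
r_1 = s\,b_1,\quad r_2 = s\,b_2,\quad t = s\,b_3,\quad r_3 = r_1 \times r_2,
$$

with the overall sign chosen so that $t_z > 0$. Finally, writing $U \Sigma V^\top = [\,r_1\ r_2\ r_3\,]$, the matrix $R = U V^\top$ is the rotation closest to $[\,r_1\ r_2\ r_3\,]$ in the Frobenius norm.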

Pose using OpenCV functions

In [8]:
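def pose_from_opencv_solvepnp(K, distCoeffs, model3d, image2d):
    """solvePnP on Nx3 model points (Z=0 plane) and Nx2 pixels; returns R (3x3), tvec (3x1)
    with X_cam = R @ X_model + tvec."""
    model3d = np.ascontiguousarray(model3d, dtype=np.float64).reshape(-1, 3)
    image2d = np.ascontiguousarray(image2d, dtype=np.float64).reshape(-1, 2)
    ok, rvec, tvec = cv.solvePnP(model3d, image2d, K, distCoeffs, flags=cv.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("solvePnP failed")
    # Rodrigues always yields a proper rotation (det(R) = +1), so no sign fix is needed
    R, _ = cv.Rodrigues(rvec)
    return R, tvec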