In [1]:
import cv2 as cv
import numpy as np

In [None]:
calib_data_path = "../calibrations/calibrated_data/MultiMatrix_rpi.npz"
calib_data = np.load(calib_data_path)

In [3]:
print(calib_data.files)

['camMatrix', 'distCoef', 'rVector', 'tVector']


In [4]:
cam_mat = calib_data["camMatrix"]  # camera matrix and distortion coefficients saved in the MultiMatrix calibration file
dist_coef = calib_data["distCoef"]


In [5]:
cam_mat

array([[5.43002586e+03, 0.00000000e+00, 7.40231366e+02],
       [0.00000000e+00, 5.25183503e+03, 3.62542665e+02],
       [0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])

Parts of the Camera Matrix

fx = focal length in pixels (along x-axis)

fy = focal length in pixels (along y-axis)

cx = principal point x (optical center)

cy = principal point y

In [8]:
fx = cam_mat[0, 0]
fy = cam_mat[1, 1]
cx = cam_mat[0, 2]
cy = cam_mat[1, 2]

print(f"fx = {fx:.2f}, fy = {fy:.2f}, cx = {cx:.2f}, cy = {cy:.2f}")

fx = 5430.03, fy = 5251.84, cx = 740.23, cy = 362.54


In [9]:
# Suppose rectangle in real world is 50 mm wide
W_real = 50  # mm

# Detected width of that rectangle in image (in pixels)
W_px = 480  # pixels

Z = (fx * W_real) / W_px
print(f"Estimated distance to object: {Z:.2f} mm")

Estimated distance to object: 565.63 mm



If the width in pixels increases from 400 to 600, you can estimate how much the object has moved closer to the camera (i.e., "climbed" toward it):

$$
Z_{\text{new}} = Z_{\text{original}} \times \frac{\text{original\_width\_px}}{\text{new\_width\_px}}
$$

🧮 Example:

* Original Z = 564 mm
* Width increased from 400 px → 600 px

$$
Z_{\text{new}} = 564 \times \frac{400}{600} = 376 \text{ mm}
$$

📏 So the block moved about 188 mm (564 mm − 376 mm) closer to the camera.
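The proportional relation above can be checked numerically. The following is a minimal sketch: the function name `rescale_distance` is hypothetical, and it assumes the object's real width and the camera's focal length stay the same between the two measurements.

```python
def rescale_distance(z_original_mm, width_px_original, width_px_new):
    """Scale a known distance by the ratio of apparent widths (pinhole model)."""
    if width_px_original <= 0 or width_px_new <= 0:
        raise ValueError("pixel widths must be positive")
    return z_original_mm * width_px_original / width_px_new


z_original = 564.0  # mm, distance at the first measurement
z_new = rescale_distance(z_original, 400, 600)  # width grew from 400 px to 600 px
climb = z_original - z_new  # how far the object moved toward the camera

print(f"New distance: {z_new:.0f} mm, moved closer by {climb:.0f} mm")
# New distance: 376 mm, moved closer by 188 mm
```

A larger pixel width always yields a smaller distance, so a positive `climb` means the object moved toward the camera.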


So the idea is:

- Capture the block of known dimensions at ground level (ground + 0) and measure its width in pixels.

- Compute Z, the camera-to-block distance, using the formula above.
    
    - Print Z to the display.

    - Also print the width of the block in mm and in pixels.

- Then raise the block by an arbitrary amount, to ground + Y mm.

    - Compute the new Z and update the display.
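The steps above could be sketched as follows. This is only an illustration, not a finished implementation: `BLOCK_WIDTH_MM`, `distance_mm`, `height_above_ground`, and the pixel widths are hypothetical placeholders, the camera is assumed to look straight down at the block, and `FX` is the `fx` value printed from the calibrated camera matrix.

```python
BLOCK_WIDTH_MM = 50.0  # known real width of the block (assumed value)
FX = 5430.03           # focal length in pixels, as printed from cam_mat[0, 0]


def distance_mm(width_px, real_width_mm=BLOCK_WIDTH_MM, fx=FX):
    """Camera-to-block distance estimated from the block's apparent width."""
    return fx * real_width_mm / width_px


def height_above_ground(width_px_ground, width_px_now):
    """Height gained = distance at ground level minus current distance."""
    return distance_mm(width_px_ground) - distance_mm(width_px_now)


# Hypothetical measurements: 480 px wide at ground level, 520 px after raising
z_ground = distance_mm(480)
rise = height_above_ground(480, 520)

print(f"Width: {BLOCK_WIDTH_MM:.0f} mm = 480 px, Z at ground: {z_ground:.2f} mm")
print(f"Block raised by about {rise:.2f} mm")
```

In a live setup the two pixel widths would come from whatever detection step measures the block in each frame.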

| Spec                       | Value             |
| -------------------------- | ----------------- |
| Sensor width               | 3.68 mm           |
| Sensor height              | 2.76 mm           |
| Pixel pitch (single pixel) | 1.12 µm × 1.12 µm |

Let’s compute the effective pixel size for each stream step by step, using the full sensor dimensions (Sony IMX219 = 3.68 mm × 2.76 mm):

1. For main stream (1280 × 720):

Pixel size X:

$$
\frac{3.68 \text{ mm}}{1280} \approx 0.002875 \text{ mm/px}
$$

Pixel size Y:

$$
\frac{2.76 \text{ mm}}{720} \approx 0.003833 \text{ mm/px}
$$

2. For lores stream (640 × 480):

Pixel size X:

$$
\frac{3.68 \text{ mm}}{640} \approx 0.00575 \text{ mm/px}
$$

Pixel size Y:

$$
\frac{2.76 \text{ mm}}{480} \approx 0.00575 \text{ mm/px}
$$
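The same arithmetic can be reproduced in a few lines. This is a small sketch with a hypothetical helper `pixel_size_mm`; like the calculation above, it assumes the full sensor area maps onto the image at each resolution.

```python
SENSOR_W_MM, SENSOR_H_MM = 3.68, 2.76  # Sony IMX219 sensor size from the table


def pixel_size_mm(image_w_px, image_h_px):
    """Return (s_x, s_y) in mm/px, assuming the full sensor maps onto the image."""
    return SENSOR_W_MM / image_w_px, SENSOR_H_MM / image_h_px


streams = {"main": (1280, 720), "lores": (640, 480)}
for name, (w, h) in streams.items():
    sx, sy = pixel_size_mm(w, h)
    print(f"{name} ({w}x{h}): s_x = {sx:.6f} mm/px, s_y = {sy:.6f} mm/px")
```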

> ✅ The camera matrix does not store a mm-per-pixel value itself, but it contains the information needed to relate pixels and real-world units, through the focal length (in pixels) combined with the sensor dimensions.

Let’s unpack this.

---

### 📸 The Camera Matrix (Intrinsic Matrix K)

$$
K = \begin{bmatrix}
f_x & 0 & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{bmatrix}
$$

* fx and fy are focal lengths in **pixels**
* cx and cy are the principal point (optical center) in pixels

To convert between mm and pixels, you need:

* Sensor size in mm (e.g., 3.68 mm × 2.76 mm)
* Image resolution in pixels (e.g., 640 × 480)

---

### 🧠 How fx is Computed:

$$
f_x = \frac{f_{\text{mm}}}{s_x}, \quad f_y = \frac{f_{\text{mm}}}{s_y}
$$

Where:

* $f_{\text{mm}}$ is focal length in millimeters
* $s_x$, $s_y$ are pixel sizes in mm/pixel (depends on sensor resolution and physical size)

So:

$$
s_x = \frac{\text{sensor width in mm}}{\text{image width in px}}, \quad
s_y = \frac{\text{sensor height in mm}}{\text{image height in px}}
$$

Rearranging the first relation, the pixel size (mm/pixel) follows from fx and the focal length in mm:

$$
s_x = \frac{f_{\text{mm}}}{f_x}, \quad s_y = \frac{f_{\text{mm}}}{f_y}
$$
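To make the relation between focal length in mm and in pixels concrete, here is a minimal sketch. The helper names `focal_length_px` and `focal_length_mm` are hypothetical, and the numbers are illustrative only (an fx of 700 px at 640 px image width on a 3.68 mm wide sensor), not calibration results.

```python
def focal_length_px(f_mm, sensor_width_mm, image_width_px):
    """f_x = f_mm / s_x, with s_x = sensor width / image width (mm per pixel)."""
    s_x = sensor_width_mm / image_width_px
    return f_mm / s_x


def focal_length_mm(fx_px, sensor_width_mm, image_width_px):
    """Inverse relation: f_mm = f_x * s_x."""
    s_x = sensor_width_mm / image_width_px
    return fx_px * s_x


# Illustrative values only
f_mm = focal_length_mm(700, 3.68, 640)
fx_back = focal_length_px(f_mm, 3.68, 640)

print(f"f = {f_mm:.3f} mm, converted back: fx = {fx_back:.1f} px")
```

Converting in both directions is a quick sanity check that the same sensor width and image width were used on each side.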

---

### ✅ How to Use the Camera Matrix for Real-World Scaling

Let’s say:

* You know the rectangle width in mm (e.g., 100 mm)
* You measured its width in pixels on the image (e.g., 250 px)
* fx = 700 pixels (from camera matrix)

Then:

$$
Z = \frac{f_x \cdot \text{real\_width\_mm}}{\text{width\_px}}
= \frac{700 \cdot 100}{250} = 280 \text{ mm}
$$

So fx acts like a bridge from real-world mm to pixels.
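As a sketch of how this could be wrapped in a reusable function, the example below reads the focal length straight from a 3×3 camera matrix. The function `estimate_depth_mm` and its `axis` parameter are hypothetical; using fy for sizes measured along the image y-axis is an assumption that follows from the pinhole model. The matrix is written as a nested list so the block runs on its own, and a NumPy array such as `cam_mat` can be indexed the same way.

```python
def estimate_depth_mm(camera_matrix, real_size_mm, size_px, axis="x"):
    """Pinhole depth estimate: Z = f * real_size / size_px.

    Uses fx for sizes measured along the image x-axis, fy along the y-axis.
    """
    if size_px <= 0:
        raise ValueError("size_px must be positive")
    f = camera_matrix[0][0] if axis == "x" else camera_matrix[1][1]
    return float(f * real_size_mm / size_px)


# Example matrix with fx = fy = 700 px (illustrative values from the text)
K = [[700.0, 0.0, 320.0],
     [0.0, 700.0, 240.0],
     [0.0, 0.0, 1.0]]

z = estimate_depth_mm(K, real_size_mm=100, size_px=250)
print(f"Estimated depth: {z:.0f} mm")
# Estimated depth: 280 mm
```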

---

### 📐 Want mm-per-pixel?

Use this:

$$
\text{mm per pixel} = \frac{\text{sensor width in mm}}{\text{image width in px}}
$$

For example, Pi Camera v2 has:

* Sensor width ≈ 3.68 mm
* Resolution = 3280 pixels wide

So:

$$
s_x = \frac{3.68}{3280} \approx 0.00112 \text{ mm/px}
$$

---

### 🔁 Summary

| You have...       | You get...                               |
| ----------------- | ---------------------------------------- |
| fx in pixels      | Projection scale from 3D mm to 2D pixels |
| fx + known width  | Estimate Z (depth) using pinhole model   |
| Sensor size & res | Pixel size in mm/px                      |

---


In [6]:
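import cv2

cap = cv2.VideoCapture(0)

# Request a specific resolution (the driver may fall back to another one)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

# Grab one frame to see which resolution is actually delivered
ret, frame = cap.read()
cap.release()

if not ret:
    raise RuntimeError("Could not read a frame from camera 0")

height, width = frame.shape[:2]
print(f"Actual resolution: {width}x{height}")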

Actual resolution: 1280x720
