# README

Previously, we used "image_to_colmap.ipynb" to run COLMAP on our custom image dataset. It matched the images to each other and reconstructed the 3D scene, producing text files that describe the cameras and the estimated pose of each image.

This notebook, "colmap_to_ray_rgb_data.ipynb", processes those outputs. It generates NumPy arrays containing the ray origin, ray direction and RGB value of every pixel in our custom dataset.

Next, we list the inputs and outputs of this notebook.

<b>NOTEBOOK INPUT</b> (all in the same root folder as this notebook)
- zip file containing custom image dataset
- cameras.txt
- images.txt
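
`parse_colmap_text` reads every camera line as `W, H, fx, fy, cx, cy`, which is exactly the parameter list of COLMAP's `PINHOLE` model. Other models store different parameters (for example, `SIMPLE_PINHOLE` has a single focal length: `f, cx, cy`), so it is worth checking cameras.txt before running the notebook. A minimal sketch of a stricter line parser, assuming only undistorted pinhole models should be accepted; `parse_camera_line` is a hypothetical helper, not part of this notebook:

```python
def parse_camera_line(line):
    """Parse one non-comment line of COLMAP's cameras.txt (hypothetical helper).

    Line format: CAMERA_ID MODEL WIDTH HEIGHT PARAMS...
    Only undistorted pinhole models are supported.
    """
    parts = line.split()
    cam_id, model = int(parts[0]), parts[1]
    W, H = int(parts[2]), int(parts[3])
    params = [float(p) for p in parts[4:]]
    if model == "PINHOLE":            # params: fx, fy, cx, cy
        fx, fy, cx, cy = params
    elif model == "SIMPLE_PINHOLE":   # params: f, cx, cy (one shared focal length)
        f, cx, cy = params
        fx = fy = f
    else:
        # Models with distortion terms cannot be represented by pinhole rays.
        raise ValueError(f"Unsupported camera model: {model}")
    return cam_id, {"W": W, "H": H, "fx": fx, "fy": fy, "cx": cx, "cy": cy}


cam_id, cam = parse_camera_line("1 PINHOLE 4032 3024 3000.5 3001.0 2016 1512")
print(cam_id, cam["fx"], cam["W"])  # 1 3000.5 4032
```

Models with distortion terms, such as `SIMPLE_RADIAL` or `OPENCV`, are rejected here because a pinhole ray model ignores distortion; undistorting the images first (e.g. with COLMAP's `image_undistorter`) produces `PINHOLE` cameras.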


<b>NOTEBOOK OUTPUT</b>
- full_ray_rgb_array.npz

<b>Output Details</b>

- "full_ray_rgb_array.npz" is a compressed NumPy archive holding two float16 arrays: `rays` of shape (num_rays, 6) and `rgb` of shape (num_rays, 3).
- Together they contain the ray origin, ray direction and RGB data for every pixel in our custom dataset.

Row `k` of `rays` and row `k` of `rgb` belong to the same pixel. Concatenated, an example row looks like this:

`[x_origin, y_origin, z_origin, dx, dy, dz, r_value, g_value, b_value]`

- columns 0-2: ray origin
- columns 3-5: ray direction
- columns 6-8: pixel color, scaled to [0, 1]
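
To make the layout concrete, the sketch below writes a tiny synthetic archive with the same keys (`rays`, `rgb`) and dtype, then stacks them into 9-column rows. The file name `demo_rays.npz` and the random data are placeholders. Note that `np.load` cannot memory-map arrays stored inside an `.npz` archive, so on the real file each member is read fully into RAM when accessed, and concatenating the full arrays this way needs enough memory to hold both of them.

```python
import os
import tempfile

import numpy as np

num_rays = 4
rng = np.random.default_rng(0)

# Synthetic stand-ins with the same keys, shapes and dtype as the real archive.
rays = rng.standard_normal((num_rays, 6)).astype(np.float16)  # origin xyz, direction xyz
rgb = rng.random((num_rays, 3)).astype(np.float16)            # colors in [0, 1]

path = os.path.join(tempfile.mkdtemp(), "demo_rays.npz")     # placeholder file name
np.savez_compressed(path, rays=rays, rgb=rgb)

with np.load(path) as data:
    print(sorted(data.files))                                # ['rays', 'rgb']
    full = np.concatenate([data["rays"], data["rgb"]], axis=1)

print(full.shape, full.dtype)                                # (4, 9) float16
rays_o, rays_d, colors = full[:, 0:3], full[:, 3:6], full[:, 6:9]
```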



# Import image dataset

Assuming you are using Google Colab, you will be prompted to upload the zip file containing your image dataset. The images will be extracted into 'images_dir', which is set to the root of our Colab working folder (/content).

In [1]:
images_dir = "/content"

from google.colab import files
uploaded = files.upload()  # Upload merlion_iphone_nerf.zip manually here


import zipfile

zip_path = list(uploaded.keys())[0]  # name of the uploaded zip; its contents should extract to /content/iphone
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(images_dir)

Saving merlion_iphone_nerf.zip to merlion_iphone_nerf.zip
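
`zipfile.extractall` recreates the archive's internal paths under `images_dir`, so the image folder name (`iphone`, used as `image_folder` later) comes from the folder stored inside the zip, not from the zip's file name. A small self-contained check, using a hypothetical in-memory archive in place of the uploaded one:

```python
import io
import os
import tempfile
import zipfile

# Build a tiny archive whose entries live under an "iphone/" folder,
# mimicking the layout the later cells expect.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("iphone/IMG_0001.JPG", b"fake image bytes")
    zf.writestr("iphone/IMG_0002.JPG", b"fake image bytes")

extract_dir = tempfile.mkdtemp()  # stands in for images_dir
buf.seek(0)
with zipfile.ZipFile(buf, "r") as zf:
    top_level = {name.split("/")[0] for name in zf.namelist()}
    zf.extractall(extract_dir)

print(top_level)                                                 # {'iphone'}
print(sorted(os.listdir(os.path.join(extract_dir, "iphone"))))  # ['IMG_0001.JPG', 'IMG_0002.JPG']
```

Inspecting `zf.namelist()` on the real upload is a quick way to confirm the folder name before calling `generate_ray_rgb_data`.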




# Set up utility functions

In this section, we implement the core utility functions. Here is a short description of what each function does and why it is needed:

*   <b>qvec2rotmat(qvec)</b>: converts a quaternion vector into a 3x3 rotation matrix.

> COLMAP outputs camera orientations as quaternions; converting them to rotation matrices is necessary to build the camera-to-world transformation.


*   <b>parse_colmap_text(folder)</b>: parses 'cameras.txt' and 'images.txt', then extracts the camera intrinsics and pose information.

>This extracts the essential metadata (intrinsics and extrinsics) from COLMAP's sparse reconstruction, which enables ray generation.


*   <b>get_rays(H, W, fx, fy, cx, cy, c2w)</b>: computes ray origins and directions in world coordinates for each pixel in an image, given the camera intrinsics and a camera-to-world matrix.

> NeRF requires camera rays for each image pixel in world space, which are computed using pinhole camera parameters and camera poses in this function.


*   <b>generate_ray_rgb_data(images_txt_dir, image_folder)</b>: main function, which converts COLMAP camera and image data into per-pixel ray origins, directions and RGB values. Saves one compressed .npz file per image into a temporary folder.

> This is the main preprocessing step which prepares training data for our NeRF model.


*   <b>merge_batches_to_npz_streaming(tmp_dir, save_path)</b>: merges all per-image .npz ray batches in `tmp_dir` into a single compressed dataset at `save_path` using memory-efficient streaming, then (optionally) cleans up temporary files.

> Helps consolidate large per-image ray datasets into a single output array. This function specifically avoids memory (RAM) overload by using disk-backed arrays.
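
The geometry behind `qvec2rotmat` and `get_rays` can be summarized in a few equations. COLMAP stores each image pose as a world-to-camera rotation, given by the unit quaternion $q = (w, x, y, z)$, and a translation $\mathbf{t}$, so that $X_{\text{cam}} = R(q)\,X_{\text{world}} + \mathbf{t}$. Inverting this gives the camera-to-world rotation and the ray origin $\mathbf{o}$ (the camera center), and a pixel $(i, j)$ back-projects through the pinhole intrinsics $f_x, f_y, c_x, c_y$ in COLMAP's camera convention (x right, y down, z forward):

$$
R(q) = \begin{bmatrix}
1-2y^2-2z^2 & 2xy-2zw & 2xz+2yw \\
2xy+2zw & 1-2x^2-2z^2 & 2yz-2xw \\
2xz-2yw & 2yz+2xw & 1-2x^2-2y^2
\end{bmatrix},
\qquad
R_{\text{c2w}} = R(q)^\top,
\qquad
\mathbf{o} = -R(q)^\top \mathbf{t}
$$

$$
\mathbf{d}_{\text{cam}} = \left(\frac{i - c_x}{f_x},\; \frac{j - c_y}{f_y},\; 1\right)^\top,
\qquad
\mathbf{d} = R_{\text{c2w}}\,\mathbf{d}_{\text{cam}}
$$

NeRF-style code often writes the same ray in the OpenGL camera convention (y up, z backward) as $\mathbf{d}'_{\text{cam}} = \left(\frac{i-c_x}{f_x}, -\frac{j-c_y}{f_y}, -1\right)^\top$, paired with a camera-to-world rotation whose Y and Z columns are negated, $R'_{\text{c2w}} = R_{\text{c2w}}\,\mathrm{diag}(1, -1, -1)$. Since $\mathrm{diag}(1,-1,-1)\,\mathbf{d}'_{\text{cam}} = \mathbf{d}_{\text{cam}}$, both give the same world-space direction; applying only one of the two changes makes every ray point behind its camera.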


In [2]:
import numpy as np
import os
import imageio.v2 as imageio  # v2 API: same imread, without the ImageIO v3 DeprecationWarning
from tqdm import tqdm


def qvec2rotmat(qvec):
    """Convert quaternion to rotation matrix."""
    w, x, y, z = qvec
    return np.array([
        [1 - 2 * y**2 - 2 * z**2,     2 * x * y - 2 * z * w,     2 * x * z + 2 * y * w],
        [2 * x * y + 2 * z * w,       1 - 2 * x**2 - 2 * z**2,   2 * y * z - 2 * x * w],
        [2 * x * z - 2 * y * w,       2 * y * z + 2 * x * w,     1 - 2 * x**2 - 2 * y**2]
    ])


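def parse_colmap_text(folder):
    """Parse COLMAP's cameras.txt and images.txt (text export of a sparse model)."""
    cameras = {}
    with open(os.path.join(folder, 'cameras.txt'), 'r') as f:
        for line in f:
            if line.startswith('#') or not line.strip():
                continue  # skip comments and blank lines
            parts = line.split()
            cam_id, model = int(parts[0]), parts[1]
            # Only the PINHOLE model stores exactly W, H, fx, fy, cx, cy;
            # other models would be misread or fail to unpack.
            if model != 'PINHOLE':
                raise ValueError(f"Camera {cam_id}: expected PINHOLE model, got {model}")
            W, H, fx, fy, cx, cy = map(float, parts[2:8])
            cameras[cam_id] = {'W': int(W), 'H': int(H), 'fx': fx, 'fy': fy, 'cx': cx, 'cy': cy}

    images = []
    with open(os.path.join(folder, 'images.txt'), 'r') as f:
        # Drop comment lines first; each image then takes exactly two lines:
        # "IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME" and its POINTS2D line (may be empty).
        lines = [line for line in f if not line.startswith('#')]
    for i in range(0, len(lines) - 1, 2):
        parts = lines[i].split()
        qvec = np.array(list(map(float, parts[1:5])))  # (QW, QX, QY, QZ), world-to-camera
        tvec = np.array(list(map(float, parts[5:8])))  # (TX, TY, TZ), world-to-camera
        cam_id = int(parts[8])
        img_name = parts[-1]
        images.append((img_name, qvec, tvec, cam_id))

    return images, cameras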


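def get_rays(H, W, fx, fy, cx, cy, c2w):
    """Get ray origins and directions in world space.

    Directions use the NeRF/OpenGL camera convention (x right, y up, z backward),
    matching the Y/Z column flips applied to c2w in generate_ray_rgb_data.
    """
    i, j = np.meshgrid(np.arange(W), np.arange(H), indexing='xy')  # i: column, j: row
    dirs = np.stack([(i - cx) / fx, -(j - cy) / fy, -np.ones_like(i)], axis=-1)
    rays_d = dirs @ c2w[:3, :3].T  # rotate each direction into world space
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)  # every ray starts at the camera center
    return rays_o, rays_d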
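
A quick sanity check for `qvec2rotmat`, assuming COLMAP's scalar-first quaternion order `(w, x, y, z)`: a 90° rotation about the z-axis should map the x-axis onto the y-axis, and any unit quaternion should yield an orthonormal matrix with determinant +1. The function is copied so the snippet runs on its own:

```python
import numpy as np


def qvec2rotmat(qvec):
    """Convert a scalar-first unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = qvec
    return np.array([
        [1 - 2 * y**2 - 2 * z**2, 2 * x * y - 2 * z * w,   2 * x * z + 2 * y * w],
        [2 * x * y + 2 * z * w,   1 - 2 * x**2 - 2 * z**2, 2 * y * z - 2 * x * w],
        [2 * x * z - 2 * y * w,   2 * y * z + 2 * x * w,   1 - 2 * x**2 - 2 * y**2],
    ])


# 90-degree rotation about z: q = (cos(45°), 0, 0, sin(45°))
half = np.pi / 4
R = qvec2rotmat([np.cos(half), 0.0, 0.0, np.sin(half)])
print(np.allclose(R @ np.array([1.0, 0.0, 0.0]), [0.0, 1.0, 0.0]))  # True

# A random unit quaternion should still give a proper rotation.
q = np.random.default_rng(0).standard_normal(4)
q /= np.linalg.norm(q)
R_rand = qvec2rotmat(q)
print(np.allclose(R_rand.T @ R_rand, np.eye(3)), np.isclose(np.linalg.det(R_rand), 1.0))  # True True
```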

In [3]:
def generate_ray_rgb_data(images_txt_dir, image_folder, tmp_dir="/content/tmp"):
    """Write one .npz file of per-pixel rays and colors for every image in images.txt."""
    images, cameras = parse_colmap_text(images_txt_dir)

    os.makedirs(tmp_dir, exist_ok=True)

    for img_name, qvec, tvec, cam_id in tqdm(images, desc="Processing images"):
        cam = cameras[cam_id]
        H, W, fx, fy, cx, cy = cam['H'], cam['W'], cam['fx'], cam['fy'], cam['cx'], cam['cy']

        # Load image and normalize 8-bit values to [0, 1]
        img_path = os.path.join(image_folder, img_name)
        img = imageio.imread(img_path).astype(np.float32) / 255.0
        img = img[..., :3]  # keep RGB, drop an alpha channel if present
        if img.shape != (H, W, 3):
            # e.g. images resized after running COLMAP
            raise ValueError(f"{img_name}: image shape {img.shape} does not match camera size {(H, W, 3)}")

        # Build camera-to-world matrix from COLMAP's world-to-camera pose
        R = qvec2rotmat(qvec).T  # camera-to-world rotation
        t = -R @ tvec            # camera center in world coordinates
        c2w = np.eye(4)
        c2w[:3, :3] = R
        c2w[:3, 3] = t

        # === Flip axes to match NeRF convention (COLMAP → NeRF) ===
        c2w[:3, 2] *= -1  # Flip camera Z axis (column)
        c2w[:3, 1] *= -1  # Flip camera Y axis (column)
        c2w = c2w[[1, 0, 2, 3], :]  # Swap world X and Y (rows)
        c2w[2, :] *= -1  # Flip world Z (flips the scene)

        rays_o, rays_d = get_rays(H, W, fx, fy, cx, cy, c2w)

        # Flatten to one row per pixel; float16 halves storage but keeps only ~3 significant digits
        rays_o = rays_o.reshape(-1, 3).astype(np.float16)
        rays_d = rays_d.reshape(-1, 3).astype(np.float16)
        rgb = img.reshape(-1, 3).astype(np.float16)

        # Save batch
        np.savez_compressed(os.path.join(tmp_dir, f"{img_name}.npz"),
                            rays_o=rays_o, rays_d=rays_d, rgb=rgb)
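
The axis flips only produce correct rays if the pixel directions built in `get_rays` follow the matching camera convention. A minimal check, using a hypothetical pose (30° about the y-axis, arbitrary translation) and hypothetical normalized pixel coordinates: `colmap_to_nerf_c2w` (a hypothetical helper) repeats the c2w construction above, and both direction conventions are compared with the direction the pixel actually sees, expressed in the same flipped world frame.

```python
import numpy as np


def colmap_to_nerf_c2w(R_w2c, tvec):
    """Mirror the c2w construction and axis flips in generate_ray_rgb_data."""
    R = R_w2c.T
    c2w = np.eye(4)
    c2w[:3, :3] = R
    c2w[:3, 3] = -R @ tvec
    c2w[:3, 2] *= -1            # flip camera Z column
    c2w[:3, 1] *= -1            # flip camera Y column
    c2w = c2w[[1, 0, 2, 3], :]  # swap world X and Y rows
    c2w[2, :] *= -1             # flip world Z row
    return c2w


# Hypothetical COLMAP pose: 30-degree rotation about y, arbitrary translation.
a = np.deg2rad(30)
R_w2c = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
tvec = np.array([0.5, -1.0, 2.0])
c2w = colmap_to_nerf_c2w(R_w2c, tvec)

# The row operations only relabel world axes; this matrix does the same.
world_flip = np.array([[0, 1, 0], [1, 0, 0], [0, 0, -1]], dtype=float)

# Assumed normalized pixel coordinates x = (i - cx) / fx, y = (j - cy) / fy.
x, y = 0.2, -0.3
true_dir = world_flip @ (R_w2c.T @ np.array([x, y, 1.0]))  # COLMAP camera: y down, z forward
colmap_style = c2w[:3, :3] @ np.array([x, y, 1.0])         # (x, y, 1) with the flipped c2w
nerf_style = c2w[:3, :3] @ np.array([x, -y, -1.0])         # (x, -y, -1) with the flipped c2w

print(np.allclose(nerf_style, true_dir), np.allclose(colmap_style, true_dir))  # True False
```

Running the same comparison on a real pose from images.txt is a cheap regression test whenever the flip code is edited.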

In [4]:
import glob


def merge_batches_to_npz_streaming(tmp_dir, save_path):
    paths = sorted(glob.glob(os.path.join(tmp_dir, "*.npz")))

    # First pass: count total rays
    total_rays = 0
    for path in paths:
        with np.load(path) as data:
            total_rays += data["rays_o"].shape[0]

    # Create memory-mapped arrays
    rays = np.memmap("rays.dat", dtype=np.float16, mode="w+", shape=(total_rays, 6))
    rgb = np.memmap("rgb.dat", dtype=np.float16, mode="w+", shape=(total_rays, 3))

    # Second pass: fill memory-mapped arrays
    offset = 0
    for path in tqdm(paths, desc="Merging ray batches"):
        with np.load(path) as data:
            N = data["rays_o"].shape[0]
            rays[offset:offset+N, :3] = data["rays_o"]
            rays[offset:offset+N, 3:] = data["rays_d"]
            rgb[offset:offset+N] = data["rgb"]
            offset += N
        os.remove(path)  # optional: frees disk space, but this batch is lost if a later step fails

    # Save final .npz
    np.savez_compressed(save_path, rays=rays, rgb=rgb)
    print(f"Saved {total_rays} rays to {save_path}")

    # Clean up .dat files (optional); release the memmaps first
    del rays, rgb
    os.remove("rays.dat")
    os.remove("rgb.dat")
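
The two-pass pattern in `merge_batches_to_npz_streaming` (count rows first, then fill a preallocated `np.memmap` slice by slice) can be exercised at toy scale. A minimal sketch, assuming three small per-image batches with the same keys (`rays_o`, `rays_d`, `rgb`) written to a temporary directory; the file names are placeholders:

```python
import glob
import os
import tempfile

import numpy as np

tmp_dir = tempfile.mkdtemp()
rng = np.random.default_rng(0)

# Three fake per-image batches of different sizes.
for k, n in enumerate([5, 3, 4]):
    np.savez_compressed(
        os.path.join(tmp_dir, f"img_{k}.npz"),
        rays_o=rng.random((n, 3)).astype(np.float16),
        rays_d=rng.random((n, 3)).astype(np.float16),
        rgb=rng.random((n, 3)).astype(np.float16),
    )
paths = sorted(glob.glob(os.path.join(tmp_dir, "img_*.npz")))

# Pass 1: count rows without keeping any arrays around.
total = 0
for p in paths:
    with np.load(p) as d:
        total += d["rays_o"].shape[0]

# Pass 2: write each batch into its slice of a disk-backed array.
rays = np.memmap(os.path.join(tmp_dir, "rays.dat"), dtype=np.float16, mode="w+", shape=(total, 6))
offset = 0
for p in paths:
    with np.load(p) as d:
        n = d["rays_o"].shape[0]
        rays[offset:offset + n, :3] = d["rays_o"]
        rays[offset:offset + n, 3:] = d["rays_d"]
        offset += n
rays.flush()  # push all writes to rays.dat

print(total, offset)  # 12 12
```

Because a memmap file has no header, reopening it later only needs the same dtype and shape, e.g. `np.memmap(path, dtype=np.float16, mode="r", shape=(total, 6))`.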

# Generate ray and rgb data

Produces the ray and RGB data for each image in the dataset, writing one .npz file per image into the 'tmp' folder (127 files for this dataset).

In [8]:
generate_ray_rgb_data(
    images_txt_dir="/content",
    image_folder="/content/iphone",
)

Processing images: 100%|██████████| 127/127 [31:55<00:00, 15.08s/it]


# Generate combined array of ray and rgb data

Combines all the .npz files in the 'tmp' folder into a single .npz archive, "full_ray_rgb_array.npz".

In [None]:
merge_batches_to_npz_streaming("/content/tmp", save_path="full_ray_rgb_array.npz")

Merging ray batches: 100%|██████████| 127/127 [14:50<00:00,  7.01s/it]


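# Recover the final array from the .dat files

If the session restarts before `merge_batches_to_npz_streaming` finishes, the memory-mapped `rays.dat` and `rgb.dat` files are left in /content. The cells below infer their shapes from the file sizes, reopen them read-only and write "full_ray_rgb_array.npz" directly from disk.

In [1]: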
import os
import numpy as np

# 1. Path to file
path = "/content/rays.dat"

# 2. dtype and number of columns
dtype = np.float16
num_columns = 6

# 3. Get total number of bytes
num_bytes = os.path.getsize(path)

# 4. Compute number of rows
itemsize = np.dtype(dtype).itemsize  # should be 2 bytes for float16
num_rows = num_bytes // (itemsize * num_columns)

print(f"Shape of rays.dat: ({num_rows}, {num_columns})")


Shape of rays.dat: (2493844416, 6)


In [3]:
import os
import numpy as np

# 1. Path to file
path = "/content/rgb.dat"

# 2. dtype and number of columns
dtype = np.float16
num_columns = 3

# 3. Get total number of bytes
num_bytes = os.path.getsize(path)

# 4. Compute number of rows
itemsize = np.dtype(dtype).itemsize  # should be 2 bytes for float16
num_rows = num_bytes // (itemsize * num_columns)

print(f"Shape of rgb.dat: ({num_rows}, {num_columns})")

Shape of rgb.dat: (2493844416, 3)
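
The shape cells above invert a simple relation: a raw float16 memmap has no header, so its size in bytes is rows × columns × 2. Running it forward with the printed row count gives the disk space both files need. A sketch; `memmap_nbytes` is a hypothetical helper:

```python
import numpy as np


def memmap_nbytes(num_rows, num_columns, dtype=np.float16):
    """Bytes occupied by a headerless np.memmap of shape (num_rows, num_columns)."""
    return num_rows * num_columns * np.dtype(dtype).itemsize


total_rays = 2493844416  # row count printed by the cells above
rays_bytes = memmap_nbytes(total_rays, 6)
rgb_bytes = memmap_nbytes(total_rays, 3)

print(rays_bytes, rgb_bytes)                       # 29926132992 14963066496
print(f"{(rays_bytes + rgb_bytes) / 1e9:.1f} GB")  # 44.9 GB
```

Both .dat files and the remaining per-image batches in 'tmp' share the Colab disk during the merge, which is why `merge_batches_to_npz_streaming` can delete each batch once it has been copied.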


In [5]:
import numpy as np

total_rays = 2493844416  # row count printed by the two cells above

# Step 1: Reopen the memmaps from disk
rays = np.memmap('/content/rays.dat', dtype=np.float16, mode='r', shape=(total_rays, 6))
rgb  = np.memmap('/content/rgb.dat', dtype=np.float16, mode='r', shape=(total_rays, 3))

# Step 2: Save them to compressed .npz without loading into RAM
np.savez_compressed('full_ray_rgb_array.npz', rays=rays, rgb=rgb)
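
Before copying a file of this size to Google Drive, it can be worth confirming the shapes and dtypes stored in the archive without loading the arrays. A minimal sketch that reads only each member's `.npy` header, assuming header format version 1.0 or 2.0 (what `np.savez_compressed` normally writes); `npz_member_shapes` is a hypothetical helper, demonstrated here on a tiny stand-in file:

```python
import os
import tempfile
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def npz_member_shapes(path):
    """Return {name: (shape, dtype)} for each array in an .npz, reading only the headers."""
    info = {}
    with zipfile.ZipFile(path) as zf:
        for member in zf.namelist():
            with zf.open(member) as fp:
                version = npy_format.read_magic(fp)
                if version == (1, 0):
                    shape, _, dtype = npy_format.read_array_header_1_0(fp)
                elif version == (2, 0):
                    shape, _, dtype = npy_format.read_array_header_2_0(fp)
                else:
                    raise ValueError(f"Unsupported .npy version {version} in {member}")
            info[member.removesuffix(".npy")] = (shape, dtype)
    return info


# Tiny stand-in for full_ray_rgb_array.npz (same keys and dtype).
demo = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez_compressed(demo, rays=np.zeros((10, 6), np.float16), rgb=np.zeros((10, 3), np.float16))
print(npz_member_shapes(demo))
```

On the real archive, the expected result is `rays` with shape (2493844416, 6) and `rgb` with shape (2493844416, 3), both float16.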


In [6]:
from google.colab import drive
import shutil

# Mount your Google Drive
drive.mount('/content/drive')

# Copy the file to your Drive
shutil.copy('/content/full_ray_rgb_array.npz', '/content/drive/MyDrive/')


Mounted at /content/drive


'/content/drive/MyDrive/full_ray_rgb_array.npz'