# NETHERGAZE: Markerless Augmented Reality Pipeline

**Author:** Milan Savard  
**Course:** CS366 F25 Final Personal Project  
**Date:** November 2025

---

## Project Overview

NETHERGAZE is a markerless augmented reality (AR) system built using computer vision techniques. Unlike traditional AR systems that rely on fiducial markers (like ArUco or AprilTags), NETHERGAZE uses **natural feature tracking** to estimate camera pose and overlay virtual content.

### Key Technologies
- **OpenCV** for image processing and computer vision
- **ORB (Oriented FAST and Rotated BRIEF)** for feature detection and description
- **Lucas-Kanade Optical Flow** for frame-to-frame feature tracking
- **Essential Matrix + RANSAC** for robust pose estimation
- **Temporal filtering** for smooth AR overlays
- **Sparse SLAM** for 3D mapping and loop closure
- **Depth-aware rendering** for AR occlusion handling


## Table of Contents

1. [Architecture Overview](#1-architecture-overview)
2. [Implementation Progress](#2-implementation-progress)
3. [Component Deep Dive](#3-component-deep-dive)
4. [Demo: Running the Pipeline](#4-demo-running-the-pipeline)
5. [Configuration Options](#5-configuration-options)
6. [Next Steps & Roadmap](#6-next-steps--roadmap)
7. [References](#7-references)


---

## 1. Architecture Overview

The NETHERGAZE pipeline runs six main stages in sequence, with loop closure and occlusion handling branching off the mapping and overlay stages:

```
┌─────────────┐    ┌──────────────┐    ┌────────────────┐    ┌──────────────┐    ┌─────────────┐    ┌─────────┐
│   Capture   │ -> │   Tracking   │ -> │ Pose Estimation│ -> │   Mapping    │ -> │   Overlay   │ -> │ Display │
│  (video.py) │    │ (feature.py) │    │   (pose.py)    │    │ (mapping.py) │    │ (overlay.py)│    │ (ui.py) │
└─────────────┘    └──────────────┘    └────────────────┘    └──────────────┘    └─────────────┘    └─────────┘
                                                                    │                   │
                                                                    v                   v
                                                             ┌─────────────┐    ┌───────────────┐
                                                             │    Loop     │    │   Occlusion   │
                                                             │   Closure   │    │ (occlusion.py)│
                                                             └─────────────┘    └───────────────┘
```

### Module Responsibilities

| Module | File | Purpose |
|--------|------|---------|  
| **VideoProcessor** | `src/video.py` | Camera capture, preprocessing, backend selection |
| **FeatureTracker** | `src/tracking/feature.py` | ORB detection, optical flow, keyframe management |
| **PoseEstimator** | `src/pose.py` | Essential matrix, pose recovery, temporal filtering |
| **SparseMap** | `src/mapping.py` | 3D point cloud, keyframe management, loop closure |
| **OcclusionHandler** | `src/occlusion.py` | Depth estimation, occlusion masks, depth-aware rendering |
| **OverlayRenderer** | `src/overlay.py` | 2D/3D overlay rendering, blending |
| **UserInterface** | `src/ui.py` | Display window, keyboard controls |
| **NETHERGAZEApp** | `src/main.py` | Pipeline orchestration, CLI interface |
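
To make the data flow concrete, here is a minimal sketch of how stages like the ones in the table could be chained frame by frame. The `Stage` type, the `run_pipeline` helper, and the stand-in stage functions are hypothetical names for illustration and are not taken from `src/main.py`; each stage receives a shared per-frame context dictionary and returns it with new fields.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stage signature: takes the per-frame context, returns it updated.
Stage = Callable[[Dict], Dict]


def run_pipeline(frames: Iterable, stages: List[Stage]) -> List[Dict]:
    """Push every frame through the stages in order and collect the contexts."""
    results = []
    for index, frame in enumerate(frames):
        context = {"index": index, "frame": frame}
        for stage in stages:
            context = stage(context)
            if context.get("abort"):  # a stage may stop work on this frame
                break
        results.append(context)
    return results


# Stand-in stages (illustrative only; real stages would call the modules above).
def track(ctx):
    ctx["num_features"] = len(ctx["frame"])  # pretend each character is a feature
    return ctx


def estimate_pose(ctx):
    if ctx["num_features"] < 3:
        ctx["abort"] = True  # too few features: skip the overlay for this frame
    else:
        ctx["pose"] = "ok"
    return ctx


def render(ctx):
    ctx["rendered"] = True
    return ctx


outputs = run_pipeline(["abcdef", "ab"], [track, estimate_pose, render])
print([o.get("rendered", False) for o in outputs])
```

The second frame has too few "features", so the pose stage aborts it and the render stage never runs. Passing a context dictionary keeps the stages decoupled, at the cost of less explicit interfaces than typed arguments would give.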


---

## 2. Implementation Progress

### ✅ Completed Components

| Component | Status | Description |
|-----------|--------|-------------|
| Video Capture | ✅ Complete | Multi-backend support (AVFoundation, DirectShow, V4L2) |
| Feature Tracking | ✅ Complete | ORB + optical flow with keyframe management |
| Pose Estimation | ✅ Complete | Essential matrix decomposition with temporal smoothing |
| Overlay Rendering | ✅ Complete | 2D primitives + 3D wireframes |
| User Interface | ✅ Complete | OpenCV window with keyboard controls |
| Pipeline Orchestration | ✅ Complete | CLI-driven main application |
| Configuration System | ✅ Complete | JSON config with sensible defaults |
| Camera Calibration Tool | ✅ Complete | Interactive chessboard calibration with preview mode |
| Integration Tests | ✅ Complete | Synthetic video testing, benchmarks, detector comparison |
| SLAM/Mapping | ✅ Complete | Sparse 3D mapping with keyframes, triangulation, loop closure |
| Occlusion Handling | ✅ Complete | Depth estimation, occlusion masks, depth-aware rendering |

### 🚧 In Progress / Planned

| Component | Status | Priority |
|-----------|--------|----------|
| Scale Recovery | 🚧 Partial | High |
| Textured 3D Models | 🚧 Planned | Medium |
| Dense Depth (MiDaS) | 🚧 Future | Low |


---

## 3. Component Deep Dive

Let's explore each component in detail.


In [1]:
# Setup: Add src to path for imports
import sys
import os

# Navigate to project root
notebook_dir = os.getcwd()
project_root = os.path.dirname(notebook_dir) if 'notebooks' in notebook_dir else notebook_dir
src_path = os.path.join(project_root, 'src')
sys.path.insert(0, src_path)

print(f"Project root: {project_root}")
print(f"Source path: {src_path}")


Project root: /Users/milansavard/Desktop/GitHub/ComputerVision/NETHERGAZE
Source path: /Users/milansavard/Desktop/GitHub/ComputerVision/NETHERGAZE/src


### 3.1 Feature Tracking

The `FeatureTracker` class implements a hybrid approach:
1. **Detection**: ORB features are detected when tracking is lost or the number of tracked features drops below the reacquire threshold
2. **Tracking**: Lucas-Kanade optical flow follows existing features from frame to frame, which is faster than re-detecting them
3. **Keyframes**: The best frames are stored for re-localization when tracking fails


In [2]:
from tracking.feature import FeatureTracker, TrackingConfiguration

# View default configuration
default_config = TrackingConfiguration()
print("Feature Tracking Configuration:")
print(f"  Method: {default_config.method}")
print(f"  Max Features: {default_config.max_features}")
print(f"  Use Optical Flow: {default_config.use_optical_flow}")
print(f"  Optical Flow Window: {default_config.optical_flow_win_size}")
print(f"  Reacquire Threshold: {default_config.reacquire_threshold}")
print(f"  Keyframe Interval: {default_config.keyframe_interval}")
print(f"  Max Keyframes: {default_config.max_keyframes}")


Feature Tracking Configuration:
  Method: orb
  Max Features: 1000
  Use Optical Flow: True
  Optical Flow Window: 21
  Reacquire Threshold: 300
  Keyframe Interval: 15
  Max Keyframes: 5
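
The detect-or-track decision described above can be sketched as a small pure-Python function. This is an illustration under assumptions, not the logic in `src/tracking/feature.py`: `TrackerState` and `choose_action` are hypothetical names, and the default arguments simply reuse the `reacquire_threshold` and `keyframe_interval` values printed above.

```python
from dataclasses import dataclass


@dataclass
class TrackerState:
    tracked_count: int = 0          # features currently followed by optical flow
    frames_since_keyframe: int = 0  # frames elapsed since the last stored keyframe


def choose_action(state, reacquire_threshold=300, keyframe_interval=15):
    """Return ('detect' or 'track', keyframe_due) for the current frame.

    Detection runs when too few features survive tracking; otherwise the
    cheaper optical-flow step is used. A keyframe is considered once enough
    frames have passed since the previous one.
    """
    action = "detect" if state.tracked_count < reacquire_threshold else "track"
    keyframe_due = state.frames_since_keyframe >= keyframe_interval
    return action, keyframe_due


print(choose_action(TrackerState(tracked_count=0)))        # tracking lost
print(choose_action(TrackerState(450, 3)))                 # healthy tracking
print(choose_action(TrackerState(120, 15)))                # sparse and keyframe due
```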


### 3.2 Pose Estimation

The `PoseEstimator` class recovers camera motion from 2D-2D correspondences:

1. **Essential Matrix**: Computed from matched points using RANSAC
2. **Pose Recovery**: Decompose E into rotation (R) and translation (t)
3. **Temporal Filtering**: Exponential moving average (EMA) smoothing plus outlier rejection for stable overlays


In [3]:
from pose import PoseEstimator, PoseFilterConfig

# View pose filter configuration
filter_config = PoseFilterConfig()
print("Pose Filter Configuration:")
print(f"  Enable Smoothing: {filter_config.enable_smoothing}")
print(f"  Smoothing Alpha (EMA): {filter_config.smoothing_alpha}")
print(f"  Enable Outlier Rejection: {filter_config.enable_outlier_rejection}")
print(f"  Max Translation Jump: {filter_config.max_translation_jump} m")
print(f"  Max Rotation Jump: {filter_config.max_rotation_jump} rad")
print(f"  Min Inliers Threshold: {filter_config.min_inliers_threshold}")


Pose Filter Configuration:
  Enable Smoothing: True
  Smoothing Alpha (EMA): 0.3
  Enable Outlier Rejection: True
  Max Translation Jump: 0.5 m
  Max Rotation Jump: 0.5 rad
  Min Inliers Threshold: 10
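
As a rough illustration of the temporal filtering step, the sketch below applies EMA smoothing and jump-based outlier rejection to a translation vector. It is a simplified stand-in and not the code in `src/pose.py`: `filter_translation` is a hypothetical helper, the convention that alpha weights the newest sample is an assumption, and rotation filtering (which needs proper interpolation on rotations) is left out.

```python
import math


def filter_translation(prev, new, alpha=0.3, max_jump=0.5):
    """Blend a new translation estimate into the previous filtered one.

    Returns (filtered, accepted). An estimate farther than max_jump from the
    previous filtered value is treated as an outlier and ignored.
    """
    if prev is None:  # first measurement: nothing to smooth against
        return tuple(new), True
    if math.dist(prev, new) > max_jump:
        return tuple(prev), False
    # EMA: alpha weights the new sample, (1 - alpha) weights the history.
    filtered = tuple(alpha * n + (1.0 - alpha) * p for p, n in zip(prev, new))
    return filtered, True


state = None
history = []
for t in [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (3.0, 0.0, 1.0), (0.2, 0.0, 1.0)]:
    state, accepted = filter_translation(state, t)
    history.append((state, accepted))
    print(state, accepted)
```

The third sample jumps almost three units in one frame, so it is rejected and the previous filtered value is held. A smaller alpha gives smoother but laggier overlays.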


### 3.3 Overlay Rendering

The `OverlayRenderer` supports both 2D screen-space and 3D world-space overlays:

**2D Overlays:**
- Text labels
- Rectangles, circles, lines
- Polygons

**3D Wireframe Objects:**
- Cube, pyramid, grid
- Coordinate axes (RGB = XYZ)
- Custom wireframes


In [4]:
from overlay import OverlayRenderer, OverlayConfiguration

# View overlay configuration
overlay_config = OverlayConfiguration()
print("Overlay Configuration:")
print(f"  Enable 2D Overlays: {overlay_config.enable_2d_overlays}")
print(f"  Enable 3D Overlays: {overlay_config.enable_3d_overlays}")
print(f"  Default 3D Color: {overlay_config.default_3d_color} (BGR)")
print(f"  Blend Alpha: {overlay_config.blend_alpha}")
print(f"  Antialiasing: {overlay_config.antialiasing}")


Overlay Configuration:
  Enable 2D Overlays: True
  Enable 3D Overlays: True
  Default 3D Color: (0, 255, 255) (BGR)
  Blend Alpha: 0.7
  Antialiasing: True
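
The `blend_alpha` setting suggests conventional alpha compositing of the overlay onto the camera frame. A minimal per-pixel sketch, assuming the usual formula `alpha * overlay + (1 - alpha) * frame` (the actual blending in `src/overlay.py` may differ, and `blend_pixel` is a hypothetical name):

```python
def blend_pixel(frame_bgr, overlay_bgr, alpha=0.7):
    """Per-channel alpha blend of one BGR pixel, clamped to the 0..255 range."""
    blended = []
    for f, o in zip(frame_bgr, overlay_bgr):
        value = alpha * o + (1.0 - alpha) * f
        blended.append(max(0, min(255, int(round(value)))))
    return tuple(blended)


background = (100, 100, 100)   # mid-gray camera pixel
overlay_color = (0, 255, 255)  # the default 3D color shown above (BGR)
result = blend_pixel(background, overlay_color)
print(result)
```

With alpha at 0.7 the overlay dominates, but 30% of the underlying frame still shows through, which keeps wireframes from fully hiding the scene.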


### 3.4 Camera Calibration

The system uses a camera intrinsic matrix (K) for projection:

```
K = | fx   0  cx |
    |  0  fy  cy |
    |  0   0   1 |
```

Where:
- `fx, fy` = focal lengths in pixels
- `cx, cy` = principal point in pixels (usually close to the image center)


In [5]:
from utils import get_config
import numpy as np

# Load default calibration
config = get_config()
calib = config['calibration']

K = np.array(calib['camera_matrix'])
dist = np.array(calib['dist_coeffs'])

print("Default Camera Matrix (K):")
print(K)
print(f"\nFocal Length: fx={K[0,0]:.1f}, fy={K[1,1]:.1f} pixels")
print(f"Principal Point: cx={K[0,2]:.1f}, cy={K[1,2]:.1f} pixels")
print(f"\nDistortion Coefficients: {dist}")


Default Camera Matrix (K):
[[800.   0. 320.]
 [  0. 800. 240.]
 [  0.   0.   1.]]

Focal Length: fx=800.0, fy=800.0 pixels
Principal Point: cx=320.0, cy=240.0 pixels

Distortion Coefficients: [0. 0. 0. 0. 0.]
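
To show how K is used, the sketch below projects a 3D point given in camera coordinates to pixel coordinates with the pinhole model, `u = fx * X / Z + cx` and `v = fy * Y / Z + cy`. The function name `project_point` is hypothetical, the defaults reuse the matrix printed above, and lens distortion is ignored (the default coefficients are all zero).

```python
def project_point(point_cam, fx=800.0, fy=800.0, cx=320.0, cy=240.0):
    """Project (X, Y, Z) in camera coordinates to pixel (u, v).

    Returns None for points at or behind the camera plane (Z <= 0).
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v


on_axis = project_point((0.0, 0.0, 1.0))      # lands on the principal point
off_axis = project_point((0.1, -0.05, 2.0))   # right of and above center
behind = project_point((0.0, 0.0, -1.0))      # not visible
print(on_axis, off_axis, behind)
```

A point on the optical axis maps to the principal point regardless of depth, and doubling Z halves the offset from the principal point, which is why distant objects appear smaller.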


---

## 4. Demo: Running the Pipeline

### Command Line Usage

```bash
# Basic demo (uses examples/run_demo.py)
cd NETHERGAZE
python3 examples/run_demo.py

# Full pipeline with CLI options
python3 src/main.py                    # Default camera
python3 src/main.py --camera 1         # Specific camera index
python3 src/main.py --video demo.mp4   # Video file input
python3 src/main.py --verbose          # Debug logging
python3 src/main.py --config cfg.json  # Custom config file
```

### Keyboard Controls

| Key | Action |
|-----|--------|
| `q` / `ESC` | Quit |
| `m` | Toggle feature marker display |
| `a` | Toggle pose axes display |
| `p` | Pause/Resume |
| `h` | Show help |
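
One way to implement a key table like this is a small dispatch function over a state dictionary. The sketch below is hypothetical and independent of `src/ui.py`: `handle_key` and the state field names are invented for illustration, and keys are passed as single characters rather than OpenCV key codes.

```python
def handle_key(key, state):
    """Update the UI state for one key press; return False when quitting."""
    if key in ("q", "\x1b"):  # '\x1b' is the ESC character
        return False
    toggles = {"m": "show_markers", "a": "show_axes", "p": "paused"}
    if key in toggles:
        field = toggles[key]
        state[field] = not state[field]
    elif key == "h":
        state["show_help"] = True
    return True  # unknown keys are ignored


ui_state = {"show_markers": True, "show_axes": True, "paused": False, "show_help": False}
for pressed in "mph":
    handle_key(pressed, ui_state)
print(ui_state)
print(handle_key("q", ui_state))
```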


In [6]:
# Display the main.py CLI help
import subprocess
result = subprocess.run(
    ['python3', '../src/main.py', '--help'],
    capture_output=True,
    text=True
)
print(result.stdout)


usage: main.py [-h] [--config CONFIG] [--camera CAMERA] [--video VIDEO]
               [--width WIDTH] [--height HEIGHT] [--verbose]

NETHERGAZE - Markerless Augmented Reality Pipeline

options:
  -h, --help            show this help message and exit
  --config CONFIG, -c CONFIG
                        Path to JSON configuration file
  --camera CAMERA, -cam CAMERA
                        Camera index to use (default: 0)
  --video VIDEO, -v VIDEO
                        Path to video file (overrides camera)
  --width WIDTH         Video capture width
  --height HEIGHT       Video capture height
  --verbose, -V         Enable verbose/debug logging

Examples:
  python main.py                          # Run with default camera
  python main.py --camera 1               # Use camera index 1
  python main.py --video demo.mp4         # Process video file
  python main.py --config my_config.json  # Use custom config
  python main.py --verbose                # Enable debug logging
        



---

## 5. Configuration Options

All parameters can be customized through a JSON config file (passed with `--config`) or by editing the defaults in `src/utils.py`.


In [7]:
import json
from utils import get_config

# Get full configuration
config = get_config()

# Pretty print
print("Full Configuration:")
print(json.dumps(config, indent=2, default=str))


Full Configuration:
{
  "camera_id": 0,
  "video_width": 640,
  "video_height": 480,
  "video_fps": 30,
  "camera_backend_priority": null,
  "camera_init_attempts": 10,
  "enable_preprocessing": true,
  "blur_kernel": [
    5,
    5
  ],
  "contrast_alpha": 1.0,
  "brightness_beta": 0,
  "axis_length": 0.05,
  "feature_tracking": {
    "method": "orb",
    "max_features": 1000,
    "quality_level": 0.01,
    "min_distance": 7.0,
    "fast_threshold": 20,
    "orb_scale_factor": 1.2,
    "orb_nlevels": 8,
    "akaze_threshold": 0.001,
    "use_optical_flow": true,
    "optical_flow_win_size": 21,
    "optical_flow_max_level": 3,
    "optical_flow_criteria_eps": 0.03,
    "optical_flow_criteria_count": 30,
    "adaptive_optical_flow": true,
    "reacquire_threshold": 200,
    "keyframe_interval": 15,
    "max_keyframes": 6,
    "min_keyframe_features": 160,
    "keyframe_quality_threshold": 0.5,
    "matcher_type": "bf_hamming",
    "match_ratio_threshold": 0.75,
    "use_grid_detection"

### Key Configuration Sections

| Section | Description |
|---------|-------------|
| `feature_tracking` | ORB detection and optical flow parameters |
| `calibration` | Camera intrinsic matrix and distortion coefficients |
| `pose_filter` | Temporal smoothing and outlier rejection settings |
| `overlay` | 2D/3D rendering options |
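
Because the configuration is nested, a custom JSON file usually only needs to list the values it changes. The sketch below shows one common way to overlay such a file on the defaults with a recursive merge. Whether `get_config` in `src/utils.py` merges this way is an assumption; `merge_config` is a hypothetical helper, and the defaults shown are a small excerpt of the values printed above.

```python
import json


def merge_config(defaults, overrides):
    """Recursively overlay user overrides onto a defaults dictionary."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # descend into sections
        else:
            merged[key] = value                             # replace leaf values
    return merged


default_settings = {
    "video_width": 640,
    "feature_tracking": {"method": "orb", "max_features": 1000},
}
user_json = '{"feature_tracking": {"max_features": 500}}'
merged_settings = merge_config(default_settings, json.loads(user_json))
print(json.dumps(merged_settings, indent=2))
```

Only `max_features` changes; `method` and `video_width` keep their defaults, and the defaults dictionary itself is left unmodified.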


---

## 6. Next Steps & Roadmap

### ✅ Recently Completed

#### 1. Camera Calibration Tool ✅
Fully implemented in `examples/calibrate_camera.py`:
- Interactive chessboard capture mode
- Batch calibration from image files  
- Live undistorted preview
- JSON export/import of calibration data

```bash
# Usage examples
python examples/calibrate_camera.py --capture           # Interactive mode
python examples/calibrate_camera.py --images *.jpg     # Batch mode
python examples/calibrate_camera.py --preview          # Preview calibration
```

#### 2. Integration Tests ✅
Comprehensive test suite in `tests/test_integration.py`:
- Synthetic video generation (checkerboard, feature-rich sequences)
- Pipeline metrics collection (tracking rate, pose rate, FPS)
- Detector comparison benchmarks (ORB, AKAZE, BRISK)
- Video file playback for reproducible testing

#### 3. SLAM/Mapping Integration ✅
Sparse SLAM in `src/mapping.py`:
- `SparseMap` class with 3D point cloud from triangulated features
- Keyframe management with covisibility graph
- Loop closure detection with geometric verification
- Map persistence (save/load to JSON)
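
A covisibility graph links keyframes that observe the same map points. As a minimal sketch of the idea (not the `SparseMap` implementation; `build_covisibility`, the `min_shared` threshold, and the data layout are all assumptions), edges can be derived from per-keyframe observation sets:

```python
from itertools import combinations


def build_covisibility(observations, min_shared=2):
    """Build covisibility edges from keyframe observations.

    observations maps keyframe id -> set of map-point ids seen in that keyframe.
    Returns {(kf_a, kf_b): shared_count} for pairs sharing at least min_shared points.
    """
    edges = {}
    for a, b in combinations(sorted(observations), 2):
        shared = len(observations[a] & observations[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


observed = {0: {1, 2, 3, 4}, 1: {3, 4, 5}, 2: {5, 6}}
graph = build_covisibility(observed)
print(graph)
```

Keyframes 0 and 1 share two points and get an edge; keyframes 1 and 2 share only one, which falls below the threshold. Strongly connected keyframes are natural candidates to check first when verifying a loop closure.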

#### 4. Occlusion Handling ✅
Depth-aware rendering in `src/occlusion.py`:
- `DepthEstimator` - sparse depth from features + ground plane assumption
- `OcclusionHandler` - mask generation and compositing
- `DepthAwareOverlayRenderer` - proper AR occlusion
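
The core of depth-aware compositing is a per-pixel depth test: virtual content is drawn only where it is closer to the camera than the real scene. The sketch below uses plain nested lists to stay dependency-free. It illustrates the principle rather than the code in `src/occlusion.py`, and `composite_with_depth` and its conventions for missing values are hypothetical.

```python
def composite_with_depth(frame, overlay, overlay_depth, scene_depth):
    """Composite overlay pixels onto frame where they pass the depth test.

    All arguments are 2D lists of equal size. None in overlay means no virtual
    content at that pixel; None in scene_depth means no depth estimate, in
    which case the overlay is drawn.
    """
    out = [row[:] for row in frame]  # copy so the input frame is untouched
    for y, row in enumerate(overlay):
        for x, value in enumerate(row):
            if value is None:
                continue
            scene = scene_depth[y][x]
            if scene is None or overlay_depth[y][x] < scene:
                out[y][x] = value
    return out


frame = [[0, 0, 0, 0]]
overlay = [[9, 9, None, 9]]
overlay_depth = [[1.0, 1.0, 1.0, 1.0]]
scene_depth = [[2.0, 0.5, 2.0, None]]  # second pixel: real object in front
composited = composite_with_depth(frame, overlay, overlay_depth, scene_depth)
print(composited)
```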

### 🚧 Remaining Tasks

#### High Priority: Scale Recovery Enhancement
Monocular markerless tracking recovers camera translation only up to an unknown scale factor. Options for recovering metric scale:
- Known object size in scene
- IMU integration for metric scale
- Stereo camera setup
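
For the first option, the arithmetic is simple: if two reconstructed 3D points are known to be a certain real-world distance apart, the ratio of the true distance to the reconstructed distance gives the scale factor, which can then be applied to translations and map points. A minimal sketch, with `recover_scale` as a hypothetical helper and made-up example numbers:

```python
import math


def recover_scale(p_a, p_b, known_distance_m):
    """Scale factor that converts reconstruction units to meters.

    p_a and p_b are reconstructed 3D points whose true separation is known.
    """
    estimated = math.dist(p_a, p_b)
    if estimated == 0:
        raise ValueError("points coincide; scale is undefined")
    return known_distance_m / estimated


# Two corners of an object known to be 0.25 m wide, reconstructed 0.5 units apart.
scale = recover_scale((0.0, 0.0, 2.0), (0.5, 0.0, 2.0), known_distance_m=0.25)
translation_units = (0.2, 0.0, 1.0)
translation_m = tuple(scale * c for c in translation_units)
print(scale, translation_m)
```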

#### Medium Priority: Textured 3D Models
Replace wireframes with proper 3D mesh rendering:
- Load OBJ/PLY files
- OpenGL integration for GPU rendering
- Texture mapping

#### Low Priority: Dense Depth Integration
- MiDaS or similar monocular depth estimation
- Better occlusion with dense depth maps


### ✅ Camera Calibration Tool - Now Implemented!

The calibration tool has been fully implemented. Here's how to use it:


In [8]:
# Display the calibration tool CLI help
import subprocess
result = subprocess.run(
    ['python3', '../examples/calibrate_camera.py', '--help'],
    capture_output=True,
    text=True
)
print(result.stdout)


Camera Calibration Skeleton:

import cv2
import numpy as np
import json

def calibrate_camera(image_paths, board_size=(9, 6), square_size=0.025):
    """
    Calibrate camera from chessboard images.
    
    Args:
        image_paths: List of paths to calibration images
        board_size: (cols, rows) of internal chessboard corners
        square_size: Physical size of chessboard square in meters
    
    Returns:
        camera_matrix, dist_coeffs
    """
    # Prepare object points (0,0,0), (1,0,0), (2,0,0), ...
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    objp *= square_size
    
    obj_points = []  # 3D points in world
    img_points = []  # 2D points in image
    
    for path in image_paths:
        img = cv2.imread(path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        
        ret, corners = cv2.findChessboardCorners(gray, board_size, None)
        if ret:
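
The output above is cut off after the corner-detection check. As a dependency-free illustration of the one step that is fully visible, the sketch below reproduces the `objp` grid in plain Python so the point ordering is easy to inspect. `chessboard_object_points` is a hypothetical helper; it mirrors the `np.mgrid` expression above, where x (the column index) varies fastest and every point lies on the z = 0 plane.

```python
def chessboard_object_points(board_size=(9, 6), square_size=0.025):
    """Return (x, y, 0.0) corner coordinates in meters, with x varying fastest."""
    cols, rows = board_size
    return [
        (c * square_size, r * square_size, 0.0)
        for r in range(rows)
        for c in range(cols)
    ]


points = chessboard_object_points()
print(len(points))
print(points[:3])
print(points[9])  # first corner of the second row
```

The same list is reused for every calibration image, since the board's geometry does not change between views; only the detected image corners differ.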
    

---

## 7. References

### Papers and Books
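1. Hartley, R. and Zisserman, A., *Multiple View Geometry in Computer Vision* (essential matrix, pose recovery)
2. Rublee, E. et al., "ORB: An efficient alternative to SIFT or SURF", 2011
3. Lucas, B. D. and Kanade, T., "An Iterative Image Registration Technique with an Application to Stereo Vision", 1981 (Lucas-Kanade optical flow)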

### OpenCV Documentation
- Feature Detection: https://docs.opencv.org/4.x/db/d27/tutorial_py_table_of_contents_feature2d.html
- Camera Calibration: https://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html
- Pose Estimation: https://docs.opencv.org/4.x/d7/d53/tutorial_py_pose.html

### Project Files
- PROGRESS.md - Detailed implementation progress
- TESTING.md - Testing procedures
- TROUBLESHOOTING.md - Common issues and solutions
- docs/design_overview.md - Architecture documentation


---

## Summary

NETHERGAZE is a fully functional markerless AR pipeline with:

✅ **Real-time feature tracking** using ORB + optical flow  
✅ **Robust pose estimation** with temporal filtering  
✅ **Flexible overlay system** for 2D and 3D content  
✅ **CLI-driven application** with extensive configuration  
✅ **Camera calibration tool** for accurate intrinsics  
✅ **Integration tests** for stability verification  
✅ **Sparse SLAM/mapping** with loop closure detection  
✅ **Occlusion handling** with depth-aware rendering  

**Next steps:**
1. Improve scale recovery for metric pose
2. Add textured 3D model support
3. Integrate dense depth estimation (MiDaS)

---


