# From Images to a Place Memory: VPR Tutorial

This notebook demonstrates **Visual Place Recognition (VPR)** for robot localization using **NetVLAD** and a **Vector Database**.

## Overview

**SLAM provides precise motion tracking. VPR provides long-term place recognition.**  
Combined, they enable reliable indoor localization over time.

### What You'll Learn

**Part 1 — Building Place Memory (Offline)**
- Step 1: Turn images into "Place Signatures" using NetVLAD
- Step 2: Store these signatures in a Vector Database with pose metadata

**Part 2 — Querying Place Memory (Online)**
- Step 1: Convert current camera view → query vector
- Step 2: KNN search in Vector DB → retrieve similar places
- Step 3: Map candidates to coarse locations

---

## Setup & Initialization

In [2]:
# sys
import sys
import shutil
from pathlib import Path

# math
import numpy as np
import pandas as pd
import torch

# IO
import h5py
import yaml

# utils
from termcolor import colored, cprint  # cprint is used for the timing output in Part 2
from pprint import pformat
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec


In [None]:
vpr_root = Path(".")  # Use local vpr_core copy
sys.path.insert(0, str(vpr_root))

from vpr_core import extract_features, pairs_from_retrieval


print("GPU ON: ", torch.cuda.is_available())
print("Using local vpr_core from:", vpr_root.resolve())

GPU ON:  True
Using local vpr_core from: /home/gx-shu/My/RevisitVLOC


In [None]:
# Data paths
data_dir = Path("./data/netvlad")
images_train = data_dir / "train_mono"  # Reference images with vertex metadata
images_test = data_dir / "test"          # Query images
outputs = data_dir / "outputs"
outputs.mkdir(parents=True, exist_ok=True)  # also create missing parent folders

print(f"Train images: {images_train} ({len(list(images_train.glob('*.png')))} files)")
print(f"Test images:  {images_test} ({len(list(images_test.glob('*.png')))} files)")
print(f"Outputs:      {outputs}")

---

# Part 1 — Building Place Memory (Offline)

When the robot explores, we build a **place memory** by:
1. Converting each image into a compact **global descriptor** (place signature)
2. Storing these descriptors in a **vector database** with pose metadata from SLAM

This creates a searchable map of visual memories.

## Step 1 — Image → Place Signature (NetVLAD)

When a robot captures a frame, **NetVLAD** transforms it into a single vector using two key components:

### 1) CNN Backbone: Learning Visual Patterns
A convolutional neural network (e.g., VGG16) extracts **feature maps** — rich patterns related to:
- Shapes
- Textures
- Spatial layout

These features capture what the scene *looks and feels like*, beyond just raw pixels.

### 2) VLAD Layer: Compressing Features into a Vector
**VLAD** (Vector of Locally Aggregated Descriptors) clusters local features into visual concepts and measures:
- Which concepts are present
- How strongly they appear
- How they differ from learned "centers"

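**Result:** A full image becomes a compact **4096-D vector**, a global descriptor representing the identity of a place. (4096 is the usual size after dimensionality reduction; the actual dimension is read from the saved features before the database is created.)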
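
To make the aggregation step concrete, here is a minimal NumPy sketch of VLAD-style pooling. It is an illustration under assumptions, not the NetVLAD implementation used by `vpr_core`: the function name `vlad_aggregate`, the sharpness parameter `alpha`, and the toy sizes (100 local features, 8 dimensions, 4 centers) are all made up for the example. In NetVLAD the centers and the soft assignment are learned end to end rather than fixed.

```python
import numpy as np

def vlad_aggregate(features, centers, alpha=10.0):
    """Aggregate N local features (N, D) into one (K*D,) VLAD-style vector."""
    # Residual of every feature to every center: shape (N, K, D)
    diff = features[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)                  # (N, K)

    # Soft assignment: closer centers get larger weights (softmax over K)
    logits = -alpha * sq_dist
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted sum of residuals per cluster: shape (K, D)
    vlad = np.einsum("nk,nkd->kd", weights, diff)

    # Normalize each cluster's residual sum, then the whole vector
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)

rng = np.random.default_rng(0)
local_features = rng.normal(size=(100, 8))   # toy stand-in for CNN feature-map columns
cluster_centers = rng.normal(size=(4, 8))    # toy stand-in for learned centers
descriptor = vlad_aggregate(local_features, cluster_centers)
print(descriptor.shape)   # (32,)
```

The output length is the number of centers times the feature dimension, which is why a separate reduction step is typically needed to reach a compact descriptor.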

In [None]:
# Reset GPU memory stats to track NetVLAD usage from baseline
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    print("GPU memory stats reset. Ready to track NetVLAD memory usage.")

In [None]:
print(f"Configs for feature extractors:\n{pformat(extract_features.confs)}")

feature_conf = extract_features.confs["netvlad"]

# Extract features for TEST set
print("\n=== Extracting NetVLAD features for TEST set ===")
feature_path_test = outputs / "global-feats-netvlad-test.h5"
extract_features.main(feature_conf, images_test, outputs, feature_path=feature_path_test)
print(f"Test features saved to: {feature_path_test}")

# Extract features for TRAIN set
print("\n=== Extracting NetVLAD features for TRAIN set ===")
feature_path_train = outputs / "global-feats-netvlad-train.h5"
extract_features.main(feature_conf, images_train, outputs, feature_path=feature_path_train)
print(f"Train features saved to: {feature_path_train}")

In [None]:
# Check GPU memory usage
if torch.cuda.is_available():
    allocated = torch.cuda.memory_allocated() / 1024**3  # Convert to GB
    reserved = torch.cuda.memory_reserved() / 1024**3
    max_allocated = torch.cuda.max_memory_allocated() / 1024**3
    
    print(f"\n=== GPU Memory Usage ===")
    print(f"Currently allocated: {allocated:.2f} GB")
    print(f"Currently reserved:  {reserved:.2f} GB")
    print(f"Peak allocated:      {max_allocated:.2f} GB")
    
    # Get total GPU memory
    total_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"Total GPU memory:    {total_memory:.2f} GB")
    print(f"Memory utilization:  {(allocated/total_memory)*100:.1f}%")
else:
    print("CUDA not available")

## Step 2 — Store Place Memories into Vector Database

Each NetVLAD vector becomes a **memory of a place**. To use it for localization, we also store **where the robot was**:
- Robot's pose (from SLAM system)
- Metadata: timestamp, image path, vertex ID
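
As a rough illustration of what one stored record could look like, the sketch below parses a reference image name of the form `vertex_005_000034_1751340273862754844.png` (the naming pattern used by the training images in this notebook). The `PlaceRecord` class and `parse_reference_name` helper are hypothetical, and treating the last field as a nanosecond timestamp is an assumption; only the vertex index in the second field is relied on elsewhere in the notebook.

```python
from dataclasses import dataclass

@dataclass
class PlaceRecord:
    image_name: str      # file name of the reference image
    vertex_id: int       # index of the SLAM vertex that holds the pose
    timestamp_ns: int    # assumed: last filename field is a nanosecond timestamp

def parse_reference_name(image_name):
    """Parse 'vertex_<id>_<seq>_<timestamp>.png' into a PlaceRecord."""
    stem = image_name.rsplit(".", 1)[0]
    parts = stem.split("_")
    if len(parts) != 4 or parts[0] != "vertex":
        raise ValueError(f"unexpected reference image name: {image_name}")
    return PlaceRecord(image_name, int(parts[1]), int(parts[3]))

record = parse_reference_name("vertex_005_000034_1751340273862754844.png")
print(record.vertex_id)   # 5
```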

### Why a Vector Database?

A vector database provides:
- **Efficient long-term memory** for large environments
- **Real-time retrieval** even with millions of vectors
- **High-performance indexes** (IVF, HNSW) for fast similarity search

**In short:** A vector DB allows robots to remember more places, find memories faster, and use them intelligently.
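
For comparison, the simplest possible place memory is an exact brute-force search, sketched below with NumPy. The class name `BruteForcePlaceMemory` and the three-dimensional toy vectors are assumptions for illustration. Every query is compared against every stored vector, so the cost grows linearly with the number of places; approximate indexes such as IVF or HNSW exist to avoid exactly that.

```python
import numpy as np

class BruteForcePlaceMemory:
    """Exact nearest-neighbour search over unit-length descriptors."""

    def __init__(self, dim):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.vertex_ids = []

    def add(self, descriptor, vertex_id):
        unit = descriptor / np.linalg.norm(descriptor)
        self.vectors = np.vstack([self.vectors, unit.astype(np.float32)])
        self.vertex_ids.append(vertex_id)

    def search(self, query, top_k=5):
        unit = query / np.linalg.norm(query)
        scores = self.vectors @ unit              # cosine similarity to every entry
        order = np.argsort(-scores)[:top_k]       # highest similarity first
        return [(self.vertex_ids[i], float(scores[i])) for i in order]

memory = BruteForcePlaceMemory(dim=3)
memory.add(np.array([1.0, 0.0, 0.0]), vertex_id=0)
memory.add(np.array([0.0, 1.0, 0.0]), vertex_id=1)
memory.add(np.array([0.9, 0.1, 0.0]), vertex_id=2)
result = memory.search(np.array([1.0, 0.05, 0.0]), top_k=2)
print(result)
```

A baseline like this is also handy for checking that an approximate index returns the same top matches on a small dataset.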

In [None]:
# Create image pairs using NetVLAD retrieval
top = 5  # Number of top matches per test image

# Define paths for feature descriptors
descriptors_test = feature_path_test
descriptors_train = feature_path_train

# Output pairs file
output_pairs_file = outputs / f"pairs-netvlad-top{top}.txt"

print(f"\n=== Creating image pairs (top {top} matches per test image) ===")
print(f"Test descriptors:  {descriptors_test}")
print(f"Train descriptors: {descriptors_train}")
print(f"Output pairs: {output_pairs_file}")

pairs_from_retrieval.main(
    descriptors=descriptors_test,
    db_descriptors=descriptors_train,
    output=output_pairs_file,
    num_matched=top,
)

print(f"\nPairs saved to: {output_pairs_file}")

### Visualize Retrieval Results

The pairs file above comes from a direct descriptor comparison, without the vector DB. Plotting each test image next to its top matches gives a quick sanity check of the extracted features.

In [None]:
# Read pairs file
pairs = []
with open(output_pairs_file, 'r') as f:
    for line in f:
        test_img, train_img = line.strip().split()
        pairs.append((test_img, train_img))

# Group pairs by test image
from collections import defaultdict
test_matches = defaultdict(list)
for test_img, train_img in pairs:
    test_matches[test_img].append(train_img)

print(f"Found {len(test_matches)} test images")
print(f"Each test has {len(list(test_matches.values())[0])} matches\n")

# Create output directory for plots
plots_dir = outputs / "retrieval_plots"
plots_dir.mkdir(exist_ok=True)

# Visualize ALL test images (or set a specific number)
num_test_to_show = len(test_matches)  # Change to a number like 10 to limit output
test_list = list(test_matches.items())[:num_test_to_show]

print(f"Generating {num_test_to_show} visualization plots...")

for idx, (test_img, matched_imgs) in enumerate(test_list):
    num_matches = len(matched_imgs)
    
    fig = plt.figure(figsize=(14, 3))
    gs = GridSpec(1, num_matches + 1, figure=fig, wspace=0.3)
    
    # Show test image
    ax_test = fig.add_subplot(gs[0, 0])
    test_path = images_test / test_img
    img_test = plt.imread(test_path)
    ax_test.imshow(img_test)
    ax_test.set_title(f"Test {idx+1}\n{test_img}", fontsize=10, fontweight='bold', color='red')
    ax_test.axis('off')
    
    # Show top-k matched images
    for i, match_img in enumerate(matched_imgs):
        ax_match = fig.add_subplot(gs[0, i + 1])
        match_path = images_train / match_img
        img_match = plt.imread(match_path)
        ax_match.imshow(img_match)
        ax_match.set_title(f"Rank {i+1}\n{match_img}", fontsize=9)
        ax_match.axis('off')
    
    # Add title with proper spacing
    fig.suptitle(f"Top-{num_matches} Retrieval Results for Test Image {idx+1}", 
                 fontsize=12, fontweight='bold')
    fig.subplots_adjust(top=0.88)  # Adjust to prevent overlap
    
    # Save figure (no display in notebook)
    save_path = plots_dir / f"retrieval_test_{idx+1:03d}.png"
    plt.savefig(save_path, dpi=150, bbox_inches='tight')
    plt.close(fig)  # Close to free memory
    
    if (idx + 1) % 10 == 0:
        print(f"  Progress: {idx+1}/{num_test_to_show} saved")

print(f"\n✓ All {num_test_to_show} plots saved to: {plots_dir}")
print(f"  Example files: retrieval_test_001.png, retrieval_test_002.png, ...")

In [None]:
# Load feature descriptors to compute similarity scores
print("Loading feature descriptors...")
with h5py.File(feature_path_test, 'r') as f_test:
    test_descriptors = {name: f_test[name]['global_descriptor'][:] for name in f_test.keys()}

with h5py.File(feature_path_train, 'r') as f_train:
    train_descriptors = {name: f_train[name]['global_descriptor'][:] for name in f_train.keys()}

print(f"Loaded {len(test_descriptors)} test descriptors and {len(train_descriptors)} train descriptors")

def compute_similarity(desc1, desc2):
    """Compute cosine similarity between two descriptors"""
    return np.dot(desc1, desc2) / (np.linalg.norm(desc1) * np.linalg.norm(desc2))

# Read pairs file and group by test image
pairs = []
with open(output_pairs_file, 'r') as f:
    for line in f:
        test_img, train_img = line.strip().split()
        pairs.append((test_img, train_img))

# Group by test image
from collections import defaultdict
test_to_matches = defaultdict(list)
for test_img, train_img in pairs:
    test_to_matches[test_img].append(train_img)

# Build DataFrame with test image, matched vertices, and similarity scores
results = []

for test_img, matched_train_imgs in test_to_matches.items():
    row = {'test_image': test_img}
    
    # Get test descriptor
    test_desc = test_descriptors.get(test_img)
    
    # Extract vertex indices and compute scores
    for rank, train_img in enumerate(matched_train_imgs, start=1):
        # Parse vertex index: vertex_005_000034_1751340273862754844.png -> vertex_idx = 5
        parts = train_img.split('_')
        if parts[0] == 'vertex' and len(parts) >= 2:
            v_idx = int(parts[1])
            row[f'rank_{rank}_vertex'] = v_idx
            row[f'rank_{rank}_train_img'] = train_img
            
            # Compute similarity score
            if test_desc is not None and train_img in train_descriptors:
                train_desc = train_descriptors[train_img]
                score = compute_similarity(test_desc, train_desc)
                row[f'rank_{rank}_score'] = float(score)
            else:
                row[f'rank_{rank}_score'] = None
        else:
            row[f'rank_{rank}_vertex'] = None
            row[f'rank_{rank}_train_img'] = train_img
            row[f'rank_{rank}_score'] = None
    
    results.append(row)

# Create DataFrame
matches_df = pd.DataFrame(results)

# Reorder columns for better readability
cols = ['test_image']
for i in range(1, top + 1):  # one column group per retrieved rank
    cols.append(f'rank_{i}_vertex')
    cols.append(f'rank_{i}_score')
    cols.append(f'rank_{i}_train_img')
matches_df = matches_df[cols]

print(f"\nCreated DataFrame with {len(matches_df)} test images")
print(f"\nFirst 5 rows:")
print(matches_df.head(5))

# Save to CSV
matches_csv_path = outputs / "test_to_vertex_matches.csv"
matches_df.to_csv(matches_csv_path, index=False)
print(f"\n✓ Saved to: {matches_csv_path}")

# Show summary statistics
print(f"\nSimilarity Score Statistics:")
score_cols = [f'rank_{i}_score' for i in range(1, top + 1)]
for col in score_cols:
    print(f"  {col}: mean={matches_df[col].mean():.4f}, min={matches_df[col].min():.4f}, max={matches_df[col].max():.4f}")

# Display full DataFrame (uncomment to see all)
# matches_df

In [None]:
# Map and vertices paths (SLAM data)
map_dir = Path("./data/map")
vertices_path = map_dir / "vertices.csv"

# Load map metadata
with open(map_dir / "map.yaml", 'r') as f:
    map_meta = yaml.safe_load(f)

print("Map metadata:")
print(f"  Resolution: {map_meta['resolution']} m/pixel")
print(f"  Origin: {map_meta['origin']}")

# Load map image
map_img = plt.imread(map_dir / "map.pgm")
print(f"  Map size: {map_img.shape}")

# Load vertices
vertices = pd.read_csv(vertices_path)
print(f"\nLoaded {len(vertices)} vertices")

# Extract positions
positions = vertices[[' position x [m]', ' position y [m]']].to_numpy()

# Convert world coordinates (meters) to pixel coordinates
origin_x, origin_y = map_meta['origin'][0], map_meta['origin'][1]
resolution = map_meta['resolution']

# Transform: pixel_x = (world_x - origin_x) / resolution
pixel_x = (positions[:, 0] - origin_x) / resolution
pixel_y = (positions[:, 1] - origin_y) / resolution

# Invert Y axis (image origin is top-left, but map origin is bottom-left)
pixel_y = map_img.shape[0] - pixel_y

# Visualize
fig, ax = plt.subplots(figsize=(14, 12))

# Show map
ax.imshow(map_img, cmap='gray', origin='upper', alpha=0.7)

# Overlay vertices with color gradient (trajectory sequence)
colors = np.arange(len(positions))
scatter = ax.scatter(pixel_x, pixel_y, c=colors, cmap='jet', s=20, alpha=0.8, edgecolors='white', linewidths=0.5)

# Add colorbar to show trajectory direction
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Vertex sequence', rotation=270, labelpad=20)

# Mark start and end
ax.scatter(pixel_x[0], pixel_y[0], c='lime', s=150, marker='o', edgecolors='black', linewidths=2, label='Start', zorder=10)
ax.scatter(pixel_x[-1], pixel_y[-1], c='red', s=150, marker='X', edgecolors='black', linewidths=2, label='End', zorder=10)

ax.set_title('SLAM Vertices on 2D Occupancy Grid Map', fontsize=14, fontweight='bold')
ax.legend(loc='upper right')
ax.axis('off')

# Save to outputs
map_viz_path = outputs / "all_vertices_on_map.png"
plt.savefig(map_viz_path, dpi=150, bbox_inches='tight')
print(f"\nSaved to: {map_viz_path}")

plt.show()

### Visualize SLAM Vertices on Map

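Each reference image is linked to a **SLAM vertex**, a known pose on the map. The plot above overlays all vertices on the occupancy grid, colored by their order along the trajectory.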

### Visualize Matched Vertices on Map

For each test image, show which reference vertices (locations) were retrieved.

In [None]:
# Map and vertices paths
map_dir = Path("./data/map")
vertices_path = map_dir / "vertices.csv"

# Load map metadata
with open(map_dir / "map.yaml", 'r') as f:
    map_meta = yaml.safe_load(f)

print("Map metadata:")
print(f"  Resolution: {map_meta['resolution']} m/pixel")
print(f"  Origin: {map_meta['origin']}")

# Load map image
map_img = plt.imread(map_dir / "map.pgm")
print(f"  Map size: {map_img.shape}")

# Load all vertices
vertices = pd.read_csv(vertices_path)
print(f"\nLoaded {len(vertices)} vertices")

# Extract all vertex positions
all_positions = vertices[[' position x [m]', ' position y [m]']].to_numpy()

# Conversion function
def world_to_pixel(world_x, world_y):
    """Convert world coordinates (meters) to pixel coordinates"""
    origin_x, origin_y = map_meta['origin'][0], map_meta['origin'][1]
    resolution = map_meta['resolution']
    
    pixel_x = (world_x - origin_x) / resolution
    pixel_y = (world_y - origin_y) / resolution
    
    # Invert Y axis (image origin is top-left, map origin is bottom-left)
    pixel_y = map_img.shape[0] - pixel_y
    
    return pixel_x, pixel_y

# Convert all positions to pixels for background plot
all_pixel_x, all_pixel_y = world_to_pixel(all_positions[:, 0], all_positions[:, 1])

# Create output directory for map plots
map_plots_dir = outputs / "vertex_match_maps"
map_plots_dir.mkdir(exist_ok=True)

print(f"\nGenerating map visualizations for {len(matches_df)} test images...")

# Define colors for ranks (rank 1 = best match = green, lower ranks = yellow/orange/red)
rank_colors = ['lime', 'yellow', 'orange', 'coral', 'red']
rank_sizes = [150, 120, 100, 80, 60]  # Size decreases with rank

# Iterate through each test image
for idx, row in matches_df.iterrows():
    fig, ax = plt.subplots(figsize=(14, 12))
    
    # Show map as background
    ax.imshow(map_img, cmap='gray', origin='upper', alpha=0.7)
    
    # Plot all vertices as background (light gray)
    ax.scatter(all_pixel_x, all_pixel_y, c='lightgray', s=3, alpha=0.3, label='All vertices')
    
    # Extract and plot top 5 matched vertices
    for rank in range(1, 6):
        v_idx = row[f'rank_{rank}_vertex']
        score = row[f'rank_{rank}_score']
        
        if pd.notna(v_idx):
            v_idx = int(v_idx)
            if v_idx < len(vertices):
                v_pos_x = vertices.iloc[v_idx][' position x [m]']
                v_pos_y = vertices.iloc[v_idx][' position y [m]']
                
                px, py = world_to_pixel(v_pos_x, v_pos_y)
                
                ax.scatter(px, py, 
                          c=rank_colors[rank-1], 
                          s=rank_sizes[rank-1], 
                          marker='o',
                          edgecolors='black', 
                          linewidths=2,
                          label=f'Rank {rank} (score: {score:.3f})',
                          zorder=10-rank,
                          alpha=0.9)
                
                ax.text(px, py, str(rank), 
                       fontsize=10, fontweight='bold',
                       ha='center', va='center',
                       color='black', zorder=15)
    
    test_img_name = row['test_image']
    ax.set_title(f'Top-5 Matched Vertices for: {test_img_name}', fontsize=14, fontweight='bold')
    ax.legend(loc='upper right', fontsize=10, framealpha=0.9)
    ax.axis('off')
    
    save_path = map_plots_dir / f"vertex_map_test_{idx+1:03d}.png"
    plt.savefig(save_path, dpi=150, bbox_inches='tight')
    plt.close(fig)
    
    if (idx + 1) % 10 == 0:
        print(f"  Progress: {idx+1}/{len(matches_df)} saved")

print(f"\nAll {len(matches_df)} map plots saved to: {map_plots_dir}")

### Initialize Milvus Vector Database

We use **Milvus Lite** — a lightweight vector database for fast similarity search.

In [None]:
# Import NetVLADMilvusDB helper
utils_path = Path("./utils")
if str(utils_path) not in sys.path:
    sys.path.insert(0, str(utils_path))

from vector_DB_IO import NetVLADMilvusDB

print("NetVLADMilvusDB imported successfully")

### Insert Reference Features (Build the Place Memory)

Insert all training image descriptors into Milvus with their vertex IDs (pose references from SLAM).

In [None]:
# Check actual NetVLAD dimension first
with h5py.File(feature_path_train, 'r') as f:
    sample_img = list(f.keys())[0]
    sample_desc = f[sample_img]['global_descriptor'][:]
    actual_dim = sample_desc.shape[0]
    print(f"Actual NetVLAD dimension: {actual_dim}")

# Initialize Milvus database
milvus_db_path = outputs / "milvus_netvlad.db"
db_manager = NetVLADMilvusDB(db_path=milvus_db_path, collection_name="tornare_train_references")

# Create collection with correct dimension (drop old if exists to start fresh)
db_manager.create_collection(dim=actual_dim, drop_old=True)

# Insert training features into Milvus
print("\n=== Inserting training features into Milvus ===")
num_inserted = db_manager.insert_from_h5(feature_path_train)

print(f"\n✓ Milvus database ready with {num_inserted} reference images")

---

# Part 2 — Querying Place Memory (Online Localization)

Once the place memory is built, we use it for **online localization**: given the robot's current camera view, find where it might be on the map.

## Step 1 — Current View → Global Descriptor

At runtime, the robot captures a new image and passes it through the **same NetVLAD pipeline**:
1. CNN backbone extracts feature maps
2. VLAD layer aggregates them into a global descriptor

Using exactly the same model is important — descriptors must live in the **same feature space** as stored memories.

## Step 2 — KNN Search in Vector DB

The query vector is sent to the vector database for **K-nearest-neighbor search**.

The DB returns top-K most similar place memories:
```
Top-1: "corridor_03.png", similarity 0.92
Top-2: "corridor_02.png", similarity 0.89
Top-3: "lobby_side_01.png", similarity 0.81
```

Under the hood, an indexed similarity search (L2 distance or cosine similarity) keeps retrieval fast as the number of stored descriptors grows.
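
The choice between L2 distance and cosine similarity matters less than it may seem when descriptors have unit length, because then the squared L2 distance equals `2 - 2 * cosine` and both metrics rank neighbours identically. The sketch below checks this on random data; it assumes L2-normalized descriptors (the query cell in this notebook prints the descriptor norm, so that can be verified), and the database size and dimension are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
database = rng.normal(size=(1000, 64))
database /= np.linalg.norm(database, axis=1, keepdims=True)   # unit-length rows
query = rng.normal(size=64)
query /= np.linalg.norm(query)

cosine = database @ query
l2_squared = np.sum((database - query) ** 2, axis=1)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
identity_holds = np.allclose(l2_squared, 2.0 - 2.0 * cosine)

top_by_cosine = np.argsort(-cosine)[:5]      # largest similarity first
top_by_l2 = np.argsort(l2_squared)[:5]       # smallest distance first
same_ranking = np.array_equal(top_by_cosine, top_by_l2)
print(identity_holds, same_ranking)
```

If the descriptors are not normalized, the two metrics can disagree, so the metric configured in the database should match the one used when comparing scores elsewhere.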

## Step 3 — Candidates → Coarse Location

Each retrieved match includes **pose metadata from SLAM**, so every match delivers a **candidate location on the map**.

**VPR's contribution:** Narrow down "Where am I likely to be?" to a small set of candidate places.

The localization backend (e.g., SfM, PnP) can then verify and compute the final precise pose.
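
One simple way to turn the retrieved candidates into a coarse position is a similarity-weighted mean of their vertex positions, together with a spread value that indicates how much the candidates disagree. This is a sketch under assumptions: the `coarse_location` helper and the `vertex_xy` lookup are hypothetical, the positions are invented, and the `(name, vertex_id, similarity)` tuple layout simply mirrors how the search results are unpacked in this notebook.

```python
import numpy as np

def coarse_location(matches, vertex_xy):
    """Similarity-weighted mean position and spread of the retrieved candidates."""
    positions = np.array([vertex_xy[vertex_id] for _, vertex_id, _ in matches])
    weights = np.array([max(sim, 0.0) for _, _, sim in matches])
    weights = weights / weights.sum()            # assumes at least one positive score
    center = weights @ positions
    # Weighted RMS distance of the candidates from the center, in metres
    spread = float(np.sqrt(weights @ np.sum((positions - center) ** 2, axis=1)))
    return center, spread

# Hypothetical data: vertex id -> (x, y) in metres, and three retrieved matches
vertex_xy = {3: (1.0, 2.0), 4: (1.2, 2.1), 9: (1.1, 1.9)}
matches = [("a.png", 3, 0.92), ("b.png", 4, 0.89), ("c.png", 9, 0.81)]
center, spread = coarse_location(matches, vertex_xy)
print(center, spread)
```

A small spread suggests the candidates agree on one place; a large spread suggests ambiguity that is better resolved by geometric verification or by looking at several frames.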

In [None]:
# Reload the local vpr_core extract_features module to get timing updates
import importlib
from vpr_core import extract_features
importlib.reload(extract_features)

print("✓ Reloaded extract_features module with timing breakdowns")

In [None]:
# Process one test image: extract vector and search in Milvus DB
import time
import random

total_start = time.time()

# Randomly select one test image
test_image_path = random.choice(list(images_test.glob("*.png")))
print(f"Processing test image: {test_image_path.name}")

# Step 1: Create temp directory and copy image
t1 = time.time()
query_dir = outputs / "single_query"
query_dir.mkdir(exist_ok=True)
shutil.copy(test_image_path, query_dir / test_image_path.name)
cprint(f"  [1] File I/O: {time.time() - t1:.3f}s", 'green')

# Step 2: Extract NetVLAD features (convert image to vector)
t2 = time.time()
query_h5_path = outputs / "single_query_features.h5"
if query_h5_path.exists():
    query_h5_path.unlink()

extract_features.main(feature_conf, query_dir, outputs, feature_path=query_h5_path)
cprint(f"  [2] NetVLAD extraction: {time.time() - t2:.3f}s", 'green')

# Step 3: Read the extracted vector
t3 = time.time()
with h5py.File(query_h5_path, 'r') as f:
    img_name = list(f.keys())[0]
    query_vector = f[img_name]['global_descriptor'][:]
cprint(f"  [3] Read vector from H5: {time.time() - t3:.3f}s", 'green')
print(f"      Vector shape: {query_vector.shape}, norm: {np.linalg.norm(query_vector):.2f}")

# Step 4: Search in Milvus database
t4 = time.time()
matches = db_manager.search(query_vector, top_k=5)
cprint(f"  [4] Milvus search: {time.time() - t4:.3f}s", 'green')

# Step 5: Display results
print(f"\n{'='*70}")
print(f"Query: {test_image_path.name}")
print(f"{'='*70}")
for rank, (match_name, vertex_id, similarity) in enumerate(matches, 1):
    print(f"Rank {rank}: {match_name:50s} (vertex {vertex_id:3d}, sim: {similarity:.4f})")

# Cleanup
t5 = time.time()
shutil.rmtree(query_dir)
query_h5_path.unlink()
cprint(f"\n  [5] Cleanup: {time.time() - t5:.3f}s", 'green')

print(f"\n{'='*70}")
cprint(f"Total time: {time.time() - total_start:.3f}s", 'yellow')
print(f"{'='*70}")

In [None]:
# Compare Milvus results with original pairs_from_retrieval results
query_img_name = test_image_path.name

# Get original results from matches_df
original_row = matches_df[matches_df['test_image'] == query_img_name]

if len(original_row) > 0:
    print(f"\n{'='*70}")
    print(f"Comparison for: {query_img_name}")
    print(f"{'='*70}")
    print(f"{'Rank':<6} {'Original Method':<35} {'Milvus DB':<35}")
    print(f"{'-'*70}")
    
    for rank in range(1, 6):
        # Original results
        orig_img = original_row.iloc[0][f'rank_{rank}_train_img']
        orig_vertex = original_row.iloc[0][f'rank_{rank}_vertex']
        orig_score = original_row.iloc[0][f'rank_{rank}_score']
        
        # Milvus results
        milvus_img, milvus_vertex, milvus_score = matches[rank-1]
        
        # Check if same
        same = "✓" if orig_img == milvus_img else "✗"
        
        print(f"{rank:<6} v{orig_vertex} ({orig_score:.3f})  {same:^6}  v{milvus_vertex} ({milvus_score:.3f})")
    
    print(f"{'='*70}")
else:
    print(f"Warning: {query_img_name} not found in original results")

In [None]:
# Release GPU memory after VPR
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU cache cleared. Memory released for other tasks.")

---

# Part 3 — (Optional) Multi-Frame Aggregation

Even with strong descriptors like NetVLAD, a single frame may not always be reliable — especially in environments with **repetitive structures** (corridors) or **motion blur**.

## Strategy: Use Multiple Recent Frames

Instead of trusting a single frame, use the **last N frames** (e.g., 3-5 images):
1. Perform VPR search for each frame (top-K results)
2. Aggregate all candidate matches
3. Find consensus among observations

## Aggregation Methods

**Method A — Frequency Clustering**  
Count how many times each candidate location appears across all frames. More consistent results → more likely correct.
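
A minimal sketch of frequency voting, assuming each frame contributes a list of retrieved vertex IDs. The helper name `vote_by_frequency` and the example IDs are made up for illustration.

```python
from collections import Counter

def vote_by_frequency(per_frame_vertices):
    """Return (vertex_id, count) pairs, most frequent first."""
    counts = Counter()
    for vertices in per_frame_vertices:
        counts.update(set(vertices))      # each frame votes at most once per vertex
    return counts.most_common()

# Hypothetical top-3 vertex IDs for four consecutive frames
frames = [[12, 13, 40], [13, 12, 14], [41, 13, 12], [13, 14, 15]]
ranking = vote_by_frequency(frames)
print(ranking[0])   # (13, 4)
```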

**Method B — Spatial Voting**  
Each candidate has a known pose → group nearby poses on the map. The largest cluster → best location estimate.
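
Spatial voting can be sketched as a radius count: for every candidate position, count how many candidates lie within a fixed distance, then average the densest neighbourhood. The function name `spatial_vote`, the 1 m radius, and the example coordinates are assumptions; a proper clustering method could be substituted.

```python
import numpy as np

def spatial_vote(candidate_xy, radius=1.0):
    """Return the mean of the densest neighbourhood and its vote count."""
    points = np.asarray(candidate_xy, dtype=float)
    # Pairwise distances between all candidates: shape (N, N)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbours = dists <= radius
    votes = neighbours.sum(axis=1)               # each point also counts itself
    best = int(np.argmax(votes))
    cluster = points[neighbours[best]]
    return cluster.mean(axis=0), int(votes[best])

# Hypothetical candidate positions in metres, pooled over several frames
candidates = [(2.0, 3.0), (2.2, 3.1), (1.9, 2.8), (10.0, 10.0), (2.1, 3.2), (9.5, 0.5)]
estimate, support = spatial_vote(candidates, radius=1.0)
print(estimate, support)
```

The two isolated candidates receive only their own vote, so they do not influence the estimate.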

**Method C — Heatmap Filtering**  
Plot match scores onto the map → regions with higher accumulated scores → more confident hypothesis.
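
A heatmap can be approximated by binning candidate positions into a coarse grid and summing their scores per cell. In this sketch the helper `score_heatmap`, the 2 m cell size, and the example candidates are assumptions, and coordinates are taken to be non-negative offsets from the map origin; smoothing the grid afterwards would be a natural refinement.

```python
import numpy as np

def score_heatmap(candidates, grid_shape, cell_size):
    """Accumulate match scores into a 2D grid; candidates are (x, y, score)."""
    heat = np.zeros(grid_shape)
    for x, y, score in candidates:
        row, col = int(y // cell_size), int(x // cell_size)
        # Ignore candidates that fall outside the grid
        if 0 <= row < grid_shape[0] and 0 <= col < grid_shape[1]:
            heat[row, col] += score
    return heat

# Hypothetical (x, y, score) candidates pooled over several frames
candidates = [(2.0, 3.0, 0.9), (2.2, 3.1, 0.85), (7.5, 1.0, 0.8), (2.4, 3.4, 0.7)]
heat = score_heatmap(candidates, grid_shape=(5, 5), cell_size=2.0)
peak = np.unravel_index(np.argmax(heat), heat.shape)
print(peak, heat[peak])
```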

## Why This Helps

Indoor motion is usually **continuous and smooth**. If frame 1 says "corridor A" but frames 2-5 say "lobby area", the system can confidently discard the outlier.

**Even lightweight post-processing of this kind can make VPR results noticeably more robust.**

---

# Summary

In this tutorial, we built a complete VPR-based localization system:

1. **NetVLAD** converts camera frames into global descriptors (place signatures)
2. **Vector Database** stores these descriptors with pose metadata from SLAM
3. **Online Query** retrieves the most similar places for coarse localization
4. **(Optional) Multi-frame aggregation** improves reliability

**SLAM provides precise motion tracking. VPR provides long-term place recognition.**  
Combined, they enable reliable indoor localization over time.