In [2]:
import cv2
import os
import numpy as np

In [3]:
DATASET_DIR="../imc_25/image-matching-challenge-2025/train/imc2024_lizard_pond"


In [4]:
# List the directory once; os.listdir() does not guarantee any particular order.
image_files = os.listdir(DATASET_DIR)
samp_image_1 = image_files[0]
samp_image_2 = image_files[10]
img_path_1=os.path.join(DATASET_DIR,samp_image_1)
img_path_2=os.path.join(DATASET_DIR,samp_image_2)
print(img_path_2)

../imc_25/image-matching-challenge-2025/train/imc2024_lizard_pond\lizard_00074.png


In [5]:
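img1 = cv2.imread(img_path_1, cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread(img_path_2, cv2.IMREAD_GRAYSCALE)
# cv2.imread returns None instead of raising when a file cannot be read.
assert img1 is not None and img2 is not None, "Could not read one of the sample images"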

The **SIFT (Scale-Invariant Feature Transform)** algorithm is a computer vision technique that detects and describes distinctive local features (keypoints) in images. Its descriptors are invariant to scale and rotation and reasonably robust to moderate changes in illumination and viewpoint.

It works by identifying stable points in an image, assigning each one an orientation, and building a 128-dimensional vector (descriptor) from its local neighborhood, which enables tasks like image matching, object recognition, and panorama stitching.

### How SIFT Works (Key Steps)

- **Scale-Space Extrema Detection**: Creates multiple blurred versions of the image (scale space) and uses Difference of Gaussians (DoG) to find potential keypoints at different scales, which helps detect features regardless of their size.
- **Keypoint Localization**: Refines the locations of potential keypoints to get precise coordinates, rejecting those with low contrast or those that lie along edges and are therefore poorly localized.
- **Orientation Assignment**: Assigns a dominant orientation to each keypoint based on the gradient direction in its neighborhood, providing rotation invariance.
- **Keypoint Descriptor**: Generates a unique, 128-element feature vector by analyzing gradient orientations and magnitudes in a region around the keypoint, creating a distinctive fingerprint.
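- **Keypoint Matching**: Compares descriptors between images, often with Lowe's ratio test, which keeps a match only when the distance to the nearest descriptor is clearly smaller than the distance to the second-nearest one. (This notebook uses a mutual cross-check instead.)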
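
The ratio test from the last step is not used in this notebook, so here is a minimal sketch of the idea on toy data. The function name `ratio_test_matches`, the 2-D "descriptors", and the `ratio=0.75` threshold are illustrative assumptions, not values taken from the notebook. With real SIFT output the same filter is usually applied to the result of `BFMatcher.knnMatch(..., k=2)`.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Return (index_a, index_b) pairs that pass a Lowe-style ratio test.

    Hypothetical helper for illustration; desc_b needs at least two rows.
    """
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        best, second = np.argsort(row)[:2]
        # Keep the match only if the best candidate is clearly better
        # than the runner-up.
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches

# Toy 2-D "descriptors" (real SIFT descriptors have 128 dimensions).
desc_a = np.array([[0.0, 0.0], [10.0, 10.0]], dtype=np.float32)
desc_b = np.array([[0.1, 0.0], [5.0, 5.0], [5.2, 5.0]], dtype=np.float32)

matches = ratio_test_matches(desc_a, desc_b)
print(matches)
```

The first descriptor has one clearly closest candidate and is kept. The second has two nearly equidistant candidates, so it is ambiguous and gets dropped.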

In [6]:
sift = cv2.SIFT_create()

In [7]:
keypoints_1, descriptors_1 = sift.detectAndCompute(img1, None)  
keypoints_2, descriptors_2 = sift.detectAndCompute(img2, None)  

print(f"Keypoints in image 1: {len(keypoints_1)}")
print(f"Keypoints in image 2: {len(keypoints_2)}")

Keypoints in image 1: 6895
Keypoints in image 2: 989


In [8]:
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = bf.match(descriptors_1, descriptors_2)

print("Raw matches:", len(matches))

Raw matches: 487


### What this cell does

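- Compares every SIFT descriptor from image 1 with every descriptor from image 2 using L2 (Euclidean) distance
- Keeps, for each descriptor, its nearest neighbor in the other image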

### Why BFMatcher?

- Brute-force matching
- Simple and transparent

### Why crossCheck=True?

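- Ensures mutual agreement: a pair (i, j) is kept only if j is the nearest neighbor of i and i is the nearest neighbor of j
- Reduces false matches, at the cost of discarding any match that is only one-directional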
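
To make the cross-check concrete, here is a small NumPy sketch of mutual nearest-neighbor matching. It is meant to mirror the idea behind `crossCheck=True`, not OpenCV's actual implementation. The function name `mutual_nearest_neighbors` and the toy descriptors are assumptions made for illustration.

```python
import numpy as np

def mutual_nearest_neighbors(desc_a, desc_b):
    """Return (index_a, index_b) pairs that are each other's nearest neighbor.

    Hypothetical helper for illustration only.
    """
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_a_to_b = dists.argmin(axis=1)  # best b for every a
    nn_b_to_a = dists.argmin(axis=0)  # best a for every b
    return [(i, int(j)) for i, j in enumerate(nn_a_to_b) if nn_b_to_a[j] == i]

# Toy 2-D "descriptors": a[0] and a[1] both prefer b[0],
# but b[0] only prefers a[1].
desc_a = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
desc_b = np.array([[0.9, 0.0], [10.0, 9.0]])

mutual = mutual_nearest_neighbors(desc_a, desc_b)
print(mutual)
```

Only pairs that agree in both directions survive, which is why the number of cross-checked matches can be far below the number of keypoints in either image.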

In [9]:
points_matched_in_1 = np.float32([keypoints_1[m.queryIdx].pt for m in matches])
points_matched_in_2 = np.float32([keypoints_2[m.trainIdx].pt for m in matches])

### Concept takeaway

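We are moving from appearance → geometry: the matches were found by comparing descriptors, and from here on only the pixel coordinates of the matched keypoints are used.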

In [10]:
F, mask = cv2.findFundamentalMat(
    points_matched_in_1, points_matched_in_2,
    cv2.FM_RANSAC,
    ransacReprojThreshold=1.0,
    confidence=0.99
)


In [11]:
if mask is None:
    inliers = 0
else:
    inliers = int(mask.sum())

print("RANSAC inliers:", inliers)

RANSAC inliers: 13
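
The `confidence=0.99` argument can be read as the probability that RANSAC draws at least one sample made up entirely of inliers. The standard estimate for the number of random samples needed is `log(1 - p) / log(1 - w**s)`, where `w` is the inlier ratio and `s` the sample size. The sketch below assumes `s = 8` (the 8-point algorithm); OpenCV may use a different minimal sample size internally, and the function name `ransac_iterations` is made up for this example.

```python
import math

def ransac_iterations(inlier_ratio, sample_size=8, confidence=0.99):
    """Estimate how many random samples RANSAC needs to reach `confidence`.

    sample_size=8 is an assumption (8-point algorithm), not OpenCV's setting.
    """
    # Probability that one random sample contains only inliers.
    p_all_inliers = inlier_ratio ** sample_size
    # log1p keeps precision when p_all_inliers is tiny.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_all_inliers))

# A comfortable case: half of the matches are inliers.
n_half = ransac_iterations(0.5)

# The ratio observed in this notebook: 13 inliers out of 487 raw matches.
n_observed = ransac_iterations(13 / 487)

print(n_half, n_observed)
```

Under these assumptions the estimate for the observed ratio is many orders of magnitude larger than for a 50% inlier ratio. That suggests treating the 13 inliers with caution: a fundamental matrix fitted to so few, so sparsely supported correspondences may not be reliable.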


### What those inliers represent (conceptually)

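Each inlier pair is consistent with the single fundamental matrix `F` estimated above. Conceptually, it means:

“There exists ONE camera motion that explains how the projection of this same 3D point shifted from image 1 to image 2.”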
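
In algebraic terms, an inlier pair `(x1, x2)` approximately satisfies the epipolar constraint `x2ᵀ F x1 = 0`. The sketch below checks this on synthetic data where the camera motion is known. It assumes identity intrinsics, so `F` is simply `[t]ₓ R`, and every name in it (`skew`, `epipolar_residuals`, the chosen rotation and translation) is illustrative. Note that OpenCV's RANSAC threshold is a point-to-epipolar-line distance in pixels, not the raw algebraic residual computed here.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def epipolar_residuals(F, pts1, pts2):
    """Return |x2^T F x1| for each pair of 2-D points (hypothetical helper)."""
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])  # homogeneous coordinates
    x2 = np.hstack([pts2, ones])
    return np.abs(np.einsum("ij,jk,ik->i", x2, F, x1))

# Synthetic scene: random 3D points in front of both cameras.
rng = np.random.default_rng(0)
points_3d = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))

# Assumed camera motion: 5 degree rotation about the y axis plus a sideways shift.
angle = np.deg2rad(5)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.0, 0.0])

# Project into both views (identity intrinsics assumed).
cam2 = points_3d @ R.T + t
pts1 = points_3d[:, :2] / points_3d[:, 2:]
pts2 = cam2[:, :2] / cam2[:, 2:]

F = skew(t) @ R
residuals = epipolar_residuals(F, pts1, pts2)

# Deliberately wrong pairing: match each point with a different one.
mismatched = epipolar_residuals(F, pts1, pts2[::-1])

print(residuals.max(), mismatched.max())
```

Correct correspondences give residuals at floating-point noise level, while the scrambled pairs do not. This is the separation RANSAC relies on when it labels matches as inliers or outliers.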