# Image Data Association

In [23]:
# optional: allow Jupyter to "hot reload" the Python modules I wrote, to avoid restarting the kernel after every change
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Part A: Formulating the Dataset

Note: the cells below do NOT need to be rerun!

### Downloading the Entire Dataset

In [1]:
! mkdir ./Data

mkdir: ./Data: File exists


In [2]:
! git clone https://github.com/openMVG/SfM_quality_evaluation.git 

Cloning into 'SfM_quality_evaluation'...


remote: Enumerating objects: 237, done.
remote: Total 237 (delta 0), reused 0 (delta 0), pack-reused 237
Receiving objects: 100% (237/237), 254.31 MiB | 13.61 MiB/s, done.
Resolving deltas: 100% (9/9), done.
Updating files: 100% (214/214), done.


### Aggregating the Images

We only care about the `images/` directories in the following subsets of the data:
- `fountain-P11`
- `Herz-Jesus-P8`
- `entry-P10`

In [4]:
import glob

In [15]:
BASE_DATA_PATH = "./SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008"
subset_names = ["entry-P10", "Herz-Jesus-P8", "fountain-P11"]
file_ext_pattern = "*.jpg"

In [18]:
all_img_paths = list()

for subset in subset_names:
    pattern_for_subset_images = "/".join(
        [BASE_DATA_PATH, subset, "images", file_ext_pattern]
    )
    all_img_paths.extend(glob.glob(pattern_for_subset_images))

print(all_img_paths)

['./SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0006.jpg', './SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0007.jpg', './SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0005.jpg', './SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0004.jpg', './SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0000.jpg', './SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0001.jpg', './SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0003.jpg', './SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0002.jpg', './SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0009.jpg', './SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0008.jpg', './SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/Herz-Jesus-P8/images/0006.
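Note that the paths printed above are not in numeric order (`0006`, `0007`, `0005`, ...): `glob.glob` returns matches in arbitrary order. If a reproducible ordering matters for later steps, one option is to sort each subset's matches. The following is a minimal sketch, not the original cell; `collect_image_paths` is a hypothetical helper name introduced here for illustration.

```python
import glob
import os


def collect_image_paths(base_dir, subsets, pattern="*.jpg"):
    """Return image paths for each subset, sorted by filename within a subset.

    Hypothetical helper: subsets are visited in the order given, and the
    files inside each subset's `images/` directory are sorted.
    """
    paths = []
    for subset in subsets:
        subset_pattern = os.path.join(base_dir, subset, "images", pattern)
        # glob.glob makes no ordering guarantee, so sort for reproducibility
        paths.extend(sorted(glob.glob(subset_pattern)))
    return paths
```

Calling `collect_image_paths(BASE_DATA_PATH, subset_names, file_ext_pattern)` should yield the same set of files as the loop above, only in a stable order.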

Sweet. Lastly, we can load all these images into memory for further processing:

In [21]:
from util import ops

In [22]:
all_img_arrays = [
    ops.load_image(img_path, return_grayscale=False, return_array=True)
    for img_path in all_img_paths
]

all_imgs = dict(zip(all_img_paths, all_img_arrays))

Dimensions of ./SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0006.jpg: 2048 x 3072 x 3
Dimensions of ./SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0007.jpg: 2048 x 3072 x 3
Dimensions of ./SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0005.jpg: 2048 x 3072 x 3
Dimensions of ./SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0004.jpg: 2048 x 3072 x 3
Dimensions of ./SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0000.jpg: 2048 x 3072 x 3
Dimensions of ./SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0001.jpg: 2048 x 3072 x 3
Dimensions of ./SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0003.jpg: 2048 x 3072 x 3
Dimensions of ./SfM_quality_evaluation/Benchmarking_Camera_Calibration_2008/entry-P10/images/0002.jpg: 2048 x 3072 x 3
Dimensions of ./SfM_quality_evaluation/Benchmark
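
Since every image is 2048 x 3072 x 3, it is worth estimating how much memory the full set occupies once loaded. The arithmetic below is a rough sketch under two assumptions that the output above does not confirm: that the arrays are stored as `uint8` (one byte per value), and that the image counts match the subset names (10, 8, and 11).

```python
# Shape reported by the loader output above: height x width x channels
height, width, channels = 2048, 3072, 3

# Assumption: one byte per value (uint8), the usual dtype for decoded JPEGs
bytes_per_value = 1

# Assumption: image counts follow the subset names (P10, P8, P11)
images_per_subset = {"entry-P10": 10, "Herz-Jesus-P8": 8, "fountain-P11": 11}

bytes_per_image = height * width * channels * bytes_per_value
total_images = sum(images_per_subset.values())
total_bytes = bytes_per_image * total_images

print(f"{bytes_per_image / 2**20:.1f} MiB per image")              # 18.0 MiB per image
print(f"{total_bytes / 2**20:.1f} MiB for {total_images} images")  # 522.0 MiB for 29 images
```

If the loader returns a floating-point dtype instead, the footprint grows by a factor of 4 (`float32`) or 8 (`float64`).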

## Part B: Extract Global Descriptors

I'll extract a GIST descriptor for each image, keeping every image at its current size, since they all already share the same dimensions.

We'll proceed on the assumption that color matters in this scenario. Therefore, for each image I'll compute the GIST descriptor separately for each color channel, then concatenate the per-channel descriptors into a single feature vector.

In [31]:
from gist import gist
import numpy as np


def compute_gist_descriptor(img: np.ndarray) -> np.ndarray:
    """Compute a GIST descriptor per color channel and return them concatenated as one 1-D feature vector."""
    descriptors = []
    # the last axis of the image array indexes the color channels
    for channel in range(img.shape[-1]):
        channel_gist = gist(img[:, :, channel])
        descriptors.extend(channel_gist)

    return np.array(descriptors)

ImportError: cannot import name 'gist' from 'gist' (/Applications/anaconda3/lib/python3.9/site-packages/gist/__init__.py)

In [32]:
all_img_descriptors = [
    compute_gist_descriptor(img) for img in all_img_arrays
]

AttributeError: module 'gist' has no attribute 'gist_descriptor'
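
Both cells above fail at the import, so no descriptors were actually computed here. The channel-wise concatenation itself can still be checked independently of any GIST package. The sketch below swaps in a deliberately simple stand-in descriptor (block-wise mean intensity, which is NOT GIST) purely to exercise the per-channel logic. `block_mean_descriptor` and `compute_channelwise_descriptor` are hypothetical names introduced for illustration, and the small random image stands in for a real photo.

```python
import numpy as np


def block_mean_descriptor(channel: np.ndarray, grid: int = 4) -> np.ndarray:
    """Stand-in descriptor (NOT GIST): mean intensity of each cell in a grid x grid layout."""
    h, w = channel.shape
    # Trim so the channel divides evenly into grid x grid blocks
    trimmed = channel[: h - h % grid, : w - w % grid]
    block_h, block_w = trimmed.shape[0] // grid, trimmed.shape[1] // grid
    blocks = trimmed.reshape(grid, block_h, grid, block_w)
    # Average within each block, then flatten the grid into a vector
    return blocks.mean(axis=(1, 3)).ravel()


def compute_channelwise_descriptor(img: np.ndarray, describe=block_mean_descriptor) -> np.ndarray:
    """Apply `describe` to each color channel and concatenate the results."""
    per_channel = [describe(img[:, :, c]) for c in range(img.shape[-1])]
    return np.concatenate(per_channel)


# Small random RGB image as a placeholder for a real 2048 x 3072 x 3 photo
rng = np.random.default_rng(0)
demo_img = rng.integers(0, 256, size=(64, 96, 3), dtype=np.uint8)

descriptor = compute_channelwise_descriptor(demo_img)
print(descriptor.shape)  # (48,) = 3 channels x 16 grid cells
```

Once a working GIST implementation is available, it could be passed in as `describe` in place of the stand-in, provided it accepts a single 2-D channel and returns a 1-D array.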