# 📸 Data Loading

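This notebook walks through loading the image-pair dataset: it compares the available Python imaging libraries, introduces the project's loaders, and ends with a short exploratory analysis of the metadata.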

## Setup

---

Let's enable autoreload, set the working directory to the project root, and import the necessary dependencies.

In [None]:
%reload_ext autoreload
%autoreload 2

In [None]:
# Set the working directory to the project root
import autorootcwd

In [None]:
# Imports
import cv2 as cv
import matplotlib.image as mpimg
import numpy as np
from matplotlib import pyplot as plt
from PIL import Image

In [None]:
from typing import List, Tuple, Union

## Python Imaging Libraries

---

We can load an image into Python in several ways:

* [Pillow](https://python-pillow.org/) is a fork of PIL (Python Imaging Library) with loads of functionality for loading and manipulating images
* [OpenCV](https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html) implements common computer vision algorithms and exposes them through Python bindings
* [Matplotlib](https://matplotlib.org/stable/tutorials/images.html) supports basic functionality for loading and plotting images

**Pillow**

In [None]:
# Load image
img = Image.open("imgs/cinestill-800t.jpg")

img

In [None]:
# Get format, mode, dimensions
img.format, img.mode, img.size

In [None]:
# Convert to numpy array
np.array(img).shape
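
Note that Pillow reports `size` as (width, height), while the converted NumPy array is laid out as (height, width, channels). A minimal sketch with a synthetic image illustrates this; the 640x480 size is an arbitrary value chosen for illustration, not the size of the dataset image:

```python
import numpy as np
from PIL import Image

# Synthetic RGB image; 640x480 is an arbitrary illustrative size
demo = Image.new("RGB", (640, 480), color=(255, 0, 0))

# Converting to NumPy swaps the axis order relative to `size`
arr = np.array(demo)

print(demo.size)  # (width, height)
print(arr.shape)  # (height, width, channels)
print(arr.dtype)  # 8-bit unsigned integers per channel
```

Keeping this difference in mind helps when mixing Pillow coordinates (for example, crop boxes) with array indexing.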

We can also do basic image processing with Pillow. For example, we can crop the image to a region of interest (shown here) or convert it to grayscale.

In [None]:
# Crop box as (left, upper, right, lower) pixel coordinates
box = (1000, 1000, 2000, 2000)
img.crop(box)
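
Grayscale conversion works through `Image.convert`. The sketch below uses a small synthetic image rather than the photo loaded above, so the size and color are assumptions made purely for illustration:

```python
from PIL import Image

# Synthetic RGB image standing in for a real photo (pure red)
demo = Image.new("RGB", (64, 48), color=(255, 0, 0))

# "L" is Pillow's 8-bit single-channel (luminance) mode
gray = demo.convert("L")

print(gray.mode, gray.size)
```

The converted image keeps its dimensions but has a single channel, so `np.array(gray)` would be two-dimensional.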

**OpenCV**

In [None]:
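# Load OpenCV image (imread returns None instead of raising if the file cannot be read)
bgr = cv.imread("imgs/cinestill-800t.jpg")
assert bgr is not None, "Could not read imgs/cinestill-800t.jpg"

# OpenCV stores channels as BGR; convert to RGB for Matplotlib
img = cv.cvtColor(bgr, cv.COLOR_BGR2RGB)

plt.imshow(img);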
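
OpenCV's `imread` returns channels in BGR order, which is why the image is converted to RGB before plotting. For a 3-channel image this conversion amounts to reversing the last axis. A NumPy-only sketch with a single synthetic pixel (not the actual image) shows the effect:

```python
import numpy as np

# One "blue" pixel as OpenCV would store it: (B, G, R) = (255, 0, 0)
bgr_pixel = np.zeros((1, 1, 3), dtype=np.uint8)
bgr_pixel[0, 0] = (255, 0, 0)

# Reversing the channel axis turns BGR into RGB
rgb_pixel = bgr_pixel[..., ::-1]

print(rgb_pixel[0, 0])  # blue now sits in the last position, as RGB expects
```

Skipping this step would not raise an error; the plot would simply show red and blue swapped.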

**Matplotlib**

In [None]:
# Load Matplotlib image
img = mpimg.imread("imgs/cinestill-800t.jpg")

plt.imshow(img);

Of the three, Pillow looks like the best fit for our purposes: it loads a wide variety of image formats and covers basic image processing.

## Loading Data

We are dealing with an image-pair dataset where each scene is captured on a digital camera (either Panasonic or Sony) and on a film camera (XY). We will define convenient loaders for loading image pairs (possibly including their metadata).

In [None]:
from src.utils.load import load_image, load_image_pair, load_metadata
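
The loaders in `src.utils.load` are not shown in this notebook. As a rough idea of what such a loader could look like, here is a minimal sketch. The directory layout, the `_sketch` function names, and the `root` parameter are all assumptions made for illustration and may differ from the project's actual implementation, which also returns metadata:

```python
from pathlib import Path
from typing import Tuple

import numpy as np
from PIL import Image

# Hypothetical layout: <root>/<processing_state>/<camera>/<idx>.jpg
DATA_ROOT = Path("data")


def load_image_sketch(
    idx: int, processing_state: str, camera: str, root: Path = DATA_ROOT
) -> np.ndarray:
    """Load one image as an RGB array of shape (height, width, 3)."""
    path = root / processing_state / camera / f"{idx}.jpg"
    with Image.open(path) as img:
        return np.array(img.convert("RGB"))


def load_image_pair_sketch(
    idx: int, processing_state: str, root: Path = DATA_ROOT
) -> Tuple[np.ndarray, np.ndarray]:
    """Load the film and digital images of one scene, in that order."""
    film = load_image_sketch(idx, processing_state, "film", root)
    digital = load_image_sketch(idx, processing_state, "digital", root)
    return film, digital
```

Returning the pair in a fixed (film, digital) order mirrors how the real loader's result is unpacked in this notebook.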

In [None]:
# Load metadata
load_metadata(13)

In [None]:
# Load image
load_image(13, processing_state="raw", camera="film")

In [None]:
# Load image pair with metadata
film, digital, meta = load_image_pair(13, processing_state="raw")
print(meta)

_, axs = plt.subplots(ncols=2, figsize=(15, 5))
axs[0].imshow(digital)
axs[1].imshow(film);

In [None]:
# Test loading speed
%timeit load_image(13, camera="film", processing_state="raw")
%timeit load_image(13, camera="digital", processing_state="raw")

## EDA

We run some exploratory data analysis on our image-pair dataset.

### Metadata

In [None]:
# Load all metadata
metadata = load_metadata(as_df=True)
print(f"✅ Loaded {len(metadata)} metadata entries for image-pair dataset.")

metadata.head()

In [None]:
# Percentage of `location`
metadata.location.value_counts(normalize=True).mul(100).round(2)

In [None]:
# Percentage of `weather`
metadata.weather.value_counts(normalize=True).mul(100).round(2)

In [None]:
# Percentage of `group`
metadata.group.value_counts(normalize=True).mul(100).round(2)
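
The three percentage cells follow the same pattern, so they could be folded into a small helper. The sketch below is one way to do that; `percentage_breakdown` is a hypothetical name and the toy values are made up for illustration, not taken from the dataset:

```python
import pandas as pd

# Toy metadata; the values are made up for illustration
toy = pd.DataFrame({"weather": ["sunny", "sunny", "cloudy", "rainy"]})


def percentage_breakdown(df: pd.DataFrame, column: str) -> pd.Series:
    """Share of each distinct value in `column`, in percent."""
    return df[column].value_counts(normalize=True).mul(100).round(2)


shares = percentage_breakdown(toy, "weather")
print(shares)
```

With the real data, this would be called as `percentage_breakdown(metadata, "location")`, and likewise for `weather` and `group`.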