<a href="https://colab.research.google.com/github/nolauren/colabs/blob/main/DVT_NUS_Workbook_I.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Distant Viewing Toolkit**

For more information about the toolkit and the Distant Viewing Lab, please see our [GitHub page](https://github.com/distant-viewing/dvt).

# 1. Introduction

- How can we look at images through computers?
- How might they help us "see" and "view" in expected and unexpected ways?


This notebook uses the Distant Viewing Toolkit (DVT) to analyze color photographs from the 1930s and 1940s. Using computer vision, DVT facilitates the computational analysis of (moving) images. Specifically designed for the study of visual culture, the toolkit enacts the DV method. See our article in DSH at https://doi.org/10.1093/llc/fqz013 for more.


A note about this notebook: This document displays a file known as a Jupyter notebook. In this case, it contains a mix of plain text (like this one!) and code in the open-source programming language Python. The notebook is being hosted on Google's Colab platform, which allows us to run the code for free on a third-party system without the need to install Python and its many dependencies on our local machine. If you are interested in using these methods further, however, it is possible to install all of this on your machine and run the code locally. See the [INSTALL.md](https://github.com/distant-viewing/dvt/blob/main/INSTALL.md) file for more information.

# 2. Setup

### 2.1 Installation

The Google Colab environment already has a running version of Python and several of the most common modules (third-party code that extends the basic language). We only need to install the Distant Viewing Toolkit, which will also load a few extra dependencies. To do this, hover over the code block below and click on the run button that appears in the upper left corner of the block.

In [None]:
!pip install -q git+https://github.com/distant-viewing/dvt.git

It is possible that you may see one or two errors about extra dependencies. Our experience is that these can be ignored for the moment.

### 2.2 Load Data

Now that we have the Python modules installed, we next need to download our image data. This can be done by running the following code block. We'll also create a directory to hold all of the deep learning models that get downloaded in the code below.

In [None]:
!wget -q https://distantviewing.org/fsa_color.zip
!unzip -q fsa_color.zip
!mkdir -p /root/.cache/torch/hub/checkpoints/

Once finished, you will have all of the images downloaded to the Colab. Let's load all of the metadata for the collection.

A note about the data: We often use .jpeg files for our work. There is usually a folder with the images and a .csv file with the file names and relevant metadata. For example, you can download fsa_color.zip directly to your machine and see how we set it up. There is a folder with the images as JPEGs, which we reduced in size for the purpose of this tutorial in Colab. In general, we find images around 2-3MB to be best for photographs since they capture significant visual data without being unnecessarily large. There is also a .csv file with the path to each file and metadata that describes contextual information for each image, such as photographer and year taken.

### 2.3 Load Modules

As a final setup task, we need to load in all of the modules that we will use. Just run the following block to load these functions and classes. We will explain what each of these does during the demo.

In [None]:
import dvt
import numpy as np
import pandas as pd
import cv2
from google.colab.patches import cv2_imshow

def show_img(img):
    # Images are handled as RGB here, while cv2_imshow expects OpenCV's BGR order.
    img_bgr = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    cv2_imshow(img_bgr)

Ideally, you will not see any output from the previous block, which means that everything loaded without a problem. If you did get an error, this likely indicates that a more serious problem has occurred.



*   dvt = [Distant Viewing Toolkit](https://github.com/distant-viewing/dvt) is our library for supporting computer vision.
*   numpy = [NumPy](https://numpy.org/) is for working with arrays and matrices.
*   pandas = [pandas](https://pandas.pydata.org/) is for data analysis and manipulation.
*   cv2 = [OpenCV](https://opencv.org/) is a computer vision library.







# 3. Metadata

Before we start diving into the images, let's load the image metadata and take a moment to understand what information exists for these photographs.

In [None]:
# pandas was already imported as pd in section 2.3
dt = pd.read_csv("fsa_color/fsa_color_metadata_subset.csv")
dt
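
Once the table is loaded, a few summary counts give a quick sense of the collection. The following is a minimal sketch on a toy table; the column names `photographer` and `year` and all of the values are hypothetical stand-ins, so check `dt.columns` for the names actually used in the metadata file.

```python
import pandas as pd

# Toy stand-in for the metadata table; column names and values are hypothetical.
meta = pd.DataFrame({
    "path": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "photographer": ["Photographer A", "Photographer B",
                     "Photographer A", "Photographer A"],
    "year": [1940, 1941, 1941, 1942],
})

# Number of photographs attributed to each photographer.
per_photographer = meta["photographer"].value_counts()

# Number of photographs taken in each year.
per_year = meta.groupby("year").size()

print(per_photographer)
print(per_year)
```

The same two lines should work on `dt` once the toy column names are swapped for the real ones.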

# 4. Digital Images

Briefly, let's start by understanding how digital images are stored inside of Python. We'll load one of the images that we downloaded into Python and then look at its structure. Reading in the image with the function `dvt.load_image` and using our wrapper function `show_img` displays the image in much the way you would see it on a typical website.

In [None]:
img = dvt.load_image("fsa_color/images/" + dt.path[94])
show_img(img)

The actual image data in Python, though, is stored as grids of numbers. Specifically, we have rectangular grids of pixel intensities corresponding to the colors red, green, and blue. We can see the shape of these data using the `shape` attribute:

In [None]:
img.shape

We can try to look at small slices of the pixel intensities by using the following notation in Python:

In [None]:
img[50:55, 600:605, 0]

The challenge is that we won't get very far trying to interpret the photographs by looking at these individual numbers.
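
To make the grid structure concrete, here is a minimal sketch that builds a tiny synthetic image by hand. It assumes the same layout as the loaded photograph, with dimensions ordered as (rows, columns, channels) and channels ordered red, green, blue; the array `tiny` is made up for illustration.

```python
import numpy as np

# A tiny synthetic "image": 2 rows, 3 columns, 3 color channels (R, G, B).
# Unsigned 8-bit integers give intensities from 0 (dark) to 255 (bright).
tiny = np.zeros((2, 3, 3), dtype=np.uint8)
tiny[0, 0] = [255, 0, 0]      # top-left pixel: pure red
tiny[0, 1] = [0, 255, 0]      # next pixel to the right: pure green
tiny[1, 2] = [255, 255, 255]  # bottom-right pixel: white

print(tiny.shape)     # (rows, columns, channels)
print(tiny[:, :, 0])  # the red channel alone, as a 2 x 3 grid
```

The slice `img[50:55, 600:605, 0]` above works the same way: it picks rows 50 to 54, columns 600 to 604, and the first (red) channel.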

# 5. Annotating One Image


### 5.1 Initial Annotators

While many of the annotations that we might use to provide structured information describing an image require complex deep learning models, this is not always the case. We can also apply relatively simple algorithms to determine things such as the average value (brightness) or saturation (richness) of the image. The brightness is simply the average of the pixel intensities:

In [None]:
np.mean(img)

Saturation requires converting the image to a different color space (HSV), after which we can average the saturation channel.

In [None]:
img_hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
np.mean(img_hsv[:, :, 1])

This may not be very interesting on its own, but both of these measurements can reveal interesting patterns when aggregated across a larger collection of images.
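
For readers curious about what the HSV conversion computes, the following sketch reproduces value and saturation with NumPy alone on a made-up two-pixel image. It uses the standard HSV definitions (value is the largest of R, G, B; saturation is the spread between channels relative to the value) and scales both to the range 0 to 1, whereas OpenCV reports them on a 0 to 255 scale for 8-bit images. The function name `value_and_saturation` is our own, not part of DVT or OpenCV.

```python
import numpy as np

def value_and_saturation(rgb):
    """Return per-pixel HSV value and saturation, both scaled to 0-1."""
    rgb = rgb.astype(np.float64) / 255.0
    cmax = rgb.max(axis=2)   # value: the largest of R, G, B
    cmin = rgb.min(axis=2)
    # Saturation is the channel spread relative to the value;
    # black pixels (cmax == 0) are assigned zero saturation.
    sat = np.divide(cmax - cmin, cmax, out=np.zeros_like(cmax), where=cmax > 0)
    return cmax, sat

# One pure red pixel and one mid-gray pixel.
toy = np.array([[[255, 0, 0], [128, 128, 128]]], dtype=np.uint8)
value, sat = value_and_saturation(toy)

print(value)               # per-pixel HSV value
print(sat)                 # red is fully saturated, gray is not saturated at all
print(toy.mean() / 255.0)  # plain average over all channels, for comparison
```

Note that the plain average over all three channels (as in `np.mean(img)`) and the average of the HSV value channel are related but different measures of brightness, so the two should not be compared directly.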

### 5.2 Detecting and Identifying Faces

Now that we understand the basic structure of the toolkit, let's work through some more complex annotators. For example, we will use `AnnoFaces` to detect all of the faces in our example image. The annotator combines a detector (to find the faces) with an embedding (to identify the faces).

In [None]:
anno_face = dvt.AnnoFaces()
out_face = anno_face.run(img, visualize=True)
pd.DataFrame(out_face['boxes'])

The annotation includes the faces detected in the image, giving a bounding box and a confidence score for each face. The final column provides an *image embedding*. If we applied this to a larger collection, we could identify people across images/frames by associating embeddings that are very close to one another.

In [None]:
show_img(out_face['img'])
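
The idea of associating embeddings that are close to one another can be sketched with cosine similarity. This is only an illustration under stated assumptions: the vectors below are random stand-ins rather than real face embeddings, their length of 512 is arbitrary, and cosine similarity is one common choice of closeness measure, not necessarily the one used inside DVT.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)

# Hypothetical embeddings: the same face seen twice (a small perturbation)
# and an unrelated face (an independent random vector).
face_a = rng.normal(size=512)
face_a_again = face_a + rng.normal(scale=0.1, size=512)
face_b = rng.normal(size=512)

sim_same = cosine_similarity(face_a, face_a_again)
sim_diff = cosine_similarity(face_a, face_b)

print(sim_same)  # close to 1
print(sim_diff)  # close to 0
```

In practice, a similarity threshold for deciding that two faces match has to be chosen empirically for the collection at hand.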

### 5.3 Image Embedding

We can also apply an embedding to the entire image. These are useful for doing visual search, building recommender systems, and visualizing a larger corpus. Here is an example embedding a single image:

In [None]:
anno_embed = dvt.AnnoEmbed()
out_embed = anno_embed.run(img)
out_embed

The output gives an *embedding* as a sequence of 1280 numbers.

In [None]:
out_embed['embedding'].shape

### 5.4 Instance Segmentation


Another annotator in the toolkit performs instance annotation. This annotator tries to locate common objects and people in the frame of the image. Unlike the face detector, it tries to find entire people rather than only detecting faces. Here is what the structured output looks like:

In [None]:
anno_detect = dvt.AnnoDetect()
pd.DataFrame(anno_detect.run(img))
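
A table of detections like this one is usually filtered by confidence before counting anything. The sketch below shows the pattern on a made-up table; the column names `label` and `score`, the values, and the threshold of 0.8 are all hypothetical, so adapt them to the columns that the annotator actually returns.

```python
import pandas as pd

# Toy stand-in for a table of detections; names and values are hypothetical.
detections = pd.DataFrame({
    "label": ["person", "person", "dog", "person"],
    "score": [0.98, 0.91, 0.75, 0.40],
})

# Keep only detections at or above a chosen confidence threshold.
confident = detections[detections["score"] >= 0.8]

# Count how many confident detections there are of each kind of object.
counts = confident["label"].value_counts()
print(counts)
```

Counts like these, collected per image, are the kind of structured information that can later be aggregated across a whole collection.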

# 6. Applying to the Whole Collection

### 6.1 Compute Simple Annotator on All Images

Looking at one image is interesting, but what if we apply an annotator to the entire collection? We can do that with the code below.

In [None]:
output = {'path': [], 'value': [], 'saturation': []}
for fname in dt.path.values:
    img = dvt.load_image("fsa_color/images/" + fname)
    # In HSV, channel 1 is saturation and channel 2 is value (brightness).
    img_hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    output['path'].append(fname)
    output['value'].append(np.mean(img_hsv[:, :, 2]))
    output['saturation'].append(np.mean(img_hsv[:, :, 1]))

Here is what the resulting table of averages looks like:

In [None]:
output = pd.DataFrame(output)
output

We can sort this data to find the brightest images.

In [None]:
output.sort_values(by=['value'], ascending=False)

We can then look at an image with one of the most extreme values:

In [None]:
img = dvt.load_image("fsa_color/images/" + "1a34792v.jpg")
show_img(img)
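
Rather than copying a file name by hand from the sorted table, the extreme images can be selected programmatically. Here is a minimal sketch on a toy table with the same three columns as `output`; the file names and numbers are made up.

```python
import pandas as pd

# Toy stand-in for the table of averages; the values are made up.
toy = pd.DataFrame({
    "path": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "value": [120.5, 201.3, 87.9, 150.0],
    "saturation": [60.2, 35.8, 110.4, 90.1],
})

# File names of the two brightest images, brightest first.
brightest = toy.nlargest(2, "value")["path"].tolist()

# File name of the single darkest image.
darkest = toy.nsmallest(1, "value")["path"].iloc[0]

print(brightest)
print(darkest)
```

The resulting file names can be joined to the image folder path and passed to `dvt.load_image` in the same way as the hard-coded name above.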

### 6.2 Nearest Embedding

We can also apply the embedding annotator, which will enable finding nearest neighbors. Note that it will take a few minutes to finish.

In [None]:
anno_embed = dvt.AnnoEmbed()
out_embed = anno_embed.run(img)

X = np.zeros((dt.shape[0], 1280))
for ival, fname in enumerate(dt.path.values):
    img = dvt.load_image("fsa_color/images/" + fname)
    out_embed = anno_embed.run(img)
    X[ival, :] = out_embed['embedding']

X.shape

Now, we can compute the images that are the closest to any starting image.

In [None]:
import matplotlib.pyplot as plt

# Use a large figure so that the grid of images below is easy to see.
plt.rcParams["figure.figsize"] = (16, 16)

In [None]:
ref_img_num = 400       # change this number!

idx = np.argsort(np.sum(np.abs(X - X[ref_img_num, :])**2, axis=1))[:12]
for ind, i in enumerate(idx):
    plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
    plt.subplot(4, 3, ind + 1)

    img = dvt.load_image("fsa_color/images/" + dt.path[i])
    plt.imshow(img)
    plt.axis("off")

And we can also get metadata information for the starting image:

In [None]:
dt.iloc[[ref_img_num]]
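
The nearest-neighbor line above packs several steps into one expression. The following sketch unpacks it on a toy matrix with four "images" and two embedding dimensions instead of 1280; the numbers are made up so that the distances are easy to check by hand.

```python
import numpy as np

# Toy embedding matrix: one row per image, one column per embedding dimension.
X_toy = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 3.0],
    [0.9, 0.1],
])
ref = 1  # index of the reference image

# Squared Euclidean distance from every row to the reference row.
# Broadcasting subtracts the reference row from each row of the matrix.
dist_sq = np.sum((X_toy - X_toy[ref]) ** 2, axis=1)

# Row indices sorted from closest to farthest; keep the three closest.
nearest = np.argsort(dist_sq)[:3]

print(dist_sq)
print(nearest)
```

The reference image has distance zero to itself, so it comes first in the sorted order. This is why the first image in the grid above is the starting image.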

# 7. Conclusions and Cautions

Applying computer vision algorithms to large collections of culturally important images and moving images offers some exciting possibilities. The
Distant Viewing Toolkit was designed to lower the barrier of doing this and
make the possibilities more accessible to a wide range of interested users.
However, there remain many technical and ethical challenges for the application
of this work that should not be ignored. We encourage you to explore the
toolkit, while being aware of the potential issues and unintended consequences
that these methods could exacerbate.

To pursue a much deeper dive, *Distant Viewing: Computational Analysis of Digital Images* is coming out this fall with MIT Press. Open access! For more info, visit: https://mitpress.mit.edu/9780262546133/distant-viewing/





If you are interested in discussing these issues and areas of application, please
reach out to the Distant Viewing Lab's directors, Taylor Arnold
(tarnold2@richmond.edu) and Lauren Tilton (ltilton@richmond.edu).

