# <center>Critical AI</center>
<center>ENGL 54.41</center>
<center>Dartmouth College</center>
<center>Fall 2024</center>
<pre>Created: 04/29/2023; Revised: 10/02/2024</pre>

In [None]:
# This notebook uses the VGG16 convolutional neural network (https://www.robots.ox.ac.uk/~vgg/).
# VGG16 was trained on ImageNet and performs image classification: for each
# input image it outputs 1,000 scores, one per ImageNet category. These
# predictions are useful for describing our images, and the 1,000-value
# vector can also serve as a set of features for distance measurements,
# classification, and other image tasks.

import numpy as np
import random
from glob import glob
from matplotlib import pyplot as plt
from PIL import Image
import os, re
import torch, torchvision
from torchvision.models import vgg16
from torchvision.io import read_image, ImageReadMode
import sklearn
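
The opening comment mentions using VGG16's 1,000-value output as features for distance measurements. Below is a minimal sketch of two common measures. It assumes each image has already been reduced to a NumPy array of 1,000 values. The arrays here are random stand-ins, not real model outputs. `euclidean_distance` and `cosine_similarity` are hypothetical helper names and are not part of this notebook.

```python
import numpy as np

def euclidean_distance(a, b):
    # Straight-line distance between two feature vectors (0.0 = identical)
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors: 1.0 = same direction, 0.0 = orthogonal
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-ins for two images' 1,000-value outputs (NOT real VGG16 scores)
rng = np.random.default_rng(0)
features_a = rng.random(1000)
features_b = rng.random(1000)

print(f"Euclidean distance: {euclidean_distance(features_a, features_b):.3f}")
print(f"Cosine similarity:  {cosine_similarity(features_a, features_b):.3f}")
```

Cosine similarity ignores the overall magnitude of the vectors and compares only their direction. This is often what you want when comparing score profiles. scikit-learn (imported above) also provides `sklearn.metrics.pairwise.cosine_similarity`, which computes it for whole matrices of vectors.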

In [None]:
# Create the model and load its saved weights. We are loading a model that
# has already been trained on ImageNet data. In our earlier lab we worked in
# two stages, training and testing. Here we skip training entirely: we simply
# supply data to be classified, as we did in the testing stage with our
# previous neural networks.

weights = torchvision.models.VGG16_Weights.DEFAULT
model = vgg16(weights=weights)
# The transforms bundled with the weights reproduce the preprocessing
# the model expects at inference time
preprocess = weights.transforms()

In [None]:
# Put the model into evaluation mode. This disables training-only behavior
# (VGG16's classifier contains Dropout layers), and because model.eval()
# returns the model, the notebook will display its architecture.
# The architecture is the organization of the network. Its first component,
# `features`, is a Sequential container:
# https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html
# It holds a series of layers that perform convolutions on the input data:
# https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d
# Each convolution (Conv2d) is followed by an activation function (ReLU, or
# Rectified Linear Unit, which we used previously in our Perceptron-style
# network), and MaxPool2d layers periodically shrink the feature maps.
# https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html

model.eval()
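
To make "a convolution followed by ReLU" concrete, here is a minimal single-channel sketch in NumPy. It assumes one 3x3 kernel with stride 1 and zero padding of 1, which matches the settings shown in the printout above (`kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)`). Like PyTorch's Conv2d, it computes a cross-correlation, so the kernel is not flipped. Real Conv2d layers also sum over many input channels and add a learned bias, and this sketch omits both. `conv2d_single` is a hypothetical helper.

```python
import numpy as np

def conv2d_single(image, kernel, padding=1):
    # Zero-pad the border so a 3x3 kernel yields an output the same size as the input
    padded = np.pad(image, padding)
    kh, kw = kernel.shape
    out_h = padded.shape[0] - kh + 1
    out_w = padded.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the kernel over the image: multiply element-wise and sum
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # Rectified Linear Unit: negative values become 0, positive values pass through
    return np.maximum(x, 0)

# Toy 4x4 "image" with a vertical edge: dark on the left, bright on the right
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# Hand-picked kernel that responds to dark-to-bright vertical edges
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = relu(conv2d_single(image, kernel))
print(feature_map)
```

The edge produces positive responses in the middle two columns. These values are 2 in the top and bottom rows, where part of the window covers padding, and 3 in the middle rows. Where the bright region meets the zero padding on the right, the kernel produces negative responses, and ReLU sets those to 0. In VGG16 the kernel values are not hand-picked; they are learned during training.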

In [None]:
# The weights object also carries metadata; list the available keys
weights.meta.keys()

In [None]:
# This network generates 1,000 outputs, one for each category (label) in
# its ImageNet training data. As with our earlier neural networks, once
# these raw outputs are passed through softmax they can be read as
# probabilities or confidence scores for the classification.
print(f'Total classes: {len(weights.meta["categories"])}')

In [None]:
print(f'Ten sample classes: {random.sample(weights.meta["categories"],k=10)}')

In [None]:
#################################################################################
### Here we are defining a number of helper functions. We'll use these later ###
#################################################################################

def display_image(image_file):
    img = np.asarray(Image.open(image_file))
    plt.imshow(img)
    
def decode_predictions(prediction, topk):
    # torch.topk returns the k largest scores and their class indices,
    # already sorted from highest to lowest. detach() and cpu() make this
    # safe even if the tensor tracks gradients or lives on a GPU.
    scores, class_ids = torch.topk(prediction.detach().cpu(), topk)
    for score, i in zip(scores, class_ids):
        category_name = weights.meta["categories"][int(i)]
        print(f"{100 * score.item():5.2f}% {category_name}")

def get_prediction(image_file, topk=5, display_flag=False):
    img = read_image(image_file, mode=ImageReadMode.RGB)
    batch = preprocess(img).unsqueeze(0)
    # Inference only: no_grad() skips gradient bookkeeping to save memory.
    # We should already be on the CPU, but .to('cpu') is kept for future use.
    with torch.no_grad():
        prediction = model(batch.to('cpu')).squeeze(0).softmax(0)
    if display_flag:
        display_image(image_file)
    decode_predictions(prediction, topk=topk)

In [None]:
# We will load in a sample image.
# This comes from the Vilhjalmur Stefansson collection in Rauner
# https://www.library.dartmouth.edu/digital/digital-collections/vilhjalmur-stefansson-collection-arctic-photographs

# download image
!wget 'https://raw.githubusercontent.com/jeddobson/ENGL54.41-24F/6ebd8c683c3b0d230f16e99fdf7baa2113d10822/img/stefansson-arctic-image-rauner.png'

# load image
img = read_image('stefansson-arctic-image-rauner.png', mode=ImageReadMode.RGB)
print(f'Found image of the following dimensions: {img.shape}')

# Why three dimensions? Before, we used only grayscale images with
# pixel intensity values of 0-255. Now we have RGB color images with
# three channels (R, G, B) of pixel intensity values, which enable us
# to represent color.
#
# read_image() stores the channels first: (channels, height, width).
# matplotlib expects the channels last: (height, width, channels), so we
# reorder the dimensions with permute() before displaying the image.

plt.imshow(img.permute(1, 2, 0))
plt.show()
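
The `permute(1, 2, 0)` call reorders the axes from channels-first to channels-last. NumPy's `np.transpose` does the same thing for arrays. This sketch applies it to a tiny hypothetical 2x3 RGB image so you can check that each pixel keeps its color and only the layout changes.

```python
import numpy as np

# Hypothetical 2x3 RGB image stored channels-first, like read_image's output: (C, H, W)
chw = np.zeros((3, 2, 3), dtype=np.uint8)
chw[0] = 255            # red channel fully on everywhere
chw[2, 1, :] = 255      # blue channel on in the bottom row only

# Move the channel axis to the end: new order is (old axis 1, old axis 2, old axis 0)
hwc = np.transpose(chw, (1, 2, 0))

print(chw.shape, "->", hwc.shape)
print(hwc[0, 0], hwc[1, 0])
```

After the transpose, each pixel's three channel values sit together along the last axis. The top row is pure red and the bottom row is red plus blue (magenta).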

In [None]:
# Now we preprocess this image to standardize the data for our CNN,
# then add a batch dimension with unsqueeze(0).
img2 = preprocess(img).unsqueeze(0)

In [None]:
# What are our image dimensions now? Note the change from before. 
# What does this mean?
img2.shape

In [None]:
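# preprocess() returns a 3D tensor (channels, height, width); unsqueeze(0)
# added a leading batch dimension, making img2 4D so that several images
# could be passed at once. We only want the 3D data from the first image
# right now, reordered for display as before.
plt.imshow(img2[0].permute(1, 2, 0))
plt.show()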
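
Models take a *batch* of images with shape (N, C, H, W). A single image therefore needs a batch axis of size 1. The following NumPy sketch shows the same shape bookkeeping: `np.expand_dims` plays the role of `unsqueeze(0)`, `np.stack` builds a batch of several images, and `[0]` takes the first image back out. The arrays are zero placeholders sized like a 224x224 RGB image, not real data.

```python
import numpy as np

# Placeholder for one preprocessed image: (channels, height, width)
single = np.zeros((3, 224, 224), dtype=np.float32)

# Counterpart of tensor.unsqueeze(0): add a batch axis of size 1 -> (1, 3, 224, 224)
batch_of_one = np.expand_dims(single, 0)

# Several images stacked along a new first axis -> (4, 3, 224, 224)
batch_of_four = np.stack([single] * 4)

# Indexing with [0] drops the batch axis again, as img2[0] does above
first = batch_of_one[0]

print(batch_of_one.shape, batch_of_four.shape, first.shape)
```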

## Preprocessing

Most machine learning and artificial intelligence systems need their inputs to be normalized. Think back to our MNIST dataset of handwritten digits or the CelebA dataset. In CelebA, the faces were all centered and the eyes aligned, and the images were cropped and resized to an identical size; the MNIST digits were all 28x28 pixel matrices. This normalization enables the same transformations to be applied to every image. Our image above is now 224x224 pixels, but has it simply been resized? What differences do you notice between the above image and the initial image?
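
One way to check your observations: according to the torchvision documentation, the transform bundled with VGG16's default weights does the following:

* resizes the image so its shorter side is 256 pixels;
* crops the central 224x224 region;
* rescales pixel values to [0, 1];
* normalizes each channel with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225].

You can confirm this by printing `preprocess`. The sketch below covers only the size arithmetic and the per-channel normalization. The actual pixel resampling (interpolation) is not implemented. `resized_size`, `center_crop_box`, and `normalize` are hypothetical helpers, and the 400x600 photo size is made up.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def resized_size(height, width, shorter=256):
    # Scale so the shorter side equals `shorter`, keeping the aspect ratio
    if height <= width:
        return shorter, int(shorter * width / height)
    return int(shorter * height / width), shorter

def center_crop_box(height, width, crop=224):
    # (top, left, bottom, right) of a crop x crop window centered in the image
    top = int(round((height - crop) / 2))
    left = int(round((width - crop) / 2))
    return top, left, top + crop, left + crop

def normalize(chw_uint8):
    # Rescale 0-255 integers to [0, 1], then standardize each channel
    x = chw_uint8.astype(np.float64) / 255.0
    return (x - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

# Hypothetical 400x600 (height x width) photo
h, w = resized_size(400, 600)
print("after resize:", (h, w))
print("crop box (top, left, bottom, right):", center_crop_box(h, w))

# Tiny all-white and all-black 3x1x1 "images" to show the normalized value range
white = np.full((3, 1, 1), 255, dtype=np.uint8)
black = np.zeros((3, 1, 1), dtype=np.uint8)
print("white ->", normalize(white).ravel().round(3))
print("black ->", normalize(black).ravel().round(3))
```

In this example, the crop discards 80 columns on each side of the resized 256x384 image, and the normalized values fall well outside the original 0-255 range.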

## Prediction

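We run *inference* on a model to obtain outputs from our inputs. Here we also remove the batch dimension with `squeeze(0)` and apply softmax to the 1,000 raw scores. Softmax converts the scores into probabilities that sum to 1, and we assign the result to our `outputs` variable.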
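
Softmax maps raw scores (logits) $z_i$ to probabilities $p_i = e^{z_i} / \sum_j e^{z_j}$. The `.softmax(0)` call applies this along the model's 1,000 outputs. Here is a minimal NumPy sketch using made-up logits for four hypothetical classes. Subtracting the maximum before exponentiating is a standard trick to avoid overflow, and it leaves the result unchanged.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the ratios (and result) are unchanged
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Made-up raw scores (logits) for four hypothetical classes
logits = np.array([2.0, 1.0, 0.1, -1.0])
probs = softmax(logits)

print(probs.round(3), probs.sum())
```

Softmax preserves the ranking of the scores, so the top class is the same before and after. What changes is that the values become comparable confidence scores that sum to 1.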

In [None]:
outputs = model(img2.to('cpu')).squeeze(0).softmax(0)

In [None]:
# What shape are our outputs from the CNN? What does this 
# tell us?
outputs.shape

In [None]:
# We can map these values to our labels and sort them with 
# the following function:
decode_predictions(outputs,topk=25)

Do these classes and values make sense? Do they make less sense after a certain point? Do you have a sense of what visual features may have informed the model?
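
One way to approach the last question is to look at how quickly probability mass accumulates as you move down the ranked list. The sketch below uses a made-up distribution over five hypothetical labels. With the real model you could try passing `outputs.detach().numpy()` and `weights.meta["categories"]` instead. `cumulative_topk` is a hypothetical helper, not part of torchvision.

```python
import numpy as np

def cumulative_topk(probs, labels, k):
    # Indices of the k largest probabilities, highest first
    order = np.argsort(probs)[::-1][:k]
    running = 0.0
    rows = []
    for idx in order:
        running += float(probs[idx])
        rows.append((labels[idx], float(probs[idx]), running))
    return rows

# Made-up distribution over five hypothetical labels (sums to 1.0)
labels = ["label_0", "label_1", "label_2", "label_3", "label_4"]
probs = np.array([0.05, 0.60, 0.02, 0.25, 0.08])

for label, p, cum in cumulative_topk(probs, labels, k=3):
    print(f"{100 * p:5.2f}%  (cumulative {100 * cum:5.2f}%)  {label}")
```

When the cumulative total is already close to 100%, the remaining classes carry almost no weight. Predictions further down the list reflect very low confidence.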

In [None]:
# To see all 1,000 classes the model can predict, we can examine the list
# stored in the weights.meta dictionary under the key 'categories'. Read
# through these. Do they make sense to you? What do you notice?
weights.meta["categories"]

## Putting it together!

Okay, now find an image of your own, either from your computer or from the internet. Go to a website that you like or use Google Image Search; anything works. Download the image if needed, rename it to something short (like testimage.png) if the name is wonky, and then upload it to JupyterLab.

Replace FILENAME below with the name of your file.

In [None]:
get_prediction('FILENAME',topk=10,display_flag=True)