# Practical session on Image Retrieval

In this practical session, we will explore how to perform typical tasks associated with image retrieval. Students can download this IPython/Jupyter notebook after the class to repeat the experiments at home.

**Link to the slides**: [PDF 30MB](https://www.dropbox.com/s/mjmh8al5wg6731j/18_07_PAISS_practical_session.pdf?dl=0)

## Installation

This code requires Python 3, PyTorch 0.4, and Jupyter Notebook. Follow the instructions below to install all the necessary dependencies.

### Installing dependencies

First, download and install the appropriate version of miniconda following the instructions for [MacOS](https://conda.io/docs/user-guide/install/macos.html) or [Linux](https://conda.io/docs/user-guide/install/linux.html).

Then run the following commands:

```
source $HOME/miniconda3/bin/activate #Activates your conda environment
conda install numpy matplotlib ipython scikit-learn jupyter
conda install pytorch torchvision faiss-cpu -c pytorch
```


### Downloading the code, dataset, and models

First, clone this repository:

```
cd $HOME/my_projects
git clone https://github.com/rafarez/paiss.git
```

Then, you will need to download 4 files:

- oxbuild_images.tgz (1.8GB)
- gt\_files\_170407.tgz (280KB)
- features.tgz (579MB)
- models.tgz (328MB)

and store them in the appropriate paths.

_Note:_ All paths in this section are relative to the root directory of this repository.

#### Oxford dataset

On Linux/MacOS, execute the following:

```
cd $HOME/my_projects/paiss
wget www.robots.ox.ac.uk/~vgg/data/oxbuildings/oxbuild_images.tgz -O images.tgz
mkdir -p data/oxford5k/jpg && tar -xzf images.tgz -C data/oxford5k/jpg
wget www.robots.ox.ac.uk/~vgg/data/oxbuildings/gt_files_170407.tgz -O gt_files.tgz
mkdir -p data/oxford5k/lab && tar -xzf gt_files.tgz -C data/oxford5k/lab
```


#### Features and models

On Linux/MacOS, execute the following:

```
cd $HOME/my_projects/paiss
wget "https://www.dropbox.com/s/gr404xlfr4021pw/features.tgz?dl=1" -O features.tgz
tar -xzf features.tgz -C data
wget "https://www.dropbox.com/s/mr4risqu7t9neel/models.tgz?dl=1" -O models.tgz
tar -xzf models.tgz -C data
```



#### Running the notebook

```
cd $HOME/my_projects/paiss
jupyter notebook --ip='localhost' --port=61230 --NotebookApp.token=''
```

# Preliminaries

We start by importing the necessary modules and fixing a random seed.

In [None]:
import numpy as np
from numpy.linalg import norm
import torch
from torch import nn
import json
import pdb
import sys
import os.path as osp

from datasets import create
from archs import *
from utils.test import extract_query
from utils.tsne import do_tsne

np.random.seed(0)

We then instantiate the Oxford dataset, which we will use in all the following experiments.

In [None]:
# create Oxford 5k database
dataset = create('Oxford')

# get the label vector
labels = dataset.get_label_vector()
classes = dataset.get_label_names()

# load the dictionary of the available models and features
with open('data/models.json', 'r') as fp:
    models_dict = json.load(fp)

# Part 1: Training

## a) Creating a network with the AlexNet architecture

As a first step, we will be creating a neural network implementing the AlexNet architecture to use in our experiments.

In [2]:
# instantiate the model for the first experiment
model_1a = alexnet_imagenet()

# show the network details
print(model_1a)

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_feature

In [None]:
dfeats = np.load(models_dict['alexnet-cls-imagenet-fc7']['dataset'])

# Q: What does each row of the matrix dfeats represent? Where does the dimension of these rows come from, and how do we extract these features?
# Hint: uncomment and run the following command
# model_1a_test = alexnet_imagenet_fc7(); print(model_1a_test)

In [None]:
norm(dfeats[:10], axis=1)

In [None]:
dfeats.shape
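
The norms printed above are a useful check: if every row of a feature matrix has unit L2 norm, the dot product between two descriptors equals their cosine similarity, and retrieval reduces to sorting dot products. The sketch below illustrates this on made-up data; `toy_dfeats` and `query` are hypothetical stand-ins, not the actual Oxford features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for dfeats: 6 database descriptors of dimension 4 (made-up values).
toy_dfeats = rng.normal(size=(6, 4))
# L2-normalize each row so that every descriptor has unit norm.
toy_dfeats /= np.linalg.norm(toy_dfeats, axis=1, keepdims=True)

# Use the third database descriptor as the query (hypothetical choice).
query = toy_dfeats[2]

# For unit-norm vectors, the dot product equals the cosine similarity.
scores = toy_dfeats @ query

# Rank the database items from most to least similar to the query.
ranking = np.argsort(-scores)

print(np.linalg.norm(toy_dfeats, axis=1))
print(ranking)
```

Because the query is itself a database item here, it is expected to appear first in the ranking, with a similarity of 1.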

In [None]:
# visualize top results for a given query
dataset.vis_top(dfeats, q_idx, ap_flag=True, out_image_file=out_image)

if args.show_tsne:
    # run t-SNE
    do_tsne(dfeats, labels, classes, sec='1a')
    # Q: What can be observed from the t-SNE visualization? Which classes 'cluster' well? Which do not?
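
The `ap_flag` argument suggests that an average precision (AP) score is reported for the query. As a rough illustration of what AP measures, here is a minimal sketch of the textbook definition. The function `average_precision` is hypothetical and is not taken from this repository; the official Oxford protocol also treats some images as "junk", which this sketch ignores.

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP of a ranked list, given 0/1 relevance flags in rank order."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    # Precision at each rank k = (number of relevant items in the top k) / k.
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    # Average the precision values at the ranks where a relevant item occurs.
    return float((precision_at_k * rel).sum() / rel.sum())

# Relevant items at ranks 1 and 3: (1/1 + 2/3) / 2, roughly 0.833.
print(average_precision([1, 0, 1, 0, 0]))
```

A ranking that places all relevant images first gets an AP of 1, and the score drops as relevant images are pushed further down the list.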

## b) Finetuning the created network on the Landmarks dataset

In [None]:
model_1b = alexnet_lm()
print(model_1b)
input("Check session.py. Press Enter to continue...")
# Q: Why do we change the last layer of the AlexNet architecture? How do we initialize the layers of model_1b for finetuning?

dfeats = np.load(models_dict['alexnet-cls-lm-fc7']['dataset'])
qfeats = np.load(models_dict['alexnet-cls-lm-fc7']['queries'])
dataset.vis_top(dfeats, q_idx, ap_flag=True, out_image_file=out_image)

if args.show_tsne:
    # run t-SNE
    do_tsne(dfeats, labels, classes, sec='1b')
    # Q: How does the visualization change after finetuning? What about the top results?

# Q: Why does this architecture (specifically, its fully connected layers) require the input images to be resized?
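
The first fully connected layer of the AlexNet printed in section a) expects exactly 9216 input features. The arithmetic sketch below traces the spatial size through the convolution and pooling layers of that printout to show where this number comes from. The input sizes 224 and 448 are assumptions chosen for illustration, and `out_size` and `flattened_size` are hypothetical helper names.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) of the conv and max-pool layers printed for model_1a.
ALEXNET_FEATURES = [
    (11, 4, 2), (3, 2, 0),             # conv (0), max-pool (2)
    (5, 1, 2), (3, 2, 0),              # conv (3), max-pool (5)
    (3, 1, 1), (3, 1, 1), (3, 1, 1),   # convs (6), (8), (10)
    (3, 2, 0),                         # max-pool (12)
]

def flattened_size(input_size, channels=256):
    """Number of values fed to the classifier for a square input image."""
    size = input_size
    for kernel, stride, padding in ALEXNET_FEATURES:
        size = out_size(size, kernel, stride, padding)
    return channels * size * size

print(flattened_size(224))  # 9216, matches Linear(in_features=9216, ...)
print(flattened_size(448))  # 43264, no longer fits the first linear layer
```

Only a 6x6 final feature map yields 256 * 6 * 6 = 9216 values, so images of a different size cannot be passed through the fully connected layers without resizing.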

## c) Replacing last max pooling layer with GeM layer

In [None]:
model_1c = alexnet_GeM()
print(model_1c)
input("Check session.py. Press Enter to continue...")
# Note: For this model, we remove all fully connected layers (classifier layers) and replace the last max pooling layer with an aggregation pooling layer (more details about this layer in the next subsection)

dfeats = np.load(models_dict['alexnet-cls-lm-gem']['dataset'])
qfeats = np.load(models_dict['alexnet-cls-lm-gem']['queries'])
print(dfeats.shape)
input("Check session.py. Press Enter to continue...")
# Q: Why does the size of the feature representation change? Why is the size of the feature representation important for an image retrieval task?
dataset.vis_top(dfeats, q_idx, ap_flag=True, out_image_file=out_image)

if args.show_tsne:
    do_tsne(dfeats, labels, classes, sec='1c')
    # Q: How does the aggregation layer change the t-SNE visualization? Can we see some structure in the clusters of similarly labeled images?

## d) ResNet18 architecture with GeM pooling

In [None]:
model_0 = resnet18()
model_1d = resnet18_GeM()
print(model_0.adpool)
print(model_1d.adpool)
input("Check session.py. Press Enter to continue...")
# Q: Why do we replace the average pooling layer of the original ResNet18
# architecture with a generalized mean pooling layer? What operation is the layer
# model_1d.adpool doing?
# Hint: You can see the code of the generalized mean pooling in file pooling.py

# load oxford features from ResNet18 model
dfeats = np.load(models_dict['resnet18-cls-lm-gem']['dataset'])
qfeats = np.load(models_dict['resnet18-cls-lm-gem']['queries'])
print(norm(dfeats[:10], axis=1))
print(dfeats.shape)
# visualize top results for a given query index
dataset.vis_top(dfeats, q_idx, q_feat=qfeats[q_idx], ap_flag=True, out_image_file=out_image)
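
As an illustration of generalized mean (GeM) pooling, here is a minimal NumPy sketch. It assumes the common formulation, in which each channel is reduced to the p-th root of the mean of its activations raised to the power p. The exponent `p=3.0`, the clamp `eps`, and the function name `gem_pool` are assumptions for illustration; the implementation in `pooling.py` may differ, for example by learning p.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized mean pooling of a (C, H, W) feature map into a (C,) vector."""
    # Clamp to a small positive value so that x ** p is well defined.
    x = np.clip(feature_map, eps, None)
    # Per channel: p-th root of the spatial mean of x ** p.
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)

rng = np.random.default_rng(0)
# Toy feature map with 4 channels and a 3x3 spatial grid (made-up values).
toy_map = rng.uniform(0.1, 1.0, size=(4, 3, 3))

avg_pooled = toy_map.mean(axis=(1, 2))
max_pooled = toy_map.max(axis=(1, 2))
gem_pooled = gem_pool(toy_map, p=3.0)

print(avg_pooled)
print(gem_pooled)
print(max_pooled)
```

With p = 1 this reduces to average pooling, and as p grows the result approaches max pooling, so the GeM output for each channel lies between the two.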



#### Show t-SNE:

In [None]:
do_tsne(dfeats, labels, classes, sec='1d')
# Q: How does this model compare with model 1c, which was trained on the same dataset for the same task? How does it compare to the finetuned models of 1b?

## e) PCA Whitening

In [None]:
# We use a PCA learnt on landmarks to whiten the output features of 'resnet18-cls-lm-gem'
dfeats = np.load(models_dict['resnet18-cls-lm-gem-pcaw']['dataset'])
qfeats = np.load(models_dict['resnet18-cls-lm-gem-pcaw']['queries'])
dataset.vis_top(dfeats, q_idx, q_feat=qfeats[q_idx], ap_flag=True)
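
For reference, PCA whitening can be sketched in a few lines of NumPy. This is a generic version fitted on made-up data, not the PCA learnt on Landmarks that produced the features above. The helper names `fit_pca_whitening` and `apply_pca_whitening`, the `eps` value, and the final L2 re-normalization are assumptions for illustration.

```python
import numpy as np

def fit_pca_whitening(train_feats, eps=1e-12):
    """Learn the mean and whitening projection from an (N, D) training matrix."""
    mean = train_feats.mean(axis=0)
    centered = train_feats - mean
    cov = centered.T @ centered / (len(train_feats) - 1)
    # eigh returns eigenvalues in ascending order; sort them in descending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale each principal direction (column) by 1 / sqrt(its variance).
    projection = eigvecs / np.sqrt(eigvals + eps)
    return mean, projection

def apply_pca_whitening(feats, mean, projection):
    """Center, whiten, then L2-normalize each row."""
    whitened = (feats - mean) @ projection
    return whitened / np.linalg.norm(whitened, axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Toy correlated features: 200 samples of dimension 5 (made-up values).
toy_train = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

mean, projection = fit_pca_whitening(toy_train)
whitened_raw = (toy_train - mean) @ projection
toy_whitened = apply_pca_whitening(toy_train, mean, projection)

# Before re-normalization, the covariance is close to the identity matrix.
print(np.round(np.cov(whitened_raw, rowvar=False), 3))
```

After whitening, the dimensions of the training data are decorrelated and have equal variance, so no single direction dominates the dot-product similarity.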

Now we visualize the data with t-SNE.

In [None]:
do_tsne(dfeats, labels, classes, sec='1e-1')

# run t-SNE including unlabeled images
do_tsne(dfeats, labels, classes, sec='1e-2', show_unlabeled=True)
# Q: What can we say about the separation of the data when unlabeled images are included? And about the distribution of the unlabeled features? How can we train a model to separate labeled from unlabeled data?