Calculates feature vectors of an image dataset using the Deep Image Retrieval NN.
Automates these steps, which would normally be run manually:
- creates an image list (db_all.txt)
- builds the command-line call (for Windows), including the backslash escaping of quotes
- executes the command

This requires having https://github.com/naver/deep-image-retrieval downloaded into a folder called 'deep-image-retrieval-repo'.
Make sure you also download a model. I used Resnet101-AP-GeM-LM18, stored as deep-image-retrieval-repo/dirtorch/models/Resnet101-AP-GeM-LM18.pt.

The folder structure should be as follows:

- Dummy Image Dataset
    - Images
- util
    - get_features.ipynb
- deep-image-retrieval-repo
    - dirtorch
    - README.md
    - ...
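
As a quick sanity check before running the cells, the layout above can be verified with a few lines of Python. This is only a sketch: the helper name `missing_paths` and the list `EXPECTED_PATHS` are made up for illustration, and the project root is assumed to be the folder that contains all three top-level entries.

```python
from pathlib import Path

# Paths taken from the layout above, relative to the project root.
EXPECTED_PATHS = [
    "Dummy Image Dataset/Images",
    "util/get_features.ipynb",
    "deep-image-retrieval-repo/dirtorch",
]


def missing_paths(project_root):
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]
```

When the notebook is started from the util folder, `missing_paths("..")` should return an empty list if everything is in place.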

In [1]:
import os

# path to the image folder, relative to deep-image-retrieval-repo (also valid from util, which is a sibling folder)
image_folder = "../Dummy Image Dataset/Images"

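# text file that will hold one image file name per line
image_list_file = "../Dummy Image Dataset/db_all.txt"
# note: os.listdir returns every entry in the folder, in arbitrary order
image_list = os.listdir(image_folder)

with open(image_list_file, "w") as f:
    f.write("\n".join(image_list))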

# output path, relative to deep-image-retrieval-repo
output_file = "../Dummy Image Dataset/boijmans_dummy_features.npy"

# the inner quotes must reach the shell as \" so they survive inside the outer quoted argument
dataset_str = '"ImageList(\\"{}\\" , \\"{}\\")"'.format(image_list_file, image_folder)
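
The cell above writes every entry of the image folder to the list, so stray files such as thumbnails databases or subfolders would end up in db_all.txt as well. A minimal sketch of a stricter listing follows. The helper `list_images` and the extension tuple `IMAGE_EXTENSIONS` are assumptions made for illustration, not part of the notebook or of deep-image-retrieval.

```python
import os

# Assumed set of extensions; adjust to the formats in your dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_images(folder, extensions=IMAGE_EXTENSIONS):
    """Return the sorted names of regular files in folder with a matching extension."""
    return sorted(
        name
        for name in os.listdir(folder)
        # skip subfolders and compare extensions case-insensitively
        if os.path.isfile(os.path.join(folder, name))
        and name.lower().endswith(extensions)
    )
```

Replacing `os.listdir(image_folder)` with `list_images(image_folder)` would also make the order of the list reproducible between runs.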

In [2]:
# if this doesn't run successfully we need to restart the kernel to reset the working directory
%cd ..
%cd deep-image-retrieval-repo
# !ls

# DB_ROOT is not used here but needs to be set
os.environ['DB_ROOT'] = "dummy"

command = "-m dirtorch.extract_features --dataset {} --checkpoint dirtorch/models/Resnet101-AP-GeM-LM18.pt --output \"{}\" --whiten Landmarks_clean --whitenp 0.25 --gpu 0".format(dataset_str, output_file)
# The command can also be run manually if running it via the Jupyter shell doesn't work (make sure to set the environment variable DB_ROOT to any value first).
print("python",command)

C:\Users\Nutzer\Documents\IDP2\MySpace\IdP2 - gitlab\BoijmansPaper
C:\Users\Nutzer\Documents\IDP2\MySpace\IdP2 - gitlab\BoijmansPaper\deep-image-retrieval-repo
python -m dirtorch.extract_features --dataset "ImageList(\"../Dummy Image Dataset/db_all.txt\" , \"../Dummy Image Dataset/Images\")" --checkpoint dirtorch/models/Resnet101-AP-GeM-LM18.pt --output "../Dummy Image Dataset/boijmans_dummy_features.npy" --whiten Landmarks_clean --whitenp 0.25 --gpu 0
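
Most of the escaping in the printed command exists only because it passes through a shell. A possible alternative, sketched here under assumptions, is to hand the arguments to `subprocess.run` as a list, so the dataset string needs no outer quotes or backslashes. The helper names `build_args` and `run_extraction` and the default `repo_dir` are hypothetical; the flags and values are the ones used in the command above.

```python
import os
import subprocess
import sys


def build_args(image_list_file, image_folder, output_file):
    """Build the extract_features call as an argument list (no shell escaping needed)."""
    # Passed as a single argument, so the quotes stay as plain characters.
    dataset = 'ImageList("{}" , "{}")'.format(image_list_file, image_folder)
    return [
        sys.executable, "-m", "dirtorch.extract_features",
        "--dataset", dataset,
        "--checkpoint", "dirtorch/models/Resnet101-AP-GeM-LM18.pt",
        "--output", output_file,
        "--whiten", "Landmarks_clean",
        "--whitenp", "0.25",
        "--gpu", "0",
    ]


def run_extraction(args, repo_dir="../deep-image-retrieval-repo"):
    """Run the command from the repo folder with DB_ROOT set to a dummy value."""
    # repo_dir is an assumption: the path of the repo as seen from the util folder.
    env = dict(os.environ, DB_ROOT="dummy")
    return subprocess.run(args, cwd=repo_dir, env=env, check=True)
```

Because `cwd` is passed to `subprocess.run`, this variant would not need the `%cd` calls, and the kernel's working directory stays unchanged.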


In [3]:
# This can take a while to compute (on the command line it normally shows a progress bar): about 0.1 s per image at 1050x1050 pixels on an Nvidia 2060 Super.
!python {command}

Launching on GPUs 0
Dataset: Dataset: ImageList
  50 images
  root: ../Dummy Image Dataset/Images...
=> loading checkpoint 'dirtorch/models/Resnet101-AP-GeM-LM18.pt' (current_iter 376)

>> Extracting features...
Features extracted.



DB:   0%|          | 0/50 [00:00<?, ?it/s]
DB:   2%|2         | 1/50 [00:03<02:42,  3.32s/it]
DB:  12%|#2        | 6/50 [00:03<00:18,  2.35it/s]
DB:  24%|##4       | 12/50 [00:03<00:07,  5.36it/s]
DB:  32%|###2      | 16/50 [00:03<00:05,  6.76it/s]
DB:  38%|###8      | 19/50 [00:04<00:04,  7.51it/s]
DB:  42%|####2     | 21/50 [00:04<00:03,  8.45it/s]
DB:  46%|####6     | 23/50 [00:04<00:02,  9.30it/s]
DB:  50%|#####     | 25/50 [00:04<00:02,  9.77it/s]
DB:  54%|#####4    | 27/50 [00:04<00:02, 10.18it/s]
DB:  58%|#####8    | 29/50 [00:04<00:01, 11.42it/s]
DB:  62%|######2   | 31/50 [00:05<00:01, 12.25it/s]
DB:  66%|######6   | 33/50 [00:05<00:01, 11.88it/s]
DB:  70%|#######   | 35/50 [00:05<00:01, 11.91it/s]
DB:  74%|#######4  | 37/50 [00:05<00:01, 11.45it/s]
DB:  78%|#######8  | 39/50 [00:05<00:01, 10.67it/s]
DB:  84%|########4 | 42/50 [00:05<00:00, 13.54it/s]
DB:  88%|########8 | 44/50 [00:06<00:00, 12.61it/s]
DB:  92%|#########2| 46/50 [00:06<00:00, 12.64it/s]
DB:  96%|#########6| 4