Interfaces to the THINGS (1 2), IHSJ (3 4) and Yummly (5 6) datasets that produce data for triplet-based distance metric learning.
The code is close to production grade and provides an effective way to access triplet-labeled datasets for distance metric learning.
- Python 3.8+
- more_itertools
- numpy
- scipy
- scikit-learn
- Pillow
- Tkinter
- psiz==0.5.1
- Navigate to the main THINGS dataset page on OSF and download the Main folder as a zip archive.
- Unzip the archive and its subarchives to folder {THINGS_ROOT}/Main.
- Navigate to the "Revealing the multidimensional mental representations..." page on OSF and download both the "data" and the "variables" folders as zip archives.
- Unzip the two archives to folder {THINGS_ROOT}/Revealing.
- Ask the corresponding author of the THINGS dataset for the labeled triplet data.
- Place the files in {THINGS_ROOT}/Revealing/triplets.
- Download the ImageNet dataset to folder {IHSJ_ROOT}/imagenet.
- Navigate to the IHSJ dataset page on OSF and download two files: data/deprecated/psiz0.4.1/catalog.hdf5 to folder {IHSJ_ROOT}/val/catalogs/psiz0.4.1, and data/deprecated/psiz0.4.1/obs-195.hd5 to folder {IHSJ_ROOT}/val/obs/psiz0.4.1.
- Download the zip archive http://vision.cornell.edu/se3/wp-content/uploads/2014/09/food100-dataset.zip.
- Unzip the archive to {YUMMLY_ROOT}.
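Before running any tool, it can help to confirm that the folders named in the setup steps above are in place. The following is a minimal sketch and not part of the package: the function name `missing_folders` and the `EXPECTED_SUBFOLDERS` table are hypothetical, and only the folder names themselves are taken from the steps above.

```python
from pathlib import Path

# Sub-folders named in the setup steps, relative to each dataset root.
# This table is an illustration; it is not read by the ditdml tools.
EXPECTED_SUBFOLDERS = {
    "things": ["Main", "Revealing", "Revealing/triplets"],
    "ihsj": ["imagenet", "val/catalogs/psiz0.4.1", "val/obs/psiz0.4.1"],
    "yummly": [],  # the archive is unzipped directly into the root
}


def missing_folders(dataset_name, root):
    """Return the expected folders (root included) that do not exist."""
    root = Path(root)
    candidates = [root] + [root / sub for sub in EXPECTED_SUBFOLDERS[dataset_name]]
    return [str(path) for path in candidates if not path.is_dir()]
```

Calling `missing_folders("things", THINGS_ROOT)` with your own root returns an empty list when every folder is present. It checks folder existence only, not the files inside.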
Take a look at the scripts in the tools/ directory, e.g. report_data_statistics.py, and the *DataInterface classes to understand how to implement a PyTorch Dataset or write TensorFlow records based on the data interfaces.
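As a rough illustration of the kind of wrapper meant here, the sketch below shows a map-style dataset over index triplets. It makes assumptions that the *DataInterface classes may not match: that triplets are available as 3-tuples of integer indexes, that `records` is an indexable collection (for example image file paths), and that `load` is a caller-supplied function. The class name `TripletDataset` is hypothetical, and the meaning of each triplet position is whatever the data interface defines.

```python
class TripletDataset:
    """Map-style dataset over triplets of indexes into `records`.

    Assumed inputs (not the repository's API):
      triplets: sequence of 3-tuples of integer indexes
      records:  indexable collection, e.g. a list of image file paths
      load:     function turning one record into a sample
    """

    def __init__(self, triplets, records, load=lambda record: record):
        self.triplets = list(triplets)
        self.records = records
        self.load = load

    def __len__(self):
        return len(self.triplets)

    def __getitem__(self, index):
        # Resolve each of the three indexes to a loaded sample.
        first, second, third = self.triplets[index]
        return (
            self.load(self.records[first]),
            self.load(self.records[second]),
            self.load(self.records[third]),
        )
```

The class has no PyTorch dependency; it only implements `__len__` and `__getitem__`, which is the protocol a map-style `torch.utils.data.Dataset` subclass also implements, so deriving from that base class should be the main change needed.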
The code includes functionality to split the triplets into training, validation and test subsets - unit testing included. If desired, new splits can be implemented in the ThingsDataInterface class.
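To give an idea of what a class-based split can look like, here is a minimal sketch. It is one possible reading of a by-class split and not the implementation in the data interface classes: classes are shuffled with a seed, partitioned into three disjoint groups, and a triplet is kept only if all three of its classes fall in the same group. The function name, the `fractions` parameter, and its default values are hypothetical.

```python
import random


def split_triplets_by_class(triplets, num_classes, fractions=(0.8, 0.1, 0.1), seed=13):
    """Split class-index triplets so that subsets share no classes."""
    classes = list(range(num_classes))
    random.Random(seed).shuffle(classes)  # seeded, hence reproducible

    num_training = int(fractions[0] * num_classes)
    num_validation = int(fractions[1] * num_classes)

    # Assign every class to exactly one subset; the remainder goes to test.
    subset_of = {}
    for position, class_index in enumerate(classes):
        if position < num_training:
            subset_of[class_index] = "training"
        elif position < num_training + num_validation:
            subset_of[class_index] = "validation"
        else:
            subset_of[class_index] = "test"

    # Keep a triplet only when its three classes share a subset.
    subsets = {"training": [], "validation": [], "test": []}
    for triplet in triplets:
        names = {subset_of[class_index] for class_index in triplet}
        if len(names) == 1:
            subsets[names.pop()].append(triplet)
    return subsets, subset_of
```

Triplets that straddle two subsets are dropped in this sketch, which trades data for a strict separation of classes between training and evaluation.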
All tools must be run from the ditdml folder.
To report statistics such as the number of images:
python ditdml/tools/report_data_statistics.py --dataset-name things --data-directory-name {THINGS_ROOT} --split-type quasi_original --seed 13
python ditdml/tools/report_data_statistics.py --dataset-name things --data-directory-name {THINGS_ROOT} --split-type by_class --class-triplet-conversion-type all_instances --seed 13
python ditdml/tools/report_data_statistics.py --dataset-name things --data-directory-name {THINGS_ROOT} --split-type by_class --class-triplet-conversion-type prototypes --seed 13
python ditdml/tools/report_data_statistics.py --dataset-name things --data-directory-name {THINGS_ROOT} --split-type by_class_same_training_validation --class-triplet-conversion-type all_instances --seed 13
python ditdml/tools/report_data_statistics.py --dataset-name things --data-directory-name {THINGS_ROOT} --split-type by_class_same_training_validation --class-triplet-conversion-type prototypes --seed 13
python ditdml/tools/report_data_statistics.py --dataset-name ihsjc --data-directory-name {IHSJ_ROOT} --split-type by_class --class-triplet-conversion-type all_instances --seed 15
python ditdml/tools/report_data_statistics.py --dataset-name ihsjc --data-directory-name {IHSJ_ROOT} --split-type by_class --class-triplet-conversion-type prototypes --seed 15
python ditdml/tools/report_data_statistics.py --dataset-name ihsjc --data-directory-name {IHSJ_ROOT} --split-type by_class_same_training_validation --class-triplet-conversion-type all_instances --seed 15
python ditdml/tools/report_data_statistics.py --dataset-name ihsjc --data-directory-name {IHSJ_ROOT} --split-type by_class_same_training_validation --class-triplet-conversion-type prototypes --seed 15
python ditdml/tools/report_data_statistics.py --dataset-name yummly --data-directory-name {YUMMLY_ROOT} --split-type same_training_validation_test --seed 16
python ditdml/tools/report_data_statistics.py --dataset-name yummly --data-directory-name {YUMMLY_ROOT} --split-type by_instance --seed 16
python ditdml/tools/report_data_statistics.py --dataset-name yummly --data-directory-name {YUMMLY_ROOT} --split-type by_instance_same_training_validation --seed 16
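The THINGS commands above follow a regular pattern: one split type without a conversion type, plus two split types crossed with two conversion types. As a convenience sketch, the argument lists can be generated rather than typed out. The helper name `things_report_commands` is hypothetical; the script path, flags, and values are copied from the commands above.

```python
import itertools

SCRIPT = "ditdml/tools/report_data_statistics.py"


def things_report_commands(things_root, seed=13):
    """Build argument lists for the five THINGS statistics runs."""
    common = [
        "python", SCRIPT,
        "--dataset-name", "things",
        "--data-directory-name", things_root,
    ]
    # quasi_original takes no class-triplet conversion type.
    commands = [common + ["--split-type", "quasi_original", "--seed", str(seed)]]
    for split_type, conversion_type in itertools.product(
        ["by_class", "by_class_same_training_validation"],
        ["all_instances", "prototypes"],
    ):
        commands.append(
            common
            + ["--split-type", split_type]
            + ["--class-triplet-conversion-type", conversion_type]
            + ["--seed", str(seed)]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(command, check=True)`, started from the same folder the tools are normally run from.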
To interactively visualize labeled triplets:
python ditdml/tools/visualize_triplets.py --dataset-name things --data-directory-name {THINGS_ROOT} --split-type quasi_original --seed 23 --subset-name test --initial-triplet-index 200
python ditdml/tools/visualize_triplets.py --dataset-name things --data-directory-name {THINGS_ROOT} --split-type by_class --class-triplet-conversion-type all_instances --seed 23 --subset-name training --initial-triplet-index 315715
python ditdml/tools/visualize_triplets.py --dataset-name things --data-directory-name {THINGS_ROOT} --split-type by_class --class-triplet-conversion-type prototypes --seed 23 --subset-name validation --initial-triplet-index 42
python ditdml/tools/visualize_triplets.py --dataset-name things --data-directory-name {THINGS_ROOT} --split-type by_class_same_training_validation --class-triplet-conversion-type all_instances --seed 23 --subset-name test --initial-triplet-index 101
python ditdml/tools/visualize_triplets.py --dataset-name things --data-directory-name {THINGS_ROOT} --split-type by_class_same_training_validation --class-triplet-conversion-type prototypes --seed 23 --subset-name training --initial-triplet-index 22
python ditdml/tools/visualize_triplets.py --dataset-name ihsjc --data-directory-name {IHSJ_ROOT} --split-type by_class --seed 25 --subset-name test --initial-triplet-index 200
python ditdml/tools/visualize_triplets.py --dataset-name ihsjc --data-directory-name {IHSJ_ROOT} --split-type by_class --class-triplet-conversion-type prototypes --seed 25 --subset-name validation --initial-triplet-index 300
python ditdml/tools/visualize_triplets.py --dataset-name yummly --data-directory-name {YUMMLY_ROOT} --split-type same_training_validation_test --seed 26 --subset-name training --initial-triplet-index 222
python ditdml/tools/visualize_triplets.py --dataset-name yummly --data-directory-name {YUMMLY_ROOT} --split-type by_instance --seed 26 --subset-name test --initial-triplet-index 333
python ditdml/tools/visualize_triplets.py --dataset-name yummly --data-directory-name {YUMMLY_ROOT} --split-type by_instance_same_training_validation --seed 26 --subset-name validation --initial-triplet-index 444
(Press the left and right arrow keys to navigate.)
To interactively visualize neighbors according to the provided embedding for THINGS:
python ditdml/tools/visualize_neighbors.py --data-directory-name {THINGS_ROOT} --num-neighbors 4 --initial-class-index 1854
(Press the left and right arrow keys to navigate.)
To interactively visualize the similarity matrix for THINGS:
python ditdml/tools/visualize_similarity_matrix.py --data-directory-name {THINGS_ROOT}
(Click on matrix elements in the left pane to show the corresponding image pairs.)