This repository contains the (1) Learned Perceptual Image Patch Similarity (LPIPS) metric and (2) Berkeley-Adobe Perceptual Patch Similarity (BAPPS) dataset proposed in the paper "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric" (Zhang et al., CVPR 2018).
This repository uses Python 2 or 3, with the following libraries: PyTorch, numpy, scipy, skimage.
## (1) Learned Perceptual Image Patch Similarity (LPIPS) metric

**About the metric**
We found that deep network activations work surprisingly well as a perceptual similarity metric. This was true across network architectures (SqueezeNet [2.8 MB], AlexNet [9.1 MB], and VGG [58.9 MB] provided similar scores) and supervisory signals (unsupervised, self-supervised, and supervised all perform strongly). We slightly improved scores by linearly "calibrating" networks - adding a linear layer on top of off-the-shelf classification networks. We provide 3 variants, using linear layers on top of the SqueezeNet, AlexNet (default), and VGG networks. Using this code, you can simply call `model.forward(im0, im1)` to evaluate the distance between two image patches.
**Using the metric**
Script `test_network.py` contains example usage. Running it will take the distance between example reference image `ex_ref.png` and distorted images `ex_p0.png` and `ex_p1.png`. Before running it - which do you think should be closer?
Load a model with the following commands.
```python
from models import dist_model as dm
model = dm.DistModel()
model.initialize(model='net-lin', net='alex', use_gpu=True)
```
Variable `net` can be `squeeze`, `alex`, or `vgg`; `alex` is fastest and performs the best (not specifying `net` will default to `alex`). Set `model='net'` for an uncalibrated off-the-shelf network (taking cos distance).
To call the model, run

```python
d = model.forward(im0, im1)
```

where `im0, im1` are PyTorch tensors with shape `Nx3xHxW` (`N` patches of size `HxW`, RGB images scaled in `[-1,+1]`).
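For reference, here is a minimal end-to-end sketch that loads the example images and compares each distorted patch against the reference. It only illustrates the tensor conventions above, and is not a copy of `test_network.py`; the `./imgs/` paths are an assumption - adjust them to wherever the example images live in your checkout.

```python
import numpy as np
import skimage.io
import torch
from models import dist_model as dm

def load_image_as_tensor(path):
    # Read an RGB image, scale [0, 255] -> [-1, +1],
    # and reshape HxWx3 -> 1x3xHxW as the model expects.
    img = skimage.io.imread(path).astype(np.float32) / 255.
    img = img * 2. - 1.
    return torch.from_numpy(img.transpose(2, 0, 1)).unsqueeze(0)

model = dm.DistModel()
model.initialize(model='net-lin', net='alex', use_gpu=False)  # set use_gpu=True if CUDA is available

ref = load_image_as_tensor('./imgs/ex_ref.png')  # assumed path
p0 = load_image_as_tensor('./imgs/ex_p0.png')    # assumed path
p1 = load_image_as_tensor('./imgs/ex_p1.png')    # assumed path

# Lower distance = perceptually closer to the reference.
d0 = model.forward(ref, p0)
d1 = model.forward(ref, p1)
print('d(ref, p0) = %.4f, d(ref, p1) = %.4f' % (float(d0), float(d1)))
```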
## (2) Berkeley-Adobe Perceptual Patch Similarity (BAPPS) dataset

**Downloading the dataset**
Run `bash ./scripts/get_dataset.sh` to download and unzip the dataset. The dataset will appear in directory `./dataset` and takes [6.6 GB] total:
- 2AFC train [5.3 GB]
- 2AFC val [1.1 GB]
- JND val [0.2 GB]
Run `bash ./scripts/get_dataset_valonly.sh` to download only the validation sets (no training set).
**Evaluating a perceptual similarity metric on a dataset**

Script `test_dataset_model.py` evaluates a perceptual model on a subset of the dataset.
Dataset flags:
- `--dataset_mode`: `2afc` or `jnd`, which type of perceptual judgment to evaluate
- `--datasets`: list the datasets to evaluate
    - if `--dataset_mode 2afc`, choices are [`train/traditional`, `train/cnn`, `train/mix`, `val/traditional`, `val/cnn`, `val/superres`, `val/deblur`, `val/color`, `val/frameinterp`]
    - if `--dataset_mode jnd`, choices are [`val/traditional`, `val/cnn`]
Perceptual similarity model flags:
- `--model`: perceptual similarity model to use
    - `net-lin` for our LPIPS learned similarity model (a linear network on top of internal activations of a pretrained network)
    - `net` for a classification network (uncalibrated, with all layers averaged)
    - `l2` for Euclidean distance
    - `ssim` for the Structural Similarity (SSIM) metric
- `--net`: choices are [`squeeze`, `alex`, `vgg`] for the `net-lin` and `net` models (ignored for the `l2` and `ssim` models)
- `--colorspace`: choices are [`Lab`, `RGB`], used for the `l2` and `ssim` models (ignored for the `net-lin` and `net` models)
- `--batch_size`: evaluation batch size (defaults to 1)
- `--use_gpu`: turn on this flag for GPU usage
An example usage is as follows:
`python ./test_dataset_model.py --dataset_mode 2afc --datasets val/traditional val/cnn --model net-lin --net alex --use_gpu --batch_size 50`. This would evaluate our model on the "traditional" and "cnn" validation datasets.
**About the dataset**
The dataset contains two types of perceptual judgments: Two Alternative Forced Choice (2AFC) and Just Noticeable Differences (JND).
**(1) Two Alternative Forced Choice (2AFC)** - Data is contained in the `2afc` subdirectory. Evaluators were given a reference patch, along with two distorted patches, and were asked to select which of the distorted patches was "closer" to the reference patch.
Training sets contain 2 human judgments/triplet.
Validation sets contain 5 judgments/triplet.
Each 2AFC subdirectory contains the following folders:
- `ref`: contains the original reference patches
- `p0`, `p1`: contain the two distorted patches
- `judge`: contains what the human evaluators chose - 0 if all humans preferred p0, 1 if all humans preferred p1
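To make the scoring concrete, here is a minimal sketch (not the repository's evaluation code) of the 2AFC score used to credit a metric against these judgments. It assumes `judge` stores the fraction of evaluators who preferred p1, consistent with the 0/1 endpoints described above.

```python
def twoafc_score(d0, d1, judge):
    """2AFC score for a single triplet (illustrative sketch).

    d0, d1: metric distances from the reference to patches p0 and p1.
    judge:  fraction of humans who preferred p1 (0.0 = all chose p0).
    Returns the fraction of humans the metric agrees with.
    """
    if d0 < d1:    # metric prefers p0
        return 1.0 - judge
    if d1 < d0:    # metric prefers p1
        return judge
    return 0.5     # tie: split the credit

# Example: the metric picks p0, but 60% of humans picked p1 -> score 0.4
print(twoafc_score(0.12, 0.30, judge=0.6))
```

Averaging this score over all triplets in a subset gives a single 2AFC number for the metric.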
**(2) Just Noticeable Differences (JND)** - Data is contained in the `jnd` subdirectory. Evaluators were presented with two patches - a reference patch and a distorted patch - for a limited time, and were asked whether the patches were the same (identical) or different.
Each set contains 3 human evaluations/example.
- `val/traditional` [4.8k patch pairs]
- `val/cnn` [4.8k patch pairs]
Each JND subdirectory contains the following folders:
- `p0`, `p1`: contain the two patches
- `same`: contains the fraction of human evaluators who thought the patches were the same (0 if all humans thought the patches were different, 1 if all thought they were the same)
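Since `same` is a soft label, a natural way to score a metric on JND data is to sort all pairs by the metric's distance and summarize the resulting precision-recall curve as average precision: pairs the metric ranks as closest should be the ones humans most often called "the same". The sketch below illustrates this idea; it is not the repository's evaluation code.

```python
import numpy as np

def jnd_average_precision(distances, same):
    """Approximate average precision for JND pairs (illustrative sketch).

    distances: metric distance for each (p0, p1) pair, shape (N,).
    same:      fraction of humans who judged each pair "the same", shape (N,).
    """
    order = np.argsort(distances)       # closest pairs first
    sorted_same = same[order]
    tp = np.cumsum(sorted_same)         # soft true-positive count
    precision = tp / np.arange(1, len(tp) + 1)
    recall = tp / tp[-1]
    # Area under the precision-recall curve (trapezoidal rule).
    return np.trapz(precision, recall)

d = np.array([0.1, 0.9, 0.4, 0.2])
s = np.array([1.0, 0.0, 0.33, 0.67])
print(jnd_average_precision(d, s))      # higher is better
```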
**Acknowledgements** This repository borrows partially from the pytorch-CycleGAN-and-pix2pix repository.