# Evaluation metrics with PyTorch implementations, chosen from [1](https://arxiv.org/pdf/1806.07755.pdf)

Nearly all of these metrics can operate in the feature space of a model pre-trained on the ImageNet dataset. The idea is to map the input into a semantically meaningful feature space; ImageNet models tend to learn good feature representations, so they are a reasonable default choice.

### Inception Score (IS), most widely adopted in the literature, [source](https://papers.nips.cc/paper/6125-improved-techniques-for-training-gans.pdf)

PyTorch implementation: https://github.com/sbarratt/inception-score-pytorch

Note: [this paper](https://arxiv.org/pdf/1801.01973.pdf) warns against using the Inception Score.

The Inception Score evaluates only the distribution of the generated images; it does not compare them against real images.
From [Wikipedia](https://en.wikipedia.org/wiki/Inception_score):
The Inception Score is maximized when the following conditions are true:
1. The entropy of the distribution of labels predicted by the Inceptionv3 model for the generated images is minimized. In other words, the classification model confidently predicts a single label for each image. Intuitively, this corresponds to the desideratum of generated images being "sharp" or "distinct".
2. The predictions of the classification model are evenly distributed across all possible labels. This corresponds to the desideratum that the output of the generative model is "diverse".
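
The two conditions above can be illustrated with a minimal NumPy sketch. It assumes the class probabilities have already been computed by a classifier; the function name `inception_score` and the smoothing constant `eps` are illustrative choices, not taken from the linked implementation.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score from an (N, C) array of per-image class probabilities.

    Computes exp(mean_x KL(p(y|x) || p(y))), where p(y) is the average of
    the per-image predictions. `eps` (an assumption) avoids log(0).
    """
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0, keepdims=True)  # p(y), shape (1, C)
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Confident predictions spread evenly over 4 classes: score approaches C = 4.
confident_diverse = np.eye(4)
# Every image gets the same uniform prediction: score is 1, the minimum.
uniform = np.full((4, 4), 0.25)

is_best = inception_score(confident_diverse)
is_worst = inception_score(uniform)
```

With `C` classes the score lies between 1 and `C`, so the one-hot, evenly spread case gives the upper bound for this toy input.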

### Fréchet Inception Distance (FID), [source](https://arxiv.org/abs/1706.08500)

PyTorch implementation: https://github.com/mseitzer/pytorch-fid

FID compares the distribution of generated images with the distribution of a set of real images ("ground truth").

According to [Wikipedia](https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance), as of 2020 FID is the standard metric for assessing the quality of generative models.
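
To make the comparison concrete, here is a hedged NumPy sketch of the Fréchet distance between two Gaussians fitted to feature vectors. It assumes the features (for example Inception activations) are already extracted into arrays; `frechet_distance` is a hypothetical helper, and the eigenvalue shortcut for the trace of the matrix square root is a choice made here, so the linked package may compute it differently.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (N, D) feature sets.

    d^2 = ||mu_a - mu_b||^2 + Tr(S_a) + Tr(S_b) - 2 Tr(sqrt(S_a S_b))
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr(sqrt(S_a S_b)) equals the sum of the square roots of the
    # eigenvalues of S_a @ S_b; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))     # stand-in for real features
shifted = rng.normal(2.0, 1.0, size=(500, 8))  # stand-in for poor samples

fid_same = frechet_distance(real, real)
fid_shifted = frechet_distance(real, shifted)
```

For identical inputs the result is zero only up to floating-point error, so a tiny negative value is possible.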

In [2]:
!python -m pytorch_fid --num-workers 2 data/chest_xray/train/NORMAL data/chest_xray/test/NORMAL

  0%|                                                    | 0/27 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/home/nashir/miniconda3/envs/cap5516-final/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/nashir/miniconda3/envs/cap5516-final/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/nashir/miniconda3/envs/cap5516-final/lib/python3.9/site-packages/pytorch_fid/__main__.py", line 3, in <module>
    pytorch_fid.fid_score.main()
  File "/home/nashir/miniconda3/envs/cap5516-final/lib/python3.9/site-packages/pytorch_fid/fid_score.py", line 313, in main
    fid_value = calculate_fid_given_paths(args.path,
  File "/home/nashir/miniconda3/envs/cap5516-final/lib/python3.9/site-packages/pytorch_fid/fid_score.py", line 259, in calculate_fid_given_paths
    m1, s1 = compute_statistics_of_path(paths[0], model, batch_size,
  File "/home/nashir/miniconda3/envs/cap5516-final/li

The run above fails: the data folders need to be modified so that all images have the same size. As a test, run it on Fashion-MNIST instead:

In [3]:
!python -m pytorch_fid --num-workers 2 data/fmnist/set1-real data/fmnist/set2-real

100%|█████████████████████████████████████████████| 4/4 [00:03<00:00,  1.03it/s]
100%|█████████████████████████████████████████████| 2/2 [00:01<00:00,  1.97it/s]
FID:  87.44324079827993


In [4]:
!python -m pytorch_fid --num-workers 2 data/fmnist/set1-real data/fmnist/set1-real

100%|█████████████████████████████████████████████| 4/4 [00:03<00:00,  1.32it/s]
100%|█████████████████████████████████████████████| 4/4 [00:01<00:00,  2.23it/s]
FID:  -6.110252797952853e-05


### Kernel Inception Distance (KID), [source](https://arxiv.org/pdf/1801.01401.pdf)

PyTorch implementation: https://github.com/abdulfatir/gan-metrics-pytorch/blob/master/kid_score.py
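
As a rough illustration, KID can be sketched as an unbiased estimate of the squared MMD between two feature sets under the cubic polynomial kernel k(x, y) = (x·y / d + 1)^3. The sketch below assumes pre-extracted features and returns a single estimate; the helper names are hypothetical, and averaging over several subsets, as implementations commonly do, is left out.

```python
import numpy as np

def polynomial_kernel(x, y):
    """Cubic polynomial kernel (x.y / d + 1)^3, d = feature dimension."""
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3

def kid_estimate(feats_a, feats_b):
    """Unbiased squared-MMD estimate between two (N, D) feature sets."""
    x = np.asarray(feats_a, dtype=np.float64)
    y = np.asarray(feats_b, dtype=np.float64)
    m, n = len(x), len(y)
    k_xx = polynomial_kernel(x, x)
    k_yy = polynomial_kernel(y, y)
    k_xy = polynomial_kernel(x, y)
    # Drop the diagonal (self-similarity) terms for the unbiased estimator.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())

rng = np.random.default_rng(0)
set_a = rng.normal(0.0, 1.0, size=(300, 8))
set_b = rng.normal(0.0, 1.0, size=(300, 8))    # same distribution as set_a
set_c = rng.normal(1.0, 1.0, size=(300, 8))    # shifted distribution

kid_same = kid_estimate(set_a, set_b)
kid_shifted = kid_estimate(set_a, set_c)
```

Because the estimator is unbiased, the value for two samples from the same distribution fluctuates around zero and can be slightly negative.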

### Wasserstein distance
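
For intuition only, the following sketch computes the Wasserstein-1 distance between two one-dimensional samples of equal size, where the optimal matching simply pairs the sorted values. The function name is hypothetical; for higher-dimensional feature vectors an optimal-transport solver is needed instead.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan matches the i-th smallest
    value of `a` to the i-th smallest value of `b`.
    """
    a = np.sort(np.asarray(a, dtype=np.float64))
    b = np.sort(np.asarray(b, dtype=np.float64))
    if a.shape != b.shape:
        raise ValueError("this sketch only handles samples of equal size")
    return float(np.abs(a - b).mean())

w_shift = wasserstein_1d([0.0, 1.0, 2.0], [3.0, 4.0, 5.0])  # every point moves by 3
w_perm = wasserstein_1d([1.0, 2.0, 3.0], [3.0, 1.0, 2.0])   # same values, reordered
```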

### Mode score, [source](https://arxiv.org/abs/1612.02136)

PyTorch implementation: https://github.com/xuqiantong/GAN-Metrics/blob/45dc74fac8b2452d5f37f035bba7352f873c7f91/metric.py#L356-L361
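
A minimal NumPy sketch of the mode score follows, assuming the commonly stated form exp(E_x KL(p(y|x) || p(y)) − KL(p(y) || p(y*))), where p(y) is the label marginal over generated images and p(y*) the label marginal over real images. The function names and `eps` are illustrative assumptions, and the class probabilities are taken as given.

```python
import numpy as np

def _kl(p, q, eps=1e-12):
    """KL(p || q) along the last axis; `eps` avoids log(0)."""
    return (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1)

def mode_score(probs_gen, probs_real):
    """Mode score from (N, C) class probabilities of generated and real images."""
    probs_gen = np.asarray(probs_gen, dtype=np.float64)
    probs_real = np.asarray(probs_real, dtype=np.float64)
    marg_gen = probs_gen.mean(axis=0)    # p(y) on generated images
    marg_real = probs_real.mean(axis=0)  # p(y*) on real images
    first = _kl(probs_gen, marg_gen).mean()  # same term as the Inception Score
    second = _kl(marg_gen, marg_real)        # penalty for mismatched label marginals
    return float(np.exp(first - second))

real_probs = np.eye(4)                        # real data covers all 4 classes
good_gen = np.eye(4)                          # generator covers all 4 classes
collapsed_gen = np.tile([1.0, 0, 0, 0], (4, 1))  # generator emits only class 0

ms_good = mode_score(good_gen, real_probs)
ms_collapsed = mode_score(collapsed_gen, real_probs)
```

Unlike the Inception Score, the second term makes the score drop when the generated label distribution differs from the real one, as in the collapsed example.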

### Difference in maximum mean discrepancies (MMDs), [source](https://arxiv.org/abs/1511.04581)

PyTorch implementation: https://github.com/ZongxianLee/MMD_Loss.Pytorch
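
As an illustration of the quantity being measured, here is a sketch of the biased squared-MMD estimate with a single Gaussian (RBF) kernel. The bandwidth `sigma=1.0` and the function names are assumptions made for this example; the linked repository may use a different kernel setup.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel matrix exp(-||x - y||^2 / (2 sigma^2)) for (N, D) and (M, D) inputs."""
    sq_dists = (x ** 2).sum(axis=1)[:, None] + (y ** 2).sum(axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

def mmd2_biased(x, y, sigma=1.0):
    """Biased estimate of MMD^2: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(
        gaussian_kernel(x, x, sigma).mean()
        + gaussian_kernel(y, y, sigma).mean()
        - 2.0 * gaussian_kernel(x, y, sigma).mean()
    )

rng = np.random.default_rng(0)
sample_p = rng.normal(0.0, 1.0, size=(200, 2))
sample_q = rng.normal(3.0, 1.0, size=(200, 2))  # clearly different distribution

mmd_same = mmd2_biased(sample_p, sample_p)
mmd_diff = mmd2_biased(sample_p, sample_q)
```

The result depends strongly on the bandwidth, so `sigma` should be chosen with the scale of the features in mind.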

### Classifier two-sample test, also called 1-NN classifier, [source](https://arxiv.org/abs/1610.06545)

TODO: add this later if time
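
In the meantime, the idea behind the 1-NN two-sample test can be sketched as follows: pool real and generated features, label them by origin, and measure leave-one-out accuracy of a 1-nearest-neighbour classifier. The function name and the use of Euclidean distance are assumptions for this example.

```python
import numpy as np

def one_nn_loo_accuracy(feats_real, feats_gen):
    """Leave-one-out 1-NN accuracy on the pooled real/generated features."""
    real = np.asarray(feats_real, dtype=np.float64)
    gen = np.asarray(feats_gen, dtype=np.float64)
    x = np.concatenate([real, gen])
    labels = np.concatenate([np.ones(len(real)), np.zeros(len(gen))])
    # Pairwise squared Euclidean distances between all pooled points.
    sq_norms = (x ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    np.fill_diagonal(sq_dists, np.inf)  # a point may not be its own neighbour
    nearest = sq_dists.argmin(axis=1)
    return float((labels[nearest] == labels).mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 4))
similar = rng.normal(0.0, 1.0, size=(200, 4))    # same distribution as `real`
different = rng.normal(10.0, 1.0, size=(200, 4))  # far from `real`

acc_similar = one_nn_loo_accuracy(real, similar)
acc_different = one_nn_loo_accuracy(real, different)
```

An accuracy near 50% means the two samples are hard to tell apart, while an accuracy near 100% means they are easily distinguished.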

Note: human evaluation tends to be biased towards the visual quality of generated samples and to neglect the overall distributional characteristics, which are important for unsupervised learning ([source](https://arxiv.org/pdf/1806.07755.pdf)).