<a href="https://colab.research.google.com/github/potis/AISummit/blob/main/cifar10_baseline_additional_material.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch-OOD


## https://pytorch-ood.readthedocs.io/


### Maximum Softmax Probability

Implements the Maximum Softmax Probability (MSP) Thresholding baseline for OOD detection.
Optionally, implements temperature scaling, which divides the logits by a constant temperature **T** before calculating the softmax.

https://arxiv.org/abs/1610.02136
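
As a rough illustration of the idea, the following plain-Python sketch computes an MSP-style outlier score with optional temperature scaling. The helper names (`softmax`, `msp_score`) and the example logits are hypothetical, and returning the *negative* maximum probability (so that larger values suggest OOD) is an assumption about the sign convention, not a statement about the library's internals.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits, temperature=1.0):
    """Negative maximum softmax probability (assumed: higher = more OOD)."""
    return -max(softmax(logits, temperature))


confident = [8.0, 0.5, -1.0]  # hypothetical logits with one dominant class
uncertain = [1.0, 0.9, 1.1]   # hypothetical logits that are nearly uniform

print(msp_score(confident), msp_score(uncertain))
print(msp_score(confident, temperature=10.0))  # a larger T flattens the softmax
```

A threshold on this score then separates inputs treated as in-distribution from those flagged as OOD.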

### Max Logit Method

Implements the Max Logit Method for OOD Detection as proposed in Scaling Out-of-Distribution Detection for Real-World Settings.


https://arxiv.org/abs/1911.11132
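
A minimal sketch of how the max-logit score can differ from MSP, using hypothetical helper names and made-up logits. Softmax is invariant to adding a constant to all logits, so two inputs whose logits differ only by a shift get the same MSP, while the max logit keeps the overall magnitude. The negative sign (larger = more OOD) is an assumed convention.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_score(logits):
    """Negative maximum softmax probability."""
    return -max(softmax(logits))


def max_logit_score(logits):
    """Negative maximum logit (assumed: higher = more OOD)."""
    return -max(logits)


a = [10.0, 10.0, 0.0]  # hypothetical logits with large activations
b = [1.0, 1.0, -9.0]   # the same logits shifted by -9

print(max_softmax_score(a), max_softmax_score(b))  # identical under softmax
print(max_logit_score(a), max_logit_score(b))      # differ in magnitude
```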



### OpenMax
Implementation of the OpenMax Layer as proposed in the paper Towards Open Set Deep Networks.

The method determines a center for each class in the logit space of a model, and then builds a statistical model of the distances of correctly classified inputs to these centers. It uses extreme value theory to detect outliers by fitting a Weibull distribution to the tail of the distance distribution.

We use the activation of the unknown class as the outlier score.

https://arxiv.org/abs/1511.06233

### ODIN Preprocessing

Implements ODIN from the paper Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks.

ODIN is a preprocessing method for inputs that aims to increase the separability of the softmax outputs for in-distribution (ID) and OOD data.


https://arxiv.org/abs/1706.02690
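
The preprocessing step can be sketched for a toy linear classifier, where the input gradient has a closed form and no autograd is needed. Everything here is illustrative: the weight matrix `W`, the input `x`, and the helper names are hypothetical, and the defaults `eps=0.002` and `temperature=1000.0` are assumptions chosen for the example. The input is nudged by `eps` in the direction that increases the temperature-scaled log-probability of the predicted class.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, x):
    """Logits of a bias-free linear classifier: one dot product per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def odin_preprocess(weights, x, eps=0.002, temperature=1000.0):
    """Perturb x by eps along the sign of d log p_c / dx for the predicted class c."""
    probs = softmax(linear_logits(weights, x), temperature)
    c = max(range(len(probs)), key=probs.__getitem__)
    # For a linear model: d log p_c / dx_j = (W[c][j] - sum_k p_k * W[k][j]) / T
    grad = [
        (weights[c][j] - sum(p * row[j] for p, row in zip(probs, weights))) / temperature
        for j in range(len(x))
    ]
    return [x_j + eps * ((g > 0) - (g < 0)) for x_j, g in zip(x, grad)]


W = [[1.0, -0.5], [-0.3, 0.8]]  # hypothetical 2-class linear classifier
x = [0.2, 0.1]                  # hypothetical input
x_pre = odin_preprocess(W, x)

before = max(softmax(linear_logits(W, x), 1000.0))
after = max(softmax(linear_logits(W, x_pre), 1000.0))
print(before, after)  # the maximum softmax probability grows after preprocessing
```

With a deep network the gradient would come from backpropagation instead of the closed form used here.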


### EnergyBased
Implements the Energy Score of Energy-based Out-of-distribution Detection.

This method calculates the negative energy for a vector of logits. This value can be used as an outlier score.


https://proceedings.neurips.cc/paper/2020/file/f5496252609c43eb8a3d147ab9b9c006-Paper.pdf
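
For orientation, the paper defines the energy of an input with logits $f(x)$ as $E(x) = -T \log \sum_i \exp(f_i(x)/T)$. The sketch below evaluates that formula in plain Python with hypothetical helper names and made-up logits. It returns $E(x)$ as written in the paper, so confidently classified inputs get lower values; which sign the library applies is not asserted here.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def energy(logits, temperature=1.0):
    """E(x) = -T * logsumexp(logits / T), following the paper's definition."""
    return -temperature * logsumexp([z / temperature for z in logits])


confident = [8.0, 0.5, -1.0]  # hypothetical logits with one large activation
uncertain = [1.0, 0.9, 1.1]   # hypothetical logits with small activations

print(energy(confident), energy(uncertain))
```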


### Mahalanobis Method

Implements the Mahalanobis Method from the paper A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks.
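
The core of the method is a class-conditional Gaussian model with a shared covariance fitted on features; the score is the distance to the closest class mean. The sketch below shows only that core on hypothetical 2-D features, using made-up helper names. It deliberately omits the input preprocessing and the multi-layer ensembling described in the paper.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def fit(features_by_class):
    """Return per-class means and the inverse of the shared 2x2 covariance."""
    means = {c: mean(feats) for c, feats in features_by_class.items()}
    n = sum(len(feats) for feats in features_by_class.values())
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for c, feats in features_by_class.items():
        for v in feats:
            d = [v[0] - means[c][0], v[1] - means[c][1]]
            for i in range(2):
                for j in range(2):
                    cov[i][j] += d[i] * d[j] / n
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    return means, inv


def mahalanobis_score(x, means, inv):
    """Squared Mahalanobis distance to the closest class mean (higher = more OOD)."""
    def dist(mu):
        d = [x[0] - mu[0], x[1] - mu[1]]
        return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return min(dist(mu) for mu in means.values())


# Hypothetical 2-D features for two classes
train = {
    0: [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    1: [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]],
}
means, inv = fit(train)
near = mahalanobis_score([0.6, 0.4], means, inv)  # close to the class-0 mean
far = mahalanobis_score([3.0, 3.0], means, inv)   # between both classes
print(near, far)
```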


### Monte Carlo Dropout

From the paper Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Forward-propagates the input through the model several times with activated dropout and averages the results.


http://proceedings.mlr.press/v48/gal16.pdf
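
The averaging over stochastic forward passes can be sketched as follows. This is a toy setup under stated assumptions: a hypothetical bias-free linear classifier `W`, dropout applied directly to the input, and made-up helper names. A real model would keep its dropout layers active at inference time instead.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_with_dropout(weights, x, p, rng):
    """One stochastic pass: inverted dropout on the input, then a linear layer."""
    dropped = [0.0 if rng.random() < p else v / (1.0 - p) for v in x]
    logits = [sum(w * v for w, v in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(weights, x, p=0.5, passes=30, seed=0):
    """Average the class probabilities over several dropout-perturbed passes."""
    rng = random.Random(seed)
    runs = [forward_with_dropout(weights, x, p, rng) for _ in range(passes)]
    return [sum(run[k] for run in runs) / passes for k in range(len(weights))]


W = [[1.0, -0.5, 0.3], [-0.3, 0.8, 0.1]]  # hypothetical 2-class classifier
x = [0.2, 0.1, 0.4]                       # hypothetical input
mean_probs = mc_dropout_predict(W, x)
print(mean_probs)
```

An uncertainty score (for example, the maximum or the entropy of `mean_probs`) can then be derived from the averaged prediction.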


### Virtual Logit Matching

Implements Virtual Logit Matching (ViM) from the paper ViM: Out-Of-Distribution with Virtual-logit Matching.

https://arxiv.org/abs/2203.10807


### KL-Matching
Implements KL-Matching from the paper Scaling Out-of-Distribution Detection for Real-World Settings.

https://arxiv.org/abs/1911.11132
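
A minimal sketch of the idea as described in the paper: build one template distribution per class by averaging the posteriors of samples predicted as that class, then score a test posterior by its smallest KL divergence to any template. The helper names and example posteriors are hypothetical, and the library's exact variant may differ in its details.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL[p || q] for discrete distributions, with a small eps for stability."""
    return sum(p_i * math.log((p_i + eps) / (q_i + eps)) for p_i, q_i in zip(p, q))


def fit_templates(posteriors):
    """Average the posteriors of all samples predicted as class k."""
    groups = {}
    for p in posteriors:
        k = max(range(len(p)), key=p.__getitem__)
        groups.setdefault(k, []).append(p)
    return {k: [sum(col) / len(ps) for col in zip(*ps)] for k, ps in groups.items()}


def kl_matching_score(p, templates):
    """Smallest divergence to any class template (higher = more OOD)."""
    return min(kl_divergence(p, d) for d in templates.values())


# Hypothetical softmax posteriors of in-distribution validation samples
validation = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.85, 0.05],
    [0.05, 0.90, 0.05],
]
templates = fit_templates(validation)
typical = kl_matching_score([0.85, 0.075, 0.075], templates)
atypical = kl_matching_score([1 / 3, 1 / 3, 1 / 3], templates)
print(typical, atypical)
```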

### Entropy-based
This method calculates the entropy based on the logits of a classifier. Higher entropy means more uniformly distributed posteriors, indicating larger uncertainty.
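
As a small worked example, the sketch below computes the Shannon entropy of the softmax posterior for two hypothetical logit vectors. The helper names are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_score(logits):
    """Shannon entropy (in nats) of the softmax posterior; higher = more uncertain."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)


confident = [8.0, 0.5, -1.0]  # hypothetical logits with one dominant class
uniform = [0.0, 0.0, 0.0]     # hypothetical logits giving a uniform posterior

print(entropy_score(confident), entropy_score(uniform))
```

For a uniform posterior over three classes the entropy reaches its maximum, `log(3)`.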



In [3]:
!pip install pytorch_ood

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pytorch_ood
  Downloading pytorch_ood-0.1.2-py3-none-any.whl (101 kB)
Collecting torchmetrics==0.10.2 (from pytorch_ood)
  Downloading torchmetrics-0.10.2-py3-none-any.whl (529 kB)
Installing collected packages: torchmetrics, pytorch_ood
Successfully installed pytorch_ood-0.1.2 torchmetrics-0.10.2


In [4]:
import pandas as pd  # additional dependency, used here for convenience
import torch
import torchvision.transforms as tvt
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

from pytorch_ood.dataset.img import (
    LSUNCrop,
    LSUNResize,
    Textures,
    TinyImageNetCrop,
    TinyImageNetResize,
)
from pytorch_ood.detector import (
    ODIN,
    EnergyBased,
    Entropy,
    KLMatching,
    Mahalanobis,
    MaxLogit,
    MaxSoftmax,
    ViM,
)
from pytorch_ood.model import WideResNet
from pytorch_ood.utils import OODMetrics, ToRGB, ToUnknown

device = "cuda:0" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is available

torch.manual_seed(123)

<torch._C.Generator at 0x7f3524196c90>

Set up preprocessing



In [5]:
mean = [x / 255 for x in [125.3, 123.0, 113.9]]
std = [x / 255 for x in [63.0, 62.1, 66.7]]
trans = tvt.Compose(
    [tvt.Resize(size=(32, 32)), ToRGB(), tvt.ToTensor(), tvt.Normalize(std=std, mean=mean)]
)

Set up datasets



In [6]:
dataset_in_test = CIFAR10(root="data", train=False, transform=trans, download=True)

# create all OOD datasets
ood_datasets = [Textures, TinyImageNetCrop, TinyImageNetResize, LSUNCrop, LSUNResize]
datasets = {}
for ood_dataset in ood_datasets:
    dataset_out_test = ood_dataset(
        root="data", transform=trans, target_transform=ToUnknown(), download=True
    )
    test_loader = DataLoader(dataset_in_test + dataset_out_test, batch_size=256)
    datasets[ood_dataset.__name__] = test_loader

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:02<00:00, 74446703.62it/s]


Extracting data/cifar-10-python.tar.gz to data
Downloading https://thor.robots.ox.ac.uk/datasets/dtd/dtd-r1.0.1.tar.gz to data/textures-r1_0_1.tar.gz


100%|██████████| 625239812/625239812 [00:19<00:00, 31986946.13it/s]


Extracting data/textures-r1_0_1.tar.gz to data
Downloading https://ucd372c1202449535828078fd857.dl.dropboxusercontent.com/cd/0/inline2/B-a_RKz6J2yk8MDXXkX15Wd6tTsCVNttlA1_xc64Xz_11bFhWcqdLDryFVH7Zl3wW_dsfYGWAdF8gNAUiRD9lZ14UrJ8B4BwTxQ8-EOsTXr5Wpeq5PRoNuoMVC8wAG2NDz0AfWLFdb-S76REiGlyTAPqs_Lf-Q5EhyP67RKlzQQr6v9O-76Do3k1RXH8EVQRe8s6or0Qv1Gme5y5VDs6wJdgmTggvvub4yT1IovPK2E5aV2o-ylfvN88ix-I0DBoilox9Ln2i-wQ_A3lcYoSTjkugO72ySuVgXz7Jga38cHd-qPaPW96Vq-VEha4iK7a91drePHEqNvQTdtNI_xf7W0KUertp4AG0sXMCsco9MSl2a5b61uNCHLpLJrPrx8yWA1Z7htRce9skhMktfctv6J5_hil64CKkEPkN2Xm8luXxg/file to data/Imagenet.tar.gz


100%|██████████| 26501958/26501958 [00:00<00:00, 104334619.04it/s]


Extracting data/Imagenet.tar.gz to data
Downloading https://uc438216df000adf545ccb699330.dl.dropboxusercontent.com/cd/0/inline2/B-Yez3AS7xtKpQzqCN0HVFbZh8aL1Gs4UAMcJbheySyna_H4YXDSDqjut7DiPhrgzDYRhxPTxk2RfR9mBtQJd0o3SeMcXwjE-zZ9C1RQVrI1XDlrVmSVZnZNIKgBYfx3z_4z1zf11CaBpHvNo0SCZQByapFtNIwsOVA5gX7Kh5g4ckhPAP_VWC8YuYrxmpHpMWuVPyXFtlzZICaSsI1opvWnaGBjDOfzp0TGL5PKrnhV-kaNWsUOJGwY7QF2QV304qrVeranIubOVkDTFN8Zk_DCN3cBUxf2K_VrwQSN9SelTphxBmL1gQXT2x9SEiOSY8YiBBGttCnRS88rKVFAnk7A-vADXASLEaWrYbAnw-8JWYt0J2thsNBOCGXBNd0HxhovH4_SxaOXCV1bhaLHPGQ51EsK5IexJdiLGs253yLDWQ/file to data/Imagenet_resize.tar.gz


100%|██████████| 4550980/4550980 [00:00<00:00, 27111485.22it/s]


Extracting data/Imagenet_resize.tar.gz to data
Downloading https://ucd1da99366c78a6e0f2bc09ab4b.dl.dropboxusercontent.com/cd/0/inline2/B-ZMUEXM0RtzX4x1rht54ukpydcQJBSxp3P1cjnntXyyMuqUk093O2CQrYFtc5GzJghuiI5SlzsQh6Eo3aCAMBAHmttyPPgxPMUa39lddaiPY6Z5eTummivUMXmVVQs5MRIPqOkwOBgqognINVoPI1OUJAeEBw6pYgeieVHAkh0FG5ga4WDq9NJpSHXz1fWRo7bLfQFDB8UwF9KDe8gNW3J0sFHdq0QxpD-JsaO11mM_ZbhPfuaCzZ_4StgdrQA2PH-XdZUFHD2AzNyLNn7qpAOWBlc5o-2rrKoB5zpeAG10CVz-1xOe8Nc2yYjya71-5sBIjHH12VDnMdvhFJzbejdRhOCq45itl0y7hkcrnP7Oa5caAX8kw1rtOFFFPg2UxjOdaVY3buJQ5Krf1bZFx2gWw5nfcaJ0ld8e2sx06-nKFg/file to data/LSUN.tar.gz


100%|██████████| 17309383/17309383 [00:00<00:00, 22960213.29it/s]


Extracting data/LSUN.tar.gz to data
Downloading https://uc60e19bb941cfb2018bc15b0f49.dl.dropboxusercontent.com/cd/0/inline2/B-YPV-8oiRHF3MEEu-pvmhyqHgduFbxUuwA90MX8scUrgefRVl3g9_5o6nkf0C7MhLrd1GZqFkQ88opy-OJFEaVuIyAa_hGmpM7-uccVPL7HZDl55fExOf2HXhXRkdRmKCq5nPHNY2Q6YPxcvHaTFsmIlEqsO0JzFVzL0Dz2tCDacrkYQ17P3Qg7R-DQVkZ0XBmV2O25UT_HgxLuX-in64jNEfGnNjSgJt0Hrkfb66fux1vvZ1nQ0h5d9EMuwsKqS7r1PMg6QgEI_yflQVqLk1Kb3om6Zx8tV4XGe1Ip0X2Ai7_XdE-Ip4XNNRlckDTvPrEghWOLoCOEQRDbWpVLk4TUoRXEzJ2BB-1aJV00JcY49W9hZ7v239Fm5Izh_AiVdM0zdSmRl1Z22GaD130PqfEjDXm6gFUBv6lcX3QADgezaQ/file to data/LSUN_resize.tar.gz


100%|██████████| 4688973/4688973 [00:00<00:00, 28546224.13it/s]


Extracting data/LSUN_resize.tar.gz to data


**Stage 1**: Create DNN with pre-trained weights from the Hendrycks baseline paper



In [7]:
print("STAGE 1: Creating a Model")
model = WideResNet(num_classes=10, pretrained="cifar10-pt").eval().to(device)

STAGE 1: Creating a Model


Downloading: "https://github.com/wetliu/energy_ood/raw/master/CIFAR/snapshots/pretrained/cifar10_wrn_pretrained_epoch_99.pt" to /root/.cache/torch/hub/checkpoints/wrn-cifar10-pt.pt
100%|██████████| 8.62M/8.62M [00:00<00:00, 63.3MB/s]


**Stage 2**: Create OOD detector



In [8]:
print("STAGE 2: Creating OOD Detectors")
detectors = {}
detectors["Entropy"] = Entropy(model)
detectors["ViM"] = ViM(model.features, d=64, w=model.fc.weight, b=model.fc.bias)
detectors["Mahalanobis"] = Mahalanobis(model.features, norm_std=std, eps=0.002)
detectors["KLMatching"] = KLMatching(model)
detectors["MaxSoftmax"] = MaxSoftmax(model)
detectors["EnergyBased"] = EnergyBased(model)
detectors["MaxLogit"] = MaxLogit(model)
detectors["ODIN"] = ODIN(model, norm_std=std, eps=0.002)

# fit detectors to training data (some require this, some do not)
print(f"> Fitting {len(detectors)} detectors")
loader_in_train = DataLoader(CIFAR10(root="data", train=True, transform=trans), batch_size=256)
for name, detector in detectors.items():
    print(f"--> Fitting {name}")
    detector.fit(loader_in_train, device=device)

STAGE 2: Creating OOD Detectors
> Fitting 8 detectors
--> Fitting Entropy
--> Fitting ViM
--> Fitting Mahalanobis
--> Fitting KLMatching
--> Fitting MaxSoftmax
--> Fitting EnergyBased
--> Fitting MaxLogit
--> Fitting ODIN


**Stage 3**: Evaluate Detectors
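
The evaluation reports AUROC and FPR95TPR, among other metrics. As a rough guide to what these numbers mean, the sketch below computes both from two lists of outlier scores. It assumes that OOD is the positive class and that higher scores mean "more OOD"; the helper names and example scores are hypothetical, and this is not the implementation used by `OODMetrics`.

```python
import math


def auroc(scores_in, scores_out):
    """Probability that a random OOD sample scores higher than a random ID sample."""
    wins = 0.0
    for s_out in scores_out:
        for s_in in scores_in:
            if s_out > s_in:
                wins += 1.0
            elif s_out == s_in:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_in) * len(scores_out))


def fpr_at_95_tpr(scores_in, scores_out):
    """Fraction of ID samples flagged when at least 95% of OOD samples are detected."""
    ranked = sorted(scores_out, reverse=True)
    k = math.ceil(0.95 * len(ranked))  # number of OOD samples that must be detected
    threshold = ranked[k - 1]
    return sum(s >= threshold for s in scores_in) / len(scores_in)


scores_in = [0.1, 0.2, 0.3, 0.4]     # hypothetical scores of in-distribution samples
scores_out = [0.35, 0.6, 0.7, 0.8]   # hypothetical scores of OOD samples

print(auroc(scores_in, scores_out))          # 0.9375
print(fpr_at_95_tpr(scores_in, scores_out))  # 0.25
```

Higher AUROC and lower FPR95TPR both indicate better separation between in-distribution and OOD data.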



In [9]:
print(f"STAGE 3: Evaluating {len(detectors)} detectors on {len(datasets)} datasets.")
results = []

with torch.no_grad():
    for detector_name, detector in detectors.items():
        print(f"> Evaluating {detector_name}")
        for dataset_name, loader in datasets.items():
            print(f"--> {dataset_name}")
            metrics = OODMetrics()
            for x, y in loader:
                metrics.update(detector(x.to(device)), y.to(device))

            r = {"Detector": detector_name, "Dataset": dataset_name}
            r.update(metrics.compute())
            results.append(r)

# calculate mean scores over all datasets, use percent
df = pd.DataFrame(results)
mean_scores = df.groupby("Detector").mean(numeric_only=True) * 100  # skip the non-numeric "Dataset" column
print(mean_scores.sort_values("AUROC").to_csv(float_format="%.2f"))

STAGE 3: Evaluating 8 detectors on 5 datasets.
> Evaluating Entropy
--> Textures
--> TinyImageNetCrop
--> TinyImageNetResize
--> LSUNCrop
--> LSUNResize
> Evaluating ViM
--> Textures
--> TinyImageNetCrop
--> TinyImageNetResize
--> LSUNCrop
--> LSUNResize
> Evaluating Mahalanobis
--> Textures
--> TinyImageNetCrop
--> TinyImageNetResize
--> LSUNCrop
--> LSUNResize
> Evaluating KLMatching
--> Textures
--> TinyImageNetCrop
--> TinyImageNetResize
--> LSUNCrop
--> LSUNResize
> Evaluating MaxSoftmax
--> Textures
--> TinyImageNetCrop
--> TinyImageNetResize
--> LSUNCrop
--> LSUNResize
> Evaluating EnergyBased
--> Textures
--> TinyImageNetCrop
--> TinyImageNetResize
--> LSUNCrop
--> LSUNResize
> Evaluating MaxLogit
--> Textures
--> TinyImageNetCrop
--> TinyImageNetResize
--> LSUNCrop
--> LSUNResize
> Evaluating ODIN
--> Textures
--> TinyImageNetCrop
--> TinyImageNetResize
--> LSUNCrop
--> LSUNResize
Detector,AUROC,AUPR-IN,AUPR-OUT,FPR95TPR
KLMatching,88.73,86.95,85.10,58.73
MaxSoftmax,91.85,88.5



In [18]:
# Sanity check: score a single sample from each loader with every detector.
# The loaders are not shuffled and the CIFAR-10 test set comes first in each
# concatenated dataset, so x[0:1] is the same CIFAR-10 image for every loader.
# This is why each detector prints an identical score for all datasets below.
print(f"STAGE 3: Evaluating {len(detectors)} detectors on {len(datasets)} datasets.")

with torch.no_grad():
    for detector_name, detector in detectors.items():
        print(f"> Evaluating {detector_name}")
        for dataset_name, loader in datasets.items():
            print(f"--> {dataset_name}")
            x, y = next(iter(loader))  # first batch only
            print(dataset_name, dataset_name)
            prediction = detector(x[0:1].to(device))  # outlier score of the first sample
            print(prediction)

STAGE 3: Evaluating 8 detectors on 5 datasets.
> Evaluating Entropy
--> Textures
Textures Textures
tensor([3.8212e-05], device='cuda:0')
--> TinyImageNetCrop
TinyImageNetCrop TinyImageNetCrop
tensor([3.8212e-05], device='cuda:0')
--> TinyImageNetResize
TinyImageNetResize TinyImageNetResize
tensor([3.8212e-05], device='cuda:0')
--> LSUNCrop
LSUNCrop LSUNCrop
tensor([3.8212e-05], device='cuda:0')
--> LSUNResize
LSUNResize LSUNResize
tensor([3.8212e-05], device='cuda:0')
> Evaluating ViM
--> Textures
Textures Textures
tensor([-4.0757])
--> TinyImageNetCrop
TinyImageNetCrop TinyImageNetCrop
tensor([-4.0757])
--> TinyImageNetResize
TinyImageNetResize TinyImageNetResize
tensor([-4.0757])
--> LSUNCrop
LSUNCrop LSUNCrop
tensor([-4.0757])
--> LSUNResize
LSUNResize LSUNResize
tensor([-4.0757])
> Evaluating Mahalanobis
--> Textures
Textures Textures
tensor([0.0012], device='cuda:0')
--> TinyImageNetCrop
TinyImageNetCrop TinyImageNetCrop
tensor([0.0012], device='cuda:0')
--> TinyImageNetResize
Tin