<a href="https://colab.research.google.com/github/mwroffo/OpenOOD/blob/main/openood_evaluator_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we show how to use OpenOOD's unified evaluator to measure OOD detection performance with respect to a given ID dataset (e.g., CIFAR-10, ImageNet-1K), given a trained base classifier (e.g., ResNet) and a postprocessor (e.g., MSP, ReAct). We use CIFAR-10 as the example because it runs quickly, but the same steps apply to larger datasets, including ImageNet-1K. **Remember to enable the GPU under Edit / Notebook settings.**

1. Install OpenOOD with pip and make the necessary preparations

In [None]:
!pip install git+https://github.com/Jingkang50/OpenOOD

Collecting git+https://github.com/zjysteven/OpenOOD
  Cloning https://github.com/zjysteven/OpenOOD to /tmp/pip-req-build-l63u9xuh
  Running command git clone --filter=blob:none --quiet https://github.com/zjysteven/OpenOOD /tmp/pip-req-build-l63u9xuh
  Resolved https://github.com/zjysteven/OpenOOD to commit 6e0ade72ea2d959e0e5c4e5b5c9ef0cba60d4ab2
  Preparing metadata (setup.py) ... done
Collecting json5 (from openood==1.5)
  Downloading json5-0.9.14-py2.py3-none-any.whl (19 kB)
Collecting pre-commit (from openood==1.5)
  Downloading pre_commit-3.4.0-py2.py3-none-any.whl (203 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 203.7/203.7 kB 4.8 MB/s eta 0:00:00
Collecting diffdist>=0.1 (from openood==1.5)
  Downloading diffdist-0.1.tar.gz (4.6 kB)
  Preparing metadata (setup.py) ... done
Collecting faiss-gpu>=1.7.2 (from openood==1.5)
  Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (85

In [None]:
# necessary imports
import torch

from openood.evaluation_api import Evaluator
from openood.networks import ResNet18_32x32 # just a wrapper around the ResNet

In [None]:
# download our pre-trained CIFAR-10 classifier
!gdown 1byGeYxM_PlLjT72wZsMQvP6popJeWBgt
!unzip cifar10_res18_v1.5.zip

Downloading...
From (uriginal): https://drive.google.com/uc?id=1byGeYxM_PlLjT72wZsMQvP6popJeWBgt
From (redirected): https://drive.google.com/uc?id=1byGeYxM_PlLjT72wZsMQvP6popJeWBgt&confirm=t&uuid=8b644d3e-a9d8-4ac0-b5fe-7f6d0f2aafbe
To: /content/cifar10_res18_v1.5.zip
100% 375M/375M [00:07<00:00, 46.9MB/s]
Archive:  cifar10_res18_v1.5.zip
   creating: cifar10_resnet18_32x32_base_e100_lr0.1_default/
   creating: cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/
  inflating: cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/best_epoch99_acc0.9450.ckpt  
  inflating: cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/config.yml  
  inflating: cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/best.ckpt  
  inflating: cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/last_epoch100_acc0.9420.ckpt  
  inflating: cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/log.txt  
   creating: cifar10_resnet18_32x32_base_e100_lr0.1_default/s1/
  inflating: cifar10_resnet18_32x32_base_e100_lr0.1_default

In [None]:
# load the model
net = ResNet18_32x32(num_classes=10)
net.load_state_dict(
    torch.load('./cifar10_resnet18_32x32_base_e100_lr0.1_default/s0/best.ckpt')
)
net.cuda()
net.eval();
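
The archive unpacked above appears to hold one sub-directory per training run (`s1`, `s2`, and the `s0` loaded here), each with its own `best.ckpt`. If you want to evaluate every run rather than only `s0`, a small helper along these lines could list what is available. `find_best_checkpoints` is a hypothetical name, not part of OpenOOD, and the one-folder-per-run layout is inferred from the unzip log.

```python
from pathlib import Path


def find_best_checkpoints(run_dir):
    """Map each run sub-directory (s0, s1, ...) to its best.ckpt, if present."""
    run_dir = Path(run_dir)
    if not run_dir.is_dir():
        return {}
    found = {}
    for seed_dir in sorted(run_dir.glob("s[0-9]*")):
        ckpt = seed_dir / "best.ckpt"
        if seed_dir.is_dir() and ckpt.is_file():
            found[seed_dir.name] = ckpt
    return found


# Directory name taken from the unzip output above.
checkpoints = find_best_checkpoints("./cifar10_resnet18_32x32_base_e100_lr0.1_default")
for seed, path in checkpoints.items():
    print(seed, path)
```

Each returned path can then be passed to `torch.load` exactly as in the cell above.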

In [None]:
#@title choose an implemented postprocessor
postprocessor_name = "react" #@param ["openmax", "msp", "temp_scaling", "odin", "mds", "mds_ensemble", "rmds", "gram", "ebo", "gradnorm", "react", "mls", "klm", "vim", "knn", "dice", "rankfeat", "ash", "she"] {allow-input: true}

2. Initialize an evaluator instance

In [None]:
# Notes:
# 1) The evaluator will automatically download the required datasets given the
# ID dataset specified by `id_name`

# 2) Passing `postprocessor_name` selects an implemented postprocessor. To
# use your own postprocessor, make sure that it inherits from the BasePostprocessor
# class (see openood/postprocessors/base_postprocessor.py) and pass it to the
# `postprocessor` argument.

# 3) `config_root` points to the directory with OpenOOD's configurations for the
# postprocessors. By default the evaluator will look for the configs that come
# with the OpenOOD module. If you want to use custom configs, clone the repo locally
# and make modifications to OpenOOD/configs.

# 4) As you will see when executing this cell, during the initialization the evaluator
# will automatically run hyperparameter search on ID/OOD validation data (if applicable).
# If you want to use a postprocessor with specific hyperparams, you need
# to clone the OpenOOD repo (or just download the configs folder in the repo).
# Then a) specify the hyperparams and b) set APS_mode to False in the respective postprocessor
# config.

evaluator = Evaluator(
    net,
    id_name='cifar10',                     # the target ID dataset
    data_root='./data',                    # change if necessary
    config_root=None,                      # see notes above
    preprocessor=None,                     # default preprocessing for the target ID dataset
    postprocessor_name=postprocessor_name, # the postprocessor to use
    postprocessor=None,                    # if you want to use your own postprocessor
    batch_size=200,                        # for certain methods the results can be slightly affected by batch size
    shuffle=False,
    num_workers=2)                         # could use more num_workers outside colab

Downloading...
From (uriginal): https://drive.google.com/uc?id=1XKzBdWCqg3vPoj-D32YixJyJJ0hL63gP
From (redirected): https://drive.google.com/uc?id=1XKzBdWCqg3vPoj-D32YixJyJJ0hL63gP&confirm=t&uuid=83ab2219-3a64-4299-b386-419e14f547bf
To: /content/data/benchmark_imglist.zip
100%|██████████| 27.7M/27.7M [00:00<00:00, 31.7MB/s]


cifar10 needs download:
./data/images_classic/cifar10


Downloading...
From (uriginal): https://drive.google.com/uc?id=1Co32RiiWe16lTaiOU6JMMnyUYS41IlO1
From (redirected): https://drive.google.com/uc?id=1Co32RiiWe16lTaiOU6JMMnyUYS41IlO1&confirm=t&uuid=d6e9ec36-4b9f-4321-a5f3-be224073d670
To: /content/data/images_classic/cifar10/cifar10.zip
100%|██████████| 143M/143M [00:00<00:00, 168MB/s]


cifar100 needs download:
./data/images_classic/cifar100


Downloading...
From (uriginal): https://drive.google.com/uc?id=1PGKheHUsf29leJPPGuXqzLBMwl8qMF8_
From (redirected): https://drive.google.com/uc?id=1PGKheHUsf29leJPPGuXqzLBMwl8qMF8_&confirm=t&uuid=e4d71dfe-96d8-4f0f-acdc-350b19bfa39c
To: /content/data/images_classic/cifar100/cifar100.zip
100%|██████████| 141M/141M [00:00<00:00, 189MB/s]


tin needs download:
./data/images_classic/tin


Downloading...
From (uriginal): https://drive.google.com/uc?id=1PZ-ixyx52U989IKsMA2OT-24fToTrelC
From (redirected): https://drive.google.com/uc?id=1PZ-ixyx52U989IKsMA2OT-24fToTrelC&confirm=t&uuid=be715787-f2a4-4dc2-a643-384593094ac1
To: /content/data/images_classic/tin/tin.zip
100%|██████████| 237M/237M [00:01<00:00, 203MB/s]


mnist needs download:
./data/images_classic/mnist


Downloading...
From (uriginal): https://drive.google.com/uc?id=1CCHAGWqA1KJTFFswuF9cbhmB-j98Y1Sb
From (redirected): https://drive.google.com/uc?id=1CCHAGWqA1KJTFFswuF9cbhmB-j98Y1Sb&confirm=t&uuid=98cad1e0-d628-492b-9b6c-32bb15c7fa37
To: /content/data/images_classic/mnist/mnist.zip
100%|██████████| 47.2M/47.2M [00:00<00:00, 120MB/s]


svhn needs download:
./data/images_classic/svhn


Downloading...
From: https://drive.google.com/uc?id=1DQfc11HOtB1nEwqS4pWUFp8vtQ3DczvI
To: /content/data/images_classic/svhn/svhn.zip
100%|██████████| 19.0M/19.0M [00:00<00:00, 136MB/s] 


texture needs download:
./data/images_classic/texture


Downloading...
From (uriginal): https://drive.google.com/uc?id=1OSz1m3hHfVWbRdmMwKbUzoU8Hg9UKcam
From (redirected): https://drive.google.com/uc?id=1OSz1m3hHfVWbRdmMwKbUzoU8Hg9UKcam&confirm=t&uuid=eac96e88-2402-41f0-83d7-0e216df6ded1
To: /content/data/images_classic/texture/texture.zip
100%|██████████| 626M/626M [00:08<00:00, 75.5MB/s]


places365 needs download:
./data/images_classic/places365


Downloading...
From (uriginal): https://drive.google.com/uc?id=1Ec-LRSTf6u5vEctKX9vRp9OA6tqnJ0Ay
From (redirected): https://drive.google.com/uc?id=1Ec-LRSTf6u5vEctKX9vRp9OA6tqnJ0Ay&confirm=t&uuid=9d7baa72-86de-4148-9108-1a30c404533f
To: /content/data/images_classic/places365/places365.zip
100%|██████████| 497M/497M [00:07<00:00, 69.5MB/s]
Setup: 100%|██████████| 5/5 [00:07<00:00,  1.60s/it]


Starting automatic parameter search...
Threshold at percentile 85 over id data is: 0.356078790128231


100%|██████████| 5/5 [00:01<00:00,  4.75it/s]
100%|██████████| 5/5 [00:01<00:00,  3.20it/s]


Hyperparam: [85], auroc: 0.8205529999999999
Threshold at percentile 90 over id data is: 0.45225103199481975


100%|██████████| 5/5 [00:01<00:00,  3.86it/s]
100%|██████████| 5/5 [00:01<00:00,  2.90it/s]


Hyperparam: [90], auroc: 0.839714
Threshold at percentile 95 over id data is: 0.621114119887352


100%|██████████| 5/5 [00:01<00:00,  3.01it/s]
100%|██████████| 5/5 [00:01<00:00,  3.96it/s]


Hyperparam: [95], auroc: 0.860705
Threshold at percentile 99 over id data is: 1.0516026592254641


100%|██████████| 5/5 [00:01<00:00,  4.86it/s]
100%|██████████| 5/5 [00:01<00:00,  4.43it/s]

Hyperparam: [99], auroc: 0.8794759999999999
Threshold at percentile 99 over id data is: 1.0516026592254641
Final hyperparam: 99
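
To make the search log above easier to read, here is a minimal sketch of the two ideas it reflects: ReAct caps activations at a percentile of the ID activations, and the search keeps the candidate percentile with the highest validation AUROC. The helper names and the toy activations are assumptions for illustration. Only the AUROC values are copied from the log, and this is not OpenOOD's actual implementation.

```python
import math


def percentile(values, q):
    """q-th percentile with linear interpolation between neighbouring values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def clip_activations(activations, threshold):
    """ReAct-style rectification: cap every activation at the threshold."""
    return [min(a, threshold) for a in activations]


# Toy ID activations (made up for illustration, not taken from the model).
id_activations = [0.0, 1.0, 2.0, 3.0, 4.0]
threshold = percentile(id_activations, 75)
clipped = clip_activations([0.5, 3.5, 10.0], threshold)

# AUROC per candidate percentile, copied from the search log above.
search_log = {85: 0.8205529999999999, 90: 0.839714, 95: 0.860705, 99: 0.8794759999999999}
best_percentile = max(search_log, key=search_log.get)
print(threshold, clipped, best_percentile)
```

With the logged values, the selection step picks 99, which matches the `Final hyperparam` line.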





3. Evaluate

In [None]:
# let's do standard OOD detection
# full-spectrum detection is also available by
# setting `fsood` to True if you are evaluating on ImageNet

# the returned metrics are a dataframe that includes
# AUROC, AUPR, FPR@95, etc.
metrics = evaluator.eval_ood(fsood=False)

Performing inference on cifar10 test set...


100%|██████████| 45/45 [00:10<00:00,  4.46it/s]

Processing near ood...
Performing inference on cifar100 dataset...



100%|██████████| 45/45 [00:10<00:00,  4.29it/s]

Computing metrics on cifar100 dataset...
FPR@95: 75.51, AUROC: 85.24 AUPR_IN: 85.83, AUPR_OUT: 80.67





──────────────────────────────────────────────────────────────────────

Performing inference on tin dataset...


100%|██████████| 39/39 [00:10<00:00,  3.70it/s]

Computing metrics on tin dataset...
FPR@95: 67.63, AUROC: 87.70 AUPR_IN: 86.93, AUPR_OUT: 85.16
──────────────────────────────────────────────────────────────────────

Computing mean metrics...
FPR@95: 71.57, AUROC: 86.47 AUPR_IN: 86.38, AUPR_OUT: 82.91
──────────────────────────────────────────────────────────────────────

Processing far ood...
Performing inference on mnist dataset...



100%|██████████| 350/350 [01:18<00:00,  4.47it/s]

Computing metrics on mnist dataset...
FPR@95: 18.42, AUROC: 95.38 AUPR_IN: 99.31, AUPR_OUT: 75.87
──────────────────────────────────────────────────────────────────────

Performing inference on svhn dataset...



100%|██████████| 131/131 [00:27<00:00,  4.77it/s]

Computing metrics on svhn dataset...
FPR@95: 44.10, AUROC: 90.01 AUPR_IN: 95.46, AUPR_OUT: 75.58
──────────────────────────────────────────────────────────────────────

Performing inference on texture dataset...



100%|██████████| 29/29 [00:29<00:00,  1.02s/it]

Computing metrics on texture dataset...
FPR@95: 67.37, AUROC: 87.27 AUPR_IN: 82.30, AUPR_OUT: 88.35
──────────────────────────────────────────────────────────────────────






Performing inference on places365 dataset...


100%|██████████| 176/176 [01:11<00:00,  2.46it/s]

Computing metrics on places365 dataset...
FPR@95: 39.76, AUROC: 91.40 AUPR_IN: 97.39, AUPR_OUT: 71.91
──────────────────────────────────────────────────────────────────────

Computing mean metrics...
FPR@95: 42.41, AUROC: 91.02 AUPR_IN: 93.61, AUPR_OUT: 77.93
──────────────────────────────────────────────────────────────────────




ID Acc Eval: 100%|██████████| 45/45 [00:08<00:00,  5.36it/s]

           FPR@95  AUROC  AUPR_IN  AUPR_OUT   ACC
cifar100    75.51  85.24    85.83     80.67 95.22
tin         67.63  87.70    86.93     85.16 95.22
nearood     71.57  86.47    86.38     82.91 95.22
mnist       18.42  95.38    99.31     75.87 95.22
svhn        44.10  90.01    95.46     75.58 95.22
texture     67.37  87.27    82.30     88.35 95.22
places365   39.76  91.40    97.39     71.91 95.22
farood      42.41  91.02    93.61     77.93 95.22
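
For readers new to these metrics, the sketch below computes AUROC and FPR@95 by hand on a toy example. It assumes that a higher score means "more ID-like" and that ID samples are the positive class. The function names are hypothetical, and OpenOOD's own implementation may differ in details such as tie handling. The table above reports the same quantities as percentages.

```python
import math


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample (ties count 0.5)."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as ID at the threshold that keeps at least `tpr` of ID samples."""
    ranked = sorted(id_scores, reverse=True)
    k = math.ceil(tpr * len(ranked))  # number of ID samples that must be accepted
    threshold = ranked[k - 1]
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)


# Toy scores, made up for illustration.
toy_id = [0.9, 0.8, 0.7, 0.6]
toy_ood = [0.65, 0.3, 0.2, 0.1]
print(auroc(toy_id, toy_ood))       # 0.9375
print(fpr_at_tpr(toy_id, toy_ood))  # 0.25
```

Lower FPR@95 and higher AUROC both indicate better separation between ID and OOD scores.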





4. What you can get from this evaluator

In [None]:
# there is some useful information stored as attributes
# of the evaluator instance

# evaluator.metrics stores all the evaluation results
# evaluator.scores stores OOD scores and ID predictions

# for more details please see OpenOOD/openood/evaluation_api/evaluator.py

print('Components within evaluator.metrics:\t', evaluator.metrics.keys())
print('Components within evaluator.scores:\t', evaluator.scores.keys())
print('')
print('The predicted ID class of the first 5 samples of CIFAR-100:\t', evaluator.scores['ood']['near']['cifar100'][0][:5])
print('The OOD score of the first 5 samples of CIFAR-100:\t', evaluator.scores['ood']['near']['cifar100'][1][:5])

Components within evaluator.metrics:	 dict_keys(['id_acc', 'csid_acc', 'ood', 'fsood'])
Components within evaluator.scores:	 dict_keys(['id', 'csid', 'ood', 'id_preds', 'id_labels', 'csid_preds', 'csid_labels'])

The predicted ID class of the first 5 samples of CIFAR-100:	 [9 9 9 9 9]
The OOD score of the first 5 samples of CIFAR-100:	 [5.153 5.214 6.402 6.655 5.155]
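
Once you have the per-sample scores, a typical next step is to turn them into accept/reject decisions. The sketch below uses the five scores printed above and assumes that a higher score means "more ID-like". `split_by_threshold` is a hypothetical helper, and the cut-off of 6.0 is arbitrary. In practice you would pick the threshold from ID validation scores, for example the value that keeps 95% of ID samples.

```python
def split_by_threshold(scores, threshold):
    """Return indices kept as ID (score >= threshold) and indices flagged as OOD."""
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    flagged = [i for i, s in enumerate(scores) if s < threshold]
    return kept, flagged


# Scores printed above for the first 5 CIFAR-100 samples.
cifar100_scores = [5.153, 5.214, 6.402, 6.655, 5.155]
kept, flagged = split_by_threshold(cifar100_scores, threshold=6.0)  # arbitrary cut-off
print("kept as ID:", kept)
print("flagged as OOD:", flagged)
```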


5. Extending OpenOOD for your own research/development

We try to make OpenOOD extensible and convenient for everyone.


You can evaluate your own trained model as long as it provides the functions/methods that the postprocessors rely on (see OpenOOD/openood/networks/resnet18_32x32.py for an example).


You can also design your own postprocessor by inheriting the base class (OpenOOD/openood/postprocessors/base_postprocessor.py), and the resulting method can be readily evaluated with OpenOOD.
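
As a rough illustration of the inheritance pattern, the sketch below defines a max-logit postprocessor. So that it runs without OpenOOD or PyTorch installed, it uses a hypothetical stand-in base class and plain Python lists. The `postprocess(net, data)` method returning a predicted class and a score per sample is an assumption that mirrors the `(prediction, score)` pairs shown in section 4. Check base_postprocessor.py for the real interface before adapting it.

```python
class StandInBasePostprocessor:
    """Hypothetical stand-in for OpenOOD's BasePostprocessor (assumption, not the real class)."""

    def __init__(self, config=None):
        self.config = config

    def postprocess(self, net, data):
        raise NotImplementedError


class MaxLogitPostprocessor(StandInBasePostprocessor):
    """Scores each sample by its largest logit; higher means more ID-like."""

    def postprocess(self, net, data):
        logits = net(data)  # one list of class logits per sample
        preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
        confs = [max(row) for row in logits]
        return preds, confs


def toy_net(batch):
    """Toy 'classifier' that returns its inputs unchanged as logits."""
    return batch


preds, confs = MaxLogitPostprocessor().postprocess(
    toy_net, [[0.1, 2.5, -1.0], [0.3, 0.2, 0.1]]
)
print(preds, confs)
```

In a real setting the subclass would inherit from the base class named above and operate on tensors, and the resulting object would be passed to the evaluator's `postprocessor` argument.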


Feel free to reach out to us if you have further suggestions on making OpenOOD more general and easier to use!