<a href="https://colab.research.google.com/github/mwroffo/OpenOOD/blob/main/openood_evaluator_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows how to use OpenOOD's unified evaluator to measure OOD detection performance with respect to a given ID dataset (e.g., CIFAR-10, ImageNet-1K), given a trained base classifier (e.g., ResNet) and a postprocessor (e.g., MSP, ReAct). We use CIFAR-10 here because it runs quickly, but the same steps apply to larger datasets, including ImageNet-1K. **Remember to select a GPU runtime under Edit / Notebook settings.**

This notebook expects Python 3.11.11.

In [6]:
!python --version

Python 3.11.11


1. Install OpenOOD with pip and make the necessary preparations

In [2]:
!pip install git+https://github.com/mwroffo/OpenOOD

Collecting git+https://github.com/mwroffo/OpenOOD
  Cloning https://github.com/mwroffo/OpenOOD to /tmp/pip-req-build-2k4uko_y
  Running command git clone --filter=blob:none --quiet https://github.com/mwroffo/OpenOOD /tmp/pip-req-build-2k4uko_y
  Resolved https://github.com/mwroffo/OpenOOD to commit b4b2fe9d4407c3679902e6f27707160ca4ac766c
  Preparing metadata (setup.py) ... done
Collecting json5 (from openood==1.5)
  Using cached json5-0.12.0-py3-none-any.whl.metadata (36 kB)
Collecting pre-commit (from openood==1.5)
  Using cached pre_commit-4.2.0-py2.py3-none-any.whl.metadata (1.3 kB)
Collecting imgaug>=0.4.0 (from openood==1.5)
  Using cached imgaug-0.4.0-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting diffdist>=0.1 (from openood==1.5)
  Using cached diffdist-0.1.tar.gz (4.6 kB)
  Preparing metadata (setup.py) ... done
Collecting faiss-cpu>=1.7.2 (from openood==1.5)
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Colle

In [3]:
!python3 -m pip install libmr

Collecting libmr
  Downloading libmr-0.1.9.zip (39 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: libmr
  Building wheel for libmr (setup.py) ... done
  Created wheel for libmr: filename=libmr-0.1.9-cp311-cp311-linux_x86_64.whl size=576576 sha256=914a3757efd4edaad5cc394f5245cbd23624d5b23c2cdc3eb276897a44580a49
  Stored in directory: /root/.cache/pip/wheels/08/f2/e3/4ca7c4be959762da3d5e817e45880d6febaa557877c56ee83c
Successfully built libmr
Installing collected packages: libmr
Successfully installed libmr-0.1.9


In [4]:
# necessary imports
import torch

from openood.evaluation_api import Evaluator
from openood.networks import ResNet18_32x32 # just a wrapper around the ResNet

In [5]:
# download our pre-trained CIFAR-10 classifier
!gdown 1byGeYxM_PlLjT72wZsMQvP6popJeWBgt
!unzip cifar10_res18_v1.5.zip

Downloading...
From (original): https://drive.google.com/uc?id=1byGeYxM_PlLjT72wZsMQvP6popJeWBgt
From (redirected): https://drive.google.com/uc?id=1byGeYxM_PlLjT72wZsMQvP6popJeWBgt&confirm=t&uuid=3d548224-d625-43b3-ac23-da6fcc741f05
To: /content/cifar10_res18_v1.5.zip
100% 375M/375M [00:02<00:00, 144MB/s]
Archive:  cifar10_res18_v1.5.zip
replace cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/best_epoch99_acc0.9450.ckpt? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/config.yml? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/best.ckpt? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/last_epoch100_acc0.9420.ckpt? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace cifar10_resnet18_32x32_base_e100_lr0.1_default/s2/log.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
replace cifar10_resnet18_32x32_base_e100_lr0.1_default/s1/config.yml? [y]es, [n

In [7]:
# load the model
net = ResNet18_32x32(num_classes=10)
net.load_state_dict(
    torch.load('./cifar10_resnet18_32x32_base_e100_lr0.1_default/s0/best.ckpt')
)
net.cuda()
net.eval();

In [8]:
#@title choose an implemented postprocessor
postprocessor_name = "react" #@param ["openmax", "msp", "temp_scaling", "odin", "mds", "mds_ensemble", "rmds", "gram", "ebo", "gradnorm", "react", "mls", "klm", "vim", "knn", "dice", "rankfeat", "ash", "she"] {allow-input: true}
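For intuition about what a postprocessor computes, here is a minimal, framework-free sketch of the idea behind `msp` (maximum softmax probability): the confidence score is the largest softmax probability, and the prediction is its index. The function name `msp_score` and the example logits are made up for illustration; OpenOOD's own postprocessors operate on batched PyTorch tensors.

```python
import math


def msp_score(logits):
    """Return (predicted_class, confidence) for one sample's logits.

    Hypothetical helper for illustration only, not part of OpenOOD.
    """
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the index of the largest probability.
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, probs[pred]


# A peaked logit vector gives a confidence close to 1 (looks in-distribution);
# a flat one gives a confidence of 1 / num_classes (looks more like OOD).
peaked = [8.0, 0.5, 0.1, 0.0]
flat = [1.0, 1.0, 1.0, 1.0]
print(msp_score(peaked))
print(msp_score(flat))
```

Other postprocessors in the list differ mainly in how they turn the network's outputs or features into this single confidence value.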

2. Initialize an evaluator instance

In [9]:
# Notes:
# 1) The evaluator will automatically download the required datasets given the
# ID dataset specified by `id_name`

# 2) Passing `postprocessor_name` selects one of the implemented postprocessors. To
# use your own postprocessor, make sure that it inherits from the BasePostprocessor
# class (see openood/postprocessors/base_postprocessor.py) and pass an instance to the
# `postprocessor` argument.

# 3) `config_root` points to the directory with OpenOOD's configurations for the
# postprocessors. By default the evaluator will look for the configs that come
# with the OpenOOD module. If you want to use custom configs, clone the repo locally
# and make modifications to OpenOOD/configs.

# 4) As you will see when executing this cell, during the initialization the evaluator
# will automatically run hyperparameter search on ID/OOD validation data (if applicable).
# If you want to use a postprocessor with specific hyperparams, you need
# to clone the OpenOOD repo (or just download the configs folder in the repo).
# Then a) specify the hyperparams and b) set APS_mode to False in the respective postprocessor
# config.

evaluator = Evaluator(
    net,
    id_name='cifar10',                     # the target ID dataset
    data_root='./data',                    # change if necessary
    config_root=None,                      # see notes above
    preprocessor=None,                     # default preprocessing for the target ID dataset
    postprocessor_name=postprocessor_name, # the postprocessor to use
    postprocessor=None,                    # if you want to use your own postprocessor
    batch_size=200,                        # for certain methods the results can be slightly affected by batch size
    shuffle=False,
    num_workers=2)                         # could use more num_workers outside colab

Downloading...
From (original): https://drive.google.com/uc?id=1lI1j0_fDDvjIt9JlWAw09X8ks-yrR_H1
From (redirected): https://drive.google.com/uc?id=1lI1j0_fDDvjIt9JlWAw09X8ks-yrR_H1&confirm=t&uuid=1d4ef9d2-de3d-4939-acbf-3b021fc9a3d9
To: /content/data/benchmark_imglist.zip
100%|██████████| 28.0M/28.0M [00:00<00:00, 47.4MB/s]


cifar10 needs download:
./data/images_classic/cifar10


Downloading...
From (original): https://drive.google.com/uc?id=1Co32RiiWe16lTaiOU6JMMnyUYS41IlO1
From (redirected): https://drive.google.com/uc?id=1Co32RiiWe16lTaiOU6JMMnyUYS41IlO1&confirm=t&uuid=ebd192f5-0ea9-4d50-ba1b-fb2a6490d783
To: /content/data/images_classic/cifar10/cifar10.zip
100%|██████████| 143M/143M [00:01<00:00, 102MB/s]


cifar100 needs download:
./data/images_classic/cifar100


Downloading...
From (original): https://drive.google.com/uc?id=1PGKheHUsf29leJPPGuXqzLBMwl8qMF8_
From (redirected): https://drive.google.com/uc?id=1PGKheHUsf29leJPPGuXqzLBMwl8qMF8_&confirm=t&uuid=2ccf54a9-5dd7-429d-9259-c559c20003d7
To: /content/data/images_classic/cifar100/cifar100.zip
100%|██████████| 141M/141M [00:03<00:00, 37.2MB/s]


tin needs download:
./data/images_classic/tin


Downloading...
From (original): https://drive.google.com/uc?id=1PZ-ixyx52U989IKsMA2OT-24fToTrelC
From (redirected): https://drive.google.com/uc?id=1PZ-ixyx52U989IKsMA2OT-24fToTrelC&confirm=t&uuid=7b250584-963f-44d5-b140-458c97e3c7d8
To: /content/data/images_classic/tin/tin.zip
100%|██████████| 237M/237M [00:04<00:00, 47.7MB/s]


mnist needs download:
./data/images_classic/mnist


Downloading...
From (original): https://drive.google.com/uc?id=1CCHAGWqA1KJTFFswuF9cbhmB-j98Y1Sb
From (redirected): https://drive.google.com/uc?id=1CCHAGWqA1KJTFFswuF9cbhmB-j98Y1Sb&confirm=t&uuid=d906fe56-aba1-49bd-baf4-0549ceb57410
To: /content/data/images_classic/mnist/mnist.zip
100%|██████████| 47.2M/47.2M [00:00<00:00, 103MB/s] 


svhn needs download:
./data/images_classic/svhn


Downloading...
From (original): https://drive.google.com/uc?id=1DQfc11HOtB1nEwqS4pWUFp8vtQ3DczvI
From (redirected): https://drive.google.com/uc?id=1DQfc11HOtB1nEwqS4pWUFp8vtQ3DczvI&confirm=t&uuid=3e8b8180-ab66-46d0-a7f4-b61d6c7b6757
To: /content/data/images_classic/svhn/svhn.zip
100%|██████████| 19.0M/19.0M [00:00<00:00, 27.3MB/s]


texture needs download:
./data/images_classic/texture


Downloading...
From (original): https://drive.google.com/uc?id=1OSz1m3hHfVWbRdmMwKbUzoU8Hg9UKcam
From (redirected): https://drive.google.com/uc?id=1OSz1m3hHfVWbRdmMwKbUzoU8Hg9UKcam&confirm=t&uuid=02de9dc4-1eb6-4afe-86e5-10700f07532e
To: /content/data/images_classic/texture/texture.zip
100%|██████████| 626M/626M [00:03<00:00, 158MB/s]


places365 needs download:
./data/images_classic/places365


Downloading...
From (original): https://drive.google.com/uc?id=1Ec-LRSTf6u5vEctKX9vRp9OA6tqnJ0Ay
From (redirected): https://drive.google.com/uc?id=1Ec-LRSTf6u5vEctKX9vRp9OA6tqnJ0Ay&confirm=t&uuid=7f153739-bcaa-4051-bff9-3f6a3d582228
To: /content/data/images_classic/places365/places365.zip
100%|██████████| 497M/497M [00:08<00:00, 62.0MB/s]
Setup: 100%|██████████| 5/5 [00:01<00:00,  2.55it/s]


Starting automatic parameter search...
Threshold at percentile 85 over id data is: 0.356078790128231


100%|██████████| 5/5 [00:01<00:00,  3.87it/s]
100%|██████████| 5/5 [00:01<00:00,  4.13it/s]


Hyperparam: [85], auroc: 0.8205529999999999
Threshold at percentile 90 over id data is: 0.45225103199481975


100%|██████████| 5/5 [00:00<00:00,  5.91it/s]
100%|██████████| 5/5 [00:00<00:00,  5.18it/s]


Hyperparam: [90], auroc: 0.839714
Threshold at percentile 95 over id data is: 0.621114119887352


100%|██████████| 5/5 [00:00<00:00,  6.10it/s]
100%|██████████| 5/5 [00:00<00:00,  5.42it/s]


Hyperparam: [95], auroc: 0.860705
Threshold at percentile 99 over id data is: 1.0516026592254641


100%|██████████| 5/5 [00:00<00:00,  6.19it/s]
100%|██████████| 5/5 [00:00<00:00,  5.43it/s]

Hyperparam: [99], auroc: 0.8794759999999999
Threshold at percentile 99 over id data is: 1.0516026592254641
Final hyperparam: 99
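The search log above can be read as a simple grid search: each candidate percentile is scored by validation AUROC, and the best one is kept. The sketch below reproduces only that selection step, using the AUROC values printed in the log (rounded to six decimals). The helper name `pick_best_hyperparam` is hypothetical and is not the evaluator's actual code.

```python
def pick_best_hyperparam(candidates, evaluate):
    """Return the candidate with the highest validation score, plus all scores.

    Hypothetical helper for illustration only, not part of OpenOOD.
    """
    results = {c: evaluate(c) for c in candidates}
    best = max(results, key=results.get)
    return best, results


# Validation AUROC per ReAct percentile, copied (rounded) from the log above.
logged_auroc = {85: 0.820553, 90: 0.839714, 95: 0.860705, 99: 0.879476}

# Here `evaluate` is a dictionary lookup; in a real search it would run
# inference on the validation data with the candidate hyperparameter.
best, results = pick_best_hyperparam(logged_auroc.keys(), logged_auroc.get)
print(f"Final hyperparam: {best}")
```

Because the selection depends only on validation AUROC, a different validation set or candidate grid may lead to a different final value.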





3. Evaluate

In [10]:
# let's do standard OOD detection
# full-spectrum detection is also available with
# `fsood` being True if you are evaluating on ImageNet

# the returned metrics are a DataFrame that includes
# AUROC, AUPR, FPR@95, etc.
metrics = evaluator.eval_ood(fsood=False)

Performing inference on cifar10 test set...


100%|██████████| 45/45 [00:07<00:00,  6.23it/s]

Processing near ood...
Performing inference on cifar100 dataset...



100%|██████████| 45/45 [00:06<00:00,  7.42it/s]

Computing metrics on cifar100 dataset...
FPR@95: 75.51, AUROC: 85.24 AUPR_IN: 80.67, AUPR_OUT: 85.83
──────────────────────────────────────────────────────────────────────

Performing inference on tin dataset...



100%|██████████| 39/39 [00:07<00:00,  5.48it/s]

Computing metrics on tin dataset...
FPR@95: 67.63, AUROC: 87.70 AUPR_IN: 85.16, AUPR_OUT: 86.93
──────────────────────────────────────────────────────────────────────

Computing mean metrics...
FPR@95: 71.57, AUROC: 86.47 AUPR_IN: 82.91, AUPR_OUT: 86.38
──────────────────────────────────────────────────────────────────────

Processing far ood...
Performing inference on mnist dataset...



100%|██████████| 350/350 [00:51<00:00,  6.75it/s]

Computing metrics on mnist dataset...
FPR@95: 18.42, AUROC: 95.38 AUPR_IN: 75.87, AUPR_OUT: 99.31
──────────────────────────────────────────────────────────────────────

Performing inference on svhn dataset...



100%|██████████| 131/131 [00:17<00:00,  7.43it/s]

Computing metrics on svhn dataset...
FPR@95: 44.10, AUROC: 90.01 AUPR_IN: 75.58, AUPR_OUT: 95.46
──────────────────────────────────────────────────────────────────────

Performing inference on texture dataset...



100%|██████████| 29/29 [00:25<00:00,  1.15it/s]

Computing metrics on texture dataset...
FPR@95: 67.37, AUROC: 87.27 AUPR_IN: 88.35, AUPR_OUT: 82.30
──────────────────────────────────────────────────────────────────────

Performing inference on places365 dataset...



100%|██████████| 176/176 [00:58<00:00,  3.00it/s]

Computing metrics on places365 dataset...
FPR@95: 39.76, AUROC: 91.40 AUPR_IN: 71.91, AUPR_OUT: 97.39
──────────────────────────────────────────────────────────────────────

Computing mean metrics...
FPR@95: 42.41, AUROC: 91.02 AUPR_IN: 77.93, AUPR_OUT: 93.61
──────────────────────────────────────────────────────────────────────




ID Acc Eval: 100%|██████████| 45/45 [00:05<00:00,  7.72it/s]

           FPR@95  AUROC  AUPR_IN  AUPR_OUT   ACC
cifar100    75.51  85.24    80.67     85.83 95.22
tin         67.63  87.70    85.16     86.93 95.22
nearood     71.57  86.47    82.91     86.38 95.22
mnist       18.42  95.38    75.87     99.31 95.22
svhn        44.10  90.01    75.58     95.46 95.22
texture     67.37  87.27    88.35     82.30 95.22
places365   39.76  91.40    71.91     97.39 95.22
farood      42.41  91.02    77.93     93.61 95.22
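To make the columns in the table above more concrete, here is a small, dependency-free sketch of how AUROC and FPR@95 can be computed from raw confidence scores. It assumes that a higher score means "more ID-like" and treats ID as the positive class. OpenOOD's own metric code may follow a different convention (for example, which class counts as positive for FPR@95), so the toy numbers are illustrative only. The function names and the toy scores are made up.

```python
import math


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample scores above a random OOD sample."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5  # ties count as half
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted when at least `tpr` of ID samples are accepted."""
    ranked = sorted(id_scores, reverse=True)
    # Smallest number of top-ranked ID samples that reaches the target TPR
    # (the small epsilon guards against floating-point round-up).
    k = max(1, math.ceil(tpr * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    accepted_ood = sum(1 for s in ood_scores if s >= threshold)
    return accepted_ood / len(ood_scores)


# Toy confidence scores (made up): higher means "more ID-like".
id_scores = [9.1, 8.7, 8.5, 7.9, 7.2, 6.8, 6.5, 6.1, 5.9, 5.4]
ood_scores = [6.7, 5.2, 4.8, 4.1, 3.0]

print(f"AUROC:  {100 * auroc(id_scores, ood_scores):.2f}")
print(f"FPR@95: {100 * fpr_at_tpr(id_scores, ood_scores):.2f}")

# The aggregate rows are means over datasets, e.g. near-OOD FPR@95 in the table.
near_fpr = (75.51 + 67.63) / 2
print(f"near-OOD mean FPR@95: {near_fpr:.2f}")
```

The pairwise AUROC loop is quadratic in the number of samples, which is fine for a toy example but too slow for full test sets; library implementations sort the scores instead.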





4. What you can get from this evaluator

In [11]:
# there is some useful information stored as attributes
# of the evaluator instance

# evaluator.metrics stores all the evaluation results
# evaluator.scores stores OOD scores and ID predictions

# for more details please see OpenOOD/openood/evaluation_api/evaluator.py

print('Components within evaluator.metrics:\t', evaluator.metrics.keys())
print('Components within evaluator.scores:\t', evaluator.scores.keys())
print('')
print('The predicted ID class of the first 5 samples of CIFAR-100:\t', evaluator.scores['ood']['near']['cifar100'][0][:5])
print('The OOD score of the first 5 samples of CIFAR-100:\t', evaluator.scores['ood']['near']['cifar100'][1][:5])

Components within evaluator.metrics:	 dict_keys(['id_acc', 'csid_acc', 'ood', 'fsood'])
Components within evaluator.scores:	 dict_keys(['id', 'csid', 'ood', 'id_preds', 'id_labels', 'csid_preds', 'csid_labels'])

The predicted ID class of the first 5 samples of CIFAR-100:	 [9 9 9 9 9]
The OOD score of the first 5 samples of CIFAR-100:	 [5.153 5.214 6.402 6.655 5.155]


5. Extending OpenOOD for your own research/development

We try to make OpenOOD extensible and convenient for everyone.


You can evaluate your own trained model as long as it provides the functions/methods that the postprocessors rely on (see OpenOOD/openood/networks/resnet18_32x32.py for an example).


You can also design your own postprocessor by inheriting the base class (OpenOOD/openood/postprocessors/base_postprocessor.py), and the resulting method can be readily evaluated with OpenOOD.
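As a rough illustration of what such a postprocessor might look like, the sketch below scores each sample by its largest logit. It is written as a standalone class so that it runs without OpenOOD installed. To use it with the evaluator, it would need to inherit from `BasePostprocessor`, as noted earlier. The method names `setup` and `postprocess`, their arguments, and the `(pred, conf)` return order are assumptions about the base class interface, so check them against base_postprocessor.py before relying on them. The class name `MaxLogitPostprocessor` is hypothetical.

```python
import torch


class MaxLogitPostprocessor:  # with OpenOOD, inherit from BasePostprocessor instead
    """Scores each sample by its largest logit (hypothetical example class)."""

    def setup(self, net, id_loader_dict, ood_loader_dict):
        # Nothing to fit for this method. A method that needs ID statistics
        # (e.g. class means) would compute them here. Signature is an assumption.
        pass

    @torch.no_grad()
    def postprocess(self, net, data):
        # Forward pass, then take the max logit as the confidence score
        # and its index as the predicted ID class.
        logits = net(data)
        conf, pred = torch.max(logits, dim=1)
        return pred, conf


# Smoke test with a stand-in network (a single linear layer, not a real classifier).
net = torch.nn.Linear(4, 3).eval()
pred, conf = MaxLogitPostprocessor().postprocess(net, torch.randn(5, 4))
print(pred.shape, conf.shape)
```

Returning the prediction first and the confidence second mirrors the order in which `evaluator.scores` exposes them in the previous section.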


Feel free to reach out to us if you have further suggestions for making OpenOOD more general and easier to use!