No description, website, or topics provided.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.

Classifier-agnostic saliency map extraction

Example of using classifier-agnostic saliency map extraction on ImageNet

This repository contains the code originally forked from the ImageNet training in PyTorch that is modified to present the performance of classifier-agnostic saliency map extraction, a practical algorithm to train a classifier-agnostic saliency mapping by simultaneously training a classifier and a saliency mapping. The method was proposed by Konrad Żołna, Krzysztof J. Geras and Kyunghyun Cho.

The authors would like to acknowledge the code review done by Jason Phang (zphang).

Software requirements

  • Python 3.6 and PyTorch 0.4 for training procedure (
  • The bs4 library to run that computes localization metrics described in the paper.
  • The opencv-python library to inpaint images which is needed to run and


ImageNet dataset should be stored in IMAGENET-PATH path and set up in the usual way (separate train and val folders with 1000 subfolders each). See this repo for detailed instructions how to download and set up the dataset.

ImageNet annotations should be in IMAGENET-ANN directory that contains 50000 files named ILSVRC2012_val_<id>.xml where <id> is the validation image id (for example ILSVRC2012_val_00050000.xml). It may be simply obtained by unzipping the official validation bounding box annotations archive to IMAGENET-ANN directory.

A directory CASMS-PATH is used to store all trained models.

How to run the code


The easiest way to train classifier-agnostic saliency mapping (CASM) is to run

  • python3 IMAGENET-PATH --casms-path CASMS-PATH --log-path LOG-PATH --reproduce L

where LOG-PATH is a directory for the log to be saved at. The --reproduce option sets all hyperparameters linked with a given thinning strategy to reproduce results from the paper (possible options F|L|FL|L100|L1000, see the paper for details).

For the comparison, one can try --reproduce F which results in training classifier-dependent saliency mapping (called Baseline in the paper).

Object localization (and basic statistics)

Once the saliency mappings are trained and stored in CASMS-PATH one can run

  • python3 IMAGENET-PATH --annotation-path IMAGENET-ANN --casms-path CASMS-PATH or
  • python3 IMAGENET-PATH --annotation-path IMAGENET-ANN --casms-path CASMS-PATH --log-path LOG-PATH --save-to-file

where LOG-PATH is again the directory for logs to be saved at (note that the directory may be a different than the one used to store training logs since the training logs are not used in the evaluation).

For each CASM in CASMS-PATH the basic statistics and localization metrics are computed and (if --log-path LOG-PATH --save-to-file is used) saved as a separate file in LOG-PATH.

Score your own architecture

The script uses two functions load_model and get_masks_and_check_predictions from It should be simple to reimplement them to score a new architecture. Since converting a Torch Tensor to a NumPy array and vice versa is a breeze, the get_masks_and_check_predictions is implemented in the way that inputs and outputs are NumPy arrays to make the adjustment procedure to a new model even simpler.

The function get_masks_and_check_predictions takes as an input a batch of images, corresponding ground truth targets and the dictionary describing the model (that is, an output of the function load_model which takes the path to the model as an argument). By default, the images are normalized which can be deactivated with --not-normalize flag if one prefers to use its own normalization. The output of get_masks_and_check_predictions are three NumPy arrays.

  • Batch of continuous masks (size: BATCH_SIZEx224x224). This array contains masks predicted for the entire 224x224 pixel images (values between zero and one).
  • Batch of rectangular predictions (size: BATCH_SIZEx224x224) corresponding to object localizations, that are necessary to compute the scores. Each rectangle is obtained from continuous mask and is represented by a block of ones on the background of zeros.
  • NumPy array consisting of BATCH_SIZE binary values. The value in the array corresponding to a given images is one if the classifier makes a correct prediction for this image or zero otherwise. This array is necessary to compute the OM metric (see the paper for details).

Classification by multiple classifiers

Similarly to one can run

  • python3 IMAGENET-PATH --casms-path CASMS-PATH or
  • python3 IMAGENET-PATH --casms-path CASMS-PATH --log-path LOG-PATH --save-to-file

to get classification accuracy for modified images (masked-in, masked-out and inpainted masked-out images, see the paper for definitions).

The option --resnets-path RESNETS-PATH can be used to load pre-trained classifiers from the RESNETS-PATH that will be also evaluated. It is assumed that these classifiers are all ResNet-50 and are saved in the format like in this official repository.


To plot visualizations one can run

  • python3 IMAGENET-PATH --casm-path SINGLE-CASM-PATH --plots-path PLOTS-PATH

where SINGLE-CASM-PATH is a path to a single CASM (not a directory) and PLOTS-PATH is a directory where visualizations for a given CASM will be saved.

The exemplary visualization is below (see the caption of Figure 2 from the paper for the description).


Additional options

There are a few hyper-parameters of the training procedure. Running

  • python3 --help

displays the full list of them. The same works for, and


If you found this code useful, please cite the following paper:

Konrad Żołna, Krzysztof J. Geras, Kyunghyun Cho. "Classifier-agnostic saliency map extraction." arXiv preprint arXiv:1805.08249 (2018).

  title={Classifier-agnostic saliency map extraction},
  author={Zolna, Konrad and Geras, Krzysztof J and Cho, Kyunghyun},
  journal={arXiv preprint arXiv:1805.08249,