
🎉 Update (Aug 2021):

Would you like to test how well your own model performs on challenging generalisation tests, and whether it might even match or outperform human observers? This has never been easier! The comprehensive toolbox at bethgelab:model-vs-human supports all datasets reported here and comes with code to evaluate arbitrary PyTorch / TensorFlow models. Simply load your favourite models, hit run and get a full PDF report on generalisation behaviour including ready-to-use figures!

Data and materials from "Comparing deep neural networks against humans: object recognition when the signal gets weaker"


This repository contains information, data and materials from the paper "Comparing deep neural networks against humans: object recognition when the signal gets weaker" by Robert Geirhos, David H. J. Janssen, Heiko H. Schütt, Jonas Rauber, Matthias Bethge, and Felix A. Wichmann.

The article is available at

Please don't hesitate to contact me at or open an issue in case there is any question!

This README is structured according to the repo's structure: one section per subdirectory (alphabetically).


Contains a .txt file with a mapping from all 16 employed entry-level MS COCO categories to the corresponding (fine-grained) ImageNet classes. Further information is provided in the file itself.


This subdirectory contains all image manipulation code used in our experiments (conversion to grayscale, adding noise, eidolon distortions, ...). The main method of walks you through the various degradations. Note that the eidolon manipulation used in one of our experiments is based on the Eidolon GitHub repository, which you will need to download or clone if you would like to use it. We found and fixed a bug in the Python version of the toolbox and created a pull request for it in August 2016 (Fixed bug in partial coherence #1), which had not been merged as of June 2017. Make sure to collect the files from the pull request as well; otherwise you will get different images!


The data-analysis/ subdirectory contains a main R script, data-analysis.R, which can be used to plot and analyze the data contained in raw-data/. We used R version 3.2.3 for the data analysis.


We preprocessed images from the ILSVRC2012 training database as described in the paper (e.g. we excluded grayscale images). In total we retained 213,555 images. The images/ directory contains a .txt file with the names of the retained images. If you would like to obtain the images, check out the ImageNet website. In every experiment, the number of presented images was exactly the same for every entry-level MS COCO category (e.g. dog, car, boat, ...).



Contains the main MATLAB experiment, object_recognition_experiment.m, as well as a .yaml file for every experiment. In the .yaml file, the specific parameter values used in an experiment are specified (such as the stimulus presentation duration). Some functions depend on our in-house iShow library which can be obtained from here.


Some of the helper functions are based on other people's code; please check the corresponding files for the copyright notices.


The response screen icons appeared on the response screen, and participants were instructed to click on the corresponding one. The icons were taken from the MS COCO website.



The raw-accuracies/ directory contains a .txt file for each experiment with a table of all accuracies (split by experimental condition and subject/network). This therefore contains the underlying data used for all accuracy plots in the paper, and may be useful, for example, if one would like to generate new plots for comparing other networks to our human observers' accuracies. Note that all accuracies reported in these files are percentages.
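As a rough illustration of how per-condition accuracies in percent relate to trial-level data, the sketch below aggregates rows that use the raw-data column names `subj`, `condition`, `category` and `object_response`. The function name `accuracy_table`, the made-up example rows (including the network name `some-network` and the placeholder image names), the assumed column order, and the choice to count 'na' responses as incorrect are all assumptions for illustration; this is not necessarily how the files in raw-accuracies/ were produced.

```python
import csv
import io
from collections import defaultdict


def accuracy_table(rows):
    """Return {(subj, condition): accuracy in percent} for an iterable of dict rows."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = (row["subj"], row["condition"])
        total[key] += 1
        # Assumption: a missing answer ('na') never equals a category name,
        # so it is counted as an incorrect trial.
        if row["object_response"] == row["category"]:
            correct[key] += 1
    return {key: 100.0 * correct[key] / total[key] for key in total}


# Made-up rows in the raw-data column layout, for demonstration only.
EXAMPLE_CSV = """subj,session,trial,rt,object_response,category,condition,imagename
subject-01,1,1,0.52,dog,dog,cr,placeholder_a.JPEG
subject-01,1,2,0.61,cat,dog,cr,placeholder_b.JPEG
subject-01,1,3,0.70,na,car,bw,placeholder_c.JPEG
subject-01,1,4,0.48,car,car,bw,placeholder_d.JPEG
some-network,1,1,NaN,knife,knife,cr,placeholder_e.JPEG
"""

table = accuracy_table(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
for (subj, condition), accuracy in sorted(table.items()):
    print(f"{subj}\t{condition}\t{accuracy:.1f}")
```

To run this on an actual file, the `io.StringIO` object could be replaced by an open file handle for one of the .csv files in raw-data/.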


This directory contains the raw data for all experiments reported in the paper, including a total of 39,680 human trials collected in a controlled lab setting. Every .csv raw data file has a header with the column names listed below; here is what they stand for:

  • subj: for DNNs (Deep Neural Networks), name of network; for human observers: number of subject. This number is consistent across experiments. Note that the subjects were not necessarily given consecutive numbers, therefore it can be the case that e.g. 'subject-04' does not exist in some or all experiments.

  • session: session number

  • trial: trial number

  • rt: reaction time in seconds, or 'NaN' for DNNs

  • object_response: the response given, or 'na' (no answer) if human subjects failed to respond

  • category: the presented category

  • condition: short indicator of the condition of the presented stimulus. Color-experiment: 'cr' for color, 'bw' for grayscale images; contrast-experiment: 'c100', 'c50', ... 'c01' for 100%, 50%, ... 1% nominal contrast; noise-experiment: '0', '0.03', ... '0.9' for noise width; eidolon-experiment: in the form 'a-b-c', indicating:

    • a is the parameter value for 'reach', in {1, 2, 4, 8, ..., 128}
    • b in {0,3,10} for coherence value of 0.0, 0.3, or 1.0
    • c = 10 for grain value of 10.0 (not varied in this experiment)
  • imagename:

e.g. 3841_eid_dnn_1-0-10_knife_10_n03041632_32377.JPEG

This is a concatenation of the following information (separated by '_'):

  1. a four-digit number starting with 0000 for the first image in an experiment; the last image therefore has the number n-1 if n is the number of images in a certain experiment
  2. short code for experiment name, e.g. 'eid' for eidolon-experiment
  3. either e.g. 's01' for 'subject-01', or 'dnn' for DNNs
  4. condition
  5. category (ground truth)
  6. a number (just ignore it)
  7. image identifier in the form a_b.JPEG, with a being the WNID (WordNet ID) of the corresponding synset and b being an integer.
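The naming scheme above can be unpacked programmatically. The following is a minimal sketch, assuming that none of the fields themselves contain an underscore (which holds for the example name shown above); the names `ImageName`, `parse_imagename` and `decode_eidolon_condition` are hypothetical helpers and not part of this repository. The eidolon decoding assumes that b encodes the coherence value multiplied by ten, which matches the three listed values (0, 3, 10 for 0.0, 0.3, 1.0).

```python
from typing import NamedTuple


class ImageName(NamedTuple):
    index: int        # running image number within the experiment
    experiment: str   # short experiment code, e.g. 'eid'
    observer: str     # e.g. 's01' for a human subject, 'dnn' for networks
    condition: str    # condition string, e.g. '1-0-10'
    category: str     # ground-truth category
    ignored: str      # the number that can be ignored
    wnid: str         # WordNet ID of the synset
    image_id: str     # image identifier in the form a_b.JPEG


def parse_imagename(name: str) -> ImageName:
    """Split an imagename into its '_'-separated components."""
    stem, dot, extension = name.rpartition(".")
    if not dot:
        raise ValueError(f"missing file extension: {name!r}")
    parts = stem.split("_")
    if len(parts) != 8:
        raise ValueError(f"expected 8 '_'-separated fields, got {len(parts)}: {name!r}")
    index, experiment, observer, condition, category, ignored, wnid, number = parts
    image_id = f"{wnid}_{number}.{extension}"
    return ImageName(int(index), experiment, observer, condition,
                     category, ignored, wnid, image_id)


def decode_eidolon_condition(condition: str) -> dict:
    """Decode an eidolon condition 'a-b-c' into reach, coherence and grain."""
    reach, coherence, grain = (int(value) for value in condition.split("-"))
    # Assumption: b is the coherence value times ten (0, 3, 10 -> 0.0, 0.3, 1.0).
    return {"reach": reach, "coherence": coherence / 10, "grain": float(grain)}


parsed = parse_imagename("3841_eid_dnn_1-0-10_knife_10_n03041632_32377.JPEG")
print(parsed)
print(decode_eidolon_condition(parsed.condition))
```

Raising a `ValueError` on an unexpected number of fields is a deliberate choice here: it makes a name that deviates from the scheme fail loudly rather than silently shifting fields.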


Data and materials from the paper "Comparing deep neural networks against humans: object recognition when the signal gets weaker" (arXiv 2017)







