
SVHN-Remix Dataset for our NeurIPS 2023 DistShift Workshop Paper: "The SVHN Dataset Is Deceptive for Probabilistic Generative Models Due to a Distribution Mismatch"


jzenn/svhn-remix


The SVHN Dataset Is Deceptive for Probabilistic Generative Models Due to a Distribution Mismatch

arxiv-link Project Page Download

Tim Z. Xiao*·Johannes Zenn*·Robert Bamler
*Equal contribution, order determined by coin flip.

About The Project

This is the official GitHub repository for our NeurIPS 2023 DistShift Workshop paper, "The SVHN Dataset Is Deceptive for Probabilistic Generative Models Due to a Distribution Mismatch", in which we propose a new split for the SVHN dataset that does not suffer from distribution mismatch. Visit the project page to download the SVHN-Remix dataset or the split.

The Street View House Numbers (SVHN) dataset (Netzer et al., 2011) is a popular benchmark dataset in deep learning. Originally designed for digit classification, it has since been widely used as a benchmark for other tasks, including generative modeling. With this work, we aim to warn the community about an issue with SVHN as a benchmark for generative modeling: we discover that the official training set and test set of the SVHN dataset are not drawn from the same distribution. We empirically show that this distribution mismatch has little impact on the classification task (which may explain why the issue has not been detected before), but that it severely affects the evaluation of probabilistic generative models, such as Variational Autoencoders and diffusion models. As a workaround, we propose to mix and re-split the official training and test sets when SVHN is used for tasks other than classification. We publish a new split and the indices we used to create it at https://jzenn.github.io/svhn-remix/.
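The mix-and-re-split idea can be illustrated with a minimal sketch. It is not the code used to create SVHN-Remix, and it will not reproduce the published indices; use the split from the project page for comparable results. The function name `remix_split`, the seed, the choice to keep the original split sizes, and the small random arrays standing in for the real SVHN data are all assumptions made for illustration.

```python
import numpy as np


def remix_split(train_x, train_y, test_x, test_y, seed=0):
    """Pool two splits, shuffle them jointly, and cut them back to the original sizes.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    n_train = len(train_x)

    # Pool the official training and test data into one array each.
    all_x = np.concatenate([train_x, test_x], axis=0)
    all_y = np.concatenate([train_y, test_y], axis=0)

    # A single random permutation over the pooled data decides the new membership.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(all_x))
    new_train_idx, new_test_idx = perm[:n_train], perm[n_train:]

    new_train = (all_x[new_train_idx], all_y[new_train_idx])
    new_test = (all_x[new_test_idx], all_y[new_test_idx])
    return new_train, new_test, (new_train_idx, new_test_idx)


# Tiny random stand-in for SVHN (32x32 RGB images, 10 digit classes).
data_rng = np.random.default_rng(1)
train_x = data_rng.integers(0, 256, size=(30, 32, 32, 3), dtype=np.uint8)
train_y = data_rng.integers(0, 10, size=30)
test_x = data_rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
test_y = data_rng.integers(0, 10, size=10)

(new_train_x, new_train_y), (new_test_x, new_test_y), (train_idx, test_idx) = remix_split(
    train_x, train_y, test_x, test_y
)
print(new_train_x.shape, new_test_x.shape)
```

Returning the index arrays alongside the data mirrors the idea of publishing the indices: anyone holding the original files can rebuild the same split without redistributing the images.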

Citation:

If you would like to cite our paper, please use the following BibTeX entry:

@article{xiao2023the,
  title={The SVHN Dataset Is Deceptive for Probabilistic Generative Models Due to a Distribution Mismatch},
  author={Xiao, Tim Z. and Zenn, Johannes and Bamler, Robert},
  journal={NeurIPS 2023 Workshop on Distribution Shifts},
  year={2023}
}

