Simple MNIST data parser written in Python



Simple MNIST and EMNIST data parser written in pure Python.

MNIST is a database of handwritten digits. EMNIST is an extended MNIST database.


  • Python 2 or Python 3


  • git clone

  • cd python-mnist

  • Get MNIST data:

  • Check preview with:

    PYTHONPATH=. ./bin/mnist_preview


Get the package from PyPI:

pip install python-mnist

or install with setup.py:

python setup.py install

Code sample:

from mnist import MNIST
mndata = MNIST('./dir_with_mnist_data_files')
images, labels = mndata.load_training()
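
To get a quick look at what was loaded, an image can be rendered as text. The following is a minimal sketch, assuming each image is a flat sequence of pixel intensities in the range 0-255 stored row by row (28 pixels per row for MNIST). The name render_ascii and its width and threshold parameters are hypothetical and used only for illustration; they are not part of the package.

```python
def render_ascii(pixels, width=28, threshold=128):
    """Render a flat, row-major sequence of pixel values as text.

    Pixels at or above `threshold` are drawn as '@', the rest as '.'.
    """
    if width <= 0 or len(pixels) % width != 0:
        raise ValueError("pixel count must be a multiple of width")
    rows = []
    for start in range(0, len(pixels), width):
        row = pixels[start:start + width]
        rows.append("".join("@" if p >= threshold else "." for p in row))
    return "\n".join(rows)
```

Under that assumption, print(render_ascii(images[0])) would show the first training image.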

To enable loading of gzipped files, use:

mndata.gz = True
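
The idea behind such a flag can be sketched with the standard library alone. This is an illustrative sketch and not the package's actual code; open_data_file is a hypothetical helper.

```python
import gzip


def open_data_file(path, gz=False):
    """Open a data file for binary reading.

    When `gz` is true the file is read through gzip, so the caller
    sees the decompressed bytes either way.
    """
    if gz:
        return gzip.open(path, "rb")
    return open(path, "rb")
```

The caller can then parse the returned file object the same way whether or not the file on disk is compressed.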


  • Get EMNIST data:

  • Check preview with:

    PYTHONPATH=. ./bin/emnist_preview

To use EMNIST datasets you need to call:


Where 'digits' is the name of one of the available EMNIST datasets. You can choose from:

  • balanced
  • byclass
  • bymerge
  • digits
  • letters
  • mnist

The EMNIST loader uses gzipped files by default. This can be disabled by setting:

mndata.gz = False

You also need to unpack the EMNIST files yourself, as the script won't do it for you. The EMNIST loader also needs to mirror and rotate each image, so it is a bit slower. If this is an issue for you, repack the data to avoid mirroring and rotating on each load.
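
For illustration, mirroring a square image left to right and then rotating it 90 degrees counter-clockwise is the same as transposing it. The sketch below transposes a flat, row-major pixel list. transpose_flat is a hypothetical helper, and the exact mirror and rotation directions the package applies are an assumption here.

```python
def transpose_flat(pixels, width, height):
    """Transpose a flat, row-major image.

    The result is again flat and row-major, with `height` pixels per row
    and `width` rows.
    """
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    return [pixels[r * width + c]
            for c in range(width)
            for r in range(height)]
```

Repacking then amounts to applying a transformation like this once and saving the result, instead of repeating it on every load.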


This package deliberately avoids numpy: when I tried to find a working implementation, all of them were based on some archaic version of numpy and none of them worked. It loads the data files with struct.unpack instead.
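
As an illustration of the struct.unpack approach, here is a minimal sketch of reading an IDX label file, the format MNIST is distributed in: a big-endian header holding a magic number (2049 for label files) and an item count, followed by one unsigned byte per label. parse_idx_labels is a hypothetical function written for this example and is not the package's own code.

```python
import struct

LABEL_MAGIC = 2049  # magic number at the start of an IDX label file


def parse_idx_labels(data):
    """Parse the raw bytes of an IDX label file into a list of ints."""
    if len(data) < 8:
        raise ValueError("data too short for an IDX header")
    # Header: two big-endian unsigned 32-bit integers.
    magic, count = struct.unpack(">II", data[:8])
    if magic != LABEL_MAGIC:
        raise ValueError("unexpected magic number: %d" % magic)
    body = data[8:8 + count]
    if len(body) != count:
        raise ValueError("file is truncated")
    # Each label is a single unsigned byte.
    return list(struct.unpack(">%dB" % count, body))
```

Image files follow the same pattern, with a different magic number (2051) and two extra header fields for the row and column counts.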