Tools for working with the MNIST data set
Switch branches/tags
Nothing to show
Clone or download
johnmyleswhite Merge pull request #17 from JuliaML/BaseTestNext
switch to BaseTestNext testing
Latest commit 0747042 Aug 10, 2016


Build Status


This package provides access to the classic MNIST data set of handwritten digits that has been used as a testbed for new machine learning methods. The MNIST data set is included with the package for convenience without any claim of copyright to the images, which are the property of Yann LeCun and Corinna Cortes. The images were downloaded into their original IDX format from and are stored in the data/ directory.

To work with the data, you will typically want to store the digits in Julian matrices. To load the i-th image or label, use:

  • trainfeatures(i)
  • testfeatures(i)
  • trainlabel(i)
  • testlabel(i)

The features will be stored in a 784-entry Float64 vector and the label will be returned as a Float64 scalar.

To access the entire data set at once, use:

  • traindata()
  • testdata()

The traindata method will return a tuple of two items: the first element of the tuple will be a 784x60,000 Float64 matrix containing all of the images in the training set. The second element of the tuple will be a 60,000 entry Float64 vector containing the labels of all of the images in the train set. The testdata method will return an equivalent tuple for the test set, which contains 10,000 images instead of 60,000.

Example Usage

using MNIST
features = trainfeatures(1)
label = trainlabel(1)

trainX, trainY = traindata()
testX, testY = testdata()