A collection of machine learning datasets for use with Torch7.
Latest commit fbdccee, Mar 12, 2014, by akfidjeland: "Added missing license"

Datasets

A collection of easy-to-use datasets for training and testing machine learning algorithms with Torch7.

Usage

require('dataset/mnist')
m = Mnist.dataset()
m:size()                      -- => 60000
d:sample(100)                 -- => {data = tensor, class = label}

-- scale values to the range [0, 1] (by default they are in the range [0, 255])
m = Mnist.dataset({scale = {0, 1}})

-- or normalize (subtract the mean and divide by the standard deviation)
m = Mnist.dataset({normalize = true})

-- only import a subset of the data (otherwise all 60,000 samples are
-- imported), sorted by class label
m = Mnist.dataset({size = 1000, sort = true})
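The scale and normalize options boil down to simple per-value arithmetic. The sketch below shows that arithmetic on plain Lua tables rather than Torch tensors; `scale` and `normalize` are hypothetical helpers, not the library's code, and using a single global mean and standard deviation (rather than per-pixel statistics) is an assumption.

```lua
-- Hypothetical helpers (not part of the library): plain-Lua versions of the
-- scale and normalize transforms, applied to a flat table of pixel values.

-- Linearly map values from [src_min, src_max] to [dst_min, dst_max].
local function scale(values, src_min, src_max, dst_min, dst_max)
  local ratio = (dst_max - dst_min) / (src_max - src_min)
  local out = {}
  for i, v in ipairs(values) do
    out[i] = dst_min + (v - src_min) * ratio
  end
  return out
end

-- Subtract the global mean, then divide by the population standard deviation
-- (assumed global statistics; std is left at 1 if all values are equal).
local function normalize(values)
  local n, sum = #values, 0
  for _, v in ipairs(values) do sum = sum + v end
  local mean = sum / n
  local sq = 0
  for _, v in ipairs(values) do sq = sq + (v - mean) ^ 2 end
  local std = math.sqrt(sq / n)
  if std == 0 then std = 1 end
  local out = {}
  for i, v in ipairs(values) do out[i] = (v - mean) / std end
  return out
end

local scaled = scale({0, 51, 255}, 0, 255, 0, 1)  -- approximately {0, 0.2, 1}
local z = normalize({1, 2, 3})                    -- approximately {-1.22, 0, 1.22}
```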

To iterate over all samples in a randomly shuffled order:

for sample in m:sampler() do
  net:forward(sample.data)
end
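A shuffled sampler like this can be built from a random permutation of the sample indices. The following is a minimal plain-Lua sketch that assumes a Fisher-Yates shuffle; `shuffled_indices` is a hypothetical helper, not how the library necessarily implements `m:sampler()`.

```lua
-- Hypothetical sketch: a generic-for iterator over the indices 1..n in a
-- random order, using an in-place Fisher-Yates shuffle.
local function shuffled_indices(n)
  local order = {}
  for i = 1, n do order[i] = i end
  for i = n, 2, -1 do
    local j = math.random(i)              -- uniform integer in [1, i]
    order[i], order[j] = order[j], order[i]
  end
  local k = 0
  return function()
    k = k + 1
    return order[k]                       -- nil after the last index, ending the loop
  end
end

-- Every index 1..5 is visited exactly once, in random order.
local seen = {}
for idx in shuffled_indices(5) do
  seen[#seen + 1] = idx
end
```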

Or access a single mini-batch by index:

local batch = m:mini_batch(1)

-- or use the batch data directly
net:forward(m:mini_batch(1).data)

-- set the batch size using an options table
local batch = m:mini_batch(1, {size = 100})
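To see which samples a given batch number might cover, here is a small index calculation. It assumes, without confirmation from the library, that batches are contiguous, 1-indexed slices of `size` samples with a possibly shorter final batch; `batch_range` is a hypothetical helper. Under that layout, MNIST with a batch size of 100 splits into 600 batches.

```lua
-- Hypothetical helper: the first and last sample index covered by batch
-- number i, ASSUMING batches are contiguous, 1-indexed slices of batch_size
-- samples and that the final batch may be shorter.
local function batch_range(i, batch_size, n)
  local first = (i - 1) * batch_size + 1
  local last = math.min(i * batch_size, n)
  return first, last
end

local a1, b1 = batch_range(1, 100, 60000)    -- 1, 100
local a2, b2 = batch_range(600, 100, 60000)  -- 59901, 60000
local a3, b3 = batch_range(3, 25, 60)        -- 51, 60 (a short final batch)
```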

To process the full dataset in randomly shuffled mini-batches:

for batch in m:mini_batches() do
   net:forward(batch.data)
end
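Shuffled mini-batches combine the two ideas above: permute the indices, then hand them out in chunks. This plain-Lua sketch is an illustration under that assumption, not the library's `mini_batches` implementation; `shuffled_batches` is hypothetical.

```lua
-- Hypothetical sketch: iterate over all n indices in shuffled mini-batches
-- of up to batch_size indices each (the last batch may be smaller).
local function shuffled_batches(n, batch_size)
  local order = {}
  for i = 1, n do order[i] = i end
  for i = n, 2, -1 do                     -- Fisher-Yates shuffle
    local j = math.random(i)
    order[i], order[j] = order[j], order[i]
  end
  local pos = 1
  return function()
    if pos > n then return nil end        -- every index has been handed out
    local batch = {}
    for k = pos, math.min(pos + batch_size - 1, n) do
      batch[#batch + 1] = order[k]
    end
    pos = pos + batch_size
    return batch
  end
end

-- 10 samples in batches of 4 gives batch sizes 4, 4 and 2.
local sizes = {}
for batch in shuffled_batches(10, 4) do
  sizes[#sizes + 1] = #batch
end
```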

To generate a 10-frame animation for each sample, in which the sample is randomly rotated, translated, and/or zoomed within the given ranges:

local anim_options = {
    frames      = 10,
    rotation    = {-20, 20},
    translation = {-5, 5, -5, 5},
    zoom        = {0.6, 1.4}
}
s = m:sampler({animate = anim_options})
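One plausible reading of these options is that each frame gets a transform drawn uniformly from the given ranges. The sketch below follows that reading; treating `translation` as `{x_min, x_max, y_min, y_max}` and drawing every frame independently (rather than interpolating between frames) are both assumptions, and `random_frames` is a hypothetical helper.

```lua
-- Hypothetical sketch: draw one random transform per frame, each parameter
-- uniform within the ranges in the options table. ASSUMES translation is
-- {x_min, x_max, y_min, y_max} and that frames are drawn independently.
local function uniform(lo, hi)
  return lo + (hi - lo) * math.random()   -- math.random() is in [0, 1)
end

local function random_frames(opts)
  local frames = {}
  for f = 1, opts.frames do
    frames[f] = {
      rotation = uniform(opts.rotation[1], opts.rotation[2]),
      dx       = uniform(opts.translation[1], opts.translation[2]),
      dy       = uniform(opts.translation[3], opts.translation[4]),
      zoom     = uniform(opts.zoom[1], opts.zoom[2]),
    }
  end
  return frames
end

local frames = random_frames({
  frames      = 10,
  rotation    = {-20, 20},
  translation = {-5, 5, -5, 5},
  zoom        = {0.6, 1.4},
})
```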

Standard pipeline options add post-processing stages (e.g. padding, binarization, and flattening):

s = m:sampler({pad = 5, binarize = true, flatten = true})
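To make the pad, binarize, and flatten stages concrete, here are plain-Lua stand-ins operating on a 2-D table of pixel values. They are hypothetical illustrations: the zero fill value, the 0.5 threshold, and applying the stages in the order pad, binarize, flatten are assumptions, not documented library behaviour.

```lua
-- Hypothetical stand-ins for the pad, binarize and flatten stages over a
-- 2-D table (rows of pixels). Zero fill and a 0.5 threshold are ASSUMED.

-- Surround the image with p rows/columns of zeros on every side.
local function pad(img, p)
  local h, w = #img, #img[1]
  local out = {}
  for r = 1, h + 2 * p do
    out[r] = {}
    for c = 1, w + 2 * p do
      local src = img[r - p]                 -- nil outside the original rows
      out[r][c] = (src and src[c - p]) or 0  -- 0 outside the original image
    end
  end
  return out
end

-- Map each value to 1 if it exceeds the threshold, else 0.
local function binarize(img, threshold)
  local out = {}
  for r, row in ipairs(img) do
    out[r] = {}
    for c, v in ipairs(row) do
      out[r][c] = (v > threshold) and 1 or 0
    end
  end
  return out
end

-- Concatenate the rows into one flat table (row-major order).
local function flatten(img)
  local out = {}
  for _, row in ipairs(img) do
    for _, v in ipairs(row) do out[#out + 1] = v end
  end
  return out
end

-- A 2x2 image padded by 1 becomes 4x4, then 16 binary values.
local flat = flatten(binarize(pad({{0.2, 0.9}, {0.7, 0.1}}, 1), 0.5))
```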

To process samples with a custom pipeline:

s = m:sampler({pipeline = my_pipeline})
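A custom pipeline can be thought of as a chain of stages, each taking a sample and returning a transformed one. The sketch below composes such stages into one function; whether the library expects `pipeline` to be a function, a list of stages, or some other object is an assumption, and `make_pipeline`, `double`, and `add_one` are hypothetical names.

```lua
-- Hypothetical sketch: build a pipeline as one function that passes a sample
-- through a list of stages in order. The form the library expects for the
-- pipeline option is ASSUMED here, not documented.
local function make_pipeline(stages)
  return function(sample)
    for _, stage in ipairs(stages) do
      sample = stage(sample)
    end
    return sample
  end
end

-- Two toy stages over a flat table of values.
local function double(t)
  local out = {}
  for i, v in ipairs(t) do out[i] = 2 * v end
  return out
end

local function add_one(t)
  local out = {}
  for i, v in ipairs(t) do out[i] = v + 1 end
  return out
end

local my_pipeline = make_pipeline({double, add_one})
local result = my_pipeline({1, 2, 3})   -- {3, 5, 7}
```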

To create a dataset from a directory of images:

 require 'image'
 require 'dataset/imageset'
 d = ImageSet.dataset({dir = 'your-data-directory'})
 -- repeatedly draw a sample with d() and display it, pausing 1/10 s
 while true do w = image.display({image = d().data, win = w}) util.sleep(1/10) end

To create a dataset from a directory of videos:

 require 'image'
 require 'dataset/videoset'
 d = VideoSet.dataset({dir = 'KTH'})
 -- repeatedly draw a sample with d() and display it, pausing 1/10 s
 while true do w = image.display({image = d().data, win = w}) util.sleep(1/10) end