Datasets

A collection of easy-to-use datasets for training and testing machine learning algorithms with Torch7.

Usage

require('dataset/mnist')
m = dataset.Mnist()
m:size()                      -- => 60000
m:sample(100)                 -- => {data = tensor, class = label}

-- scale values to the range [0, 1] (by default they are in the range [0, 255])
m = dataset.Mnist({scale = {0, 1}})

-- or normalize (subtract mean and divide by std)
m = dataset.Mnist({normalize = true})

-- only import a subset of the data (imports full 60,000 samples otherwise),
-- sorted by class label
m = dataset.Mnist({size = 1000, sort = true})
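
A returned sample is a plain Lua table, so it can be inspected directly. A quick sketch; the data/class field names follow the comment above, and the exact tensor shape depends on the loader:

-- Inspect a single sample (field names as documented above).
local sample = m:sample(100)
print(sample.class)          -- label of sample 100
print(sample.data:size())    -- dimensions of the image tensor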

To process a randomly shuffled ordering of the dataset:

for sample in m:sampler() do
  net:forward(sample.data)
end
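
Each sample can also drive a full training step. The sketch below is only an illustration using the nn package (a ClassNLLCriterion and a plain SGD update are assumptions, not part of this library); the data/class fields are taken from the sample format shown above:

-- Minimal training sketch: assumes `net` is an nn module producing
-- log-probabilities and `sample.class` is a class index.
require 'nn'
local criterion = nn.ClassNLLCriterion()

for sample in m:sampler() do
  local output = net:forward(sample.data)
  criterion:forward(output, sample.class)
  net:zeroGradParameters()
  net:backward(sample.data, criterion:backward(output, sample.class))
  net:updateParameters(0.01)   -- fixed learning rate, for illustration only
end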

Or access mini-batches:

local batch = m:mini_batch(1)

-- or use directly
net:forward(m:mini_batch(1).data)

-- set the batch size using an options table
local batch = m:mini_batch(1, {size = 100})

To process the full dataset in randomly shuffled mini-batches:

for batch in m:mini_batches() do
   net:forward(batch.data)
end
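
The same training pattern works per batch. A hedged sketch, reusing net and criterion from the example above and assuming each batch carries its labels in batch.class:

-- One epoch over shuffled mini-batches (assumes `batch.class` holds the
-- labels corresponding to `batch.data`).
for batch in m:mini_batches() do
  local output = net:forward(batch.data)
  criterion:forward(output, batch.class)
  net:zeroGradParameters()
  net:backward(batch.data, criterion:backward(output, batch.class))
  net:updateParameters(0.01)
end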

Generate 10-frame animations for each sample, randomly rotating, translating, and/or zooming within the ranges passed:

local anim_options = {
    frames      = 10,
    rotation    = {-20, 20},
    translation = {-5, 5, -5, 5},
    zoom        = {0.6, 1.4}
}
s = m:sampler({animate = anim_options})
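
The animated sampler is consumed like any other sampler (a sketch; whether each yielded sample holds a single frame or a stack of frames depends on the loader):

-- Iterate the animated sampler exactly as in the plain sampler loop above.
for sample in s do
  net:forward(sample.data)
end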

Standard pipeline options can be used to add post-processing stages (e.g. binarize and flatten):

s = m:sampler({pad = 5, binarize = true, flatten = true})

Pass a custom pipeline for processing samples:

s = m:sampler({pipeline = my_pipeline})
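
The pipeline interface is not spelled out here; as a sketch, my_pipeline above could be a function that takes a sample table and returns it (possibly modified), for example:

-- Hypothetical definition of `my_pipeline` (assumption: a pipeline is a
-- function from sample table to sample table; the real interface may differ).
local function my_pipeline(sample)
  sample.data = sample.data:float():div(255)   -- e.g. rescale pixels to [0, 1]
  return sample
end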

Create a dataset from a bunch of images in a directory:

require 'dataset/imageset'
d = ImageSet.dataset({dir = 'your-data-directory'})
while true do w = image.display({image = d().data, win = w}) util.sleep(1/10) end

Create a dataset from a bunch of videos in a directory:

require 'dataset/videoset'
d = VideoSet.dataset({dir = 'KTH'})
while true do w = image.display({image = d().data, win = w}) util.sleep(1/10) end