Commit

Remove for loop from datasets.supervised.SupervisedDataSet.splitWithProportion

splitWithProportion now selects samples with numpy index arrays built by numpy.random.permutation instead of a for loop. Before this change the method was very slow on large datasets; now it finishes almost instantly.
Nihn committed Jan 5, 2015
1 parent 91f66f6 commit 2f02b8d
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions pybrain/datasets/supervised.py
@@ -2,6 +2,7 @@
 
 __author__ = 'Thomas Rueckstiess, ruecksti@in.tum.de'
 
+from numpy import random
 from random import sample
 from scipy import isscalar
 
@@ -104,16 +105,15 @@ def evaluateModuleMSE(self, module, averageOver = 1, **args):
     def splitWithProportion(self, proportion = 0.5):
         """Produce two new datasets, the first one containing the fraction given
         by `proportion` of the samples."""
-        leftIndices = set(sample(list(range(len(self))), int(len(self)*proportion)))
-        leftDs = self.copy()
-        leftDs.clear()
-        rightDs = leftDs.copy()
-        index = 0
-        for sp in self:
-            if index in leftIndices:
-                leftDs.addSample(*sp)
-            else:
-                rightDs.addSample(*sp)
-            index += 1
+        indicies = random.permutation(len(self))
+        separator = int(len(self) * proportion)
+
+        leftIndicies = indicies[:separator]
+        rightIndicies = indicies[separator:]
+
+        leftDs = SupervisedDataSet(inp=self['input'][leftIndicies].copy(),
+                                   target=self['target'][leftIndicies].copy())
+        rightDs = SupervisedDataSet(inp=self['input'][rightIndicies].copy(),
+                                    target=self['target'][rightIndicies].copy())
         return leftDs, rightDs
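
The indexing technique in the added lines can be tried outside PyBrain. The following is a minimal sketch on plain numpy arrays, not the library's code: `split_with_proportion`, its `rng` parameter, and the sample arrays are hypothetical names chosen for illustration.

```python
import numpy as np


def split_with_proportion(inputs, targets, proportion=0.5, rng=None):
    """Split paired arrays into two random, disjoint parts.

    Hypothetical standalone helper for illustration; not part of PyBrain.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(inputs)

    # One shuffled index array replaces the per-sample membership test.
    indices = rng.permutation(n)
    separator = int(n * proportion)
    left, right = indices[:separator], indices[separator:]

    # Indexing with an integer array returns new arrays, and using the same
    # indices for inputs and targets keeps each sample paired with its target.
    return (inputs[left], targets[left]), (inputs[right], targets[right])


# Eight samples: input row i is [2*i, 2*i + 1], target row i is [i].
inputs = np.arange(16).reshape(8, 2)
targets = np.arange(8).reshape(8, 1)

(left_in, left_tg), (right_in, right_tg) = split_with_proportion(
    inputs, targets, proportion=0.75, rng=np.random.default_rng(0)
)
```

With eight samples and a proportion of 0.75, the left part receives six rows and the right part two. Which rows land on which side depends on the random generator.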

1 comment on commit 2f02b8d

@attoPascal

This commit breaks polymorphism: when called on a ClassificationDataSet (as shown in the tutorials), it no longer returns ClassificationDataSets but SupervisedDataSets.

See discussion here: http://stackoverflow.com/questions/27887936/attributeerror-using-pybrain-splitwithportion-object-type-changed/30869317#30869317

And pull request here: #164
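
The problem described above can be reproduced in miniature without PyBrain. This sketch uses two hypothetical toy classes (`ToyDataSet` and `ToyClassificationDataSet`, neither of which is PyBrain's API) to show one common way of keeping the subclass type: build the results with `type(self)` rather than a hard-coded base class. It is not necessarily the approach taken in the pull request, and it assumes the subclass constructor accepts the same arguments as the base class.

```python
import random


class ToyDataSet:
    """Hypothetical stand-in for a dataset base class (not PyBrain's API)."""

    def __init__(self, samples=()):
        self.samples = list(samples)

    def split_with_proportion(self, proportion=0.5, rng=random):
        indices = list(range(len(self.samples)))
        rng.shuffle(indices)
        separator = int(len(indices) * proportion)

        # type(self) is the class of the object the method was called on,
        # so a subclass instance yields subclass instances.
        cls = type(self)
        left = cls(self.samples[i] for i in indices[:separator])
        right = cls(self.samples[i] for i in indices[separator:])
        return left, right


class ToyClassificationDataSet(ToyDataSet):
    """Hypothetical subclass with one extra method."""

    def class_labels(self):
        return sorted({label for _, label in self.samples})


ds = ToyClassificationDataSet([(x, x % 2) for x in range(10)])
left, right = ds.split_with_proportion(0.5, rng=random.Random(0))

# Both halves keep the subclass, so subclass-only methods still work.
labels = left.class_labels()
```

Had `split_with_proportion` constructed `ToyDataSet(...)` directly, the call to `left.class_labels()` would raise an AttributeError, which is the kind of failure reported in the linked discussion.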
