This repository has been archived by the owner on Apr 8, 2021. It is now read-only.

Downsample in-memory splitter #11

Merged 2 commits into master on Dec 20, 2014

Conversation

avibryant (Contributor)

See #9 for more info.

@snoble @RyW90

avibryant mentioned this pull request on Dec 3, 2014
mlmanapat

LGTM

```diff
@@ -255,17 +256,18 @@ case class Trainer[K: Ordering, V, T: Monoid](
 for (
 (treeIndex, tree) <- treeMap;
 i <- 1.to(sampler.timesInTrainingSet(instance.id, instance.timestamp, treeIndex)).toList;
-leaf <- tree.leafFor(instance.features).toList if stopper.canSplit(leaf.target) && !stopper.shouldSplitDistributed(leaf.target) && stopper.shouldSplitLocally(leaf.target)
+leaf <- tree.leafFor(instance.features).toList if stopper.shouldSplit(leaf.target) && (r.nextDouble < stopper.samplingRateToSplitLocally(leaf.target))
```
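The new guard can be exercised in isolation. The sketch below is a minimal model, not the project's code: `Stopper` is reduced to the two methods the added line calls, the leaf target (a `T: Monoid` in the real trainer) is simplified to a `Long` instance count, and `CountStopper` with its `minInstances`/`maxLocal` policy is hypothetical. Each instance that lands in a splittable leaf survives a Bernoulli trial with probability `samplingRateToSplitLocally`, so a very large leaf sends only about `maxLocal` instances to the in-memory splitter on average.

```scala
import scala.util.Random

// Hypothetical stand-in for the Stopper used in the diff. Only the two methods
// referenced by the new filter line are modeled, and the leaf target is
// simplified to a Long count of training instances (an assumption).
trait Stopper {
  def shouldSplit(target: Long): Boolean
  def samplingRateToSplitLocally(target: Long): Double
}

// Assumed policy, for illustration only: split any leaf with at least
// `minInstances` instances, and downsample so that roughly `maxLocal`
// instances per leaf are kept for the in-memory split.
final case class CountStopper(minInstances: Long, maxLocal: Long) extends Stopper {
  def shouldSplit(target: Long): Boolean = target >= minInstances

  def samplingRateToSplitLocally(target: Long): Double =
    if (target <= maxLocal) 1.0 else maxLocal.toDouble / target
}

object LocalSplitFilter {
  // Mirrors the added guard: the leaf must be splittable, and the instance
  // must pass a Bernoulli trial at the stopper's sampling rate.
  def keep(stopper: Stopper, leafTarget: Long, r: Random): Boolean =
    stopper.shouldSplit(leafTarget) &&
      (r.nextDouble() < stopper.samplingRateToSplitLocally(leafTarget))
}
```

With `maxLocal = 10000`, a leaf of 1,000,000 instances gets a rate of 0.01, so roughly 10,000 of its instances are kept; a leaf at or below `maxLocal` keeps every instance because `nextDouble()` always returns a value below 1.0.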

I believe you have IDs for rows (or at least optionally do). Could you use them to seed the random selection, so that the same rows are sampled on every run?
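One way to act on this suggestion, sketched under assumptions: instance ids are taken to be `String`s, and the object `SeededSampling` with its helpers `uniformFor` and `keep` is hypothetical. Hashing the id together with the tree index gives each (row, tree) pair a fixed seed, so reruns make identical sampling decisions while different trees still draw differently.

```scala
import scala.util.Random
import scala.util.hashing.MurmurHash3

// Hypothetical helper illustrating id-seeded sampling; assumes each
// instance carries a stable String id.
object SeededSampling {
  // Derive a deterministic seed from (id, treeIndex) and draw one uniform
  // value in [0, 1). The same pair always yields the same value.
  def uniformFor(id: String, treeIndex: Int): Double = {
    val seed = MurmurHash3.stringHash(id + "#" + treeIndex)
    new Random(seed.toLong).nextDouble()
  }

  // Deterministic replacement for `r.nextDouble < rate` in the filter.
  def keep(id: String, treeIndex: Int, rate: Double): Boolean =
    uniformFor(id, treeIndex) < rate
}
```

In the diff's filter, `r.nextDouble` would then become something like `SeededSampling.uniformFor(instance.id, treeIndex)`, assuming `instance.id` can be rendered as a stable string.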

avibryant added a commit that referenced this pull request on Dec 20, 2014: "Downsample in-memory splitter"
avibryant merged commit 0aabdee into master on Dec 20, 2014
4 participants