[SPARK-6004][MLlib] Pick the best model when training GradientBoostedTrees with validation

Because the validation error does not necessarily change monotonically, in practice it is better to pick the best model when training GradientBoostedTrees with validation than to simply stop early.
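To illustrate the reasoning, here is a minimal sketch (not the Spark code) that compares two ways of choosing how many trees to keep from a sequence of per-iteration validation errors. The object name, both helper functions, and the error values are hypothetical and exist only for this example; it assumes a non-empty error sequence.

```scala
object ValidationPickSketch {
  // Hypothetical helper: number of trees kept if training stops at the first
  // iteration whose validation error does not improve on the previous one.
  def earlyStopCount(errors: Seq[Double]): Int = {
    val firstWorse = errors.indices.drop(1).find(i => errors(i) >= errors(i - 1))
    firstWorse.getOrElse(errors.length)
  }

  // Hypothetical helper: number of trees kept if we instead keep the prefix
  // with the lowest validation error (ties go to the smaller ensemble).
  // Assumes `errors` is non-empty.
  def bestCount(errors: Seq[Double]): Int =
    errors.indexOf(errors.min) + 1

  def main(args: Array[String]): Unit = {
    // Made-up, non-monotonic validation errors after 1, 2, ..., 6 trees.
    val errors = Seq(0.40, 0.35, 0.36, 0.30, 0.28, 0.31)
    println(s"early stop keeps ${earlyStopCount(errors)} trees")
    println(s"best pick keeps ${bestCount(errors)} trees")
  }
}
```

With these made-up errors, stopping at the first non-improving iteration keeps 2 trees (validation error 0.35), while picking the best prefix keeps 5 trees (validation error 0.28).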

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#4763 from viirya/gbt_record_model and squashes the following commits:

452e049 [Liang-Chi Hsieh] Address comment.
ea2fae2 [Liang-Chi Hsieh] Pick the best model when training GradientBoostedTrees with validation.
viirya authored and jkbradley committed Feb 26, 2015
1 parent 2358657 commit cfff397
Showing 1 changed file with 9 additions and 3 deletions.
@@ -251,9 +251,15 @@ object GradientBoostedTrees extends Logging {

     logInfo("Internal timing for DecisionTree:")
     logInfo(s"$timer")
-
-    new GradientBoostedTreesModel(
-      boostingStrategy.treeStrategy.algo, baseLearners, baseLearnerWeights)
+    if (validate) {
+      new GradientBoostedTreesModel(
+        boostingStrategy.treeStrategy.algo,
+        baseLearners.slice(0, bestM),
+        baseLearnerWeights.slice(0, bestM))
+    } else {
+      new GradientBoostedTreesModel(
+        boostingStrategy.treeStrategy.algo, baseLearners, baseLearnerWeights)
+    }
   }
 
 }
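The effect of the `slice(0, bestM)` calls in the diff can be sketched in isolation. This is a simplified stand-in, not the MLlib classes: `Ensemble`, `finalEnsemble`, and the string "trees" are hypothetical, and `bestM` is assumed to hold the number of leading learners that gave the lowest validation error.

```scala
object SliceToBestSketch {
  // Hypothetical stand-in for a trained ensemble: tree labels and weights.
  final case class Ensemble(learners: Array[String], weights: Array[Double])

  // Mirrors the branch in the diff: with validation, keep only the first
  // bestM learners and weights; without it, keep everything.
  def finalEnsemble(
      learners: Array[String],
      weights: Array[Double],
      validate: Boolean,
      bestM: Int): Ensemble =
    if (validate) {
      // slice(0, bestM) keeps indices 0 until bestM (end index is exclusive).
      Ensemble(learners.slice(0, bestM), weights.slice(0, bestM))
    } else {
      Ensemble(learners, weights)
    }

  def main(args: Array[String]): Unit = {
    val learners = Array("tree0", "tree1", "tree2", "tree3")
    val weights = Array(1.0, 0.1, 0.1, 0.1)
    val kept = finalEnsemble(learners, weights, validate = true, bestM = 3)
    println(kept.learners.mkString(", "))
  }
}
```

Because `slice` takes an exclusive end index, `bestM = 3` keeps `tree0`, `tree1`, and `tree2`, and the weights array is trimmed to the same length so the two stay aligned.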
