[SPARK-9077] [MLLIB] Improve error message for decision trees when numExamples < maxCategoriesPerFeature

Improve error message when number of examples is less than arity of high-arity categorical feature

CC jkbradley is this about what you had in mind? I know it's a starter, but was on my list to close out in the short term.

Author: Sean Owen <sowen@cloudera.com>

Closes apache#7800 from srowen/SPARK-9077 and squashes the following commits:

b8f6cdb [Sean Owen] Improve error message when number of examples is less than arity of high-arity categorical feature
srowen authored and jkbradley committed Jul 31, 2015
1 parent 351eda0 commit 65fa418
Showing 1 changed file with 6 additions and 2 deletions.
@@ -128,9 +128,13 @@ private[spark] object DecisionTreeMetadata extends Logging {
     // based on the number of training examples.
     if (strategy.categoricalFeaturesInfo.nonEmpty) {
       val maxCategoriesPerFeature = strategy.categoricalFeaturesInfo.values.max
+      val maxCategory =
+        strategy.categoricalFeaturesInfo.find(_._2 == maxCategoriesPerFeature).get._1
       require(maxCategoriesPerFeature <= maxPossibleBins,
-        s"DecisionTree requires maxBins (= $maxPossibleBins) >= max categories " +
-        s"in categorical features (= $maxCategoriesPerFeature)")
+        s"DecisionTree requires maxBins (= $maxPossibleBins) to be at least as large as the " +
+        s"number of values in each categorical feature, but categorical feature $maxCategory " +
+        s"has $maxCategoriesPerFeature values. Considering remove this and other categorical " +
+        "features with a large number of values, or add more training examples.")
     }
 
     val unorderedFeatures = new mutable.HashSet[Int]()
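
To try the check outside Spark, here is a minimal standalone sketch of the same idea. It rests on assumptions: the object name MaxBinsCheck and the method checkMaxBins are hypothetical and not part of Spark, the cap on maxPossibleBins by the number of examples is inferred from the commit title and the surrounding comment rather than shown in this hunk, and the message wording is paraphrased. It also uses maxBy to fetch the feature index and its arity in one pass, where the commit uses values.max followed by find.

```scala
object MaxBinsCheck {
  // Hypothetical helper for illustration only; not part of Spark's API.
  // categoricalFeaturesInfo maps feature index -> number of categories (arity).
  def checkMaxBins(
      categoricalFeaturesInfo: Map[Int, Int],
      maxBins: Int,
      numExamples: Long): Int = {
    // Assumption: the usable bin count is capped by the number of examples.
    val maxPossibleBins = math.min(maxBins.toLong, numExamples).toInt

    if (categoricalFeaturesInfo.nonEmpty) {
      // maxBy returns the (featureIndex, arity) pair with the largest arity.
      val (maxCategory, maxCategoriesPerFeature) = categoricalFeaturesInfo.maxBy(_._2)
      // require throws IllegalArgumentException when the condition is false.
      require(maxCategoriesPerFeature <= maxPossibleBins,
        s"DecisionTree requires maxBins (= $maxPossibleBins) to be at least as large as the " +
        s"number of values in each categorical feature, but categorical feature $maxCategory " +
        s"has $maxCategoriesPerFeature values. Consider removing this and other categorical " +
        "features with a large number of values, or adding more training examples.")
    }
    maxPossibleBins
  }
}
```

With 10 examples and a feature of arity 50, the call fails and the message names feature 2, which is the information the old message lacked. With 100 examples and arities of at most 5, the call returns the effective bin count unchanged.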
