
Adding binary classification bin score evaluator #119

Merged
20 commits merged into salesforce:master from the op/bin_evaluator branch on Sep 14, 2018

Conversation

@gokulsfdc (Contributor) commented Sep 7, 2018

Related issues
#101

Describe the proposed solution
Creating an evaluator for binary classification that provides statistics about the predicted scores. This evaluator creates the specified number of bins and computes the following details for each bin:

  1. Total number of data points.
  2. Average Score.
  3. Average Conversion rate.
  4. Bin Centers.

The overall Brier score for the predicted scores is also computed, and it serves as the default metric.

Added unit tests for the evaluator

*/
case class BinaryClassificationBinMetrics
(
BrierScore: Double,

Collaborator:

Names should start with a lower case: brierScore

Collaborator:

So for some reason all of our metrics do not follow this convention @tovbinm. We should figure out what we want and then make them all consistent.

val predictionLabel = "pred"

val dataset_test = Seq(
(Map("probability_1" -> 0.99999, "probability_0" -> 0.0001, "prediction" -> 1.0), 1.0),

Collaborator:

Use Prediction type instead

@RunWith(classOf[JUnitRunner])
class OpBinaryClassifyBinEvaluatorTest extends FlatSpec with TestSparkContext {

val labelName = "label"

Collaborator:

You can create a dataframe using TestFeatureBuilder easily. That way you don't need to define these strings here or create dataframes with Spark below in the tests.

case class BinaryClassificationBinMetrics
(
BrierScore: Double,
BinCenters: Seq[Double],

Collaborator:

JSON annotations have to be added on sequences. See the other binary classification eval metrics class.

* @param name name of default metric
* @param isLargerBetter is metric better if larger
* @param uid uid for instance
*/

Collaborator:

  1. Perhaps rename to OpBinScoreEvaluator?
  2. Also, this evaluator has to be added to the Evaluators factory object as well as to the BinaryClassEvalMetrics enum.

Contributor (PR author):

This evaluator returns 5 different values. Should there be 5 factory methods, or only one for the Brier score, which is the default metric?

Collaborator:

Only the Brier score. @leahmcguire wdyt?

Collaborator:

Yes, only for the Brier score; the other metrics are supporting information for the Brier score. The Brier score is the only metric that could be used for optimization.

val metrics = new OpBinaryClassifyBinEvaluator(numBins = 0)
.setLabelCol(labelName).setPredictionCol(predictionLabel).evaluateAll(df)

metrics.BrierScore shouldBe 0.0

Collaborator:

Case classes have equals implemented, so you can simply do metrics shouldBe BinaryClassificationBinMetrics(0.0, Seq(), Seq(), Seq(), Seq()) here and everywhere in the tests.

metrics.AverageConversionRate shouldBe Seq(0.0, 0.0, 0.0, 1.0)
}

it should "return the empty bin metrics for numBins == 0" in {

Collaborator:

it should "error on invalid num of bins"

metrics.AverageConversionRate shouldBe Seq.empty[Double]
}

it should "return the bin metrics for skewed data" in {

Collaborator:

it should "evaluate bin metrics for skewed data"

metrics.AverageConversionRate shouldBe Seq(0.0, 0.0, 0.0, 0.0, 1.0)
}

it should "return the default metric as BrierScore" in {

Collaborator:

it should "evaluate the default metric as BrierScore"


val emptyDataSet = Seq.empty[(Map[String, Double], Double)]

Spec[OpBinaryClassifyBinEvaluator] should "return the bin metrics" in {

Collaborator:

should "evaluate the bin metrics"

@codecov (bot) commented Sep 8, 2018

Codecov Report

Merging #119 into master will increase coverage by 0.02%.
The diff coverage is 93.1%.


@@            Coverage Diff             @@
##           master     #119      +/-   ##
==========================================
+ Coverage   86.19%   86.22%   +0.02%     
==========================================
  Files         298      299       +1     
  Lines        9668     9696      +28     
  Branches      329      542     +213     
==========================================
+ Hits         8333     8360      +27     
- Misses       1335     1336       +1
Impacted Files Coverage Δ
...com/salesforce/op/evaluators/OpEvaluatorBase.scala 90.47% <ø> (ø) ⬆️
...c/main/scala/com/salesforce/op/ModelInsights.scala 94.11% <ø> (ø) ⬆️
...cala/com/salesforce/op/evaluators/Evaluators.scala 97.14% <100%> (+0.08%) ⬆️
...sification/BinaryClassificationModelSelector.scala 95.65% <100%> (ø) ⬆️
...op/stages/impl/selector/ModelSelectorSummary.scala 91.3% <66.66%> (-0.84%) ⬇️
...salesforce/op/evaluators/OpBinScoreEvaluator.scala 95.65% <95.65%> (ø)
...om/salesforce/op/utils/spark/OpSparkListener.scala 98.7% <0%> (+1.29%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 078ea62...4b32250.

val dataProcessed = makeDataToUse(data, labelColumnName)

import dataProcessed.sparkSession.implicits._
val rdd = dataProcessed.select(getPredictionValueCol, labelColumnName).as[(Double, Double)].rdd

Collaborator:

You don't seem to use this rdd beyond checking if the dataset is empty.

} else {
// Find the significant digit to which the scores needs to be rounded, based of numBins.
val significantDigitToRoundOff = math.log10(numBins).toInt + 1
val scoreAndLabelsRounded = for {i <- scoreAndLabels}

Collaborator:

I think my explanation on the doc may have made this more complicated than it needs to be.
The probabilities will be between 0 and 1, so use that information to compute the bins and their centers. Make a function binFn that takes in the probability and returns the bin. Then you can simply do:
scoreAndLabels.map { case (score, label) => (binFn(score), (score, label, 1L)) }.reduceByKey(_ + _).map { case (bin, (scoreSum, labelSum, count)) => (bin, scoreSum / count, labelSum / count, count) }
(you will need to import com.twitter.algebird.Operators._ for the reduceByKey to work without specifying everything)

@leahmcguire (Collaborator) left a comment:

LGTM! This is awesome, thank you for contributing!!

@tovbinm (Collaborator) commented Sep 12, 2018

Please take a look at the failing test:

[-] BinaryClassificationModelSelector should fit and predict for default models (497.062 secs)
com.salesforce.op.stages.impl.classification.BinaryClassificationModelSelectorTest > BinaryClassificationModelSelector should fit and predict for default models FAILED
    java.lang.ArrayIndexOutOfBoundsException: -175
        at com.salesforce.op.evaluators.OpBinScoreEvaluator$$anonfun$7.apply(OpBinScoreEvaluator.scala:100)
        at com.salesforce.op.evaluators.OpBinScoreEvaluator$$anonfun$7.apply(OpBinScoreEvaluator.scala:96)
        at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
        at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
        at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
        at com.salesforce.op.evaluators.OpBinScoreEvaluator.evaluateAll(OpBinScoreEvaluator.scala:96)
        at com.salesforce.op.evaluators.OpBinScoreEvaluator.evaluateAll(OpBinScoreEvaluator.scala:59)
        at com.salesforce.op.stages.impl.selector.HasEval$$anonfun$1.apply(ModelSelectorNames.scala:92)
        at com.salesforce.op.stages.impl.selector.HasEval$$anonfun$1.apply(ModelSelectorNames.scala:89)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:296)
        at com.salesforce.op.stages.impl.selector.HasEval$class.evaluate(ModelSelectorNames.scala:89)
        at com.salesforce.op.stages.impl.selector.ModelSelector.evaluate(ModelSelector.scala:74)
        at com.salesforce.op.stages.impl.selector.ModelSelector.fit(ModelSelector.scala:171)
        at com.salesforce.op.stages.impl.classification.BinaryClassificationModelSelectorTest$$anonfun$8.apply(BinaryClassificationModelSelectorTest.scala:186)
        at com.salesforce.op.stages.impl.classification.BinaryClassificationModelSelectorTest$$anonfun$8.apply(BinaryClassificationModelSelectorTest.scala:171)

https://travis-ci.com/salesforce/TransmogrifAI/jobs/145162156

@tovbinm (Collaborator) left a comment:

There is actually a deeper issue with probability values, so I'll let @leahmcguire comment here.

@leahmcguire (Collaborator):

Yeah, so what is happening is that not every model type produces a probability column. And there is a special case in the makeData call that will fall back to the rawPrediction value instead of the probability. This is not bounded, so you will actually need to do a reduce to get the min and max values for the bin range.

@gokulsfdc (Contributor, PR author):

So do you mean the default min/max values to compute the bins should be [0, 1]? If the score goes out of bounds due to the fallback, then this range needs to be altered accordingly.

@tovbinm merged commit 7ba2a02 into salesforce:master on Sep 14, 2018
@gokulsfdc deleted the op/bin_evaluator branch on September 14, 2018 17:06
@salesforce-cla:

Thanks for the contribution! It looks like @leahmcguire is an internal user so signing the CLA is not required. However, we need to confirm this.

@salesforce-cla:

Thanks for the contribution! Unfortunately we can't verify the commit author(s): Leah McGuire <l***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.
