Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threshold metrics calculation fix when unseen labels are present #293

Merged
merged 84 commits into from
Apr 22, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
84 commits
Select commit Hold shift + click to select a range
a0f5bc7
label metadata after data cut
gerashegalov Mar 31, 2019
0a7d872
idempotent trim and special-case DataCutter
gerashegalov Apr 1, 2019
8f900a1
ignore previous labels in metadata
gerashegalov Apr 1, 2019
0b520ee
more logging
gerashegalov Apr 1, 2019
0a775eb
num_vals instead of vals
gerashegalov Apr 2, 2019
9da41b2
Revert "num_vals instead of vals"
gerashegalov Apr 3, 2019
3e73aaa
fix metadata labels
gerashegalov Apr 3, 2019
bde7cf0
sync labels
gerashegalov Apr 3, 2019
c5d3272
add mkstring
gerashegalov Apr 3, 2019
2ad6876
cleanup and fixes
gerashegalov Apr 3, 2019
e084d4e
WIP
gerashegalov Apr 4, 2019
937dc17
Revert "cleanup and fixes"
gerashegalov Apr 4, 2019
a803644
rewwork labels dropped
gerashegalov Apr 4, 2019
ae1b58d
restore accidental revert
gerashegalov Apr 5, 2019
4f7e2ef
re-add num_vals
gerashegalov Apr 5, 2019
30f961d
WIP
gerashegalov Apr 5, 2019
81f5ee9
WIP2
gerashegalov Apr 5, 2019
acff707
tests passing
gerashegalov Apr 7, 2019
aee7bbe
debug for leah
gerashegalov Apr 8, 2019
f140aa9
if inverted
gerashegalov Apr 8, 2019
1b3bc27
LR repro'ed
gerashegalov Apr 8, 2019
5074ff5
WIP
gerashegalov Apr 9, 2019
bcb93fd
test vectors
gerashegalov Apr 9, 2019
8c6d685
test both indexed and non-idexed data with data cutter
gerashegalov Apr 9, 2019
f1bb79d
Leah's comments
gerashegalov Apr 11, 2019
da400d4
param for number dropped lables to log
gerashegalov Apr 12, 2019
5a5b19d
WIP
gerashegalov Apr 15, 2019
74e9dd1
WIP
gerashegalov Apr 15, 2019
4a4171f
WIP
gerashegalov Apr 15, 2019
2e467c1
WIP
gerashegalov Apr 15, 2019
2d64f14
formatting
gerashegalov Apr 15, 2019
c43259d
Added metrics calculation to existing test to reproduce error seen in…
Jauntbox Apr 16, 2019
5e99cb7
Fixed bug in threshold metrics when labels unseen at training time ar…
Jauntbox Apr 17, 2019
e89b647
Removed debug logging from test
Jauntbox Apr 17, 2019
cb17f96
Merge branch 'master' of github.com:salesforce/TransmogrifAI into km/…
Jauntbox Apr 17, 2019
27ff0b9
label metadata after data cut
gerashegalov Mar 31, 2019
453c1e1
idempotent trim and special-case DataCutter
gerashegalov Apr 1, 2019
6b31bad
ignore previous labels in metadata
gerashegalov Apr 1, 2019
2f0af21
more logging
gerashegalov Apr 1, 2019
78acc4c
num_vals instead of vals
gerashegalov Apr 2, 2019
3ee8149
Revert "num_vals instead of vals"
gerashegalov Apr 3, 2019
4415245
fix metadata labels
gerashegalov Apr 3, 2019
86e5425
sync labels
gerashegalov Apr 3, 2019
c3bda19
add mkstring
gerashegalov Apr 3, 2019
6000be7
cleanup and fixes
gerashegalov Apr 3, 2019
1fdbe3e
WIP
gerashegalov Apr 4, 2019
4c5aa83
Revert "cleanup and fixes"
gerashegalov Apr 4, 2019
896dcb0
rewwork labels dropped
gerashegalov Apr 4, 2019
71d0b60
restore accidental revert
gerashegalov Apr 5, 2019
dd75509
re-add num_vals
gerashegalov Apr 5, 2019
9af4edf
WIP
gerashegalov Apr 5, 2019
62c489d
WIP2
gerashegalov Apr 5, 2019
e70a621
tests passing
gerashegalov Apr 7, 2019
af77fda
debug for leah
gerashegalov Apr 8, 2019
e545f8b
if inverted
gerashegalov Apr 8, 2019
19c91be
LR repro'ed
gerashegalov Apr 8, 2019
1948507
WIP
gerashegalov Apr 9, 2019
0ee4ffb
test vectors
gerashegalov Apr 9, 2019
9f592ac
test both indexed and non-idexed data with data cutter
gerashegalov Apr 9, 2019
2841359
Leah's comments
gerashegalov Apr 11, 2019
cf55a49
param for number dropped lables to log
gerashegalov Apr 12, 2019
5701b08
WIP
gerashegalov Apr 15, 2019
b9eb963
WIP
gerashegalov Apr 15, 2019
ae3e5c5
WIP
gerashegalov Apr 15, 2019
a44707b
WIP
gerashegalov Apr 15, 2019
831e114
formatting
gerashegalov Apr 15, 2019
a1f37c3
Replace cacheValidatedDFForTesting with the extended DataCutter class…
gerashegalov Apr 17, 2019
b3db9c4
mt feedback WIP
gerashegalov Apr 18, 2019
dd16900
build failure fix
gerashegalov Apr 18, 2019
0dd9fb6
whitespace
gerashegalov Apr 18, 2019
37b2dce
column index constants
gerashegalov Apr 18, 2019
cc2d4f0
Param rename
gerashegalov Apr 18, 2019
e973f46
CodeFactor issue
gerashegalov Apr 18, 2019
27992db
Merge branch 'master' into gera/factorlabeltrim
tovbinm Apr 18, 2019
3029177
Merged with Gera's branch
Jauntbox Apr 18, 2019
8197332
Fixed merge
Jauntbox Apr 18, 2019
6bf7d72
Merge branch 'master' into km/label-trim-metrics
leahmcguire Apr 19, 2019
0d8941b
Merge with master
Jauntbox Apr 19, 2019
a13cf54
Merge branch 'km/label-trim-metrics' of github.com:salesforce/Transmo…
Jauntbox Apr 19, 2019
6970775
Put blank line back in
Jauntbox Apr 19, 2019
9307140
Put blank line back in, but not with space
Jauntbox Apr 19, 2019
69fad18
Address comment
Jauntbox Apr 19, 2019
14b442d
Merge branch 'master' of github.com:salesforce/TransmogrifAI into km/…
Jauntbox Apr 19, 2019
321bc7e
Merge branch 'master' of github.com:salesforce/TransmogrifAI into km/…
Jauntbox Apr 22, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -188,7 +188,8 @@ private[op] class OpMultiClassificationEvaluator
def computeMetrics(scoresAndLabels: (Array[Double], Double)): MetricsMap = {
val scores: Array[Double] = scoresAndLabels._1
val label: Label = scoresAndLabels._2.toInt
val trueClassScore: Double = scores(label)
// The label may be unseen during model training, so treat scores for unseen classes as all being zero
val trueClassScore: Double = scores.lift(label).getOrElse(0.0)
val topNsAndScores: Map[Label, Array[(Double, Int)]] = topNs.map(t => t -> scores.zipWithIndex.sortBy(-_._1)
.take(t)).toMap
val topNScores: Map[Label, Array[Double]] = topNsAndScores.mapValues(_.map(_._1))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -405,6 +405,16 @@ class MultiClassificationModelSelectorTest extends FlatSpec with TestSparkContex
.setInput(label, features)
testEstimator.fit(bigDataWithMeta)

val prediction = testEstimator.getOutput()
val model = testEstimator.fit(bigDataWithMeta)
val transformedBigData = model.transform(bigDataWithMeta)

// Verify that multiclass metrics can be computed even when unseen labels may be present
val evaluatorMulti = new OpMultiClassificationEvaluator()
.setLabelCol(label)
.setPredictionCol(prediction)
noException should be thrownBy evaluatorMulti.evaluateAll(transformedBigData)

val (numLabelsInCutter, numLabelsInCutterMetadata) = {
import scala.language.reflectiveCalls
val df = cutter.prevalidationValForTest
Expand Down