Fix sorting in Prediction type for multiclass classification and add stronger tests #213

Merged
merged 13 commits into from
Feb 6, 2019

Jauntbox (Contributor) commented Feb 4, 2019

Related issues
None

Describe the proposed solution
Sorts keys by their final index (the integer class index that follows the last underscore) rather than by the default string sort, which jumbles classes together. Also adds many tests relevant to multiclass classification and threshold metrics.

Describe alternatives you've considered
N/A

Additional context
Previously, the threshold metrics (only those, not the metrics from Spark itself) were out of order when there were more than 10 classes, because of this issue.
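
To illustrate the ordering problem described here, the following is a minimal sketch and not the project's code. The key names (`probability_0` and so on) and the object name `KeyOrderSketch` are assumptions chosen for illustration; only the idea of sorting by the integer after the last underscore comes from this pull request.

```scala
object KeyOrderSketch {
  // Hypothetical keys of the form "<name>_<classIndex>" for 12 classes
  val keys: Seq[String] = (0 until 12).map(i => s"probability_$i")

  // Default lexicographic ordering: "probability_10" sorts before "probability_2"
  val stringSorted: Seq[String] = keys.sorted

  // Ordering by the integer after the last underscore restores class order
  val indexSorted: Seq[String] =
    stringSorted.sortBy(k => k.substring(k.lastIndexOf("_") + 1).toInt)
}
```

With fewer than 11 classes every index is a single digit, so the string sort and the numeric sort agree, which is consistent with the problem only appearing for more than 10 classes.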

codecov bot commented Feb 4, 2019

Codecov Report

Merging #213 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #213      +/-   ##
==========================================
+ Coverage   86.36%   86.37%   +0.01%     
==========================================
  Files         310      310              
  Lines       10136    10137       +1     
  Branches      351      548     +197     
==========================================
+ Hits         8754     8756       +2     
+ Misses       1382     1381       -1

Impacted Files | Coverage Δ
.../scala/com/salesforce/op/features/types/Maps.scala | 92.77% <100%> (+0.08%) ⬆️
...es/src/main/scala/com/salesforce/op/OpParams.scala | 85.71% <0%> (-4.09%) ⬇️
.../salesforce/op/features/FeatureBuilderMacros.scala | 100% <0%> (+100%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6ee7cc7...1ed252d.


// Need to make sure we sort the keys by their final index, which comes after an underscore in the apply function
private def keysStartsWith(name: String): Array[String] = value.keys.filter(_.startsWith(name)).toArray
  .sortBy(s => s.substring(s.lastIndexOf("_") + 1).toInt)
Collaborator

s.split('_').last.toInt

val numClasses = 5
val numRows = 10
val vectors = Seq.fill[OPVector](numRows)(Array.fill(numClasses)(4.2).toOPVector)
println(s"vectors: $vectors")
Collaborator

please remove prints

Collaborator

or replace with log

tovbinm (Collaborator) commented Feb 5, 2019

Sorting condition did not make it?

@Jauntbox Jauntbox merged commit 78bb3b9 into master Feb 6, 2019
@Jauntbox Jauntbox deleted the km/multi-metrics branch February 6, 2019 07:01

// Need to make sure we sort the keys by their final index, which comes after an underscore in the apply function
private def keysStartsWith(name: String): Array[String] = value.keys.filter(_.startsWith(name)).toArray
  .sortBy(_.split('_').last.toInt)
Contributor

Depending on how wide the keys are, it would be more efficient to scan each key backwards for _ with lastIndexOf and take the substring after it. That would also produce less transient garbage: a single string instead of an array.

Collaborator

Too late for the party :)
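
For reference, the two suffix-extraction variants discussed in this thread can be compared side by side. This is a sketch under assumptions: the object and method names (`SuffixIndexSketch`, `viaLastIndexOf`, `viaSplit`) are hypothetical, and the keys are assumed to end in an underscore followed by an integer.

```scala
object SuffixIndexSketch {
  // Variant from the first revision: one backwards scan and a single substring
  def viaLastIndexOf(key: String): Int =
    key.substring(key.lastIndexOf('_') + 1).toInt

  // Variant that was merged: builds an array of all '_'-separated parts first
  def viaSplit(key: String): Int =
    key.split('_').last.toInt
}
```

For keys of the assumed shape both functions return the same index, so the choice between them is about allocation and readability rather than behaviour.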

@Jauntbox Jauntbox mentioned this pull request Feb 8, 2019
ericwayman pushed a commit that referenced this pull request Feb 8, 2019
@tovbinm tovbinm mentioned this pull request Jul 11, 2019