Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export model selector defaults + metadata fixes #199

Merged
merged 8 commits into from
Jan 7, 2019

Conversation

tovbinm
Copy link
Collaborator

@tovbinm tovbinm commented Dec 18, 2018

Related issues

  1. It's impossible to reuse model selector default models & grids without copy pasting them from our code base.
  2. Currently model selector fails to produce model selector summary metadata when one of the models contain a stage with unsupported params (example - The problem of Xgboost #198).

Describe the proposed solution

  1. Expose model selector defaults on all model selectors (binary, multi, regression) to allow reuse.
  2. Skip unsupported metadata values when producing model selector summary metadata.

Describe alternatives you've considered
N/A

…ti, regression)

2. Skip unsupported metadata values when producing model selector summary metadata
@codecov
Copy link

codecov bot commented Dec 18, 2018

Codecov Report

Merging #199 into master will decrease coverage by 17.25%.
The diff coverage is 84.33%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #199       +/-   ##
===========================================
- Coverage    86.4%   69.14%   -17.26%     
===========================================
  Files         309      310        +1     
  Lines       10009    10018        +9     
  Branches      351      526      +175     
===========================================
- Hits         8648     6927     -1721     
- Misses       1361     3091     +1730
Impacted Files Coverage Δ
...a/com/salesforce/op/utils/spark/RichMetadata.scala 89.47% <ø> (+1.75%) ⬆️
...alesforce/op/stages/impl/tuning/DataBalancer.scala 84.76% <ø> (-11.43%) ⬇️
...om/salesforce/op/stages/impl/tuning/Splitter.scala 100% <ø> (ø) ⬆️
.../salesforce/op/stages/impl/tuning/DataCutter.scala 93.47% <ø> (-2.18%) ⬇️
...sforce/op/stages/impl/selector/ModelSelector.scala 98.07% <100%> (-0.11%) ⬇️
...op/stages/impl/selector/ModelSelectorFactory.scala 85.71% <100%> (ø) ⬆️
...sification/BinaryClassificationModelSelector.scala 96.42% <100%> (+0.27%) ⬆️
...op/stages/impl/selector/ModelSelectorSummary.scala 91.3% <100%> (ø) ⬆️
...m/salesforce/op/evaluators/EvaluationMetrics.scala 85.71% <100%> (ø) ⬆️
...ages/impl/regression/RegressionModelSelector.scala 96% <100%> (+0.76%) ⬆️
... and 100 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04cd7ff...ea8fcd7. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 18, 2018

Codecov Report

Merging #199 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #199   +/-   ##
=======================================
  Coverage   86.38%   86.38%           
=======================================
  Files         310      310           
  Lines       10019    10019           
  Branches      550      550           
=======================================
  Hits         8655     8655           
  Misses       1364     1364

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a4e8669...3883a54. Read the comment docs.

@tovbinm
Copy link
Collaborator Author

tovbinm commented Jan 3, 2019

@kinfaikan please review

}
// if models to use has been specified and the models have been specified - filter the models by the names
else if (
modelTypesToUse.distinct.sortBy(_.entryName) != modelDefaults.modelTypesToUse.distinct.sortBy(_.entryName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply use entire modelsAndParameters?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, because it would yield incorrect results: List(1,2) == List(2,1) // false. That the reason I am dedupping and sorting here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking whether it made more sense to ignore modelTypes and modelDefaults when modelsAndParameters was not empty.

@tovbinm tovbinm merged commit 421dc9e into master Jan 7, 2019
@tovbinm tovbinm deleted the mt/model-selector-improvements branch January 7, 2019 21:36
@Jauntbox Jauntbox mentioned this pull request Feb 8, 2019
@tovbinm tovbinm mentioned this pull request Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants