PUBDEV-7831: Add new learning curve plotting function/method #5164
Merged
25 commits
5f2f507 Add prototype implementation of learning curve in R (tomasfryda)
d4af1d8 Update R version (tomasfryda)
398497b Fix NPE when no scoring history available (tomasfryda)
e8ce498 Add initial python version (tomasfryda)
5451e61 Unify colors between R and Python (tomasfryda)
c76f80c Add error for models without scoring history in python (tomasfryda)
06b5fca Fix alpha selection in GAM/GLM (tomasfryda)
c79b90c Use glm_model_summary as model_summary in GAMs (tomasfryda)
c1d532a Add tests and fix bugs (tomasfryda)
08ed93a Fix python cv_ribbon default override (tomasfryda)
c44f710 Change default colors and improve R legend (tomasfryda)
694ab40 Fix logic error in R (tomasfryda)
28e4fe7 Add coxPH and rename cv_individual_lines to cv_lines (tomasfryda)
019230d Add CoxPH and IsolationForest (tomasfryda)
9cbd15a Map stopping metric to metric in scoring history (tomasfryda)
adcfcfe Adjust docstring (tomasfryda)
059b00a Add examples to docstings and fix R cran check (tomasfryda)
e1ebe85 Fix legend in matplotlib2 (tomasfryda)
19e8c0d Incorporate suggestions from MLI meeting (tomasfryda)
dae0c4d Add more docstrings and make logloss as default metric for multiple s… (tomasfryda)
37aee53 Copy TwoDimTable as in GAM instead of clone (tomasfryda)
c94895d Remove matplotlib import at the top of the _explain.py file (tomasfryda)
634e104 Move GAM specific modification of ModelBase to h2o-bindings/bin/custo… (tomasfryda)
2b0bd96 Assign default implementation to learning curve plot that complains a… (tomasfryda)
996acbe Adapt to the new features from Wendy's PR (tomasfryda)
@@ -936,6 +936,32 @@ public void cv_mainModelScores(int N, ModelMetrics.MetricBuilder mbs[], ModelBui

    Log.info(mainModel._output._cross_validation_metrics.toString());
    mainModel._output._cross_validation_metrics_summary = makeCrossValidationSummaryTable(cvModKeys);

    // Put cross-validation scoring history to the main model
    if (mainModel._output._scoring_history != null) { // check if scoring history is supported (e.g., NaiveBayes doesn't)
      mainModel._output._cv_scoring_history = new TwoDimTable[cvModKeys.length];
      for (int i = 0; i < cvModKeys.length; i++) {
        TwoDimTable sh = cvModKeys[i].get()._output._scoring_history;
        String[] rowHeaders = sh.getRowHeaders();
        String[] colTypes = sh.getColTypes();
        int tableSize = rowHeaders.length;
        int colSize = colTypes.length;
        TwoDimTable copiedScoringHistory = new TwoDimTable(
            sh.getTableHeader(),
            sh.getTableDescription(),
            sh.getRowHeaders(),
            sh.getColHeaders(),
            sh.getColTypes(),
            sh.getColFormats(),
            sh.getColHeaderForRowHeaders());
        for (int rowIndex = 0; rowIndex < tableSize; rowIndex++) {
          for (int colIndex = 0; colIndex < colSize; colIndex++) {
            copiedScoringHistory.set(rowIndex, colIndex, sh.get(rowIndex, colIndex));
          }
        }
        mainModel._output._cv_scoring_history[i] = copiedScoringHistory;
      }
    }

    if (!_parms._keep_cross_validation_models) {
      int count = Model.deleteAll(cvModKeys);
      Log.info(count+" CV models were removed");

Review comment (on `mainModel._output._cv_scoring_history[i] = copiedScoringHistory;`): Does cloning the object not work for this use case? It seems like you are just copying the table.

michalkurka marked this conversation as resolved.
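The review question turns on the difference between cloning and copying. Below is a minimal plain-Java sketch using a hypothetical `SimpleTable` class (it is not H2O's `TwoDimTable`, whose `clone()` behavior is not shown in this thread). The default `Object.clone()` copies field references, so a clone shares its cell array with the source, while a cell-by-cell copy like the loop in the diff produces an independent table. The thread does not say this was the motivation; the commit list only says the copy follows the approach used in GAM ("Copy TwoDimTable as in GAM instead of clone").

```java
// Hypothetical stand-in for a 2-D table such as a scoring history.
// This is NOT H2O's TwoDimTable; it only illustrates clone vs. copy.
public class TableCopyDemo {

    static class SimpleTable implements Cloneable {
        final String[] colHeaders;
        final Object[][] cells;

        SimpleTable(String[] colHeaders, int rows) {
            this.colHeaders = colHeaders;
            this.cells = new Object[rows][colHeaders.length];
        }

        void set(int row, int col, Object value) { cells[row][col] = value; }

        Object get(int row, int col) { return cells[row][col]; }

        // Default Object.clone() copies field references (a shallow copy),
        // so the clone and the original share the same cells array.
        @Override
        public SimpleTable clone() {
            try {
                return (SimpleTable) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // unreachable: class implements Cloneable
            }
        }

        // Cell-by-cell copy, in the spirit of the loop in the diff:
        // the new table gets its own cells array.
        SimpleTable copy() {
            SimpleTable result = new SimpleTable(colHeaders.clone(), cells.length);
            for (int r = 0; r < cells.length; r++) {
                for (int c = 0; c < colHeaders.length; c++) {
                    result.set(r, c, get(r, c));
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        SimpleTable original = new SimpleTable(new String[] {"iterations", "deviance"}, 1);
        original.set(0, 0, 1);
        original.set(0, 1, 0.5);

        SimpleTable cloned = original.clone();
        SimpleTable copied = original.copy();

        original.set(0, 1, 0.25); // mutate the source after cloning/copying

        System.out.println(cloned.get(0, 1)); // 0.25: shares cells with the original
        System.out.println(copied.get(0, 1)); // 0.5: independent copy
    }
}
```

Mutating the source after the fact changes what the shallow clone sees but leaves the element-wise copy untouched, which is the property you want when the CV models may later be deleted or modified.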
`scoring_iteration_interval` might change depending on benchmark results. When `_valid` is specified, we use lambda search, which provides plenty of information for the learning curve. Otherwise, lambda search is off and the scoring history often contains just one or two entries. Even a metalearner trained on a 250k-row subset of the Airlines dataset has fewer than 10 iterations, so even this `score_iteration_interval` might be too big.
Based on benchmark results from @wendycwong, we decided to keep the value at 5: it has less than a 2% performance impact, but it significantly improves the learning curve in some situations. This affects only the AUTO metalearner in SE (Stacked Ensemble).
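To make the iteration arithmetic concrete: if scoring happens only every `interval` iterations, a short run yields very few scoring-history rows. This is a minimal sketch built on a hypothetical rule that scores every `interval`-th iteration plus the final one. The exact rule GLM uses is not given in this thread. Under that assumption, a run with fewer than 10 iterations and an interval of 5 gives the learning curve at most two points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of interval-based scoring. It ASSUMES scoring happens at
// every multiple of `interval` and always at the final iteration; this is an
// illustration, not H2O GLM's actual scoring rule.
public class ScoringPoints {

    static List<Integer> scoredIterations(int totalIterations, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        List<Integer> scored = new ArrayList<>();
        // Score at interval, 2*interval, ... up to the last completed iteration.
        for (int it = interval; it <= totalIterations; it += interval) {
            scored.add(it);
        }
        // Also score the final iteration if the loop did not already hit it.
        int last = scored.isEmpty() ? 0 : scored.get(scored.size() - 1);
        if (totalIterations > 0 && last != totalIterations) {
            scored.add(totalIterations);
        }
        return scored;
    }

    public static void main(String[] args) {
        // A short run (fewer than 10 iterations), as described in the thread.
        System.out.println(scoredIterations(9, 5)); // [5, 9]
        System.out.println(scoredIterations(9, 1)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(scoredIterations(3, 5)); // [3]
    }
}
```

The trade-off discussed above follows directly: a smaller interval gives a denser curve at the cost of more scoring passes, while a larger one can leave a short run with a single point.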