Added fit and score times for learning_curve #13938
What does this implement/fix? Explain your changes.
It would be interesting to have access to the fit and score times of the estimators while the learning curves are being computed. This can be done very easily because …
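Here is a minimal sketch of the resulting usage, assuming a scikit-learn release that ships this change (0.22 or later). With `return_times=True`, `learning_curve` returns `fit_times` and `score_times` after the usual three arrays. Each has the shape (number of training sizes, number of CV splits). The digits dataset, `GaussianNB` and the `ShuffleSplit` settings are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

# With return_times=True, five arrays are returned instead of three
train_sizes, train_scores, test_scores, fit_times, score_times = learning_curve(
    GaussianNB(), X, y, cv=cv,
    train_sizes=np.linspace(0.1, 1.0, 5),
    return_times=True,
)

# Times are in seconds, one row per training size, one column per split
print(fit_times.shape)    # (5, 5)
print(score_times.shape)  # (5, 5)
```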
Any other comments?
Here is how fit and score times can be used to get valuable information:
Using the following setup:
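The original setup is not reproduced here. A comparable one is assumed purely for illustration. It runs the same curve for a fast estimator (`GaussianNB`) and a slower one (`SVC`) on the digits dataset and keeps every returned array:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
# Random 80/20 splits; more splits give smoother curves but take longer
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
train_sizes = np.linspace(0.1, 1.0, 5)

estimators = {"GaussianNB": GaussianNB(), "SVC": SVC(gamma=0.001)}
results = {}
for name, est in estimators.items():
    sizes, tr, te, fit_t, score_t = learning_curve(
        est, X, y, cv=cv, train_sizes=train_sizes, return_times=True)
    results[name] = {
        "sizes": sizes,
        "train_scores": tr,
        "test_scores": te,
        "fit_times": fit_t,
        "score_times": score_t,
    }
```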
Having the fit and score times helps us plot the following:
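The first kind of plot shows fit time as a function of the number of training samples. Below is a sketch of the aggregation behind it: the mean and standard deviation over CV splits, i.e. over axis 1. The plotting call is left out, and the estimator is chosen arbitrarily.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

sizes, _, _, fit_times, _ = learning_curve(
    SVC(gamma=0.001), X, y, cv=cv,
    train_sizes=np.linspace(0.1, 1.0, 5), return_times=True)

# One value per training size: average and spread over the CV splits
fit_mean = fit_times.mean(axis=1)
fit_std = fit_times.std(axis=1)

for n, m, s in zip(sizes, fit_mean, fit_std):
    print(f"{int(n):5d} samples: fit {m:.4f} s +/- {s:.4f} s")
```

With matplotlib, `sizes` against `fit_mean` plus a `fill_between` band of `fit_mean ± fit_std` would give the usual curve.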
And also plots such as these:
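A second useful view plots the mean test score against the mean fit time. This shows how much accuracy each additional second of training buys. Here is a sketch of the data for such a curve, using an assumed setup:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

_, _, test_scores, fit_times, _ = learning_curve(
    SVC(gamma=0.001), X, y, cv=cv,
    train_sizes=np.linspace(0.1, 1.0, 5), return_times=True)

fit_mean = fit_times.mean(axis=1)
score_mean = test_scores.mean(axis=1)

# Sort by fit time so the points can be drawn as a line (x = seconds, y = score)
order = np.argsort(fit_mean)
score_vs_time = np.column_stack([fit_mean[order], score_mean[order]])
print(score_vs_time)
```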
As you can see, it is easy to determine the best estimator for the considered dataset, taking into account the
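One way to make that trade-off explicit is sketched below. The selection rule picks the highest mean test score among estimators whose mean fit time on the full training set stays under a budget. Both the rule and the budget value are assumptions for illustration, not part of this PR.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

summary = {}
for name, est in {"GaussianNB": GaussianNB(), "SVC": SVC(gamma=0.001)}.items():
    # train_sizes=[1.0] evaluates only the full training set
    _, _, test_scores, fit_times, _ = learning_curve(
        est, X, y, cv=cv, train_sizes=[1.0], return_times=True)
    # (mean test score, mean fit time in seconds) for the largest training size
    summary[name] = (test_scores[-1].mean(), fit_times[-1].mean())

budget = 1.0  # seconds; hypothetical time budget
eligible = {k: v for k, v in summary.items() if v[1] <= budget}
best = max(eligible, key=lambda k: eligible[k][0]) if eligible else None

for name, (score, t) in summary.items():
    print(f"{name}: score {score:.3f}, fit {t:.4f} s")
print("best within budget:", best)
```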
NicolasHug left a comment
A few comments; looks good in general.
I'm not a huge fan of returning tuples of different sizes depending on the input, but I guess it's too late to return a Bunch anyway.
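This is the behaviour in question: the length of the returned tuple depends on `return_times`, so callers must know the flag to unpack the result. The example uses small synthetic data to keep the check fast. The dataset and estimator are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=200, random_state=0)
est = LogisticRegression()

out_default = learning_curve(est, X, y, cv=3)
out_timed = learning_curve(est, X, y, cv=3, return_times=True)

print(len(out_default))  # 3
print(len(out_timed))    # 5
```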
@thomasjpfan wrote on Gitter:
Are you trying to suggest changing the return value without deprecation? I don't think that's a good idea, certainly not unless you have a return value that can unpack as three elements.
Or are you asking for feedback on whether we should consider deprecating the tuple and moving to a Bunch? I agree that a deprecation would be disruptive to users and to tutorials that use this function, so I'm not entirely fond of it.
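A return value that still "unpacks as three elements" could look like the following. `LearningCurveResult` is a hypothetical class, not a scikit-learn API. It sketches how the legacy `a, b, c = ...` pattern could keep working while the extra arrays remain reachable as attributes.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

import numpy as np


@dataclass
class LearningCurveResult:
    """Hypothetical return type that unpacks like the legacy 3-tuple."""

    train_sizes: np.ndarray
    train_scores: np.ndarray
    test_scores: np.ndarray
    fit_times: Optional[np.ndarray] = None
    score_times: Optional[np.ndarray] = None

    def __iter__(self) -> Iterator[np.ndarray]:
        # Only the three legacy fields take part in unpacking, so existing
        # code such as `a, b, c = learning_curve(...)` would keep working.
        return iter((self.train_sizes, self.train_scores, self.test_scores))


# Dummy arrays shaped like 5 training sizes x 3 CV splits
result = LearningCurveResult(
    train_sizes=np.array([10, 20, 30, 40, 50]),
    train_scores=np.ones((5, 3)),
    test_scores=np.ones((5, 3)),
    fit_times=np.zeros((5, 3)),
    score_times=np.zeros((5, 3)),
)

train_sizes, train_scores, test_scores = result  # legacy unpacking
print(result.fit_times.shape)  # (5, 3)
```

The sketch only covers unpacking. Code that indexes the old tuple (`res[0]`) or calls `len()` on it would need `__getitem__` and `__len__` as well.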
If we want to change to a Bunch, then instead of
I was looking for feedback on such an API change (I will be more direct next time). Even in the example we have something like this:
```python
train_sizes, train_scores, test_scores, fit_times, _ = \
    learning_curve(estimator, X, y, cv=cv, n_jobs=n_jobs,
                   train_sizes=train_sizes, return_times=True)
```
Every time I see
Otherwise lgtm. We can separately consider a return_bunch option, I suppose.
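A separate `return_bunch` option could behave roughly like the wrapper below. `learning_curve_bunch` is hypothetical and not a scikit-learn function. It shows the attribute-style access that `sklearn.utils.Bunch` would give instead of positional unpacking with `_` placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.utils import Bunch


def learning_curve_bunch(estimator, X, y, **kwargs):
    """Hypothetical wrapper (not a scikit-learn API) returning a Bunch."""
    sizes, train_scores, test_scores, fit_times, score_times = learning_curve(
        estimator, X, y, return_times=True, **kwargs)
    return Bunch(
        train_sizes=sizes,
        train_scores=train_scores,
        test_scores=test_scores,
        fit_times=fit_times,
        score_times=score_times,
    )


X, y = make_classification(n_samples=200, random_state=0)
res = learning_curve_bunch(LogisticRegression(), X, y, cv=3)

# Named access: no need to remember positions or pad with `_`
print(res.fit_times.shape)  # (5, 3)
```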
Please add an entry to the change log at
@H4dr1en, please look at https://github.com/scikit-learn/scikit-learn/pull/13938/files and revert all the unrelated changes.