I'm trying to combine OptBinning with StatsModels estimators, using the sklearn2pmml.statsmodels.StatsModelsClassifier estimator as a StatsModels/SkLearn bridge.
StatsModels estimators generally do not support sample weights. Those that do expect sample weights as a constructor argument, not as a fit method argument.
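For instance, statsmodels' GLM accepts frequency weights only through its constructor; a small illustration with synthetic data:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.random((100, 3)))
y = rng.integers(0, 2, size=100)
w = rng.integers(1, 5, size=100)

# Weights go into the model constructor...
model = sm.GLM(y, X, family=sm.families.Binomial(), freq_weights=w)
# ...because GLM.fit() has no sample_weight argument
result = model.fit()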
Now, if we don't care about sample weights, it should be possible to train a simple binary classification scorecard like this:
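The original script is not reproduced in the issue; the following minimal sketch is consistent with the traceback below (the iris data preparation and the choice of statsmodels.api.Logit as the wrapped model class are assumptions):

import statsmodels.api as sm
from optbinning import BinningProcess, Scorecard
from sklearn.datasets import load_iris
from sklearn2pmml.statsmodels import StatsModelsClassifier

iris = load_iris(as_frame=True)
iris_X = iris.data
# Binarize the three-class iris target to get a binary problem
iris_y = (iris.target == 0).astype(int)

binning_process = BinningProcess(variable_names=list(iris_X.columns))
estimator = StatsModelsClassifier(sm.Logit)

scorecard = Scorecard(binning_process=binning_process, estimator=estimator)
scorecard.fit(iris_X, iris_y)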
When running this example with OptBinning 0.17.2, the following TypeError is raised:
Traceback (most recent call last):
  File "main.py", line 25, in <module>
    scorecard.fit(iris_X, iris_y)
  File "/home/user/.local/lib/python3.9/site-packages/optbinning/scorecard/scorecard.py", line 303, in fit
    return self._fit(X, y, sample_weight, metric_special, metric_missing,
  File "/home/user/.local/lib/python3.9/site-packages/optbinning/scorecard/scorecard.py", line 560, in _fit
    self.estimator_.fit(X_t, y, sample_weight)
TypeError: fit() takes 3 positional arguments but 4 were given
Essentially, the problem is that the Scorecard.fit(X, y) method always passes sample_weight as a third positional argument, even if no such fit param is present in the current context: https://github.com/guillermo-navas-palencia/optbinning/blob/v0.17.2/optbinning/scorecard/scorecard.py#L559-L560

A quick fix would be to check if the sample_weight fit param is set, and only then pass it forward to the estimator_ object.
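A minimal sketch of that check (an illustration only, not an actual patch; the surrounding _fit context is taken from the source lines linked above):

if sample_weight is not None:
    # Forward the weights only when the caller actually supplied them,
    # so estimators whose fit() lacks a sample_weight argument keep working
    self.estimator_.fit(X_t, y, sample_weight=sample_weight)
else:
    self.estimator_.fit(X_t, y)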
While we are at this, perhaps it would make sense to support a workflow where sample weights are passed only to the binning_process_ object or the estimator_ object? Something similar to Scikit-Learn's "double underscore prefix" mechanism?
For example:
scorecard.fit(X, y, binning_process__sample_weight=..., estimator__sample_weight=None)
I agree with the quick change you proposed for sample weights, although optbinning is designed mainly with sklearn compatibility in mind.
Regarding the "double underscore prefix" mechanism, I have some doubts. To support it, I would rely on the sklearn Pipeline class to re-implement the scorecard class internals.
In the Scorecard case, the algorithm would be something like this:
def fit(X, y, **fit_params):
    # Params that are exclusive to specific components - have either
    # a "binning_process" or an "estimator" prefix
    binning_process_fit_params = _extract_step_params("binning_process", fit_params)
    estimator_fit_params = _extract_step_params("estimator", fit_params)

    # Params that are common between components - no prefix
    if len(fit_params) > 0:
        binning_process_fit_params.update(fit_params)
        estimator_fit_params.update(fit_params)

    # Actual fit
    binning_process_.fit(X, y, **binning_process_fit_params)
    estimator_.fit(X, y, **estimator_fit_params)
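The _extract_step_params helper is left undefined in the sketch above; one plausible implementation (an assumption: it pops the prefixed keys out of the common dict and strips the prefix) could be:

def _extract_step_params(prefix, fit_params):
    # Pop keys like "<prefix>__sample_weight" out of the common dict and
    # return them with the prefix stripped, e.g. {"sample_weight": ...}
    step_params = {}
    for key in list(fit_params):
        if key.startswith(prefix + "__"):
            step_params[key[len(prefix) + 2:]] = fit_params.pop(key)
    return step_params

With this in place, a call like scorecard.fit(X, y, binning_process__sample_weight=w) would route w only to the binning process.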
This is just an idea. You decide whether it's worth having separate fit param sets for individual Scorecard components or not.