_apply_predict_method boolean indexing incompatible with standard sklearn format #95

@samtalki

Hello,

When fitting and using a PiecewiseRegressor (defined in mlinsights/mlmodel/piecewise_estimator.py), a TypeError is consistently raised at prediction time when the data are in the standard (n_samples, n_features) sklearn format. The boolean indexing on line 296 of the _apply_predict_method function appears to be the cause.

For example, for the following data:

print(X_train.shape, y_train.shape)
print(X_test.shape)

(23476, 1) (23476, 1)
(11564, 1)

Attempting to train the model in this fashion:


from sklearn.tree import DecisionTreeRegressor
from mlinsights.mlmodel import PiecewiseRegressor

model = PiecewiseRegressor(verbose=True,
                          binner=DecisionTreeRegressor(min_samples_leaf=300))

model.fit(X_train,y_train)
vvc_predict = model.predict(X_test)

plot_customer(customer1)
plt.plot(X_test,vvc_predict,'g.',label='VVC_predict',alpha=0.2)
plt.legend()

Yields the following errors:

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  60 out of  60 | elapsed:    0.0s finished

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-63d7c4526d72> in <module>
      6 
      7 model.fit(X_train,y_train)
----> 8 vvc_predict = model.predict(X_test)
      9 
     10 plot_customer(customer1)

~\anaconda3\lib\site-packages\mlinsights\mlmodel\piecewise_estimator.py in predict(self, X)
    350         :return: predictions
    351         """
--> 352         return self._apply_predict_method(
    353             X, "predict", _predict_piecewise_estimator, self.dim_)
    354 

~\anaconda3\lib\site-packages\mlinsights\mlmodel\piecewise_estimator.py in _apply_predict_method(self, X, method, parallelized, dimout)
    294             if ind is None:
    295                 continue
--> 296             pred[ind] = p
    297             indall = numpy.logical_or(indall, ind)  # pylint: disable=E1111
    298 

TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions

Judging from the TypeError, NumPy requires the assigned value to be 0- or 1-dimensional when assigning through a boolean index. However, reshaping the data to work around this is incompatible with the mlinsights library.
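
The same TypeError can be reproduced with NumPy alone. The sketch below is not mlinsights code. The names pred, ind and p mirror the traceback, but the shapes are assumptions: pred is taken to be a 1-D buffer with one slot per sample, and p a 2-D array of shape (k, 1), which is what a regressor fitted on a (n_samples, 1) target typically returns. Flattening p before the assignment is shown only as one possible workaround, not as the library's fix.

```python
import numpy as np

# Assumed shapes (not taken from mlinsights itself):
pred = np.zeros(5)                                  # 1-D output buffer
ind = np.array([True, False, True, False, True])   # rows of one bucket
p = np.array([[1.0], [2.0], [3.0]])                 # 2-D predictions, shape (3, 1)

# Assigning a 2-D value through a boolean mask on a 1-D array fails.
try:
    pred[ind] = p
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print("assignment failed:", exc)

# Possible workaround: flatten the predictions to 1-D first.
pred[ind] = p.ravel()
print(pred)
```

If the assumption about the shapes holds, the error comes from the (n_samples, 1) target rather than from X, so flattening the per-bucket predictions (or the target) would be the place to look.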

I attempted to fix this with a mask in #94, which lets me use PiecewiseRegressor successfully, but judging by the failing checks my contribution does not seem to be correct.
