add feature_names_in_ and n_features_in_ attributes to dummy estimators #27937

Merged
merged 8 commits into from Jan 17, 2024
Changes from 4 commits
20 changes: 20 additions & 0 deletions doc/whats_new/v1.4.rst
@@ -31,6 +31,19 @@ random sampling procedures.
specified `tol`, for small values you will get more precise results.
:pr:`26721` by :user:`Christian Lorentzen <lorentzenchr>`.

.. note::

The lbfgs solver is the default, so this change might affect many models.

This change also means that with this new version of scikit-learn, the resulting
coefficients `coef_` and `intercept_` of your models will change for these two
solvers (when fit on the same data again). The amount of change depends on the
specified `tol`, for small values you will get more precise results.

- |Enhancement| :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor` now
have the `n_features_in_` and `feature_names_in_` attributes after `fit`.
:pr:`27937` by :user:`Marco vd Boom <tvdboom>`.

- |Fix| fixes a memory leak seen in PyPy for estimators using the Cython loss functions.
:pr:`27670` by :user:`Guillaume Lemaitre <glemaitre>`.

@@ -381,6 +394,13 @@ Changelog
version 1.6. Use the default value instead.
:pr:`27834` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.dummy`
.......................

- |Enhancement| :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor` now
have the `n_features_in_` and `feature_names_in_` attributes after `fit`.
:pr:`27937` by :user:`Marco vd Boom <tvdboom>`.

:mod:`sklearn.ensemble`
.......................

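The changelog entry only names the two new attributes. The plain-Python sketch below approximates the rule stated in the new docstrings: a feature count is always recorded, while feature names are recorded only when every column name is a string. It is not scikit-learn's implementation. `Table`, `Est`, and `record_feature_metadata` are hypothetical, and scikit-learn stores `feature_names_in_` as a NumPy array rather than a list.

```python
class Table:
    """Hypothetical DataFrame stand-in: only `columns` and `shape` are used."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]
        self.shape = (len(self.rows), len(self.columns))


def record_feature_metadata(estimator, X):
    """Hypothetical helper: set n_features_in_ (and maybe feature_names_in_).

    X is only inspected, never converted, in the spirit of the
    `cast_to_ndarray=False` argument used in the diff.
    """
    # Feature count: second dimension when X has a shape, else first-row length.
    if hasattr(X, "shape"):
        estimator.n_features_in_ = X.shape[1]
    else:
        estimator.n_features_in_ = len(X[0])

    # Feature names: only when X exposes column names that are all strings.
    columns = getattr(X, "columns", None)
    if columns is not None and all(isinstance(c, str) for c in columns):
        estimator.feature_names_in_ = list(columns)
    elif hasattr(estimator, "feature_names_in_"):
        # Drop names left over from an earlier fit on named data.
        del estimator.feature_names_in_
    return X


class Est:
    """Hypothetical empty estimator used only to hold the attributes."""


est = Est()
record_feature_metadata(est, Table(["age", "income"], [[25, 40.0], [32, 52.5]]))
print(est.n_features_in_, est.feature_names_in_)  # 2 ['age', 'income']

record_feature_metadata(est, [[1, 2, 3], [4, 5, 6]])
print(est.n_features_in_, hasattr(est, "feature_names_in_"))  # 3 False
```

The names are kept only for all-string columns, matching the docstring wording "Defined only when `X` has feature names that are all strings"; a plain nested list therefore yields a count but no names.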
18 changes: 18 additions & 0 deletions sklearn/dummy.py
@@ -110,6 +110,13 @@ class prior probabilities.
         Frequency of each class observed in `y`. For multioutput classification
         problems, this is computed independently for each output.

+    n_features_in_ : int
+        Number of features seen during :term:`fit`.
+
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during :term:`fit`. Defined only when `X` has
+        feature names that are all strings.
+
     n_outputs_ : int
         Number of outputs.

@@ -170,6 +177,8 @@ def fit(self, X, y, sample_weight=None):
         self : object
             Returns the instance itself.
         """
+        self._validate_data(X, cast_to_ndarray=False)
+
         self._strategy = self.strategy

         if self._strategy == "uniform" and sp.issparse(y):
@@ -488,6 +497,13 @@ class DummyRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         Mean or median or quantile of the training targets or constant value
         given by the user.

+    n_features_in_ : int
+        Number of features seen during :term:`fit`.
+
+    feature_names_in_ : ndarray of shape (`n_features_in_`,)
+        Names of features seen during :term:`fit`. Defined only when `X` has
+        feature names that are all strings.
+
     n_outputs_ : int
         Number of outputs.

@@ -545,6 +561,8 @@ def fit(self, X, y, sample_weight=None):
         self : object
             Fitted estimator.
         """
+        self._validate_data(X, cast_to_ndarray=False)
+
         y = check_array(y, ensure_2d=False, input_name="y")
         if len(y) == 0:
             raise ValueError("y must not be empty.")
2 changes: 1 addition & 1 deletion sklearn/tests/test_dummy.py
@@ -376,7 +376,7 @@ def test_quantile_invalid():

 def test_quantile_strategy_empty_train():
     est = DummyRegressor(strategy="quantile", quantile=0.4)
-    with pytest.raises(ValueError):
+    with pytest.raises(IndexError):
         est.fit([], [])


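The expected exception changes because `fit` now inspects `X` before it validates `y`. With an empty list there is no first sample from which to count features, so an `IndexError` appears before the `"y must not be empty."` check is reached. The sketch below mimics only that order of checks, assuming the feature count of a list input is read from its first row; `count_features` and `sketch_fit` are hypothetical and are not scikit-learn functions.

```python
def count_features(X):
    # Assumption: for a plain list, the feature count comes from the first sample,
    # so an empty list raises IndexError here.
    return X.shape[1] if hasattr(X, "shape") else len(X[0])


def sketch_fit(X, y):
    """Hypothetical fit: inspect X first, then validate y (same order as the diff)."""
    n_features = count_features(X)
    if len(y) == 0:
        raise ValueError("y must not be empty.")
    # Stand-in for the estimator's strategy: the mean of the training targets.
    return n_features, sum(y) / len(y)


cases = [
    ([[1.0], [2.0]], [3.0, 5.0]),  # valid input
    ([], []),                      # empty X: fails while counting features
    ([[1.0]], []),                 # non-empty X, empty y: reaches the y check
]
for X, y in cases:
    try:
        print(sketch_fit(X, y))
    except (IndexError, ValueError) as exc:
        print(type(exc).__name__)
# prints (1, 4.0), then IndexError, then ValueError
```

Swapping the two checks in `sketch_fit` would make the empty case raise `ValueError` again, which is the behavior the old test expected.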