IterativeImputer cannot accept PLSRegression() as an estimator due to "shape mismatch" #19352

Open
firshu opened this issue Feb 5, 2021 · 6 comments · Fixed by leechu27/CSCD01-scrumkingdom-scikit-learn#2 or SkuaD01/scikit-learn#1

firshu commented Feb 5, 2021

Describe the bug

When the estimator is set to PLSRegression(), IterativeImputer raises a ValueError ("shape mismatch") at line 348 of the module '_iterative.py'.

Steps/Code to Reproduce

Example:

import numpy as np

from sklearn.datasets import fetch_california_housing
from sklearn.cross_decomposition import PLSRegression
from sklearn.experimental import enable_iterative_imputer  # noqa
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(42)

X_california, y_california = fetch_california_housing(return_X_y=True)
X_california = X_california[:400]
y_california = y_california[:400]

def add_missing_values(X_full, y_full):
    n_samples, n_features = X_full.shape

    # Add missing values in 75% of the lines
    missing_rate = 0.75
    n_missing_samples = int(n_samples * missing_rate)

    missing_samples = np.zeros(n_samples, dtype=bool)
    missing_samples[: n_missing_samples] = True

    rng.shuffle(missing_samples)
    missing_features = rng.randint(0, n_features, n_missing_samples)
    X_missing = X_full.copy()
    X_missing[missing_samples, missing_features] = np.nan
    y_missing = y_full.copy()

    return X_missing, y_missing

X_miss_california, y_miss_california = add_missing_values(
    X_california, y_california)

imputer = IterativeImputer(estimator=PLSRegression(n_components=2))

X_imputed = imputer.fit_transform(X_miss_california)
print(X_imputed)

Expected Results (obtained after applying the workaround below):

[[   8.3252       41.            6.98412698 ...    2.55555556
    37.88       -122.25930206]
 [   8.3014       21.            6.23813708 ...    2.10984183
    37.86       -122.22      ]
 [   7.2574       52.            8.28813559 ...    2.80225989
    37.85       -122.24      ]
 ...
 [   3.60438721   50.            5.33480176 ...    2.30396476
    37.88       -122.29      ]
 [   5.1675       52.            6.39869281 ...    2.44444444
    37.89       -122.29      ]
 [   5.1696       52.            6.11590296 ...    2.70619946
    37.8709526  -122.29      ]]

Actual Results

File "/home/hushsh/py3/lib/python3.6/site-packages/sklearn/impute/_iterative.py", line 348, in _impute_one_feature
    X_filled[missing_row_mask, feat_idx] = imputed_values
ValueError: shape mismatch: value array of shape (27,1) could not be broadcast to indexing result of shape (27,) 
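
The failing assignment can be reproduced with NumPy alone. The following is a minimal sketch, not code from scikit-learn: the variable names are borrowed from the traceback, and the array sizes and values are made up for illustration.

```python
import numpy as np

# Stand-ins for the variables named in the traceback (sizes and values are made up).
X_filled = np.zeros((5, 3))
missing_row_mask = np.array([True, False, True, True, False])
feat_idx = 1

# A column vector of shape (3, 1), like the 2D predictions reported in this issue.
imputed_values = np.array([[1.0], [2.0], [3.0]])

try:
    # The indexing result has shape (3,), so a (3, 1) value cannot be broadcast to it.
    X_filled[missing_row_mask, feat_idx] = imputed_values
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Flattening the predictions to shape (3,) makes the same assignment valid.
X_filled[missing_row_mask, feat_idx] = imputed_values.ravel()
```

Here `error_message` holds the ValueError text from the first attempt, and the second assignment fills only the masked rows of column `feat_idx`.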

Versions

System:
python: 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
executable: /home/hushsh/raid_data/py3/bin/python
machine: Linux-5.4.0-60-generic-x86_64-with-LinuxMint-19.3-tricia

Python dependencies:
pip: 21.0.1
setuptools: 47.3.1
sklearn: 0.24.1
numpy: 1.18.1
scipy: 1.4.1
Cython: 0.29.15
pandas: 1.1.3
matplotlib: 3.1.2
joblib: 0.14.1
threadpoolctl: 2.1.0

Built with OpenMP: True

My workaround that fixed the bug: insert the following lines before line 348 of _iterative.py

shape_imputed_values = imputed_values.shape
if len(shape_imputed_values) > 1:
    # Converting the 2D array to a 1D array fixes the bug:
    imputed_values = imputed_values.reshape(shape_imputed_values[0])

@NicolasHug
Member

Thanks @firshu, I can reproduce.

The fix is probably to simply call ravel() on the output of predict for single-target predictions.
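
As a rough illustration of the ravel() idea, the sketch below flattens predictions only when they form a single column. The helper name `as_1d_predictions` is hypothetical, and this is not the change that was made or proposed in scikit-learn itself.

```python
import numpy as np


def as_1d_predictions(predictions):
    """Hypothetical helper: return single-target predictions as a 1D array."""
    predictions = np.asarray(predictions)
    if predictions.ndim == 2 and predictions.shape[1] == 1:
        # Shape (n_samples, 1) becomes (n_samples,).
        return predictions.ravel()
    if predictions.ndim != 1:
        # Multi-target output cannot be assigned to a single feature column.
        raise ValueError(
            f"expected single-target predictions, got shape {predictions.shape}"
        )
    return predictions
```

Checking the shape before flattening avoids silently turning a multi-target output into one long vector, which a bare ravel() would do.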

Would you like to submit a PR for this? On top of the fix we'd need a small non-regression test to make sure the shape is correct.

NicolasHug added the Bug and Easy (well-defined and straightforward way to resolve) labels and removed the Bug: triage and module:impute labels on Feb 5, 2021
@firshu
Author

firshu commented Feb 5, 2021 via email

@BatoolMM

BatoolMM commented Feb 5, 2021

Can I work on this issue, please?

@NicolasHug
Member

No problem @firshu
@BatoolMM yes please go ahead!

@ghost

ghost commented Jun 25, 2021

@NicolasHug I took this issue because the earlier PRs had stalled. Can you please check PR #20355?

thomasjpfan added the Hard (hard level of difficulty) label and removed the Easy (well-defined and straightforward way to resolve) label on Apr 17, 2022
@thomasjpfan
Member

@zak34drexel This issue is not a good first issue because the solution is not simple, as discussed in #20355. For a good first issue, I recommend #21350.
