
ValueError: Found array with 0 sample(s) (shape=(0, 174)) while a minimum of 1 is required. #79

Closed
torivor opened this issue Aug 2, 2022 · 4 comments

Comments


torivor commented Aug 2, 2022

Hello, this package has been a lifesaver for my imputation needs. However, I recently encountered this error: "ValueError: Found array with 0 sample(s) (shape=(0, 174)) while a minimum of 1 is required." I can't seem to solve it. The error occurs while trying to fit_transform my Pandas DataFrame. Every column of the DataFrame has some values; none of them consists entirely of missing values, yet the package still throws this error. Hence, I don't know what triggered it. I assume this is due to a bug within the package.

I can send the data as it's publicly available from a Kaggle competition, but the size is too big. Should anyone require the data, I can email it, just send me a request at andreasparasian@gmail.com.

kearnz (Owner) commented Aug 3, 2022

Hi @torivor,

Can't say without looking at the dataset, but have a look at my answer in #65 and then the thread in #68.

Won't go into the details since they're covered in those threads, but in short: this error can occur when no row has all of its columns observed, even though no individual column is completely null.

Right now this is expected behavior given the algorithm design, although the error is not handled gracefully. In #68 you'll see a more optimal solution, one in which we use placeholders instead of listwise deletion. That's the right way to go; I just haven't written the code for it yet.

In the meantime I'd play with your feature set, maybe reducing the column space or sampling more rows if possible.
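To see how this can happen even when no column is fully null, here's a minimal toy sketch (made-up data, not the reporter's Kaggle set) showing that every column can contain observed values while no single row is complete, so listwise deletion leaves zero samples:

```python
import numpy as np
import pandas as pd

# Toy frame: each column has observed values, but the missing
# entries are staggered so that no row is fully observed.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, np.nan],
    "c": [1.0, 2.0, np.nan],
})

print(df.isnull().all())      # no column is completely null
complete_cases = df.dropna()  # listwise deletion keeps only fully observed rows
print(len(complete_cases))    # 0 -> a downstream fit sees 0 samples
```

With more features (e.g. the 174 columns in the traceback), the odds of at least one complete row shrink quickly, which is why the error shows up on wide datasets.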

torivor (Author) commented Aug 3, 2022 via email

torivor (Author) commented Aug 3, 2022 via email

kearnz (Owner) commented Sep 10, 2022

Apologies for the late reply. I looked into it more, and this is because of the issues I linked to. Right now autoimpute uses listwise deletion instead of placeholders. So if you use a lot of features, there must be at least one row where all of those features (except the imputed column) have a datum present. The chance of that being true shrinks as the number of features grows.

Again, see #65 and #68. The recommended solution for now is to experiment with your data. As a start, try using fewer columns for the imputation instead of all the features. I'm planning to work on moving from complete case to mean placeholders at some point soon, but I don't have a timeline for that yet.
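The difference between the two strategies can be sketched in plain pandas (toy data again; this is an illustration of the idea, not autoimpute's actual code path):

```python
import numpy as np
import pandas as pd

# Toy frame where the missing entries overlap so no row is complete.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, np.nan],
    "c": [1.0, 2.0, np.nan],
})

# Complete-case (listwise deletion): nothing survives to fit on.
print(len(df.dropna()))        # 0

# Mean placeholders: every row stays available for the model.
filled = df.fillna(df.mean())
print(len(filled))             # 3
print(filled.isnull().sum().sum())  # 0 remaining missing values
```

This is why the placeholder approach discussed in #68 avoids the "0 sample(s)" error: the model always has rows to fit on, at the cost of fitting against roughly imputed placeholder values.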

@kearnz kearnz closed this as completed Sep 10, 2022