
ValueError: Found array with 0 sample(s) (shape=(0, 174)) while a minimum of 1 is required. #79

Closed
torivor opened this issue Aug 2, 2022 · 4 comments

Comments


torivor commented Aug 2, 2022

Hello, this package has been a lifesaver for my imputation needs. However, I recently encountered this error: "ValueError: Found array with 0 sample(s) (shape=(0, 174)) while a minimum of 1 is required." I can't seem to solve it. The error occurs while trying to fit_transform my Pandas DataFrame. Every column of the DataFrame has some values; none of them consists entirely of missing values, yet the package still throws this error. Hence, I don't know what triggered it. I assume this is due to a bug within the package.

I can send the data as it's publicly available from a Kaggle competition, but the size is too big. Should anyone require the data, I can email it, just send me a request at andreasparasian@gmail.com.

kearnz (Owner) commented Aug 3, 2022

Hi @torivor,

Can't say without looking at the dataset, but have a look at my answer in #65 and then the thread in #68.

Won't go into the details since they're covered in those threads, but in short: this error can occur when no row has all of its columns observed, even though no individual column is completely null.

Right now this is expected behavior given the algorithm design, although the error is not handled gracefully. In #68 you'll see a more optimal solution, one in which we use placeholders instead of listwise deletion. That's the right way to go; I just haven't written the code for it yet.

In the meantime I'd play with your feature set, maybe reducing the column space or sampling more rows if possible.
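To see how this can happen even when no column is fully null, here's a minimal toy sketch (made-up data, not the reporter's Kaggle set) showing that every column can contain observed values while no single row is complete, so listwise deletion leaves zero samples:

```python
import numpy as np
import pandas as pd

# Toy frame: each column has observed values, but the missing
# entries are staggered so that no row is fully observed.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, np.nan],
    "c": [1.0, 2.0, np.nan],
})

print(df.isnull().all())      # no column is completely null
complete_cases = df.dropna()  # listwise deletion keeps only fully observed rows
print(len(complete_cases))    # 0 -> a downstream fit sees 0 samples
```

With more features (e.g. the 174 columns in the traceback), the odds of at least one complete row shrink quickly, which is why the error shows up on wide datasets.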

torivor (Author) commented Aug 3, 2022 via email

torivor (Author) commented Aug 3, 2022 via email

kearnz (Owner) commented Sep 10, 2022

Apologies for the late reply. I looked into it more, and this is because of the issues I linked to. Right now autoimpute uses listwise deletion instead of placeholders. So if you use a lot of features, there must be at least one row where all of those features (except the imputed column) have a datum present. The chance of that being true shrinks as the number of features grows.

Again, see #65 and #68. The recommended solution for now is to experiment with your data. As a start, try using fewer columns for the imputation instead of all the features. I'm planning to work on moving from complete case to mean placeholders at some point soon, but I don't have a timeline for that yet.
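The difference between the two strategies can be sketched in plain pandas (toy data again; this is an illustration of the idea, not autoimpute's actual code path):

```python
import numpy as np
import pandas as pd

# Toy frame where the missing entries overlap so no row is complete.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, np.nan],
    "c": [1.0, 2.0, np.nan],
})

# Complete-case (listwise deletion): nothing survives to fit on.
print(len(df.dropna()))        # 0

# Mean placeholders: every row stays available for the model.
filled = df.fillna(df.mean())
print(len(filled))             # 3
print(filled.isnull().sum().sum())  # 0 remaining missing values
```

This is why the placeholder approach discussed in #68 avoids the "0 sample(s)" error: the model always has rows to fit on, at the cost of fitting against roughly imputed placeholder values.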

@kearnz kearnz closed this as completed Sep 10, 2022