-
-
Notifications
You must be signed in to change notification settings - Fork 25.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unexpected MemoryError from IncrementalPCA used with memmap #5173
Comments
git blame points the finger right back on me @lesteve did you have any issues with this for nilearn? I am probably OK with loosening the check_array to only run on things which On Thu, Aug 27, 2015 at 7:58 AM, J. Richard Snape notifications@github.com
|
Hmmm, there may be deeper issues... I was going to submit a PR based on patching like this (following your comment)
This allows However, when I tried this out I then hit a MemoryError further along the road - where Important note This appears to 'all just work'™ in a 64 bit environment (I guess because the array can be copied into memory) I think the person who originally raised the issue is probably using type I wonder if the answer may be to document these limitations. Interested in your thoughts. |
Try reducing the minibatch size by setting the parameter batch_size to If the check array is an issue on the full, we could easily do it on a "per On Fri, Aug 28, 2015 at 6:49 AM, J. Richard Snape notifications@github.com
|
Ah - excellent suggestion. With the patch to the call to But once we're past that, it seems the I'm in a bit of a strange situation, as this issue was raised on Stack overflow and, while I could trace the source of it, I haven't got a known big dataset to test with, so I'm testing this on an empty I guess it is your call on whether to move to doing the |
It is easier to just check at the start - if someone needs it per minibatch Kyle On Fri, Aug 28, 2015 at 8:43 AM, J. Richard Snape notifications@github.com
|
Sorry to be late on this one, but I don't remember getting this kind of issue |
|
So this is an easy fix, and can't really be unit tested? |
So the change is just X = check_array(X, dtype=[np.float64, np.float32, np.float16])
Do we really want to support float16?
|
Hi, this issue can be closed. I wasn't able to reproduce it, and after some digging I see that the fix was implemented here https://github.com/scikit-learn/scikit-learn/pull/5104/files#diff-5e29b5d9b2eeb884f0a1b69aa52e8e7234880af4c71910fae6a1bc1b5604f13bR199 |
Thanks for the update @nakamin ! We also currently have common test to check that all our transformers work with memmapped data. |
Prompted by a question asked in Stackoverflow chat, I investigated why a user would encounter a MemoryError when using the following minimal code:
I found that the MemoryError was called by this call to
check_array
:The problem is with adding the
dtype=np.float
to the call tocheck_array
. This means that thecopy=False
default ofcheck_array
is ignored, because thedtype
has changed in that call tonp.array()
and the docs state that change ofdtype
will force a copy irrespective of thecopy
argument.I made a simple gist demonstrating the issue here
I propose that the line above could be changed to
but am not expert enough in this tool to understand whether that might have negative consequences further along the chain of execution. If that is a good solution, happy to submit a PR.
The text was updated successfully, but these errors were encountered: