multipletests: reducing memory consumption #1394
multipletests wasn't written with memory consumption in mind, nor was it optimized for speed. Parts of it are still a mixture of research and production code.
A bigger change would be to work only with the part of the p-value array that is close to a decision threshold.
Working in batches on the sorted p-values looks difficult because of the adjustment with minimum.accumulate or maximum.accumulate: what happens in a later batch can influence the numbers in an earlier batch, which is inherent to step-up and step-down methods (see the sketch below).
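For concreteness, here is a minimal sketch of the BH step-up adjustment, essentially what `multipletests` does for `method='fdr_bh'` (the helper name `bh_adjusted` is mine):

```python
import numpy as np

def bh_adjusted(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up)."""
    pvals = np.asarray(pvals)
    n = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    # raw BH values: p_(i) * n / i for ranks i = 1..n
    raw = sorted_p * n / np.arange(1, n + 1)
    # step-up correction: minimum.accumulate runs over the *reversed*
    # sorted array, so the adjusted value at any rank depends on all
    # larger p-values, i.e. a later batch changes earlier results
    adjusted = np.minimum.accumulate(raw[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0, 1)
    return out
```

Splitting `sorted_p` into batches would leave each batch's result provisional until every batch with larger p-values has been processed.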
see #1392 for using out-of-core computation instead of tweaking the algorithm for a little bit more memory.
related: can we do anything about memory fragmentation? numpy needs a contiguous block of memory to create a new array (a small illustration follows).
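One mitigation, not a fix: reusing buffers with in-place ufuncs (`out=`) asks the allocator for fewer fresh contiguous blocks. A minimal sketch, with an illustrative size:

```python
import numpy as np

n = 10 ** 6  # illustrative size
pvals = np.random.uniform(size=n)

# each step of this one-liner allocates a fresh contiguous array:
# the arange, the quotient, and the product each need their own block
raw = pvals * (n / np.arange(1.0, n + 1))

# reusing a single buffer with out= / in-place ops requests fewer new
# contiguous blocks, which matters on a fragmented 32-bit address space
buf = np.arange(1.0, n + 1)
np.divide(n, buf, out=buf)  # buf = n / arange(1, n+1), in place
buf *= pvals                # in-place multiply, no new allocation
```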
a bit of timing:
here is the test script I use on the command line (a sketch of it follows below)
I'm using 32-bit Python, numpy 1.6.1. I restarted my computer and have 3 GB of memory available that should be only slightly fragmented.
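A minimal sketch of such a timing script, assuming statsmodels' `multipletests`; the size and method here are illustrative, not the original script:

```python
# hypothetical reconstruction of a command-line timing script;
# multipletests and method='holm' are real, the numbers are illustrative
import time
import numpy as np
from statsmodels.stats.multitest import multipletests

n = 5300 ** 2  # size from the 'holm' result below
pvals = np.random.uniform(size=n)

t0 = time.time()
reject, pvals_corrected, _, _ = multipletests(pvals, alpha=0.05,
                                              method='holm')
print('holm: %.2f s, %d rejections' % (time.time() - t0, reject.sum()))
```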
After a bit of cleanup for 'holm' in c86e08b, it returns results for 5300**2 p-values.
I get segfaults (in