Make dti reconst less memory hungry #840
Conversation
Since the change to vectorized operation, it processes 1000 voxels by default instead of 1. While faster, it's also memory hungry (with the first version causing out-of-memory errors on simple datasets). Following the precedent of computing tracks as float32, here is a change that computes the eigenvalues as float64 but returns them as float32. My plain, normal DTI script was using 12 GB of RAM on an HCP dataset (and then swapping), so this change should reduce the memory usage. It also downgrades the computed metrics to float32 precision, but as long as only the end result is returned as float32, everything should behave fine, I reckon.
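A minimal sketch of the cast being described (the helper name is made up, this is not the actual dipy code): do the eigendecomposition in float64 for numerical stability, but return float32 arrays so everything downstream takes half the memory.

```python
import numpy as np

def eigh_as_float32(tensors):
    """Hypothetical sketch: eigendecompose stacked 3x3 diffusion tensors in
    float64 for accuracy, but return the results as float32 to halve memory."""
    evals, evecs = np.linalg.eigh(np.asarray(tensors, dtype=np.float64))
    return evals.astype(np.float32), evecs.astype(np.float32)
```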
Will this really fix the root cause of the problem? Of course, it halves the needed memory, but should we be more flexible? In the internal discussion, you told me it might also be related to the parallelization of the computations, computing 1000 voxels at once...
+1 for @jchoude
@dimrozakis have you seen this PR?
If we lower the 1000 voxels, then we just go back to regular speed; that's why I did not touch it. In any case, as long as a simple DTI script does not use 12 GB of RAM anymore, I'll be happy.
Well, it helps a bit, as it hovers around 4 GB (instead of 8) before still jumping to swapping. Could something weird be happening when the tensor.predict method is used?
I'm not sure about dropping precision; perhaps this should optionally be configured by the function caller? Also, please note that you can specify a smaller step at runtime when initializing TensorModel.
The step param is passed by TensorModel to the wrapped fitting function, and its value controls how many voxels are processed at once. We could also reduce the default value to, say, half. That would reduce peak memory usage by about half and shouldn't seriously decrease speed. Actually, since it can easily be set by the caller to any value, the default value could be lowered even more.
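For reference, a usage sketch based on the description above (assuming `gtab`, `data`, and `mask` are already defined, as in the snippets later in this thread): since `step` is forwarded by `TensorModel` to the wrapped fitting function, a caller can lower it to trade a bit of speed for peak memory.

```python
from dipy.reconst.dti import TensorModel

# Fit fewer voxels per chunk to reduce peak memory; `step` is forwarded by
# TensorModel to the wrapped fitting function (sketch, value chosen arbitrarily).
tenmodel = TensorModel(gtab, fit_method='WLS', step=100)
tenfit = tenmodel.fit(data, mask)
```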
Adding a dtype option seems a good idea also; we just recast all outputs.
@samuelstjean I believe @dimrozakis' response was satisfactory, so we should close this PR. I am not sure that a dtype parameter would solve the problem. I still don't understand why you need 12 GB of RAM for the HCP data. Maybe most of this memory is used just to load the data and is not related to the DTI fitting. So the trick there may be to use a memmap for loading the data.
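On the memmap idea, a sketch of what that loading could look like with nibabel (file names are placeholders; this assumes an uncompressed .nii file, since .nii.gz cannot be memory-mapped):

```python
import nibabel as nib

# Memory-map the diffusion data instead of reading it all into RAM
# (mmap only applies to uncompressed NIfTI files; names are placeholders).
img = nib.load('hcp_dwi.nii', mmap=True)
data = img.get_data()                     # np.memmap for uncompressed files
mask = nib.load('brain_mask.nii').get_data().astype(bool)
```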
It's already using a memmap, and I don't think I had the problem before.
It could be. Yes, dig a bit deeper and identify at which point the memory increases. Use a memory profiler like this one, https://pypi.python.org/pypi/memory_profiler, which can tell you the exact line where the problem (the memory increase) appears, and share the script showing the problem. That way it will be easier to understand what is happening. Indeed, it is likely that the reshape is copying memory from the memory map.
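A quick sketch of how memory_profiler can be used here (the script name and wrapper function are placeholders): decorate the suspect code with `@profile` and run it, or use the IPython magics.

```python
# profile_dti.py -- line-by-line memory profiling sketch with memory_profiler
from memory_profiler import profile
from dipy.reconst.dti import TensorModel

@profile
def fit_and_predict(data, mask, gtab):
    tenmodel = TensorModel(gtab, fit_method='WLS')
    tenfit = tenmodel.fit(data, mask)
    return tenfit.predict(gtab)

# Call fit_and_predict(...) on the problematic data and run the script as
# usual; memory_profiler prints per-line memory usage for the decorated
# function. In IPython, `%load_ext memory_profiler` enables %memit and %mprun.
```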
@samuelstjean and @arokem, do you now have a clear picture of what creates the memory increase, or are you still investigating?
Haven't had time to look at it more deeply. Got distracted by #762. TBH, I am not even sure there is a problem. For example, see this memory profiling experiment: I don't really see any dramatic increase in memory as we go from looping to the vectorized version. But I wouldn't mind reducing the default to 1000 or so, to be on the safe side. NOTE: I was definitely wrong about that reshaping. This couldn't have been the cause.
Have you used this with big data? HCP big? |
Not yet. But there it is -- download it, run it on the data that was causing the problem.
Is the CENIR data big enough for you? I am running that right now.
[image: Inline image 1]
CENIR is fine @arokem! @samuelstjean, this time and next time, please give us more specific information about where the problem appears. I feel we are duplicating effort because you are not giving us enough information. What did you get from the memory profiler? Which lines create the problem? Be specific and pedantic, please.
I don't see anything too alarming here: https://gist.github.com/arokem/06717f7b7336429be38f A slight increase in memory is to be expected, but it's definitely not out of control.
Please run this on other systems. There might be something idiosyncratic about my Mac!
Ask mic, he had the same problem long before on a single simulated voxel. The CENIR dataset probably won't cut it; I'm using an HCP one and they have a weird…
This isn't the right solution here. If there is a way to re-structure the…
Sorry, I somehow missed some of the comments. I see that step is already configurable.
Apparently chunk size has nothing to do with it, which also explains why…
Found it, it's in the predict method; I can't get a %memit number out of it, as memory explodes before it finishes. So something weird is indeed happening in there, as at this line it fills up pretty quickly. The only weird thing is mixing a memmap with plain numpy arrays:

```python
from dipy.io import read_bvals_bvecs
from dipy.core.gradients import gradient_table
from dipy.reconst.dti import TensorModel
import numpy as np

# data and mask are loaded earlier (data is a memmap, mask a regular array)
bvals, bvecs = read_bvals_bvecs('b3000.bvals', 'b3000.bvecs')
gtab = gradient_table(bvals, bvecs, b0_threshold=5)
tenmodel = TensorModel(gtab, fit_method='WLS')
tenfit = tenmodel.fit(data, mask)
S0 = np.mean(data[..., gtab.b0s_mask], -1)
%memit data_p = tenfit.predict(gtab, S0)
```

Could be due to that, but tenfit.predict does not use data, so could it be a problem with S0?

```python
In [8]: data.flags
In [9]: mask.flags
```
Oh yeah, the screenshot is taken directly during the %memit, as I had to kill it because it jumps to around 18 GB of RAM in total, starting at around 2 GB and just going up until my whole computer freezes.
OK -- that actually makes sense, considering what we're doing at this line: https://github.com/nipy/dipy/blob/master/dipy/reconst/dti.py#L683 We should probably loop over segments in a similar manner to what we do in the fit.
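A rough sketch of that segment-wise predict idea (see #857 below, which this PR was eventually closed in favor of). The function name is made up and the `tensor_prediction` helper is an assumption, not necessarily the actual dipy API:

```python
import numpy as np
from dipy.reconst.dti import tensor_prediction  # assumed helper, see note above

def predict_in_chunks(tenfit, gtab, S0, step=10000):
    """Hypothetical sketch: evaluate the tensor signal prediction over chunks
    of voxels, so intermediate arrays never span the whole volume at once."""
    spatial_shape = tenfit.model_params.shape[:-1]
    params = tenfit.model_params.reshape(-1, tenfit.model_params.shape[-1])
    S0_flat = np.broadcast_to(np.asarray(S0), spatial_shape).reshape(-1)
    out = np.empty((params.shape[0], len(gtab.bvals)), dtype=np.float32)
    for i in range(0, params.shape[0], step):
        chunk = slice(i, i + step)
        out[chunk] = tensor_prediction(params[chunk], gtab, S0_flat[chunk])
    return out.reshape(spatial_shape + (len(gtab.bvals),))
```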
Still wondering why I am the only one with problems on an HCP dataset, I…
Closing this in favor of #857