Make dti reconst less memory hungry #840

Closed
wants to merge 1 commit into
base: master
from

Conversation

6 participants
@samuelstjean
Contributor

samuelstjean commented Jan 19, 2016

Since the change to vectorized operations, it processes 1000 voxels by default instead of 1. While faster, it's also memory hungry (the first version caused out-of-memory errors on simple datasets). Following the precedent of computing the tracks as f32, this change computes the eigenvalues as f64 but returns them as f32.

My plain, normal DTI script was using 12 GB of RAM on an HCP dataset (and was swapping somewhat), so this change should reduce the memory usage. It also downgrades the computed metrics to f32 precision, but as long as only the end result is returned as f32, everything should behave fine, I reckon.
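The idea can be sketched in plain NumPy (a hypothetical illustration of the approach, not dipy's actual code): do the eigendecomposition in float64 for numerical stability, but store and return the result as float32, halving the memory held by the output arrays.

```python
import numpy as np

# Hypothetical sketch: a batch of symmetric 3x3 tensors, as in DTI.
tensors = np.random.rand(1000, 3, 3).astype(np.float32)
tensors = (tensors + tensors.transpose(0, 2, 1)) / 2  # make symmetric

# Compute eigenvalues in f64, return them as f32.
evals = np.linalg.eigvalsh(tensors.astype(np.float64))
evals = evals.astype(np.float32)

print(evals.dtype, evals.shape)  # float32 (1000, 3)
```

The intermediate f64 array still exists briefly, but the arrays that are kept around afterwards take half the space.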

Make dti reconst less memory hungry
@jchoude

Contributor

jchoude commented Jan 19, 2016

Will this really fix the root cause of the problem? Of course, it halves the needed memory, but should we be more flexible? In the internal discussion, you told me it might also be related to the parallelization of the computations, computing 1000 voxels at once...

@Garyfallidis

Member

Garyfallidis commented Jan 19, 2016

+1 for @jchoude

@Garyfallidis

Member

Garyfallidis commented Jan 19, 2016

@dimrozakis have you seen this PR?

@samuelstjean

Contributor

samuelstjean commented Jan 20, 2016

If we lower the 1000 voxels, then we just go back to regular speed; that's why I did not touch it. In any case, as long as a simple DTI script no longer uses 12 GB of RAM, I'll be happy.

@samuelstjean

Contributor

samuelstjean commented Jan 20, 2016

Well, it helps a bit: it hovers around 4 GB (instead of 8) before still jumping into swap. Could something weird be happening when the tensor.predict method is used?

@dimrozakis

Contributor

dimrozakis commented Jan 20, 2016

I'm not sure about dropping precision; perhaps this should optionally be configurable by the function caller?

Also, please note that you can specify a smaller step at runtime by initializing TensorModel as

model = dti.TensorModel(gtab, 'LS', step=10)

The step param is passed by TensorModel to the wrapped fitting function, and the value of step overrides the default of 1e4 = 10000 (not 1000). Here's the relevant code.

We could also reduce the default value to, say, half. That would reduce peak memory usage by about half and shouldn't seriously decrease speed. Actually, since it can easily be set by the caller to any value, the default could be lowered even more.
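The effect of step on peak memory can be sketched like this (hypothetical fit_in_chunks helper, not dipy's internals): temporaries are bounded by the chunk size instead of the whole (n_voxels, n_signals) array.

```python
import numpy as np

def fit_in_chunks(data2d, fit_func, step=10000):
    """Apply fit_func to `step` voxels at a time, so any temporaries
    inside fit_func scale with `step`, not with the full voxel count."""
    out = np.empty(data2d.shape[0], dtype=np.float32)
    for start in range(0, data2d.shape[0], step):
        chunk = data2d[start:start + step]
        out[start:start + step] = fit_func(chunk)
    return out

# Toy data: 25000 "voxels" with 60 "signals" each.
data = np.ones((25000, 60), dtype=np.float32)
result = fit_in_chunks(data, lambda c: c.mean(axis=1), step=10000)
print(result.shape)  # (25000,)
```

With a smaller step the loop just runs more iterations, trading a little overhead for a lower peak.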

@samuelstjean

Contributor

samuelstjean commented Jan 20, 2016

Adding a dtype option also seems like a good idea; we just recast all outputs to float32 with no problem whatsoever. Having both should probably help. I'll try using only 1000 elements.


@Garyfallidis

Member

Garyfallidis commented Jan 21, 2016

@samuelstjean I believe @dimrozakis's response was satisfactory, so we should close this PR. I am not sure that a dtype parameter would solve the problem. I still don't understand why you need 12 GB of RAM for the HCP data. Maybe most of this memory is used just to load the data and is not related to the DTI fitting. So the trick there may be to use a memmap for loading the data.
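For scale, a back-of-envelope estimate suggests the 12 GB figure is plausible from the data alone (assuming HCP-like dimensions of roughly 145 x 174 x 145 voxels and 288 volumes; these numbers are an assumption, not from this thread):

```python
# Rough size of an HCP-like diffusion dataset (assumed shape):
# 145 x 174 x 145 voxels, 288 gradient directions.
n_values = 145 * 174 * 145 * 288
f32_bytes = n_values * 4   # raw float32 data
f64_bytes = n_values * 8   # a single float64 copy

print(round(f32_bytes / 1e9, 1))  # ~4.2 GB for the raw data alone
print(round(f64_bytes / 1e9, 1))  # ~8.4 GB for one float64 copy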

@samuelstjean

Contributor

samuelstjean commented Jan 22, 2016

It's already using a memmap, and I don't think I had the problem before. I'll try using a smaller number of voxels, but could the 1D reshaping be making a ton of useless background copies? I'll check that side too.

@Garyfallidis

Member

Garyfallidis commented Jan 22, 2016

It could be. Yes, dig a bit deeper and identify at which point the memory increases. Use a memory profiler like https://pypi.python.org/pypi/memory_profiler, which can tell you the exact line where the memory increase appears, and share the script showing the problem. That way it will be easier to understand what is happening. Indeed, it is likely that the reshape is copying memory from the memory map.

@Garyfallidis

Member

Garyfallidis commented Jan 29, 2016

@samuelstjean and @arokem, do you now have a clear picture of what creates the memory increase, or are you still investigating?

@arokem

Member

arokem commented Jan 29, 2016

Haven't had time to look at it more deeply. Got distracted by #762 last week.

TBH, I am not even sure there is a problem.

For example, see this memory profiling experiment:
https://gist.github.com/arokem/648909e1943011de661b

I don't really see any dramatic increase in memory as we go from looping over individual voxels (step=1) to using the default setting (step=10000). I would have to see evidence to the contrary to be convinced that the previously observed problems aren't due to user error.

But I wouldn't mind reducing the default to 1000 or so, to be on the safe side.

NOTE: I was definitely wrong about that reshaping. It couldn't have anything to do with any new memory issues, because that reshape has been there for a very long time, well before the recent multi-voxel improvements.


@Garyfallidis

Member

Garyfallidis commented Jan 29, 2016

Have you used this with big data? HCP big?

@arokem

Member

arokem commented Jan 29, 2016

Not yet. But there it is -- download it, run it on the data that was causing you trouble, and report back, please.


@arokem

Member

arokem commented Jan 29, 2016

Is the CENIR data big enough for you? I am running that right now.


@arokem

Member

arokem commented Jan 29, 2016

[image: Inline image 1]


@Garyfallidis

Member

Garyfallidis commented Jan 29, 2016

CENIR is fine, @arokem!

@samuelstjean, this time and next time please give us more specific information about where the problem appears. I feel we are duplicating effort because you are not giving us enough information. What did you get from the memory profiler? Which lines create the problem? Be specific and pedantic, please.

@arokem

Member

arokem commented Jan 29, 2016

I don't see anything too alarming here: https://gist.github.com/arokem/06717f7b7336429be38f

A slight increase in memory is to be expected, but it's definitely not out of control.

@arokem

Member

arokem commented Jan 29, 2016

Please run this on other systems. There might be something idiosyncratic about my mac!

@samuelstjean

Contributor

samuelstjean commented Jan 29, 2016

Ask mic; he had the same problem long before, on a single simulated voxel. It might happen because of really unusual data, such as a memmap with a mask as a numpy array which has a different striding. This would trigger on indexing for whatever reason, maybe during the tensor predict. Memmaps just don't play well in regular operations with arrays, as I have found out lately.

The CENIR dataset probably won't cut it; I'm using an HCP one and they have a weird memory layout.

@MrBago

Contributor

MrBago commented Jan 29, 2016

This isn't the right solution here. If there is a way to restructure the code to reduce copies, that's obviously preferred, but in the meantime I suggest making the chunk size configurable so it can be adjusted for different datasets and computers.


@MrBago

Contributor

MrBago commented Jan 29, 2016

Sorry, I somehow missed some of the comments. I see that step is already configurable.


@samuelstjean

Contributor

samuelstjean commented Jan 29, 2016

Apparently chunk size has nothing to do with it, which also explains why only two of us had the problem. Between the first message and now, we looked at it internally, and there has been something weird going on for some time, only in rare cases.

@samuelstjean

Contributor

samuelstjean commented Feb 1, 2016

Found it: it's in the predict method. I can't %memit it, as memory explodes before then. So something weird is indeed happening in there; at this line it fills up pretty quickly. The only weird thing is mixing a memmap with an np array.

import numpy as np
from dipy.io import read_bvals_bvecs
from dipy.core.gradients import gradient_table
from dipy.reconst.dti import TensorModel

# data (a memmap) and mask (a numpy array) are loaded beforehand
bvals, bvecs = read_bvals_bvecs('b3000.bvals', 'b3000.bvecs')
gtab = gradient_table(bvals, bvecs, b0_threshold=5)
tenmodel = TensorModel(gtab, fit_method='WLS')
tenfit = tenmodel.fit(data, mask)
S0 = np.mean(data[..., gtab.b0s_mask], -1)
%memit data_p = tenfit.predict(gtab, S0)

[screenshot]

Could be due to that, but tenfit.predict does not use data, so could it be a problem with S0?

In [8]: data.flags
Out[8]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False

In [9]: mask.flags
Out[9]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
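That suspicion can be demonstrated directly (a hypothetical illustration, not dipy code): NumPy can only reshape without copying when the target layout is compatible with the array's strides, so flattening an F_CONTIGUOUS volume like the one above into the C-ordered 2D layout that vectorized code wants forces a full copy, and a copy of a memmap pulls the whole volume into RAM.

```python
import numpy as np

# Same logical data in C order and in Fortran order.
c_ordered = np.zeros((64, 64, 64, 100), dtype=np.float32)
f_ordered = np.asfortranarray(c_ordered)

# Flatten the spatial axes into (n_voxels, n_signals).
flat_c = c_ordered.reshape(-1, 100)
flat_f = f_ordered.reshape(-1, 100)

print(np.shares_memory(c_ordered, flat_c))  # True: just a view
print(np.shares_memory(f_ordered, flat_f))  # False: a full copy
```

With a memmapped F-ordered volume, that silent copy alone would account for a multi-GB jump.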

@samuelstjean

Contributor

samuelstjean commented Feb 1, 2016

Oh yeah, the screenshot is taken directly during the %memit; I had to kill it because it jumps to around 18 GB of RAM total, starting at around 2 GB and just going up until my whole computer freezes.

@arokem

Member

arokem commented Feb 1, 2016

OK -- that actually makes sense, considering that we're doing the multiplication in one fell swoop over the entire array at once here:

https://github.com/nipy/dipy/blob/master/dipy/reconst/dti.py#L683

We should probably loop over segments in a similar manner to what we do on the fitting side.

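A segment-wise predict could look something like this minimal sketch (hypothetical predict_in_chunks helper and a plain mono-exponential model are assumed, not dipy's actual predict):

```python
import numpy as np

def predict_in_chunks(design, params2d, S0, step=10000):
    """Evaluate S0 * exp(-design @ params) `step` voxels at a time, so
    the big (n_voxels, n_gradients) intermediate never exists at once."""
    n_vox = params2d.shape[0]
    out = np.empty((n_vox, design.shape[0]), dtype=np.float32)
    for i in range(0, n_vox, step):
        chunk = params2d[i:i + step]
        out[i:i + step] = S0[i:i + step, None] * np.exp(-chunk @ design.T)
    return out

design = np.random.rand(288, 7)   # toy design matrix
params = np.random.rand(1000, 7)  # toy per-voxel parameters
S0 = np.full(1000, 100.0)
pred = predict_in_chunks(design, params, S0, step=256)
print(pred.shape)  # (1000, 288)
```

The output is identical to the single-shot version; only the peak size of the temporaries changes.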

@samuelstjean

Contributor

samuelstjean commented Feb 1, 2016

I'm still wondering why I am the only one with problems on an HCP dataset; I can't imagine nobody tried to predict signals on a large dataset before me. Also, we first investigated this internally and I am the only one with the problem, so there might be something more complicated happening under the hood. Feel free to check on your side with a different HCP subject.


@arokem

Member

arokem commented Feb 3, 2016

Closing this in favor of #857

@arokem arokem closed this Feb 3, 2016

@samuelstjean samuelstjean deleted the samuelstjean:patch-4 branch Feb 3, 2016

@arokem arokem referenced this pull request Feb 11, 2016

Merged

NF: Free water tensor model #835
