Tensor Fitting Overflows Memory #763

Closed
bcmcpher opened this Issue Nov 5, 2015 · 8 comments

@bcmcpher

bcmcpher commented Nov 5, 2015

I'm running an updated clone of dipy from current master (0.10.0-dev, I believe) to fit a tensor model in Python 2.7.10:

from dipy.data import read_stanford_hardi
import dipy.reconst.dti as dti

# load the example dataset and its gradient table
img, gtab = read_stanford_hardi()
dti_model = dti.TensorModel(gtab)

# fit the tensor model to the whole 4D volume
data = img.get_data()
dti_fit = dti_model.fit(data)

I have 8 GB of system RAM and 8 GB of swap, all of which fill up before the process dies. Something appears to have changed in how memory is used, since it didn't use to do this.

This is on a laptop running Linux Debian Stable (8.2) with the kernel from backports (4.2.0-0.bpo.1-amd64 #1 SMP Debian 4.2.5-1~bpo8+1 (2015-11-02) x86_64 GNU/Linux) and Continuum conda 3.18.3.

Let me know if you need any other information.

@arokem

Member

arokem commented Nov 5, 2015

Could this be related to #727?

@arokem

Member

arokem commented Nov 5, 2015

Any thoughts about this, @dimrozakis?

@dimrozakis

Contributor

dimrozakis commented Nov 5, 2015

In #727 all fitting operations were vectorized, which brought a large speedup. But when all operations are performed on all voxels at once, any temporary arrays created are much larger than when iterating over single voxels. To work around this, one can split the large input array into a number of chunks and apply the vectorized fitting function by iterating over those chunks. As long as the number of chunks is much smaller than the total number of voxels, there is no significant loss of speed, but peak temporary memory consumption drops a lot. I've written something like that in advantis-io@c080e89. @arokem should I open a PR?
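
For illustration, here is a minimal sketch of the chunked approach (not the exact code in c080e89; the function name, the chunk size of 10000 voxels, and the fit_vectorized callable are placeholders):

import numpy as np

def fit_in_chunks(fit_vectorized, data, chunk_size=10000):
    # fit_vectorized takes a (n_voxels, n_directions) array and returns
    # a (n_voxels, n_params) array; data has shape (..., n_directions).
    spatial_shape = data.shape[:-1]
    data2d = data.reshape(-1, data.shape[-1])  # flatten the spatial dimensions
    out = None
    for start in range(0, data2d.shape[0], chunk_size):
        chunk = data2d[start:start + chunk_size]
        res = fit_vectorized(chunk)  # temporaries are now sized per chunk
        if out is None:
            out = np.empty((data2d.shape[0],) + res.shape[1:], dtype=res.dtype)
        out[start:start + chunk_size] = res
    return out.reshape(spatial_shape + out.shape[1:])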

@dimrozakis

Contributor

dimrozakis commented Nov 5, 2015

@bcmcpher would you like to merge advantis-io@c080e89 and tell us whether it fixes your out-of-memory issue?

@Garyfallidis

Member

Garyfallidis commented Nov 5, 2015

@dimrozakis indeed, you need to use chunks to be memory efficient here. Dimitri, can you report the memory consumption on your system with and without chunking, and also the speedup relative to the initial version? It would be great to have all this information so we can decide how to move forward. Thx in advance.

@dimrozakis

Contributor

dimrozakis commented Nov 6, 2015

With the Stanford HARDI dataset (81, 106, 76, 160):

Before PR #727 (0503c72):

Completed in 173.94 secs.
Memory usage: 897.9 MiB.

Master:

MemoryError

Master with c080e89 merged:

Completed in 113.81 secs.
Memory usage: 898.5 MiB.
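
(For reference, numbers like these can be collected roughly as below; this is only a sketch, not the actual benchmark script. ru_maxrss is reported in KiB on Linux.)

import time
import resource

from dipy.data import read_stanford_hardi
import dipy.reconst.dti as dti

img, gtab = read_stanford_hardi()
data = img.get_data()

start = time.time()
fit = dti.TensorModel(gtab).fit(data)
elapsed = time.time() - start

# peak resident set size of this process, in KiB on Linux
peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("Completed in %.2f secs." % elapsed)
print("Memory usage: %.1f MiB." % (peak_kib / 1024.0))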

@bcmcpher

bcmcpher commented Nov 6, 2015

This appears to fix the issue: with c080e89, my system memory usage for the same run stays right around 2 GB at peak (observed in htop).

@Garyfallidis

Member

Garyfallidis commented Nov 6, 2015

Okay, good. @dimrozakis please make a PR with the fix. We need to resolve this before the release. We should try this later with an even larger dataset, for example HCP data. And maybe add a switch so people can adjust or disable the chunking if too much memory is still used. Preserving memory is more important than speed for DTI; DTI should be able to run on systems with modest RAM.
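
For what it's worth, such a switch could look something like the sketch below; the wrapper and the step parameter are purely hypothetical, not dipy's actual API:

def fit_model(model, data, step=None):
    # Hypothetical wrapper: `step` and this function are illustrative only.
    # step=None keeps the fully vectorized (fast, memory-hungry) path;
    # a positive step bounds temporary memory at some cost in speed.
    if step is None:
        return model.fit(data)  # one shot over all voxels
    flat = data.reshape(-1, data.shape[-1])
    fits = []
    for start in range(0, flat.shape[0], step):
        fits.append(model.fit(flat[start:start + step]))
    return fits  # one fit object per chunk of voxels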
