Estimate S0 from data (DTI) #1036

Closed
wants to merge 4 commits

Conversation

@arokem (Contributor) commented Apr 18, 2016

@arokem changed the title from "WIP: Estimate S0 from data (DTI)" to "Estimate S0 from data (DTI)" on Apr 25, 2016
@arokem (Contributor Author) commented Apr 25, 2016

This is now ready for a review.

I have also profiled this with HCP data. Memory usage and run time with these changes are essentially identical to the performance of the code on master.

@arokem (Contributor Author) commented Apr 26, 2016

I stand corrected: I reran the benchmark and, not too surprisingly, it takes longer with the changes. Fitting the model for the entire WM of a single HCP brain takes 1.5 minutes without this change and about 4.5 minutes with it. So it's a 3-fold slowdown, but it's not like it becomes prohibitively slow.

[EDIT]: Still no change in memory consumption.

@arokem (Contributor Author) commented May 4, 2016

One more thought: if people are worried about the extra time, we can make this an option, rather than the default. Anyone have thoughts on that?
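
For concreteness, a rough sketch of what the opt-in version could look like; return_S0_hat is just an illustrative name for the hypothetical keyword, and data/gtab are assumed to be loaded already:

import dipy.reconst.dti as dti

# Hypothetical opt-in flag (the name is illustrative, not an existing
# TensorModel argument): S0 estimation runs only when requested, so the
# default fit path keeps its current speed.
model = dti.TensorModel(gtab, return_S0_hat=True)
fit = model.fit(data)
# fit.S0_hat would then hold the per-voxel estimate; without the flag,
# fitting would behave exactly as on master.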

@Garyfallidis (Contributor) commented

Hi @arokem. Can you explain what creates the change in execution time? And why is that expected?

@arokem (Contributor Author) commented May 5, 2016

OK, it turns out that my timing might have been slightly off. I ran a line profiler on this to see where the added time comes from. Here are the results on master:

Timer unit: 1e-06 s

Total time: 80.1795 s
File: /Users/arokem/source/dipy/dipy/reconst/dti.py
Function: fit at line 768

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   768                                               def fit(self, data, mask=None):
   769                                                   """ Fit method of the DTI model class
   770                                           
   771                                                   Parameters
   772                                                   ----------
   773                                                   data : array
   774                                                       The measured signal from one voxel.
   775                                           
   776                                                   mask : array
   777                                                       A boolean array used to mark the coordinates in the data that
   778                                                       should be analyzed that has the shape data.shape[:-1]
   779                                           
   780                                                   """
   781         1            7      7.0      0.0          if mask is None:
   782                                                       # Flatten it to 2D either way:
   783         1      1454744 1454744.0      1.8              data_in_mask = np.reshape(data, (-1, data.shape[-1]))
   784                                                   else:
   785                                                       # Check for valid shape of the mask
   786                                                       if mask.shape != data.shape[:-1]:
   787                                                           raise ValueError("Mask is not the same shape as data.")
   788                                                       mask = np.array(mask, dtype=bool, copy=False)
   789                                                       data_in_mask = np.reshape(data[mask], (-1, data.shape[-1]))
   790                                           
   791         1            4      4.0      0.0          if self.min_signal is None:
   792         1      1881512 1881512.0      2.3              min_signal = _min_positive_signal(data)
   793                                                   else:
   794                                                       min_signal = self.min_signal
   795                                           
   796         1       295767 295767.0      0.4          data_in_mask = np.maximum(data_in_mask, min_signal)
   797         1            3      3.0      0.0          params_in_mask = self.fit_method(self.design_matrix, data_in_mask,
   798         1     76547431 76547431.0     95.5                                           *self.args, **self.kwargs)
   799                                           
   800         1            2      2.0      0.0          if mask is None:
   801         1            2      2.0      0.0              out_shape = data.shape[:-1] + (-1, )
   802         1            3      3.0      0.0              dti_params = params_in_mask.reshape(out_shape)
   803                                                   else:
   804                                                       dti_params = np.zeros(data.shape[:-1] + (12,))
   805                                                       dti_params[mask, :] = params_in_mask
   806                                           
   807         1            7      7.0      0.0          return TensorFit(self, dti_params)

And with this branch:

Timer unit: 1e-06 s

Total time: 106.631 s
File: /Users/arokem/source/dipy/dipy/reconst/dti.py
Function: fit at line 771

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   771                                               def fit(self, data, mask=None):
   772                                                   """ Fit method of the DTI model class
   773                                           
   774                                                   Parameters
   775                                                   ----------
   776                                                   data : array
   777                                                       The measured signal from one voxel.
   778                                           
   779                                                   mask : array
   780                                                       A boolean array used to mark the coordinates in the data that
   781                                                       should be analyzed that has the shape data.shape[:-1]
   782                                           
   783                                                   """
   784         1            5      5.0      0.0          if mask is None:
   785                                                       # Flatten it to 2D either way:
   786         1      1454305 1454305.0      1.4              data_in_mask = np.reshape(data, (-1, data.shape[-1]))
   787                                                   else:
   788                                                       # Check for valid shape of the mask
   789                                                       if mask.shape != data.shape[:-1]:
   790                                                           raise ValueError("Mask is not the same shape as data.")
   791                                                       mask = np.array(mask, dtype=bool, copy=False)
   792                                                       data_in_mask = np.reshape(data[mask], (-1, data.shape[-1]))
   793                                           
   794         1            5      5.0      0.0          if self.min_signal is None:
   795         1      1868442 1868442.0      1.8              min_signal = _min_positive_signal(data)
   796                                                   else:
   797                                                       min_signal = self.min_signal
   798                                           
   799         1       214406 214406.0      0.2          data_in_mask = np.maximum(data_in_mask, min_signal)
   800         1            4      4.0      0.0          params_in_mask = self.fit_method(self.design_matrix, data_in_mask,
   801         1     93734337 93734337.0     87.9                                           *self.args, **self.kwargs)
   802                                           
   803         1            9      9.0      0.0          if mask is None:
   804         1            6      6.0      0.0              out_shape = data.shape[:-1] + (-1, )
   805         1            7      7.0      0.0              dti_params = params_in_mask.reshape(out_shape)
   806                                                   else:
   807                                                       dti_params = np.zeros(data.shape[:-1] + (12,))
   808                                                       dti_params[mask, :] = params_in_mask
   809                                           
   810         1            7      7.0      0.0          evecs = dti_params[..., 3:12].reshape((dti_params.shape[:-1] + (3, 3)))
   811         1            3      3.0      0.0          evals = dti_params[..., :3]
   812         1       148007 148007.0      0.1          qform = vec_val_vect(evecs, evals)
   813                                           
   814         1          528    528.0      0.0          sphere = Sphere(xyz=self.gtab.bvecs[~self.gtab.b0s_mask])
   815         1      3808902 3808902.0      3.6          ADC = apparent_diffusion_coef(qform, sphere)
   816                                           
   817         1       119189 119189.0      0.1          S0_hat = np.mean(data[..., ~self.gtab.b0s_mask] /
   818         1      5077759 5077759.0      4.8                           np.exp(-self.gtab.bvals[~self.gtab.b0s_mask] * ADC),
   819         1       205139 205139.0      0.2                           -1)
   820         1           23     23.0      0.0          return TensorFit(self, dti_params, S0_hat=S0_hat)

@arokem (Contributor Author) commented May 5, 2016

So it looks like it's adding about 25% to the computation time (this is a whole-brain dataset, by the way -- the Stanford HARDI set), and the extra time is spent mostly in computing the ADC and in the computation of S0_hat, on lines 817-819.
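
For reference, the S0 estimate is just the tensor signal model solved for S0: since S = S0 * exp(-b * ADC), every diffusion-weighted volume gives one estimate S / exp(-b * ADC), and the code averages those over directions. A standalone sketch of that step, assuming data, gtab, and the per-direction ADC array are already in hand as in the profile above:

import numpy as np

# S = S0 * exp(-b * ADC)  =>  S0 = S / exp(-b * ADC)
# One estimate per diffusion-weighted volume; average over the last axis.
dwi = data[..., ~gtab.b0s_mask]
bvals = gtab.bvals[~gtab.b0s_mask]
S0_hat = np.mean(dwi / np.exp(-bvals * ADC), axis=-1)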

@arokem (Contributor Author) commented May 5, 2016

Any ideas? I don't see an obvious way to speed these calculations up, but I might be missing something.
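
One thing that might be worth trying, purely as an unbenchmarked sketch: the ADC along each gradient direction is the quadratic form g^T Q g, so it could be evaluated with a single np.einsum over the already-computed qform, skipping the Sphere and apparent_diffusion_coef machinery. Whether that actually saves time, I don't know:

import numpy as np

# ADC along direction g is g^T Q g.
# qform has shape (..., 3, 3); bvecs has shape (n_directions, 3).
bvecs = gtab.bvecs[~gtab.b0s_mask]
ADC = np.einsum('...ij,ki,kj->...k', qform, bvecs, bvecs)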

@arokem (Contributor Author) commented May 9, 2016

@Garyfallidis: Any further thoughts about this?

@Garyfallidis (Contributor) commented May 9, 2016

There is something that is not clear: you said that your timing was slightly off, but earlier you reported a 3-fold time increase and now a 25% increase. That is a large difference. Can you please explain? Are we now going from a 300% increase to a 25% increase?

@arokem (Contributor Author) commented May 9, 2016

Two things were different:

  1. Different data (HCP for the first timing, Stanford HARDI for the second).
  2. Different timing mechanisms (%%timeit magic for the first, line_profiler for the second).

I can set up a script to time this with the standard library, sketched below, and collect a sample of runs, just in case it varies a lot (which would surprise me, but there you have it). But are there any more things I should check? Assuming the worst-case scenario (a 3-fold slowdown: 5 minutes instead of 90 seconds), do you have any ideas on how to speed this up? If there's nothing we can do about it, maybe it's pointless to go through an even more careful profiling of the code?
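
For the standard-library timing, something like this would do (data loading elided; data, gtab, and mask are assumed to be in scope):

import timeit

import dipy.reconst.dti as dti

model = dti.TensorModel(gtab)
# Collect a sample of single runs so run-to-run variation is visible.
times = timeit.repeat(lambda: model.fit(data, mask=mask),
                      repeat=5, number=1)
print(min(times), max(times))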


@Garyfallidis (Contributor) commented

Hi, this is interesting. What we need to understand is what makes the HCP datasets slower. So, we need to run the same profiling strategy on both datasets.

I would also run the memory profiler on both datasets. We may discover something unexpected that we haven't realized before, or that we forgot about. I know that with big datasets like HCP there is sometimes too much swapping if the system doesn't have enough memory, but this is a perfect case to actually study that up close. I think after we have the memory/CPU profiling for both datasets, we can move ahead with the merge. This function will be used by many other models, so let's take our time and be extra careful and proactive. Thanks for working on this. It's a cool addition. GGT!!!
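
For the memory side, a minimal sketch using the memory_profiler package (the same data, gtab, and mask would be loaded for each dataset and each branch):

from memory_profiler import memory_usage

import dipy.reconst.dti as dti

model = dti.TensorModel(gtab)
# Peak memory (in MiB) while fitting; repeat per dataset and per branch.
peak = memory_usage((model.fit, (data,), {'mask': mask}), max_usage=True)
print(peak)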

@arokem (Contributor Author) commented May 17, 2016

Yep. I will get back to this, and do something more comprehensive, but not this week...


@Garyfallidis (Contributor) commented

Sure, np.

@arokem added this to the 0.12 milestone on May 21, 2016
@Garyfallidis (Contributor) commented

Hi @arokem, I would like to merge this, but first I need the analysis of what creates the slowdown and, hopefully, a fix that resolves the problem. Are you on it? Or is this PR replaced by another one?

@arokem (Contributor Author) commented Jul 15, 2016

Hey Eleftherios - the problem is that, even in the best case, there is no way for this not to cost extra memory and time.

So, on further thought, and after some conversations with @RafaelNH about this, I think that the best approach is to delegate this to the models that require an estimate of S0, so that they do it on their own in their fit methods. That's what @sahmed95 is implementing in the IVIM model, and I believe it is also what's going on in the free water DTI implementation.

I propose that - unless someone objects - we close this for now.


@arokem (Contributor Author) commented Oct 21, 2016

Sorry -- I should have closed this one a while back. Closing now...
