CSD fit issue #340

Merged
merged 9 commits into nipy:master from MrBago:bf_csd_fits on Jul 30, 2014

@MrBago
Contributor

MrBago commented Mar 21, 2014

I believe there is a bug in the CSD code: it seems that we're fitting on spherical harmonic coefficients instead of the data in csdeconv. I've made a gist here (https://gist.github.com/MrBago/9681461) which allows you to compare the results of fitting using the two approaches. The fix is pretty easy, but the documentation should also be updated to reflect it. There are also a few other functions in this module that might need a similar fix, and we should write some tests for this. I don't have time to work on this any more tonight, and I'd love some help if anyone feels like jumping in.
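To make the two approaches concrete, here is a minimal sketch (synthetic B_dwi, R, and S with illustrative shapes for sh_order=8 and 150 gradient directions; these are stand-ins, not dipy's actual variables):

import numpy as np

rng = np.random.default_rng(0)
B_dwi = rng.standard_normal((150, 45))  # SH basis at the gradient directions
R = np.diag(rng.uniform(0.5, 1.5, 45))  # diagonal forward-deconvolution matrix
S = rng.standard_normal(150)            # measured DWI signal

# Current behavior: project the signal onto SH first, then fit in SH space.
s_sh = np.linalg.lstsq(B_dwi, S, rcond=None)[0]
fodf_sh_old = np.linalg.lstsq(R, s_sh, rcond=None)[0]

# Proposed fix: fit the fODF coefficients directly against the raw data.
fodf_sh_new = np.linalg.lstsq(B_dwi @ R, S, rcond=None)[0]
# (The discussion below works out when these two approaches agree.)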

@mdesco

Contributor

mdesco commented Mar 23, 2014

Hi all,

I will try to have a look at this in the next few days.

Max

@mdesco

Contributor

mdesco commented Mar 30, 2014

I do not believe that the csdeconv is bugged. The fix you suggest is theoretically equivalent to what is coded now. The way it is coded now, everything is done in the SH space, as indicated in the signature of the function. s_sh is the SH projection of the signal and R the fwdSH matrix.

So, s_sh = R * fodf_sh, which means that fodf_sh = leastsquare( R, s_sh), as seen in Eq. (1) of Tournier et al 2007 and as done in the first line of the code.

What you suggest is to pass the signal S to the function, i.e. the spherical function (SF) values of the raw diffusion-weighted images (DWI), SF_dwi. From the SH relation above (s_sh = R * fodf_sh), we can see that B_dwi * s_sh = B_dwi * R * fodf_sh, which is now the relation on the sphere. Say we are at SH order 8: s_sh is 45x1 and R is 45x45, whereas B_dwi * R is Nx45, where N is the number of DW images.
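In matrix form (notation as in JianCheng's comment below):

$s_{sh} = R \, fodf_{sh} \quad\Rightarrow\quad fodf_{sh} = \arg\min_x \| R x - s_{sh} \|^2, \qquad B_{dwi} \, s_{sh} = B_{dwi} \, R \, fodf_{sh}.$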

Having said that, I had a look at your compareCsdFits.py. First, I get a bus error when the code reaches model16, after the warning that we have more unknowns than data points. I don't know why. I will try to test on my Linux box on Monday.

Your suggestion might initialize a better fodf_sh in your example, which makes the underdetermined system better conditioned and numerically easier to solve. To be honest, I have not played with the "super-resolution" part here. But for the model8 part, the old way on the SH and the new way from the SF seem to perform similarly.

Maybe we can have a hangout about this this week. You can have a look at my branch mdesco/bf_csd_fits where I have put comments and intermediate visualization.

@mdesco

Contributor

mdesco commented Mar 30, 2014

Ok. I see why I have a bus error on my side with your example at order 16. I wonder why you do not have the same problem on your side.

In the init of ConstrainedSphericalDeconvModel:

r_sh = np.linalg.lstsq(self.B_dwi, S_r[self._where_dwi])[0]

With 150 DW images and 153 coefficients to estimate at order 16, this line crashes for me. This is a bug and should not happen.

Do you have a check or a fix for this on your side? I will try to patch this up.

@MrBago

Contributor

MrBago commented Mar 30, 2014

On Sun, Mar 30, 2014 at 7:43 AM, mdesco notifications@github.com wrote:

I do not believe that the csdeconv is bugged. The fix you suggest is
theoretically equivalent to what is coded now. The way it is coded now,
everything is done in the SH space, as indicated in the signature of the
function. s_sh is the SH projection of the signal and R the fwdSH matrix.

So, s_sh = R * fodf_sh, which means that fodf_sh = leastsquare( R, s_sh),
as seen in Eq. (1) of Tournier et al 2007 and as done in the first line of
the code.

As an aside, leastsquare(R, s_sh) == s_sh / R.diagonal(), since R is a
diagonal matrix.

I believe the problem is that the two are only theoretically equivalent
when there is no noise. I believe doing a least squares fit on the signal
assumes homoscedastic noise in the diffusion signal, while fitting in the
SH space assumes homoscedastic noise in the SH coefficients. I don't think
these assumptions are equivalent, but maybe I'm wrong.
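A small sketch of this noise argument (generic matrices with illustrative shapes, not dipy's):

import numpy as np

# With iid noise of variance sigma**2 on the signal S, the projected SH
# coefficients pinv(B) @ S have covariance sigma**2 * pinv(B) @ pinv(B).T.
# That is a multiple of the identity only when B has orthonormal columns,
# so fitting in SH space implicitly re-weights the noise otherwise.
rng = np.random.default_rng(1)
B = rng.standard_normal((150, 45))   # non-uniform sampling: not orthonormal
Bpinv = np.linalg.pinv(B)
cov = Bpinv @ Bpinv.T                # SH-coefficient noise covariance / sigma**2
print(np.allclose(cov, cov[0, 0] * np.eye(45)))  # False: heteroscedastic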

What you suggest is to pass the signal S to the function, i.e. the spherical
function (SF) values of the raw diffusion-weighted images (DWI), SF_dwi. From
the SH relation above (s_sh = R * fodf_sh), we can see that B_dwi * s_sh =
B_dwi * R * fodf_sh, which is now the relation on the sphere. Say we are at
SH order 8: s_sh is 45x1 and R is 45x45, whereas B_dwi * R is Nx45, where
N is the number of DW images.

Having said that, I had a look at your compareCsdFits.py. First, I get a
bus error when the code reaches model16, after the warning that we have
more unknowns than data points. I don't know why. I will try to test on my
Linux box on Monday.

Your suggestion might initialize a better fodf_sh in your example, which
makes the underdetermined system better conditioned and numerically easier
to solve. To be honest, I have not played with the "super-resolution" part
here. But for the model8 part, the old way on the SH and the new way
from the SF seem to perform similarly.

I'll take a closer look at this; I believe the initialization should be the
same.

Maybe we can have a hangout about this this week. You can have a look at
my branch mdesco/bf_csd_fits where I have put comments and intermediate
visualization.

This week might not work for me, but I'll take a look at your branch and I
should be able to hangout next week if you're available.

@JianCheng

JianCheng commented Mar 30, 2014

When I implemented CSD in my C++ code, I thought about this. I partially agree with @MrBago.

SD models solve the least squares problem

$\min \| B_{sh} R \, fodf_{sh} - S \|^2$,

where $S$ is the signal vector, $B_{sh}$ is the SH basis matrix, and $fodf_{sh}$ is the fODF coefficient vector.
There is another form,

$\min \| R \, fodf_{sh} - B_{sh}^{+} S \|^2$,

where $B_{sh}^{+}$ is the pseudo-inverse of $B_{sh}$. This form is what @mdesco suggested.

Note that if plain least squares is used, these two forms are equivalent if:

  1. The sampling in the single shell is uniform and dense, so that $B_{sh}^T B_{sh} = I$, i.e. $B_{sh}$ has orthonormal columns and $B_{sh}^{+} = B_{sh}^T$.
     Note that even with some noise, the two forms are still equivalent if the columns of $B_{sh}$ are orthonormal (dense and uniform sampling). That is the case for the least squares in filtered SD (FSD) by Tournier 2004.

When considering the regularization in CSD ($\lambda \neq 0$), we have to add another condition:

  2. $S$ can be represented by the columns of $B_{sh}$ without error, i.e. $S$ has no higher-frequency content.
     That is because if $S$ has higher-frequency content, the two forms of least squares have different residuals, which are weighted differently against the regularization term in CSD.

So in CSD the two forms are equivalent if the columns of $B_{sh}$ are orthonormal (dense and uniform sampling) and $S$ can be fully represented by $B_{sh}$. However, if the sampling is not uniform and dense, or if $S$ has high-frequency content, they are not equivalent.

Basically I think the first form, which works directly on the signals, is better: the first form needs one estimation step while the second needs two, and intuitively the fewer steps we use, the less error we introduce. The estimation of the SH coefficients of $S$ always has some error, i.e. $S$ may have high-frequency content.
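A quick numerical check of these conditions (synthetic stand-ins: lam plays the role of the CSD weight $\lambda$, L is a generic regularization operator, and a reduced QR factor stands in for a dense, uniform $B_{sh}$; the solve helper is a sketch, not dipy's code):

import numpy as np

def solve(A, b, L, lam):
    # min ||A x - b||^2 + lam^2 ||L x||^2, via a stacked least squares
    X = np.vstack([A, lam * L])
    y = np.concatenate([b, np.zeros(L.shape[0])])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
R = np.diag(rng.uniform(0.5, 1.5, 45))
L = rng.standard_normal((362, 45))    # stand-in regularization operator
S = rng.standard_normal(150)

B = np.linalg.qr(rng.standard_normal((150, 45)))[0]  # orthonormal columns
x1 = solve(B @ R, S, L, lam=1.0)                     # fit on the signal
x2 = solve(R, B.T @ S, L, lam=1.0)                   # fit on SH projection
print(np.allclose(x1, x2))   # True: equivalent when B^T B = I

B = rng.standard_normal((150, 45))                   # non-uniform sampling
x1 = solve(B @ R, S, L, lam=1.0)
x2 = solve(R, np.linalg.pinv(B) @ S, L, lam=1.0)
print(np.allclose(x1, x2))   # False: the forms differ once lambda != 0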

@mdesco

Contributor

mdesco commented Mar 31, 2014

I agree with @JianCheng. Thanks for your comment.

@MrBago, with your implementation the function seems more robust to noise in this example when used with super-resolution.

I will try to push something this week to unify all of this.

@JianCheng

JianCheng commented Mar 31, 2014

@mdesco, what I wrote yesterday was not rigorous, sorry. I have updated it.

@MrBago

Contributor

MrBago commented Mar 31, 2014

Since we're rooting around in this code anyway, how do you guys feel about making lambda**2 proportional to N_data / N_sphere_points? This normalization of lambda would mean that repeating a data set and/or repeating a sphere does not change the fit. Let me know if you'd like an example to demonstrate what I mean.

@MrBago MrBago changed the title from "BF - csd model is fitting coefficients instead of data" to "CSD fit issue" on Apr 11, 2014

@arokem

Member

arokem commented May 1, 2014

What's the status on this one? Concerning @MrBago's recent comment: I am against that suggestion. I think that users can do that themselves (normalize lambda**2) if they want to. It might be more confusing than helpful if it's tucked away somewhere in the code.

@arokem arokem closed this May 1, 2014

@arokem arokem reopened this May 1, 2014

@arokem

Member

arokem commented May 1, 2014

Sorry. Wrong button.

@MrBago

Contributor

MrBago commented May 1, 2014

@arokem sorry, I think I should have been more clear: currently lambda itself is scaled by N_data / N_sphere_points. This means that if I use a sphere that's twice as big, even if I simply repeat all the points twice, I get a slightly different fit. Since we're normalizing anyway, I would prefer to normalize so that lambda ** 2, not lambda, is proportional to N_data / N_sphere_points; that way, if I repeat all my points twice, I get exactly the same answer as I would have with the smaller sphere.
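A sketch of why that scaling gives the invariance (generic matrices, not dipy's; the solve helper is a sketch of the stacked regularized least squares):

import numpy as np

def solve(A, b, L, lam):
    # min ||A x - b||^2 + lam^2 ||L x||^2, via a stacked least squares
    X = np.vstack([A, lam * L])
    y = np.concatenate([b, np.zeros(L.shape[0])])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
A = rng.standard_normal((150, 45))  # design matrix (stand-in for B_dwi * R)
b = rng.standard_normal(150)        # data
L = rng.standard_normal((362, 45))  # one row per regularization sphere point

x_small = solve(A, b, L, lam=1.0)
# Repeating every sphere point doubles ||L x||^2, so keeping the fit
# identical requires lam**2 (not lam) to scale with 1 / N_sphere_points:
x_big = solve(A, b, np.vstack([L, L]), lam=1.0 / np.sqrt(2))
print(np.allclose(x_small, x_big))  # True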

@MrBago

Contributor

MrBago commented May 1, 2014

Also @mdesco what is the status of this? Can I help?

@arokem

Member

arokem commented May 1, 2014

OK - gotcha. That makes sense, but would break backwards compatibility with 0.7, no?

@MrBago

Contributor

MrBago commented May 1, 2014

It depends on what you mean by backwards compatibility: 0.7 code would still run, but it would give a slightly different answer.

But we're doing that anyway, because we're changing how the least squares is done. I think the fix to the least squares is important and trumps backwards compatibility, and because the result is going to be slightly different anyway, I would prefer to do the lambda fix now; that way we only need to break compatibility once.

@arokem

Member

arokem commented May 1, 2014

Makes sense too. Onwards and upwards. Let's wait and see what @mdesco says.

@mdesco

Contributor

mdesco commented May 2, 2014

Hi guys, I need 3-4 hours to sit down and think about this one. With the end of term and preparing for ISMRM with my gang, it's really hard for me to find that time.

@MrBago, will you be at ISMRM? If so, let's block some time there to finish it. If not, let's schedule a hangout anyway to discuss this issue in person. Maybe @Garyfallidis can join too while we are at ISMRM.

Sorry about the delay...

@Garyfallidis

Member

Garyfallidis commented Jun 18, 2014

How are things progressing with this one?

@MrBago

Contributor

MrBago commented Jun 18, 2014

I got a bit stuck on this; sorry I didn't update you guys with what I've done. Here is what's been done so far:

  • I've updated the CSD model to use DWI signals to fit coefficients, instead of first fitting SH functions to the signal.
  • I've added a test to show that this produces better FODs.
  • I've updated the scaling of lambda in the CSD model so that the same lambda gives similar results regardless of the regularization sphere size.
  • I've added a test for the lambda scaling and a test to ensure that the CSD fit using all defaults did not change because of the new scaling.
  • I've updated variable names and some docs.

Still to do:

  • I got stuck when I tried to apply this change to the ODF deconvolution model. I'm less familiar with this model, and my first pass at applying the change made it pretty much the same as the CSD deconvolution model. If someone could take a look at the ODF deconvolution it would be really helpful.
  • Lambda is still being scaled with the number of SH coefficients; does that sound right? I'd like to check on that before we merge this.
  • Can someone go over the docs and make sure I didn't miss anything?
@MrBago

Contributor

MrBago commented Jun 18, 2014

One more thing: I've been testing this on a few different systems, and it seems that the FOD values are somewhat system dependent. Fitting the CSD model on a noise-free simulation gives the same result every time on a given system, but the results from different systems might be slightly different. Does anyone know where this might be coming from?

@mdesco

Contributor

mdesco commented Jun 20, 2014

Hey Bago,

I have been behind on this one too.

On Wed, Jun 18, 2014 at 6:51 PM, MrBago notifications@github.com wrote:

I got a bit stuck on this; sorry I didn't update you guys with what I've
done. Here is what's been done so far:

  • I've updated the CSD model to use DWI signals to fit coefficients,
    instead of first fitting SH functions to the signal.
  • I've added a test to show that this produces better FODs.
  • I've updated the scaling of lambda in the CSD model so that the same
    lambda gives similar results regardless of the regularization sphere size.
  • I've added a test for the lambda scaling and a test to ensure that the
    CSD fit using all defaults did not change because of the new scaling.
  • I've updated variable names and some docs.

This is great.

Still to do:

  • I got stuck when I tried to apply this change to the ODF
    deconvolution model. I'm less familiar with this model, and my first pass at
    applying the change made it pretty much the same as the CSD deconvolution
    model. If someone could take a look at the ODF deconvolution it would be
    really helpful.

I can have a look.

  • Lambda is still being scaled with the number of SH coefficients; does
    that sound right? I'd like to check on that before we merge this.

I think so. At least, that is what I've always done. The more SH
coefficients you have, the more likely you are to have spurious peaks and
negative values due to the high-order SH coefficients. But maybe we should
think of an experiment to investigate this issue. To be honest, I don't
remember if I ever did.

  • Can someone go over the docs and make sure I didn't miss anything?

Yes. I will do that too.

Should I just checkout your branch and do this on top of yours? I plan on
having time to do this on Monday.

Also, your system-dependent variation is weird. It can only be the
pseudo-inverse and dot operations being computed, no?

Best,

Max

@arokem

Member

arokem commented Jun 20, 2014

Just a suggestion - the new cross-validation code (#335) might be useful for investigating the regularization issues. I believe it's rather close to being merged, so you should be able to rebase this and make it go (?). From my (very limited) experiments with this it does seem that our implementation of CSD is over-regularizing, and under-fitting the signal, but as @mdesco indicates, you might expect some over-fitting with higher numbers of parameters.

Rebasing might be a bit tricky, because we are probably touching similar parts of the code. I'd be happy to help.

@MrBago

Contributor

MrBago commented Jun 20, 2014

coefficients you have, the more likely you are to have spurious peaks and
negative values due to the high-order SH coefficients. But maybe we should
think of an experiment to investigate this issue. To be honest, I don't
remember if I ever did.

I agree it would be nice to have an experiment confirm our intuition. Maybe
we can merge this PR and implement the experiment in a separate PR before
the next release. @arokem, using cross-validation for this sounds like a
good idea; I'll probably need to talk to you more offline to flesh that out.

Should I just checkout your branch and do this on top of yours? I plan on
having time to do this on Monday.

Working on top of my branch sounds perfect.

Also, for your system dependent variation, it is weird. It can only be the
pseudo-inverse and dot operations being computed, no?

I'll see if I can track this down a little more soon.

Bago

@MrBago

Contributor

MrBago commented Jul 7, 2014

@mdesco have you had time to look over the ODF deconvolution? Does that need to be updated before we can merge this? This file seems to be a popular candidate for updates and fixes, so the longer this sits, the harder it's going to be to merge. Also, once this is in, I'd like to start talking about incorporating the performance improvements we talked about at ISMRM.

detection, where peaks below 0.1 amplitude are usually considered noise peaks. Because SDT
is based on a q-ball ODF deconvolution, and not signal deconvolution, using the max instead
of the mean (as in CSD) is more stable.
Threshold controlling the amplitude below which the corresponding fODF

@mdesco

mdesco Jul 8, 2014

Contributor

Let's change fODF to FOD in the documentation of csdeconv to be consistent:
fODF -> FOD.

@mdesco

Contributor

mdesco commented Jul 8, 2014

Good to be merged after this minor documentation problem is fixed. This looks great. Sorry for the delay.

I will raise an issue for the SDT. If I have time, I will play with odf_deconv to see if passing the ODF values on the sphere, as opposed to odf_sh, helps the deconvolution, especially in the super-resolution case.

@MrBago

Contributor

MrBago commented Jul 30, 2014

@arokem can you take a look at the last commit on this PR and make sure it makes sense? After that, I think this is ready to be merged, unless anyone else has any objections.

@arokem

Member

arokem commented Jul 30, 2014

The last commit looks good to me. 1% improvement in the variance explained :-)

@arokem

Member

arokem commented Jul 30, 2014

Let me just take a quick look at all the rest

@@ -119,14 +119,17 @@ def __init__(self, gtab, response, reg_sphere=None, sh_order=8, lambda_=1,
         # scale lambda_ to account for differences in the number of
         # SH coefficients and number of mapped directions
         # This is exactly what is done in [4]_
-        self.lambda_ = lambda_ * self.R.shape[0] * r_rh[0] / self.B_reg.shape[0]
+        self.lambda_ = (lambda_ * self.R.shape[0] * r_rh[0] /
+                        (np.sqrt(self.B_reg.shape[0]) * np.sqrt(362.)))

@arokem

arokem Jul 30, 2014

Member

I am guessing that this only works well when you use the 362 point symmetrical sphere for the ODF? Do we want to write this in the docs somewhere?

@MrBago

MrBago Jul 30, 2014

Contributor

No, this is not specific to the 362 sphere; it should work for any sphere. Any value close to 20 should work fine here; it's an empirically determined constant. I put sqrt(362) in so that the default lambda is not affected by this change (the sqrt(self.B_reg.shape[0]) part).
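To make the constant concrete, a toy check of just the denominator (ignoring the R.shape[0] * r_rh[0] factor, which is common to both versions):

import numpy as np

# Old scaling divided lambda_ by N (the number of regularization sphere
# points); the new one divides by sqrt(N) * sqrt(362). For the default
# 362-point sphere the two coincide, so default results are unchanged.
def old_scale(n):
    return 1.0 / n

def new_scale(n):
    return 1.0 / (np.sqrt(n) * np.sqrt(362.0))

print(np.isclose(old_scale(362), new_scale(362)))  # True: default unchanged
print(old_scale(724), new_scale(724))  # doubling the sphere now shrinks
                                       # lambda_ by sqrt(2) instead of 2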

@arokem

arokem Jul 30, 2014

Member

OK - cool. I am also running the kfold-xval example right now to see whether we need to change anything there.

@arokem

arokem Jul 30, 2014

Member

I just put in a PR against your branch with changes to the example.

arokem and others added some commits Jul 30, 2014

Merge pull request #8 from arokem/MrBago-bf_csd_fits
DOC: The cross-validation example changed with improvements to csdeconv.
@arokem

Member

arokem commented Jul 30, 2014

I don't think this Travis build issue is a real thing. I am going to go ahead and merge this.

arokem added a commit that referenced this pull request Jul 30, 2014

@arokem arokem merged commit 6360da3 into nipy:master Jul 30, 2014

1 check failed

continuous-integration/travis-ci The Travis CI build could not complete due to an error