Tissue classification using MAP-MRF #670

Closed

wants to merge 107 commits into base: master
Conversation

9 participants
@villalonreina
Contributor

villalonreina commented Jun 26, 2015

This is an implementation of MAP-MRF (maximum a posteriori - Markov random field) tissue classification using the Iterative Conditional Modes (ICM) minimization algorithm. There are three inputs: the grey-level T1 brain image, its corresponding mask, and an initial three-class segmentation of the T1.

For now we have hard-coded paths and I can share the images with whoever wants to test the code.

This code is still experimental and needs to be extended to estimate the partial volumes of the tissue classes.

@matthew-brett

Member

matthew-brett commented Jun 26, 2015

Can you put the images into their own repo, so we can add them as a submodule for testing?

@Garyfallidis

Member

Garyfallidis commented Jun 26, 2015

I believe that Julio cannot share these images right now. I mean he cannot share those specific images publicly. We can try to change them with some existing ones in the following days. Until then he can give you a dropbox link so you can have a look.

"""
mu = np.zeros(3)
std = np.zeros(3)

@MrBago

MrBago Jun 26, 2015

Contributor

You have nclass as an argument, yet you hard-code 3 here.

@MrBago

MrBago Jun 26, 2015

Contributor

Try this:

import numpy as np

def seg_stats(input_image, seg_image, ddof=0):
    img = input_image.ravel()
    seg = seg_image.ravel()
    # a dtype is never an instance of np.integer; use np.issubdtype instead
    if not np.issubdtype(seg.dtype, np.integer):
        seg = seg.astype(np.int64)

    # bincount with weights gives per-label sums in a single pass
    seg_vol = np.bincount(seg)
    sum_img = np.bincount(seg, img)
    sum_sq = np.bincount(seg, img ** 2)

    mean = sum_img / seg_vol
    var = (sum_sq - seg_vol * mean ** 2) / (seg_vol - ddof)
    std = np.sqrt(var)
    return mean[1:], std[1:]  # drop label 0 (background)
@omarocegueda

Contributor

omarocegueda commented Jun 26, 2015

For testing purposes, we may stack a few copies of this coronal slice of a T1-weighted image:
https://github.com/nipy/dipy/blob/master/dipy/data/t1_coronal_slice.npy
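For example, a small test volume can be built by stacking copies of a 2D slice along a new axis (a random array stands in here for the actual `.npy` file, which would be loaded with `np.load`):

```python
import numpy as np

# Stand-in for np.load('t1_coronal_slice.npy'); any 2D slice works the same way
slice_2d = np.random.rand(128, 128)

# Stack 5 copies along a new last axis to get a 128 x 128 x 5 test volume
vol = np.stack([slice_2d] * 5, axis=-1)
print(vol.shape)  # (128, 128, 5)
```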

@Garyfallidis

Member

Garyfallidis commented Jun 26, 2015

Thx @omarocegueda :)

@MrBago

Contributor

MrBago commented Jun 26, 2015

It seems like most of the functions are getting their own files; is there a reason most of this isn't in one file?

It is intended to work on a masked T1 brain image.
It is intended to give threshold values for CSF/WM and WM/GM.
Copied from scikit-image to remove dependency.

@arokem

arokem Jun 27, 2015

Member

Please don't forget to copy over the license from scikit-image as well

return energytotal


def log_likelihood(img, mu, var, index, label):

@omarocegueda

omarocegueda Jun 27, 2015

Contributor

This is actually the negative log-likelihood, right?

@villalonreina

villalonreina Jun 27, 2015

Contributor

Yes it is. My understanding is that this is the one you minimize. The positive one is the one that you would maximize, is that right?

Thanks

@omarocegueda

omarocegueda Jun 27, 2015

Contributor

Yes, that's right, I just thought that something like neg_log_likelihood would be a better name for this function

return loglike


def gibbs_energy(seg, index, label, beta):

@villalonreina

villalonreina Jun 27, 2015

Contributor

This is great. Yes I saw this somewhere else, but first wanted to try the simpler version of it to test it out. Thanks a lot.


segmented = np.zeros(dataimg.shape)

while True:

@omarocegueda

omarocegueda Jun 27, 2015

Contributor

With for N in range(niter): you no longer need the break at lines 60-61


energy = 0

if label == seg[index[0] - 1, index[1], index[2]]:

@omarocegueda

omarocegueda Jun 27, 2015

Contributor

I think you need to check whether you are at the boundary of the image and not accumulate the energy in that case. Otherwise the index will wrap around to the other side in Python (not the desired behavior) and may cause a memory access violation in Cython (I think we will definitely need to cythonize).
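A sketch of the boundary-safe accumulation (a hypothetical helper for illustration, not the PR's code):

```python
import numpy as np

def gibbs_energy_safe(seg, idx, label, beta):
    """Ising/Potts energy of `label` at voxel `idx`, skipping out-of-bounds
    neighbors so negative indices never wrap around."""
    energy = 0.0
    for axis in range(3):
        for step in (-1, 1):
            nbr = list(idx)
            nbr[axis] += step
            if not (0 <= nbr[axis] < seg.shape[axis]):
                continue  # neighbor outside the volume: contributes nothing
            energy += -beta if seg[tuple(nbr)] == label else beta
    return energy

seg = np.zeros((3, 3, 3), dtype=np.int64)
print(gibbs_energy_safe(seg, (0, 0, 0), 0, 0.5))  # corner voxel: 3 neighbors, -1.5
```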

@omarocegueda

Contributor

omarocegueda commented Jun 27, 2015

I think we definitely need to code this in Cython. Can we define a base class, e.g. ImageSegmenter, with a segment(image) method that returns the segmented image? Then we can derive FASTImageSegmenter(ImageSegmenter), which contains this particular implementation of the Maximum A-Posteriori estimation of the HMRF with ICM (ImageSegmenter doesn't sound good, any ideas?).

I agree with @MrBago, I think you can put most of this code into one single file.

@samuelstjean

Contributor

samuelstjean commented Jun 29, 2015

As a late reply to all the comments at the top about a T1 image (isn't there a way to reply to a specific comment, or is this a linear discussion?): there is already one in the Stanford dataset, see this example http://nipy.org/dipy/examples_built/linear_fascicle_evaluation.html. It seems to be at 2 mm isotropic instead of the more common 1 mm, but at least anyone can play with it easily and get the same results you would.

else:
energy = energy + beta

if label == seg[index[0] + 1, index[1], index[2]]:

@omarocegueda

omarocegueda Jul 1, 2015

Contributor

Just curious: here, the Ising/Potts potential is -beta if the labels are equal and +beta if they are different. I remember seeing this "symmetric" version before, but it is not the standard Ising/Potts potential, which is -beta*delta(x_{i}-x_{j}). In other words, it should be zero when the labels are different rather than +beta, as appears in Zhang's paper, first paragraph, page 49. Any reason why we prefer the symmetric version here?

@villalonreina

villalonreina Jul 1, 2015

Contributor

Hi Omar,

Thanks for your comment. Can you point me to where exactly in the Zhang paper they mention the "non-symmetric" version?

Thanks

@omarocegueda

omarocegueda Jul 1, 2015

Contributor

Sure! It's the 6th line, first paragraph: $V_{C}(x) = -\delta(x_{i}-x_{j})$ (the Kronecker delta)

@omarocegueda

omarocegueda Jul 1, 2015

Contributor

Sorry, sixth line, first paragraph, page 49

@villalonreina

villalonreina Jul 1, 2015

Contributor

Yes, you are right. I completely missed that one. Thanks!

@villalonreina

villalonreina Jul 1, 2015

Contributor

Hi Omar, one question regarding the Delta function. Do you know if anyone has implemented it in Dipy?

@omarocegueda

omarocegueda Jul 1, 2015

Contributor

I don't know, but it would be something like

return 1 if x == 0 else 0
@villalonreina

Contributor

villalonreina commented Jul 1, 2015

Hi all,

I got confused when trying to follow the algorithm proposed in the Zhang 2001 paper (http://www.ncbi.nlm.nih.gov/pubmed/11293691). I am not sure about which equation(s) goes into each of the steps of the EM algorithm.

So, for step E should I use the function for Q (equation 24) and then I maximize in step M equations 25 and 26?

This is a list of steps of what I am thinking about:

  1. Estimate label parameters for L in range(1, 3) from the input segmentation

  2. First Loop of ICM, which will give me a segmentation (\hat{x}) and P(\ell | N_i) (the probability of L given the neighborhood)

  3. EM loop to estimate the parameters mu and sigma

  4. Back to ICM

Does this sound reasonable? What do you think?
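The steps above might be sketched roughly like this (a toy NumPy sketch, not the PR's code; for brevity the ICM step here ignores the Gibbs prior, i.e. beta = 0, and all helper names are illustrative):

```python
import numpy as np

def map_mrf_sketch(image, seg, nclasses, niter=5):
    """Schematic EM/ICM alternation. The real loop would add the neighborhood
    (Gibbs) energy to the per-class negative log-likelihood."""
    for _ in range(niter):
        # EM-style update: per-class mean and variance from current labels (eqs. 25-26)
        mu = np.array([image[seg == k].mean() for k in range(nclasses)])
        var = np.array([image[seg == k].var() for k in range(nclasses)]) + 1e-6
        # ICM-style update: each voxel takes the label with lowest neg-log-likelihood
        nll = 0.5 * np.log(2 * np.pi * var) + (image[..., None] - mu) ** 2 / (2 * var)
        seg = nll.argmin(axis=-1)
    return seg, mu, var

# Toy 1-D "image" with three well-separated intensity groups
image = np.array([0.0, 0.1, 1.0, 1.1, 2.0, 2.1])
seg0 = np.array([0, 0, 1, 1, 2, 2])
seg, mu, var = map_mrf_sketch(image, seg0, nclasses=3)
```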

@omarocegueda

Contributor

omarocegueda commented Jul 1, 2015

yes, Q in equation 24 is the function that needs to be maximized with respect to the parameters (in this case, the mu and sigma for each class). The good news is that the authors already did it for us: they did the math and found that the mu and sigma that maximize Q are given by equations 25 and 26.

Note that the key quantities you need to compute in preparation for eqs. 25 and 26 are given by eq. 27: for each voxel i (with coordinates (x,y,z), say), and for each label k (in equation 27 they use \ell instead of k, but I will use k because it reads better in plain text) you need to compute $P^{(t)}(k | y_{i})$; let's denote that P[i, k].

Now, in order to compute P[i, k] you will need another quantity first: the second factor in the numerator of eq. 27, which is $P^{(t)}(k, x_{N_{i}})$. Let's denote this quantity Q[i, k]; it is the probability that the label of voxel i is k given that its neighbors are known (remember that you already know all the labels, because that's the output of ICM). You can find how to compute Q[i, k] in equation 2.18 of Stan Z. Li's book "Markov Random Field Modeling in Image Analysis" (page 27). In that equation V1 doesn't concern us, and V2 is, in our case, the potential of the Ising/Potts model (exponential of -beta or +beta, depending on the neighboring voxels having the same or different label, respectively). The following pseudocode may be clearer:

The following computes Q[i, k] for a single voxel i and all labels k

Initialize Q[i, k] = 0 for all labels k
normalization = 0
for each label k:
    for each neighbor j of voxel i:
        Q[i, k] += PottsIsing(k, x_{j})
    Q[i, k] = exp(-Q[i, k])
    normalization += Q[i, k]
for each label k:
    Q[i, k] /= normalization

here, PottsIsing(a, b) = -beta if a == b, else beta, and x_{j} is the label of voxel j (remember, the output from ICM). From these Q you can compute P[i, k] using eq. 27 and then use that to compute the new mus and sigmas from eqs. 25 and 26.
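A runnable NumPy version of that pseudocode for a single voxel might look like this (helper names are illustrative; the symmetric potential from this PR is assumed, and the probability follows the Gibbs form exp of minus the energy):

```python
import numpy as np

def potts_ising(a, b, beta):
    # Symmetric potential used in this PR: -beta for equal labels, +beta otherwise
    return -beta if a == b else beta

def neighbor_label_probs(seg, idx, nclasses, beta):
    """P(label k | neighbors of voxel idx), normalized over the labels."""
    x, y, z = idx
    neighbors = [(x - 1, y, z), (x + 1, y, z), (x, y - 1, z),
                 (x, y + 1, z), (x, y, z - 1), (x, y, z + 1)]
    q = np.zeros(nclasses)     # Gibbs energy per candidate label
    for k in range(nclasses):
        for nb in neighbors:
            if all(0 <= c < s for c, s in zip(nb, seg.shape)):
                q[k] += potts_ising(k, seg[nb], beta)
    p = np.exp(-q)             # lower energy -> higher probability
    return p / p.sum()

seg = np.zeros((3, 3, 3), dtype=np.int64)
p = neighbor_label_probs(seg, (1, 1, 1), nclasses=2, beta=1.0)
```

With all neighbors labeled 0, nearly all the probability mass goes to label 0, and with beta = 0 the distribution is uniform.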

In summary, you initialize the mu and sigma for all models using median otsu, and then iterate the following 2 steps many times (we can discuss when to stop later):

  1. ICM to find the labels using your current mus and sigmas
  2. Update mu and sigma according to eqs. 25 and 26, using the current labels

I hope it helps

@villalonreina

Contributor

villalonreina commented Jul 1, 2015

Hi Omar,

I am trying to implement equation 27 from the same paper and I am not sure about g^t(y_i) and P(y_i). I am assuming that g^t(y_i) comes from equation 11 and that P(y_i) comes from equation 23. Is that right?

Thanks!

@omarocegueda

Contributor

omarocegueda commented Jul 1, 2015

Hi Julio,
exactly, $g^{(t)}(y_{i}; \theta_{k})$ is in this case the Normal density (eq. 11) with parameters $\theta_{k}$, i.e. the mu and sigma corresponding to class k (denoted \ell in the paper). Regarding $p(y_{i})$, please note that it cannot be given by eq. 23, because eq. 23 is conditioned on the knowledge of $\theta$, while $p(y_{i})$ in eq. 27 is not conditioned on $\theta$. Here, $p(y_{i})$ is the probability that intensity $y_{i}$ is observed at voxel $i$ (without knowing anything about the parameters $\theta$). In theory you could obtain $p(y_{i})$ by integrating $p(y_{i} | \theta)$ (eq. 23) over all possible values of $\theta$ (a huge, uncountably infinite set of values), but a much simpler way is to observe that $p(y_{i})$ is just a normalization constant that ensures the probabilities $p(k | y_{i})$ (eq. 27) sum to 1 over all classes k. You can see this, for example, in Bishop's book, page 61, eq. 2.76 (reference [9] in the paper; the ebook is here: http://cs.du.edu/~mitchell/mario_books/Neural_Networks_for_Pattern_Recognition_-_Christopher_Bishop.pdf).

So, in summary, you can ignore $p(y_{i})$ in eq. 27: compute only the numerator of eq. 27 for all classes $\ell$ and divide each by their sum.
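As a sketch of that normalization trick (hypothetical helper and variable names, Gaussian likelihood per class):

```python
import numpy as np

def posterior_over_classes(y, mu, var, prior):
    """Numerator of eq. 27 per class, normalized so p(y_i) is never needed."""
    g = np.exp(-(y - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    numer = g * prior           # Gaussian likelihood times neighborhood prior
    return numer / numer.sum()  # dividing by the sum plays the role of p(y_i)

mu = np.array([0.0, 1.0])      # class means
var = np.array([0.1, 0.1])     # class variances
prior = np.array([0.5, 0.5])   # P(k | neighborhood), uniform here
p = posterior_over_classes(0.05, mu, var, prior)
```

An intensity of 0.05 lands close to the first class mean, so almost all the posterior mass goes to class 0, and the entries sum to 1 by construction.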


def seg_stats(self, input_image, seg_image, nclass):

r""" Mean and standard variation for N desired tissue classes

@samuelstjean

samuelstjean Nov 20, 2015

Contributor

you probably mean standard deviation

@Garyfallidis

Member

Garyfallidis commented Dec 13, 2015

ping @villalonreina :)

mu = np.zeros(nclass)
std = np.zeros(nclass)

for i in range(0, nclass):

@omarocegueda

omarocegueda Feb 12, 2016

Contributor

This algorithm is Theta(nclass * N^3). We can compute everything in Theta(N^3).

@villalonreina

villalonreina Apr 28, 2016

Contributor

Hi Omar,

Thanks for your comment. What exactly do you mean with "Theta(N^3)" ? What specific line is it that you are referring to?

Thanks!

@omarocegueda

omarocegueda Jun 16, 2016

Contributor

For each label k=0, 1, ..., K-1 you are extracting all voxel values in input_image which are marked as k in seg_image. This operation needs to iterate over all voxels, which has complexity Theta(N^3) for an image of size NxNxN. Since you are doing this once for each label, the full complexity is Theta(K*N^3).

@omarocegueda

omarocegueda Jun 16, 2016

Contributor

Instead of doing that you can initialize mu[...] and std[...] to zero. Then iterate over each voxel (x, y, z), grab its label s = seg_image[x, y, z] and its value v = input_image[x, y, z], and update only the entry in mu, std corresponding to label s: mu[s] += v, std[s] += v*v. Since you are sweeping the full volume only once, and for each voxel you update only one entry of the arrays mu and std, the full complexity is just Theta(N^3).

@omarocegueda

omarocegueda Jun 16, 2016

Contributor

Of course, at the end you have the sum of the values and the sum of their squares (not the mean and std), but from that information and the number of voxels for each label you can directly compute the means and the standard deviations.
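A sketch of that single-pass accumulation (a plain Python loop for clarity; variable names are illustrative, and in practice np.bincount does the same thing vectorized):

```python
import numpy as np

def seg_stats_one_pass(input_image, seg_image, nclass):
    """Per-label mean and std in a single sweep over the volume."""
    count = np.zeros(nclass)
    s = np.zeros(nclass)    # per-label sum of values
    ss = np.zeros(nclass)   # per-label sum of squared values
    for v, k in zip(input_image.ravel(), seg_image.ravel()):
        count[k] += 1
        s[k] += v
        ss[k] += v * v
    mu = s / count
    std = np.sqrt(ss / count - mu ** 2)  # E[x^2] - E[x]^2
    return mu, std

img = np.array([1.0, 3.0, 2.0, 4.0])
seg = np.array([0, 0, 1, 1])
mu, std = seg_stats_one_pass(img, seg, 2)
```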

@Garyfallidis

Member

Garyfallidis commented Mar 8, 2016

Hey guys and especially @villalonreina and @omarocegueda, any progress with this PR? We need to move on for the next release. The clock is ticking!

@Garyfallidis

Member

Garyfallidis commented Apr 26, 2016

@villalonreina any progress with the validation? Keep it up!

villalonreina added some commits Apr 29, 2016

Added tolerance value as a percentage to stop iterations of the ICM loop. Changed the tissue_classification example to reflect the absence of number of iterations as an input.
Merge remote-tracking branch 'nipy-dipy/master' into pve
Need to update pve branch with latest commits to nipy-dipy/master
Fixed and modified the test file that was erroring. Made small changes to the example file to include the tolerance value for the tissue classification.
@coveralls

coveralls commented Jun 16, 2016

Coverage Status

Changes unknown when pulling caeee84 on villalonreina:pve into master on nipy:master.

@codecov-io

codecov-io commented Jun 16, 2016

Current coverage is 80.41%

No coverage report found for master at c5103fd.

Powered by Codecov. Last updated by c5103fd...caeee84


mu, sigma = com.initialize_param_uniform(image, nclasses)
sigmasq = sigma ** 2
npt.assert_equal(mu, np.array([0., 0.25, 0.5, 0.75]))

@omarocegueda

omarocegueda Jun 16, 2016

Contributor

In general, we should not compare floating point values for equality, we should use assert_array_almost_equal instead.
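For example (the tolerance is expressed in decimal places):

```python
import numpy as np
import numpy.testing as npt

a = np.array([0.0, 0.25, 0.5, 0.75])
b = a + 1e-9   # tiny float noise, e.g. from a different order of operations

# Passes: arrays agree to 6 decimal places, even though they are not bitwise equal
npt.assert_array_almost_equal(a, b, decimal=6)
```

An exact `npt.assert_equal(a, b)` would fail on the same pair.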

npt.assert_equal(sigmasq, np.array([1.0, 1.0, 1.0, 1.0]))

neglogl = com.negloglikelihood(image, mu, sigmasq, nclasses)
npt.assert_equal((neglogl[100, 100, 1, 0] != neglogl[100, 100, 1, 1]),

@omarocegueda

omarocegueda Jun 16, 2016

Contributor

Sorry, I don't understand what is the purpose of these tests. Why is it important to verify that the negative log-likelihood is different at these voxels?

@villalonreina

villalonreina Jun 25, 2016

Contributor

Hi Omar. Basically I am trying to test whether the neg-loglikelihood differs at that one voxel across the different tissue types (the last axis of the array). Do you think the neg-loglikelihood will invariably differ across the tissue types?


PLN = icm.prob_neighborhood(initial_segmentation, beta, nclasses)
npt.assert_equal(PLN.all() >= 0.0, True)
npt.assert_equal(PLN.all() <= 1.0, True)

@omarocegueda

omarocegueda Jun 16, 2016

Contributor

These tests, for example, really are meaningful! But I think the ones above (those that check for inequality) are unlikely to catch any bugs.

npt.assert_equal(PLN.all() <= 1.0, True)

if beta == 0.0:
npt.assert_equal((PLN[50, 50, 1, 0] == 0.25), True)

@omarocegueda

omarocegueda Jun 16, 2016

Contributor

We can put all these coordinates in a list and just iterate over it. And we should use ...almost_equal instead of equal to compare floating point values

"""
Created on Wed Apr 20 15:57:42 2016
@author: jvillalo

@omarocegueda

omarocegueda Jun 17, 2016

Contributor

We should remove this auto-generated code by spyder

"""

import numpy as np
#import nibabel as nib

@omarocegueda

omarocegueda Jun 17, 2016

Contributor

Remove import if unnecessary.

print('t1.shape (%d, %d, %d)' % t1.shape)

"""
Once we have fetched the T1 volume, we proceed to denoise it with the NLM algorithm.

@omarocegueda

omarocegueda Jun 17, 2016

Contributor

Why is this necessary? Note that if we denoise the volume first, then the effect of the regularization parameter will be less noticeable. Also this process is quite heavy, people may just want to see segmentation results quickly.

@villalonreina

villalonreina Jun 25, 2016

Contributor

Ok, yes, I agree. I got rid of the denoising of the T1; it made the example complicated and the run time longer.

final_segmentation = np.empty_like(image)
initial_segmentation = seg_init.copy()

for i in range(100):

@omarocegueda

omarocegueda Jun 17, 2016

Contributor

We should not hard-code the number of iterations. I think users will expect this to be an input parameter.

self.energies.append(energy)
self.energies_sum.append(energy[energy > -np.inf].sum())

if i % 10 == 0 and i != 0:

@omarocegueda

omarocegueda Jun 17, 2016

Contributor

Why do we check convergence only every 10 iterations?

self.segmentations.append(final_segmentation)
self.pves.append(PVE)
self.energies.append(energy)
self.energies_sum.append(energy[energy > -np.inf].sum())

@omarocegueda

omarocegueda Jun 17, 2016

Contributor

Sorry, I don't understand what the stop criterion is, could you please comment on this?

@villalonreina

villalonreina Jun 25, 2016

Contributor

Convergence is tested on the total energy returned by ICM at every loop. Every 10 iterations (it could be 20, 25 or 30), a percentage of the total range of change (the tolerance) is compared with the range of change over the last 5 iterations. If the range of change over the last 5 iterations is smaller than that fraction of the full range of change, the loop stops. Note that this check runs as the loop continues, at multiples of 10 (10, 20, 30, etc.). Shall we change it to test every 20 iterations?
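A sketch of that criterion (a hypothetical helper with illustrative names, not the PR's exact code):

```python
def converged(energies, tolerance_percent, window=5):
    """Stop when the energy change over the last `window` iterations falls below
    `tolerance_percent` of the total range of change seen so far."""
    total_range = max(energies) - min(energies)
    recent = energies[-window:]
    recent_range = max(recent) - min(recent)
    return recent_range < (tolerance_percent / 100.0) * total_range

# Energy has flattened out: the last 5 values barely move relative to the full drop
print(converged([100.0, 50.0, 20.0, 10.0, 9.9, 9.8, 9.85, 9.9, 9.87], 1.0))  # True
```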


pass

def classify(self, image, nclasses, beta, tolerance):

@omarocegueda

omarocegueda Jun 17, 2016

Contributor

tolerance documentation missing in the docstring

beta : float,
smoothing parameter, the higher this number the smoother the
output will be.
max_iter : float,

@omarocegueda

omarocegueda Jun 17, 2016

Contributor

max_iter is currently not an input parameter (I think it should be)

return PLN


cdef void _initialize_maximum_likelihood(double[:,:,:,:] nloglike, double[:,:,:] seg) nogil:

@omarocegueda

omarocegueda Jun 20, 2016

Contributor

seg should be of an integral type (int or short), not double



cdef void _icm_ising(double[:,:,:,:] nloglike, double beta, double[:,:,:] seg,
double[:,:,:] energy, double[:,:,:] new_seg) nogil:

@omarocegueda

omarocegueda Jun 20, 2016

Contributor

Same here, seg and new_seg don't need to be double, they should be of an integral type

@Garyfallidis

Member

Garyfallidis commented Jul 2, 2016

Hello @villalonreina. Ready with the rebase and the new PR?

@Garyfallidis

Member

Garyfallidis commented Jul 2, 2016

How did the squashing/interactive rebase go?

@Garyfallidis

Member

Garyfallidis commented Sep 8, 2016

This PR is replaced by #1115
