New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

RecoBundles - recognition of bundles #1443

Merged
merged 39 commits into from Apr 28, 2018

Conversation

Projects
7 participants
@Garyfallidis
Member

Garyfallidis commented Mar 5, 2018

This PR implements the super 馃敟馃敟馃敟馃敟 paper:

Garyfallidis et al. Recognition of white matter bundles using local and global streamline-based registration and clustering NeuroImage, 2017

https://www.sciencedirect.com/science/article/pii/S1053811917305839

This PR is ready for reviews.

I would like that first we merge #1379 so I can move from using directly the ArraySequence to using the Streamlines class. #1379 only needs a small coverage boost.

Some comments on the methods of this PR.

RecoBundles extracts bundles from a given tractogram using model bundles from another tractogram.
Therefore it is important to make sure that the two tractograms are registered first. So, what we usually do is apply the remarkably robust streamline registration first (see function whole_brain_slr) and then start RecoBundles. This strategy can be very useful with clinical datasets where image-based registration is uncertain. Of course you are happy to use other methods for registration too.

https://www.ncbi.nlm.nih.gov/pubmed/25987367

After this PR is merged will be working on providing an easy to use workflow and with access to an atlas of many bundles. And build the tutorial. Nonetheless, be happy to start using this and do please cite us gracefully.

Remark; If the two tractograms are already registered well with nonrigid registration make sure you disable the local_slr parameter as it will be probably an overkill.

Oh and Welcome to the Matrix! :) This PR brings infinite possibilities in the game. More cool additions and enhancements coming soon.

@frheault after this is merged make a PR for the centroid-based metric directly to the DIPY master. I am talking about your previous work here https://github.com/Garyfallidis/dipy/pull/50/files

@Garyfallidis Garyfallidis requested a review from BramshQamar Mar 5, 2018

@skoudoro

Travis complain but I can not reproduce this error on windows... all tests pass

Parameters
-----------
static : array

This comment has been minimized.

@skoudoro

skoudoro Mar 8, 2018

Member

The variable name static does not match with input parameter stat

This comment has been minimized.

@Garyfallidis
-----------
static : array
Static streamlines
moving : array

This comment has been minimized.

@skoudoro

skoudoro Mar 8, 2018

Member

The variable name moving does not match with input parameter mov

This comment has been minimized.

@Garyfallidis
Notes
-----
The difference with ``_bundle_minimum_distance`` is that we sum the
minimum values only for the static. Therefore, this is assymetric.

This comment has been minimized.

@skoudoro

skoudoro Mar 8, 2018

Member

typo: asymmetric

@codecov-io

This comment has been minimized.

codecov-io commented Mar 9, 2018

Codecov Report

Merging #1443 into master will increase coverage by 0.03%.
The diff coverage is 88.18%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1443      +/-   ##
==========================================
+ Coverage   87.49%   87.52%   +0.03%     
==========================================
  Files         241      244       +3     
  Lines       30789    31189     +400     
  Branches     3322     3426     +104     
==========================================
+ Hits        26939    27299     +360     
- Misses       3074     3079       +5     
- Partials      776      811      +35
Impacted Files Coverage 螖
dipy/tracking/streamline.py 96.03% <酶> (+0.2%) 猬嗭笍
dipy/tracking/utils.py 88.57% <酶> (酶) 猬嗭笍
dipy/segment/clustering.py 94.63% <100%> (+0.97%) 猬嗭笍
dipy/align/tests/test_streamlinear.py 97.74% <100%> (酶) 猬嗭笍
dipy/align/streamlinear.py 86.85% <75.69%> (-9.24%) 猬囷笍
dipy/segment/bundles.py 90.21% <90.21%> (酶)
dipy/align/tests/test_whole_brain_slr.py 93.75% <93.75%> (酶)
dipy/segment/tests/test_qbx.py 97.12% <96.15%> (+0.19%) 猬嗭笍
dipy/segment/tests/test_rb.py 96.87% <96.87%> (酶)
dipy/core/gradients.py 92.1% <0%> (-5.27%) 猬囷笍
... and 54 more

Continue to review full report at Codecov.

Legend - Click here to learn more
螖 = absolute <relative> (impact), 酶 = not affected, ? = missing data
Powered by Codecov. Last update f6af802...72fcf7f. Read the comment docs.

@Garyfallidis Garyfallidis changed the title from WIP: RecoBundles - automatic recognition of bundles to RecoBundles - automatic recognition of bundles Mar 11, 2018

@Garyfallidis Garyfallidis changed the title from RecoBundles - automatic recognition of bundles to RecoBundles - recognition of bundles Mar 11, 2018

@Garyfallidis Garyfallidis added this to Needs a review in Dipy 0.14.0 Mar 12, 2018

@Garyfallidis Garyfallidis force-pushed the Garyfallidis:rb_rb branch from 81c625f to d26b31a Mar 12, 2018

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Mar 14, 2018

@kesshijordan it would be great if you can have a look at this PR. Let me know what you think. @MarcCote please have a look at the qbx_merge function.

@kesshijordan

This comment has been minimized.

Contributor

kesshijordan commented Mar 14, 2018

Thanks, @Garyfallidis ! I've been looking forward to this. I'm working on pyAFQ (@arokem) and think this approach would be a great way to clean up the results.

I made some atlases (calculated centroids from quickbundles and streamline registered a series of manually segmented healthy control tracks) that would probably work well for this.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Mar 14, 2018

Great strategy! Cool @kesshijordan, give it a go and let me know if you have questions. Happy to help :)

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Mar 30, 2018

@kesshijordan and @BramshQamar please provide your review. It would be great to have this merged for this release.

@MarcCote

Adde minor suggestions to qbx_merge. Otherwise, looks good to me.

len_s = len(streamlines)
if select_randomly is None:
select_randomly = len_s
indices = np.random.choice(len_s, min(select_randomly, len_s),

This comment has been minimized.

@MarcCote

MarcCote Apr 5, 2018

Contributor

Personally, I prefer using a dedicated random generator instead of relying on np.random. It gives you more control for reproducibility.

I would add a new optional argument rng=None to the function and do

indices = np.arange(len(streamlines))
if select_randomly:
    rng = np.random.RandomState(None) if rng is None else rng
    rng.shuffle(indices)

This comment has been minimized.

@nilgoyette

nilgoyette Apr 6, 2018

Contributor

Just a note. I personally almost always replace the pattern x = y if x is None else x by x = x or y

rng = rng or np.random.RandomState(None)

I know that this could be a trap because False is not the same as None but some programmers prefer the or trick.

final_level = len(thresholds)
qbx_ordering_final = np.random.choice(

This comment has been minimized.

@MarcCote

MarcCote Apr 5, 2018

Contributor

You could reuse the rng from above here instead of relying on the global RandomState.

print(' Duration %0.3f sec. \n' % (time() - t1, ))
return merged_cluster_map

This comment has been minimized.

@MarcCote

MarcCote Apr 5, 2018

Contributor

Missing a newline.

@kesshijordan

This comment has been minimized.

Contributor

kesshijordan commented Apr 5, 2018

@Garyfallidis I think it's worth having a basic tutorial with a quick demo/guide for parameter tuning in the same release as Recobundles. I would imagine that when it's released people will give it a quick try to see if it works on their data as well as the examples in the publication. You want them to know which knobs to turn when they do; the tuning isn't necessarily intuitive (it made a huge difference in how well it worked on my data when I got feedback from you on tuning). It would help make a good first impression.

# In essence whole_brain_slr can be thought as a combination of
# SLF and QuickBundles
whole_brain_slr = slr_with_qb

This comment has been minimized.

@BramshQamar

BramshQamar Apr 5, 2018

Contributor

whole_brain_slr and slr_with_qb basically have same implementation?

This comment has been minimized.

@Garyfallidis

Garyfallidis Apr 24, 2018

Member

Yes, the same function can be used for brain tractograms and other tractograms. In the paper I called it whole_brain_slr but then realized that the function can be more generic. Adding some more documentation to explain.

len_s = len(streamlines)
if select_randomly is None:
select_randomly = len_s
indices = np.random.choice(len_s, min(select_randomly, len_s),

This comment has been minimized.

@nilgoyette

nilgoyette Apr 6, 2018

Contributor

Just a note. I personally almost always replace the pattern x = y if x is None else x by x = x or y

rng = rng or np.random.RandomState(None)

I know that this could be a trap because False is not the same as None but some programmers prefer the or trick.

double inf = np.finfo('f8').max
double dist=0
double * min_j
# double * min_i

This comment has been minimized.

@nilgoyette

nilgoyette Apr 6, 2018

Contributor

Could you please clean all the commented code in this function?

def test_rb_no_neighb():
# what if neighbors are found? No recognition

This comment has been minimized.

@nilgoyette

nilgoyette Apr 6, 2018

Contributor

Is this a typo? What if NO neighbors are found?

MAX_DIST = 1e10
LOG_MAX_DIST = np.log(MAX_DIST)
DEFAULT_BOUNDS = [(-45, 45), (-45, 45), (-45, 45),
(-30, 30), (-30, 30), (-30, 30),

This comment has been minimized.

@nilgoyette

nilgoyette Apr 6, 2018

Contributor

Are you sure it's 45, 30, and not 30, 45? I ask because I have seen another version with translation (-30, 30) and rotation (-45, 45) and I believe it makes more sense that way.

EDIT: Oh, I see that you have another set of bounds below :) Can you explain why it's different from this set that you jsut defined?

clustering.
"""
self.reduction_thr = reduction_thr

This comment has been minimized.

@nilgoyette

nilgoyette Apr 6, 2018

Contributor

I have been working with this codebase (a really similar one from Fran莽ois Rheault) in the last weeks and please, for the love of gods, don't do that! Stop putting mutable stuff in self like a madman. It's so much harder to debug! I would know, I did it! I took me days to debug instead of hours.

I'm not talking only about reduction_thr. It wouldn't be a real problem if it was only for this variable. The problem happen because you have reduction_thr, model_bundle, pruned_streamlines, nb_pruned_streamlines, transf_streamlines, transf_matrix, labels, model_clust_thr, model_cluster_map, model_centroids, nb_model_centroids, centroid_matrix, neighb_streamlines, neighb_clusters, neighb_centroids, neighb_indices, nb_neighb_streamlines, slr_bmd, slr_iterations, slr_initial_matrix, slr_final_matrix, slr_xopt, rtransf_cluster_map, rtransf_centroids, nb_rtransf_centroids, pruning_matrix, pruned_indices and labeled_streamlines that are not in self at creation and are modified at random places and times. I may or may not have missed some mutable variables, so there may be more.

Some of them are only used in one function. Some of them are used in many functions but it could be easily be avoided. Some of them are simply the len of another. Maybe some of them are never used, I didn't check.

I will not write an essay here about why a programmer should never do that, I'll simply say that it's super hard to follow and to debug this kind of code. By the way, sorry for the "tone" of this review, I swear I did it for the good of DiPy.

This comment has been minimized.

@MarcCote

MarcCote Apr 6, 2018

Contributor

What an old-fashioned / C++ way of thinking :P ... just kidding! I totally agree with you that would definitively clarify the code.

This comment has been minimized.

@Garyfallidis

Garyfallidis Apr 20, 2018

Member

What is the alternative that you are proposing @nilgoyette ? Also let's all try to be polite to each other. The tone was not necessary. Please be specific.

This comment has been minimized.

@nilgoyette

nilgoyette Apr 20, 2018

Contributor

Nothing fancy: return value(s) from functions, add arguments, try to have 0 mutable member.

As a generul rule, try to only have immutable objects. self should never be modified after construction. I know it's not Python way of thinking and it's maybe "too much", so I wouldn't care if 1 or 2 members are modified but you took it to another level :)

openmp.omp_init_lock(&lock)
min_j = <double *> malloc(static_size * sizeof(double))
# min_i = <double *> malloc(moving_size * sizeof(double))

This comment has been minimized.

@nilgoyette

nilgoyette Apr 11, 2018

Contributor

Because you only keep the min of the first loop, you don't need an array. A simple min and sum var will do.

Notes
-----
The difference with ``_bundle_minimum_distance`` is that we sum the
minimum values only for the static. Therefore, this is asymetric.

This comment has been minimized.

@nilgoyette

nilgoyette Apr 11, 2018

Contributor

Can you please explain more what are the consequences, if any? This function is a little less heavy than _bundle_minimum_distance, ok, but what are we losing? When can we use it?

# return recognized bundle in original streamlines, labels of
# recognized bundle and transformed recognized bundle
return self.streamlines[self.labels], self.labels, \
self.pruned_streamlines

This comment has been minimized.

@BramshQamar

BramshQamar Apr 19, 2018

Contributor

It will be good to have detailed documentation about difference between self.streamlines[self.labels] and self.pruned_streamlines returned by recognize function.

self.cluster_map.refdata = self.streamlines
self.centroids = self.cluster_map.centroids
self.nb_centroids = len(self.centroids)
self.indices = [cluster.indices for cluster in self.cluster_map]

This comment has been minimized.

@nilgoyette

nilgoyette Apr 23, 2018

Contributor

Thank you for the update, it's incredibly better. You can go further by removing self.clust_thr and self.indices (never used), and self.nb_centroids (only used in a print, so simply use len). I think I won't complain after that :)

This comment has been minimized.

@Garyfallidis

Garyfallidis Apr 24, 2018

Member

I am removing self.clust_thr but the others are useful to stay there. I have reduced number of attributes incredibly there is no need to move directly and made _cluster_streamlines a "private" function. So, you should be very happy. Some utility attributes are useful.

def _cluster_streamlines(self, clust_thr=15, nb_pts=20):
np.random.seed(42)

This comment has been minimized.

@nilgoyette

nilgoyette Apr 23, 2018

Contributor

Shouldn't this be a parameter? It would be nice if the user could simply choose a seed to have reproducable results. It's probably more complex than this simple line though?

This comment has been minimized.

@Garyfallidis

Garyfallidis Apr 24, 2018

Member

Yes, needs a proper RandomState there. Fixing ...

centroid_matrix[centroid_matrix > reduction_thr] = np.inf
mins = np.min(centroid_matrix, axis=0)
close_clusters_indices = list(np.where(mins != np.inf)[0])

This comment has been minimized.

@nilgoyette

nilgoyette Apr 23, 2018

Contributor

I don't get this strategy. Why not forget the infinity and simply use your reduction_thr directly in the np.where?

This comment has been minimized.

@Garyfallidis

Garyfallidis Apr 24, 2018

Member

Multiple ways to go to Rome.

# TODO this can be speeded up by using directly the centroids
static = select_random_set_of_streamlines(model_bundle,
select_model)

This comment has been minimized.

@nilgoyette

nilgoyette Apr 23, 2018

Contributor

This can fit in 80 characters.

mins = np.min(pruning_matrix, axis=0)
pruned_indices = [rtransf_cluster_map[i].indices
for i in np.where(mins != np.inf)[0]]

This comment has been minimized.

@nilgoyette

nilgoyette Apr 23, 2018

Contributor

Idem here?

pruned_streamlines = [transf_streamlines[i]
for i in pruned_indices]
pruned_indices = pruned_indices

This comment has been minimized.

@nilgoyette

nilgoyette Apr 23, 2018

Contributor

That might be useless.

This comment has been minimized.

@Garyfallidis

Garyfallidis Apr 24, 2018

Member

Good thanks.

Garyfallidis added some commits Apr 24, 2018

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Apr 24, 2018

@kesshijordan I will have to make a tutorial on another PR. Here is why. We are currently cleaning and preparing an atlas to be used. And it takes time. I want you to look at this atlas when you visit us next week. Ideally I want to make sure that we have a tutorial that can run with multiple input atlases. But we can start with one first.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Apr 26, 2018

Thank you all @kesshijordan @MarcCote @BramshQamar @nilgoyette and @MarcCote for your reviews. There were all very useful. All your comments have been address. Apologies also for the initial coding style with the many unnecessary attributes. But all this is fixed now. We are going to merge this PR today so we can move with the release in preparation for the big DIPY workshop next week. During https://brainhack.sice.indiana.edu. More updates and tutorials and atlases will be available in other PRs.

@skoudoro skoudoro merged commit 26f99c5 into nipy:master Apr 28, 2018

3 checks passed

codecov/patch 88.18% of diff hit (target 87.49%)
Details
codecov/project 87.52% (+0.03%) compared to f6af802
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

Dipy 0.14.0 automation moved this from Needs a review to Done Apr 28, 2018

ShreyasFadnavis pushed a commit to ShreyasFadnavis/dipy that referenced this pull request Sep 20, 2018

Merge pull request nipy#1443 from Garyfallidis/rb_rb
RecoBundles - recognition of bundles
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment