
Bundle Analysis and Linear mixed Models Workflows #1755

Merged
merged 20 commits into nipy:master on Mar 10, 2019

Conversation

@BramshQamar
Contributor

commented Mar 7, 2019

In this PR, the Bundle Analysis Workflow and the Linear Mixed Models Workflow are implemented:

  • Bundle Analysis Workflow

  • Linear Mixed Models Workflow

  • Tests

  • Data

  • Example

Reference:
Chandio, B.Q., S. Koudoro, D. Reagan, J. Harezlak, E. Garyfallidis,
Bundle Analytics: a computational and statistical analyses framework
for tractometric studies, Proceedings of: International Society of
Magnetic Resonance in Medicine (ISMRM), Montreal, Canada, 2019.

@pep8speaks


commented Mar 7, 2019

Hello @BramshQamar, thank you for updating!

Line 167:9: E117 over-indented

Comment last updated at 2019-03-09 20:07:32 UTC

@BramshQamar changed the title from "Bundle Analysis Workflow and Linear mixed Models Workflows" to "Bundle Analysis and Linear mixed Models Workflows" on Mar 7, 2019

@arokem

Member

commented Mar 7, 2019

I think that a lot of the code in this PR would be much more useful if it was implemented as standard API code, including tests, before we implement a workflow for this functionality.

@arokem

Member

commented Mar 7, 2019

This also seems to introduce two new dependencies: pandas and statsmodels. I don't really object to that, but we might want to proceed carefully. Are both of these dependencies really necessary?

store.close()


def peak_values(bundle, peaks, dt, pname, bname, subject, group, ind, dir):

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Move to segment/bundles.py

dt["subject"].extend([subject]*count)
dt["group"].extend([group]*count)

save_hdf5(dt, os.path.join(dir, pname))

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Move save_hdf5 to dipy.io.dataframe (create a new file called dataframe.py).

values = map_coordinates(metric, bundle._data.T,
order=1)

print("ind = ", len(ind), " value= ", len(values))

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Do we need these prints? I would add a verbose flag.
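For illustration, a minimal sketch of the verbose-flag pattern being suggested (hypothetical code, not the PR's; the helper name _metric_along_bundle is made up, and the metric interpolation is copied from the hunk above):

import logging

from scipy.ndimage import map_coordinates


def _metric_along_bundle(bundle, metric, ind, verbose=False):
    # Interpolate the metric (e.g. FA, MD) at every streamline point.
    values = map_coordinates(metric, bundle._data.T, order=1)
    if verbose:
        # Report sizes only when explicitly requested, instead of print().
        logging.info("ind = %d, values = %d", len(ind), len(values))
    return values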



def dti_measures(bundle, metric, dt, pname, bname, subject, group, ind, dir):

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Remove empty line.


def dti_measures(bundle, metric, dt, pname, bname, subject, group, ind, dir):

""" calculates dti measure (eg: FA, MD) per point on streamlines and

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Document the parameters

Path to the input dti metric or/and peak files. It will be used as
metric for statistical analysis of bundles.
group : string
what group subject belongs to eg control or patient

References
----------
Chandio, B.Q., S. Koudoro, D. Reagan, J. Harezlak, E. Garyfallidis,

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Start with .. [Chandio19] and then link from the top
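For illustration, the numpydoc reference style being asked for would look roughly like this inside the docstring (the summary wording is illustrative; the entry itself is the reference from the PR description):

    Bundle analysis for tractometric studies; see [Chandio19]_ for details.

    References
    ----------
    .. [Chandio19] Chandio, B.Q., S. Koudoro, D. Reagan, J. Harezlak,
       E. Garyfallidis, Bundle Analytics: a computational and statistical
       analyses framework for tractometric studies, Proceedings of ISMRM,
       Montreal, Canada, 2019.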


dt = dict()

mb = os.listdir(model_bundle_files)

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Use from glob import glob instead of listdir

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Actually it's okay for now. Ignore last comment.

clusters = qb.cluster(mbundle_streamlines)
centroids = Streamlines(clusters.centroids)

logging.info('number of centroids {0}'.format(len(centroids.data)))

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

If logging is used, then there is no need for print.

ind, out_dir)


class AutoBundleAnalysisFlow(Workflow):

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

We need to rename this to BundleAnalysisGroupFlow

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Or BundleAnalysisPopulationFlow

groups = os.listdir(subject_files)

for group in groups:
print(group)

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Use logging...


for file in all_files:

print(file)

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Use logging.

out_dir : string, optional
Output directory (default input file directory)

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

remove empty line

print(file)
df = pd.read_hdf(os.path.join(metric_files, file))
all_bundles = df.bundle.unique()
# all_pvalues = []

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

remove comment

x = list(range(1, len(pvalues)+1))
y = -1*np.log10(pvalues)

title = bundle+" on "+file[:-3]+" Values"

pvalues[i] = mdf.pvalues[1]

x = list(range(1, len(pvalues)+1))
y = -1*np.log10(pvalues)

@Garyfallidis

Member

commented Mar 7, 2019

This needs to optionally depend on a recent statsmodels version - old statsmodels versions have a bug with LMM.
I am good with adding optional dependencies on pandas and statsmodels. These are extremely well established packages.
@skoudoro please help @BramshQamar update the Travis configuration to have these included.
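For reference, a rough sketch of how the optional, version-gated dependency could be wired up with DIPY's optional_package helper (the "0.9.0" minimum below is only a placeholder, not a verified cut-off):

from distutils.version import LooseVersion

from dipy.utils.optpkg import optional_package

pd, have_pd, _ = optional_package("pandas")
sm, have_sm, _ = optional_package("statsmodels")

# Older statsmodels releases have a bug in the linear mixed models code,
# so only enable the LMM workflow when a recent enough version is present.
# The exact minimum version would still need to be checked.
have_recent_sm = have_sm and \
    LooseVersion(sm.__version__) >= LooseVersion("0.9.0")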

@Garyfallidis

Member

commented Mar 7, 2019

Also, the tests and workflow tests need to pass.

@@ -0,0 +1,332 @@
from __future__ import division, print_function, absolute_import

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

analysis.py, cool! We should have something similar inside the DIPY API too. Something to talk about during the workshop.


BramshQamar added 2 commits Mar 7, 2019
plt.clf()


def save_hdf5(dt, fname):

@arokem

arokem Mar 7, 2019

Member

Can't we use the pandas to_hdf method instead of writing our own function for this?

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Fine with me.

@BramshQamar

BramshQamar Mar 7, 2019

Author Contributor

Within the save_hdf5 function, I am using pandas. I created the function just for reusability.

@Garyfallidis

Garyfallidis Mar 7, 2019

Member

Yes, it makes sense to have a wrapper in io, with load and save options. If you go that way, you should have fname first and then dt. However, if this is not used very often, and given that the release is very soon, I would suggest going with to_hdf. Fine with either.
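For reference, a minimal sketch of what such an io wrapper could look like, with fname first and pandas doing the actual HDF5 work (the function and key names here are hypothetical; pandas needs PyTables installed for HDF5 support):

import pandas as pd


def save_dataframe(fname, dt, key="bundles"):
    # dt is a dict of equal-length columns; let pandas write it to HDF5.
    pd.DataFrame(dt).to_hdf(fname, key=key, mode="w")


def load_dataframe(fname, key="bundles"):
    # Read the table back as a DataFrame.
    return pd.read_hdf(fname, key=key)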

@BramshQamar

BramshQamar Mar 8, 2019

Author Contributor

Hi @arokem,
I tried the to_hdf function of pandas but it didn't work for me. It's not very flexible.
Now I am creating a _save_hdf5 internal function.

@codecov-io


commented Mar 8, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@997c074).
The diff coverage is 77.32%.


@@            Coverage Diff            @@
##             master    #1755   +/-   ##
=========================================
  Coverage          ?   83.72%           
=========================================
  Files             ?      119           
  Lines             ?    14441           
  Branches          ?     2277           
=========================================
  Hits              ?    12091           
  Misses            ?     1826           
  Partials          ?      524
Impacted Files Coverage Δ
dipy/viz/regtools.py 36.79% <100%> (ø)
dipy/viz/__init__.py 90% <100%> (ø)
dipy/stats/analysis.py 67.27% <67.27%> (ø)
dipy/workflows/stats.py 92.18% <94.44%> (ø)
@@ -18,6 +18,37 @@ def _tile_plot(imgs, titles, **kwargs):
return fig


def simple_plot(file_name, title, x, y, xlable, ylabel):

label for x-axis
ylable : string
label for y-axis

@Garyfallidis

Garyfallidis Mar 8, 2019

Member

remove empty lines



def _save_hdf5(fname, dt, col_name, col_size=5):

@Garyfallidis

Garyfallidis Mar 8, 2019

Member

Remove line

@Garyfallidis

Garyfallidis Mar 8, 2019

Member

Change ... everywhere ...


def _save_hdf5(fname, dt, col_name, col_size=5):

""" saves the given input dataframe to .h5 file

def get_short_name(cls):
return 'ba'

def run(self, model_bundle_files, subject_files, no_disks=100, out_dir=''):

@Garyfallidis

Garyfallidis Mar 9, 2019

Member

should be model_bundle_folder, subjects_folder


def bundle_analysis(model_bundle_files, bundle_files, orig_bundle_files,
dti_metric_files, group, subject, no_disks=100,
out_dir=''):

BramshQamar added 2 commits Mar 9, 2019
@skoudoro
Member

left a comment

Hi @BramshQamar, below are some comments for making Travis happy.

from dipy.testing import assert_true


def test_ba():

@skoudoro

skoudoro Mar 9, 2019

Member

you need to add:

# Imports needed for this snippet (optional_package lives in
# dipy.utils.optpkg; npt is numpy.testing):
import numpy.testing as npt
from dipy.utils.optpkg import optional_package

_, have_pd, _ = optional_package("pandas")
_, have_smf, _ = optional_package("statsmodels")

@npt.dec.skipif(not have_pd or not have_smf)
def test_ba():
    .......

import pandas as pd

if have_smf:
import tatsmodels.formula.api as smf

@skoudoro

skoudoro Mar 9, 2019

Member

Typo: missing the s in statsmodels.

"""

df = pd.DataFrame(dt)
filename_hdf5 = fname+'.h5'

@skoudoro

skoudoro Mar 9, 2019

Member

pep8: space around operator

for st in bundle:
di = st[1:] - st[0:-1]
dnorm = np.linalg.norm(di, axis=1)
di = di/dnorm[:, None]

@skoudoro

skoudoro Mar 9, 2019

Member

pep8: space around operator

BramshQamar added 5 commits Mar 9, 2019
@Garyfallidis

Member

commented Mar 10, 2019

Go Go Team!! Thanks @BramshQamar!

@Garyfallidis Garyfallidis merged commit 7b76d74 into nipy:master Mar 10, 2019

4 checks passed

codecov/patch: No report found to compare against
codecov/project: No report found to compare against
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed