New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New hdf5 file format for saving PeaksAndMetrics objects #1235

Merged
merged 29 commits into from May 12, 2017

Conversation

Projects
None yet
4 participants
@Garyfallidis
Member

Garyfallidis commented Apr 30, 2017

Hi @matthieudumont and all this PR is resolving the issue with saving the peaks object. Making it an HDF5 file allows to be read by other languages etc.

Mr. Dumont you will need to update your workflow PRs to use this one and then we can move to merging them asap.

Do you agree with the file extension .pam or .pam5 (as in hdf5)?

Also have in mind that the helper peaks_to_niftis is not in this PR so you will have to use it from your other PR. Let me know how it goes.

@@ -66,6 +66,7 @@
the ODF shows significant restricted diffusion by thresholding on
the general fractional anisotropy (GFA).
"""
1/0

This comment has been minimized.

@arokem

arokem Apr 30, 2017

Member

This probably shouldn't be here anymore 😄

This comment has been minimized.

@Garyfallidis
# show_manager.initialize()
# show_manager.render()
# show_manager.start()
show_manager.initialize()

This comment has been minimized.

@arokem

arokem Apr 30, 2017

Member

These should probably be recommented

This comment has been minimized.

@Garyfallidis

This comment has been minimized.

@arokem

arokem May 1, 2017

Member

Please don't reply to these comments, unless there is something to discuss. I get an email every time you make a comment like this one.

def load_peaks(fname, verbose=True):
""" Load PeaksAndMetrics PAM file (HDF5)

This comment has been minimized.

@arokem

arokem Apr 30, 2017

Member

Docstring?

def save_peaks(fname, pam):
""" Save NPZ file with all important attributes of object PeaksAndMetrics

This comment has been minimized.

@arokem

arokem Apr 30, 2017

Member

Docstring

@arokem

This comment has been minimized.

Member

arokem commented Apr 30, 2017

Why not use just use the hdf5 extension?

@coveralls

This comment has been minimized.

coveralls commented Apr 30, 2017

Coverage Status

Coverage decreased (-0.4%) to 88.236% when pulling d2cf96c on Garyfallidis:pam_hdf5 into 662e6b8 on nipy:master.

save_peaks(fname, pam)
pam2 = load_peaks(fname, verbose=True)
npt.assert_array_equal(pam.peak_dirs, pam2.peak_dirs)

This comment has been minimized.

@arokem

arokem Apr 30, 2017

Member

Why not test that all the attributes survived the round-trip?

This comment has been minimized.

@arokem

arokem May 8, 2017

Member

This comment is still relevant.

This comment has been minimized.

@arokem

arokem May 11, 2017

Member

I think this comment is still relevant?

@codecov-io

This comment has been minimized.

codecov-io commented May 1, 2017

Codecov Report

Merging #1235 into master will increase coverage by 0.94%.
The diff coverage is 93.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1235      +/-   ##
==========================================
+ Coverage    86.1%   87.04%   +0.94%     
==========================================
  Files         223      225       +2     
  Lines       27895    28109     +214     
  Branches     2829     3014     +185     
==========================================
+ Hits        24018    24468     +450     
+ Misses       3180     2968     -212     
+ Partials      697      673      -24
Impacted Files Coverage Δ
dipy/io/peaks.py 91.89% <91.89%> (ø)
dipy/io/tests/test_io_peaks.py 96.96% <96.96%> (ø)
dipy/fixes/argparse.py 35.36% <0%> (+0.17%) ⬆️
dipy/viz/ui.py 92.28% <0%> (+0.3%) ⬆️
dipy/viz/actor.py 78.18% <0%> (+0.45%) ⬆️
dipy/viz/tests/test_ui.py 84.95% <0%> (+0.48%) ⬆️
dipy/workflows/tests/test_docstring_parser.py 93.85% <0%> (+0.87%) ⬆️
dipy/segment/tests/test_feature.py 99.02% <0%> (+0.97%) ⬆️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d98cae...cc0e656. Read the comment docs.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 1, 2017

Hi @arokem. The reason why I suggest not using the extension hdf5 is that hdf5 is too generic and then .pam_hdf5 is is too long. Also, we are using hdf5 to store streamlines too in the .dpy format. This is used by quite some people who were really unpleased with trk format (for a good reason). However, .pam is already used by netpbm as an image format. So, I suggest to use .pam5 instead. I think with the clue of the 5 and some info in the docstring it will be difficult for people to miss that this is actually an hdf5 file.
Agree? Also if we use the convention _pam.hdf5 we violate the extension rules. I mean the file format extension definition should be after the last dot.

@coveralls

This comment has been minimized.

coveralls commented May 1, 2017

Coverage Status

Coverage decreased (-0.4%) to 88.236% when pulling b7b7f6a on Garyfallidis:pam_hdf5 into 662e6b8 on nipy:master.

@arokem

This comment has been minimized.

Member

arokem commented May 1, 2017

As you say, hdf5 is generic. That is also an advantage. For example, imagine that someone created an application that is a tool for visualizing the content of an hdf5 file (just an example, it can be other things). This kind of application might not work on these files for prosaic reasons, because the application wouldn't 'see' these files as hdf5, even though that's what they are.

If you insist on creating a new file format, I suggest versioning it and checking for the version. It's possible that you will want to change this in the future and you might run into cryptic incompatibilities unless you check for the version of the file.

But I really wouldn't hurry to proliferate a new standard -- seems like something that should be done only when there is clear necessity for it, which I don't see here.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 1, 2017

There is a necessity for the following reasons:

a) If we do use a different extension name then it is becomes easier for our visualization software to read the data. Is it clear why? The viz software will check the file extension and then directly call load_peaks (from dipy.viz). Otherwise if the file is .hdf it will have first to open the file and then see if it has the expected attributes and then see if it has loaded other datasets of .hdf files that are of different type. This makes the visualization duplicate effort with IO which I think is a bad idea.

b) We really need a file to store results between the reconstruction and tracking this has been a bottleneck for some time. Recently we started pickling the pam objects which was not possible before. But hdf5 allows us to use memory maps too and it should be faster than pickling too.

c) External software can build a similar functionality as we could potentially do in dipy.viz
There is also a function that we have implemented with Mr. Dumont that allows to extract some important arrays from PeaksAndMetrics to niftis to help with visualization tools that only read nifti files. This functionality is still on Mr. Dumont's PR.

Nonetheless I think we can both be happy if I use the name .pam_hdf5 rather than .pam5?
Adding a version is good idea too. I am happy to do that.
Let me know what you think.

@coveralls

This comment has been minimized.

coveralls commented May 2, 2017

Coverage Status

Coverage decreased (-0.4%) to 88.191% when pulling e5ea13f on Garyfallidis:pam_hdf5 into 662e6b8 on nipy:master.

@arokem

This comment has been minimized.

Member

arokem commented May 2, 2017

@coveralls

This comment has been minimized.

coveralls commented May 2, 2017

Coverage Status

Coverage decreased (-0.4%) to 88.156% when pulling 40a2c40 on Garyfallidis:pam_hdf5 into 662e6b8 on nipy:master.

@coveralls

This comment has been minimized.

coveralls commented May 3, 2017

Coverage Status

Coverage decreased (-0.4%) to 88.156% when pulling 40a2c40 on Garyfallidis:pam_hdf5 into 662e6b8 on nipy:master.

@coveralls

This comment has been minimized.

coveralls commented May 3, 2017

Coverage Status

Coverage decreased (-0.4%) to 88.159% when pulling 785f6c2 on Garyfallidis:pam_hdf5 into 662e6b8 on nipy:master.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 6, 2017

Added version and improved testing. Waiting for travis. On the name I do prefer .pam5 than .pam.hdf5

@coveralls

This comment has been minimized.

coveralls commented May 6, 2017

Coverage Status

Coverage decreased (-0.5%) to 88.044% when pulling ef5063d on Garyfallidis:pam_hdf5 into 662e6b8 on nipy:master.

@arokem

This comment has been minimized.

Member

arokem commented May 7, 2017

Why do you prefer .pam5 over .pam.hdf5? The latter seems to cover all the use-cases described above, while the former will fail at least for the case of software written for hdf5 files by people unaware of the pam5 format.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 7, 2017

We do use the two dots for example when we say that a file has two encodings for example .tar.gz and .nii.gz but those are actually two different processing levels in both cases. In our case there is only one processing level. This is one reason why I do not like the .pam.hdf5. Pam is an already available fileformat (for images). The second reason is that people will need to be checking for two last extensions rather than one. Which is not slick.
Now, .pam_hdf5 is too long. So a shortened version of that is .pam5 which is short and unique.
Alternatively we can use .peaks but I would keep that for when the fit is separated from the peaks_and_models. But I am fine with .peaks too if you prefer that.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 7, 2017

About something else. The coverage I am getting in my machine for dipy/io/peaks.py is 96% but one of the bots is reporting 9%. Similarly it reports very low coverage for dipy/io/dpy.py 24% when in my machine I am getting 90%.

@arokem

This comment has been minimized.

Member

arokem commented May 7, 2017

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 7, 2017

Yes, it skips the test but still gives coverage for the file. Okay let me add pytables in.

@arokem

This comment has been minimized.

Member

arokem commented May 7, 2017

Yes. It gives coverage for everything that it can import from dipy.

@coveralls

This comment has been minimized.

coveralls commented May 7, 2017

Coverage Status

Coverage increased (+0.3%) to 88.862% when pulling fb0740f on Garyfallidis:pam_hdf5 into 662e6b8 on nipy:master.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 7, 2017

Travis and coveralls look happy. :)

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 7, 2017

Now trying to reach maximum possible coverage.

@arokem

This comment has been minimized.

Member

arokem commented May 7, 2017

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 7, 2017

What I do not like is that one needs to parse for two extensions rather than one. Also the filename size is increasing. But more importantly .pam is already reserved for another popular format. So .pam.hdf is misleading. It's like we put a picture file (.pam) to an .hdf5 file. I hope this clarifies more.

@arokem

This comment has been minimized.

Member

arokem commented May 7, 2017

@arokem

This comment has been minimized.

Member

arokem commented May 7, 2017

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 7, 2017

It's part of the pbm suite of file formats http://netpbm.sourceforge.net/doc/pam.html

I made .dpy back in 2009. There was nothing with that extension at the time but also we were at the very early stages of development. I am happy to change .dpy extension in the near future. I think we should change it but this time promote the tracking format. We didn't promote it enough so others used the same extension name for different projects. But maybe I am wrong perhaps there was something similar before 2009 but at least I was not aware of that.

@Garyfallidis Garyfallidis force-pushed the Garyfallidis:pam_hdf5 branch from e0f5b50 to 4d2fa2c May 10, 2017

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 10, 2017

Hi, I think after this release we should actually start having tables, sklearn, cvxpy and vtk as not optional.
For now, I updated .travis.yml and rebased. Waiting for travis to see what we are getting. Added coverage for 3.5 and 2.7 as suggested.

@coveralls

This comment has been minimized.

coveralls commented May 11, 2017

Coverage Status

Coverage decreased (-3.0%) to 85.607% when pulling 4d2fa2c on Garyfallidis:pam_hdf5 into 7d98cae on nipy:master.

1 similar comment
@coveralls

This comment has been minimized.

coveralls commented May 11, 2017

Coverage Status

Coverage decreased (-3.0%) to 85.607% when pulling 4d2fa2c on Garyfallidis:pam_hdf5 into 7d98cae on nipy:master.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 11, 2017

Hi Ariel, can you look into the change I made in travis for some reason the coverage was decreased after rebasing. This seems to be unrelated to the focus of this PR. And we may want to put a not to increase coverage but move on with the merging this and updating the workflow PRs. Let me know what you think.

@arokem

This comment has been minimized.

Member

arokem commented May 11, 2017

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 11, 2017

I am happy to test the attributes more. But what do you mean with round-tripping? Do you mean rounding errors?

@arokem

This comment has been minimized.

Member

arokem commented May 11, 2017

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented May 12, 2017

Thanks for this tip. Good idea! Done and waiting for travis.

@coveralls

This comment has been minimized.

coveralls commented May 12, 2017

Coverage Status

Coverage decreased (-3.0%) to 85.607% when pulling cc0e656 on Garyfallidis:pam_hdf5 into 7d98cae on nipy:master.

1 similar comment
@coveralls

This comment has been minimized.

coveralls commented May 12, 2017

Coverage Status

Coverage decreased (-3.0%) to 85.607% when pulling cc0e656 on Garyfallidis:pam_hdf5 into 7d98cae on nipy:master.

@arokem arokem merged commit ceed367 into nipy:master May 12, 2017

3 of 4 checks passed

coverage/coveralls Coverage decreased (-3.0%) to 85.607%
Details
codecov/patch 93.45% of diff hit (target 86.1%)
Details
codecov/project 87.04% (+0.94%) compared to 7d98cae
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

ShreyasFadnavis pushed a commit to ShreyasFadnavis/dipy that referenced this pull request Sep 20, 2018

Merge pull request nipy#1235 from Garyfallidis/pam_hdf5
New hdf5 file format for saving PeaksAndMetrics objects
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment