Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRCZ io_plugin #1716

Merged
merged 19 commits into from
Feb 20, 2018
Merged

Conversation

robbmcleod
Copy link
Contributor

Couple points to look at:

  • both the mrc and mrcz plugins support the '.mrc' file extension. I'm guessing you pick the plugin by the
    extension somewhere?
  • The signal axes are correctly ordered but the mixed C/Fortran ordering is a little confusing.
  • Why no __getitem__ and __setitem__ methods for the metadata attribute?

@robbmcleod
Copy link
Contributor Author

That's strange that TravisCI can't find mrcz for MacOSX. It's a pure-python repo.

Copy link
Member

@ericpre ericpre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments to get the tests working. I can come back to the rest later.

.travis.yml Outdated
@@ -33,7 +33,7 @@ before_install:
install:

- if [[ $MINIMAL_ENV == 'False' ]] ; then
DEPS="pip numpy scipy matplotlib>=2.0.2 ipython h5py sympy scikit-learn dill natsort setuptools scikit-image cython lxml ipyparallel dask traits traitsui";
DEPS="pip numpy scipy matplotlib>=2.0.2 ipython h5py sympy scikit-learn dill natsort setuptools scikit-image cython lxml ipyparallel dask traits traitsui blosc mrcz";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the dependencies installed from the conda-forge channel. I cann't find any repo for the mrcz, so I guess it would be better to remove mrcz from here and leave pip install it automatically from pypi when hyperspy is installed.

appveyor.yml Outdated
@@ -139,7 +139,7 @@ before_deploy:
- "ECHO %SCIKIT_IMAGE%"
# - "%CMD_IN_ENV% wget http://www.lfd.uci.edu/~gohlke/pythonlibs/tuoh5y4k/%SCIKIT_IMAGE% --header User-Agent:Chrome/23.0.1271.97"
- "%CMD_IN_ENV% pip install %SCIKIT_IMAGE%"
- "%CMD_IN_ENV% pip install --upgrade tqdm notebook cython ipython configobj start_jupyter_cm ipywidgets ipyparallel sympy pytest"
- "%CMD_IN_ENV% pip install --upgrade tqdm notebook cython ipython configobj start_jupyter_cm ipywidgets ipyparallel sympy pytest blosc mrcz"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for .travis.yml, install blosc from the conda-forge channel with conda and install mrcz with pip from pypi.

mrcName = os.path.join( tmpDir, "testMage.mrcz" )
MAX_ASYNC_TIME = 2.0

class PythonMrczTests:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test class is not discovered by pytest, the name needs to start with "Test".

@robbmcleod
Copy link
Contributor Author

Ok, getting closer, JSON version problem now.

@robbmcleod
Copy link
Contributor Author

Looks good to me.

@ericpre
Copy link
Member

ericpre commented Sep 22, 2017

Indeed, it is working fine!
There are two more things worth looking at:

  • for lazy signal, hyperspy use dask. As it is currently, it is working, except for saving signal (the data are stored as a dask.Array instance). Do you think it would possible to support writing mrcz from dask.Array in python-mrcz?
  • this PR introduce a non pure python dependency (blosc), which can possibly brake hyperspy installation on some system. For example, I could not install blosc out of the box (install issue? Blosc/python-blosc#127). I am not sure if this issue is solved.

I have tested the Gohlke's blosc binary in the Winpython HyperSpy bundle and it works.

@robbmcleod
Copy link
Contributor Author

I'll look into writing from a dask.Array, it's probably just a matter of calling numpy.asarray().

For blosc, perhaps we can ask Matthew Brett for help. I'll ask Francesc if he would like to move blosc into having manywheels support.

@robbmcleod
Copy link
Contributor Author

Also, quick point: MRCZ doesn't list blosc as a requirement because the import is inside a try block. If blosc isn't present, compression is disabled, but it won't break anyone's install.

@ericpre
Copy link
Member

ericpre commented Sep 28, 2017

Ok great, then we will able to merge this PR even if the blosc wheels/binaries are not ready yet.
Regarding writing from dask.Array, I am not sure if this is relevant here but it may be worth at having a quick look at #1743.

@robbmcleod
Copy link
Contributor Author

I added support for saving from dask.array.core.Array in python-mrcz 0.2.5. My tests suggest the ideal way would be to pull off the chunks one-by-one and compress them but blosc doesn't really support that. c-blosc2 might but it's not Python-aware.

import dask.array as da
import numpy as np
from time import perf_counter
import mrcz

t0 = perf_counter()
foo = da.ones( (50,2048,2048), chunks=(1,2048,2048) )
t1 = perf_counter()
np.array(foo)
t2 = perf_counter()
foo.__array__()
t3 = perf_counter()
for I in np.arange(foo.shape[0]):
    foo[I,:,:].__array__()
t4 = perf_counter()
mrcz.asyncWriteMRC( foo, 'foo.mrcz', compressor='lz4' )
t5 = perf_counter()

print( f"Create da time: {t1-t0:.2e} s" )
print( f"Convert to array: {t2-t1:.3f} s" )
print( f"Call __array__: {t3-t2:.3f} s" )
print( f"Sliced: {t4-t3:.3f} s" )
print( f"MRCZ save: {t5-t4:.2e} s" )

>>> Create da time: 3.96e-04 s
>>> Convert to array: 1.458 s
>>> Call __array__: 0.888 s
>>> Sliced: 0.497 s
>>> MRCZ save: 1.16e-03 s

I'm not sure it's a good idea to use asyncWriteMRC because Dask is pure-python, but it might help if the user calls some GIL-releasing function afterward.

@ericpre
Copy link
Member

ericpre commented Sep 29, 2017

This is great, thanks @robbmcleod for this nice contribution! I just added the tests for the lazy case and move blosc to extras-require to make it optional. Once appveyor and travis tests are passed, we can merge.

@ericpre
Copy link
Member

ericpre commented Oct 9, 2017

Looking at this travis log, mrcz still depends on blosc.

@robbmcleod
Copy link
Contributor Author

Quick status update: we're in the process of getting multiwheels support for blosc. Hopefully that will be ready by the end of the week.

@ericpre
Copy link
Member

ericpre commented Feb 19, 2018

@robbmcleod, since the master branch of mrcz now raises an ImportError when blosc is not installed, I have fixed the tests accordingly in this PR. If you release a new version of mrcz, then we can merge this PR. Do you have an idea when you will make a new release?

@robbmcleod
Copy link
Contributor Author

The master branch of mrcz 0.3.4 does not raise an ImportError if blosc is missing, it swallows it and disables the compression tools. The mrcz.test() unit tests do generate an error, so is that what you mean?

I just tested it with a fresh conda install, and then installed numpy and then mrcz via pip and it seems to work as intended.

@ericpre
Copy link
Member

ericpre commented Feb 19, 2018

I am talking about this version, which raises an ImportError (only when compression is needed). This is good because it informs the user to install blocs when it is required. If you test this branch with em-MRCZ/python-mrcz@ce67e49, then it all works nicely: the tests catch the error when it is needed and blosc can be an optional dependency.

@robbmcleod
Copy link
Contributor Author

Oh, right, I forgot about that. Let me clean up the docstrings so they meet NumPy standards and I'll release 0.3.5.

@robbmcleod
Copy link
Contributor Author

Eric,

The 0.3.5 release is live now:

em-MRCZ/python-mrcz@3cd4ffe

https://pypi.python.org/pypi/mrcz

@ericpre ericpre merged commit 9142576 into hyperspy:RELEASE_next_minor Feb 20, 2018
@robbmcleod
Copy link
Contributor Author

robbmcleod commented Feb 20, 2018

I see you had a random test failure on the asynchronous tests. You might want to consider re-triggering those builds. I haven't had trouble with them, but weird things can happen between when Python thinks the file write is finished, and when the OS thinks the file write is finished. If you encounter this problem more than once, the only fix I could propose would be a small sleep in-between getting the result from the Future and reading. That or disabling that test.

@ericpre
Copy link
Member

ericpre commented Mar 3, 2018

Yes, it did happen a few times, so I added (#1852) a sleep, as you recommended and so far it seems to do the job.

@francisco-dlp francisco-dlp added this to the v1.4 milestone Apr 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants