[MRG] Numpy pickle to single file #260

Merged
merged 6 commits into joblib:master on May 10, 2016

Conversation

5 participants
@aabadie
Contributor

aabadie commented Oct 23, 2015

I gave gzip.GzipFile a try for dumping numpy objects. This PR is a proof of concept of that approach and also contains a fairly large refactoring of numpy_pickle.
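A minimal sketch of the approach being tried here, assuming a plain pickle of the array through a gzip stream (illustrative only, not the actual PR code):

import gzip
import pickle
import numpy as np

a = np.arange(10 ** 6)

# dump: wrap the file in gzip.GzipFile and pickle straight into it
with gzip.GzipFile('/tmp/arr.pkl.gz', 'wb', compresslevel=3) as f:
    pickle.dump(a, f)

# load: the same wrapper decompresses transparently
with gzip.GzipFile('/tmp/arr.pkl.gz', 'rb') as f:
    b = pickle.load(f)

assert (a == b).all()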

In terms of performance, I reused the bench script from #255. With numpy 1.10, there is no memory copy but reading and writing are slightly slower (31 s dump, 5.5 s read). It also makes the code more 'readable'.
I had to comment out the test checking compatibility between pickle formats.

Waiting for the CI results.



@GaelVaroquaux

Member

GaelVaroquaux commented Oct 24, 2015

How does it compare to what we currently have?

@aabadie

Contributor

aabadie commented Oct 26, 2015

How does it compare to what we currently have?

For an array of 763 MB, it adds a read/write time overhead of approximately 10%: 28 s to write with current master against 31 s with this PR (4.8 s against 5 s for reading). This extra time is balanced by a stable memory consumption (for both compressed and uncompressed serialization), at least with recent versions of numpy, and can be explained by the CRC computation performed by GzipFile, which is not done when using zlib directly (see the sketch at the end of this comment).
Note also that this PR is not fully complete; it still misses some improvements:

  • the control file (containing the main serialized object without arrays/matrices) is not compressed
  • the whole object (including arrays) could be serialized in a single file using smart seeking within the file. This could be part of a follow-up PR.
  • last, but not least: this PR should also contain a backward-compatibility mechanism for previous versions of the cache format.
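A minimal sketch of the two code paths mentioned above (illustrative only): both use the same deflate compression, but GzipFile makes an extra pass over the uncompressed payload to maintain a CRC32 checksum and writes a gzip header/trailer, whereas zlib.compress does its work in a single C call.

import gzip
import zlib
import numpy as np

data = np.ones(10 ** 6).tobytes()

# Direct zlib usage: one compression call, no CRC32 pass over the payload.
compressed = zlib.compress(data, 3)

# GzipFile: same deflate stream, plus a CRC32 of the uncompressed data
# and the gzip header/trailer bytes.
with gzip.GzipFile('/tmp/array.gz', 'wb', compresslevel=3) as f:
    f.write(data)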
@aabadie


Contributor

aabadie commented Nov 12, 2015

the control file (containing the main serialized object without arrays/matrices) is not compressed

This was done in 4f58113.

the whole object (including arrays) could be serialized in a single file using smart seeking within the file. This could be part of a follow-up PR.

I just pushed caeb704, which does this. But it has several drawbacks:

  • tests are broken because the current state doesn't support memory mapping
  • I had to patch numpy locally in order to make the gzip compression work: numpy overloads the GzipFile seek() method so that it can seek using negative offsets, but it assumes the file contains only one array, which is not the case here. The only solution is to include the numpy array load/save implementation in joblib.

last, but not least: this PR should also contain a backward-compatibility mechanism for previous versions of the cache format.

This is done in this branch (works in 4f58113).

@GaelVaroquaux


Member

GaelVaroquaux commented Nov 12, 2015

file only contains one array which is not our case here. The only solution is to include numpy array load/save implementation in joblib.

Can we then only reimplement the minimal amount that we need, and not copy the whole codebase?

@aabadie


Contributor

aabadie commented Nov 12, 2015

Can we then only reimplement the minimal amount that we need, and not copy the whole codebase?

Yes, that's the plan. It should also help with the memmap case.

@aabadie aabadie changed the title from [WIP] Gzip pickling to [WIP] Numpy pickle to single file Nov 13, 2015

@aabadie aabadie changed the title from [WIP] Numpy pickle to single file to [MRG] Numpy pickle to single file Nov 13, 2015

@aabadie


Contributor

aabadie commented Nov 13, 2015

As this PR is IMHO in good shape, I changed the status to MRG and renamed it to better reflect what it contains: objects are now serialized in a single file, including compressed files and memory maps.
Note that there is a basic backward-compatibility mechanism, which means that the next version of joblib will be able to read files containing objects cached with older versions. Dumping will use the new format version.

All remaining problems mentioned above are solved:

  • as suggested, the gzip.GzipFile issue was solved by reimplementing (not to say copy-pasting) the minimum amount of required numpy code in joblib. I'm not sure I did it the best way, but at least it works now. The compression level is 3 (it can range from 0 to 9, as documented).
  • memory mapping works as well, using the same technique. I just had to slightly change the implementation of numpy's open_memmap function: I introduced a parameter for setting the offset in the file where the numpy array is serialized (see the sketch below).

Maybe some more tests could be added. I'll post some memory/speed benchmarks here to compare with the current implementation.

AppVeyor is stuck, I don't know why.

Waiting for your comments.
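A rough illustration of the offset idea, using numpy's public np.memmap (which already accepts an offset argument) rather than the modified open_memmap from this PR; the file layout and names are assumptions, not the actual code:

import numpy as np

# Write some pickled metadata first, then the raw array bytes,
# remembering where the array starts.
a = np.arange(1000, dtype=np.float64)
with open('/tmp/single_file.pkl', 'wb') as f:
    f.write(b'HEADER-AND-PICKLE-BYTES')  # placeholder for the pickled metadata
    offset = f.tell()                    # offset where the array data starts
    f.write(a.tobytes())

# Map the array back without loading it into memory, starting at that offset.
m = np.memmap('/tmp/single_file.pkl', dtype=np.float64, mode='r',
              offset=offset, shape=a.shape)
assert (m == a).all()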

@GaelVaroquaux


Member

GaelVaroquaux commented Nov 13, 2015

For benchmarks, have a look at the following link (hint: there is a link
to a gist with code to benchmark in it)
http://gael-varoquaux.info/programming/joblib-beta-release-fast-compressed-persistence-python-3.html

@aabadie


Contributor

aabadie commented Nov 13, 2015

Thanks, I'll try it ASAP!

joblib/numpy_pickle.py
###############################################################################
# Utility objects for persistence.
class NDArrayWrapper(object):
class NPArrayWrapper(object):

@lesteve

lesteve Nov 17, 2015

Contributor

We need to find a better name than this.

@aabadie

aabadie Nov 18, 2015

Contributor

We need to find a better name than this.

What about NumpyArrayWrapper?

joblib/numpy_pickle.py
file_handle.write(zlib.compress(asbytes(data), compress))
Utility that check if the version is supported. An exception is raised if
version is not supported.
Parameters

@lesteve

lesteve Nov 17, 2015

Contributor

Not that important at this stage of the PR, but it's probably worth reading https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt and fixing up the docstrings.

For example, on this particular docstring: short description that fits on one line, newline, longer description, newline, sections like Parameters, etc.

joblib/numpy_pickle.py
"Store the useful information for later"
self.filename = filename
def __init__(self, subclass, allow_mmap=True, offset=c_int64(-1)):

@lesteve

lesteve Nov 17, 2015

Contributor

According to https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt#class-docstring, __init__ should be documented inside the class docstring.

@aabadie

aabadie Nov 17, 2015

Contributor

I agree and I'll change this. But note that this docstring was already there before my changes (I know it's not an excuse ;) ).

@aabadie


Contributor

aabadie commented Nov 17, 2015

As promised to @GaelVaroquaux, I tested your gist locally, just to compare with other implementations.

See the results:

  • Read speed with this PR:
    compare_libraries_read
  • Read speed with master:
    compare_libraries_read_master
  • Write speed with this PR:
    compare_libraries_write
  • Write speed with master:
    compare_libraries_write_master

The conclusion is that speed is clearly slower with this PR. And as you noticed in your blog post, gzip.GzipFile might not be an option for cache compression.

@aabadie


Contributor

aabadie commented Nov 17, 2015

Looking at the plots, it seems that the dataset labels are mixed up. I'll update them.

@aabadie


Contributor

aabadie commented Nov 17, 2015

Looking at the plots, it seems that the dataset labels are mixed up. I'll update them.

Replying to myself: I double checked and everything is OK. It's just that the write speed with F-contiguous data has a significant impact on performance (especially when using pytables with zlib).

@GaelVaroquaux


Member

GaelVaroquaux commented Nov 17, 2015

It doesn't seem to me that the impact of this PR on read/write speed is very large.

@GaelVaroquaux


Member

GaelVaroquaux commented Nov 17, 2015

As a side note, when this PR is merged, it would be useful to test lzma for compression under Python 3: it could be much faster. I have added an enhancement issue for this: #273.
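For reference, a minimal sketch of what an lzma-based dump could look like under Python 3 (standard-library lzma only, not code from this PR):

import lzma
import pickle
import numpy as np

a = np.arange(10 ** 6)

# preset plays the same role as the zlib/gzip compression level
with lzma.open('/tmp/arr.pkl.xz', 'wb', preset=3) as f:
    pickle.dump(a, f)

with lzma.open('/tmp/arr.pkl.xz', 'rb') as f:
    b = pickle.load(f)

assert (a == b).all()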

@aabadie


Contributor

aabadie commented Nov 17, 2015

I also want to try the mmap case in read-write mode, especially when the persisted object contains multiple arrays. I'm wondering whether the arrays could overlap at the end of the file, resulting in a corrupted file.

@GaelVaroquaux

Member

GaelVaroquaux commented Nov 17, 2015

Sound like we need a test :)
@lesteve


Contributor

lesteve commented Nov 17, 2015

Out of interest, have you checked the memory usage of this PR? I suspect that everything will be fine since we are using np.save and np.load, but it'd be good to make sure this is the case.

@lesteve


Contributor

lesteve commented Nov 17, 2015

As promised to @GaelVaroquaux, I tested your gist locally, just to compare with other implementations.

See the results:

I would compare master against this PR on the same plot, use a single dataset (either MNI or Juelich), and possibly remove all the pytables stuff to make the comparison easier.

According to the original blog post the arrays are on the order of ~100MB. Maybe we should try with bigger arrays and/or multiple arrays.

@lesteve


Contributor

lesteve commented Nov 17, 2015

You'll need to add the generated pickles in joblib/test/data. The script joblib/test/data/create_numpy_pickle.py should help you. It would be great to use numpy versions similar to the ones that were used for the joblib 0.9.2 pickles.

By the way, we probably want to bump the version to 0.10.0 because of this change in the numpy pickle format.


joblib/numpy_pickle.py
np_array_wrapper = self.stack.pop()
array = np_array_wrapper.read(self)
# push back the reconstructed on the unpickler stack
self.stack.append(array)

@lesteve

lesteve Nov 17, 2015

Contributor

This code is duplicated from above; you could handle both wrapper types using a tuple as the isinstance argument:

if isinstance(self.stack[-1], (NDArrayWrapper, NPArrayWrapper)):

Not sure about that, but if we want to get rid of the deprecated pickle format one day, maybe we want to have a deprecation warning for the NDArrayWrapper case, asking the user nicely to regenerate their pickle. This could quickly get very annoying for old joblib caches, though.
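A minimal sketch of the deprecation-warning idea, as a hypothetical helper the unpickler could call when it meets an old-format wrapper (names assumed, not code from this PR):

import warnings

def _warn_old_pickle_format(filename):
    # Hypothetical helper: warn that a cache file uses the old multi-file format.
    warnings.warn(
        "The file %r was generated with an older version of joblib; "
        "consider regenerating it with joblib.dump to use the new "
        "single-file format." % filename,
        DeprecationWarning, stacklevel=3)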

@aabadie

aabadie Nov 18, 2015

Contributor

This code is duplicated from above

Done.
About the deprecation warning, it's an interesting idea, but indeed it can be annoying if there are several arrays in the pickle file.

joblib/numpy_pickle.py
raise ValueError(msg % (version,))
def _read_bytes(fp, size, error_template="ran out of data"):

@lesteve

lesteve Nov 17, 2015

Contributor

You needed to copy and paste some code from numpy IIRC. Can you remind me why?

It'd be great if it was labeled clearly as such in each function docstring and you mentioned which version of numpy you took it from.

@aabadie

aabadie Nov 18, 2015

Contributor

Added a line about that in the docstrings.

@aabadie

aabadie Nov 18, 2015

Contributor

You needed to copy and paste some code from numpy IIRC. Can you remind me why?

I forgot to reply to this, sorry. I did that because the actual numpy code for reading an ndarray from a file assumes that the file contains only a single serialized array (same story with memory mapping). At first I could make it work by patching my local installation of numpy, but we agreed with @GaelVaroquaux that copying the minimum required numpy code (which is quite substantial, I admit) was a solution.

@lesteve

lesteve Nov 18, 2015

Contributor

You need to explain that somewhere in a comment (the exact reason why we cannot use numpy directly) and also point out where you needed to make the change to make it work for our use case.

Just wondering: for the functions that are labeled as "Taken from numpy 1.10", can we not import them from numpy?

@aabadie

aabadie Nov 18, 2015

Contributor

Just wondering: for the functions that are labeled as "Taken from numpy 1.10", can we not import them from numpy?

Yes, we can... I applied this change but it might have unexpected behaviour in the CI matrix.

@aabadie

aabadie Nov 18, 2015

Contributor

Apparently, CI is not happy at all. It's strange because those functions are present in previous versions of joblib.

@aabadie

aabadie Nov 19, 2015

Contributor

OK, I dug a bit more into the numpy code and it appears that the numpy array read functions I took from numpy were added in 1.9 (read_array_header, etc). They are used by the open_memmap function, which is really the one that needs this code and that I had to patch to cope with single-file pickling.

@aabadie

aabadie Nov 19, 2015

Contributor

@lesteve, I did some cleanup in the git history and reverted my previous changes (removed the code taken from numpy) to use only the strict minimum needed to support numpy > 1.6.

@aabadie

Contributor

aabadie commented Nov 18, 2015

Out of interest, have you checked the memory usage of this PR?

Yes, the memory footprint is stable. See the following tests, with a 763 MB array, using compressed and non-compressed cases for Python 2.7 and Python 3.4.

  • Python 2.7:
    python27-pr260
  • Python 3.4:
    python34-pr260
joblib/test/test_numpy_pickle.py
for fname in data_filenames:
_check_pickle(fname, expected_list)
expected_list = [np.arange(5, dtype=np.int64),

@lesteve

lesteve Nov 18, 2015

Contributor

Revert your additional space here and in all the lines below.

@aabadie

aabadie Nov 18, 2015

Contributor

Done

joblib/test/test_numpy_pickle.py
@@ -274,7 +260,7 @@ def test_compressed_pickle_dump_and_load():
# or smaller than cache_size)
for cache_size in [0, 1e9]:
try:
dumped_filenames = numpy_pickle.dump(
dumped_file = numpy_pickle.dump(

@aabadie

aabadie Nov 18, 2015

Contributor

This should be reverted as well.

@aabadie


Contributor

aabadie commented Nov 19, 2015

Sound like we need a test :)

@GaelVaroquaux, I just added a test that exposes an issue when unpickling an object containing multiple arrays using memory maps: the next array offset was not correctly updated and the returned array could be wrong.
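A minimal sketch of the property such a test checks, using joblib's public API (not the actual test code): every array in a multi-array object must round-trip correctly when loaded with memory mapping.

import numpy as np
import joblib

obj = {'a': np.arange(100, dtype=np.float64).reshape(10, 10),
       'b': np.ones(50, dtype=np.int32)}
joblib.dump(obj, '/tmp/multi_arrays.pkl')

loaded = joblib.load('/tmp/multi_arrays.pkl', mmap_mode='r')
np.testing.assert_array_equal(loaded['a'], obj['a'])
np.testing.assert_array_equal(loaded['b'], obj['b'])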

@aabadie


Contributor

aabadie commented Nov 19, 2015

I would compare master against this PR on the same plot, use a single dataset (either MNI or Juelich), and possibly remove all the pytables stuff to make the comparison easier.

Here are some comparisons with master. When using compression, this PR is clearly slower than the current implementation in master (of course, in this PR "zlibX" == compression level X).

compare_impl_read
compare_impl_write

@GaelVaroquaux


Member

GaelVaroquaux commented Nov 19, 2015

It's particularly slower on reads. It would be interesting to understand why. I don't see a fundamental reason for this to happen (other than the fact that master has been carefully crafted).

@lesteve


Contributor

lesteve commented Nov 19, 2015

Here are some comparison with master

zlib3 is the default (i.e. compressed=True). I think you should show it in your comparison plot, maybe show zlib3 rather than zlib1.

@lesteve


Contributor

lesteve commented Nov 19, 2015

Also what is the unit of the y axis?

@aabadie


Contributor

aabadie commented Nov 20, 2015

Also what is the unit of the y axis?

seconds

@aabadie


Contributor

aabadie commented Nov 20, 2015

It's particularly slower on reads. It would be interesting to understand why. I don't see a fundamental reason for this to happen (other than the fact that master has been carefully crafted).

I remember @ogrisel talking about the CRC check performed by GzipFile.

@aabadie


Contributor

aabadie commented Nov 20, 2015

zlib3 is the default (i.e. compressed=True). I think you should show it in your comparison plot, maybe show zlib3 rather than zlib1.

I updated the plots in my previous comment.

@GaelVaroquaux

Member

GaelVaroquaux commented Nov 20, 2015

I think that in master I used zlib's interface, and not GzipFile
@aabadie


Contributor

aabadie commented Nov 20, 2015

I think that in master I used zlib's interface, and not GzipFile

There's also the seeking in the file that could explain this slowdown: the content is decompressed on the fly while unpickling, and once an array wrapper is found on the stack, this PR seeks forward to the beginning of the array (which is right after the pickle); once the array is read, it seeks back to the initial position in the pickle. My guess is that seeking requires extra decompression steps, but I might be wrong.
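A minimal sketch of why that guess is plausible with gzip.GzipFile (file name made up): seeking is emulated by decompressing, so a backward seek restarts decompression from the beginning of the file and a forward seek decompresses and discards the skipped bytes.

import gzip

with gzip.open('/tmp/seek_demo.gz', 'wb') as f:
    f.write(b'x' * 10000)

f = gzip.open('/tmp/seek_demo.gz', 'rb')
f.read(1000)   # decompresses the first 1000 bytes
f.seek(100)    # backward seek: rewinds and re-decompresses from the start
f.seek(5000)   # forward seek: decompresses (and discards) the skipped bytes
f.close()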

joblib/test/test_numpy_pickle.py
# We don't reconstruct memmaps
nose.tools.assert_true(isinstance(obj_, type(obj)))
np.testing.assert_array_equal(obj_, obj)

@lesteve

lesteve Mar 10, 2016

Contributor

Should you not be testing even when obj is a np.memmap? I.e. could this line be dedented by 4 spaces?

@aabadie

aabadie Mar 10, 2016

Contributor

It's failing with numpy 1.6. Changing this line to create a regular np.memmap seems to work.

joblib/test/test_numpy_pickle.py
@with_numpy
def test_numpy_persistence_bufferred_array_compression():
big_array = np.ones((_IO_BUFFER_SIZE + 100), dtype=np.uint8)
small_array = np.ones((100), dtype=np.uint8)

@lesteve

lesteve Mar 10, 2016

Contributor

I would just keep the big array, I reckon we have plenty of other tests already for small arrays.

joblib/test/test_numpy_pickle.py
nose.tools.assert_true(isinstance(obj_loaded.array_int, np.memmap))
nose.tools.assert_false(obj_loaded.array_int.flags.writeable)
# Memory map not allowed for numpy object arrays
nose.tools.assert_true(isinstance(obj_loaded.array_obj,

@lesteve

lesteve Mar 10, 2016

Contributor

This test would pass even if obj_loaded.array_obj was memmapped, since memmap is a subclass of ndarray.

Use something like:

nose.tools.assert_false(isinstance(obj_loaded.array_obj, np.memmap))
joblib/test/test_numpy_pickle.py
@@ -358,6 +502,10 @@ def test_joblib_pickle_across_python_versions():
# compatibility alias for .tobytes which was
# added in 1.9.0
np.arange(256, dtype=np.uint8).tostring(),
# np.matrix is a subclass of nd.array, here we want

@lesteve

lesteve Mar 10, 2016

Contributor

np.ndarray

@lesteve

lesteve Mar 10, 2016

Contributor

There are multiple instances of this (it was probably like this originally...); do a search and replace.

joblib/test/test_numpy_pickle.py
################################################################################
def _check_compression_format(filename, expected_list):
if (sys.version_info[:2] < (3, 3) and (filename.endswith('xz') or

@lesteve

lesteve Mar 10, 2016

Contributor

You could use not PY3_OR_LATER which in our case means the same.

@aabadie

aabadie Mar 10, 2016

Contributor

-################################################################################
+
+def _check_compression_format(filename, expected_list):

    if (sys.version_info[:2] < (3, 3) and (filename.endswith('xz') or

You could use not PY3_OR_LATER which in our case means the same.

Done

@aabadie


Contributor

aabadie commented Mar 15, 2016

I just realized that tests were failing on Windows. They are now fixed.

joblib/test/test_numpy_pickle_compat.py
env['filename'] = os.path.join(env['dir'], 'test.pkl')
print(80 * '_')
print('setup numpy_pickle')
print(80 * '_')

@ogrisel

ogrisel May 4, 2016

Contributor

Please avoid printing stuff in tests.

joblib/numpy_pickle_utils.py
self._size = -1
if not isinstance(compresslevel, int) or not (1 <= compresslevel <= 9):
raise ValueError("compresslevel must be between 1 and 9")

@ogrisel

ogrisel May 4, 2016

Contributor

Please include the observed value of compresslevel in the message:

raise ValueError("compresslevel must be between 1 and 9, got %r"
                 % compresslevel)
joblib/numpy_pickle_utils.py
def _check_not_closed(self):
if self.closed:
raise ValueError("I/O operation on closed file")

@ogrisel

ogrisel May 4, 2016

Contributor

Whenever we give an error message on a file we should always provide the filename (if possible), for instance with something like:

fname = getattr(self._fp, 'name', None)
msg = "I/O operation on closed file"
if fname is not None:
    msg += " %s" % fname
joblib/test/test_numpy_pickle.py
nose.tools.assert_raises(TypeError,
BinaryZlibFile, bad_file, 'rb')
for d in (b'a few data as bytes.',

@ogrisel

ogrisel May 4, 2016

Contributor

"a little data": "data" is not countable (like "money" or "matter").

joblib/test/test_numpy_pickle.py
with open(filename, 'rb') as f:
with BinaryZlibFile(f) as fz:
nose.tools.assert_true(fz.readable())
if sys.version_info[:1] == 3:

@ogrisel

ogrisel May 4, 2016

Contributor

>= 3 or > 2.

@ogrisel

ogrisel May 4, 2016

Contributor

Or better:

if PY3_OR_LATER:
def __init__(self):
self.array_float = np.arange(100, dtype='float64')
self.array_int = np.ones(100, dtype='int32')
self.array_obj = np.array(['a', 10, 20.0], dtype='object')

@ogrisel

ogrisel May 4, 2016

Contributor

Nice :)

joblib/numpy_pickle.py
Parameters
-----------
value: any Python object
The object to store to disk
filename: string or pathlib.Path
filename: str or pathlib.Path
The path of the file in which it is to be stored

@ogrisel

ogrisel May 4, 2016

Contributor

I would also make it such that if filename is a string and filename.endswith('.gz') and compress is None, then we would automatically set compress=('gzip', 3), and similarly for filenames ending in ".bz2" and ".xz".

What do you think @aabadie @GaelVaroquaux and @lesteve?
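A minimal sketch of the suggested behaviour (helper name and mapping are assumptions, not the actual joblib code):

_EXTENSION_TO_COMPRESS = {'.gz': 'gzip', '.bz2': 'bz2', '.xz': 'xz', '.lzma': 'lzma'}

def _detect_compress(filename, compress=None, default_level=3):
    # Hypothetical helper: infer the compression method from the file extension
    # when compress is not given explicitly.
    if compress is not None:
        return compress
    for ext, method in _EXTENSION_TO_COMPRESS.items():
        if filename.endswith(ext):
            return (method, default_level)
    return None

# e.g. _detect_compress('cache.pkl.gz') -> ('gzip', 3)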

@aabadie

aabadie May 4, 2016

Contributor

I like the idea and I just pushed a commit with the change. The documentation still has to be updated though.

Just one comment about this feature: the compression level is now set automatically to 3, which is not that nice I think. Maybe there's a way to use both the file extension and compress.

@aabadie

aabadie May 4, 2016

Contributor

Maybe there's a way to use both the file extension and compress.

I pushed a solution and updated the documentation.

joblib/numpy_pickle_utils.py
_XZ_PREFIX, _LZMA_PREFIX))
# Buffer size used in io.BufferedReader and io.BufferedWriter
_IO_BUFFER_SIZE = 10 * 1024 ** 2

@ogrisel

ogrisel May 4, 2016

Contributor

I wonder if 1024 ** 2 would not be better. I think it should hide disk latency as well as 10 MiB does, but it would also make the buffer fit in the CPU cache, which might make it possible to reduce DRAM bandwidth usage. Could you please run a quick bench to check that the performance is not negatively impacted? If 1024 ** 2 is enough, let's go for it.
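A minimal sketch of the kind of buffering being discussed (file name and payload made up, not the actual joblib code):

import io

_IO_BUFFER_SIZE = 1024 ** 2  # 1 MiB buffer, as suggested above

with open('/tmp/payload.bin', 'wb', buffering=0) as raw:
    with io.BufferedWriter(raw, buffer_size=_IO_BUFFER_SIZE) as writer:
        writer.write(b'\x00' * (10 * 1024 ** 2))  # 10 MiB of data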

@aabadie

aabadie May 9, 2016

Contributor

@ogrisel, I applied all your comments and ran a few benches. Here are the bench results:

With 10 * 1024 ** 2:

| Object | Compression | Buffer | Pickler/Unpickler | dump time (s) | load time (s) | Disk used (MB) |
| --- | --- | --- | --- | --- | --- | --- |
| dict (50.3MB) | Zlib | io.Buffered | Joblib | 8.4 | 5.8 | 4.47 |
| list (8.7MB) | Zlib | io.Buffered | Joblib | 2.6 | 1.5 | 1.56 |
| array random (80.0MB) | Zlib | io.Buffered | Joblib | 3.0 | 0.5 | 77.14 |

With 1024 ** 2:

| Object | Compression | Buffer | Pickler/Unpickler | dump time (s) | load time (s) | Disk used (MB) |
| --- | --- | --- | --- | --- | --- | --- |
| dict (50.3MB) | Zlib | io.Buffered | Joblib | 8.4 | 5.6 | 4.47 |
| list (8.7MB) | Zlib | io.Buffered | Joblib | 2.7 | 1.6 | 1.56 |
| array random (80.0MB) | Zlib | io.Buffered | Joblib | 3.0 | 0.5 | 77.14 |

Results are comparable so changing to 1024 ** 2 is fine. Will apply the change if you agree.

@ogrisel

ogrisel May 9, 2016

Contributor

Thank you very much for checking, +1 with the change.

@aabadie

aabadie May 9, 2016

Contributor

Done! @ogrisel, I'll now rebase into a single commit and then it should be fine.

@ogrisel


Contributor

ogrisel commented May 9, 2016

Can you also please squash the commits? Commit messages such as "addressing other comments" are really not interesting.

@aabadie


Contributor

aabadie commented May 9, 2016

Can you also please squash the commits?

@ogrisel: history rewritten :)

joblib/test/test_numpy_pickle.py
def _check_compression_format(filename, expected_list):
if (not PY3_OR_LATER and (filename.endswith('xz') or
filename.endswith('lzma'))):

@ogrisel

ogrisel May 10, 2016

Contributor

filename.endswith('.xz') or filename.endswith('.lzma')

@aabadie

aabadie May 10, 2016

Contributor

Good catch! Just pushed the update.

# We are careful to open the file handle early and keep it open to
# avoid race-conditions on renames.
# That said, if data are stored in companion files, which can be

@ogrisel

ogrisel May 10, 2016

Contributor

"data" is a mass noun so it should always be singular: "data is stored"

@ogrisel


Contributor

ogrisel commented May 10, 2016

Thanks @aabadie, this LGTM. Merging.

@ogrisel ogrisel merged commit 8ed578b into joblib:master May 10, 2016

3 checks passed

continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: Coverage increased (+0.7%) to 89.489%

@ogrisel ogrisel removed the need Review label May 10, 2016

@ogrisel


Contributor

ogrisel commented May 10, 2016

🍻

@GaelVaroquaux


Member

GaelVaroquaux commented May 10, 2016

Yehaaa!!!

@lesteve


Contributor

lesteve commented May 10, 2016

Great stuff!

@aabadie


Contributor

aabadie commented May 10, 2016

Woohoo !! 🎉

'zerosize_ok'],
buffersize=buffersize,
order=self.order):
pickler.file_handle.write(chunk.tostring('C'))

@mrocklin

mrocklin May 11, 2016

Contributor

Question: is it possible to use the memoryview directly? This might avoid a copy

In [1]: import numpy as np

In [2]: x = np.ones(5)

In [3]: x.data
Out[3]: <memory at 0x7f201c81fe58>

In [4]: with open('foo.pkl', 'wb') as f:
    f.write(x.data)

@ogrisel

ogrisel May 12, 2016

Contributor

I think we tried that but it breaks for some versions of Python / numpy if I recall correctly. @aabadie do you confirm?

Anyway this is a small buffer so it introduces minimal memory overhead and I don't think the performance overhead is significant.

@aabadie

aabadie May 12, 2016

Contributor

Thanks @mrocklin for the suggestion.
@ogrisel, I was running some benches/tests with this change. Indeed, it avoids a memory copy but, as you said, it's not significant as this code works with only a few MB (16). Performance is the same.
It also works on all versions of python/numpy supported by joblib.

@ogrisel

ogrisel May 12, 2016

Contributor

Also I think this will result in storing incorrect data in the pickle when the array is not contiguous.

@aabadie

aabadie May 12, 2016

Contributor

Just tried this using:

import numpy as np
import joblib

a = np.asarray(np.arange(100000000).reshape((1000, 500, 200)), order='F')[:, :1, :]
a.flags  # f_contiguous: False, c_contiguous: False, but aligned: True
joblib.dump(a, '/tmp/test.pkl')
np.allclose(a, joblib.load('/tmp/test.pkl'))  # Returns True

Seems ok to me.

@GaelVaroquaux

GaelVaroquaux May 12, 2016

Member

a = np.asarray(np.arange(100000000).reshape((1000, 500, 200)), order='F')[:, :1, :]

Try with [::2]

@aabadie

aabadie May 12, 2016

Contributor

Try with [::2]

Works

@ogrisel

ogrisel May 12, 2016

Contributor

Interesting, then let's do a PR :)

@ogrisel

ogrisel May 12, 2016

Contributor

I think np.nditer must return contiguous chunks whatever the input.
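As an aside, a minimal sketch of that behaviour (illustrative only, not the joblib code): with the 'external_loop' and 'buffered' flags, np.nditer yields contiguous chunks even for a non-contiguous input, so each chunk can be written out directly.

import io
import numpy as np

# A non-contiguous view of a Fortran-ordered array.
a = np.asarray(np.arange(1000).reshape(10, 100), order='F')[:, ::2]

out = io.BytesIO()
for chunk in np.nditer(a, flags=['external_loop', 'buffered', 'zerosize_ok'],
                       buffersize=256, order='C'):
    out.write(chunk.tobytes('C'))

assert out.getvalue() == np.ascontiguousarray(a).tobytes()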

@aabadie

aabadie May 12, 2016

Contributor

then let's do a PR

Here it is: #352 :)
