[MRG] persistence in/from file objects #351

Merged

merged 16 commits into joblib:master from aabadie:file_handle_persistence on Jun 29, 2016

4 participants
@aabadie
Contributor

aabadie commented May 11, 2016

This PR should fix #341 and #312.

Examples of new usage:

import joblib
data = "some data to persist"
# persist to/from a raw file
with open('/tmp/test.pkl', 'wb') as f:
    joblib.dump(data, f)

with open('/tmp/test.pkl', 'rb') as f:
    print(joblib.load(f))

# compressors are also supported:
import gzip
with gzip.GzipFile('/tmp/test.pkl.gz', 'wb') as f:
    joblib.dump(data, f)

with gzip.GzipFile('/tmp/test.pkl.gz', 'rb') as f:
    print(joblib.load(f))

# auto detection of compression also works:
with open('/tmp/test.pkl.gz', 'rb') as f:
    print(joblib.load(f))

# it's also possible to pickle in a buffer in memory using io.BytesIO:
import io
buf = io.BytesIO()
joblib.dump(data, buf)
joblib.load(buf)

Test functions are still missing.

@aabadie aabadie added the in progress label May 11, 2016

@aabadie aabadie referenced this pull request May 11, 2016

Closed

In-memory caching #341

@aabadie aabadie changed the title from WIP : persistence in file handles to WIP : persistence using file handles May 11, 2016

@aabadie aabadie changed the title from WIP : persistence using file handles to [MRG] persistence using file handles May 12, 2016

@aabadie

Contributor

aabadie commented May 12, 2016

I think this is in pretty good shape now, so changing the status to MRG.

A few comments though:

  • the implementation is not perfect (e.g. the new _COMPRESSOR_OBJS list, etc.), so I'd like to have someone else's opinion,
  • forcing the decompressor to a wrong one (example: use gzip compression with dump but try to decompress with bz2) will raise an exception. Should we handle this case? My answer is 'no', as the message is quite meaningful, but maybe other people won't agree with this,
  • the online documentation and CHANGES.rst still need to be updated, but this can be done after a preliminary review.
@aabadie

Contributor

aabadie commented May 18, 2016

@lesteve, I addressed what we discussed IRL. Feel free to have a look when you have time.

aabadie referenced this pull request in GaelVaroquaux/website May 20, 2016

joblib/numpy_pickle.py
@@ -389,6 +389,9 @@ def dump(value, filename, compress=0, protocol=None, cache_size=None):
if Path is not None and isinstance(filename, Path):
filename = str(filename)
+ isFilename = isinstance(filename, _basestring)
+ isFileobj = hasattr(filename, "read") and hasattr(filename, "write")

@ogrisel

ogrisel May 26, 2016

Contributor

🚨 ⚠️ 🐫 ⚠️ camelcase alert! ⚠️ 🐫 ⚠️ 🚨

@aabadie

aabadie May 27, 2016

Contributor

Sorry, habits from my old Qt life. I renamed using 🐍 snake_case 🐍 😄

joblib/numpy_pickle.py
- new_exc.__cause__ = exc
- raise new_exc
+
+ if hasattr(filename, "read") and hasattr(filename, "write"):

@ogrisel

ogrisel May 26, 2016

Contributor

We don't need "write" to load, so only test for "read" (and "seek" too).

@aabadie

aabadie May 27, 2016

Contributor

Agreed and fixed.

@aabadie

Contributor

aabadie commented Jun 9, 2016

@lesteve, @ogrisel, any comments?

@aabadie aabadie changed the title from [MRG] persistence using file handles to [MRG] persistence using file object Jun 9, 2016

@aabadie aabadie changed the title from [MRG] persistence using file object to [MRG] persistence in/from file object Jun 9, 2016

@aabadie aabadie changed the title from [MRG] persistence in/from file object to [MRG] persistence in/from file objects Jun 9, 2016

joblib/numpy_pickle.py
@@ -467,12 +472,48 @@ def dump(value, filename, compress=0, protocol=None, cache_size=None):
NumpyPickler(f, protocol=protocol).dump(value)
else:

@lesteve

lesteve Jun 9, 2016

Contributor

Use an elif here so that all the conditions are at the same indentation level.

@aabadie

aabadie Jun 9, 2016

Contributor

Fixed!

joblib/numpy_pickle_utils.py
@@ -147,20 +151,42 @@ def _read_fileobject(fileobj, filename, mmap_mode=None):
"""
# Detect if the fileobj contains compressed data.
compressor = _detect_compressor(fileobj)
+ if isinstance(fileobj, tuple(_COMPRESSOR_OBJS)):

@lesteve

lesteve Jun 9, 2016

Contributor

Remind me why _read_fileobject is a generator function again?

@aabadie

aabadie Jun 15, 2016

Contributor

It was to be able to manage, in the same function, compatibility with old pickles. The problem was that the load_compatibility function had to be called from the initial calling function (e.g. load). The solution is to yield the filename in the case of old pickles and yield file objects with new pickles.

@lesteve

lesteve Jun 17, 2016

Contributor

I think my question was why you were using yield rather than return, but this discussion is probably best left out of this PR.

@aabadie aabadie added this to the 0.10 milestone Jun 16, 2016

@aabadie

Contributor

aabadie commented Jun 17, 2016

Just for the record, here is the kind of thing now possible with this PR: https://gist.github.com/aabadie/074587354d97d872aff6abb65510f618

joblib/test/test_numpy_pickle.py
@@ -333,14 +334,15 @@ def test_compress_mmap_mode_warning():
numpy_pickle.load(this_filename, mmap_mode='r+')
nose.tools.assert_equal(len(caught_warnings), 1)
for warn in caught_warnings:
- nose.tools.assert_equal(warn.category, DeprecationWarning)
+ nose.tools.assert_equal(warn.category, Warning)

@lesteve

lesteve Jun 17, 2016

Contributor

Why DeprecationWarning -> Warning?

@aabadie

aabadie Jun 17, 2016

Contributor

Because this is the type of warning raised here and that we want to test: it checks that one cannot load a compressed pickle using mmap_mode, so there's no point in using a DeprecationWarning in this case.

joblib/numpy_pickle_utils.py
- if compressor == 'zlib':
+ 'mmap_mode "%(mmap_mode)s" flag passed. mmap_mode '
+ 'option will be ignored.'
+ % locals(), Warning, stacklevel=2)

@lesteve

lesteve Jun 17, 2016

Contributor

For generic warnings, it is probably better not to specify anything here:

warnings.warn(message, stacklevel=2)

It will default to UserWarning.
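A quick check of the default category mentioned above: when no category is passed, `warnings.warn` emits a UserWarning.

```python
import warnings

# warnings.warn with no category argument defaults to UserWarning,
# which is why the plain warnings.warn(message, stacklevel=2) form
# suffices for a generic warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("mmap_mode option will be ignored")

assert caught[0].category is UserWarning
```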

joblib/numpy_pickle_utils.py
@@ -49,6 +50,9 @@
# Supported compressors
_COMPRESSORS = ('zlib', 'bz2', 'lzma', 'xz', 'gzip')
+_COMPRESSOR_OBJS = [gzip.GzipFile, bz2.BZ2File]

@lesteve

lesteve Jun 17, 2016

Contributor

_COMPRESSOR_CLASSES would be a better name.

@aabadie aabadie added the enhancement label Jun 20, 2016

joblib/numpy_pickle.py
+ 'This feature is not supported by joblib.')
+ new_exc.__cause__ = exc
+ raise new_exc
+ # Raise exception "as-is" with python 2

@lesteve

lesteve Jun 20, 2016

Contributor

Reraise exception with Python 2

+
+
+@with_numpy
+def test_in_memory_persistence():

@lesteve

lesteve Jun 20, 2016

Contributor

Any good reason that this test is not part of the previous one?

@aabadie

aabadie Jun 21, 2016

Contributor

The io.BytesIO object doesn't take a filename or an opening mode as parameters, and the same instance cannot be used from two different contexts (the first one closes it and it cannot be reopened afterwards).
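The constraint described above can be sketched with plain pickle (joblib's dump/load accept the same kind of buffer): a single BytesIO serves both the write and the read, with an explicit rewind in between, because a closed BytesIO cannot be reopened.

```python
import io
import pickle

buf = io.BytesIO()
pickle.dump({'a': [1, 2, 3]}, buf)

# Rewind instead of closing and reopening: exiting a `with` block would
# close the buffer and discard its contents for good.
buf.seek(0)
restored = pickle.load(buf)
assert restored == {'a': [1, 2, 3]}
```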

@lesteve

Contributor

lesteve commented Jun 23, 2016

Looks like you need to rebase on master.

@lesteve

Contributor

lesteve commented Jun 23, 2016

Looks like you need to rebase on master.

Other than this, LGTM.

@aabadie

Contributor

aabadie commented Jun 23, 2016

Looks like you need to rebase on master.

Rebased, waiting for AppVeyor.

@lesteve

Contributor

lesteve commented Jun 23, 2016

Can you add an entry in CHANGES.rst? Maybe we want to update the docs and add a small section saying that we can use file objects in dump/load.

@aabadie

Contributor

aabadie commented Jun 23, 2016

Can you add an entry in CHANGES.rst?

Sure.

Maybe we want to update the docs and add a small section saying that we can use file objects.

Good point. Something I forgot to add.

@aabadie

Contributor

aabadie commented Jun 23, 2016

@lesteve, I added some lines about persistence in file objects in the documentation and updated the CHANGES.rst file. Comments welcome 😉

doc/persistence.rst
@@ -46,6 +46,20 @@ We can then load the object from the file::
[('a', [1, 2, 3]), ('b', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))]
+Persist in file objects

@lesteve

lesteve Jun 23, 2016

Contributor

Maybe "persistence" to match the title of the section.

doc/persistence.rst
+Persist in file objects
+=======================
+
+Instead of filenames, `dump` and `load` functions also accept opened file

@lesteve

lesteve Jun 23, 2016

Contributor

Remove "opened" -> "accept file objects"

doc/persistence.rst
+Compressor files provided by the python standard library can also be used to
+compress pickle in a context manager, e.g `gzip.GzipFile`, `bz2.BZ2File`,
+`lzma.LZMAFile`:
+ >>> # Dumping in a gzip.GzipFile file object using a compress level of 3.

@lesteve

lesteve Jun 23, 2016

Contributor

"gzip.GzipFile object"

@aabadie

aabadie Jun 23, 2016

Contributor

Done, I also used compresslevel=3 in the line below to match what is written in the comment.

doc/persistence.rst
@@ -79,6 +92,16 @@ are 'gzip', 'bz2', 'lzma' and 'xz':
Lzma and Xz compression methods are only available for python versions >= 3.3.
+Compressor files provided by the python standard library can also be used to
+compress pickle in a context manager, e.g `gzip.GzipFile`, `bz2.BZ2File`,

@lesteve

lesteve Jun 23, 2016

Contributor

Remove "in a context manager".

Also, a single back-tick does italics in reST. You have to use double back-ticks to get a fixed-width font, if that is what you were after.

@lesteve

Contributor

lesteve commented Jun 23, 2016

Can you rebase on master so we quickly double-check that flake8_diff.sh works fine on your PR?

@aabadie

Contributor

aabadie commented Jun 23, 2016

Can you rebase on master so we quickly double-check that flake8_diff.sh works fine on your PR?

Travis is broken because of the new code in the doc. I'm on it.

@aabadie

Contributor

aabadie commented Jun 23, 2016

I had issues with doctest and the usage of a file object in a context manager. I skipped those lines in the doc, but that may not be a solution. Any better idea?

Otherwise, I rebased on the latest master (including the recent flake8_diff.sh changes) and it works just fine.

@lesteve

Contributor

lesteve commented Jun 24, 2016

I had issues with doctest and the usage of a file object in a context manager. I skipped those lines in the doc, but that may not be a solution. Any better idea?

What was the error? It would be great to understand where it comes from and fix it properly ...

Otherwise, I rebased on the latest master (including the recent flake8_diff.sh changes) and it works just fine.

Great, thanks a lot!

@aabadie

Contributor

aabadie commented Jun 24, 2016

What was the error? It would be great to understand where it comes from and fix it properly ...

There were 2 things:

  • with open() as fo returns the file object ([< io.blabla >]): I fixed it using the ELLIPSIS flag and by adding [<...>] after the with block,
  • in Python 2.6, using gzip.GzipFile or bz2.BZ2File as a context manager doesn't work if you don't use contextlib.closing, so I had to skip the file object compression example.
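The Python 2.6 workaround mentioned above can be sketched like this (GzipFile only gained context-manager support in Python 2.7, so contextlib.closing supplies the `with` behavior; the path used here is just an illustration):

```python
import contextlib
import gzip
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.pkl.gz')

# contextlib.closing wraps any object with a close() method so it can be
# used in a with statement, which GzipFile itself could not be in 2.6.
with contextlib.closing(gzip.GzipFile(path, 'wb')) as f:
    pickle.dump([1, 2, 3], f)

with contextlib.closing(gzip.GzipFile(path, 'rb')) as f:
    assert pickle.load(f) == [1, 2, 3]
```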
@lesteve

Contributor

lesteve commented Jun 24, 2016

with open() as fo returns the file object ([<...>]): I fixed it using the ELLIPSIS flag and by adding [<...>] after the with block

Hmm, I am not sure what we should do in this case:

with gzip.GzipFile('/tmp/test.gz', 'wb') as f:
    res = joblib.dump({'a': [1, 2, 3], 'b': 'asadf'}, f)

Should res be:

  • a single-element list containing f
  • a single-element list containing f.name, to be more consistent with when dump is called with a filename
  • because this is a new feature, can we do what we want? In particular, returning a list may not make sense, since the list was there for historical reasons (main pickle + companion files for each numpy array)

in Python 2.6, using gzip.GzipFile or bz2.BZ2File as a context manager doesn't work if you don't use contextlib.closing. So I had to skip the file object compression example.

Pfff 😭 is there a way we can just skip doctests in Python 2.6?

@lesteve

Contributor

lesteve commented Jun 24, 2016

Pfff 😭 is there a way we can just skip doctests in Python 2.6?

It seems scikit-learn has some *fixture* files in the doc folder to skip the doctests. You can probably have a look there. I believe there is something in setup.cfg as well to make nose work with doctest fixtures.

@aabadie

Contributor

aabadie commented Jun 27, 2016

Should res be:

  • a single-element list containing f
  • a single-element list containing f.name, to be more consistent with when dump is called with a filename
  • because this is a new feature, can we do what we want? In particular, returning a list may not make sense, since the list was there for historical reasons (main pickle + companion files for each numpy array)

I think returning nothing when a file object is given as input is reasonable, because it means the user knows in advance the destination of the dump, and the returned list is there, as you said, for historical reasons.

@aabadie

Contributor

aabadie commented Jun 27, 2016

Seems like scikit-learn has some fixture files in the doc folder to skip the doctests

Nice, it works like a charm. The setup.cfg file in joblib was already correctly configured.

doc/persistence_fixture.py
+def setup_module(module):
+ """Setup module."""
+ if _compat.PY26:
+ raise SkipTest("Skipping persitence doctest in Python 2.6")

@lesteve

lesteve Jun 28, 2016

Contributor

typo in persistence (missing s)
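The fixture pattern under review can be sketched as follows (a hedged illustration; joblib's actual file may differ, and unittest.SkipTest is used here since nose's SkipTest is an alias of it):

```python
# Sketch of a nose doctest fixture like doc/persistence_fixture.py: nose
# calls setup_module() before running the doctests of the matching .rst
# file, so raising SkipTest there skips them wholesale on Python 2.6.
import sys
from unittest import SkipTest

def setup_module(module):
    if sys.version_info[:2] == (2, 6):
        raise SkipTest("Skipping persistence doctests in Python 2.6")
```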

@lesteve

Contributor

lesteve commented Jun 28, 2016

I think returning nothing in case a file object is given as input is reasonable because this means the user knows in advance the destination of the dump and the returned list is here,

Not entirely convinced, but it is a good point that joblib.dump returning something is mostly for companion files (i.e. npy files) since they are not passed in by the user. Let's ask @GaelVaroquaux and @ogrisel whether they have an opinion on this!

@GaelVaroquaux

Member

GaelVaroquaux commented Jun 28, 2016

I think returning nothing in case a file object is given as input is reasonable because this means the user knows in advance the destination of the dump and the returned list is here,

The danger with returning the file object is that it risks having users hold on to open file handles. Under Windows, this locks the file, and hence can cause problems in parallel computing.

@aabadie

Contributor

aabadie commented Jun 28, 2016

The danger with returning the file object is that it risks having users hold on to open file handles. Under Windows, this locks the file, and hence can cause problems in parallel computing.

So I guess we can keep it as it is in this PR (returns nothing).

@lesteve

Contributor

lesteve commented Jun 29, 2016

So I guess we can keep it as it is in this PR (returns nothing).

Let's do it like this in this PR. It'd be good to try, in a further PR, to make the joblib.dump return type consistent whether it takes a filename or a file object.

doc/persistence_fixture.py
+def setup_module(module):
+ """Setup module."""
+ if _compat.PY26:
+ raise SkipTest("Skipping persistence doctest in Python 2.6")

@lesteve

lesteve Jun 29, 2016

Contributor

Can you add a comment to say that GzipFile cannot be used as a context manager in Python 2.6, or some better explanation?

@aabadie

aabadie Jun 29, 2016

Contributor

Comment added.

@lesteve lesteve merged commit af2e9b7 into joblib:master Jun 29, 2016

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.2%) to 89.713%
Details

@lesteve lesteve removed the in progress label Jun 29, 2016

@lesteve

Contributor

lesteve commented Jun 29, 2016

Merging, thanks!

@aabadie aabadie deleted the aabadie:file_handle_persistence branch Sep 24, 2016

yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Sep 30, 2016

Merge tag '0.10.2' into debian
* tag '0.10.2': (55 commits)
  Bump up __version__
  Update release to 0.10.2 in CHANGES.rst
  nosetests should run tests only from the joblib folder.
  API expose joblib.parallel.ParallelBackendBase and joblib.parallel.AutoBatchingMixin
  Update to numpydoc 0.6
  [numpy_pick_utils] Handle unicode filenames in BinaryZlibFile (#384)
  Fix format_stack with compiled code (e.g. .so or .pyd)
  PEP8: cosmit fix (#376)
  FIX typo
  Release 0.10.0
  FIX: __all__ should hold symbol names as strings
  Fix bench_auto_batching.py
  [MRG] Persistence in/from file objects (#351)
  Minor tweaks in auto batching benchmark script
  Improve flake8_diff.sh (#371)
  FIX numpy array persistence with pickle HIGHEST_PROTOCOL (#370)
  DOC: remove easy_install from joblib installation documentation (#363)
  MAINT fix typo
  DOC Add documentation of mmap_mode
  Explicit handling of job cancellation on first collected exception (#361)
  ...

@pyup-bot pyup-bot referenced this pull request in workforce-data-initiative/skills-airflow Jul 28, 2017

Merged

Pin joblib to latest version 0.11 #38
