New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] use distinct filenames for pickled datasets under Python 2 and 3 #5355

Closed
wants to merge 8 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@ogrisel
Member

ogrisel commented Oct 7, 2015

This is a fix for #3596.

Note that I changed the covtype loader to be consistent with the others even though that might re-trigger a useless download once.

@ogrisel ogrisel added this to the 0.17 milestone Oct 7, 2015

@ogrisel

This comment has been minimized.

Member

ogrisel commented Oct 7, 2015

Hum, I will fix the issue under Python 2.

@ogrisel

This comment has been minimized.

Member

ogrisel commented Oct 7, 2015

The what's new entry is conflicting. As it's going to change a lot today, I will do a final rebase once the tests are all green and reviewers are ok with the change.

@jnothman

This comment has been minimized.

Member

jnothman commented Oct 7, 2015

Is it possible to just use the lowest pickle version setting? Or is the issue that the constructors differ?

Should we anticipate different Py3 versions producing mutually incompatible pickles? Someone may have multiple versions of Py3 installed in different venvs but share the same default data dir.

@TomDLT

This comment has been minimized.

Member

TomDLT commented Oct 7, 2015

LGTM

@ogrisel

This comment has been minimized.

Member

ogrisel commented Oct 7, 2015

Is it possible to just use the lowest pickle version setting? Or is the issue that the constructors differ?

This is what @lesteve tried to do in joblib but there is still issues with the fact that the numpy pickler used in some cases (small arrays with compression) tries to represent the buffer content with a str instance under Python 2 (which is fine) but str instance under Python 3 imply decoding. They should be unpickled as bytes under Python 3 but only for str instances that represent the binary content of a numpy buffer. This is very complex to get right and I feel that is going to be a serious maintenance headache.

I prefer to make it explicit that we expect pickles not to be compatible across versions of Python.

@ogrisel

This comment has been minimized.

Member

ogrisel commented Oct 7, 2015

Should we anticipate different Py3 versions producing mutually incompatible pickles? Someone may have multiple versions of Py3 installed in different venvs but share the same default data dir.

That should not be a problem as the str type behavior should not change again under Python 3.

@jnothman

This comment has been minimized.

Member

jnothman commented Oct 7, 2015

Okay.

On 7 October 2015 at 21:51, Olivier Grisel notifications@github.com wrote:

Should we anticipate different Py3 versions producing mutually
incompatible pickles? Someone may have multiple versions of Py3 installed
in different venvs but share the same default data dir.

That should not be a problem as the str type behavior should not change
again under Python 3.


Reply to this email directly or view it on GitHub
#5355 (comment)
.

@@ -72,9 +72,8 @@ def _load_coverage(F, header_length=6, dtype=np.int16):
header = dict([make_tuple(line) for line in header])
M = np.loadtxt(F, dtype=dtype)
nodata = header[b'NODATA_value']
nodata = int(header[b'NODATA_value'])

This comment has been minimized.

@jnothman

jnothman Oct 7, 2015

Member

What's going on here? Does this fix make a practical difference that should be recorded as a bug fix?

This comment has been minimized.

@ogrisel

ogrisel Oct 7, 2015

Member

It was to fix a numpy warning that does not like indexing by float values. The value is 128.0. I do not really understand this code but that should not impact the behavior besides the warning.

This comment has been minimized.

@jnothman

jnothman Oct 7, 2015

Member

Ah of course

@jnothman

This comment has been minimized.

Member

jnothman commented Oct 7, 2015

LGTM

@TomDLT TomDLT changed the title from [MRG] use distinct filenames for pickled datasets under Python 2 and 3 to [MRG+2] use distinct filenames for pickled datasets under Python 2 and 3 Oct 7, 2015

@ogrisel

This comment has been minimized.

Member

ogrisel commented Oct 7, 2015

Ok I will merge by rebase then.

@ogrisel ogrisel closed this Oct 7, 2015

@jnothman

This comment has been minimized.

Member

jnothman commented Oct 7, 2015

Ta

@amueller

This comment has been minimized.

Member

amueller commented Oct 12, 2015

Maybe it would also be a good idea to include joblib and scikit-learn version in the files. That might be more insight the pickle, though? We had some issues with changes in scikit-learn already.

@jakirkham jakirkham referenced this pull request May 22, 2017

Open

Iiboost recipe #2931

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment