Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] use distinct filenames for pickled datasets under Python 2 and 3 #5355

Closed
wants to merge 8 commits into from

Conversation

ogrisel
Copy link
Member

@ogrisel ogrisel commented Oct 7, 2015

This is a fix for #3596.

Note that I changed the covtype loader to be consistent with the others even though that might re-trigger a useless download once.

@ogrisel ogrisel added this to the 0.17 milestone Oct 7, 2015
@ogrisel
Copy link
Member Author

ogrisel commented Oct 7, 2015

Hum, I will fix the issue under Python 2.

@ogrisel
Copy link
Member Author

ogrisel commented Oct 7, 2015

The what's new entry is conflicting. As it's going to change a lot today, I will do a final rebase once the tests are all green and reviewers are ok with the change.

@jnothman
Copy link
Member

jnothman commented Oct 7, 2015

Is it possible to just use the lowest pickle version setting? Or is the issue that the constructors differ?

Should we anticipate different Py3 versions producing mutually incompatible pickles? Someone may have multiple versions of Py3 installed in different venvs but share the same default data dir.

@TomDLT
Copy link
Member

TomDLT commented Oct 7, 2015

LGTM

@ogrisel
Copy link
Member Author

ogrisel commented Oct 7, 2015

Is it possible to just use the lowest pickle version setting? Or is the issue that the constructors differ?

This is what @lesteve tried to do in joblib but there is still issues with the fact that the numpy pickler used in some cases (small arrays with compression) tries to represent the buffer content with a str instance under Python 2 (which is fine) but str instance under Python 3 imply decoding. They should be unpickled as bytes under Python 3 but only for str instances that represent the binary content of a numpy buffer. This is very complex to get right and I feel that is going to be a serious maintenance headache.

I prefer to make it explicit that we expect pickles not to be compatible across versions of Python.

@ogrisel
Copy link
Member Author

ogrisel commented Oct 7, 2015

Should we anticipate different Py3 versions producing mutually incompatible pickles? Someone may have multiple versions of Py3 installed in different venvs but share the same default data dir.

That should not be a problem as the str type behavior should not change again under Python 3.

@jnothman
Copy link
Member

jnothman commented Oct 7, 2015

Okay.

On 7 October 2015 at 21:51, Olivier Grisel notifications@github.com wrote:

Should we anticipate different Py3 versions producing mutually
incompatible pickles? Someone may have multiple versions of Py3 installed
in different venvs but share the same default data dir.

That should not be a problem as the str type behavior should not change
again under Python 3.


Reply to this email directly or view it on GitHub
#5355 (comment)
.

@@ -72,9 +72,8 @@ def _load_coverage(F, header_length=6, dtype=np.int16):
header = dict([make_tuple(line) for line in header])

M = np.loadtxt(F, dtype=dtype)
nodata = header[b'NODATA_value']
nodata = int(header[b'NODATA_value'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? Does this fix make a practical difference that should be recorded as a bug fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to fix a numpy warning that does not like indexing by float values. The value is 128.0. I do not really understand this code but that should not impact the behavior besides the warning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah of course

@jnothman
Copy link
Member

jnothman commented Oct 7, 2015

LGTM

@TomDLT TomDLT changed the title [MRG] use distinct filenames for pickled datasets under Python 2 and 3 [MRG+2] use distinct filenames for pickled datasets under Python 2 and 3 Oct 7, 2015
@ogrisel
Copy link
Member Author

ogrisel commented Oct 7, 2015

Ok I will merge by rebase then.

@ogrisel ogrisel closed this Oct 7, 2015
@jnothman
Copy link
Member

jnothman commented Oct 7, 2015

Ta

@amueller
Copy link
Member

Maybe it would also be a good idea to include joblib and scikit-learn version in the files. That might be more insight the pickle, though? We had some issues with changes in scikit-learn already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants