Skip to content

BUG: Make allow_pickle=False the default for loading#12889

Merged
charris merged 1 commit intonumpy:masterfrom
ivanov:no-pickles-for-you
Apr 17, 2019
Merged

BUG: Make allow_pickle=False the default for loading#12889
charris merged 1 commit intonumpy:masterfrom
ivanov:no-pickles-for-you

Conversation

@ivanov
Copy link
Copy Markdown
Contributor

@ivanov ivanov commented Jan 30, 2019

a partial mitigation of #12759.

see also https://nvd.nist.gov/vuln/detail/CVE-2019-6446

I'm not sure if this commit should start with 'API:' - and it should
certainly be documented, but I'm not sure if this will go into 1.16 or
1.17.

@ivanov
Copy link
Copy Markdown
Contributor Author

ivanov commented Jan 30, 2019

The vulnerability only applies to loading, but I thought it'd be awkward if the default parameters did not round-trip save-load with default parameters.

@stefanv
Copy link
Copy Markdown
Contributor

stefanv commented Jan 30, 2019

This looks like the correct default value for that parameter. @seberg suggested we let the user know how to fix the situation, should their file contain a pickle.

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Jan 31, 2019
@charris charris added this to the 1.16.2 release milestone Jan 31, 2019
@seberg
Copy link
Copy Markdown
Member

seberg commented Jan 31, 2019

@ivanov thanks, indeed, unless someone seriously argues this is critical, could you go with allow_pickle=None which then gives a FutureWarning instead? IIRC, FutureWarnings are always visible, so it will at least educate over the dangers.

With such a warning I think we should probably backport it as well, since I assume downstream packages will at worst log warnings in their test suits, it should be OK. (I am not sure about the typical setup, numpy would ignore such warnings in tests for release versions)

We could probably even aim for changing it by 1.18, with the option of delaying if any downstream packages get hit by it. The current error message is very correct, but maybe could be clarified a bit more for very unexperienced users.

@charris charris added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Feb 1, 2019
@charris
Copy link
Copy Markdown
Member

charris commented Feb 1, 2019

This should get a mention in the 1.17.0-notes as changed behavior.

@eric-wieser
Copy link
Copy Markdown
Member

+1 on seberg's suggestion. This doesn't strike me as a real enough problem to break everyone saving object arrays without a deprecation period

@oleksandr-pavlyk
Copy link
Copy Markdown
Contributor

@seberg @eric-wieser Just to clarify, do you mean adding allow_pickle=None as default for numpy.load, numpy.save, numy.savez and numpy.NpzFile?

@oleksandr-pavlyk
Copy link
Copy Markdown
Contributor

oleksandr-pavlyk commented Feb 5, 2019

By the way, changing default of allow_pickle of _savez renders it impossible to save object arrays with numpy.savez and numpy.savez_compressed. Those functions simply do not provide for the way to control values of allow_pickle and pickle_kwargs of the underlying _savez.

@ivanov
Copy link
Copy Markdown
Contributor Author

ivanov commented Feb 6, 2019

@oleksandr-pavlyk is right , I added the passing of that arg to the savez functions, but neglected to push that out to this PR. Will integrate allow_pickle=None as a transition path with a warning next

@oleksandr-pavlyk
Copy link
Copy Markdown
Contributor

@ivanov I also needed to change test_io.py as follows to allow tests to pass:

diff --git a/numpy/lib/tests/test_io.py b/numpy/lib/tests/test_io.py
index 7ef25538b..5c9ece928 100644
--- a/numpy/lib/tests/test_io.py
+++ b/numpy/lib/tests/test_io.py
@@ -179,7 +179,14 @@ class RoundtripTest(object):

 class TestSaveLoad(RoundtripTest):
     def roundtrip(self, *args, **kwargs):
-        RoundtripTest.roundtrip(self, np.save, *args, **kwargs)
+        load_kwds = kwargs.get('load_kwds', {})
+        load_kwds['allow_pickle'] = load_kwds.get('allow_pickle', True)
+        save_kwds = kwargs.get('save_kwds', {})
+        save_kwds['allow_pickle'] = save_kwds.get('allow_pickle', True)
+        kwargs_copy = kwargs
+        kwargs_copy['load_kwds'] = load_kwds
+        kwargs_copy['save_kwds'] = save_kwds
+        RoundtripTest.roundtrip(self, np.save, *args, **kwargs_copy)
         assert_equal(self.arr[0], self.arr_reloaded)
         assert_equal(self.arr[0].dtype, self.arr_reloaded.dtype)
         assert_equal(self.arr[0].flags.fnc, self.arr_reloaded.flags.fnc)
@@ -187,7 +194,11 @@ class TestSaveLoad(RoundtripTest):

 class TestSavezLoad(RoundtripTest):
     def roundtrip(self, *args, **kwargs):
-        RoundtripTest.roundtrip(self, np.savez, *args, **kwargs)
+        load_kwds = kwargs.get('load_kwds', {})
+        load_kwds['allow_pickle'] = load_kwds.get('allow_pickle', True)
+        kwargs_copy = kwargs
+        kwargs_copy['load_kwds'] = load_kwds
+        RoundtripTest.roundtrip(self, np.savez, *args, **kwargs_copy)
         try:
             for n, arr in enumerate(self.arr):
                 reloaded = self.arr_reloaded['arr_%d' % n]

The reason TestSavezLoad are only setting load_kwds, but not save_kwds was because I did not go as far locally, and did not modify the defaults to _savez function in npyio.py.

@ivanov
Copy link
Copy Markdown
Contributor Author

ivanov commented Feb 11, 2019

I've been hoping to figure this out on my own, but now I've spent so much time trying to do that that it makes sense to just ask instead of wading through the git log to infer the process.

Should I make a doc/release/1.16.2-notes.rst to document this change, or does this go into doc/changelog/1.17.0-changelog.rst and then that will get included in a backport?

Here's what the FutureWarning looks like at the moment:

$ python -c 'import numpy as np; np.savez("hi.npz", [{"yo": "sup"}])'

/home/pi/code/numpy/numpy/lib/npyio.py:746: FutureWarning: Object arrays will not be saved by default in the future because allow_pickle will default to False. You should add allow_pickle=True explicitly to eliminate this warning.

@seberg
Copy link
Copy Markdown
Member

seberg commented Feb 11, 2019

@ivanov just add the 1.17.0 release notes for now, I think. I believe we tended to be on the side of: there is no reason to disallow saving, I very much agree with Erics argument that it is hell if a long running script dies at the end because of such a silly thing. I would be OK with a general warning, but I am not sure it is actually useful.

@oleksandr-pavlyk
Copy link
Copy Markdown
Contributor

@ivanov This code does not compile with Python 2.7 as def _savez_dispatcher(file, *args, allow_pickle=None, **kwds): is invalid syntax, so porting this change back to 1.16.2 would require changes.

@seberg The scipy 1.2.1 contains a test, scipy.sparse.tests.test_matrix_io::test_malicious_load, that expects save to not execute __reduce__.

@seberg
Copy link
Copy Markdown
Member

seberg commented Feb 11, 2019

@oleksandr-pavlyk thanks for the pointer, but it seems to me that it does not matter here? Since that already sers allow_pickle=False explicitly on new enough numpy versions, so it already is clean with respect to any change we are doing. Also good point about the backporting!

@ivanov I think the messages could probably be simplified a bit (but have to think about it myself first). We will definitely need tests for the deprecation. If in doubt you can put them into the test_deprecations.py file but together with the normal tests is just as well.

@oleksandr-pavlyk
Copy link
Copy Markdown
Contributor

@seberg Yes, indeed. I misread the test. I assumed it was expecting savez to raise the ValueError.

Copy link
Copy Markdown
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the Deprecation for saving, although we can discuss it before if you like. We can also discuss if we should give a UserWarning on writing a pickle.
I still think we can probably make the text a bit more to the point (imagine someone coding for 2 weeks being frustrated their data does not load).

Oleksandrs comment on Backporting/Py 2.7 compatibility is very important of course (since I guess we want to do that).

@stefanv
Copy link
Copy Markdown
Contributor

stefanv commented Feb 12, 2019

What is NumPy's policy on pushing to other developers' branches? In scikit-image, we often fix nitpicks on behalf of other authors, to save frustration.

@seberg
Copy link
Copy Markdown
Member

seberg commented Feb 12, 2019

@stefanv, I think we (at least me) just do not have much experience with it, or a policy in numpy. Probably something we should pick up, since I guess it is much less frustrating and safe an iteration or two. I guess conflicts are rare enough. You would only use it for nits really or also for very clear fixups?

@stefanv
Copy link
Copy Markdown
Contributor

stefanv commented Feb 12, 2019 via email

@charris
Copy link
Copy Markdown
Member

charris commented Feb 12, 2019

@stefanv We allow editing online to fix nits, it is encouraged to get otherwise good PRs merged. Although sometimes having the user jump through the hoops is educational :) Are you suggesting pushing directly to the user's branch?

@stefanv
Copy link
Copy Markdown
Contributor

stefanv commented Feb 12, 2019 via email

@charris charris changed the title BUG: allow_pickle=False by default API: allow_pickle=False by default Feb 13, 2019
@charris
Copy link
Copy Markdown
Member

charris commented Feb 13, 2019

"API" looks like a good marker, I'll add it.

@charris charris added 30 - API and removed 00 - Bug labels Feb 13, 2019
@rgommers
Copy link
Copy Markdown
Member

Should I make a doc/release/1.16.2-notes.rst to document this change, or does this go into doc/changelog/1.17.0-changelog.rst and then that will get included in a backport?

It doesn't look to me like an API change like this should be backported to 1.16.2. A bigger warning for this was already added to the docs, so for the rest the regular policy should apply: warning in 1.17.0, change of behavior in 1.19.0

@charris
Copy link
Copy Markdown
Member

charris commented Apr 16, 2019

How about just disable the default load with this PR, which handles the CVE and backports easily, and make the warning/deprecation another PR.

@shoyer
Copy link
Copy Markdown
Member

shoyer commented Apr 16, 2019

How about just disable the default load with this PR, which handles the CVE and backports easily, and make the warning/deprecation another PR.

Sounds good to me!

@stefanv
Copy link
Copy Markdown
Contributor

stefanv commented Apr 16, 2019

+1

@mattip mattip force-pushed the no-pickles-for-you branch from de48e7d to 1c9e68a Compare April 16, 2019 19:41
@mattip mattip changed the title API: allow_pickle=False by default BUG: allow_pickle=False in loading by default Apr 16, 2019
@mattip
Copy link
Copy Markdown
Member

mattip commented Apr 16, 2019

split into two PRs. This one is now only about loading, and changes the default for allow_pickle to False

@charris charris changed the title BUG: allow_pickle=False in loading by default BUG: Make allow_pickle=False the default for loading Apr 16, 2019
@charris
Copy link
Copy Markdown
Member

charris commented Apr 16, 2019

I think there is consensus around this now, but will wait. I don't think we need to test the default, anyone want to make an argument for that?

@rgommers
Copy link
Copy Markdown
Member

I don't think we need to test the default,

agreed

@eric-wieser
Copy link
Copy Markdown
Member

Looks good to me once the ..versionchanged is added that I mention above

@charris charris force-pushed the no-pickles-for-you branch 2 times, most recently from 14edbb1 to 7bf5d88 Compare April 17, 2019 04:11
@charris charris force-pushed the no-pickles-for-you branch from 7bf5d88 to 2f00573 Compare April 17, 2019 04:29
@charris charris force-pushed the no-pickles-for-you branch from 2f00573 to a4df7e5 Compare April 17, 2019 04:50
@charris charris merged commit 8f31f95 into numpy:master Apr 17, 2019
@stefanv
Copy link
Copy Markdown
Contributor

stefanv commented Apr 17, 2019

👏

@rgommers
Copy link
Copy Markdown
Member

woohoo! looking forward to 1.16.3:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.