New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-34572: change _pickle unpickling to use import rather than retrieving from sys.modules #9047

Merged
merged 7 commits into from Feb 18, 2019

Conversation

@tjb900
Copy link
Contributor

tjb900 commented Sep 3, 2018

import.c seems to contain a lot of protection to ensure that a partially-initialised module is not returned from import when in a multi-threaded environment. PyImport_Import enjoys these protections, but PyImport_GetModule does not.

https://bugs.python.org/issue34572

@the-knights-who-say-ni

This comment has been minimized.

Copy link

the-knights-who-say-ni commented Sep 3, 2018

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@tjb900 tjb900 force-pushed the tjb900:pickle_import_race_fix branch from c71f99f to e47c5e1 Sep 5, 2018

@tjb900

This comment has been minimized.

Copy link
Contributor Author

tjb900 commented Jan 7, 2019

Hi!

Just wondering if there's something I can do to move this along - either that or be told that I'm completely wrong. We are using pure-python unpickling as a workaround for now, but I'd love to have some hope that we can switch back at some point in future.

(I notice the same issue appears to be present in @pitrou/pickle5-backport - which I'm very keen to start using!)

Thanks!

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Feb 5, 2019

@tjb900 I realize that this is a thread safety bug which is notoriously hard to test, but I generally find that pull requests that include tests are much easier to review, because it's easier to get a sense of what behavior it fixes and an assurance that there won't be regressions.

return NULL;
}
module = PyImport_Import(module_name);
if (module == NULL)

This comment has been minimized.

@pganssle

pganssle Feb 5, 2019

Contributor

It seems to me that the difference in behavior here occurs when PyImport_Import returns NULL without raising an exception. I'm not really sure why this would occur. Is it safe to simply propagate the NULL up to the next level here?

This comment has been minimized.

@tjb900

tjb900 Feb 5, 2019

Author Contributor

IIRC (a big if), the issue wasn't that PyImport_Import returned NULL, it's that PyImport_GetModule can return a module object where the body of the module is still being executed (in another thread), and this partially initialised thing is then used immediately by the thread doing the unpickling.

This comment has been minimized.

@pganssle

pganssle Feb 5, 2019

Contributor

I'm not sure how this fixes things, though. It seems like the only change in behavior is for the case where PyImport_Import returns NULL without raising an exception. Why would that ever happen, and why was it being explicitly handled here in the first place?

This comment has been minimized.

@tjb900

tjb900 Feb 5, 2019

Author Contributor

In the failing case before, PyImport_Import was never called (at least, not on the thread which causes the error): GetModule returned non-NULL, and this object fell through to the getattribute, which then (sometimes) failed because that attribute did not exist yet.

On L6626 I changed _GetModule -> _Import. This, I believe, invokes the locking logic described here

/* Locking primitives to prevent parallel imports of the same module
which prevents the original issue.

This comment has been minimized.

@pganssle

pganssle Feb 5, 2019

Contributor

My mistake, I didn't realize that you were changing the import function. I'm still concerned as to why this was added in the first place, and whether this is a wider problem.

This comment has been minimized.

@pganssle

pganssle Feb 6, 2019

Contributor

Another nitpick, per PEP 7:

Code structure: one space between keywords like if, for and the following left paren; no spaces inside the paren; braces are required everywhere, even where C permits them to be omitted, but do not add them to code you are not otherwise modifying. All new C code requires braces.

So can you add braces around this?

@tjb900

This comment has been minimized.

Copy link
Contributor Author

tjb900 commented Feb 5, 2019

@pganssle Thanks for checking it out. Test case was attached with the bug report, but I didn't include it in the PR - will look at that ASAP.

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Feb 5, 2019

Looks to me like @ericsnowcurrently added this line when adding PyImport_GetModule in PR #3593, with associated issue [bpo-28411](https://bugs.python.org/issue2841https://bugs.python.org/issue28411).

It was added in several places, so I'm wondering if this thread safety bug is a problem for all the uses of PyImport_GetModule, and if so is this the right place for it? Maybe Eric or @ncoghlan has a bit more information about why this code was added and whether falling back to PyImport_Import is the right fix.

@tjb900

This comment has been minimized.

Copy link
Contributor Author

tjb900 commented Feb 6, 2019

Hmmm. PyImport_GetModule was added in #3593, but it actually looks to me like that logic (i.e. "check in sys.modules, and only import if it isn't there") dates at least back back to

if (module == NULL) {

I'm still trying to identify at what point this became unsafe (have partially initialised modules always been present in sys.modules?). There is talk of changing the way the locking works in #2646, but the locking itself seems much older even than that.

@ncoghlan

This comment has been minimized.

Copy link
Contributor

ncoghlan commented Feb 6, 2019

It looks to me like the cpickle logic has always been unsafe, but prior to the introduction of per-module import locks for Python 3.3 in https://bugs.python.org/issue9260 it was likely much harder to hit.

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Feb 6, 2019

So if I understand this correctly, the current version of the code likely works the way it does for performance reasons, but unfortunately that comes at the cost of thread safety, in which case I think the current patch is likely the right approach (at least for now - maybe PyImport_GetModule needs to be re-designed to take into account partially initialized modules).

If it's possible to make a reliable but not terribly resource-intensive test for this, I think that would be a big help, but that may be a lot to ask.

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Feb 7, 2019

@tjb900 Thanks for adding that test. I've verified that this test consistently fails when run without the other changes in this PR, and it runs quickly whether or not it succeeds, so it's a +1 from me!

One last nitpick, though is that I think it would be better to put it in with the other pickle tests so that people looking to add or change tests will be able to find it more easily (for example there are three or four different test modules where strftime is tested and so I make duplicate tests or miss where things are being tested all the time). That said, if you had a good reason for putting it in a separate file, I'm not doctrinaire about the "one module one test file" thing (though I don't actually know if there's an official policy on this from the CPython core dev team).

slow_import_module = """
import time
time.sleep(%(delay)s)

This comment has been minimized.

@pablogsal

pablogsal Feb 7, 2019

Member

This test relies on sleep to work. I do not feel comfortable with such tests as it may fail or not work as intended almost certainly in the slowest or more exotic buildbots that we have. Check out, for example, the discussion in #10413 .

As an additional note, it also needs the reap_threads decorator.

This comment has been minimized.

@pganssle

pganssle Feb 7, 2019

Contributor

It will make the test a bit more complicated, but I think you can avoid sleep and make this more deterministic by creating a deliberate deadlock like this:

  1. Acquire Lock A in the test
  2. Attempt to acquire lock A in the partially-initialized module
  3. Spawn a thread that attempts to unpickle the object that involves an import
  4. Release lock A
  5. After the partially-initialized module has lock A, it adds the attribute that the pickled object uses.

I think there's still a race condition there that doesn't make it deterministic - namely that the thread which is unpickling the object needs to reach the "access the imported attribute" stage before lock A is released in order for it to be a good test, but I think if you fiddle around with the locks and threads a little bit you can get it to work.

This comment has been minimized.

@pitrou

pitrou Feb 7, 2019

Member

I'll be more radical and say I don't think we need a test at all, if we can't make it simpler.

This comment has been minimized.

@pganssle

pganssle Feb 7, 2019

Contributor

@pitrou I think removing the test because it's not guaranteed to test the thing it's intended to test would be a mistake. This still successfully tests what it's intending to test with high probability, which means that a regression would be very unlikely to ever hit master. Unlike other tests which cause erroneous failures, this causes erroneous successes, so its flakiness is unlikely to cause any erroneously broken buildbots.

If left in as a "flaky" test, I'd probably view it similar to a hypothesis test, which is that it's a probabilistic test that never has false negatives - if the test succeeds you are only partially certain that you have no bugs, but if it fails you know you have a bug.

Elsewhere you mention adding a comment about why PyImport_GetModule was dropped to prevent regressions - if it's worth a comment (which is not guaranteed to work), it's probably worth at least trying to enforce this with a test. Possibly with a comment that says why the test isn't guaranteed to detect regressions.

This comment has been minimized.

@pitrou

pitrou Feb 7, 2019

Member

Well, there are corner cases that we fix without writing tests for, because they are too tricky to reproduce. I think this one is a bit borderline: it's possible to reproduce, but nevertheless it needs a complex machinery.

This comment has been minimized.

@pganssle

pganssle Feb 7, 2019

Contributor

I certainly agree that if it's likely to have a negative impact on the test suite for reasons other than that it's not guaranteed to fail in the case of a regression the test should probably be considered optional.

I do think it's worth trying to get a test for this in the test suite, even if just as an exercising in providing a good example of how to test a race condition.

results = []
forget("slowimport")
self.assertNotIn("slowimport", sys.modules)
importlib.invalidate_caches()

This comment has been minimized.

@pablogsal

pablogsal Feb 7, 2019

Member

Can this affect any of the importlib tests?

This comment has been minimized.

@tjb900

tjb900 Feb 8, 2019

Author Contributor

Any advice on how I can address this? FWIW I pretty much copied this test from test_threaded_import.py.

I haven't tested without that line, so it may not be necessary. I'll see if the test works (i.e. fails without the patch) reliably without it.

This comment has been minimized.

@pitrou

pitrou Feb 8, 2019

Member

Actually, I don't know why it would affect the importlib tests (I'm the one who added that function IIRC).

@@ -6623,13 +6623,9 @@ _pickle_Unpickler_find_class_impl(UnpicklerObject *self,
}
}

module = PyImport_GetModule(module_name);
module = PyImport_Import(module_name);

This comment has been minimized.

@pablogsal

pablogsal Feb 7, 2019

Member

Could you benchmark how slower this is compared with just using PyImport_GetModule?

This comment has been minimized.

@pitrou

pitrou Feb 7, 2019

Member

git master:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)"
100000 loops, best of 5: 2.55 usec per loop

This PR:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)"
50000 loops, best of 5: 4.84 usec per loop

So this is a sizable performance drop, but on a micro-benchmark. I don't think this will be significant in real-world cases. Usually, pickle performance becomes important when large pieces of data are serialized, and by construction you don't get many STACK_GLOBAL opcodes in it.

cc'ing @pierreglaser @ogrisel out of curiosity.

This comment has been minimized.

@pierreglaser

pierreglaser Feb 7, 2019

Contributor

+1. Also, STACK_GLOBAL should tend to progressively vanish in the pickle string, in favor of BINGET as functions/classes get progressively added to the memo (if used).

@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Feb 7, 2019

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@pablogsal

This comment has been minimized.

Copy link
Member

pablogsal commented Feb 7, 2019

One last nitpick, though is that I think it would be better to put it in with the other pickle tests so that people looking to add or change tests will be able to find it more easily (for example there are three or four different test modules where strftime is tested and so I make duplicate tests or miss where things are being tested all the time). That said, if you had a good reason for putting it in a separate file, I'm not doctrinaire about the "one module one test file" thing (though I don't actually know if there's an official policy on this from the CPython core dev team).

I agree with @pganssle in that this test should be placed with the rest of the pickle tests.

@pablogsal pablogsal self-assigned this Feb 7, 2019

slow_import_module = """
import time
time.sleep(%(delay)s)

This comment has been minimized.

@pitrou

pitrou Feb 7, 2019

Member

I'll be more radical and say I don't think we need a test at all, if we can't make it simpler.

@@ -6623,13 +6623,9 @@ _pickle_Unpickler_find_class_impl(UnpicklerObject *self,
}
}

module = PyImport_GetModule(module_name);
module = PyImport_Import(module_name);

This comment has been minimized.

@pitrou

pitrou Feb 7, 2019

Member

git master:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)"
100000 loops, best of 5: 2.55 usec per loop

This PR:

$ ./python -m timeit -s "import pickle, logging; d = pickle.dumps((logging.Logger, logging.Handler, logging.Filter, logging.Formatter))" "pickle.loads(d)"
50000 loops, best of 5: 4.84 usec per loop

So this is a sizable performance drop, but on a micro-benchmark. I don't think this will be significant in real-world cases. Usually, pickle performance becomes important when large pieces of data are serialized, and by construction you don't get many STACK_GLOBAL opcodes in it.

cc'ing @pierreglaser @ogrisel out of curiosity.

Show resolved Hide resolved Modules/_pickle.c
@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Feb 7, 2019

I haven't tried adapting this to actually use pickle, but I can consistently get a failure due to partially initialized modules on the first try with this configuration. It does use time.sleep, but it only needs to do the import once, which may reduce the side effects.

@tjb900

This comment has been minimized.

Copy link
Contributor Author

tjb900 commented Feb 8, 2019

This other case of PyImport_GetModule also appears to be making the same, unsafe, optimisation. Should I fix this too, or leave that for a different PR?

Objects/typeobject.c:

    /* Try to fetch cached copy of copyreg from sys.modules first in an
       attempt to avoid the import overhead. Previously this was implemented
       by storing a reference to the cached module in a static variable, but
@tjb900

This comment has been minimized.

Copy link
Contributor Author

tjb900 commented Feb 8, 2019

I have made the requested changes; please review again.

Happy to backport to 3.6 and 3.7 also.

@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Feb 8, 2019

Thanks for making the requested changes!

@pablogsal: please review the changes made to this pull request.

@pitrou

This comment has been minimized.

Copy link
Member

pitrou commented Feb 8, 2019

This other case of PyImport_GetModule also appears to be making the same, unsafe, optimisation. Should I fix this too, or leave that for a different PR?

Please make that a separate issue on bugs.python.org. We'll have to discuss its performance implications separately.

t1.join()
t2.join()

self.assertEqual(len(results), 2)

This comment has been minimized.

@pitrou

pitrou Feb 8, 2019

Member

Perhaps also assert something about the results?

@@ -1174,6 +1178,63 @@ def test_truncated_data(self):
for p in badpickles:
self.check_unpickling_error(self.truncated_errors, p)

@reap_threads
def test_unpickle_module_race(self):

This comment has been minimized.

@pitrou

pitrou Feb 8, 2019

Member

This is a neat test :-)

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Feb 18, 2019

@pablogsal Any thoughts on the new tests for this one?

@pitrou

pitrou approved these changes Feb 18, 2019

Copy link
Member

pitrou left a comment

LGTM.

@pitrou

This comment has been minimized.

Copy link
Member

pitrou commented Feb 18, 2019

@pablogsal, I think your comments have been addressed. Can you confirm?

@pablogsal

This comment has been minimized.

Copy link
Member

pablogsal commented Feb 18, 2019

This addresses all my concerns!

I like test_unpickle_module_race a lot :)

@pitrou pitrou merged commit 4371c0a into python:master Feb 18, 2019

5 checks passed

Azure Pipelines PR #20190213.2 succeeded
Details
bedevere/issue-number Issue number 34572 found
Details
bedevere/news News entry found in Misc/NEWS.d
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@miss-islington

This comment has been minimized.

Copy link

miss-islington commented Feb 18, 2019

Thanks @tjb900 for the PR, and @pitrou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒🤖

miss-islington added a commit to miss-islington/cpython that referenced this pull request Feb 18, 2019

bpo-34572: change _pickle unpickling to use import rather than retrie…
…ving from sys.modules (pythonGH-9047)

Fix C implementation of pickle.loads to use importlib's locking mechanisms, and thereby avoid using partially-loaded modules.
(cherry picked from commit 4371c0a)

Co-authored-by: tjb900 <ozburgess@gmail.com>
@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Feb 18, 2019

GH-11921 is a backport of this pull request to the 3.7 branch.

pitrou added a commit that referenced this pull request Feb 18, 2019

bpo-34572: change _pickle unpickling to use import rather than retrie…
…ving from sys.modules (GH-9047) (GH-11921)

Fix C implementation of pickle.loads to use importlib's locking mechanisms, and thereby avoid using partially-loaded modules.
(cherry picked from commit 4371c0a)

Co-authored-by: tjb900 <ozburgess@gmail.com>

maggyero added a commit to maggyero/cpython that referenced this pull request Feb 18, 2019

bpo-34572: change _pickle unpickling to use import rather than retrie…
…ving from sys.modules (pythonGH-9047)

Fix C implementation of pickle.loads to use importlib's locking mechanisms, and thereby avoid using partially-loaded modules.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment