Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): fix pickling of Basic to use __getnewargs__ #21260

Merged
merged 4 commits into from
Apr 23, 2021

Conversation

oscarbenjamin
Copy link
Collaborator

This ensures that pickling of Basic subclasses is based on
__getnewargs__ with the exception of Symbol that uses __getnewargs_ex__
for keyword argument support. The changes here fix pickling using
protocol 2 or higher but remove support for pickling using protocol 0
or 1. Protocol 2 was introduced in Python 2.3.

A number of classes needed to be rearranged or tidied up to accommodate for
this including Singleton and some of the Singleton class such as Reals.

References to other Issues or PRs

Fixes #21121

Brief description of what is fixed or changed

Other comments

Release Notes

  • core
    • Fixed pickling for a number of Basic subclasses (such as a Symbol with assumptions) for protocol 2 or higher. Removed support for pickling with protocol 0 or 1.

This ensures that pickling of Basic subclasses is based on
__getnewargs__ with the exception of Symbol that uses __getnewargs_ex__
for keyword argument support. The changes here fix pickling using
protocol 2 or higher but remove support for pickling using protocol 0
or 1. Protocol 2 was introduced in Python 2.3.

A number of classes needed to rearranged or tidied up to accommodate for
this including Singleton and some of the Singleton class such as Reals.
@sympy-bot
Copy link

sympy-bot commented Apr 7, 2021

Hi, I am the SymPy bot (v161). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • core
    • Fixed pickling for a number of Basic subclasses (such as a Symbol with assumptions) for protocol 2 or higher. Removed support for pickling with protocol 0 or 1. (#21260 by @oscarbenjamin)

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.9.

Click here to see the pull request description that was parsed.
This ensures that pickling of Basic subclasses is based on
`__getnewargs__` with the exception of Symbol that uses `__getnewargs_ex__`
for keyword argument support. The changes here fix pickling using
protocol 2 or higher but remove support for pickling using protocol 0
or 1. Protocol 2 was introduced in Python 2.3.

A number of classes needed to be rearranged or tidied up to accommodate for
this including Singleton and some of the Singleton class such as Reals.

<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234" (see
https://tinyurl.com/auto-closing for more information). Also, please
write a comment on that issue linking back to this pull request once it is
open. -->

Fixes #21121 

#### Brief description of what is fixed or changed


#### Other comments


#### Release Notes

<!-- Write the release notes for this release below between the BEGIN and END
statements. The basic format is a bulleted list with the name of the subpackage
and the release note for this PR. For example:

* solvers
  * Added a new solver for logarithmic equations.

* functions
  * Fixed a bug with log of integers.

or if no release note(s) should be included use:

NO ENTRY

See https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more
information on how to write release notes. The bot will check your release
notes automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
* core
    * Fixed pickling for a number of Basic subclasses (such as a Symbol with assumptions) for protocol 2 or higher. Removed support for pickling with protocol 0 or 1.
<!-- END RELEASE NOTES -->

Update

The release notes on the wiki have been updated.

@asmeurer
Copy link
Member

asmeurer commented Apr 7, 2021

Does it give an error if you try to pickle with protocol 1 (or is that even possible)?

@asmeurer
Copy link
Member

asmeurer commented Apr 7, 2021

Is the bug from the issue tested here? All I see is test_core_symbol in test_pickling.py, but that (presumably) already passes.

@asmeurer
Copy link
Member

asmeurer commented Apr 7, 2021

Thanks for working on this. We need to make a more concerted effort to making pickling work better with SymPy. The current state is that either things don't pickle at all, or you get bizzare bugs like #21121 that lead to broken behavior, often in very subtle ways. Pickle not working also means that you can't use multiprocessing with SymPy, so it's hard to speed things up by parallelizing them.

Incidentally, pickling is one place where Hypothesis testing would be very natural and help to catch bugs in corner cases (#20914).

@oscarbenjamin
Copy link
Collaborator Author

Does it give an error if you try to pickle with protocol 1 (or is that even possible)?

If you try to pickle with protocol 1 then it will work sometimes but not always. There isn't a good way to support protocol 0 and 1 at the same time as the others. For protocol 1 or 0 you need to define __getstate__ to handle any keyword arguments which doesn't work very well (hence the issue that this fixes). To support protocol 2 or higher the best way is to use __getnewargs_ex__ for anything that needs keyword arguments but to make that work completely you need to make sure that __getstate__ returns None which breaks protocol 0 or 1. The way to support all protocols is using __reduce_ex__ but that is not recommended. I think it is better not to try and support the really old protocols.

@oscarbenjamin
Copy link
Collaborator Author

Is the bug from the issue tested here? All I see is test_core_symbol in test_pickling.py, but that (presumably) already passes.

I'm not sure how exactly to test this. It needs to be a test that invokes a subprocess because in-process cacheing can hide the bug.

@oscarbenjamin
Copy link
Collaborator Author

Incidentally, pickling is one place where Hypothesis testing would be very natural and help to catch bugs in corner cases (#20914).

As I mentioned somewhere else I'm not sure that hypothesis is useful in the places where we already know what the problems are :)

Maybe if this is merged and fixes the basic problems so that we think things are good then hypothesis testing to discover unforeseen bugs would be useful. For now it wouldn't be hard to add pickling etc to test_args and I expect that a whole bunch of failures would jump right out.

@asmeurer
Copy link
Member

asmeurer commented Apr 7, 2021

If you try to pickle with protocol 1 then it will work sometimes but not always. There isn't a good way to support protocol 0 and 1 at the same time as the others. For protocol 1 or 0 you need to define getstate to handle any keyword arguments which doesn't work very well (hence the issue that this fixes). To support protocol 2 or higher the best way is to use getnewargs_ex for anything that needs keyword arguments but to make that work completely you need to make sure that getstate returns None which breaks protocol 0 or 1. The way to support all protocols is using reduce_ex but that is not recommended. I think it is better not to try and support the really old protocols.

That's the impression I got. But my question was if there is a way to make it error on protocol 1, so that people don't try to use it. But maybe it's not actually that important, since likely very few people are actually using it. It's not the default, and it it's only really required to use if you want to use a version of Python that hasn't been supported in a very long time.

I'm not sure how exactly to test this. It needs to be a test that invokes a subprocess because in-process cacheing can hide the bug.

So we should extend the pickling test to do this.

As I mentioned somewhere else I'm not sure that hypothesis is useful in the places where we already know what the problems are :)

I'm not at all convinced we know what the problems are for pickling. I've seen very bizarre behavior trying to use SymPy/mpmath with multiprocessing, which I know came from pickling, but I have no idea what was going on beyond that. Things like numerical results coming out completely wrong. It's possible this PR would have fixed the issues I had, or that there are tons of more similar latent problems (I highly expect the latter). So the very basic question of, "does this PR fix all the existing issues with pickling Basic instances?" is unanswerable, especially if the answer is purported to be in the positive, because the existing tests aren't very convincing in terms of coverage. Hypothesis would enable you to expand coverage (not line coverage, but search space coverage), to be more confident that a fix does in fact work. Actually pickling is one of those things where things could still be broken anyway, because it's easy to have bugs that can only be reproduced by creating some specific internal state, but even then, it would be much better than what we have now, which is already written like a property test, but only tests against a very small specific set of examples.

@asmeurer
Copy link
Member

asmeurer commented Apr 7, 2021

Protocols 2 and 3 aren't really necessary to support either https://docs.python.org/3/library/pickle.html#data-stream-format. Unless there's some other reason someone might use an older protocol that I'm not aware of.

@oscarbenjamin
Copy link
Collaborator Author

I'm not at all convinced we know what the problems are for pickling.

I agree but I think that right now we know what the showstoppers are. Once those are out of the way if we're not sure what problems there are then hypothesis can be a really good way to find them. At the moment I expect a hypothesis test would stumble after immediately finding the bugs that we already know (that I'm trying to fix here).

@oscarbenjamin
Copy link
Collaborator Author

my question was if there is a way to make it error on protocol 1, so that people don't try to use it

I guess we could define __reduce_ex__ and have it raise for protocol 0 or 1:
https://docs.python.org/3/library/pickle.html#object.__reduce_ex__

I just pushed that as a change.

@oscarbenjamin
Copy link
Collaborator Author

I've added an explicit error for protocols 0 and 1.

This is ready for review/merge.

@oscarbenjamin oscarbenjamin merged commit be3c7b5 into sympy:master Apr 23, 2021
@oscarbenjamin oscarbenjamin deleted the pr_pickling_assumptions branch April 23, 2021 20:41
bocklund added a commit to bocklund/pycalphad that referenced this pull request Sep 22, 2021
SymPy Symbol subclasses now use __getnewargs_ex__, which broke our subclasses.
See sympy/sympy#21260.

Since Python 3.6, __getnewargs_ex__ has been used by Pickle protocol 2, which
was introduced in Python 2.3. __getnewargs_ex__ takes precedence over
__getnewargs__ if it is defined.
See https://docs.python.org/3/library/pickle.html#object.__getnewargs_ex__.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Same symbols created in different processes are not resolved as being equal
3 participants