add gotchas note about Python bool vs SymPy Boolean behavior #13427

Open
Wrzlprmft opened this issue Oct 10, 2017 · 19 comments

Comments

@Wrzlprmft

Wrzlprmft commented Oct 10, 2017

Right now, adding or multiplying Python booleans with SymPy arrays raises a TypeError (BooleanAtom not allowed in this context). Please change this such that booleans behave like the respective integers (0 or 1):

from sympy.abc import x
assert True*x  == x
assert False*x == 0
assert True+x  == 1+x
assert False+x == x

It is specified in PEP 285 (which introduced the bool type) that booleans should just behave like integers except for printing (boldface mine):

This PEP proposes the introduction of a new built-in type, bool, with two constants, False and True. The bool type would be a straightforward subtype (in C) of the int type, and the values False and True would behave like 0 and 1 in most respects (for example, False==0 and True==1 would be true) except repr() and str(). […]

Also see Python’s data model (boldface mine):

There are two types of integers:

Integers (int)

[…]

Booleans (bool)

These represent the truth values False and True. The two objects representing the values False and True are the only Boolean objects. The Boolean type is a subtype of the integer type, and Boolean values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, the strings "False" or "True" are returned, respectively.
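The behaviour both quotes describe can be checked in plain Python, without SymPy. A minimal sketch (the variable name `total` is only for illustration):

```python
# bool is a subclass of int, so True and False take part in integer arithmetic.
assert issubclass(bool, int)
assert isinstance(True, int)
assert True == 1 and False == 0

total = True + True
assert total == 2
assert type(total) is int      # arithmetic on booleans yields a plain int
assert 3 * False == 0

# The exception named in both quotes: conversion to a string.
assert str(True) == "True" and repr(False) == "False"
assert str(int(True)) == "1"
```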

I see no reason why SymPy should not adhere to this.

Some practical problems arising due to this:

  • The user needs to make a pointless conversion.

  • The resulting error may be difficult to find, in particular if some other function returns booleans only sometimes and integers otherwise.

  • It leads to inconsistencies:

     
     1*True*x  # no error
     True*1*x  # no error
     x*True*1  #    error
     x*1*True  #    error
     True*x    #    error
  • Many other Python modules adhere to this standard and thus combining them with SymPy can cause problems. For example, if you have a NumPy array of integers that can only be 1 or 0, it may be wise to make this a boolean array (instead of a regular integer array). The same goes for NumPy arrays resulting from piecewise logical operations. Elements of these arrays cannot be directly used together with SymPy.
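The order dependence in the third bullet follows from Python evaluating `a*b*c` as `(a*b)*c`. The sketch below reproduces the pattern with a toy class; `ToyExpr` and `works` are hypothetical names for illustration, and this is not how SymPy is implemented. It only assumes an object that accepts integers and rejects Python booleans:

```python
class ToyExpr:
    """Hypothetical stand-in for a symbolic expression; not SymPy code."""

    def __init__(self, label):
        self.label = label

    def _combine(self, other):
        # Assumption for illustration: refuse Python booleans, accept integers.
        if isinstance(other, bool):
            raise TypeError("bool not allowed in this context")
        if isinstance(other, int):
            return ToyExpr(f"{other}*{self.label}")
        return NotImplemented

    __mul__ = _combine    # handles ToyExpr * other
    __rmul__ = _combine   # handles other * ToyExpr


def works(thunk):
    """Return True if calling thunk() does not raise TypeError."""
    try:
        thunk()
        return True
    except TypeError:
        return False


x = ToyExpr("x")

# (1*True) is computed first and is the plain int 1, so x never sees a bool.
assert works(lambda: 1 * True * x)
assert works(lambda: True * 1 * x)

# Here the bool meets the expression directly.
assert not works(lambda: x * True * 1)   # (x*True) fails first
assert not works(lambda: x * 1 * True)   # (x*1) succeeds, then (...)*True fails
assert not works(lambda: True * x)       # bool defers to x.__rmul__, which refuses
```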

@smichr
Member

smichr commented Oct 10, 2017

What should [1,2,3]*False give? [], [0, 0, 0] or 0?

@Wrzlprmft
Author

Wrzlprmft commented Oct 10, 2017

What should [1,2,3]*False give? [], [0, 0, 0] or 0?

That already works and has nothing to do with SymPy (or I fail to see what the connection should be), as it only uses plain Python.
It yields [] which is the same result as for [1,2,3]*0 (because that’s how the multiplication of iterables and integers works in Python).
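A quick check of the plain-Python behaviour described here (the name `items` is illustrative):

```python
items = [1, 2, 3]

# Sequence repetition: a repeat count of zero, or False, gives an empty list.
assert items * False == [] == items * 0

# A repeat count of one, or True, gives an equal copy of the list.
assert items * True == [1, 2, 3] == items * 1
```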

@Wrzlprmft Wrzlprmft changed the title from "Allow simple arithmetic operations with Python booleans" to "Python booleans don’t behave like integers together with expressions" Oct 10, 2017
@smichr
Member

smichr commented Oct 10, 2017

Sorry for the confusion. I meant the list to be illustrative, e.g. Tuple(1, 2)*False -- should it give (0, 0) or ()? Or (x<1)*False. It just seems that explicit is better than implicit here:

Piecewise((expr, Eq(b, True)), (other, Eq(b, False)), (expr*b, True))

@Wrzlprmft
Author

Wrzlprmft commented Oct 11, 2017

Tuple(1, 2)*False -- should it give (0, 0) or ()? Or (x<1)*False.

Well, the same question can be asked for Tuple(1,2)*0, but that design decision was made long ago. My point is that Tuple(1,2)*False should return the same as Tuple(1,2)*0 (which it does). Python booleans are just a special case of integers, and in fact they behave the same with the exception of printing (see the quotes I added to the initial comment).

SymPy breaks this for no reason. I listed some reasons why this is bad in the initial comment.

Yes, using integers explicitly is better in many cases, but that doesn’t mean that using booleans should not work.

@smichr
Member

smichr commented Oct 12, 2017

I agree. I'm not sure of the ramifications in terms of the code base, however.

@smichr
Member

smichr commented Oct 12, 2017

cf #8571

@asmeurer
Member

One of the reasons we created a separate boolean type for SymPy is because the Python booleans behave like integers, which is incorrect for our mathematical needs. For me, putting True or False in an expression (non-boolean) context is a garbage in-garbage out scenario.

@asmeurer
Member

See also the docstring of BooleanTrue.

@asmeurer
Member

Here's some history for anyone interested (I'm pretty sure I'm the "happy elephant" on Google Code) #2560

@Wrzlprmft
Author

One of the reasons we created a separate boolean type for SymPy is because the Python booleans behave like integers, which is incorrect for our mathematical needs. For me, putting True or False in an expression (non-boolean) context is a garbage in-garbage out scenario.

Okay, Python booleans and SymPy booleans work differently, I accept that. But if there is already inconsistency here anyway (e.g., ~sympy.true != ~True), what would be the problem with SymPy treating Python booleans like Python integers in all respects? That’s what I am suggesting here; I wasn’t even aware of the existence of SymPy booleans when I made this request. (Note that the above is not a rhetorical question; as I have very little knowledge of the purpose of SymPy booleans, I can very well imagine that this could pose an actual problem.)

@smichr
Member

smichr commented Oct 13, 2017

One of the idioms that we use is to convert python values to sympy values. sympify(True) -> S.true, the sympy boolean. So if you write foo*True where foo is a sympy object there will be a request to sympify to convert True. But there is no context to inform the conversion process as to whether True should be converted to 1 or whether it should be converted to S.true. Well, actually there is a context: the __foo__ routine that is recognizing that a conversion needs to be made, e.g. __mul__. There are 133 of those in 64 files. Maintaining this for all methods does not sound like a good use of resources when "explicit is better than implicit" and a simple int(True) or int(False) (or just 1*True or 1*False) will make the conversion.

As for the inconsistent results in something like 1*True*x vs x*1*True -- yes, that's something we can't do anything about because of the order in which python processes terms. But that's really not the use case that you are advocating for (at least I don't see a situation where someone would rather type True instead of 1). As you said:

For example, if you have a NumPy array of integers that can only be 1 or 0, it may be wise to make this a boolean array (instead of a regular integer array). The same goes for NumPy arrays resulting from piecewise logical operations. Elements of these arrays cannot be directly used together with SymPy.

I imagine that you want to use each value as a selector, e.g. [x*b for x, b in zip(expr, booleans)]. In that case I think the only way forward is to use [x*(1*b) for x, b in zip(expr, booleans)] or [x*int(b) for x, b in zip(expr, booleans)].
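The explicit conversion suggested here could be wrapped up as follows. This is only a sketch in plain Python; `as_plain_int` is a hypothetical helper, not part of SymPy:

```python
def as_plain_int(value):
    """Hypothetical helper: map a Python bool to 0 or 1, leave anything else alone."""
    return int(value) if isinstance(value, bool) else value


flags = [True, False, True]
weights = [as_plain_int(b) for b in flags]

assert weights == [1, 0, 1]
assert all(type(w) is int for w in weights)   # no bool left in the list

# The shorter spelling mentioned above has the same effect on the type.
assert type(1 * True) is int
assert as_plain_int(2.5) == 2.5               # non-bool values pass through unchanged
```

Note that elements taken from a NumPy boolean array are NumPy's own scalar type rather than Python's `bool`, so for those the `isinstance` check above would not trigger and a direct `int(b)` would be the safer spelling.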

@Wrzlprmft
Author

There are 133 of those in 64 files. Maintaining this for all methods does not sound like a good use of resources […]

I am not sufficiently familiar with SymPy’s code to make substantiated statements on this, but it could be as easy as replacing type(x) == int with issubclass(type(x),int).
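To illustrate the difference between the two checks in plain Python (whether SymPy's code contains checks of either kind is not established in this thread; `classify` is a hypothetical helper):

```python
value = True

exact_match = type(value) == int                # exact type comparison
subclass_match = issubclass(type(value), int)   # subclass-aware comparison

assert exact_match is False     # type(True) is bool, not int
assert subclass_match is True   # but bool is a subclass of int
assert isinstance(value, int)   # isinstance is the usual idiom for this


def classify(v):
    """Code that needs to tell bool and int apart must test for bool first."""
    if isinstance(v, bool):
        return "bool"
    if isinstance(v, int):
        return "int"
    return "other"


assert classify(True) == "bool"
assert classify(1) == "int"
assert classify(1.0) == "other"
```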

But that's really not the use case that you are advocating for (at least I don't see a situation where someone would rather type True instead of 1).

Well, it’s of course a simplified example and I do not suggest that either of these is typed. Rather, all elements of 1*True*x could be the result of some function evaluation. Thus, if you happen to change the order of operators or some function only occasionally returns a Python boolean but integers otherwise, you have a difficult-to-track error.

I imagine that you want to use each value as a selector, e.g. [x*b for x, b in zip(expr, booleans)]. In that case I think the only way forward is to use [x*(1*b) for x, b in zip(expr, booleans)] or [x*int(b) for x, b in zip(expr, booleans)].

While I am all in favour of such explicit solutions, and you can always convert if you know that you need to, it can easily happen that such things are buried deeper in an application. Suppose, for example, that you represent the weight matrix of a network as a NumPy matrix A. Now, for many applications, routines translate straightforwardly if you have a binary network and let A be the adjacency matrix. In this case, you (or some intelligent routine) might decide that it’s appropriate to make A a boolean matrix, e.g., to reduce memory usage.

@smichr
Member

smichr commented Oct 13, 2017

could be the result of some function evaluation.

The results of all but a few low-level functions are all SymPy objects so I don't think this will occur unless a user is writing their own functions and returning non-sympified results and combining them with bool results. It might be worth a word in a gotchas section of the tutorial.

@smichr smichr changed the title from "Python booleans don’t behave like integers together with expressions" to "add gotchas note about Python bool vs SymPy Boolean behavior" Oct 13, 2017
@asmeurer
Member

~True is actually the initial motivation, since in Python it gives an integer value that has a boolean value of True (-2). If you have boolean values and want to include them in SymPy expressions interpreted as 0 and 1, your best bet is to first convert them to 0/1.
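The `~True` behaviour can be seen in plain Python. A small sketch follows; the warning filter is there because recent Python versions (3.12 onwards, as far as I know) emit a DeprecationWarning for `~` applied to a bool:

```python
import warnings

flag = True

with warnings.catch_warnings():
    # Silence the deprecation notice newer Pythons emit for ~ on bool.
    warnings.simplefilter("ignore", DeprecationWarning)
    inverted_true = ~flag          # bitwise inversion of the integer 1
    inverted_false = ~(not flag)   # bitwise inversion of the integer 0

assert inverted_true == -2
assert inverted_false == -1

# Both results are nonzero integers, hence truthy; neither is a logical negation.
assert bool(inverted_true) is True
assert bool(inverted_false) is True

# Logical negation in Python is spelled `not`.
assert (not flag) is False
```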

I agree with @smichr that this could be documented better. There are some notes in the BooleanTrue docstring that hint at why it exists, but it would be good to just write down somewhere the motivation for having it not be an integer, even for cases where it doesn't necessarily conflict with boolean operations.

@Wrzlprmft
Author

The results of all but a few low-level functions are all SymPy objects so I don't think this will occur unless a user is writing their own functions and returning non-sympified results and combining them with bool results. It might be worth a word in a gotchas section of the tutorial.

SymPy does not have to factor into this part at all. The function that sometimes returns a boolean in that example could be any Python routine, user-defined or by some other module (e.g., NumPy). Yes, in most cases, this would be non-perfect behaviour, but given Python’s standards on booleans, it’s not really bad either.

@smichr
Member

smichr commented Oct 14, 2017

I think what we are discussing is inherent to the CAS-in-python world. What you are saying makes sense in Python. Sympy has isolated Boolean behavior which makes sense mathematically. This is just one of those areas where a decision has been made not to let Python norms dictate something that doesn't make mathematical sense.

@Wrzlprmft
Author

Sympy has isolated Boolean behavior which makes sense mathematically.

… and that’s all fine and proper. But that doesn’t mean that SymPy has to break with Python standard with respect to Python booleans.

As far as I can see, there are only a few ways to obtain a SymPy boolean anyway:

  • Import it directly.
  • Sympify a Python boolean (and nothing else), "True", or something similar.
  • Use a SymPy function returning a SymPy boolean.
  • Use bitwise operators with a SymPy expression and a Python boolean or integer. (The latter yields the same result as if the integer had been converted to a boolean first.)

None of these breaks with the suggested change.

@Wrzlprmft
Author

Sympy has isolated Boolean behavior which makes sense mathematically.

… and that’s all fine and proper. But that doesn’t mean that SymPy has to break with Python standard with respect to Python booleans.

As far as I can see, there are only a few ways to obtain a SymPy boolean anyway:

  • Import it directly.
  • Sympify a Python boolean (and nothing else), "True", or something similar.
  • Use a SymPy function returning a SymPy boolean.
  • Use bitwise operators with a SymPy expression and a Python boolean or integer. (The latter yields the same result as if the integer had been converted to a boolean first.)

None of these is affected by, or interacts with, the suggested change, as far as I can see.

@asmeurer
Member

But the problem is that we want the ability to use True and False in SymPy boolean contexts. So sympify() currently converts those to the corresponding SymPy types. This is more important than having them work in integer contexts, since that's much rarer. You could try to be context sensitive, but it adds complication to the code, and you can't always guess correctly in every case.
