Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infer union from conditional definition in if statement #6233

Open
JukkaL opened this issue Jan 21, 2019 · 19 comments
Open

Infer union from conditional definition in if statement #6233

JukkaL opened this issue Jan 21, 2019 · 19 comments
Labels
false-positive mypy gave an error on correct code feature

Comments

@JukkaL
Copy link
Collaborator

JukkaL commented Jan 21, 2019

Maybe if a variable is initialized both in if and else blocks, the resulting type should be the union of the types of the initializers (only when using --allow-redefinitions):

if foo():
    x = 0
    # Type of x is 'int' here
else:
    x = ''  # No error here
    # Type of x is 'str' here
reveal_type(x)  # Union[int, str]

Currently the second assignment generates an error, and the type of x is int everywhere.

This is a generalization of #6232 and related to #1174 (but not a direct follow-up issue).

@JukkaL JukkaL added feature priority-1-normal false-positive mypy gave an error on correct code labels Jan 21, 2019
@bluetech
Copy link
Contributor

This pattern occurs a lot in our code, specifically with None:

if condition:
    x: Optional[int] = None
else:
    x = 5

@ilevkivskyi
Copy link
Member

@bluetech Your example already works without the explicit annotation. None is special-cased in mypy, it will infer an optional type (if None appears in the first definition).

@bluetech
Copy link
Contributor

@ilevkivskyi Thanks for letting me know. I didn't realize the order matters. Looking at our code, we indeed usually put the None case in the else.

Maybe this feature can subsume this special case.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 28, 2019

@bluetech This feature would not entirely obsolete the special case, unless we extend it to handle cases like this:

x = None
if cond():
    x = 5

It's unclear to me whether we should accept code like this:

x = 0
if cond():
    x = ''  # Should this be okay for an arbitrary rvalue type?

@dumblob
Copy link

dumblob commented Jan 7, 2020

x = 0
if cond():
    x = ''  # Should this be okay for an arbitrary rvalue type?

I think it's okay for an arbitrary rvalue if after end of the if cond(): body the first use of x will be as an lvalue (incl. for x in ..., x = , lambda: x = , lambda: for x in , etc.).

@aperiodic
Copy link

aperiodic commented Jan 9, 2020

We've recently started using mypy in some of our Python projects, and we're running into this issue in places where we initialize an optional variable using this pattern:

if cond():
    x = <some_value>
else:
   x = None

The change proposed by this issue would solve it, but extending the special case to cover assigning to None in the else branch also would. Is the proposed change accepted and just waiting for someone to take on the work? If so, we would be interested in doing that. If not, would extending the special case be accepted?

@msullivan
Copy link
Collaborator

I would certainly accept extending the special case. I think this is actually a huge pain point.

The core suggestion in this issue is only being proposed for when --allow-redefinitions is enabled, so I'd definitely want to see at least the None case handled otherwise.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 13, 2020

I agree with @msullivan. In my experience, code like @aperiodic'd example comes up very often, and it would be great if the None case would be supported by default, for symmetry with this case which already works:

if cond():
    x = None
else:
    x = <some_value>  # Already supported

So there are two sub-issues:

  1. Support x = <value> followed by x = None with default flags (perhaps only if x is not read in between).
  2. Support conditional definition with arbitrary types if using --allow-redefinition.

Case (1) seems much more important, but if also supporting (2) is not much more work, it may make sense to implement both at the same time.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 13, 2020

Increased priority to "high" since the None case is one of the biggest type inference pain points in practice.

@aperiodic
Copy link

I think that the general case described here should be supported without the --allow-redifinitions flag. My reasoning is that the different branches of an if statement are not in the same scope, so initializing it in every branch should not be viewed as redefining that variable in every branch beyond the first, because there is no definition in scope in those other branches. For example, this code:

if False:
  x = 1
else:
  x += 1

results in UnboundLocalError: local variable 'x' referenced before assignment.

My proposal is that, when a variable is initialized in all the branches of an if statement that includes an else branch, then its type is inferred to be the union of all the types it is assigned to in all the branches of that if statement, regardless of whether or not mypy was given the --allow-redefinitions flag.

@unordinateur
Copy link

unordinateur commented Jan 15, 2020

I would agree with @aperiodic's suggestion, given, of course, that the variable was not defined before the if. (@JukkaL's example in his 2019-01-28 comment should not work unless --allow-redefinitions is activated.)

There might be some value to implement another flag, something like --disable-union-inference to disable that behavior; and make it an error to access a variable declared in different branches of an if-else with the same name but different types, after the if-else. (If the life time don't overlap, I don't think this should be an error, this is related to #6232 )This flag would also disable the current special case of infering an Optional type for a variable initialized with None and then given an other value later.

@unordinateur
Copy link

unordinateur commented Jan 15, 2020

To summarize, here's what I think should be the behaviour. I see four possible situations:

Case A: No union, no redefinition

if cond():
   x = ''
   print(x)
else:
   x = 0
   print(x)

Case B: Union, no redefinition

if cond():
   x = ''
else:
   x = 0
   
print(x)

Case C: No union, redefinition

x = 0
if cond():
   x = ''
   print(x)
else:
   print(x)

Case D: Union, redefinition

x = 0
if cond():
   x = ''
    
print(x)

By default, cases A and B should be allowed.
With --allow-redefinition, A, B, C and D should be allowed.
With --disable-union-inference, only case A should be allowed.
With both --allow-redefinition and --disable-union-inference, cases A and C would be allowed.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 16, 2020

@unordinateur I like your proposal. The main issue I see with changing the default behavior is that this may generate a lot of apparent false positives for existing code, in particular since some inferred types will be different. For example:

if cond():
    x = [1.0]
else:
    x = [1]
reveal_type(x)

Currently the type of x will be List[float], but if we change the semantics, it would likely be Union[List[float], List[int]], which may be a problem, since List is invariant. I've seen variants of the above example quite often in production code.

Another possible problem case:

if cond():
    x = [1]
else:
    x = []
reveal_type(x)

If we assume that the two assignments initialize two separate variables, mypy might require an annotation for the second assignment, since it can't infer the list item type.

@unordinateur
Copy link

unordinateur commented Jan 16, 2020

I understand! Union inference might be disabled by default then; and be optionally enabled by a flag. (--infer-union?) To keep the current default behavior; None would still have to be treated as a special case that is always allowed as a first assignment notwhistanding the --allow-redefinition and --infer-union flags. This special case could be disabled with the existing --no-implicit-optional flag; or be disabled with a new flag.

With that proposal:

  • By default, cases B, C and D are allowed if the first assignment is None. (See below)
  • By default, all the branches of conditional statements are treated as sharing the same scope. This would allow all the cases, but only if the branches have compatible types. If they don't have compatible types, none of the cases are allowed.
  • With --infer-union, A and B are allowed for arbitrary types. Within their respective branch, each x has a single type, but after the end of the else, the type of x is the union of its type in each branch.
  • With --allow-redefinition, A and C would be allowed for arbitrary types.
  • With both --allow-redefinition and --infer-union, cases A, B, C and D would be allowed for arbitrary types.

The semantics of the special case for None are not exactly clear: It should be compatible with the current rules, and also be as consistent as possible with the various flags. My suggestion is:

  • For cases A and B, if one of the branches is None, then it should act exactly as if --infer-union was enabled. This extends the current behavior, since the current behavior would only be ok with having a None in the first branch.
  • For cases C and D, they only work if the initial assignment is None. This is exactly the current behavior.

So, these cases would be allowed:

if cond():
   x = None
   reveal_type(x)  # None
else:
   x = 0
   reveal_type(x)  # int

reveal_type(x)  # Optional[int]
if cond():
   x = 0
   reveal_type(x)  # int
else:
   x = None
   reveal_type(x)  # None

reveal_type(x)  # Optional[int]
x = None
reveal_type(x)  # Optional[int]
if cond():
   x = 0
   reveal_type(x)  # Optional[int]
reveal_type(x)  # Optional[int]

Whereas this code wouldn't work unless --allow_redefinition and --infer_union would be enabled (I'm writing the output of reveal_type in that case):

x = 0
reveal_type(x)  # int
if cond():
   x = None
   reveal_type(x)  # None

reveal_type(x)  # Optional[int]

Without the final reveal_type, the code in this last case should also work with only --allow_redefinition enabled.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 17, 2020

@unordinateur Some comments:

  1. I'd prefer if we didn't need a new --infer-union flag. Either this might be enabled by default (perhaps we can make it work; see below for an idea), or enabled by --allow-redefinition. Too many mode options make things confusing and make it harder to maintain mypy.

  2. At least the x = <value> followed by x = None case should be enabled by default, since it's very common and it already works if x = None comes first. We could enable this without full union inference if the latter doesn't work out.

  3. I suspect that we need to figure out how to support empty container initialization after non-empty initialization to make union inference practical:

if cond():
    x = [1]
else:
    x = []
reveal_type(x)  # Should continue to be List[int]

One potential way to support this would be to link the two definitions so that the inferred type from the first assignment would be used for the type context of the second assignment. Another potential way would be to infer a partial list type in the second assignment and allow it to be merged with a non-partial list type from the first assignment.

@unordinateur
Copy link

unordinateur commented Jan 18, 2020

1 and 2. Agreed!

Now for 3. I think that the following approach could be taken:

  1. Infer the type of x based on the assignment of the first branch.
  2. Use that type as context for the assignment of all the other branches. If this succeeds for all branches, we use this type for the type of x.
  3. Otherwise, we repeat the process, but using the type inferred from the assignment of the second branch as a candidate for the type of x, and so on, until a suitable type is found, or until all the branches have been analysed without finding a suitable type, in which case mypy produces an error.

As a special case, in step 2, if any of the other branch contains None, we don't consider it a failure, but instead consider the type of x as being the Optional of the tested type.

This should make all your examples work, I believe. It would however fail on my examples above, for example:

if cond():
   x = ''  # str
else:
   x = 0  # int

In this case, the user would have to explicitly tag x in one of the branches as Union[str, int], for it to work.

I think this makes sense. It would be pretty unusual, I think, for the user to voluntarily want a variable to be either of two unrelated types. Having mypy see that as an error by default until the used explicitly requests it makes sense.

To be clear: the behaviour I suggest above is not union inference. It's more like extending the current behaviour to support the cases we identified above as important to support, while preserving the behavior as much as possible for currently valid code.

@gtarabat
Copy link

Is there any update on this issue?

@randolf-scholz
Copy link
Contributor

This would indeed be really nice.

Consider this example: https://mypy-play.net/?mypy=latest&python=3.10&gist=45faed499d453baccdb671b4985facb0

def f(x: bool) -> None:
    a = "string" if x else 123

    if x:
        b = "string"
    else:
        b = 123
        
    match x:
        case True:
            c = "string"
        case False:
            c = 123
        case _:
            raise ValueError
    
    reveal_type(a)  # object
    reveal_type(b)  # str
    reveal_type(c)  # str

The result should be str | int in all 3 cases. I wouldn't want to have this mangled with the allow-redefinition flag, because technically there is no re-definition occurring here.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jun 2, 2023

Lowered priority since both the None cases have been supported for some time now, and the implementation may be complex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
false-positive mypy gave an error on correct code feature
Projects
None yet
Development

No branches or pull requests

9 participants