Skip to content

Improve Tuple/Sequence/Iterable overlapping #5315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 22, 2018

Conversation

emmatyping
Copy link
Member

Fixes #4466

@ilevkivskyi hopefully this is similar to what you were expecting?

@ilevkivskyi ilevkivskyi self-requested a review July 4, 2018 12:04
@ilevkivskyi ilevkivskyi self-assigned this Jul 4, 2018
Copy link
Collaborator

@Michael0x2a Michael0x2a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm not Ivan, but I'm also bored so here's a code review.

Apologies in advance if the review ended up being incoherent in places -- I'm sort of bouncing between different things atm.

mypy/meet.py Outdated
@@ -192,9 +203,8 @@ def is_overlapping_tuples(t: Type, s: Type, use_promotions: bool) -> Optional[bo
if all(is_overlapping_types(ti, si, use_promotions)
for ti, si in zip(t.items, s.items)):
return True
# TupleType and non-tuples do not overlap
return False
# No tuples are involved here
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think getting rid of the return False introduces a regression here -- we would no longer return 'False' when checking to see if Tuple[A] and Tuple[B] are overlapping, for example.

I think this is probably hard to trigger because mypy doesn't ever really call is_overlapping_types directly/usually runs other checks first, but here's one example program:

from typing import *

class A: pass
class B: pass

possibles: Tuple[int, Tuple[A]]
x: Optional[Tuple[B]]

if x in possibles:
    reveal_type(x)
else:
    reveal_type(x) 

Previously, this used to reveal Union[Tuple[B], None] on both lines -- now, the first reveal_type will incorrectly reveal Tuple[B].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this and added a regression test. Thank you for pointing this out.

mypy/meet.py Outdated
if isinstance(t, TupleType):
if isinstance(s, Instance) and is_subtype(t, s):
if all(is_overlapping_types(ti, si, use_promotions)
for ti, si in zip(t.items, s.args)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this zip quite does what you want -- if somebody were to call is_overlapping_types(Tuple[int, str], Sequence[int]), the zip would be zipping together [int, str] and [int] -- the first list is truncated so we only end up comparing the pair (int, int).

I also think having both the subtype check on line 146 and the overlapping check on lines 147-148 are unnecessary -- I believe is_subtype(t, s) implies that t is overlapping with s.

The converse isn't necessarily true though -- it's possible somebody somebody could pass in two types that ought to overlap despite not being a subtype of one another: for example, is_overlapping_types(Union[A, B], Union[B, C]). So if the former call is accepted, we probably ought to accept is_overlapping_types(Iterable[Union[A, B]], Tuple[Union[B, C]]) as well for symmetry.

That said, I think this is a pretty extreme edge case, and mypy currently calls is_overlapping_types under very limited circumstances anyways. In that case, it maybe might be worth just simplifying the logic here so it just defers to calling is_subtype(...) and doesn't bother recursively checking whatever's inside the containers.

So, maybe just something like if is_subtype(t, s) or is_subtype(s, t): return True.

mypy/meet.py Outdated
if isinstance(s, TupleType):
if isinstance(t, Instance) and is_subtype(s, t):
if all(is_overlapping_types(ti, si, use_promotions)
for ti, si in zip(s.items, t.args)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, ti corresponds to s.items and si corresponds to t.args, which is a little confusing. We should probably swap one of the expressions to keep things consistent.

@ilevkivskyi ilevkivskyi removed their assignment Jul 5, 2018
@ilevkivskyi ilevkivskyi removed their request for review July 5, 2018 09:40
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have few things to add here.

mypy/meet.py Outdated
return False
# No tuples are involved here
# TupleType and non-tuples are handled later
# Otherwise, no tuples are involved
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it still make sense to return None here instead of False here?

mypy/meet.py Outdated
return True

if is_subtype(t, s) or is_subtype(s, t):
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one hand I would like to see this general case somewhere at the top of this function as an obvious case of overlap. However the fact that such an obvious case is still not here makes me think of "Chesterton's fence". Is there any danger of putting this here (like performance costs)?

An alternative may be to put return is_subtype(t, s) or is_subtype(s, t) instead of return False near the end of is_overlapping_tuples.

If you think the performance is still fine, then I would keep this check here (or maybe even higher above), but add few more tests that cover the other (non-tuple) situations not covered by existing if branches.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree performance is something we might need to think about here, since we're basically traversing the types three times instead of once now. I think the subtype cache might help mitigate this, though of course it's hard to say for certain w/o benchmarking.

I suspect the other reason why the function doesn't already do something like this was because it was originally intended to be super-simple/lightweight -- here's a snapshot from 2015 and from 2016, for example. I guess it then just steadily accumulated patches over the years?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't particularly have a great idea of how I could benchmark this, but I am happy to move it to the end if you think that is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't particularly have a great idea of how I could benchmark this

A good benchmark is self checking mypy mypy.

Copy link
Member Author

@emmatyping emmatyping Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self check is a decent benchmark, I never feel like mypy is large enough to see a difference, but then again, I often forget how large mypy is.

Anyway, the selfcheck test seems to perform the same within a few hundredths of a second on master and this branch (without cache in both cases), so I am not concerned about performance regressions.

mypy/meet.py Outdated
return False
# No tuples are involved here
return None
return is_subtype(t, s) or is_subtype(s, t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, the selfcheck test seems to perform the same within a few hundredths of a second on master and this branch (without cache in both cases), so I am not concerned about performance regressions.

Did you measure performance when this check is here or at the top of the is_overlapping_types? If at the top of is_overlapping_types, then I would rather add if is_subtype(t, s) or is_subtype(s, t): return True at the top of is_overlapping_types (after builtins.object check) and leave this function completely "intact" then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was indeed measured from the top. I just added it after the object check.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for a delay with this! Feel free to merge this after you consider one last comment.

@Michael0x2a could you please double-check this PR doesn't cause problems in internal codebases?

mypy/meet.py Outdated
if tup_overlap is not None:
return tup_overlap
if is_overlapping_tuples(t, s, use_promotions):
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is now unnecessary (and the corresponding change below) after you added the check at the top of this function.

@Michael0x2a
Copy link
Collaborator

@ilevkivskyi -- sure! I have to go to a thing, but I can re-run these tests a little later tonight.

@ethanhs -- I'm not sure if your PR will run into this case or not, but I remember running into this edge case while working on my other PR yesterday. Could you try checking to see what your PR does on this test case?

from typing import *

# E.g. maybe this was imported from a non-typed module
Parent: Any

class Child(Parent):
    def some_method(self) -> int:
        if self is None:
            return None
        return 3

I believe the current behavior of mypy is to report no errors when running in strict-optional mode. If your PR complains about the "return None", you may have to swap out the two calls to is_subtype with is_proper_subtype or something.

(The actual code that I synthesized this from does self = super(Child, self).blah() before the if statement, so the above example is a little less stupid than it looks, if only marginally.)

@JelleZijlstra
Copy link
Member

Maybe I'm missing something, but shouldn't that code sample produce an error under strict optional? It returns None but is declared to return int.

@ilevkivskyi
Copy link
Member

@JelleZijlstra mypy infers the branch as unreachable.

@emmatyping
Copy link
Member Author

@Michael0x2a thanks! That was indeed an issue, and is_proper_subtype fixed it :)

I also reverted the unneeded changes as suggested by Ivan.

@Michael0x2a Michael0x2a merged commit 3fb16a2 into python:master Jul 22, 2018
@emmatyping emmatyping deleted the tuplesequence branch July 22, 2018 02:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants