subtypes: fast path for Union/Union subtype check #14277

huguesb · 2022-12-10T23:22:30Z

Enums are exploded into Union of Literal when narrowed.

Conditional branches on enum values can result in multiple distinct narrowing
of the same enum which are later subject to subtype checks (most notably via
is_same_type, when exiting frame context in the binder). Such checks would
have quadratic complexity: O(N*M) where N and M are the number of entries
in each narrowed enum variable, and led to drastic slowdown if any of the enums
involved has a large number of values.

Implement a linear-time fast path where literals are quickly filtered, with
a fallback to the slow path for more complex values.

In our codebase there is one method with a chain of a dozen if statements
operating on instances of an enum with a hundreds of values. Prior to the
regression it was typechecked in less than 1s. After the regression it takes
over 13min to typecheck. This patch fully fixes the regression for us.

Fixes #13821

A5rocks · 2022-12-11T06:33:19Z

mypy/subtypes.py

@@ -269,6 +270,11 @@ def is_same_type(
    )


+class _SubtypeCheck(Protocol):
+    def __call__(self, left: Type, right: Type, *, subtype_context: SubtypeContext) -> bool:


I'm not too used to mypy's codebase -- is this the convention or is the convention to use mypy's extensions (which include a kwarg type for callables)

Let's use protocols, the extensions are soft-deprecated in my view in favor of callback protocols.

the extensions are soft-deprecated in my view in favor of callback protocols.

Not just in your view — mypy's documentation officially states that they're deprecated :) https://mypy.readthedocs.io/en/stable/additional_features.html#extended-callable-types

huguesb · 2022-12-21T07:57:41Z

Added some numbers about perf impact to hopefully motive review/merge: <1s to >13min due to regression, and back to <1s after fix.

huguesb · 2022-12-21T21:42:47Z

rebased to resolve merge conflict with @JukkaL 's recent changes to _is_subtype in #14325

JukkaL · 2022-12-21T23:23:28Z

Could you try the newly added tool misc/perf_compare.py to compare performance against master? I don't doubt that this PR makes things faster, but it would be great to see if the tool works as expected.

huguesb · 2022-12-22T07:35:01Z

Could you try the newly added tool misc/perf_compare.py to compare performance against master? I don't doubt that this PR makes things faster, but it would be great to see if the tool works as expected.

I did. A few observations:

it's very slow
the timing is very noisy on my system (macOS 12.5.1, 2018-vintage macbook pro): each run takes ~16.7+/-.8s (i.e noise floor is at ~10%)
the final calculation doesn't flag outliers (the first run for my test branch was >1s off because there was another CPU-intensive finishing up at the same time)
the reported impact is +2.6% (adjusted to +1.8% taking the outlier into account) but this is below the noise floor for my system...
have you considered using hyperfine ?

JukkaL · 2022-12-22T17:38:21Z

I ran perf_compare on my Linux desktop and here are the results (HEAD is this PR):

master                    9.654s (0.0%)
HEAD                      9.815s (+1.7%)

This is similar to what you measured and seems real. Can you try to optimize this further -- I bet you'd need to use a faster code path for small unions or something? I can also give it a try, but it might take a while because of holidays.

it's very slow

Yeah, that can't be avoided right now, unfortunately. Compilation is kind of slow, and we need to compile to get realistic results. Also without multiple measurements the noise level is too high.

the timing is very noisy on my system (macOS 12.5.1, 2018-vintage macbook pro): each run takes ~16.7+/-.8s (i.e noise floor is at ~10%)

On my 2019-vintage Linux desktop noise can be ~5%, and we can work around that to some degree by taking 15 measurements. Even if each measurement has 10% noise, the average over many measurements will typically have much less expected noise.

If you don't have access to a desktop computer, a big enough cloud instance might produce more precise results and wouldn't be expensive, since you'd only need it for less than one hour. (Obviously this may not be practical.)

the final calculation doesn't flag outliers

Good point, I should add automatic outlier filtering to the tool.

the reported impact is +2.6% (adjusted to +1.8% taking the outlier into account) but this is below the noise floor for my system...

As I discussed above, the result when outliers are taken out may actually be fairly accurate due to averaging over multiple samples.

have you considered using hyperfine?

I haven't seen it before, but it looks interesting. We could add it as an optional "backend" to the script. (I don't want to require any extra deps by default.)

huguesb · 2022-12-22T21:00:20Z

I ran perf_compare on my Linux desktop and here are the results (HEAD is this PR):
master                    9.654s (0.0%)
HEAD                      9.815s (+1.7%)
This is similar to what you measured and seems real. Can you try to optimize this further -- I bet you'd need to use a faster code path for small unions or something? I can also give it a try, but it might take a while because of holidays.

Probably the biggest thing to do is to decouple the Instance/Union and Union/Union codepath which I combined to reduce duplication. The former is more common by far and the extra set creations and the few extra branches are probably to blame for any observable regression.

I can definitely push an updated version for that. I can also play with further micro-optimizations but that honestly seems a little silly when weighing at slowdown below the noise floor in the common case versus a 7800% speedup in the slow case.

it's very slow

Yeah, that can't be avoided right now, unfortunately. Compilation is kind of slow, and we need to compile to get realistic results. Also without multiple measurements the noise level is too high.

The repeat checkout/compilation cost is a real problem for repeated measurements when iterating through possible approaches though. At the very least you'd want to keep the reference binary around for as long as possible instead of recompiling it every time.

Enums are exploded into Union of Literal when narrowed. Conditional branches on enum values can result in multiple distinct narrowing of the same enum which are later subject to subtype checks (most notably via `is_same_type`, when exiting frame context in the binder). Such checks would have quadratic complexity: `O(N*M)` where `N` and `M` are the number of entries in each narrowed enum variable, and led to drastic slowdown if any of the enums involved has a large number of valuees. Implemement a linear-time fast path where literals are quickly filtered, with a fallback to the slow path for more complex values. In our codebase there is one method with a chain of a dozen if statements operating on instances of an enum with a hundreds of values. Prior to the regression it was typechecked in less than 1s. After the regression it takes over 13min to typecheck. This patch fully fixes the regression for us. Fixes python#13821

huguesb · 2022-12-22T21:25:34Z

Result for the newest commit:

master                    15.925s (0.0%)
gh-13821                  15.973s (+0.3%)

github-actions · 2022-12-22T21:49:19Z

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

JukkaL · 2022-12-28T13:38:45Z

I can also play with further micro-optimizations but that honestly seems a little silly when weighing at slowdown below the noise floor in the common case versus a 7800% speedup in the slow case.

Context: I've spent several days recently reducing the impact of many (mostly) minor performance regressions. The 7800% speedup is amazing, and if we can have the speedup without impacting performance in the average case, it's even better!

A slowdown has an impact even if it's below measurement noise floor. Mypy accumulated maybe 15-20% overall slowdown in 2022 because of many minor performance regressions (see #14358). If we'd have similar yearly slowdowns for 5 years, mypy runtimes would increase by 100%. These small regressions are hard to fix months later, so if I can find a potential regression, my preference is to fix it even before merging the PR.

The repeat checkout/compilation cost is a real problem for repeated measurements when iterating through possible approaches though. At the very least you'd want to keep the reference binary around for as long as possible instead of recompiling it every time.

A good idea. Added an issue to track this: #14359

JukkaL

Thanks for the PR and for persisting with the final tweaks! I love the huge speedup.

huguesb mentioned this pull request Dec 10, 2022

mypy 0.982 run 15 minutes (previous version needs only one minute) #13821

Closed

This comment has been minimized.

Sign in to view

AlexWaygood added the performance label Dec 11, 2022

A5rocks reviewed Dec 11, 2022

View reviewed changes

hauntsaninja added the upnext label Dec 14, 2022

hauntsaninja mentioned this pull request Dec 19, 2022

Release 1.0 planning #13685

Closed

17 tasks

huguesb mentioned this pull request Dec 20, 2022

Optimize subtype checking by avoiding a nested function #14325

Merged

huguesb force-pushed the gh-13821 branch from be9760a to 5519997 Compare December 21, 2022 21:41

huguesb force-pushed the gh-13821 branch from 5519997 to cc5bab6 Compare December 21, 2022 21:43

This comment has been minimized.

Sign in to view

huguesb force-pushed the gh-13821 branch from cc5bab6 to 863da89 Compare December 22, 2022 21:24

JukkaL mentioned this pull request Dec 28, 2022

Reuse compiled baseline in perf_compare #14359

Open

JukkaL approved these changes Dec 28, 2022

View reviewed changes

JukkaL merged commit 1f8621c into python:master Dec 28, 2022

AlexWaygood mentioned this pull request Sep 29, 2024

[red-knot] Add a benchmark involving realistic code that creates large unions astral-sh/ruff#13549

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

subtypes: fast path for Union/Union subtype check #14277

subtypes: fast path for Union/Union subtype check #14277

huguesb commented Dec 10, 2022 •

edited

Loading

This comment has been minimized.

A5rocks Dec 11, 2022

JelleZijlstra Dec 11, 2022

AlexWaygood Dec 11, 2022 •

edited

Loading

huguesb commented Dec 21, 2022

huguesb commented Dec 21, 2022

This comment has been minimized.

JukkaL commented Dec 21, 2022

huguesb commented Dec 22, 2022

JukkaL commented Dec 22, 2022

huguesb commented Dec 22, 2022

huguesb commented Dec 22, 2022

github-actions bot commented Dec 22, 2022

JukkaL commented Dec 28, 2022

JukkaL left a comment

subtypes: fast path for Union/Union subtype check #14277

subtypes: fast path for Union/Union subtype check #14277

Conversation

huguesb commented Dec 10, 2022 • edited Loading

This comment has been minimized.

A5rocks Dec 11, 2022

Choose a reason for hiding this comment

JelleZijlstra Dec 11, 2022

Choose a reason for hiding this comment

AlexWaygood Dec 11, 2022 • edited Loading

Choose a reason for hiding this comment

huguesb commented Dec 21, 2022

huguesb commented Dec 21, 2022

This comment has been minimized.

JukkaL commented Dec 21, 2022

huguesb commented Dec 22, 2022

JukkaL commented Dec 22, 2022

huguesb commented Dec 22, 2022

huguesb commented Dec 22, 2022

github-actions bot commented Dec 22, 2022

JukkaL commented Dec 28, 2022

JukkaL left a comment

Choose a reason for hiding this comment

huguesb commented Dec 10, 2022 •

edited

Loading

AlexWaygood Dec 11, 2022 •

edited

Loading