Stop TypedDictAnalyzer from leaking synthetic types #13732

Michael0x2a · 2022-09-26T00:56:12Z

Mypy currently crashes when you try:

Creating a class-based TypedDict containing a malformed type hint
Asking it to compute fine-grained dependencies, either via running dmypy or by setting the --cache-fine-grained flag.

Here is the exact sequence of events that leads to this crash:

Since the type annotation is malformed, semanal initially gives the type annotation a type of RawExpressionType.
TypedDictAnalyzer (correctly) determines determines that the type of the malformed type annotation should be treated as just Any in:

mypy/mypy/semanal_typeddict.py

Line 289 in f5ce4ee

analyzed = self.api.anal_type(
TypedDictAnalyzer forgets to modify stmt.type like we normally do after calling self.anal_type in normal classes:

mypy/mypy/semanal.py

Line 3022 in f5ce4ee

analyzed = self.anal_type(s.type, allow_tuple_literal=allow_tuple_literal)
Mypy does use the Any type when constructing the TypeInfo for the TypedDict. This is why mypy will not crash under most conditions: the correct type is being used in most places.
Setting --cache-fine-grained will make mypy perform one final pass against the AST to compute fine-grained dependencies. As a part of this process, it traverses the AssigmentStatement's type field using TypeTriggersVisitor.
TypeTriggersVisitor is not a SyntheticTypeVisitor. So, the visitor trips an assert when we try traversing into the RawExpressionType.

Interestingly, this same crash does not occur for NamedTuples despite the fact that NamedTupleAnalyzer also does not set stmt.type:

mypy/mypy/semanal_namedtuple.py

Line 177 in f5ce4ee

analyzed = self.api.anal_type(

It turns out this is because semanal.py will end up calling the analyze_class_body_common(...) function after NamedTupleAnalyzer runs, but not after TypedDictAnalyzer runs:

mypy/mypy/semanal.py

Line 1510 in f5ce4ee

if is_named_tuple:
mypy/mypy/semanal.py

Line 1479 in f5ce4ee

if is_typeddict:

I'm not sure why this is: ideally, the two analyzers ought to have as similar semantics as possible. But refactoring this felt potentially disruptive, so I went for the narrower route of just patching TypedDictAnalyzer.

ilevkivskyi · 2022-09-27T11:07:02Z

I'm not sure why this is: ideally, the two analyzers ought to have as similar semantics as possible.

We do this for named tuples since they can have methods. Ideally, we should not need to visit the body in normal way for typed dicts, as the semantics of assignment stmt is different. This solution is also OK however (at least in short term).

Fixes python#10007 Mypy currently crashes when you try: 1. Creating a class-based TypedDict containing a malformed type hint 2. Asking it to compute fine-grained dependencies, either via running dmypy or by setting the `--cache-fine-grained` flag. Here is the exact sequence of events that leads to this crash: 1. Since the type annotation is malformed, semanal initially gives the type annotation a type of `RawExpressionType`. 2. TypedDictAnalyzer (correctly) determines determines that the type of the malformed type annotation should be treated as just `Any` in: https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal_typeddict.py#L289 3. TypedDictAnalyzer forgets to modify `stmt.type` like we normally do after calling `self.anal_type` in normal classes: https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal.py#L3022 4. Mypy _does_ use the `Any` type when constructing the TypeInfo for the TypedDict. This is why mypy will not crash under most conditions: the correct type is being used in most places. 5. Setting `--cache-fine-grained` will make mypy perform one final pass against the AST to compute fine-grained dependencies. As a part of this process, it traverses the AssigmentStatement's `type` field using TypeTriggersVisitor. 6. TypeTriggersVisitor is _not_ a SyntheticTypeVisitor. So, the visitor trips an assert when we try traversing into the RawExpressionType. Interestingly, this same crash does not occur for NamedTuples despite the fact that NamedTupleAnalyzer also does not set `stmt.type`: https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal_namedtuple.py#L177 It turns out this is because semanal.py will end up calling the `analyze_class_body_common(...)` function after NamedTupleAnalyzer runs, but _not_ after TypedDictAnalyzer runs: - https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal.py#L1510 - https://github.com/python/mypy/blob/f5ce4ee6ca7e2d8bb1cde8a2b49865f53bbacff5/mypy/semanal.py#L1479 I'm not sure why this is: ideally, the two analyzers ought to have as similar semantics as possible. But refactoring this felt potentially disruptive, so I went for the narrower route of just patching TypedDictAnalyzer.

ilevkivskyi · 2022-10-23T22:12:48Z

Probably we should print only e.g. last 100 lines in the deferral trace. Otherwise it is very noisy. Also FWIW I don't think this PR introduced a new bug, it probably just exposing some existing flaw. Still, it is probably better to fix it before merging, as crashes are always bad.

ilevkivskyi · 2022-10-23T22:21:42Z

Actually, I think the crash may be caused by 0f4e0fb

@Michael0x2a You can try removing that .defer() and try locally. Also when you will find a simple repro, could please try if there is a similar crash on current master if you replace TypedDict with NamedTuple?

Michael0x2a · 2022-10-24T01:08:09Z

@ilevkivskyi -- whoops, didn't see your comment until just now.

I just pushed a commit which tried throwing a has_placeholder(...) in front of the anal_type call, but it looks like that ended up just breaking the recursive typeddict tests instead so it's back to the drawing board for me.

I also included a minimal repro in my commit, in case you want to take a look -- I'm not confident I fully understand when/where we're supposed to be deferring and checking for placeholder types.

(The actual crash in numpy was due to a TypedDict in numpy.random containing a field of type ndarray[Any, dtype[uint64]] -- the dtype[uint64] apparently is a placeholder type on the first pass.)

I also tried commenting out my change and commenting out the defer call from 0f4e0fb, but I still got the crash. I also tried triggering the crash with NamedTuples, but wasn't able to repro interesting enough -- not sure why its behavior would differ from TypedDicts here either.

Michael0x2a · 2022-10-24T02:31:42Z

mypy/semanal_typeddict.py

+                    #
+                    # TODO: Find some way of refactoring and partially unifying
+                    #       these two codepaths?
+                    if not has_placeholder(analyzed):


I think #13516 might have been the root cause of the previous mypy_primer failures -- the previous version of this diff probably just never worked when we enabled recursive types, so it broke once I rebased pass the diff flipping the defaults.

I'm not really sure this is the correct fix though; it feels sort of ad-hoc.

I also need to figure out what's up with the bokeh error from mypy_primer -- I'm having a difficult time repro-ing those new failures locally.

ilevkivskyi · 2022-10-24T11:26:18Z

Although the fix looks like a hack, it may be the best we can have now. Most likely the problem is that logic around placeholders in typeanal.py (in particular visit_placeholder_type(), btw IIRC all tests actually still pass if I remove it) never actually worked. This is also why on every iteration semantic analizer usually analyzes a type "from scratch" (i.e. starting from UnboundType), while your change actually makes it re-analyze an already analyzed type.

The above is just a guess I can look in more details later today or tomorrow.

ilevkivskyi · 2022-10-24T13:24:34Z

Yes, essentially my guess was right. For example in typeanal.py:

def visit_instance(self, t: Instance) -> Type:
    return t

If you make it actually traverse the type args, the test passes. But I bet there are several similar scenarios that will cause a crash. So you still should have the if.

In general, I think the weird state where typeanal.py is, because some people have a mental model that it should be "fixed point" style analyzer (as e.g. semantic analyzer is). While in fact only visit_unbound_type() has well thought out logic covering various corner cases etc. I guess we may need to agree on how typeanal.py should work, and clean it up accordingly. cc @JukkaL

Fixes python#13066 During the semanal phase, mypy opts to ignore and skip processing any malformed or illegal statements inside of a TypedDict class definition, such as method definitions. Skipping semanal analysis on these statements can cause any number of odd downstream problems: the type-checking phase assumes that all semanal-only semantic constructs (e.g. FakeInfo) have been purged by this point, and so can crash at any point once this precondition has been violated. This diff opts to solve this problem by filtering down the list of statements so we keep only the ones we know are legal within a TypedDict definition. The other possible solution to this problem is to modify mypy so we skip checking TypedDict class bodies entirely during type checking and fine-grained deps analysis. Doing this would also let address python#10007 and supersede my other diff python#13732. I decided against doing this for now because: 1. I wasn't sure if this was actually safe, especially in the fine-grained deps phase and for mypyc. 2. I think no matter what, the semanal phase should not leak semanal-only types: relaxing this postcondition would make it harder to reason about mypy. So, we'd probably want to make this change regardless of what we do in the later phases.

Fixes #13066 During the semanal phase, mypy opts to ignore and skip processing any malformed or illegal statements inside of a TypedDict class definition, such as method definitions. Skipping semanal analysis on these statements can cause any number of odd downstream problems: the type-checking phase assumes that all semanal-only semantic constructs (e.g. FakeInfo) have been purged by this point, and so can crash at any point once this precondition has been violated. This diff opts to solve this problem by filtering down the list of statements so we keep only the ones we know are legal within a TypedDict definition. The other possible solution to this problem is to modify mypy so we skip checking TypedDict class bodies entirely during type checking and fine-grained deps analysis. Doing this would also let address #10007 and supersede my other diff #13732. I decided against doing this for now because: 1. I wasn't sure if this was actually safe, especially in the fine-grained deps phase and for mypyc. 2. I think no matter what, the semanal phase should not leak semanal-only types: relaxing this postcondition would make it harder to reason about mypy. So, we'd probably want to make this change regardless of what we do in the later phases.

ilevkivskyi

Just to be clear, I think this is a good solution for now, no need to wait for a long-term plan for typeanal.py.

…thetic-types

ilevkivskyi · 2022-11-14T18:53:36Z

@Michael0x2a It looks like the crash still happens on master, I would propose to merge this (assuming the tests pass after my merge).

ilevkivskyi · 2022-11-14T19:28:49Z

Hm, @Michael0x2a it looks like there is actually a bug in this PR, see mypy_primer result for discord. NotRequired (and Required) are stripped when analyzing type, and then we put the stripped type back as stmt.type. So if there is more than one semantic analysis iteration (some forward ref e.g.) this information will be lost when we will again re-analyze stmt.type.

Could you please take a look?

Michael0x2a · 2022-11-16T02:06:45Z

Yeah, will do later this week.

I was stumped earlier trying to figure out how to repro the issues w/ Bokeh -- hopefully the latest discord error will give me something more useful to work with.

ilevkivskyi · 2022-11-16T10:31:14Z

@Michael0x2a Ah, the thing with Bokeh is not a bug in your PR, it is a bug(?) in --disallow-any-unimported. For example, if you have something like

import bazz # type: ignore

A = list[bazz.Bazz]

x: A
y: A
z: A

mypy will show errors on all three variables, instead of once for the type alias target. Maybe this was intentional, but most likely it was just overlooked. If you want to fix it, it is a simple fix, add self.msg.unimported_type_becomes_any("Type alias target component", res, s) and then transform Anys to special_form (to prevent more errors) in check_and_set_up_type_alias().

…thetic-types

github-actions · 2022-12-03T22:34:16Z

Diff from mypy_primer, showing the effect of this PR on open source code:

discord.py (https://github.com/Rapptz/discord.py)
- discord/components.py:282: error: Incompatible types (expression has type "Union[int, int, int, int, int, int, int, int]", TypedDict item "type" has type "Literal[3, 5, 6, 7, 8]")  [typeddict-item]
+ discord/components.py:281: error: Missing key "options" for TypedDict "SelectMenu"  [typeddict-item]

bokeh (https://github.com/bokeh/bokeh)
+ src/bokeh/document/json.py: note: In class "ColumnDataChanged":
+ src/bokeh/document/json.py:86:5: error: Type of variable becomes "Dict[str, Union[Sequence[Any], ndarray[Any, dtype[Any]], Any, Any]]" due to an unfollowed import  [no-any-unimported]
+ src/bokeh/document/json.py: note: In class "ColumnsStreamed":
+ src/bokeh/document/json.py:93:5: error: Type of variable becomes "Dict[str, Union[Sequence[Any], ndarray[Any, dtype[Any]], Any, Any]]" due to an unfollowed import  [no-any-unimported]

ilevkivskyi · 2024-07-06T19:13:30Z

This is now superseded by #17495

Fixes #10007 Fixes #17477 This fixes the crash as proposed in #13732, but also fixes some inconsistencies in `Any` types exposed by the fix. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>