
Generic inference regression from #5699 #5738

Closed · msullivan opened this issue Oct 4, 2018 · 19 comments · Fixed by #15287
Labels: bug (mypy got something wrong), false-positive (mypy gave an error on correct code), priority-0-high, topic-type-variables

Comments

@msullivan (Collaborator)

Since #5699, mypy now rejects the following program (minimized from a few failures at dropbox):

from typing import TypeVar, Callable, Tuple

T = TypeVar('T')

def f(x: Callable[..., T]) -> T:
    return x()

x: Tuple[str, ...] = f(tuple)

with the error message
Incompatible types in assignment (expression has type "Tuple[_T_co, ...]", variable has type "Tuple[str, ...]")

@ilevkivskyi thoughts?

To be honest I'm not totally sure how we used to get this right, since my understanding was that we don't do the sort of instantiation of all type variables and unification that I would expect to be necessary?

@ilevkivskyi (Member)

OK, so here is the diagnosis: I remember checking that our repos were clean with that PR, but later @JukkaL proposed to only use invariant instances from the outer context, which I didn't check. The point is that tuple is covariant (and Tuple[str, ...] translates to an instance type), so we previously (successfully) used it for inference, but not anymore.

It looks, however, like there is another bug this uncovered: currently it should have inferred <nothing>, or failed with a reasonable error like "Cannot infer type variable", instead of leaking a type variable.

I believe this may be a manifestation of #1317 which is another high priority issue I always wanted to fix.

@ilevkivskyi (Member)

Yes, I am pretty sure this is a duplicate of #1317, here is a similar repro:

[case testCallableArgInference]
from typing import TypeVar, Callable

T = TypeVar('T')

def f(x: Callable[..., T]) -> T:
    return x()

def tpl(x: T) -> T:  # Really any generic thing (class or just function)
    return x

reveal_type(f(tpl))  # Revealed type is 'T`-1' (???)
[out]

I can try looking into this more today and tomorrow, but I can't guarantee it is easy to fix.

ilevkivskyi added the bug, priority-0-high, false-positive, and topic-type-variables labels on Oct 5, 2018
@ilevkivskyi (Member)

Continuing to think about this and #1317, it seems there are actually two aspects here: first, type variables from the argument leak into the inferred type; second, ideally mypy should infer a generic (polymorphic) callable in cases like this.

Fixing the first one could be easy, while the second point may be a bit tricky. Currently mypy gets almost there:

from typing import TypeVar, Callable, List

T = TypeVar('T')
S = TypeVar('S')
U = TypeVar('U')

def dec(f: Callable[[S], T]) -> Callable[[S], List[T]]: ...

def f(x: U) -> U: ...

g = dec(f)

reveal_type(g)  # Revealed type is 'def (U`-1) -> builtins.list[U`-1]'

we just need to make it def [U] ....

@JukkaL what do you think?

@JukkaL (Collaborator) commented Oct 5, 2018

I agree that leaking type variables probably isn't hard to fix, once we figure out what the correct behavior would be.

I can't say how hard it would be to infer generic types. This would need some research.

@gvanrossum (Member)

I'm not sure this needs to be ready for the release even if it technically is a regression -- IIUC the code that used to pass was questionable at best, right?

@ilevkivskyi (Member)

IIUC the code that used to pass was questionable at best, right?

It is not incorrect, though. I will try a bit more to see if there is a simple fix; if I have nothing by the end of the day, I would propose to just not worry about this for now.

@euresti (Contributor) commented Oct 17, 2018

This also affects a common usage of attrs:

import attr
from typing import Sequence

x: Sequence[int] = attr.Factory(list)  # Incompatible types in assignment (expression has type "List[_T]", variable has type "Sequence[int]")

Unsurprisingly, attr.Factory is defined as follows:

def Factory(factory: Callable[[], _T]) -> _T: ...

However

x: List[int] = attr.Factory(list) 

is ok.

@ilevkivskyi (Member) commented Jul 14, 2019

I spent a couple of days thinking about and investigating this and related issues (in total I think we have around a couple dozen issues that are duplicates of or related to either this or #1317). Here are the results of the analysis.

Preliminary note on terminology. Just to avoid confusion about what is meant by "generic type", I want to recall that there are two different things called this way: type constructors and polymorphic types. This is best illustrated with an example using callback protocols:

from typing import Protocol, TypeVar

T = TypeVar('T')
α = TypeVar('α')

class Constructor(Protocol[T]):
    def __call__(self, x: T) -> T: ...

class Poly(Protocol):
    def __call__(self, x: α) -> α: ...

Type constructors expect type arguments while polymorphic types don't (I am not sure if these are the "official" terms, but let's use them). The type parameters of the latter are expected to be inferred when it appears as a callee; for the former they are just Any if missing. Note that not only callable types can be polymorphic: there are also existential types like registry: Dict[Type[α], List[α]], but mypy/PEP 484 don't support them. (I am going to use Greek letters whenever I want to emphasize that a type variable should be inferred.) As a side note, Poly <: Constructor[t], and mypy understands this well.
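For illustration, a quick check of that side note (sketch only; needs_constructor and poly are hypothetical names, and Constructor is the protocol defined just above):

def poly(x: α) -> α:  # a polymorphic callable, shaped like Poly.__call__
    return x

def needs_constructor(c: Constructor[int]) -> None: ...

needs_constructor(poly)  # accepted by mypy: the polymorphic callable satisfies Constructor[int]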

Whole expression type inference. Here I want to briefly describe how an "ideal" (see below for why it isn't actually ideal) type inference scheme/algorithm would look. Imagine we have a nested function call f(g(...), h(...), ...) where the functions may be polymorphic. Collecting all type constraints by matching each formal parameter type against the corresponding actual argument in the whole call "tree" leads to type constraints of three forms:

  • αi <: tj or ti <: αj (upper and lower bounds)
  • αi <: βj ("linear" constraints)
  • αi <: List[βj] ("non-linear" constraints)

I searched the literature for "standard" algorithms for solving these, but didn't find anything satisfying, so here is how I see it.

First we notice that the linear constraints plus lower and upper bound constraints form a directed graph where type variables αi are internal nodes and ti are leaves. So we can "propagate" lower bounds and "back-propagate" upper bounds to all reachable inner nodes.

Then we meet all upper bounds, and join all lower bounds, for each node. If there are both a lower bound and an upper bound for a variable we choose the lower bound (if the lower bound is not a subtype of the upper bound it is an error), otherwise the upper bound, otherwise <nothing>. (FWIW I think I have a proof that if there are only linear constraints this will produce a valid set of solutions; the hint is that if two type variables are connected they either both have a lower bound, or both don't.)

Next, we substitute the inferred values into non-linear constraints, so that some of them can become linear, solve the new linear subset by the constraint propagation and so on until we see no progress. For example if we start from α <: β; β <: int; α <: str; γ <: List[β], then we find that α = object; β = int; γ = List[int].
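A minimal sketch of the linear part of this procedure (illustrative only, not mypy code; it uses toy string type names, a tiny hand-written subtype table, and hypothetical helper names; the non-linear iteration and the lower-vs-upper consistency check are omitted):

from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy subtype facts among a few concrete types.
SUBTYPES = {("int", "float"), ("int", "object"), ("float", "object"), ("str", "object")}

def is_subtype(a: str, b: str) -> bool:
    return a == b or (a, b) in SUBTYPES

def join(a: str, b: str) -> str:
    # Least common supertype in the toy hierarchy.
    return a if is_subtype(b, a) else b if is_subtype(a, b) else "object"

def solve_linear(edges: List[Tuple[str, str]],
                 lower: Dict[str, Set[str]],
                 upper: Dict[str, Set[str]]) -> Dict[str, str]:
    # edges: linear constraints (alpha, beta) meaning alpha <: beta;
    # lower/upper: concrete lower/upper bounds collected per type variable.
    lo = defaultdict(set, {k: set(v) for k, v in lower.items()})
    up = defaultdict(set, {k: set(v) for k, v in upper.items()})
    variables = {v for edge in edges for v in edge} | set(lo) | set(up)
    changed = True
    while changed:  # propagate lower bounds forward, back-propagate upper bounds
        changed = False
        for a, b in edges:
            if not lo[a] <= lo[b]:
                lo[b] |= lo[a]
                changed = True
            if not up[b] <= up[a]:
                up[a] |= up[b]
                changed = True
    solution = {}
    for var in variables:
        if lo[var]:  # prefer the joined lower bound when one exists
            acc = None
            for t in lo[var]:
                acc = t if acc is None else join(acc, t)
            solution[var] = acc
        elif up[var]:  # a real solver would meet the upper bounds
            solution[var] = next(iter(up[var]))
        else:
            solution[var] = "<nothing>"
    return solution

For instance, solve_linear([("α", "β")], lower={"α": {"str"}}, upper={"β": {"int"}}) propagates str into β's lower bounds and int into α's upper bounds, and then picks the lower bound str for both variables.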

The main downside of this inference scheme (and btw other schemes I have found in literature) is that any type error made by a user will result in a cryptic message Cannot infer type variable for function "g" (which is especially annoying because we don't support explicit type application for functions).

Current state of type inference in mypy. Here is a brief summary of the current logic of the "Hanoi Tower" type inference in mypy:

  1. Erase meta-variables (type variables of the caller that are currently being inferred) from the type context.
  2. Use this erased type context against the return type to infer some type variables of the function.
  3. Apply those type variable solutions inferred above that don't violate upper bound or values.
  4. Infer types of arguments in this partial context (basically just accept them).
  5. Infer now the rest of type variables using the "inner" context -- argument types.
  6. Steps 4-5 are repeated twice: first for a subset of arguments, then for the rest, so that we infer types of lambda expressions using a richer context.
  7. Infer types of arguments in the full context and type check them.

Note that this scheme is more limited than full expression inference, mostly because of the type erasure that happens in step 1 for an argument that is itself a function call, while performing step 4 of the caller. Basically we "cut" all linear constraints and are only left with the upper- and lower-bound kind of constraint (the first in the list above).
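To make the role of the two passes concrete, here is a small illustration (my own example, not from the issue; my_map is a hypothetical stand-in for the builtin map) where the lambda can only be checked after the other argument has fixed T:

from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar('T')
S = TypeVar('S')

def my_map(f: Callable[[T], S], xs: Iterable[T]) -> Iterator[S]:
    return (f(x) for x in xs)

# First pass: the list literal [1, 2, 3] fixes T = int.
# Second pass: the lambda is then checked with x already known to be int,
# so x + 1 type checks and S is inferred as int.
ys = my_map(lambda x: x + 1, [1, 2, 3])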

Issues in the current logic and implementation. Unfortunately both the inference scheme itself and its implementation have certain issues. There are four big ones:

  1. Starting the inference from the outer context results in too many false positives. Imagine even a basic situation like this:
    T = TypeVar('T')
    
    def f(xs: List[T]) -> T:
        ...
    xs: List[int]
    x: float = f(xs)
    Following the general scheme would infer T = float and cause Invalid type of argument, expected "List[float]" got "List[int]". We avoid this by having a bunch of ad-hoc hacks (or heuristics) to skip step 2 (and consequently 3) in situations like this.
  2. The logic for sorting arguments between the first and second passes of steps 4-5 looks broken. At least, what I would expect it to do, what it is documented to do, and what it actually does all pairwise disagree.
  3. This scheme can cause "leaking" of type variables if an argument to match against is a polymorphic callable (see e.g. the original example in this issue, there is also a bunch of duplicates).
  4. Generic decorators don't work on generic functions, see for example Generic decorators don't work with generic functions #1317 (one of the oldest high priority issues) and Generic inference regression from #5699 #5738 (comment) just above.

Possible solution plan. I think fixing these issues can mostly be done independently. However, since they are all related, it is better to do this in an organized manner. Here are some ideas that I would like to propose.

Issue 1: This would be solved by the whole expression inference scheme, but as I mentioned, the latter results in bad error messages, and IMO this is a very important argument against it. But I think with just a couple of improvements we can make the existing inference logic almost as powerful as whole expression inference without deteriorating error messages (and maybe even making them better).

First, we need to collect and solve constraints from outer context and from the first pass arguments together (steps 2 and 5). Second, don't apply them (i.e. skip step 3 altogether), instead record the gathered information about bounds in the erased types when erasing meta-variables (so that for a nested function call we can use inferred constraints and the recorded ones together). Infer the types of second pass arguments using this augmented erased context. And finally infer the type argument using both external and full internal context.

Importantly, if there is a type inference error in this last step, then we fall back to a solution without the extra constraints from erased parts of the outer context, and then without the argument types, so that we get an Invalid argument type error whenever possible instead of Can't infer type.

This is best illustrated with an example using the function f defined above. In the call x: float = f(['a', 'b', 'c']) there are no first-pass arguments (a list display translates to a generic constructor call), so we only infer one constraint, T <: float; we now don't apply it, but supply an erased type context List[Erased(lower=None, upper=float)] to the list call. The list call infers T' :> str for its type argument from its first-pass arguments (and there is no second pass). Trying to solve the full set T' <: float; T' :> str fails, so it falls back to T' = str. This returns List[str] to the initial f call, where it fails again, so we now infer T = float for f and get Invalid argument type "List[str]", expected "List[float]", which I think is a nice error for this call.
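Restated as code (the commented error message is the behaviour proposed above, not necessarily what mypy currently emits):

from typing import List, TypeVar

T = TypeVar('T')

def f(xs: List[T]) -> T: ...

x: float = f(['a', 'b', 'c'])  # proposed: Invalid argument type "List[str]", expected "List[float]"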

Issue 2: In some sense this should be "just fixed", i.e. we should clean up the implementation and docs here. Importantly, if we go with the above solution, I think we should bias the selection to only put rock solid arguments in the first pass, i.e. those that work well in empty context, like RefExprs, calls to non-generic functions, and similar things. Everything else should go in the second pass.

Issue 3: The fix for this issue is straightforward -- type variables of polymorphic callables should participate in the inference as intermediate nodes for linear constraint propagation. (Caveat: we should be careful not to confuse type variables bound by an enclosed function with polymorphic variables; they can currently have the same name and id.) This is best illustrated with an example:

from typing import Callable, List, TypeVar

X = TypeVar('X')
T = TypeVar('T')

def foo(x: Callable[[int], X]) -> List[X]:
    ...
def id(x: T) -> T:
    ...
y = foo(id)

Here the inferred type of y should be List[int] (not List[T] as currently). This happens as follows: there is no external context, but there is a first-pass argument, so we just infer constraints for Callable[[T], T] <: Callable[[int], X]; this yields T <: X; int <: T. Propagating the constraint through T we get int <: X and infer X = int.

Issue 4: This issue can be fixed by allowing "instantiation" with fewer variables instead of inferring <nothing> for "unbounded" type variables with linear constraints. Let's use the example from the comment above:

from typing import Callable, List, TypeVar

T = TypeVar('T')
S = TypeVar('S')
U = TypeVar('U')

def dec(f: Callable[[S], T]) -> Callable[[S], List[T]]:
    ...
def f(x: U) -> U:
    ...
g = dec(f)

Here the inferred type of g should be Callable[[α], List[α]]. This happens in a manner similar to the last example: we infer constraints U <: T; U :> S, and propagating through U we get S <: T. Now comes the interesting step: instead of inferring S = T = <nothing> (because there are no bounds), we introduce a new variable S = T = α and get the desired result. There is one important caveat: this procedure can easily generate existential types, for example if dec had a signature like:

def dec(f: Callable[[S], T]) -> Dict[S, T]:
    ...

we would end up inferring Dict[α, α]. So we need a function apply_polymorphic_if_possible() that will check whether the application would generate existential types in the return type, and fall back to applying <nothing> (and thus likely Need type annotation for variable) otherwise.
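Spelled out as code (hypothetical behaviour under this proposal; dec here has the Dict-returning signature just above):

from typing import Callable, Dict, TypeVar

S = TypeVar('S')
T = TypeVar('T')
U = TypeVar('U')

def dec(f: Callable[[S], T]) -> Dict[S, T]: ...
def f(x: U) -> U: ...

# Naive polymorphic application would produce the unrepresentable "Dict[α, α]",
# so the idea is to fall back to <nothing> here and ask for an annotation.
d = dec(f)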

I am not sure what the best order to fix these is; 1 and 2 are better fixed together and in total can take a week to fix, while 3 and 4 are smaller. Any thoughts or comments?

cc @Michael0x2a

@Michael0x2a (Collaborator)

Hmm, I wonder if it might be possible to improve the error handling of the "whole expression inference" algorithm.

Basically, it seems to me that there are only four ways the whole expression inference algorithm can fail to produce a valid mapping of nodes to types:

  1. We end up inferring that some node α has inconsistent bounds. That is, the upper and lower bounds U and L do not satisfy L <: U.
  2. After propagating inferred types, we end up with an inconsistent arrow in the graph. For example, suppose we have as input Set[α] <: List[β] and infer α = int and β = str, causing us to end up with the bad relationship Set[int] <: List[str]
  3. We have some nodes that do not have a concrete lower bound/have a lower bound of <nothing>. For example, we have as input α <: β but no way of determining what α or β are supposed to be.
  4. The input graph contains cycles/is not a DAG.

I think we might be able to handle cases 1 through 3 by making the following changes:

  1. For case 1, if some node α has inconsistent upper and lower bounds, assign it some sort of InconsistentBounds(upper=U, lower=L) object instead of giving up and returning an error. Then, when we try applying the mapping to deduce the argument/return types, we check for the presence of these special objects and construct an appropriate error message.

    So for example, if we have as input int <: α, α <: str, α <: β, we'd return the mapping α = InconsistentBounds(lower=int, upper=str), β = str.

    (We could have also assigned this InconsistentBounds object to β during propagation, but in order to prevent these objects from "spreading" and causing lots of error messages, we could always make sure to use either the upper or lower bounds when propagating.)

  2. For case 2, label each subtyping relationship we pass in and make sure to preserve the labels whenever the algorithm internally transforms the graph. This means when we hit some inconsistency, we can use the labels to figure out exactly where that inconsistency is happening/give us the context we need to make a better error message.

    For example, suppose we have the following:

    T = TypeVar('T')
    S = TypeVar('S')
    
    def f(xs: List[T]) -> T: ...
    def g(y: S) -> Set[S]: ...
    
    a: str = f(g(3))
    

    Then we'd pass in these labeled subtyping relationships as input:

    T      <:-- return type of f -- str
    Set[S] <:----- arg 1 of f ----- List[T]
    int    <:----- arg 1 of g ----- S
    

    After one propagation pass, we end up with the mapping T = str, S = int.

    Then when we try applying this mapping to the graph, we realize the second relationship results in an inconsistency: Set[str] <: List[int] is not true, so we return it alongside our mapping.

    And since the returned relationship is labeled, the caller can use it to reconstruct a more meaningful error message.

  3. For case 3, we use the same fixes you proposed when discussing "Issue 4" (including adding a final check to replace singleton nodes/existential types with <nothing>).

I think the only case for which it's difficult to construct a good error message is case 4 (the graph is not a DAG). But tbh I think such cases are pretty rare, maybe even impossible, so just failing with a bland "Could not infer type" error might be good enough in this case.

Caveats: I haven't really thought about this problem too deeply, so there could be some flaws with the idea that I'm overlooking. And even if this modified algorithm does work, I think it could still be reasonable to go with your proposal -- iterative changes to a known algorithm are generally less risky compared to a complete rewrite, etc etc...

@ilevkivskyi (Member)

@Michael0x2a Thanks for the ideas! Here are a few comments:

  1. After propagating inferred types, we end up with an inconsistent arrow in the graph. For example, suppose we have as input Set[α] <: List[β]

Such a constraint can't get into the list initially; I didn't clarify this, sorry. I think we should "dig through" the type components until we get a bare variable on either side of the constraint. IOW I don't have problems with how constraints are currently found (well, except for inferring constraints against unions, but that is a totally different story).

The input graph contains cycles/is not a DAG.

This is not actually an error. It is totally fine to have cycles, i.e. to have a DG that is not a DAG. Moreover, it is quite typical for invariant containers to produce tiny loops like α <: β; β <: α. We can still (back-)propagate constraints through such a graph (we just have to be careful and use an appropriate graph traversal).
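An example (my own) of how an invariant container produces such a tiny loop during whole-expression inference:

from typing import List, TypeVar

T = TypeVar('T')
S = TypeVar('S')

def f(x: List[T]) -> T: ...
def g() -> List[S]: ...

# Matching the inner call's result against the outer parameter gives
# List[S] <: List[T]; since List is invariant this yields both S <: T and T <: S,
# i.e. a two-node cycle in the constraint graph.
f(g())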


To summarize, I think problems/situations 2-4 in your list are non-issues, and problem 1 is really the root of all evil. I am however not sure I understand how you propose to use your solution to give a better error message. Could you please explain it on the example x: float = f(['a', 'b', 'c'])? Also, another clarification here: whole expression inference happens once for each top-level expression (with errors disabled), then we apply all inferred arguments, then perform type checking.

@Michael0x2a (Collaborator)

Such constraint can't get into the list initially, I didn't clarify this, sorry. I think we should "dig through" the type components until we get a bare variable on either side of constraint. IOW I don't have problems with how constraints are found currently (well except for inferring constraints against unions, but this is a totally different story).

Oh, so given the constraint Set[α] <: List[β], we're actually ok with digging down and passing in α <: β instead?

What happens if we have a constraint that looks like Foo[α, β] <: Bar[γ]?

Could you please explain it on example with x: float = f(['a', 'b', 'c'])?

Sure! Assuming that the type signature of f is f(x: List[T]) -> T, we initially pass in the constraints List[str] <: List[T] and T <: float to the constraint-solving algorithm.

At some point, these constraints get refined down to str <: T and T <: float. I initially assumed this would happen inside of the algorithm, but based on your comments, it seems it's actually the caller's responsibility to simplify this down? But it's not too important; either way is fine.

After performing propagation, we end up with T having inconsistent lower and upper bounds of str and float. So, our algorithm returns the mapping {T: InconsistentBounds(lower=str, upper=float)}.

Once the caller receives this mapping, it realizes there must be a type error of some kind since the mapping contains an InconsistentBounds. So to produce nice error messages, it deconstructs this mapping into {T: str} and {T: float} (one mapping uses the lower bound, the other the upper) and produces two types as we deduce the arg types and return type. If after the application the two types we produce are different, we report an error for that particular arg or return.

In this case, when we try applying these two mappings we get List[str] and List[float] for the argument. These types are different, so we produce an error like Argument 1 to "f" has incompatible type "List[str]"; expected "List[float]".

This error is actually different from the error mypy produces today, which is Incompatible types in assignment (expression has type "str", variable has type "float").

I think both errors are fine tbh, but I don't think it'll be too hard to find a way to continue biasing towards "pushing up" errors to the return type. For example, we could maybe try running this "solve constraints then apply" step twice -- first without including the return type context as a constraint, then if that doesn't return any errors trying again with it.

(Or alternatively, we could apply these two mappings without doing any error handling to produce the callee and caller signatures f(List[str]) -> str and f(List[float]) -> float respectively? Then pass these two signatures to the type checker and let it do its usual type-checking.)
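A toy sketch of the "apply both bounds and compare" idea described above (purely hypothetical; types are plain strings, and InconsistentBounds is the marker object proposed earlier, not an existing mypy class):

from dataclasses import dataclass
from typing import Dict, Tuple, Union

@dataclass
class InconsistentBounds:
    lower: str
    upper: str

Mapping = Dict[str, Union[str, InconsistentBounds]]

def split(mapping: Mapping) -> Tuple[Dict[str, str], Dict[str, str]]:
    # Produce one mapping that uses only lower bounds and one that uses only upper bounds.
    low: Dict[str, str] = {}
    high: Dict[str, str] = {}
    for var, val in mapping.items():
        if isinstance(val, InconsistentBounds):
            low[var], high[var] = val.lower, val.upper
        else:
            low[var] = high[var] = val
    return low, high

low, high = split({"T": InconsistentBounds(lower="str", upper="float")})
# Applying `low` and `high` to the argument type List[T] gives List[str] vs List[float];
# they differ, so the error would be reported on that argument.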

@ilevkivskyi (Member)

Oh, so given the constraint Set[α] <: List[β], we're actually ok with digging down and passing in α <: β instead?

No, this will infer no constraints because this can never be satisfied. But for the cases like List[α] <: Iterable[β] we do infer α <: β using map_instance_to_supertype() (or a similar thing for protocols). You can read about this in constraints.py, as I said there is nothing there I have problems with except inference against unions.

Sure! [...]

Thanks for the explanation. But it looks like this will not work well; I should have been more explicit here again. The point is that a list literal is internally translated to a generic function call (see my explanation of how errors would be generated in the compromise inference scheme), so that ['a', 'b', 'c'] is roughly:

def _list(*args: T) -> List[T]: ...
_list('a', 'b', 'c')

This is actually why x: List[float] = [1, 2, 3] works in mypy.

Taking this into account, and renaming the type variable of _list to T' to avoid confusion (as I did in the long comment), full expression inference will give us the following set of constraints: T <: float; T :> T'; T <: T'; T' :> str. Following your idea this should generate InconsistentBounds(lower=str, upper=float) for both T and T'. So now when you try to apply all four options this will generate two errors that confusingly contradict each other.

@JukkaL (Collaborator) commented Jul 16, 2019

@ilevkivskyi I like your incremental approach of improving type inference. The general scheme looks good, though I have a few questions. It would be good to have a few more worked examples, such as some already supported tricky cases (such as cases involving lambda and/or overloads), and open bugs that this would fix.

Beyond the difficulty of generating good error messages, the "whole expression inference" approach also seems like much more work -- especially more work needed for any incremental improvement, as it seems basically all-or-nothing to me.

Second, don't apply them (i.e. skip step 3 altogether), instead record the gathered information about bounds in the erased types when erasing meta-variables (so that for a nested function call we can use inferred constraints and the recorded ones together).

Does this mean using the new Erased(lower, upper) types? How would these types work -- would we need to support all type operations with them? An early prototype that eventually evolved into mypy had a concept of "range type" during type inference, and I hit various difficulties with them and eventually abandoned them. I can't recall the specifics, though -- your idea may be sufficiently different that it doesn't have the same issues.

I think we should bias the selection to only put rock solid arguments in the first pass, i.e. those that work well in empty context, like RefExprs, calls to non-generic functions, and similar things. Everything else should go in the second pass.

How would this deal with cases where there is both a generic function call (say, [1, 2, 3]) and a lambda as an argument, and the type of lambda depends on the return value of the generic function call? The map() builtin is a good example where these happen with some frequency.

@ilevkivskyi (Member)

We briefly discussed Jukka's questions privately, here is a summary.

Does this mean using the new Erased(lower, upper) types?

We can re-use the existing ErasedType for this and add two extra attributes to it. All the existing interactions of this type will hold, we would only update its behaviour for constraint inference.

How would this deal with cases where there is both a generic function call (say, [1, 2, 3]) and a lambda as an argument

It looks like we might need to introduce priorities that will determine the order of processing arguments to make this kind of thing robust.

gabbard pushed a commit to isi-vista/immutablecollections that referenced this issue Jul 25, 2019
@ilevkivskyi (Member)

An example of questionable behaviour that stems from the current logic of splitting arguments into two passes (it appeared in a discussion on typeshed):

from typing import Callable, TypeVar

T = TypeVar('T')

def g(x: T, y: Callable[[], T]) -> None: ...
def not_a_lambda() -> str: ...

g(42, not_a_lambda)  # this currently fails, but should infer T = object

The-Compiler added a commit to qutebrowser/qutebrowser that referenced this issue Jan 12, 2020
We sometimes get this from mypy:

    Traceback (most recent call last):
      [...]
      File "mypy/main.py", line 89, in main
      File "mypy/build.py", line 166, in build
      File "mypy/build.py", line 235, in _build
      File "mypy/build.py", line 2611, in dispatch
      File "mypy/build.py", line 2911, in process_graph
      File "mypy/build.py", line 2989, in process_fresh_modules
      File "mypy/build.py", line 1919, in fix_cross_refs
      File "mypy/fixup.py", line 25, in fixup_module
      File "mypy/fixup.py", line 89, in visit_symbol_table
      File "mypy/fixup.py", line 45, in visit_type_info
      File "mypy/fixup.py", line 91, in visit_symbol_table
      File "mypy/nodes.py", line 874, in accept
      File "mypy/fixup.py", line 136, in visit_var
      File "mypy/types.py", line 234, in accept
        cr_running = gi_running
      File "mypy/fixup.py", line 169, in visit_type_alias_type
      File "mypy/fixup.py", line 276, in lookup_qualified_alias
      File "mypy/fixup.py", line 290, in lookup_qualified
      File "mypy/fixup.py", line 299, in lookup_qualified_stnode
      File "mypy/lookup.py", line 47, in lookup_fully_qualified
    AssertionError: Cannot find component 'pending_download_type' for
    'qutebrowser.browser.webkit.mhtml._Downloader.pending_download_type'

This seems to help...

See:

python/mypy#5738
python/mypy#7281
The-Compiler added a commit to qutebrowser/qutebrowser that referenced this issue Jan 13, 2020
@bwo (Contributor) commented May 18, 2020

I'm not sure but I suspect this is another instance of this issue, which just bit me today:

from typing import Optional, TypeVar, Callable, Tuple

A = TypeVar('A')
B = TypeVar('B')
L = TypeVar('L')
R = TypeVar('R')
Ret = TypeVar('Ret')


def decorate(f):
    # type: (Callable[[L, R], Ret]) -> Callable[[L, R], Ret]
    ''' in real life, the type of decorator is a little more involved, so I can't just have it be `(T) -> T` '''

    assert False


def function(l, foo):
    # type: (Optional[A], Optional[B]) -> Optional[Tuple[A, B]]
    assert False


x: Optional[Tuple[str, int]] = function('s', 1)
y: Optional[Tuple[str, int]] = decorate(function)('s', 1)
reveal_type(function)
reveal_type(decorate)
reveal_type(decorate(function))
reveal_type(x)
reveal_type(y)

typechecking this file gives:

$ mypy foo.py
foo.py:21: error: Incompatible types in assignment (expression has type "Optional[Tuple[A, B]]", variable has type "Optional[Tuple[str, int]]")
foo.py:21: error: Argument 1 has incompatible type "str"; expected "Optional[A]"
foo.py:21: error: Argument 2 has incompatible type "int"; expected "Optional[B]"
foo.py:22: note: Revealed type is 'def [A, B] (l: Union[A`-1, None], foo: Union[B`-2, None]) -> Union[Tuple[A`-1, B`-2], None]'
foo.py:23: note: Revealed type is 'def [L, R, Ret] (f: def (L`-1, R`-2) -> Ret`-3) -> def (L`-1, R`-2) -> Ret`-3'
foo.py:24: note: Revealed type is 'def (Union[A`-1, None], Union[B`-2, None]) -> Union[Tuple[A`-1, B`-2], None]'
foo.py:25: note: Revealed type is 'Union[Tuple[builtins.str, builtins.int], None]'
foo.py:26: note: Revealed type is 'Union[Tuple[builtins.str, builtins.int], None]'
Found 3 errors in 1 file (checked 1 source file)

From reveal_type(decorate(function)) it seems that the genericity has been wiped out. But A and B aren't concrete types. I've been able to type decorators that don't change the function signature in the past by giving them the type (T) -> T (though I haven't tried to decorate functions that are themselves generic with such a decorator), but for my actual decorator that won't work.

@RunOrVeith

I assume this is also caused by this issue:

from typing import List, Optional
from dataclasses import field, dataclass

@dataclass
class Foo:
    x: Optional[List[str]] = field(default_factory=list)

which gives Incompatible types in assignment (expression has type "List[_T]", variable has type "Optional[List[str]]").
Mypy thus fails to match _T to str. It is able to do so if x were not Optional.
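For contrast, the non-Optional variant mentioned above is accepted (illustrative restatement):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Bar:
    x: List[str] = field(default_factory=list)  # accepted: _T is matched to str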

@phinate commented Jan 26, 2021

I assume this is also caused by this issue:

from typing import List, Optional
from dataclasses import field, dataclass

@dataclass
class Foo:
    x: Optional[List[str]] = field(default_factory=list)

which gives Incompatible types in assignment (expression has type "List[_T]", variable has type "Optional[List[str]]").
Mypy thus fails to match _T to str. It is able to do so if x were not Optional.

for reference, I'm encountering this exact scenario too with Tuple

@cebtenzzre

The way mypy does type inference also breaks calling a generic function on the right side of set.__and__, because it takes an argument of type AbstractSet[object].

from typing import TypeVar

T = TypeVar('T')

def identity(x: set[T]) -> set[T]:
    return x

x = {0, 1}

# error: Argument 1 to "identity" has incompatible type "Set[int]"; expected "Set[object]"  [arg-type]
print({0} & identity(x))

ilevkivskyi added a commit that referenced this issue Jun 18, 2023
Fixes #1317
Fixes #5738
Fixes #12919 
(also fixes a `FIX` comment that is more than 10 years old according to
git blame)

Note: although this PR fixes most typical use-cases for type inference
against generic functions, it is intentionally incomplete, and it is
made in a way that limits its implications to a small scope.

This PR has essentially three components (better infer, better solve,
better apply - all three are needed for this MVP to work):
* A "tiny" change to `constraints.py`: if the actual function is
generic, we unify it with template before inferring constraints. This
prevents leaking generic type variables of actual in the solutions
(which makes no sense), but also introduces new kind of constraints `T
<: F[S]`, where type variables we solve for appear in target type. These
are much harder to solve, but also it is a great opportunity to play
with them to prepare for single bin inference (if we will switch to it
in some form later). Note unifying is not the best solution, but a good
first approximation (see below on what is the best solution).
* A new, more sophisticated constraint solver in `solve.py`. The full
algorithm is outlined in the docstring for `solve_non_linear()`. It
looks like it should be able to solve arbitrary constraints that don't
(indirectly) contain "F-bounded" things like `T <: list[T]`. Very
briefly, the idea is to compute the transitive closure, then organize
constraints by topologically sorted SCCs.
* Polymorphic type argument application in `checkexpr.py`. In cases
where the solver identifies that there are free variables (e.g. we have
just one constraint `S <: list[T]`, so `T` is free, and the solution
for `S` is `list[T]`) it will apply the solutions while creating new
generic functions. For example, if we have a function `def [S, T] (fn:
Callable[[S], T]) -> Callable[[S], T]` applied to a function `def [U]
(x: U) -> U`, this will result in `def [T] (T) -> T` as the return.
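
As a small end-to-end illustration of this last point (hypothetical
revealed type; the exact type variable name and numbering mypy prints
may differ):
```python
from typing import Callable, TypeVar

S = TypeVar("S")
T = TypeVar("T")
U = TypeVar("U")

def dec(fn: Callable[[S], T]) -> Callable[[S], T]: ...
def ident(x: U) -> U: ...

g = dec(ident)
reveal_type(g)  # with this PR: a generic callable, roughly 'def [T] (T) -> T',
                # rather than a callable mentioning a leaked 'U`-1'
```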

I want to put here some thoughts on the last ingredient, since it may
seem mysterious, but it now seems to me that it is actually a very well
defined procedure. The key point here is to think about generic
functions as infinite intersections or infinite overloads. Reducing
these infinite overloads/intersections to finite ones makes it easy to
understand what is actually going on. For example, imagine we live in a
world with just two types, `int` and `str`. Now we have two functions:
```python
T = TypeVar("T")
S = TypeVar("S")
U = TypeVar("U")

def dec(fn: Callable[[T], S]) -> Callable[[T], S]: ...
def id(x: U) -> U: ...
```
the first one can be seen as overload over
```
((int) -> int) -> ((int) -> int)  # 1
((int) -> str) -> ((int) -> str)  # 2
((str) -> int) -> ((str) -> int)  # 3
((str) -> str) -> ((str) -> str)  # 4
```
and second as an overload over
```
(int) -> int
(str) -> str
```
Now what happens when I apply `dec(id)`? We need to choose an overload
that matches the argument (this is what we call type inference), but
here is a trick: in this case two overloads of `dec` match the argument
type. So (and btw I think we are missing this for real overloads) we
construct a new overload that returns the intersection of the matching
overloads `# 1` and `# 4`. Generalizing this intuition, inference is the
selection of an (infinite) parametrized subset of the bigger
parameterized set of intersecting types. The only question is whether
the resulting infinite intersection is representable in our type
system. For example `forall T. dict[T, T]` can make sense but is not
representable, while `forall T. (T) -> T` is a well-defined type. And
finally, there is a very easy way to find out whether a type is
representable or not; we are already doing this during semantic
analysis. I use the same logic (which I used to view as ad-hoc because
of the lack of good syntax for callables) to bind type variables in the
inferred type.

OK, so here is the list of missing features, and some comments on them:
1. Instead of unifying the actual with the template, we should include
the actual's variables in the variable set we solve for, as explained in
#5738 (comment). Note,
however, that this will only work together with the next item.
2. We need to (iteratively) infer secondary constraints after linear
propagation, e.g. `Sequence[T] <: S <: Sequence[U] => T <: U`
3. Support `ParamSpec` (and probably `TypeVarTuple`). Current support
for applying callables with `ParamSpec` to generics is hacky, and kind
of a dead end. Although `(Callable[P, T]) -> Callable[P, List[T]]` works
when applied to `id`, even a slight variation like `(Callable[P,
List[T]]) -> Callable[P, T]` fails. I think it needs to be re-worked in
the framework I propose (the tests I added are just to be sure I don't
break existing code).
4. Support actual types that are generic in type variables with upper
bounds or values (likely we just need to be careful when propagating
constraints and choosing free variable within an SCC).
5. Add backtracking for the upper/lower bound choice. In general, in the
current "Hanoi Tower" inference scheme it is very hard to backtrack, but
for this specific choice in the new solver it should be totally
possible to switch from the lower to the upper bound on a previous step,
if we found no solution (or `<nothing>`/`object`).
6. After we polish it, we can use the new solver in more situations,
e.g. for return type context, and for unification during callable
subtyping.
7. Long term we may want to allow instances to bind type variables, at
least for things like `LRUCache[[x: T], T]`. Btw note that I force
expansion of type aliases and callback protocols, since I can't
transform e.g. `A = Callable[[T], T]` into a generic callable without
getting a proper type.
8. We need to figure out a solution for scenarios where non-linear
targets with free variables and constant targets mix without secondary
constraints, like `T <: List[int], T <: List[S]`.

I am planning to address at least the majority of the above items, but I
think we should move slowly, since in my experience type inference is a
really fragile topic with hard-to-predict, long-reaching consequences.
Please play with this PR if you want to and have time, and please
suggest tests to add.