Introduce typing.STRICTER_STUBS #1096

Open
not-my-profile opened this issue Mar 1, 2022 · 40 comments
Labels
topic: feature Discussions about new features for Python's type annotations

Comments

@not-my-profile

not-my-profile commented Mar 1, 2022

Some functions can return different types depending on passed arguments. For example:

  • open(name, 'rb') returns io.BufferedReader, whereas
  • open(name, 'wb') returns io.BufferedWriter.

The typeshed accurately models this using @typing.overload and typing.Literal. However there is the case that the argument value deciding the return type cannot be determined statically, for example:

def my_open(name: str, write: bool):
    with open(name, 'wb' if write else 'rb') as f:
        content = f.read()

In this case typeshed currently just claims that open returns typing.IO[Any], so content ends up having the type Any, resulting in a loss of type safety (e.g. content.startswith('hello') will lead to a runtime error if the file was opened in binary mode, but type checkers won't be able to warn you about this because of Any).
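A minimal sketch of the failure mode described above. The helper names `read_prefix_check` and `demo` and the temporary file are assumptions made for illustration; the explicit `Any` annotation stands in for the `Any` that the fallback overload produces, so a type checker stays silent while the binary-mode call fails at runtime:

```python
import os
import tempfile
from typing import Any


def read_prefix_check(path: str, binary: bool) -> bool:
    # The mode is not a literal, so it cannot be matched to a specific overload.
    mode = "rb" if binary else "r"
    with open(path, mode) as f:
        # Explicit Any mirrors what the IO[Any] fallback yields for f.read().
        content: Any = f.read()
    # Accepted by a type checker because of Any; fails for bytes at runtime.
    return content.startswith("hello")


def demo() -> tuple[bool, str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "greeting.txt")
        with open(path, "w") as f:
            f.write("hello world")
        text_result = read_prefix_check(path, binary=False)
        try:
            read_prefix_check(path, binary=True)
            error_name = ""
        except TypeError as exc:
            # bytes.startswith rejects a str argument.
            error_name = type(exc).__name__
    return text_result, error_name
```

With a `Union[str, bytes]` annotation instead of `Any`, a checker would be expected to flag the `startswith("hello")` call before the code runs.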

While typeshed could theoretically just change the return type to typing.IO[Union[str, bytes]], that would force all existing code bases that currently rely on Any to type check to update their code, which is of course unacceptable.

When starting a new project, however, I want the strictest type stubs possible. I explicitly do not want standard library functions to return unsafe values like Any (or the slightly less unsafe AnyOf suggested in #566) when the return types can be modeled by a Union.

I therefore propose the introduction of a new variable typing.STRICTER_STUBS: bool, that's only available during type checking.

Which would allow typeshed to do the following:

if typing.STRICTER_STUBS:
    AnyOrUnion = typing.Union
else:
    AnyOrUnion = typing.Any

Ambiguous return types could then be annotated as e.g. -> typing.IO[AnyOrUnion[str, bytes]].

This would allow users to opt into stricter type stubs, if they so desire, without forcing changes on existing code bases.

CC: @AlexWaygood, @JelleZijlstra, @srittau, @hauntsaninja, @rchen152, @erictraut

P.S. Since I have seen Union return types being dismissed because "the caller needs to use isinstance()", I want to note that this is not true: if the caller wants to trade type safety for performance, they can always add an explicit Any annotation to circumvent the runtime overhead of isinstance. Union return types force you to either handle potential type errors or explicitly opt out of type safety, which I find strongly preferable to a lack of type safety by default.
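The two options mentioned in the P.S. can be sketched as follows. The function names `load`, `checked_greeting` and `unchecked_greeting` are hypothetical and only serve the illustration:

```python
from typing import Any, Union


def load(binary: bool) -> Union[str, bytes]:
    # Stand-in for an API whose return type depends on a runtime value.
    return b"hello" if binary else "hello"


def checked_greeting(binary: bool) -> str:
    value = load(binary)
    # Option 1: handle both members of the union explicitly.
    if isinstance(value, bytes):
        value = value.decode()
    return value.upper()


def unchecked_greeting() -> str:
    # Option 2: opt out of type safety with an explicit Any annotation,
    # avoiding the isinstance check at the caller's own risk.
    value: Any = load(False)
    return value.upper()
```

In both cases the decision is visible in the calling code, which is the point the P.S. argues for.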

@not-my-profile not-my-profile added the topic: feature Discussions about new features for Python's type annotations label Mar 1, 2022
@AlexWaygood
Member

AlexWaygood commented Mar 1, 2022

Interesting idea. How would users opt into the "stricter" stubs? Do you propose that type checkers add a new config setting?

I'm not a fan of reusing the TYPE_CHECKING constant, by the way -- I'd prefer to see a new constant introduced in typing.py. But that's a minor point.

@not-my-profile not-my-profile changed the title Introduce typing.TYPE_CHECKING == "strict" Introduce typing.STRICT_STUBS Mar 1, 2022
@not-my-profile
Author

not-my-profile commented Mar 1, 2022

Yes, type checkers would need to introduce a new config setting. I'd propose --stricter-stubs as a uniform command-line flag.

I'm not a fan of reusing the TYPE_CHECKING constant, by the way

Yes, good point. I updated the proposal to typing.STRICTER_STUBS with a boolean value. It would actually only need to be available during type checking, so no change to typing.py should be necessary (but maybe it's still a good idea?).

@not-my-profile not-my-profile changed the title Introduce typing.STRICT_STUBS Introduce typing.STRICTER_STUBS Mar 1, 2022
@erictraut
Collaborator

Here are two other potential solutions to this problem.

  1. Formally introduce an Unknown type and use it in place of Any in these cases. Pyright internally tracks an Unknown as a form of Any except that in "strict" mode, it will complain about the use of Unknown. This idea is borrowed from TypeScript, which formally exposes an unknown type.

  2. Introduce a OneOf type as an unsafe union: one where only one subtype needs to be compatible rather than all subtypes. This has been discussed previously but didn't gain enough support to lead to a PEP.

Of these three options, I think the OneOf provides the most utility because it provides much more information for language servers. For example, it allows good completion suggestions to be presented to users. It can also be flagged as unsafe in the strictest modes.

@gvanrossum
Member

Why do we need a flag? In the OP's example it seems a pretty good idea to change the typechecker behavior to return io.BufferedReader | io.BufferedWriter. (A better example would actually be open(name, 'rb' if binary else 'r'), which could return io.BufferedReader | io.TextIOWrapper.)

@not-my-profile
Author

not-my-profile commented Mar 1, 2022

@gvanrossum The type checker does not know about the return types of open. These are only encoded in the type stubs. In this case the stdlib stubs of typeshed. There are currently many projects that rely on Any return types of typeshed, changing the return type in typeshed would break the type checking for these projects. Hence the flag.

@erictraut thanks for bringing up these alternatives. I think the appeal of the flag is that you can set it and forget it, whereas the introduction of a new type would require Python programmers to learn about the type ... and for what? I personally wouldn't want to deal with some functions returning Union and some functions returning AnyOf (besides I find the latter very confusing semantically because the definition of a union is that it can be any one of its arguments, having both AnyOf and Union is bound to result in confusion). Yes OneOf would allow richer LSP autocompletion, however I am not sure if providing autocompletion is such a good idea if the suggestions might lead to type errors during runtime.

@JelleZijlstra
Member

changing the return type in typeshed would break the type checking for these projects

Are you sure? It may be worth checking whether that's really common in real code. mypy-primer should help.

I like the OneOf idea. Type checkers could add a flag that makes the checker treat the type like a regular Union, which would provide the type safety @not-my-profile asks for.

@gvanrossum
Member

@gvanrossum The type checker does not know about the return types of open. These are only encoded in the type stubs. In this case the stdlib stubs of typeshed.

I'm well aware. :-)

My actual proposal was actually more complex. If we have stubs containing e.g.

@overload
def foo(x: int) -> list[int]: ...
@overload
def foo(y: str) -> list[str]: ...

and we have a call like this

def f(flag: bool):
    a = foo(0 if flag else "")
    reveal_type(a)

then the revealed type could be list[int] | list[str]. But this would require the inferred type of 0 if flag else "" to be int | str, whereas ATM (in mypy at least) the inferred type there is object, being the nearest common (named) type in the MROs of int and str.

There are currently many projects that rely on Any return types of typeshed, changing the return type in typeshed would break the type checking for these projects. Hence the flag.

But in this example the return type is not Any. The call in my example is rejected; however if we help the type checker by forcing a union it will actually do the right thing:

from typing import overload

@overload
def foo(a: int) -> list[int]: ...

@overload
def foo(a: str) -> list[str]: ...

def foo(a): pass

def f(flag: bool):
    a: int | str = 0 if flag else ""
    reveal_type(a)  # int | str
    b = foo(a)
    reveal_type(b)  # list[int] | list[str]

When I tried the OP's example, it seems that because the mode expression's type is inferred as str, the fallback overload is used, and that's typing.IO[Any]. If I try to help a bit by adding an explicit type, like this:

from typing import Literal

def my_open(name: str, write: bool):
    mode: Literal['rb', 'wb'] = 'wb' if write else 'rb'
    with open(name, mode) as f:
        reveal_type(f)  # typing.IO[Any]

I still get the fallback. This seems to be due to some bug (?) in the typeshed stubs -- if I add buffering=0 it gets the type right:

def my_open(name: str, write: bool):
    mode: Literal['rb', 'wb'] = 'wb' if write else 'rb'
    with open(name, mode, buffering=0) as f:
        reveal_type(f)  # io.FileIO

(@JelleZijlstra @srittau Do you think the former result is a bug in the stubs? Those overloads have buffering: int without a default.)

@srittau
Collaborator

srittau commented Mar 1, 2022

I like the OneOf idea. Type checkers could add a flag that makes the checker treat the type like a regular Union, which would provide the type safety @not-my-profile asks for.

Exactly what I was thinking. #566 (AnyOf/OneOf) would work very well with a strictness check and we could get rid of those pesky Any return types that no one likes.

@not-my-profile
Author

Are you sure? It may be worth checking whether that's really common in real code. mypy-primer should help.

I am unsure how representative 100 projects can be for the far larger Python ecosystem. Anyway I tried it with python/typeshed#7416 and it interestingly enough resulted in some INTERNAL ERRORs for mypy.

(AnyOf/OneOf) would work very well with a strictness check

When should a function return Union and when should a function return OneOf?

@gvanrossum
Member

Exactly what I was thinking. #566 (AnyOf/OneOf) would work very well with a strictness check and we could get rid of those pesky Any return types that no one likes.

But that's not what's going on in the OP's example (see my message).

@srittau
Collaborator

srittau commented Mar 1, 2022

@gvanrossum It's late here and my brain might be a bit mushy. But aren't those orthogonal issues? AnyOf/OneOf can be useful generally in stubs in many situations. And there seems to be a problem with the way type checkers infer a base type instead of a union type in some situations.

Also, if I remember correctly, there are some issues with open() when using mypy, since the latest version still has a plugin for it, which overrides the stubs for open().

Edit: I will try to give more coherent thoughts tomorrow.

@gvanrossum
Member

IIRC AnyOf has been proposed before but always met fierce resistance from mypy core devs.

I just wanted to get to the root of the motivation of the issue as presented by the OP, and came up with some interesting (to me) facts.

The plugin for open has (finally) been deleted from the mypy sources (python/mypy#9275) but that was only three weeks ago, so it's probably not yet in the latest release (0.931). The results for my examples are the same on the master branch and with 0.931 though.

@hauntsaninja
Collaborator

hauntsaninja commented Mar 1, 2022

Adding a default value to buffering in the overload you link sounds correct to me — good spot!

There's interest in getting mypy to use a meet instead of a join for ternary, e.g. see python/mypy#12056. I don't think anyone is too opposed philosophically.

i.e. I agree that OP's specific complaint would be fixed if mypy inferred Literal["wb", "rb"] + typeshed adds a default value to buffering in that overload.

OP's general complaint still stands. If you tweak the example to take mode: str, you'd get an Any that I suspect OP would not want. AnyOf + flag to treat AnyOf as Union would work well for this case.

My recollection of past discussions of AnyOf isn't fierce resistance as much as inertia, since the benefits for type checking are fairly marginal (although of course, if substantial numbers of users would use an AnyOf = Union flag, such users would get meaningfully more type safety). The strong argument for AnyOf in my mind has been IDE like use cases, so I'd be curious to hear if @erictraut thinks it worth doing.
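A minimal sketch of the setup discussed above: a mode annotated as a union of literals, passed to overloads that have no interfering fallback. The wrapper `open_binary` and the helper `demo` are hypothetical names. According to the commit message referenced later in this thread, mypy can union the return types of all matching overloads in this situation; the code below only verifies the runtime behavior:

```python
import io
import os
import tempfile
from typing import Any, Literal, overload


@overload
def open_binary(path: str, mode: Literal["rb"]) -> io.BufferedReader: ...
@overload
def open_binary(path: str, mode: Literal["wb"]) -> io.BufferedWriter: ...
def open_binary(path: str, mode: str) -> Any:
    # Runtime implementation; the overloads above are what a checker sees.
    return open(path, mode)


def demo(write: bool) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as seed:
            seed.write(b"seed")
        # The explicit annotation keeps the literal types instead of plain str.
        mode: Literal["rb", "wb"] = "wb" if write else "rb"
        with open_binary(path, mode) as f:
            return type(f).__name__
```

If `mode` were annotated as plain `str`, neither overload would match, which is the general case the original complaint is about.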

@erictraut
Collaborator

I'm a big advocate of the benefits of static type checking, but I also recognize that the vast majority of Python developers (99%?) don't use type checking. Most Python developers do use language server features like completion suggestions, so the type information in typeshed stubs benefits them in other ways. I need to continually remind myself to think about the needs of these developers.

I agree that AnyOf has marginal value to static type checking scenarios. It has some value because it allows for optional stricter type checking, but the real value is for the other Python users who would see improved completion suggestions. So yes, I am supportive of adding something like AnyOf.

@JukkaL
Contributor

JukkaL commented Mar 2, 2022 via email

@srittau
Collaborator

srittau commented Mar 2, 2022

I also agree that AnyOf has marginal value for static type checking.

I still wonder why people think that. I believe that this clearly improves type safety in a lot of cases:

def foo() -> AnyOf[str, bytes]: ...
def bar(x: URL) -> None: ...

f = foo()
bar(f)  # will fail
f.schema  # will fail

with open(...) as f:
    f.write(pow(x, y))  # will fail

#566 now has links to over 20 other issues or PRs from typeshed, most of which state the type checking could be improved by AnyOf.

It also sounds like a complex feature that would take a lot of effort to implement,

Fortunately, as a stopgap, AnyOf could be treated like Any. This provides the same (lack of) type safety that the current situation offers, but would allow type checkers and other tooling to use AnyOf to its full benefit.

@AlexWaygood
Member

AlexWaygood commented Mar 2, 2022

I agree with @srittau. I think AnyOf would have great utility for typeshed and be very frequently used.

The disadvantage of STRICTER_STUBS, as it was put forward in @not-my-profile's original proposal, is that it's an "all or nothing" approach. Either a project opts into all the stricter stubs, everywhere, or it opts into none at all. That might limit adoption, as lots of projects that previously type-checked might find that they now have many errors.

The two ideas are in some ways complementary, however. If we had an AnyOf special form, type checkers might be able to optionally provide an option to treat these unsafe unions as strict unions, giving additional type safety to users who do want that. (@JelleZijlstra already touched on this idea in #1096 (comment).)

@not-my-profile
Author

not-my-profile commented Mar 2, 2022

Ok, I like the idea of AnyOf but I really dislike the name.

Because AnyOf is essentially a type checking implementation detail. Not every type checker will support it, so having a function of an API return AnyOf feels really weird to me since you don't know which type checker (if any) the calling code is using. And I especially dislike that when you see Union and AnyOf for the first time it's unclear when you should use which.

I feel like we don't need a new type for this. It could just be a type checker setting. The code could be:

def get_animal(name) -> Union[Cat, Dog]: ...

Type checkers (and users) can then decide if they want to treat Union as:

  • Any
  • "AnyOf" (improving type safety over Any)
  • a strict Union (improving type safety even further)

@srittau
Collaborator

srittau commented Mar 2, 2022

This would actually reduce type safety for users that don't use strict mode. Many functions already return strict unions where it's necessary to check the return value at runtime. It's especially common to see X | None. Making all of these non-strict by default would be a step backwards.
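As an illustration of a strict union where the runtime check is genuinely necessary, here is a small sketch using `re.search`, which returns either a match object or `None`. The helper name `first_number` is an assumption made for this example:

```python
import re
from typing import Optional


def first_number(text: str) -> Optional[int]:
    match = re.search(r"\d+", text)
    # Without this check, a strict checker rejects match.group(),
    # and the call would raise AttributeError at runtime for None.
    if match is None:
        return None
    return int(match.group())
```

Treating such `X | None` results as non-strict would silence exactly the check that prevents the `None` case from failing.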

@not-my-profile
Author

not-my-profile commented Mar 2, 2022

X | None could be treated as an exception.

Many functions already return strict unions where it's necessary to check the return value at runtime.

This is really the point I don't get. IMO it is not up to an API to decide the level of type safety for the caller. That is solely up to the API user.

@AlexWaygood
Member

I agree that the name of AnyOf isn't ideal, but I do think we need a new special form for this, for the reasons @srittau sets out. Special-casing X | None would be confusing and counterintuitive.

@not-my-profile
Author

not-my-profile commented Mar 2, 2022

Well I find having both AnyOf and Union to be confusing and counterintuitive. Both semantically as well as the weird idea of encoding type safety into APIs.

I don't see any problem with special casing None, after all null references are the billion-dollar mistake. So IMO None very much deserves special treatment, and we can take advantage of the existing typing.Optional to communicate that to the user.

def get_text() -> str | None: ...
"Hello " + get_text() # always error: Unsupported operand types for + ("str" and "None")

def get_name() -> str | bytes | None: ...
"Hello " + get_name() # always error:  Unsupported operand types for + ("str" and "None")

name = get_name()
assert name is not None # narrowing Optional[Union[str, bytes]] down to Union[str, bytes]

You would now get (depending on your type checker settings):

expression       | unions as Any | unions as "any of" | strict unions
name + 1         | ✔️            | ERROR              | ERROR
"Hello " + name  | ✔️            | ✔️                 | ERROR

To avoid user confusion, I think when using unions as Any or unions as "any of", type checkers should reveal Union[str, bytes, None] as Optional[Union[str, bytes]] (thus making it obvious that the Optional has to be unwrapped first).

@JukkaL
Contributor

JukkaL commented Mar 2, 2022 via email

@JukkaL
Contributor

JukkaL commented Mar 2, 2022 via email

@not-my-profile
Author

not-my-profile commented Mar 2, 2022

Also having stubs use a different syntax or semantics than normal code for common things sounds quite confusing.

Sorry for the confusion, what I proposed in #1096 (comment) is meant to apply to all type checking (so regular code as well).

And I do not think that we would lose any expressiveness, quite the opposite. APIs could always just return an expressive Union (instead of being forced to return Any, as they currently are for the sake of backwards compatibility) and type checkers could interpret Union however they see fit (with the sole requirement that None checking must always be enforced).

@JukkaL
Contributor

JukkaL commented Mar 2, 2022

type checkers could interpret Union however they see fit (with the sole requirement that None checking must always be enforced).

Ah sorry, I misunderstood your proposal. I don't think that we can change the meaning of union types, and we must remain backward compatible. The existing union types are used by (hundreds of?) thousands of projects and supported by many tools. I think that the current definition of union types is perfectly fine, but they don't cover all the use cases well (in particular, legacy APIs not designed for type checking).

@not-my-profile
Author

not-my-profile commented Mar 2, 2022

Thanks @JukkaL, that is a very good observation ... that the root cause of the problem is legacy APIs that weren't designed for type checking!

With that in mind, at least my concern could be addressed by introducing a @typing.warning decorator, which could be used to annotate the overloads that return Any, e.g:

from typing import overload, warning, Literal

@overload
def open(file: str, mode: Literal['rb']) -> io.BufferedReader: ...
@overload
def open(file: str, mode: Literal['wb']) -> io.BufferedWriter: ...
@overload
@warning('cannot statically determine return type because `mode` is unknown')
def open(file: str, mode: str) -> Any: ...

This would also allow deprecated functions and functions that should not be used (but are included in typeshed because their absence would confuse users) to be annotated with warnings, so that type checkers can forward these warnings when the functions (or specific overloads) are used.

For example the standard library docs note:

Note that Loggers should NEVER be instantiated directly, but always through the module-level function logging.getLogger(name).

So typeshed could annotate:

class Logger(Filterer):
    @warning('should not be used, use `logging.getLogger` instead')
    def __init__(self, name: str, level: _Level = ...) -> None: ...

I think it would even make sense for the __getattr__ functions typeshed uses for incomplete stubs:

@warning('you have reached the end of this type stub')
def __getattr__(name: str) -> Any: ...

Because it is very easy to accidentally call such __getattr__ placeholders without noticing it, resulting in a potentially dangerous loss of type safety.
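To make the idea more concrete, here is a runtime stand-in for the proposed decorator. `typing.warning` does not exist; the `warning` function, the `__type_warning__` attribute and `open_any` below are all hypothetical names, and a real implementation would presumably be interpreted by type checkers rather than emit anything at runtime:

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def warning(message: str) -> Callable[[F], F]:
    """Hypothetical stand-in for the proposed @typing.warning decorator."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Emit the message at call time; a checker would report it statically.
            warnings.warn(message, UserWarning, stacklevel=2)
            return func(*args, **kwargs)

        # Keep the message available for tools that introspect the function.
        wrapper.__type_warning__ = message  # type: ignore[attr-defined]
        return wrapper  # type: ignore[return-value]

    return decorator


@warning("cannot statically determine return type because `mode` is unknown")
def open_any(path: str, mode: str) -> Any:
    # Placeholder body standing in for the Any-returning fallback overload.
    return (path, mode)
```

Calling `open_any("x", "rb")` returns the placeholder value and emits one `UserWarning` carrying the message.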

@erictraut
Collaborator

@JukkaL makes a good point about AnyOf being complex and expensive to implement in its fullest form, and I agree that the value for static type checking probably doesn't justify the work involved.

I think there are three ways AnyOf could be interpreted by a type checker:

  1. Interpreted as Any. This would be equivalent to the behavior today.
  2. Interpreted as a "weak union". As Jukka noted, this would be a lot of work to implement.
  3. Interpreted as a true Union. This could be used in the strictest type checking modes.

Interpretations 1 and 3 should be "free" to implement since they are already fully supported by mypy and other type checkers. Interpretation 2 would be expensive, but I think it would be fine for type checkers to ignore this mode.

Here's an idea that should be (relatively) cheap to implement. We could leverage PEP 646 to create a typeshed-specific internal class called _WeakUnion (or similar). It could take a variadic TypeVarTuple but also derive from Any. Here's how this might look:

_Ts = TypeVarTuple("_Ts")
class _WeakUnion(Generic[*_Ts], Any): ...

@overload
def int_or_float(x: Literal[True]) -> int: ...
@overload
def int_or_float(x: Literal[False]) -> float: ...
@overload
def int_or_float(x: bool) -> _WeakUnion[int, float]: ...

Type checkers that want to interpret _WeakUnion as a "true union" could do so in strict modes. Likewise, language servers could use the additional information in the _WeakUnion to provide good completion suggestions. Type checkers that don't know about _WeakUnion would continue to treat it as an Any.

Thoughts?

@not-my-profile
Author

Couple of notes:

  • mypy apparently currently doesn't let you subclass Any: Class cannot subclass "Any" (has type "Any")
  • pytype apparently currently doesn't support TypeVarTuple
  • so type checkers that wanted to support this typeshed-only type would have to hard-code the full name of the type? what happens for third-party stubs when the stubs are moved outside of typeshed when e.g. a package becomes py.typed?

@JelleZijlstra
Member

mypy does support subclassing from Any, but emits an error for it in strict mode. It should work fine once you type ignore that error or turn off the option for it.

However, mypy doesn't support any of PEP 646 at all.

hauntsaninja pushed a commit to hauntsaninja/typeshed that referenced this issue Mar 2, 2022
As pointed out by @gvanrossum in python/typing#1096

Improves type inference in cases when we know that mode is
OpenBinaryMode, but don't know anything more specific:
```
def my_open(name: str, write: bool):
    mode: Literal['rb', 'wb'] = 'wb' if write else 'rb'
    with open(name, mode) as f:
        reveal_type(f)  # previously typing.IO[Any], now typing.BinaryIO
```

You may be tempted into thinking this is some limitation of type
checkers. mypy does in fact have logic for detecting if we match
multiple overloads and union-ing up the return types of matched
overloads. The problem is the last overload interferes with this logic.
That is, if you remove the fallback overload (prior to this PR), you'd get
"Union[io.BufferedReader, io.BufferedWriter]" in the above example.
@not-my-profile
Author

not-my-profile commented Mar 4, 2022

class _WeakUnion(Generic[*_Ts], Any): ...

Afaik the unpack operator in subscript requires Python 3.11 which is a problem because mypy parses the source code with ast.parse, so I guess that means TypeVarTuple is out of the question?

I think we could just do something simple like:

_WeakUnion = Any

instead, since Any can be subscripted arbitrarily. While this does mean that type checkers wanting to support _WeakUnion would have to check the type name instead of the type itself, I assume that type checkers already track the type names.

Edit: Apparently mypy doesn't like arguments given to type aliases (unlike pyright which doesn't complain).

@JelleZijlstra
Member

class _WeakUnion(Generic[*_Ts], Any): ...

Afaik the unpack operator in subscript requires Python 3.11 which is a problem because mypy parses the source code with ast.parse, so I guess that means TypeVarTuple is out of the question?

We can use typing_extensions.Unpack instead.

@AlexWaygood
Member

AlexWaygood commented Mar 4, 2022

Edit: Apparently mypy doesn't like arguments given to type aliases (unlike pyright which doesn't complain).

No, the issue is that Any is not subscriptable, so mypy is correctly raising an error in your python/typeshed#7437 PR.

>>> from typing import Any
>>> Any[str]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\typing.py", line 311, in inner
    return func(*args, **kwds)
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\typing.py", line 402, in __getitem__
    return self._getitem(self, parameters)
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\typing.py", line 424, in Any
    raise TypeError(f"{self} is not subscriptable")
TypeError: typing.Any is not subscriptable

@hmc-cs-mdrissi

This feels a lot like feature flags, given the current any_of direction we're taking. Specifically it reminds me of this comment. I would be very interested in feature flags for new PEPs in general, or for features that are useful but vary across type checkers (recursive types are the only one I have in mind).

The main two I'm interested in currently are PEP 646 and recursive types, although I think having a new flag for each new PEP that introduces new types would allow faster experimentation and stub improvement for type checkers that support them earlier. Both would be useful to try in stubs, but without feature flags my guess is it'll take a while for 646 given its complexity. If we had feature flags, we could do the following in a stub file in typeshed,

if typing.flags.any_of:
  from typing_extensions import AnyOf
  AnyOf = AnyOf
else:
  AnyOf = Any  # or some other definition

or for recursive types,

BasicJSON = str | float | None
if typing.flags.recursive:
  JSON = Mapping[str, JSON] | Sequence[JSON] | BasicJSON
else:
  JSON = Mapping[str, BasicJSON] | Sequence[BasicJSON] | BasicJSON | Mapping[str, Any] | Sequence[Any]

This way type checkers that understand recursion get a fully accurate type, while other type checkers still have a reasonable approximation.

JelleZijlstra pushed a commit to python/typeshed that referenced this issue Mar 6, 2022
As pointed out by @gvanrossum in python/typing#1096

Improves type inference in cases when we know that mode is
OpenBinaryMode, but don't know anything more specific:
```
def my_open(name: str, write: bool):
    mode: Literal['rb', 'wb'] = 'wb' if write else 'rb'
    with open(name, mode) as f:
        reveal_type(f)  # previously typing.IO[Any], now typing.BinaryIO
```

You may be tempted into thinking this is some limitation of type
checkers. mypy does in fact have logic for detecting if we match
multiple overloads and union-ing up the return types of matched
overloads. The problem is the last overload interferes with this logic.
That is, if you remove the fallback overload (prior to this PR), you'd get
"Union[io.BufferedReader, io.BufferedWriter]" in the above example.

Co-authored-by: hauntsaninja <>
@JukkaL
Contributor

JukkaL commented Mar 7, 2022

class _WeakUnion(Generic[*_Ts], Any): ...

This looks promising. This still is missing the ability to specify a non-Any type as the "fallback" type. For example, static type checkers could prefer str | Any, while language servers might use str | None. We could have an alternative form that supports an explicit fallback type:

class _WeakUnionFallback(Generic[_T, *_Ts], Any): ...

Now type checkers could treat _WeakUnionFallback[str | Any, str, None] as equivalent to str | Any, while language servers could understand it as _WeakUnion[str, None].

If we don't want to wait until PEP 646 is supported everywhere, perhaps we could just use a union type as the type argument, which would be interpreted as a "weak union"? So instead of writing _WeakUnion[str, None], we'd express it as _WeakUnion[str | None]. Also, instead of _WeakUnionFallback[str | Any, str, None], we'd have _WeakUnionFallback[str | Any, str | None].

@erictraut
Collaborator

I like your idea of using a union form within _WeakUnion. That's a clever way to eliminate the dependency on PEP 646.

I don't see the need for an explicit fallback. Perhaps you could give a concrete example where this would be desirable?

I was thinking that the fallback should be based on type checker configuration and capabilities, not specified by the library author.

Pyright is both a language server and a type checker, but it has no "language server mode" where it evaluates types differently. The two pieces of functionality (LS and type checker) are intrinsically tied. Having different modes for LS and type checker would be confusing because type errors would be inconsistent with LS features like "hover text" where hovering over an identifier displays its type.

I was thinking that pyright would interpret _WeakUnion in one of two ways.

  1. In "basic" type checking mode: as an intersection between Any and union U
  2. In "strict" type checking mode: as union U

In both cases, the LS functionality would provide completion suggestions for all of the subtypes in union U.

Maybe in the future I would consider replacing the "basic" mode with a true "weak union" implementation, but as you pointed out, that would be a lot of work, and I'm not sure it's worth it.

@JukkaL
Contributor

JukkaL commented Mar 8, 2022

I don't see the need for an explicit fallback. Perhaps you could give a concrete example where this would be desirable?

open is a potential example. Currently if the mode is not known statically, the return type is IO[Any]. This seems like the best type we can currently have, and it's clearly better than plain Any. For example, code like this will generate type errors, as expected:

def f(m: str) -> None:
    f = open('x', mode=m)
    f.bad_call()  # "IO[Any]" has no attribute "bad_call"
    "x" + f  # Invalid types "str" and "IO[Any]"

However, tools that properly support _WeakUnion would probably prefer _WeakUnion[TextIOWrapper, BufferedReader]. This would allow a more precise type for code like this, in addition to catching the errors in the earlier example:

def f(m: str) -> None:
    f = open('x', mode=m)
    s = f.read()
    reveal_type(s)  # _WeakUnion[str, bytes] (instead of Any)

Tools that support _WeakUnion could give better completion suggestions for s (e.g. s.startswith). If we'd have IO[Any] as the return type, s would have the type Any, and we can't give very useful completion suggestions.

Here I think that the best compromise for the return type of open would be _WeakUnionFallback[IO[Any], TextIOWrapper | BufferedReader]. This wouldn't make the return type less precise for tools that support weak unions and don't implement completion suggestions.

I was thinking that the fallback should be based on type checker configuration and capabilities, not specified by the library author.

In the above example, we could perhaps invent rules so that IO[Any] would be automatically inferred as the fallback for tools that don't support _WeakUnion properly. I think that this seems too complicated/ad-hoc to reason about, at least for non-expert typeshed contributors, and spelling the fallback out explicitly would increase clarity.

In "basic" type checking mode: as an intersection between Any and union U

Would this result in worse type checking results compared to IO[Any] in the above example? It seems that completion suggestions would be better but type checking would be compromised, though I guess that depends on the exact semantics of the intersection type. I wonder if an intersection between IO[Any] and TextIOWrapper | BufferedReader would be a more precise type in this "basic" mode?

In "strict" type checking mode: as union U

Mypy could possibly also support this behavior behind a strictness flag.

@erictraut
Collaborator

Thanks for the example. It's clear to me now, and I agree there's utility in adding an explicit fallback — at least in some cases. These are cases where the stub can provide a fallback that's more precise than Any but still not a full union of all possible return types.

@antonagestam

Noting that a flag would also provide a way forward for stricter typing of the json loads and dumps functions. I don't see how the proposed unsafe union types could help in that case. For future-proofing, assuming that there are many places left where we'll want to tighten typing, perhaps it should be a function rather than a constant?

if typing.feature("strict-json"): ...

Type checkers would implement configuration parameters to enable or disable features.

In a large code-base with lots of legacy Python, the fact that typeshed uses Any in many places is giving quite a lot of grief. Automating the process of adding type ignores is easy for us, and this enables us to gradually improve explicitly disabled type checking, rather than constantly running into much less visible implicitly disabled type checking, caused by Any.
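Until such a flag exists, one way to approximate stricter JSON typing in user code is to annotate the result of `json.loads` as `object` instead of accepting the `Any` from the stubs, which forces explicit narrowing. This is only a sketch of that workaround, not part of the proposal; the helper name `load_name` and the `"name"` key are assumptions:

```python
import json


def load_name(raw: str) -> str:
    # Annotating as object discards the Any, so every use needs narrowing.
    data: object = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name = data.get("name")
    if not isinstance(name, str):
        raise ValueError("expected a string under 'name'")
    return name
```

A `strict-json` feature as suggested above would make this the default behavior rather than something each caller has to remember.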

@antonagestam

The introduction of a function would also allow third party packages to provide such conditional strictness levels.
