Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mypy does not recognize conversion between fixed length list and tuple #7509

Open
boompig opened this issue Sep 13, 2019 · 11 comments
Open

mypy does not recognize conversion between fixed length list and tuple #7509

boompig opened this issue Sep 13, 2019 · 11 comments

Comments

@boompig
Copy link

boompig commented Sep 13, 2019

  • Are you reporting a bug, or opening a feature request?

I believe this is a bug

  • Please insert below the code you are checking with mypy,
    or a mock-up repro if the source is private. We would appreciate
    if you try to simplify your case to a minimal repro.

Minimal example to reproduce:

from typing import List, Tuple


def get_point_as_list() -> List[int]:
    return [1, 2]


def point_to_tuple(pt: List[int]) -> Tuple[int, int]:
    assert len(pt) == 2
    return tuple(pt)


if __name__ == "__main__":
    l = get_point_as_list()
    p = point_to_tuple(l)
    print(repr(p))
  • What is the actual behavior/output?

mypy test.py

test.py:10: error: Incompatible return value type (got "Tuple[int, ...]", expected "Tuple[int, int]")
  • What is the behavior/output you expect?

mypy should evaluate this as correct code

  • What are the versions of mypy and Python you are using?
    Do you see the same issue after installing mypy from Git master?

$ mypy --version
mypy 0.720

$ python --version
Python 3.7.3

@JukkaL
Copy link
Collaborator

JukkaL commented Sep 16, 2019

This is a design limitation of mypy. Mypy currently doesn't even try to keep track of the lengths of list objects. However, issues similar to this come up every once in a while, and this is something mypy might support at some point in the future.

You can work around this limitation by using a cast or a # type: ignore comment.

@jamesohortle
Copy link

Is there an issue for Sequences of fixed length that I can watch for developments? I am also having the same issues and can see a few use cases (e.g., points in Euclidean space, etc). I mostly just use Sequence[<type>, ...] for my own code.

@JukkaL
Copy link
Collaborator

JukkaL commented Nov 18, 2019

We can use this issue to track both fixed-length lists and sequences for now.

@simsa-st
Copy link

simsa-st commented Jun 5, 2020

I would also appreciate this feature. My use-case is even simpler, converting fixed-length tuples to fixed-length tuples. Specifically I would like this to work:

from typing import Tuple

t: Tuple[int, int] = (0, 0)
t = tuple(x for x in t)

@NeilGirdhar
Copy link
Contributor

This comes up a lot when you're splitting strings:

    return dict(kv.split('=') for kv in ['a=b', 'cdf=ghi'])

It might be nice to have a special split method on string like split_exact(sep: str, n: int) -> SizedTuple[n, str] that somehow binds the n from the call to the SizedTuple. My proposal is impossible unless the issue of fixed length lists or tuples is solved.

@thomasgilgenast
Copy link

I have run into this as well and wanted to summarize my understanding of the three possible workarounds based on the comments from @JukkaL, @simsa-st, and @NeilGirdhar above.

Rewrite code to avoid lists and generators

A workaround that can be used even in the absence of SizedTuple is to manually write a fixed-size version of the size-naive function you were previously calling. For example, to handle @NeilGirdhar's string splitting use case:

def split_2(s: str, on: str) -> Tuple[str, str]:
    pieces = s.split(on)
    return (pieces[0], pieces[1])

def h() -> Dict[str, str]:
    return dict(split_2(kv, '=') for kv in ['a=b', 'cdf=ghi'])

One caveat here is that the return value of str.split() is not sized, so this will not catch the IndexError thrown when the string you are splitting does not have any = in it. I don't think this kind of IndexError is possible to catch before runtime, though.

Similarly when sorting or doubling every element in a tuple, you can do:

T = TypeVar('T', str, int, float) 

def sort_2(x: Tuple[T, T]) -> Tuple[T, T]:
    if x[0] <= x[1]:
        return x
    return (x[1], x[0])

def double_2(x: Tuple[T, T]) -> Tuple[T, T]:
    #return tuple(y * 2 for y in x)  # can't use generator in mypy
    return (x[0] * 2, x[1] * 2)

def g(x: Tuple[str, str]) -> Tuple[str, str]:
    return sort_2(double_2(x))

In this example sort_2() avoids creating a list unlike the builtin sorted(), and double_2() avoids using a generator (which is I think where @simsa-st's example loses the size information).

In summary, neither lists nor generators have lengths in mypy, but it is often possible to rewrite code so that it avoids using lists or generators and uses only tuples (which can have lengths in mypy).

Ignore the mypy error via comment or cast

An alternative perspective is that sizing is "too much work" or "too unpythonic" to implement this way (we do love generator expressions after all), and so instead of the previous example, we can write (based on @JukkaL's suggestion):

def g_ignore(x: Tuple[str, str]) -> Tuple[str, str]:
    return tuple(sorted(y * 2 for y in x))  # type: ignore

or

def g_ignore(x: Tuple[str, str]) -> Tuple[str, str]:
    return cast(Tuple[str, str], tuple(sorted(y * 2 for y in x)))

These still gain the "benefits" of a sized return type, i. e.

def f() -> str:
    return g_ignore(('t', 's'))[2]

still causes mypy to throw error: Tuple index out of range as expected.

My preliminary understanding is that the main drawback of ignore/cast is that each one you write creates the possibility of false negatives occuring in the future whenever the surrounding code is modified.

Accept the unsized tuple type

If you don't care about catching the index out of range error, you can write:

def g_nosize(x: Tuple[str, ...]) -> Tuple[str, ...]:
    return tuple(sorted(y * 2 for y in x))

which passes mypy but fails to catch the index out of range error in f().

This workaround does not apply to @NeilGirdhar's use case, because dict() does not accept an unsized input.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Jul 5, 2020

@thomasgilgenast That's a good summary of what we can do right now, but we don't want to work around this problem. We want mypy to keep track of sequence lengths. We want:

x: Sequence[int]
if len(x) == 2:
    y: Tuple[int, int] = tuple(x)

We want this to happen without any casting or ignoring or auxilliary functions.

@dandersson
Copy link

This comes up a lot when you're splitting strings:

    return dict(kv.split('=') for kv in ['a=b', 'cdf=ghi'])

Yes, I found this thread after having attempted to simplify

    return {k: v for k, v in (var.split('=', maxsplit=1) for var in env)}

into

    return dict(var.split('=', maxsplit=1) for var in env)

and then saw the Mypy-complaints.

So yet another workaround that passes at least Mypy 0.720 checks is to go via a dictionary comprehension with tuple unpacking like the original form above. That will instead trigger a Pylint warning:

R1721: Unnecessary use of a comprehension (unnecessary-comprehension)

so for Pylint users not much will be won in any case.

stevearc added a commit to stevearc/pynvim that referenced this issue Jul 26, 2021
This adds coverage for most of the public API, but there are still some
areas that aren't covered yet.
* script_host.py is ignored entirely
* host.py is ignored entirely
* the msgpack_rpc submodule has partial coverage
* there are some Any annotations sprinkled around on some of the more
  complex functions

Funtionality should remain largely unchanged. Notable exceptions are:
* Buffer.mark() and Window.cursor now return a tuple instead of a list.
  This is because there is currently no type for a fixed-size list
  (python/mypy#7509)
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this issue Apr 24, 2022
### **SUMMARY:**
It is unnecessary to perform `n + 1` calls to `cast` (one cast for each parameter name-value pair and one cast for the filter generator itself) in a dictionary comprehension in an effort to avoid mypy errors.

Previously, the `cast` to `Tuple[str, str]` was necessary to prevent mypy from complaining that we are trying to create a dictionary out of lists as opposed to tuples (see the mypy issue [here](python/mypy#7509)). However, a `cast` is both adding unnecessary overhead due to the function call and should generally only be used when a linter is unable to properly infer the type of a variable, not to "lie" to it about the type. We can avoid this by instead using a generator within the dictionary comprehension and then indexing into it twice to produce tuples of size 2; mypy recognizes this as a valid dictionary initialization.

The above change is much more performant than the previous version of the code. Timing the two versions of the dictionary construction yielded the following results:
```
>python -m timeit -s "from typing import cast, Dict, Tuple, Iterable" -n 100000 -p "dict(cast(Tuple[str, str], pair.split('=')) for pair in cast(Iterable[str], filter(None, 'rank=3&world_size=5'.split('&'))))"
100000 loops, best of 5: 2.66 usec per loop

>python -m timeit -n 100000 -p "dict((pair[0], pair[1]) for pair in (pair.split('=') for pair in filter(None, 'rank=3&world_size=5'.split('&'))))"
100000 loops, best of 5: 1.09 usec per loop
```

The `cast` to `Iterable[str]` is similarly a "lie" that is not necessary. It is best to be as transparent as possible to the linter rather than using `cast` to eliminate errors. This actually does not even produce any mypy errors when removed in isolation from the other changes.

Further, it is good practice to type hint the return value of a function rather than specifying the type of the return value inside the function. Thus, the unnecessary intermediate variable `query_dict` inside `_query_to_dict` was removed, and the type hint of the return value was moved to the function declaration.

The type of the argument `query` is specified as `str`.

### **EDITS (additional commits):**
[The sole type hint for `query_dict` (in `_env_rendezvous_handler`) was removed to match all other functions in the file.](76d78bf)

[Incorrect typing is fixed for _env_rendezvous_handler typing so that `rank`, `world_size`, `master_port`, and `master_addr` are specified to be `int`, `int`, `int`, and `str`, respectively.](3cc5844)
Pull Request resolved: #75959
Approved by: https://github.com/kumpera, https://github.com/mrshenli
facebook-github-bot pushed a commit to pytorch/pytorch that referenced this issue Apr 26, 2022
)

Summary:
### **SUMMARY:**
It is unnecessary to perform `n + 1` calls to `cast` (one cast for each parameter name-value pair and one cast for the filter generator itself) in a dictionary comprehension in an effort to avoid mypy errors.

Previously, the `cast` to `Tuple[str, str]` was necessary to prevent mypy from complaining that we are trying to create a dictionary out of lists as opposed to tuples (see the mypy issue [here](python/mypy#7509)). However, a `cast` is both adding unnecessary overhead due to the function call and should generally only be used when a linter is unable to properly infer the type of a variable, not to "lie" to it about the type. We can avoid this by instead using a generator within the dictionary comprehension and then indexing into it twice to produce tuples of size 2; mypy recognizes this as a valid dictionary initialization.

The above change is much more performant than the previous version of the code. Timing the two versions of the dictionary construction yielded the following results:
```
>python -m timeit -s "from typing import cast, Dict, Tuple, Iterable" -n 100000 -p "dict(cast(Tuple[str, str], pair.split('=')) for pair in cast(Iterable[str], filter(None, 'rank=3&world_size=5'.split('&'))))"
100000 loops, best of 5: 2.66 usec per loop

>python -m timeit -n 100000 -p "dict((pair[0], pair[1]) for pair in (pair.split('=') for pair in filter(None, 'rank=3&world_size=5'.split('&'))))"
100000 loops, best of 5: 1.09 usec per loop
```

The `cast` to `Iterable[str]` is similarly a "lie" that is not necessary. It is best to be as transparent as possible to the linter rather than using `cast` to eliminate errors. This actually does not even produce any mypy errors when removed in isolation from the other changes.

Further, it is good practice to type hint the return value of a function rather than specifying the type of the return value inside the function. Thus, the unnecessary intermediate variable `query_dict` inside `_query_to_dict` was removed, and the type hint of the return value was moved to the function declaration.

The type of the argument `query` is specified as `str`.

### **EDITS (additional commits):**
[The sole type hint for `query_dict` (in `_env_rendezvous_handler`) was removed to match all other functions in the file.](76d78bf)

[Incorrect typing is fixed for _env_rendezvous_handler typing so that `rank`, `world_size`, `master_port`, and `master_addr` are specified to be `int`, `int`, `int`, and `str`, respectively.](3cc5844)

Pull Request resolved: #75959
Approved by: https://github.com/kumpera, https://github.com/mrshenli

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/d65ab9a689bc86fbdf589546dc47c69bee3bb971

Reviewed By: seemethere

Differential Revision: D35902000

Pulled By: seemethere

fbshipit-source-id: 0f7b357e4aca35945eef576ee13e047982de1e9e
@TomFryers
Copy link

Another similar nice thing to catch would be

def f(c: tuple[list[int], list[int], list[int]]):
    ...

f(tuple([1, 2] for _ in range(3)))

recognising that comprehensions into tuple over range(n) have length n.

@Phil-Barber
Copy link

Similarly:

StrictLenIntTuple = tuple[int, int, int, int]

def func(some_tuple: StrictLenIntTuple) -> None:
    return None

str_tuple = ('a', 'bc', 'def', 'ghij')
int_tuple = tuple(map(len, str_tuple))

func(int_tuple)  # Argument 1 to "func" has incompatible type "Tuple[int, ...]"; expected "Tuple[int, int, int, int]"

@NeilGirdhar
Copy link
Contributor

@Phil-Barber Great example, but maybe that should be a separate issue? Since something like this would ideally work:

from typing import TypeVar

T = TypeVar('T')
def f(x: T) -> T: ...

class X: pass
class Y: pass
class Z: pass

x = (X(), Y(), Z())
y = tuple(map(f, x))
reveal_type(y)  # Ideally: tuple[X, Y, Z]

In other words, it doesn't just need to keep track of a count here.

justinmk pushed a commit to stevearc/pynvim that referenced this issue Jul 14, 2023
This adds coverage for most of the public API, but there are still some
areas that aren't covered yet.
* script_host.py is ignored entirely
* host.py is ignored entirely
* the msgpack_rpc submodule has partial coverage
* there are some Any annotations sprinkled around on some of the more
  complex functions

Funtionality should remain largely unchanged. Notable exceptions are:
* Buffer.mark() and Window.cursor now return a tuple instead of a list.
  This is because there is currently no type for a fixed-size list
  (python/mypy#7509)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants