for-loops that run at least once and "Possibly unbound" #2033

sorcio · 2021-06-25T16:35:48Z

This has been discussed before as an issue, but I want to propose a more general discussion about it.

def foo():
    for _ in range(10):
        x = 20
    print(x)  # "x" is possibly unbound

As a programmer, I can easily see that the Pyright error about x being unbound is not useful, because reading this code I know that the for-loop body will be executed at least once.

Given the nature of Python, though, it's impossible in general to determine if an iterator will yield at least one value, or stop before the first iteration. A similar form of this has been discussed in #844 and marked as "as designed" with the following motivation:

Unfortunately, there's no way for a type checker to know whether a loop will be executed one or more times. It's completely dependent on the internal logic of the class that implements the iterator.

One could argue that a type checker should have internal knowledge about internal implementations of certain built-in classes like list, but that's a very slippery slope, and I've avoided it so far. It's one of the core principles that I've stuck to throughout the development of Pyright.
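To make the quoted point concrete, here is a minimal sketch. The names `MaybeEmpty`, `last_item`, and `demo` are hypothetical and only serve the illustration. Whether the loop body runs depends on a value that exists only at runtime, which is why a checker cannot rule out the unbound case in general:

```python
from typing import Iterator


class MaybeEmpty:
    """Hypothetical iterable whose length depends on runtime state."""

    def __init__(self, count: int) -> None:
        self.count = count

    def __iter__(self) -> Iterator[int]:
        # Yields `count` items; nothing at all when count <= 0.
        return iter(range(self.count))


def last_item(source: MaybeEmpty) -> int:
    for item in source:
        x = item
    return x  # a checker flags "x" as possibly unbound here


def demo() -> str:
    # With an empty iterable the loop body never runs,
    # so reading x raises UnboundLocalError.
    try:
        last_item(MaybeEmpty(0))
    except UnboundLocalError:
        return "unbound"
    return "bound"
```

Here `last_item(MaybeEmpty(3))` works, while `last_item(MaybeEmpty(0))` fails at runtime, and both calls look identical to a type checker.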

As a workaround, I can convince Pyright that a binding of x exists in all cases, e.g. by setting it before the loop. Unfortunately this might have other consequences:

def bar():
    x = None
    for _ in range(10):
        x = 20
    print(x + 2)  # Operator "+" not supported for "None"

In this example I could have initialized the variable with an int; but some types have non-trivial constructors. Using None allows me to save the day with an assertion:

def baz():
    x = None
    for _ in range(10):
        x = 20
    assert isinstance(x, int)
    print(x + 2)

The workaround is good enough, and possibly preferable to a # type: ignore annotation, which would silence all checks and would need to be repeated for every usage of x. I still feel it's awkward, because it adds noise (for a reader, the assertion is not adding information, and can be confusing), forces code to adapt to the logic of the type checker, and introduces runtime behavior.

I would like to discuss this more in general. Is there a way to deal with this, or should this be considered a necessary shortcoming of type checkers?

For comparison: Mypy doesn't have an equivalent check. Pytype doesn't seem to complain either. Pyright is the strictest about this, which I like in general. But it also means that only Pyright has an incentive to care about this.

Brainstorming some possible approaches:

1. Include information in the type system about iterators that yield at least once. This sounds hard, and would not cover all cases. I see it working with some generator functions and not others. In the range() case the behavior depends on the constructor. For collections, it depends on content (tracking an empty collection is also possible in theory but doesn't cover all cases and is hard).
2. Annotate loops whose body is certain to be run at least once. After all, maybe this is more about control flow than it is about iterators. If the annotation is a comment, it doesn't have runtime cost. I don't know precedents for this and it feels a bit inelegant. But also potentially more informative than the assertion in the above workaround.

def foo():
    for _ in range(10):
        # type: block-always-executed
        x = 20
    print(x)

3. Special-casing. Collection literals, range objects, count(), repeat(), and possibly a couple more types are common enough to cover many use cases. I see the value, but also the slippery slope argument.
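As a rough illustration of what special-casing could look like, here is a minimal sketch built on the standard `ast` module. The function `provably_nonempty` is a hypothetical helper, not something Pyright or any other checker implements. It assumes `range` refers to the builtin, which a real checker would have to verify, and it deliberately answers "no" whenever it is unsure:

```python
import ast


def provably_nonempty(expr_source: str) -> bool:
    """Return True only when the iterable expression surely yields an item."""
    node = ast.parse(expr_source, mode="eval").body

    # List/tuple/set literals with at least one plain (non-starred) element.
    if isinstance(node, (ast.List, ast.Tuple, ast.Set)):
        return any(not isinstance(elt, ast.Starred) for elt in node.elts)

    # range(n) with a single positive integer literal, e.g. range(10).
    # Assumption: the name `range` has not been rebound.
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "range"
        and len(node.args) == 1
        and not node.keywords
        and isinstance(node.args[0], ast.Constant)
        and type(node.args[0].value) is int
    ):
        return node.args[0].value > 0

    # Everything else is treated as possibly empty.
    return False
```

Even this small version shows the slippery slope: every additional case (`range` with start and stop, `itertools.count()`, non-empty string literals) needs its own rule.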

I don't care particularly for any of the above solutions. Pragmatic arguments could be made for most points (excluding 1 maybe?) if many users need this, but I'm not strongly convinced any is significantly better than the status quo.

Anything else?

Extra question for fun: can Python change in a way to make this easier? What language change would enable this? It sounds remote right now, but there is a lot of ongoing work on static analysis and compilation. Is there a change that could make these efforts more valuable? This is currently not within Mypy's scope, but it could be if the language (or the optional typing system) made this possible.

erictraut · 2021-06-25T17:16:34Z

Maintainer

As you mentioned, mypy doesn't track unbound variables or implement any checks in this area. If you feel that this diagnostic check isn't providing enough value, you can disable it in pyright (reportUnboundVariable = none), and it will act more like mypy in this respect.

If you do find value in unbound variable checking, then my recommendation is to write your code in a way where variables are provably bound in all code paths.

Python is unlike most programming languages in that it allows local variables to appear within a scope but have no value bound to them. In all other languages I can think of (C, C++, JavaScript, TypeScript, Java, C#, Basic, Julia, etc.), you would be forced to assign a value to a variable on all code paths, and you would receive a compiler/interpreter error if you failed to do so. So if you adapt your usage of Python to be like all other programming languages in this respect, you won't run into any problems with unbound variable errors.

So I guess I look at this not as a "workaround" but as the preferred way to write robust, bug-free code — something that other programming languages would enforce but Python does not.

1 reply

sorcio Jun 25, 2021
Author

Hello Eric,

Thanks for your insightful response! I appreciate the value of unbound variable checks. It's one of the points that convinced me to adopt Pyright.

You are entirely right that unbound local variables are an uncommon design choice among programming languages. The closest that comes to mind is initialization in C/C++. While it's not the same as binding, from the point of view of static analysis it presents a similar challenge. Modern compilers can raise a warning when they encounter a code path that uses a variable before initialization. At a cursory glance, I think neither gcc nor clang is currently able to prove that a loop body will or will not be executed. It looks like they always assume it will be executed.

Rust is my benchmark for what can be pragmatically done with a static checker, and indeed it behaves similarly to Pyright (example in rust-playground). This is a hard compile error, because Rust needs to prove certain properties.

By contrast, I think Python developed some specific idioms that leverage its characteristic of allowing unbound variables, and I suspect code like the one in my post would be seen as idiomatic. I think this bears some weight.

write your code in a way where variables are provably bound in all code paths

I'm conflicted about this. Your point is excellent. But it could be argued that iterating over a literal non-empty collection, or a range with a statically known size, can be proven to yield at least one value. This is part of the language definition and it can be determined statically, at least in theory. Writing code in a different way to satisfy the type checker is a limitation, and not necessarily one that adds to robustness and quality. Again, pragmatically this is probably more of a small nuisance.

To reiterate, I don't want to push for any specific solution. I wanted to elicit some discussion and I'm very grateful for your response. My feeling is there is more ground to cover, but I'm happy with the current state and whatever decision you have taken.

JelleZijlstra · 2021-06-25T17:54:14Z

For what it's worth, I did implement this sort of logic in my company's type checker, pyanalyze. There is code in https://github.com/quora/pyanalyze/blob/ae4292f9b3c02616912712551d08e8bff3eb69aa/pyanalyze/name_check_visitor.py#L2910 for figuring out how many elements are in an iterator. It was enough to make our possibly unbound variable check work without too many false positives, but the code is definitely not completely sound.

1 reply

sorcio Jun 25, 2021
Author

This is very interesting, thank you! If it can reduce false positives without introducing false negatives, it's definitely useful.

ronf · 2021-10-10T17:07:16Z

I'm running into a different version of this, where I have multiple related variables which get bound together but which all may be unbound if an exception is raised. The code pattern looks something like:

    try:
        x, y, z = somefunc()
    except ValueError:
        if should_ignore:
            x = None
        else:
            raise

    if x is not None:
        someotherfunc(x, y, z)

Pyright complains (rightfully) that y and z are potentially unbound, but in practice their use is guarded by the check on x: whenever the exception is ignored, the code skips the call to someotherfunc() that uses x, y, and z. I don't expect a static analyzer to figure this out, but I'm curious about people's thoughts on other ways to structure this code, as I'd rather not assign dummy values to all of x, y, and z here. As pointed out above, that could have ripple effects: some of the types have non-trivial constructors and no obvious dummy value, and making those types Optional so that None can be used could introduce other static type checking failures wherever their members are accessed without checking for None first.

Any thoughts?

3 replies

hmc-cs-mdrissi Oct 10, 2021

In situations like this I tend to either use a namedtuple or add more asserts. I have some namedtuples that are only really used once or twice and mainly serve to better describe the return value of functions that return a tuple of values of different types. For your code it'd end up as

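from typing import NamedTuple

class Foo(NamedTuple):
    x: t1
    y: t2
    z: t3

try:
    foo = somefunc()  # assumed to return a Foo
except ValueError:
    if should_ignore:
        foo = None
    else:
        raise

if foo is not None:
    someotherfunc(foo)  # or someotherfunc(*foo) to keep the three-argument signature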

erictraut Oct 10, 2021
Maintainer

Yeah, I agree with @hmc-cs-mdrissi. I'd either use a tuple or named tuple rather than individual values or use extra asserts to document my assumption that y and z are guarded by x.

def somefunc() -> tuple[int, int, int]:
    ...


def someotherfunc(x: int, y: int, z: int):
    ...


def func1(should_ignore: bool):
    try:
        x = somefunc()
    except ValueError:
        if should_ignore:
            x = None
        else:
            raise

    if x is not None:
        someotherfunc(*x)

or

def func1(should_ignore: bool):
    (x, y, z) = (None,) * 3
    try:
        x, y, z = somefunc()
    except ValueError:
        if should_ignore:
            x = None
        else:
            raise

    if x is not None:
        assert y is not None and z is not None
        someotherfunc(x, y, z)

ronf Oct 12, 2021

In my existing code, somefunc() is already returning a tuple, so that wouldn't be changing. Leaving it as a tuple until it is used later is an interesting idea, though. That gives me a way to make the tuple Optional without needing to modify the types of the x, y, and z arguments expected by someotherfunc(), or wherever else the tuple members might appear.

It turns out that you don't even need the assertions in this case, and you don't need to assign any initial values to the tuple members (avoiding the need for dummy values). What I ended up with looked something like:

    try:
        result: Optional[Tuple[T1, T2, T3]] = somefunc()
    except ValueError:
        if should_ignore:
            result = None
        else:
            raise

    if result:
        x, y, z = result
        someotherfunc(x, y, z)

Here's a more detailed real-world example:

    try:
        attach_result: Optional[Tuple[bytes, bytes, int]] = \
            await self._conn.attach_x11_listener(
                self, x11_display, x11_auth_path, x11_single_connection)
    except ValueError as exc:
        if x11_forwarding != 'ignore_failure':
            raise ChannelOpenError(OPEN_REQUEST_X11_FORWARDING_FAILED,
                                   str(exc)) from None
        else:
            attach_result = None
            self.logger.info('  X11 forwarding attach failure ignored')

    if attach_result:
        auth_proto, remote_auth, screen = attach_result

        result = await self._make_request(
            b'x11-req', Boolean(x11_single_connection),
            String(auth_proto), String(binascii.b2a_hex(remote_auth)),
            UInt32(screen))

Thanks very much for the suggestions!

nilskattenbeck-bosch · 2023-08-18T07:49:00Z

My workaround is to use the else block, though this might not be possible in all scenarios:

async for retry in AsyncRetrying(stop=stop_after_attempt(3)):
    with retry:
        response = await api_client.foobar()
    break
else:
    assert False
# response is no longer `Unbound | ...`

Regardless, I am a bit surprised to find out that assert "response" in locals() does not work...
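The same for/else idea can be sketched without any retry library. The function `first_even` below is a hypothetical example, not taken from the code above. The `else` branch runs whenever the loop ends without `break`, including when it never iterates, so raising there leaves only the path through the assignment:

```python
def first_even(values: list[int]) -> int:
    for value in values:
        if value % 2 == 0:
            found = value
            break
    else:
        # Reached only when the loop finished without `break`,
        # which includes the case of an empty `values`.
        raise ValueError("no even value")
    # Every path that reaches this line went through `found = value`.
    return found
```

Using `raise` rather than `assert False` keeps the guard in place when Python runs with `-O`, which strips assert statements.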

0 replies

patrick-kidger · 2023-08-20T12:03:57Z

For those coming across this thread, this is the most elegant solution I've found:

x: int = None  # pyright: ignore
y: str = None  # pyright: ignore
done = False
while not done:
    x, y, done = foo(...)
bar(x, y)

1 reply

mtomassoli Mar 1, 2024

Here's a solution without ignore:

from typing import cast

x = cast(int, None)
y = cast(str, None)
done = False
while not done:
    x, y, done = foo(...)
bar(x, y)
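Another option that avoids both the ignore comments and the casts is to restructure the loop so that its only exit comes after the assignment. This is a minimal sketch under assumptions: `run` and its `steps` iterator are hypothetical stand-ins for the repeated `foo(...)` calls, and it relies on the checker treating `while True` as a loop that is always entered.

```python
from typing import Iterator, Tuple


def run(steps: Iterator[Tuple[int, str, bool]]) -> Tuple[int, str]:
    while True:
        # The only way out of the loop is the `break` below,
        # which always comes after this assignment.
        x, y, done = next(steps)
        if done:
            break
    return x, y
```

Unlike the `cast` version, `x` and `y` never hold `None` at runtime under an `int` or `str` annotation, since `typing.cast` returns its argument unchanged.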
