Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of finite iterables #24757

Open
embray opened this issue Feb 16, 2018 · 17 comments
Open

Improve handling of finite iterables #24757

embray opened this issue Feb 16, 2018 · 17 comments

Comments

@embray
Copy link
Contributor

embray commented Feb 16, 2018

There is a general problem throughout much of Sage where code checks if some argument is a list or tuple (e.g. isinstance(x, (list, tuple))) where really it should be able to accept any argument which, depending on the context, implements either the Sequence protocol, or even just the Iterable protocol.

This especially becomes a problem on Python 3 where common built-ins like map() and filter() now return lazily-evaluated iterators instead of lists.

In many cases we've addressed this by replacing map/filter calls with list comprehensions and that works fine. But really also ought to make many of these interfaces more flexible in what types they accept.

For some cases simply replacing (list, tuple) with Sequence is sufficient (basically, this is any object that implements __iter__ as well as __len__ and __getitem__). Unfortunately, there are other cases where we don't need the full Sequence protocol, and in fact would like to take any iterable (like map).

The problem with the latter case is that in most cases what we really want is finite iterables. We don't want to write code that can be passed an infinite generate and hang. To some extent this is the user's responsibility but we should try as much as possible to prevent it. If an iterable implements __len__ then no problem, but map and filter do not (since they can wrap any iterable(s), finite or not).

An even bigger problem is that Python's interface does not allow us to easily introspect map or filter to see if the iterable's they're wrapping are finite. In some sense there's no general way to do this with certainty, if you have arbitrarily nested iterables (though I feel that that's a shortcoming in Python). But for most cases we should be able to check if a map or filter is finite by looking directly at the iterable(s) they were passed as arguments, so it might be nice to have a C-level helper for that.

Component: python3

Issue created by migration from https://trac.sagemath.org/ticket/24757

@embray embray added this to the sage-8.2 milestone Feb 16, 2018
@jdemeyer
Copy link

comment:1

Why is the finiteness so important? For me, an iterable is just an iterable. Trying to figure out the length of arbitrary iterables is not possible, so why bother?

@embray
Copy link
Contributor Author

embray commented Feb 22, 2018

comment:2

There's been lots of debate about this in the Python community in the past.

Of course finiteness is important. If you do something like list(infinite_generator) your code will hang forever, or until you blow up with a MemoryError. As I wrote, one can only do so much to protect users from that--it's not 100% possible in all cases--but I certainly prefer to as much as possible. I've definitely seen cases where just blindly assuming all iterables will be finite can blow up in one's face.

@jdemeyer
Copy link

comment:3

Replying to @embray:

Of course finiteness is important. If you do something like list(infinite_generator) your code will hang forever, or until you blow up with a MemoryError.

What's your point? If the user writes bad code, then bad things can happen. That's a basic axiom of programming. We don't "protect" the user from running

L = []
while True:
    L.append(0)

Of course, it can often be nice to check the input given by the user. That's what exceptions are for. But here we have the problem that we want to accept certain input where we cannot a priori determine whether it's good (finite) or bad (infinite). So there are two "solutions":

  1. Your solution: only accept input which can be proven good. This means breaking certain meaningful use cases.

  2. My solution: accept all input and assume that it's good. This puts the burden on the user not to do anything stupid.

@embray
Copy link
Contributor Author

embray commented Feb 22, 2018

comment:4

The problem with iterators, and especially generators, is that their behavior can be extremely obscured. If you have an explicit while True: it's obvious to see, and no one would write it. But it's quite easy (and this comes up a lot especially in coroutine-based programming) where you (the user, but which I also mean a developer) can wind up with some infinite generator buried deep in other generators.

This discussion led in part to the creation of __length_hint__ though that's more intended just as an optimization for the interpreter to use; it's not, nor is it intended to be, a fix to this problem. It was just a small concession that's come out of those discussions.

Of course, given some arbitrary code it's impossible to know if it will always terminate, but there are definitely common cases where we can do better than nothing.

This means breaking certain meaningful use cases.

Like what?

Also, I'll note, there are lots of places in my Python 3 branch where for now I just accept arbitrary iterables where it's possible to. But this is a concession to the fact that there's often no better way to check finiteness. The point of this ticket was to talk about adding some simple checks for common cases. E.g. if it were possible with the map object to dig into it and introspect what iterables were passed to it there's a lot one might be able to say, and it's unfortunate that the API doesn't allow that. I believe it should.

@jdemeyer
Copy link

comment:5

Replying to @embray:

This means breaking certain meaningful use cases.

Like what?

Generators.

@embray
Copy link
Contributor Author

embray commented Feb 27, 2018

comment:6

I'm not sure what you mean. I'm not proposing anything that would break using arbitrary generators. Though it's a shame Python doesn't provide a way to attach a length hint to generators--or at least something along the lines of "this is expected to terminate".

@jdemeyer
Copy link

comment:7

Replying to @embray:

I'm not proposing anything that would break using arbitrary generators.

OK, then what are you proposing? I don't understand anymore...

Maybe you should give a concrete example of some bad iterable that you want to catch.

@embray
Copy link
Contributor Author

embray commented Feb 27, 2018

comment:8

I'm not proposing anything specific here. This ticket is intended mostly for brainstorming.

I think it does matter whether or not an iterable is finite, but the problem is much of Python doesn't seem to care and I'm wondering out loud if there's anything we can do to improve that situation for places where it matters. I see it as a quality issue...

@jdemeyer
Copy link

comment:9

Thanks for the clarification. Still, I don't really what could be done. Surely, the problem cannot be solved for all iterables because of the Halting Problem. And there are almost no cases where an iterable is known to be infinite. So either you know that it's finite or you don't know anything. I think that we should always assume the best and treat the unknown cases as finite. So you end up with zero information.

@jdemeyer
Copy link

jdemeyer commented Mar 1, 2018

comment:10

Related: #13556

@embray
Copy link
Contributor Author

embray commented Mar 5, 2018

comment:11

Obviously we can't always say with absolute certainty that some calculation will terminate, but when implementing iterators, even generators, we can say "this should terminate".

@embray
Copy link
Contributor Author

embray commented Mar 5, 2018

comment:12

Just to give an example, say iterables had some flag on them like __finite__. I'm not sure how you'd set it on generator expressions--perhaps a built-in you can wrap it in like finite(<generator expression>) which sets said flag (or use it as a decorator on generator functions).

Then, for example, a range object would certainly have __finite__ == True. Then, say, take map. By default map.__finite__ == False (this does not mean it is definitely infinite, it just means we don't know). If do something like m = map(func, range(10)) then m.__finite__ == True. Likewise for m = map(func, range(10), range(10)) and so on.

I get your point that if we said "okay, this function will reject any inputs that don't have __finite__ == True" then we would be excluding otherwise valid inputs that simply don't have that flag set. But if such a feature did work I think you'd still wind up accepting the vast majority of well-formed inputs. I'm especially motivated here by map and filter now that they return iterators instead of lists, as well as some other built-ins.

Anyways, this isn't necessarily a problem that Sage alone can solve.

@jdemeyer
Copy link

jdemeyer commented Mar 5, 2018

comment:13

Replying to @embray:

Just to give an example, say iterables had some flag on them like __finite__.

OK, now I understand what you mean. This one sentence explains everything. Still, that sounds like something for a PEP, not something that Sage can fix.

@dimpase
Copy link
Member

dimpase commented Mar 16, 2018

comment:14

It seems to me that a wholesale replacement of these new (py3-only) iterators (e.g. filter()) by iterables (e.g. lists) is not a good idea, as it prevents potential performance optimisations.

Why not just wrapping them with list() or tuple() instead? It is much less work, and it does not remove the opportunity to optimise later on, if desired.

@embray embray modified the milestones: sage-8.2, sage-8.3 Apr 26, 2018
@jdemeyer
Copy link

comment:16

See also https://bugs.python.org/issue33939

@embray
Copy link
Contributor Author

embray commented Jun 22, 2018

comment:17

Interesting. I'm curious, how did you stumble on that?

@jdemeyer
Copy link

@embray embray removed this from the sage-8.3 milestone Jul 18, 2018
@jdemeyer jdemeyer changed the title py3: improve handling of finite iterables Improve handling of finite iterables Dec 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants