Different namespaces for iterators/procs are a design mistake #8901
Comments
I always assumed they shared a namespace, so I never even noticed that they live in different namespaces: I never tried to overload a proc with an iterator of the same name and signature :-) |
In that example, wouldn't that cause a lot of breakage? |
Yes, indeed.
There would be a switch like |
As a first step, we would do: `proc splitLines(...): seq[string] {.deprecated: "use sequtils.toSeq splitLines() instead".}` to prepare everybody for this change. |
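For illustration, the suggested spelling already works with today's stdlib, since `strutils` provides both overloads of `splitLines` and `toSeq` picks the iterator:

```nim
import std/[sequtils, strutils]

# Under the proposal the proc overload would go away and you would write:
let lines = toSeq("a\nb\nc".splitLines)
echo lines  # @["a", "b", "c"]
```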
What about `..`? Would we use newSlice instead? |
@hlaaftana I totally missed that overload. This needs some serious thinking. :-( |
Or maybe |
Let |
This is covered in the tutorials. And why would it iterate over something if |
That I didn't read :)
Because I expected it to be some kind of smart-ass range iterator, using |
Oh, I love the |
|
So the HSlice creation using |
|
Sorry, I'm wrong, all these can continue to work if |
In that case, |
No, why? There is no |
There is this: Lines 3510 to 3512 in e81fe6d
|
Ok, you know the stdlib much better than I do. :-) That's a bit of a problem indeed then. |
While I like the proposal, I have one performance concern. |
I've been meaning to write a proper post on that (later..), but briefly: I think we can make (inline) iterators be first-class entities, allowing things like:

```nim
assert type(iter) == iterator[int] # instead of int
```

(there's a way to make this work without breakage, possibly using a different name, eg `typeGeneralized`)

```nim
iterator iter2 = iter # can be passed around, while remaining inline
```

Furthermore, it'd allow extending (in user code) the iterator concept reusing existing Nim features, bridging some of the gap with D's ranges.
|
Wow. I'm really going against the consensus here, but I dislike this. Are you guys really okay with no longer being able to write |
IMO the current namespace rules cause a lot of confusion, with very little benefit. It's completely non-obvious that the following calls to `foo` resolve to different symbols:

```nim
let x = foo(y)

for z in foo(y):
  discard
```

I hope this RFC will be implemented. |
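A self-contained sketch of the ambiguity being described (names are illustrative): the same call expression resolves to the proc in a `let` binding and to the iterator in a `for` loop.

```nim
proc foo(y: int): string = "the proc"

iterator foo(y: int): string =
  yield "the iterator"

let x = foo(3)    # resolves to the proc
echo x            # -> the proc

for z in foo(3):  # resolves to the iterator
  echo z          # -> the iterator
```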
Let's turn this into a complete RFC. I suggest the following changes in the type system surrounding inline iterators and the semantics of closure iterators:

```nim
proc countup(a, b: int): iterator (): int =
  result = iterator (): int {.closure.} =
    var res = a
    while res <= b:
      yield res
      inc res

var myIter = countup(1, 2)
echo myIter() # some(1)
echo myIter() # some(2)
echo myIter() # none()
```

The API above uses option types and perhaps that's controversial (for Araq). Alternatively, step 3 can return a default value and raise a
```nim
iterator map[A, B](iter: iterator[A], f: proc(a: A): B): B =
  for elem in iter: yield f(elem)
```

When instantiated with inline iterators, these definitions are expanded to efficient inline code.
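For contrast, here is a sketch of what is already expressible in today's Nim: a `map` over first-class closure iterators, which heap-allocates, unlike the proposed inline expansion (all names are illustrative):

```nim
proc map[A, B](iter: iterator (): A, f: proc (a: A): B): iterator (): B =
  result = iterator (): B =
    for elem in iter():
      yield f(elem)

var nums = iterator (): int =
  yield 1
  yield 2

var doubled = map(nums, proc (a: int): int = a * 2)
for x in doubled():
  echo x  # 2, then 4
```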
The current semantics of closure iterators are not well understood by many and are potentially quite confusing. Consider this snippet and take some time to understand it:

```nim
proc foo(start: int) =
  iterator bar: int {.closure.} =
    var i = start
    yield i
    inc i
    yield i

  var x = bar # this creates one instance of the iterator
  echo x()
  var y = bar # this creates another fresh instance
  echo y()
```

The output is the starting value printed twice.
The explanation is that every time you reference the name of a closure iterator, you create a fresh new instance of it. This is in contrast to regular closures where you can reference the name of an inner proc many times and you'll always get the same closure object. If closure iterators were behaving exactly as regular closures in this regard, I think the language would be more consistent and intuitive for newcomers.
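For comparison, a sketch of the regular-closure behavior referred to above, where repeated references to the inner proc share the same captured state:

```nim
proc baz(start: int) =
  var i = start
  proc bar(): int =
    inc i
    result = i

  var x = bar
  echo x()  # start + 1
  var y = bar
  echo y()  # start + 2, same captured `i`, unlike the iterator case

baz(0)  # prints 1, then 2
```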
This is another feature that is confusing for a lot of people, and it doesn't add any expressivity to the language, because you can always model it with additional captured variables. Furthermore, having an iterator param requires you to update the value at each step, while updating a captured value can be done on demand when that's actually required. Points 5 and 6 are breaking changes, but I think the currently confusing semantics are so rarely exercised that most of the code wouldn't require any changes. Using the |
My RFC was about making the language simpler, yours is yet-another type extension.
Captured variables can introduce a copy and avoiding it means you need to move the data into the iterator's state and then move it out again after the iteration. Not exactly an elegant design IMO. |
Well, my point is that the two are equivalent. Can you provide an example where a captured variable would produce different code than an iterator param? Having an iterator param also expands the iterator's state. The two approaches look the same to me. |
Actually, I agree that the generated code is different enough for closure iterators with parameters and they can actually have different expressivity when you return them from functions and their parameters get updated far from the place where they were created. Point 6 can be retracted, but it's relatively independent from the rest of the proposals. |
It's not clear what you mean here, is this a) |
It's important to recognize that an inline iterator is a factory function for creating such values. Once you create the iterator with
|
From a pure theoretical standpoint, your `Iterator[T]` type should be named `Stream[T]`. So we can say that an iterator is a procedure that returns a stream. And, similarly to `Stream[T]`, it would be nice to have a `Lazy[T]` that lazily returns a single `T` value. Actually, `Lazy[T] = proc(): T`, but we may add some extra sugar to simplify working with lazy values. Also, sorry, I don't remember the manual details - can we just make a single iterator type and rely on the compiler to decide whether it should use the inline or closure implementation? I.e. make the closure semantics official and allow the compiler to optimize it in simple cases? |
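A sketch of the `Lazy[T]` idea under that definition, with a memoizing constructor as one possible piece of the extra sugar (the `lazy` helper is an assumption, not an existing API):

```nim
type Lazy[T] = proc (): T

proc lazy[T](f: proc (): T): Lazy[T] =
  # Wrap `f` so it runs at most once; later calls return the cached value.
  var computed = false
  var val: T
  result = proc (): T =
    if not computed:
      val = f()
      computed = true
    return val

let answer = lazy(proc (): int =
  echo "computing..."
  result = 42)

echo answer()  # computing... then 42
echo answer()  # 42 (cached)
```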
That's the wrong way to design things IMO, if we were to redesign this from scratch. Firstly, the base type:

```nim
type
  BaseIter {.inheritable.} = object
    state: int

proc hasNext(it: BaseIter): bool = it.state >= 0
```

A concrete iterator is an object inheriting from it:

```nim
type
  CountupIter = object of BaseIter
    v, a, b: int

proc next(it: var CountupIter): int =
  discard "state machine here"
```

The for loop is then rewritten into:

```nim
var it: CountupIter
it.a = 1
it.b = 3
it.state = 0
while hasNext(it):
  let x = next(it)
```

Which a good optimizer can optimize to the old for loop, as it only touches stack slots. (We can also do this optimization in the Nim compiler.) It unifies closure iterators with "inline" iterators, allows for iterating over 2 data structures simultaneously, only uses the compiler to write these state machines for us, and fits "systems programming" better. |
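To make this concrete, a minimal runnable version of the sketch, filling in the "state machine here" part under the assumption that `state >= 0` means more values are available:

```nim
type
  BaseIter {.inheritable.} = object
    state: int

proc hasNext(it: BaseIter): bool = it.state >= 0

type
  CountupIter = object of BaseIter
    v, a, b: int

proc initCountup(a, b: int): CountupIter =
  CountupIter(state: (if a <= b: 0 else: -1), v: a, a: a, b: b)

proc next(it: var CountupIter): int =
  # Produce the current value, advance, and flag exhaustion via `state`.
  result = it.v
  inc it.v
  if it.v > it.b: it.state = -1

var it = initCountup(1, 3)
while hasNext(it):
  echo next(it)  # 1, 2, 3
```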
Araq's design is very similar to Rust's and Python 2 iterators: an iterator type or even concept just requires a `next` function. As shown by @alehander42 in the zero-functional benchmarks, this gets compiled into very efficient code even when chaining iterators; it is even faster than Nim's inline iterators with a macro fusing them. Even in Arraymancer, for efficient parallel iteration on multiple tensors, I had to use the following design:
It is key that an inline iterator doesn't allocate; otherwise it's slow, completely incompatible with multithreading, and also cannot be used on embedded devices. The following doesn't work:
|
@Araq, of course the for loop must unpack the iterated values. This should be true even with closure iterators:

```nim
var iter = iterator: int {.closure.} =
  yield 1
  yield 2
  yield 3

# The type of `iter` is iterator: int right now
for el in iter:
  echo el # the type of `el` is int, as expected
```

I was actually expecting that Nim already works this way and I just found out this is not the case. @Araq seems to be arguing that you cannot efficiently implement this. So, consider types like the following:

```nim
type
  # Applying the `with` operator to a closure type allows you to obtain
  # a more specific closure type that has a specific captured state.
  MyClosure = proc(x: int): int with (string, int)
  MyIterator = iterator: string with (Foo, Bar)

var c: MyClosure = ...
c[0] = "some string" # You can read and modify the values captured by the closure
```

To use your own example, we don't want to end up in a situation where pragmatists avoid closures and iterators because they may end up needing to serialize the values eventually (just like in C++ and Java people always write getter functions in case they become necessary). Removing such trade-offs from the language is the better way to design things IMO, and the suggested types above are one way to do it. To re-iterate the main point: not having specially handled inline iteration values sounds too risky to me.

@mratsim, you seem to be under the impression that I'm suggesting inline iterators be compiled as closure iterators. This is not the case; I was merely explaining that the control flow matches what the programmer would expect from the equivalent closure iterator. Inline iterators are still compiled to efficient inlined code without allocations. |
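For reference, a spelling that does work in current Nim: a first-class closure iterator is invoked with parentheses in the for loop:

```nim
var iter = iterator (): int =
  yield 1
  yield 2
  yield 3

for el in iter():
  echo el  # 1, 2, 3
```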
My first thought was that he copied Lua iterators intact. But I don't know much Python/Rust.
I also want to be able to use iterators in hard-optimized, low-level code which I now write in C++. OTOH, I think that Nim has too many small practical features, which would be better replaced by fewer, more fundamental ones. In particular, we can implement iterators in the following ways:

Now, while we have a good amount of implementation approaches (and non-C backends may add their own, in particular based on particular VM features!), we had better find a single concept and syntax that can cover them all, while ensuring that specific optimizations are possible, e.g. via pragmas. So, I propose the following plan:
|
Now, my first attempt at an iterator proposal:

```nim
concept Iterator[T, T1, T2...] = lambda(T1, T2...): Maybe(T)
```

So, an iterator is either a procedure with internal state (i.e. a lambda), or an object with a method. E.g.

```nim
iterator range(i, j: int): int =
  var x = i
  while x < j:
    yield x
    inc x
```

is translated to

```nim
proc range(i, j: int): Iterator[int] =
  var x = i
  return proc (): Maybe[int] =
    if x >= j: return Nothing
    result = Just(x)
    inc x
```
|
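A runnable approximation of this sketch in today's Nim, using `std/options` in place of `Maybe`/`Just`/`Nothing` (the proc is named `makeRange` here only to avoid confusion with the built-in `range` type):

```nim
import std/options

type Iterator[T] = proc (): Option[T]

proc makeRange(i, j: int): Iterator[int] =
  var x = i
  result = proc (): Option[int] =
    # Return the next value, or none(int) once exhausted.
    if x >= j: return none(int)
    result = some(x)
    inc x

let it = makeRange(1, 4)
var v = it()
while v.isSome:
  echo v.get  # 1, 2, 3
  v = it()
```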
I should also point out that Araq's transformation is not equivalent to what Python is doing. Consider the following snippet:

```python
def generator(a, b):
    print("iteration started")
    while a < b:
        yield a
        a = a + 1
    print("iteration ended")

g = generator(1, 5)
print("iterator created")
for e in g:
    print("for loop body start")
    print(e)
    print("for loop body end")
print("program done")
```

The output is:

```
iterator created
iteration started
for loop body start
1
for loop body end
for loop body start
2
for loop body end
for loop body start
3
for loop body end
for loop body start
4
for loop body end
iteration ended
program done
```
Please notice how "iterator created" is printed before "iteration started" - this means that creating the iterator is a separate step from starting the execution of its body, which happens only after the first call to `next`. Consider also a generator like this:

```python
def iterateNodeChildren(node):
    for e in node.sons:
        yield e
```

It won't produce values if you pass in a node without any sons. In order for this to work, you need an iterator concept along these lines:

```nim
type
  Iterator[T] = concept it
    it.next() is bool # try to advance the iterator to get to the next value
    it.value is T     # read the reached value as a cached field
```

Using such a concept may be optimized just fine for simple loops, but I doubt that it will produce efficient code when higher-order iterators are created. Anyone can prove me wrong by writing the classic algorithms such as `map` and `filter` with it.

Also, it was suggested that zero-functional is using Araq's transformation to achieve its high performance, but that's not the case. Here is an example taken from the README:

```nim
var n = zip(a, b) -->
    map(f(it[0], it[1])).
    filter(it mod 4 > 1).
    map(it * 2).
    all(it > 4)
```

It's compiled to:

```nim
(proc (): auto =
  var minHigh134598 = min([high(a), high(b)])
  var empty = true
  for z in low(a) .. minHigh134598:
    var it0 = (a[z], b[z])
    var it1 = f(it0[0], it0[1])
    if it1 mod 4 > 1:
      var it2 = it1
      var it3 = it2 * 2
      result = true
      if not(it3 > 4):
        return false)()
```

So, it's doing inlining, only in user space and with a limited set of algorithms. |
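To make the concept above concrete, a sketch of a type satisfying the `next()`/`value` shape, plus a generic consumer (all names illustrative):

```nim
type
  Countup = object
    cur, last: int
    value: int

proc next(it: var Countup): bool =
  # Try to advance; cache the reached value in the `value` field.
  if it.cur > it.last: return false
  it.value = it.cur
  inc it.cur
  result = true

proc sumAll[It](it: var It): int =
  # Works with any type that follows the next()/value shape.
  while it.next():
    result += it.value

var c = Countup(cur: 1, last: 3)
echo sumAll(c)  # 6
```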
Araq's design is just a simple state machine. That is very efficient. Another look into Stream/iterators in Rust (state-machine based) and Haskell (coroutine-based): https://www.fpcomplete.com/blog/2017/07/iterators-streams-rust-haskell |
@mratsim, from your own citation:
The reality is that the "simple state machine" needs the compiler to perform inter-procedural switch-statement elimination to be able to remove the state variables and compile the code as regular loops. This may happen if you are lucky and the functions get inlined, but not otherwise. |
The fact that currently |
@Araq what about replacing the `..` iterator with a proc that builds an `HSlice`? Overall, we can replace each existing iterator with a procedure building a temporary object, and a multitude of procedures consuming this object for various types of iteration. In particular, in addition to (or instead of) `items` and `pairs`, more iteration procs can consume it. Overall, in the `for` loop, if EXPR doesn't implement any supported Iterator concept, we should replace it with items(EXPR) or pairs(EXPR) and check again. This proposal essentially turns …

UPDATE: As I see, the iterator items(HSlice) already exists. So, it seems that we can just drop the `..` iterator. Finally, we can recommend user libraries to do the same if they need to use the same name both as an object and an iterator. |
Yeah I know, but the last time we did something very similar with how |
It should also be noted that GCC does not optimize `Slice[int].items` to what `..` does, and runs 24 times slower according to this benchmark on TIO.
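Presumably the benchmark compares something along these lines (a sketch; the actual TIO code is not reproduced here):

```nim
var sum = 0
for i in 1 .. 1_000_000:         # the magic `..` iterator
  sum += i
for i in items(1 .. 1_000_000):  # iterates the constructed Slice object
  sum += i
echo sum
```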
Since this is apparently staying with us, we might as well think of it as a feature. 😄 This would allow e.g. keys of a table to be used directly as a sequence (see also #13595), as one creative user did here:
|
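The linked example is not reproduced above, but a sketch of the pattern being described is to overload the `keys` iterator from `std/tables` with a proc of the same name, so table keys can be used directly as a seq:

```nim
import std/[tables, sequtils]

proc keys[K, V](t: Table[K, V]): seq[K] =
  # toSeq prefers the iterator interpretation, so this is not recursive.
  toSeq(t.keys)

var t = {"a": 1, "b": 2}.toTable
echo t.keys()      # calls the proc: @["a", "b"] (order not guaranteed)
for k in t.keys:   # still uses the stdlib iterator
  echo k
```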
Hi guys, as @pietroppeter mentioned, I discovered this issue today :). +1 for keeping this feature until there is a better way to support function composition, like chaining etc. Being able to compose functions is an important feature; it would be sad to lose it. |
I still prefer a unification of procs and iterators but ok, you have a point. |
As remarked by a user over on reddit, the official Nim tutorial actually has a sentence that says: "it is common practice to wrap iterators in procs of the same name which accumulate the result of the iterator and return it as a sequence, like split from the strutils module." |
@pietroppeter PR welcome to change this sentence from the tutorial to the opposite, now that we have …; best practices change as Nim evolves. |
@timotheecour to be honest I actually kind of like this feature. I added the remark so that we have this tracked if at some point we decide to change. As it stands now, I think the sentence is not that incorrect: it does happen in the stdlib that iterators have proc overloads. And if you were to declare it an anti-pattern, you would for consistency have to at least deprecate its uses in the stdlib, which, this issue being closed, is not going to happen. If I were to change it, I would go with something like:
|
In today's Nim iterators and procs can have the same name and type signature. The idea was that iterators can only be used in `for` loop contexts anyway and procs cannot. However, with the addition of `closure` iterators this is not true anymore. It also leads to confusing error messages, like "Undeclared someIter" when `someIter` clearly was declared.

The `type` operator has a special rule so that the "iterator" interpretation is preferred over the "proc" interpretation, so that `toSeq` and its siblings can work. Given `toSeq`, we don't need to be able to overload `proc splitLines` and `iterator splitLines`; the iterator version is good enough, and when a `seq` is required from `splitLines` one can write `toSeq splitLines(...)` instead. Yes, it's slightly more typing effort, but Nim becomes easier to understand, weird edge cases are eliminated, and the spec gets simpler.