A `yield` construct in the vein of C# #7746

Closed
bstrie opened this Issue Jul 12, 2013 · 35 comments

Projects

None yet
@bstrie
Contributor
bstrie commented Jul 12, 2013

@thestinger has mentioned this several times as a wishlist item. This is a feature that greatly simplifies the implementation of external iterators by allowing them to be compiled to state machines at compile time (did I get that right?). See http://msdn.microsoft.com/en-us/library/vstudio/9k7k7cf0.aspx for reference.

While we shouldn't expect this for 1.0, it might be prescient to reserve the keyword right now.

Nominating for far-future milestone.

@Sod-Almighty

Seconded. Hopefully not too far in the future.

@bstrie
Contributor
bstrie commented Jul 12, 2013

@Sod-Almighty far-future milestone just means that it would not be a priority for the primary developers until after 1.0. It leaves open the possibility that a volunteer could implement it themselves, and such a patch could be accepted regardless of the current milestone.

@Kimundi
Member
Kimundi commented Jul 12, 2013

This could probably not be done with a syntax extension, because it needs to do some serious rewriting, typechecking, inference etc. Using some mockup syntax you'd have:

yield fn foo(m: uint) -> pub struct FooIterator {
    loop {
        yield 1;
        yield 2;
        yield m;
    }
}

Which would get rewritten into something like this:

pub struct FooIterator {
    priv state: uint,
    priv m: uint
}

impl Iterator<uint> for FooIterator {
    fn next(&mut self) -> Option<uint> {
        let (new_state, ret) = match self.state {
            0 => (1, 1) 
            1 => (2, 2) 
            2 => (0, self.m)
            _ => fail!()
        };
        self.state = new_state;
        ret
    }
}

fn foo(m: uint) -> FooIterator {
    FooIterator{ state: 0, m: m }
}
@flying-sheep

It's pretty much the only option we have to make our new external iterators as nice as the internal ones can be in situations where external ones usually suck.

@thestinger
Contributor

This would be really nice to have, because manually hoisting out the state into a struct gets tiresome. The only case where it's truly painful to write an external iterator is a recursive traversal where you have to build a stack of the state you hoisted out.

Context switches for each iteration aren't acceptable so the solutions used by Ruby (to_enum) and Python (generators) aren't a good model. C#'s state machine generator is a killer feature, but would likely take a lot of effort to implement correctly.

@flying-sheep

Also yield is very awesome for lexers: yield some tokens, delegate to a subparser, and yield more is trivial to do and leads to very elegant code.

@graydon
Contributor
graydon commented Jul 13, 2013

Interesting. I'd recommend digging up a formal description of the rewriting algorithm, and doing a patent search to ensure it's safe to prototype. Happy to reserve the keyword though

@bstrie
Contributor
bstrie commented Jul 15, 2013

Given that #5633 was assigned to Far-Future and was closed in favor of this bug, I'm assigning that milestone here.

@JulianBirch

It would probably be desirable to support await as well, maybe clojure async style. I believe the state machine generation is fairly similar between the two cases.

@erickt
Contributor
erickt commented Aug 6, 2013

@bstrie: coincidentally enough, @huonw and I have been chatting about if we could support a syntax extension that can build state machines which compile down into computed gotos. This would be really handy for my ragel parser generator fork, and would probably help make yields as fast as possible. So if this issue goes forward, please try to leave some hooks available for general purpose state machines.

@Kimundi
Member
Kimundi commented Nov 24, 2013

I thought a bit about this today. One problem with providing yield syntax is that while it makes writing Iterator impls easy to write...

yield fn zero2ten() -> struct Zero2TenIter {
    let mut x = 0;
    while x <= 10 {
        yield x;
        x += 1;
    }
}

...DoubleEndedIterator impls nevertheless still require defining a struct and writing next() and next_back() manually:

fn zero2ten() -> Zero2TenIter { Zero2TenIter { start: 0, end: 10 } }
struct Zero2TenIter { start: uint, end: uint }
impl Iterator<uint> for Zero2TenIter {
    fn next(&mut self) -> Option<uint> {
        if self.start <= self.end {
            let v = self.start;
            self.start += 1;
            Some(v)
        } else {
            None
        }
    }
}
impl DoubleEndedIterator<uint> for Zero2TenIter {
    fn next_back(&mut self) -> Option<uint> {
        if self.start <= self.end {
            let v = self.end;
            self.end -= 1;
            Some(v)
        } else {
            None
        }
    }
}

As you can see, the problem with that is that now upgrading a existing yield function to a DoubleEndedIterator becomes even more work than if it started as a regular struct+impl to begin with, which means users will be less likely to do that, and just live with one-ended ones even if double-ended ones would make perfect sense and potentially improve re usability of the iterator.

So, as a solution, how about just making the yield syntax optionally a bit more complicated, with the ability to write both Iterator and DoubleEndedIterator impls more easily than in the manual "constructor fn + struct + trait impls" way:

yield fn zero2ten() -> struct Zero2TenIter {
    let mut start = 0;
    let mut end   = 10;
} front {
    while start <= end {
        yield start;
        start += 1;
    }
} back {
    while start <= end {
        yield end;
        end -= 1;
    }
}

The idea here is to give the user three code blocks to fill out: initialization, which contains all the setup code for the iterator and any variable declarations that are shared state for both Iterator impls, and two code blocks corresponding to the two iterator impls. A user of this syntax would still have to take care of checking that both code blocks play nice together and properly recognize that the ranges meet in the middle, but they could do so more easily than in the manual implementation way.

@Sod-Almighty

@Kimundi: Sounds good, but should be optional. Original syntax for single-ended iterators.

@JulianBirch

The problem with that design is that it kind of defeats the object of the generator: to be able to fully use the normal control flow to generate the iterator, you're back to implementing MoveNext. Instead, how about a syntax that returns from the iterator whether you want to go forwards or back?

goForwards = yield x;
start = start + goForwards ? 1 : -1;

This allows for fairly arbitrary control flows.

@Kimundi
Member
Kimundi commented Nov 24, 2013

@JulianBirch My proposal still uses the control flow to generate a state machine, and DoubleEndedIterators don't work that way. It's not "step back", it's "shorten range of values from the other end"

@huonw
Member
huonw commented Nov 24, 2013

(We could also use this to implement size_hint.)

@JulianBirch

Sorry, bit of a thinko. Should have been

goForwards = yield goForwards ? start : end;
if goForwards {
  start += 1;
} else {
  end -= 1;
}

Anyway, the point is that there's only one program counter and you can do arbitrarily nasty things. There's some gaps, but I think that gets the idea across.

@Kimundi
Member
Kimundi commented Nov 24, 2013

Hm, that might work too, but I'm not sure if all forward/reverse Iterators follow the same kind of control flow, that would need to be investigated. Provided they do though, I'd imagine a syntax like this:

yield fn slice_iter<'a, T>(slice: &'a [T]) -> struct SliceIter<'a, T> for &'a T {
    let mut start = 0;
    let mut end = slice.len();
    while start < end {
        yield in {
            [>..] => { let x = &slice[start]; start += 1; x }
            [..<] => { let x = &slice[end-1];   end -= 1; x }
        }
    }
}

That is, provide fork points in the control flow where iterating from the front vs from the back should cause different state changes.

I also on purpose picked exotic syntax instead of using something simpler like boolean values: While discussing the topic of double ended iterators, people repeatedly confuse how next_back() is supposed to work, so it seems worthwhile to me to have a little syntactic reminder: A double ended Iterator represents a range of values where elements can be picked from the front ([>..]) or from the back ([..<]).

A yield fn without yield in blocks could then simply be inferred to just be a simple one-ended Iterator impl

@cmr
Contributor
cmr commented Nov 24, 2013

I don't really think double ended iterators are that common when you'd want a generator, and I don't think generators should become sugar for iterators in general. Most things I've wanted generators for (usually processing text of some sort) don't make sense double ended.

@vadimcn
Contributor
vadimcn commented Nov 25, 2013

I've been thinking a bit about co-routines lately, and started putting together a Bikeshed/RFC proposal: https://github.com/vadimcn/rust/wiki/Bikeshed-Coroutines Edit: superseded by the RFC.
I would like to hear what other people on this thread think about it.

@JulianBirch

For the record, I agree with @cmr.

@cmr
Contributor
cmr commented Dec 9, 2013

@vadimcn looks fine to me, though I haven't put as much thought into this as others have. How would the proposal change with the recent closure reform?

@vadimcn
Contributor
vadimcn commented Dec 9, 2013

@cmr: Good question. My intention is that coroutines can be both stack- and heap- allocated, as inferred from usage. However these days closure storage is not automatically inferred, so this plan probably won't work. Also, unlike proc's, coroutines are not "once fn's", so I am not sure what the type of a heap-allocated coroutine should be.

I think I'll have to wait out to see how the whole closure reform / Fn traits thing turns out...

@sanxiyn
Member
sanxiyn commented Jan 24, 2014

I note that yield was reserved as a keyword in #8560.

@errordeveloper
Contributor

@farcaller I think having yield would help to implement stackless co-operative multitasking on bare-metal devices and the fact that it would be done at compile time would be a huge bonus too. I'm actually wondering how a yield! macro could be implemented?

@farcaller
Contributor

C++ coroutine implementations that I'm aware of are using switch {} "hack" of C that allows you to put the case statement literally everywhere inside the block. I don't think it's possible to do something similar in rust though.

@vadimcn
Contributor
vadimcn commented Jul 9, 2014

BTW, I did submit my proposal as Coroutines RFC. Was not accepted.

@rpjohnst
Contributor

One way to help support this would be to expose stack frames as explicit objects from the language. That could potentially help a library-based syntax extension convert a function to a state machine for a generator, or to implement full coroutines with their own stacks, or even delimited continuations. It could also be a hook for a library-based GC.

@errordeveloper
Contributor

Just trying to recap this now, and realised that coroutines implemented with a C switch, i.e. Protothreads, are incredibly unsafe. So this would a major language extension, I can imagine.

@errordeveloper
Contributor

So for me, the point is to have very lightweight single threaded co-operative multitasking, similar to Protothreads. And now I just realised one more thing. In fact, it would be rather correct to say that Protothreads had been invented to ease coding of complex state machines in C. Thereby, in theory, if one defines a few simple macros that generate a single-threaded state machine code in rust, that would a good enough replacement. Although, the task of implementing such macros would be eased if yield keyword existed. (Thanks to @olgierdh for some input on this).

@japaric
Member
japaric commented Oct 10, 2014

Should this be moved to the rust-lang/rfcs repo? I don't think this is actionable without having an RFC accepted first.

cc @nick29581, @thestinger

@thestinger
Contributor
@thestinger thestinger closed this Oct 10, 2014
@alexcrichton
Member

@thestinger, closed RFCs should now have corresponding tracking issues in the RFC repo, and one does not exist for the RFC you linked to as it was closed before we started making tracking issues.

In the meantime, I'd like to leave this open so it can be migrated. @nick29581, can you migrate this issue and close it afterwards?

@alexcrichton alexcrichton reopened this Oct 10, 2014
@hatahet
Contributor
hatahet commented Oct 10, 2014

Is this related to rust-lang/rfcs#105?

@thestinger
Contributor

@hatahet: No, not really.

@rust-highfive
Collaborator

This issue has been moved to the RFCs repo: rust-lang/rfcs#388

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment