Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enumerator fiber yield #2002

Open
wants to merge 1 commit into
base: master
from

Conversation

@ioquatix
Copy link
Member

commented Nov 3, 2018

This makes it so that it is possible to call Fiber.yield within a Enumerator block.

This is breaking async when non-blocking IO is used in an Enumerator: socketry/async#23

I'm not sure if this is the best solution, but it feels like the right approach.

The general idea is that user should not worry about how Enumerator is implemented, Fiber.yield should work as expected.

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 5, 2018

Second commit just cancels first, and it tries to solve problem in a wrong way...

Async should use transfer, because logically, transfer is to switch fiber, and yield is to dive into fiber.

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 5, 2018

Fiber.resume/yield is to dive into fiber as into assymetric coroutine. Enumerator is assymetric coroutune, that is why Fiber.resume/yield is suitable for.

Fiber.transfer is for swithing between symmetric coroutines, and "lightweight threads" are symmetric coroutines.

Fiber.resume/yield is just Fiber.transfer with call stack maintained. Therefore, you can always use Fiber.transfer instead of Fiber.yield if you maintain call stack by your self. But you don't need to.

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 5, 2018

Ignoring Async for a moment, and thinking purely about Enumerator - do you think that the internal detail of how Enumerator is working should change the behaviour of user code?

Because, to me, I think behaviour of Fiber.yield should not change whether Enumerator is "internal" or "external". So, the 2nd commit is a way to achieve that.

That being said, my first approach was to use transfer, but it turns out to be impossible to use transfer because there is no way to resume the correct fiber. Consider the following:

#!/usr/bin/env ruby

require 'fiber'

class Fiberator
	def initialize(&block)
		@caller = nil
		@fiber = Fiber.new(&block)
	end
	
	def next
		return nil unless @fiber.alive?
		
		@caller = Fiber.current
		
		return @fiber.transfer(self)
	end
	
	def << value
		@caller.transfer(value)
	end
end

e = Fiberator.new do |y|
	while true
		Fiber.yield
		
		y << 10
	end
end

f = Fiber.new do
	puts e.next
	puts e.next
end

f.resume
f.resume # double resume

Once you transfer to another fiber, you MUST transfer back. Otherwise, your assumptions about fiber stack are wrong and you don't know who to resume.

So, using Fiber.transfer in Enumerator is impossible if we want to allow use to call Fiber.yield predictably.

Therefore, the only solution is the 2nd commit, which captures Fiber.yield, and correctly forwards it to user code.

Taking into account async, unfortunately it depends on nested resume/yield, so unless we implement our own stack, it's impossible to simply use transfer. It has similar issues to the above design too.

I did try implementing it here: https://github.com/socketry/async/blob/fiber-transfer/lib/async/scheduler.rb and I might have another go at trying to make it work, but it was tricky to get the right behaviour. I'm also concerned about performance of tracking that in "interpreted" code since by design fiber context switch needs to be fast. I'd rather pay a small cost in Enumerator than a big cost in async for every context switch.

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 6, 2018

Impossible? I did it once, and it worked quite well for me: https://gist.github.com/funny-falcon/2023354
But today I really think Enumerator should not use Fiber.transfer.

The fact "async" uses nested Fiber.yield is a design mistake of "async", and it should not lead to bad decisions in Ruby.

I did some thing that were quite close to "async" by features on top of EventMachine, and all attempts to use nested Fiber.yield lead to errors. Use of EM.next_tick always lead to much more composable and managable solution, because symmetric coroutines should be scheduled with symmetric mechanism.

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 6, 2018

I am interested in your patch, I will try it out.

The fact "async" uses nested Fiber.yield is a design mistake of "async", and it should not lead to bad decisions in Ruby.

Can you explain why you think using Fiber.yield is a design mistake?

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 6, 2018

I've already explained. But I will repeat:

Symmetric coroutines should not use assymmetric control switch between them. Assymetric control switch should only be between coroutine and scheduler. Direct switch between coroutines should be only symmetric.

Async's Condition, Notification, Queue and Semafore should not use Fiber#resume to dive into "tasks", but rather Reactor#<< to schedule "tasks" for future execution.

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 6, 2018

Assymetric control switch should only be between coroutine and scheduler.

But, since Fiber.yield and Fiber#resume already occupied by Enumerator, "async" have to implement its own assymetric control switch on top of Fiber#transfer

It was big mistake to hide Fiber#transfer into library and not to expose it in core.

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 6, 2018

It was big mistake to hide Fiber#transfer into library and not to expose it in core.

You mean, require 'fiber'?

Async's Condition, Notification, Queue and Semafore should not use Fiber#resume to dive into "tasks", but rather Reactor#<< to schedule "tasks" for future execution.

I thought about this design. I wouldn't say it's better or worse. In some ways, it's better, in some ways, it's worse. I understand now what you are talking about though.

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 6, 2018

You mean, require 'fiber'?

Yep. Because of that many people doesn't consider transfer at all.

Async's Condition, Notification, Queue and Semafore should not use Fiber#resume to dive into "tasks", but rather Reactor#<< to schedule "tasks" for future execution.
I thought about this design. I wouldn't say it's better or worse.

It certainly better, because it uses right thing for the task. As I've said, symmetric coroutines should be switched using only symmetric mechanism. Reactor#<< is symmetric.

Think in another way: when one uses operation systems synchronization instruments, does operation system switches tasks immediately? No, it schedules them for execution. And beside simplicity of implementation, it provides better composability.

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 6, 2018

Probably, both Enumerator and async should use Fiber#transfer with their own stack maintenance. This way Fiber.yield will always mean "user intented call to coroutine".

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 6, 2018

I understand your explanation.

Think in another way: when one uses operation systems synchronization instruments, does operation system switches tasks immediately? No, it schedules them for execution. And beside simplicity of implementation, it provides better composability.

I understand this. I agree with your reasoning and I think it's a valid concurrency model that is very common. For me, however, another thing to consider is determinism.

I think coroutines provide determinism which OS/threads cannot. This is a major benefit because we schedule IO when it's possible, rather than OS which doesn't always know what to do next (i.e. which thread to resume). So, in theory, it's more efficient, because when we call resume we go directly to code which should execute next. By putting it back into reactor, we loose determinism and we also incur an overhead because every operation must go through scheduler.

I don't really believe one can say which approach is better. They have different trade-offs IMHO.

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 6, 2018

Probably, both Enumerator and async should use Fiber#transfer with their own stack maintenance. This way Fiber.yield will always mean "user intented call to coroutine".

Unfortunately, if you call Fiber.yield after Fiber.transfer, it's almost impossible to know what to call resume on. So, it cannot be composed together. At least, that was my experience when trying to implement it.

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 6, 2018

Ah? If you call Fiber.yield after transfer, you will return to fiber, which called Fiber#resume into fiber, which called yield.

transfer acts as switching between independant control flows. And resume+yield as "calling into coroutune", therefore, resume always returns from fiber which it calls to.

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 6, 2018

Ah? If you call Fiber.yield after transfer, you will return to fiber, which called Fiber#resume into fiber, which called yield.

transfer acts as switching between independant control flows. And resume+yield as "calling into coroutune", therefore, resume always returns from fiber which it calls to.

Yes, that's right, but after transfer, then yield, what do you call resume on to get back?

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 6, 2018

Yes, that's right, but after transfer, then yield, what do you call resume on to get back?

That depends on what you mean by "back". There are many "backs".

require 'fiber'
queue = []
sched = Fiber.new do
  while fib = queue.shift
    puts "Schedule #{fib}"
    if f=fib.transfer
      f.resume # finish fiber
    end
  end
  puts "No tasks to execute"
end

task = lambda do |n|
  Fiber.new do
    subcoro = Fiber.new do |k|
      k = Fiber.yield "#{n}-#{k}-1"
      #blocking call
      queue << Fiber.current
      sched.transfer
      #resume
      Fiber.yield "#{n}-#{k}-2"
    end

    puts "task#{n} #{subcoro.resume 1}"
    puts "task#{n} #{subcoro.resume 2}"
    # task exit
    sched.transfer Fiber.current
  end
end

queue << task[1] << task[2]
sched.resume
Schedule #<Fiber:0x00005604af70e4f0@fibb.rb:12 (created)>
task1 1-1-1
Schedule #<Fiber:0x00005604af70e3b0@fibb.rb:12 (created)>
task2 2-1-1
Schedule #<Fiber:0x00005604af70dca8@fibb.rb:13 (suspended)>
task1 1-2-2
Schedule #<Fiber:0x00005604af70d5f0@fibb.rb:13 (suspended)>
task2 2-2-2
No tasks to execute
@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 6, 2018

Thanks for the example, it's late, I will take a look tomorrow.

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 7, 2018

Fixed example a bit: added fiber finalization (f.resume # finish fiber)

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 7, 2018

Looks like there is a need for Fiber. transfer_on_exit(target) method, that will allow to resemble "switch back to caller on exit" behavior of fiber.resume, but in more flexible way.

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 7, 2018

Looks like there is a need for Fiber. transfer_on_exit(target) method, that will allow to resemble "switch back to caller on exit" behavior of fiber.resume, but in more flexible way.

Yep, I understand. Otherwise, transfer makes (predictable) resume impossible.

@ko1

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2018

Just FYI (I don't read this thread completely because of many English text...), Fiber#transfer is not supported enough because it cause critical bugs in some situation (I forget correct example...). This is why it is not supported. It is same position as callcc (it also has some critical issues).

Generally speaking, Fiber#transfer is difficult to use for ordinal ruby programmer I think. This is another reason why Fiber#transfer is not supported w/o require fiber lib.

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 7, 2018

Thanks for that @ko1 it's really helpful to understand the historical context and how it integrates with the rest of the system.

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 7, 2018

@ko1

Fiber#transfer is not supported enough because it cause critical bugs in some situation (I forget correct example...).

But Fiber.yield is not enough. Yes, transfer is low level, but it is unavoidable for building new functionality, because Fiber.yield could not be orthogonal to itself.

Therefore, either transfer used, or there should keyed resume_with+yield_for to maintain orthogonal fiber nesting stack.

Offtopic: for me, critical bug is "ensure could be ignored if fiber not returned", ie fiber could be forgotten and garbage collected despite pending ensure block.

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 7, 2018

In my C++ implementation, if a fiber goes out of scope but it's not finished, it's automatically resumed and terminated.

https://github.com/kurocha/concurrent/blob/master/source/Concurrent/Fiber.cpp#L29-L44

@ko1

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2018

But Fiber.yield is not enough. Yes, transfer is low level, but it is unavoidable for building new functionality, because Fiber.yield could not be orthogonal to itself.

Yes. Fiber without transfer is designed as semi-coroutine. Not a coroutine in computer science. This design is from Python's generator (maybe...). This limitation is intentional and I understand it is not enough for some users.

Therefore, either transfer used, or there should keyed resume_with+yield_for to maintain orthogonal fiber nesting stack.

Offtopic: for me, critical bug is "ensure could be ignored if fiber not returned", ie fiber could be forgotten and garbage collected despite pending ensure block.

Offtopic too. Yes. I want to solve this issue, but it is difficult to solve it, implementation and compatibility....

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 7, 2018

New coroutine implementation can solve this problem. The next step is pooled fibers, with explicit scope.

@funny-falcon

This comment has been minimized.

Copy link

commented Nov 8, 2018

Mew coroutine will not solve external enumerator, that iterated over File.open{} file, imho. Unless external enumerator became to be cooutine.

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Nov 8, 2018

The coroutine implementation exposes a consistent API on which Fibers and other abstractions can be implemented. It can help us solve some of these issues, for example Enumerator might not use Fiber.. it can still use coroutine, but it won't affect fiber stack in any way.

@ioquatix ioquatix referenced this pull request Jan 10, 2019

@ioquatix ioquatix force-pushed the ioquatix:enumerator-fiber-yield branch from 9d686fb to 03440f6 Jul 28, 2019

@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Jul 28, 2019

@elct9620 here is the sample code we worked on

#!/Users/samuel/Documents/ruby/ruby/build/miniruby

def sequence
	yield 1
	yield 2
	yield 3
end

def things
	yield "cats"
	Fiber.yield "dogs"
end

f = Fiber.new do
	#e = enum_for(:things)
	
	#puts "next: #{e.next}"
	#puts "next: #{e.next}"
	
	# things do |item|
	to_enum(:sequence).each.zip(to_enum(:things)) do |item|
		puts "each: #{item}"
	end
end

puts "resume: #{f.resume}"
#f.resume
@ioquatix

This comment has been minimized.

Copy link
Member Author

commented Aug 5, 2019

Another example/repro:

#!/usr/bin/env ruby

def things
	yield :cat
	Fiber.yield :dog
	yield :fish
end

fiber = Fiber.new do
	iterator = to_enum(:things)
	
	puts "iterator.next: #{iterator.next}"
	puts "iterator.next: #{iterator.next}"
end

puts "fiber.resume: #{fiber.resume}"

@k0kubun k0kubun changed the base branch from trunk to master Aug 15, 2019

@k0kubun

This comment has been minimized.

Copy link
Member

commented Aug 17, 2019

It seems to have a conflict now. Could you rebase this from master?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants
You can’t perform that action at this time.