Enumerator fiber yield #2002
The second commit just cancels the first, and it tries to solve the problem in the wrong way... Async should use transfer, because logically,
Ignoring Async for a moment, and thinking purely about Enumerator - do you think that the internal details of how Enumerator works should change the behaviour of user code? Because, to me, I think behaviour of

That being said, my first approach was to use transfer, but it turned out to be impossible because there is no way to resume the correct fiber. Consider the following:

#!/usr/bin/env ruby
require 'fiber'

class Fiberator
  def initialize(&block)
    @caller = nil
    @fiber = Fiber.new(&block)
  end

  def next
    return nil unless @fiber.alive?
    # Remember who asked for the next value, then switch symmetrically.
    @caller = Fiber.current
    return @fiber.transfer(self)
  end

  def <<(value)
    # Hand the value back to whichever fiber last called #next.
    @caller.transfer(value)
  end
end

e = Fiberator.new do |y|
  while true
    Fiber.yield
    y << 10
  end
end

f = Fiber.new do
  puts e.next
  puts e.next
end

f.resume
f.resume # double resume

Once you transfer to another fiber, you MUST transfer back. Otherwise, your assumptions about the fiber stack are wrong and you don't know which fiber to resume. So, using Therefore, the only solution is the 2nd commit, which captures Taking into account I did try implementing it here: https://github.com/socketry/async/blob/fiber-transfer/lib/async/scheduler.rb and I might have another go at trying to make it work, but it was tricky to get the right behaviour. I'm also concerned about the performance of tracking that in "interpreted" code, since by design a fiber context switch needs to be fast. I'd rather pay a small cost in Enumerator than a big cost in async for every context switch.
Impossible? I did it once, and it worked quite well for me: https://gist.github.com/funny-falcon/2023354 The fact that "async" uses nested Fiber.yield is a design mistake in "async", and it should not lead to bad decisions in Ruby. I built something quite close to "async" in features on top of EventMachine, and all attempts to use nested Fiber.yield led to errors. Using EM.next_tick always led to a much more composable and manageable solution, because symmetric coroutines should be scheduled with a symmetric mechanism.
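To make the EM.next_tick point concrete without depending on EventMachine, here is a minimal sketch of a hypothetical `TickLoop` class (not EventMachine's API): `next_tick` only queues a callback, and a callback scheduled while another one is running waits for the next loop iteration instead of running inline. Waking a fiber through such a queue keeps every switch decision inside the loop.

```ruby
# Hypothetical stand-in for EM.next_tick: callbacks are queued and run on a
# later loop iteration, never inline at the point where they are scheduled.
class TickLoop
  def initialize
    @pending = []
  end

  def next_tick(&block)
    @pending << block
  end

  def run
    until @pending.empty?
      # Snapshot this tick; anything scheduled while it runs goes to the next one.
      current, @pending = @pending, []
      current.each(&:call)
    end
  end
end

log = []
reactor = TickLoop.new

reactor.next_tick do
  log << :first
  reactor.next_tick { log << :deferred } # runs on the next tick, not now
  log << :first_done
end
reactor.next_tick { log << :second }

reactor.run
# log is now [:first, :first_done, :second, :deferred]
```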
I am interested in your patch; I will try it out.
Can you explain why you think using
I've already explained. But I will repeat: symmetric coroutines should not use an asymmetric control switch between them. An asymmetric control switch should only be between a coroutine and the scheduler. A direct switch between coroutines should only be symmetric. Async's Condition, Notification, Queue and Semaphore should not use
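A minimal sketch of that rule, with hypothetical `RunQueue` and `ScheduledCondition` classes (not async's actual Condition): only the run queue calls `Fiber#resume`, a waiting task suspends with `Fiber.yield` back to the run queue, and `signal` merely marks the waiter as runnable instead of switching to it directly.

```ruby
require 'fiber' # needed on older Rubies for Fiber.current and Fiber#alive?

# Hypothetical run queue: the only code that calls Fiber#resume.
class RunQueue
  def initialize
    @ready = []
  end

  def spawn(&block)
    @ready << Fiber.new(&block)
  end

  # Mark a suspended fiber as runnable; it runs on a later loop iteration.
  def schedule(fiber)
    @ready << fiber
  end

  def run
    while (fiber = @ready.shift)
      fiber.resume if fiber.alive?
    end
  end
end

# Hypothetical condition: #signal schedules the waiter instead of resuming it.
class ScheduledCondition
  def initialize(run_queue)
    @run_queue = run_queue
    @waiting = []
  end

  def wait
    @waiting << Fiber.current
    Fiber.yield # asymmetric switch, but only back to the run queue
  end

  def signal
    waiter = @waiting.shift
    @run_queue.schedule(waiter) if waiter
  end
end

log = []
queue = RunQueue.new
condition = ScheduledCondition.new(queue)

queue.spawn do
  log << :waiting
  condition.wait
  log << :woken
end

queue.spawn do
  log << :signalling
  condition.signal
  log << :signalled # the signaller keeps running; the waiter is only queued
end

queue.run
# log is now [:waiting, :signalling, :signalled, :woken]
```

This mirrors the OS analogy made later in the thread: signalling makes the waiter runnable, and the scheduler decides when it actually runs.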
But, since It was a big mistake to hide
You mean,
I thought about this design. I wouldn't say it's better or worse. In some ways it's better; in some ways it's worse. I understand now what you are talking about, though.
Yep. Because of that, many people don't consider
It is certainly better, because it uses the right tool for the task. As I've said, symmetric coroutines should be switched using only a symmetric mechanism. Think of it another way: when one uses operating system synchronization primitives, does the operating system switch tasks immediately? No, it schedules them for execution. And besides simplicity of implementation, this provides better composability.
Probably, both Enumerator and async should use
I understand your explanation.
I understand this. I agree with your reasoning, and I think it's a valid concurrency model that is very common. For me, however, another thing to consider is determinism. I think coroutines provide determinism which OS/threads cannot. This is a major benefit because we schedule IO when it's possible, rather than relying on the OS, which doesn't always know what to do next (i.e. which thread to resume). So, in theory, it's more efficient, because when we call I don't really believe one can say which approach is better. They have different trade-offs IMHO.
Unfortunately, if you call
Ah? If you call
Yes, that's right, but after a transfer and then a yield, what do you call resume on to get back?
That depends on what you mean by "back". There are many "backs".

require 'fiber'

queue = []

sched = Fiber.new do
  while (fib = queue.shift)
    puts "Schedule #{fib}"
    if (f = fib.transfer)
      f.resume # finish fiber
    end
  end
  puts "No tasks to execute"
end

task = lambda do |n|
  Fiber.new do
    subcoro = Fiber.new do |k|
      k = Fiber.yield "#{n}-#{k}-1"
      # blocking call
      queue << Fiber.current
      sched.transfer
      # resume
      Fiber.yield "#{n}-#{k}-2"
    end
    puts "task#{n} #{subcoro.resume 1}"
    puts "task#{n} #{subcoro.resume 2}"
    # task exit
    sched.transfer Fiber.current
  end
end

queue << task[1] << task[2]
sched.resume
Thanks for the example. It's late; I will take a look tomorrow.
Fixed the example a bit: added fiber finalization (
It looks like there is a need for
Yep, I understand. Otherwise, transfer makes (predictable) resume impossible.
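A small illustration of that point using only core Fiber methods: once a fiber has suspended itself with `#transfer`, `#resume` is refused with `FiberError` (the exact message differs between Ruby versions), so the only way to continue it is another `#transfer`.

```ruby
require 'fiber' # needed on older Rubies for Fiber.current and Fiber#transfer

main = Fiber.current

worker = Fiber.new do
  main.transfer(:paused) # symmetric switch back to the main fiber
  :finished
end

paused = worker.transfer # the value passed to main.transfer

resume_error = nil
begin
  worker.resume # a fiber suspended in #transfer cannot be resumed
rescue FiberError => e
  resume_error = e
end

finished = worker.transfer # only #transfer continues it; the block returns :finished
```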
Just FYI (I haven't read this whole thread because there is a lot of English text...), generally speaking,
Thanks for that, @ko1; it's really helpful to understand the historical context and how it integrates with the rest of the system.
But Fiber.yield is not enough. Yes, Therefore, either
Off-topic: for me, the critical bug is "ensure could be ignored if fiber not returned", i.e. a fiber could be forgotten and garbage collected despite a pending ensure block.
In my C++ implementation, if a fiber goes out of scope but it's not finished, it's automatically resumed and terminated. https://github.com/kurocha/concurrent/blob/master/source/Concurrent/Fiber.cpp#L29-L44
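In plain Ruby the same cleanup has to be arranged by hand. A minimal sketch, assuming a hypothetical `STOP` sentinel that the fiber's owner passes in to ask it to unwind: while the fiber sits suspended in `Fiber.yield` its `ensure` clause has not run, and finishing it explicitly is what makes the clause execute.

```ruby
# Hypothetical sentinel the owner passes in to ask the fiber to unwind.
STOP = Object.new

log = []

worker = Fiber.new do
  begin
    loop do
      value = Fiber.yield
      break if value.equal?(STOP)
      log << value
    end
  ensure
    log << :ensure_ran # does not run while the fiber stays suspended
  end
end

worker.resume    # run up to the first Fiber.yield
worker.resume(1)
worker.resume(2)
before_stop = log.dup # suspended inside begin; ensure has not run yet

worker.resume(STOP) # explicit unwind: the loop exits and ensure runs
```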
Yes. Fiber without
Off-topic too. Yes, I want to solve this issue, but it is difficult to solve, both in implementation and in compatibility...
The new coroutine implementation can solve this problem. The next step is pooled fibers with explicit scope.
The new coroutine implementation will not solve the case of an external enumerator that is iterated over
The coroutine implementation exposes a consistent API on which Fibers and other abstractions can be implemented. It can help us solve some of these issues; for example, Enumerator might not use Fiber. It can still use a coroutine, but it won't affect the fiber stack in any way.
@elct9620 here is the sample code we worked on
Another example/repro:
It seems to have a conflict now. Could you rebase this onto master?
While this PR still has value, in https://www.codeotaku.com/journal/2020-04/ruby-concurrency-final-report/index I define blocking and non-blocking fibers. Naturally, this avoids the problem, because Enumerator's fiber can be defined as blocking. By doing this, no scheduling operation should occur during
Considering all possible options, I think modelling blocking/non-blocking makes more sense when we are adding implicit context switches, which is the reason this was an issue in the first place. In any case, we can revisit this problem in the future if necessary.
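Since Ruby 3.0 this distinction is part of the Fiber API: `Fiber.new` takes a `blocking:` keyword (non-blocking is the default for new fibers) and `Fiber#blocking?` reports it; when a fiber scheduler is installed, only non-blocking fibers route blocking operations through it. A minimal sketch, assuming Ruby 3.0 or later; the generator-style `producer` is illustrative only and is not how Enumerator is actually implemented.

```ruby
# Requires Ruby 3.0+, where Fiber.new accepts the blocking: keyword.
producer = Fiber.new(blocking: true) do
  3.times { |i| Fiber.yield i } # resume/yield work the same in a blocking fiber
  nil
end
regular = Fiber.new { :done } # never resumed; created only to compare the flag

values = []
while (value = producer.resume)
  values << value
end
```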
@funny-falcon, I'm not sure if I said this elsewhere, but you were totally correct: the fiber scheduler should use transfer only. Well, it can be done either way, but it's technically better to use
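A minimal sketch of a transfer-only scheduler, using a hypothetical `TransferScheduler` class (not async's or Ruby's scheduler): tasks are started and continued with `#transfer`, and a task gives up control by requeueing itself and transferring back to the loop. It assumes `#run` is called on the thread's main (root) fiber, because a fiber that was entered with `#transfer` returns to the root fiber when its block finishes.

```ruby
require 'fiber' # needed on older Rubies for Fiber.current and Fiber#transfer

# Hypothetical minimal scheduler: every context switch uses Fiber#transfer.
class TransferScheduler
  def initialize
    @ready = []                 # runnable fibers, in FIFO order
    @loop_fiber = Fiber.current # assumed to be the root fiber that calls #run
  end

  def spawn(&block)
    @ready << Fiber.new { block.call(self) }
  end

  # Called from inside a task: requeue it, then switch back to the loop.
  def pause
    @ready << Fiber.current
    @loop_fiber.transfer
  end

  def run
    while (fiber = @ready.shift)
      fiber.transfer # returns when the task pauses or its block finishes
    end
  end
end

log = []
scheduler = TransferScheduler.new
scheduler.spawn { |s| log << :a1; s.pause; log << :a2 }
scheduler.spawn { |s| log << :b1; s.pause; log << :b2 }
scheduler.run
# log is now [:a1, :b1, :a2, :b2]
```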
@ioquatix, thanks.
This makes it possible to call Fiber.yield within an Enumerator block. This is breaking async when non-blocking IO is used in an Enumerator: socketry/async#23

I'm not sure if this is the best solution, but it feels like the right approach. The general idea is that the user should not have to worry about how Enumerator is implemented; Fiber.yield should work as expected.
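A minimal reproduction of the behaviour described above, assuming stock `Enumerator#next` (without this patch). The bare `Fiber.yield` stands in for what async does when a non-blocking IO operation would block: because `#next` runs the block on Enumerator's internal fiber, the `Fiber.yield` suspends that internal fiber instead of the caller's task, and its argument comes back as the result of `#next`.

```ruby
# The Fiber.yield plays the role of async suspending a task on blocking IO.
enumerator = Enumerator.new do |yielder|
  Fiber.yield :from_fiber_yield
  yielder << 1
  yielder << 2
end

first = enumerator.next  # the Fiber.yield argument leaks out of #next
second = enumerator.next # the first value actually produced by the yielder
```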