Simple MapReduce that melt my brain (yes, fibers there) #221
Unanswered
Inversion-des
asked this question in
Q&A
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
It started with a simple task.
We have a structure of nested dirs with files. What we need is:
Dirs on the same level are independent, so it is a good opportunity for parallelization (while IO blocks one thread, we can calculate checksum in others or we can even do parallel calculations with Ractors).
Now the problem: how to organize concurrency/parallelization? (I like to do parallel things with Ruby)
If you will try to "keep it simple", you will start by creating a thread for each sub dir and join all those threads after, to get the results.
This simple solution has two major problems:
ThreadError: can't create Thread: Resource temporarily unavailable
).A better solution is to have a workers pool (adjusted to the number of CPU cores) and a single tasks queue.
But still, there is a problem: when a thread takes a task and in this task, it adds more tasks to the queue and will need to wait for results — this worker becomes blocked. So if you have 2 workers and 3 nested dirs: after processing 2 levels, there will be no free worker to process level 3.
At this point I made my life harder: I started to think "how can we have a worker that, after adding new tasks to the queue, does not become blocked but returns to loop and works on other tasks and at some point returns to this blocked task and checks if there are some results ready?".
From this idea description, we can feel that it is concurrency within a thread and it is possible to do in Ruby with Fibers.
So I started to work on the implementation and this was a very tough time for my brain because concurrency is hard… few times I gave up with the conclusion that it is just impossible…
But eventually, it seems like I have a working solution:
Here is a mind map (use a dark theme) that helped me a lot (a screenshot of it is at the bottom of this post).
And here is a playground where you can hit Run to see how it works and can edit code. Code notes:
((( ... )))
to mark a place in the code, where the thread becomes blocked or where it jumps to work on another task via fibers.So, my questions to the Ruby community are:
Beta Was this translation helpful? Give feedback.
All reactions