"How many Ruby fibers does it take to screw in a lightbulb?"
Add this line to your application's Gemfile:
gem 'stepladder'
And then execute:
$ bundle
Or install it yourself:
$ gem install stepladder
And then use it:
require 'stepladder'
As of version 0.2.0 there is now a DSL which provides convenient shorthand for several common types of workers. Let's look at them.
But first, be sure to include the DSL mixin:
include Stepladder::DSL
At the headwater of every Stepladder pipeline, there is a source worker which is able to generate its own work items.
Here is possibly the simplest of source workers, which provides the same string each time it is invoked:
worker = source_worker { "Number 9" }
worker.shift #=> "Number 9"
worker.shift #=> "Number 9"
worker.shift #=> "Number 9"
# ...ad nauseam...
Sometimes you want a worker which generates until it doesn't:
counter = 0
worker = source_worker do
if counter < 3
counter += 1
counter + 1000
end
end
worker.shift #=> 1001
worker.shift #=> 1002
worker.shift #=> 1003
worker.shift #=> nil
worker.shift #=> nil (and will be nil henceforth)
If all you want is a source worker which generates a series of numbers, the DSL provides an easier way to do it:
w1 = source_worker [0,1,2]
w2 = source_worker (0..2)
w1.shift == w2.shift #=> 0
w1.shift == w2.shift #=> 1
w1.shift == w2.shift #=> 2
w1.shift == w2.shift #=> nil (and henceforth, etc.)
This is perhaps the most "normal" kind of worker in Stepladder. It accepts a value and returns some transformation of that value:
squarer = relay_worker do |number|
number ** 2
end
Of course a relay worker needs a source from which to get values upon which to operate:
source = source_worker (0..3)
squarer.supply = source
Or, if you prefer, the DSL provides a vertical pipe for linking the workers together into a pipeline:
pipeline = source | squarer
pipeline.shift #=> 0
pipeline.shift #=> 1
pipeline.shift #=> 4
pipeline.shift #=> 9
pipeline.shift #=> nil
As we travel down the pipeline, from time to time we will want to stop and smell the roses, or write to a log file, or drop a beat, or create some other kind of side-effect. That's the purpose of the side_worker. A side worker will pass through the same value that it received, but as it does so, it can perform some kind of side-effect work.
source = source_worker (0..3)
evens = []
even_stasher = side_worker do |value|
if value % 2 == 0
evens << value
end
end
# re-using "squarer" from above example...
pipeline = source | even_stasher | squarer
pipeline.shift #=> 0
pipeline.shift #=> 1
pipeline.shift #=> 4
pipeline.shift #=> 9
pipeline.shift #=> nil
evens #=> [0, 2]
Notice that the output is the same as the previous example, even though we put this even_stasher side_worker in the middle of the pipeline. However, we can also see that the evens array now contains the even numbers from source (the side effect).
But wait, you want to say, can't we still create side effects with regular ole relay workers? Why, yes. Yes you can. Ruby being what it is, there really isn't a way to prevent the implementation of any worker from creating side effects.
And still wait, you'll also be wanting to say, isn't it possible that a side worker could mutate the value as it's passed through? And again, yes. It would be very difficult* in the Ruby language (where so many things are passed by reference) to perfectly prevent a side worker from mutating the value. But please don't.
The side effect worker's purpose is to provide intentionality and clarity. When you're creating a side effect, let it be very clear. And don't do regular, pure functional style work in the same worker.
This will make side effects easier to troubleshoot; if you pull them out of the pipeline, the pipeline's output shouldn't change. By the same token, if there is a problem with a side effect, troubleshooting it will be much simpler if the side effects are already isolated and named.
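Here's a minimal sketch of that distinction, re-using the DSL workers shown above; the puts call just stands in for whatever side effect you actually have in mind:

# Discouraged: a relay worker that quietly performs a side effect
noisy_squarer = relay_worker do |number|
  puts "squaring #{number}"   # hidden side effect buried inside a transformation
  number ** 2
end

# Preferred: isolate and name the side effect, keep the transformation pure
square_logger = side_worker  { |number| puts "squaring #{number}" }
squarer       = relay_worker { |number| number ** 2 }

pipeline = source_worker(0..3) | square_logger | squarer

Pulling square_logger out of that pipeline changes nothing about the values squarer emits, which is exactly the property that makes side effects easy to troubleshoot.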
Another measure for preventing side workers from creating unwanted side-effects is to use :hardened mode. That's right, side_worker has a mode (which defaults to :normal). When set to :hardened, each value is Marshal.dumped and Marshal.loaded before being passed to the side worker's callable. This helps to ensure that the original value will be passed through to the next worker in the queue in a pristine state (unmodified), and that no future or subsequent modification can take place.
Given a source...
source = source_worker [[:foo], [:bar]]
Consider this unsafe, unhardened side worker:
unsafe = side_worker { |v| v << :boo }
This produces unwanted mutations, because of the <<.
pipeline = source | unsafe
pipeline.shift #=> [:foo, :boo] <-- Oh noes!
pipeline.shift #=> [:bar, :boo] <-- Disaster!
pipeline.shift #=> nil
Let's harden it:
unsafe = side_worker(:hardened) { |v| v << :boo }
The :hardened side worker doesn't have access to the original value, and therefore its shenanigans are moot with respect to the main workflow:
pipeline = source | unsafe
pipeline.shift #=> [:foo] <-- No changes
pipeline.shift #=> [:bar] <-- Much better
pipeline.shift #=> nil
The filter worker simply passes through the values which are given to it, but only those values which result in truthiness when your provided callable is run against them. Values which result in falsiness are discarded.
source = source_worker (0..5)
filter = filter_worker do |value|
value % 2 == 0
end
pipeline = source | filter
pipeline.shift #=> 0
pipeline.shift #=> 2
pipeline.shift #=> 4
pipeline.shift #=> nil
The batch worker gathers incoming values into batches and then returns each batch as its output.
source = source_worker (0..7)
batch = batch_worker gathering: 3
pipeline = source | batch
pipeline.shift #=> [0, 1, 2]
pipeline.shift #=> [3, 4, 5]
pipeline.shift #=> [6, 7]
pipeline.shift #=> nil
Notice how the final batch doesn't have the full complement of three.
Fixed-size batch workers are useful for things like pagination, perhaps, but sometimes we don't want a batch to contain a specific number of items; rather, we want to batch together all items until a certain condition is met. So batch_worker can also accept a callable:
source = source_worker [
"some", "rain", "must\n", "fall", "but", "ok" ]
line_reader = batch_worker do |value|
value.end_with? "\n"
end
pipeline = source | line_reader
pipeline.shift #=> ["some", "rain", "must\n"]
pipeline.shift #=> ["fall", "but", "ok"]
pipeline.shift #=> nil
The splitter worker accepts a value from its supply, uses it to generate an array, and then successively returns each element of that array. Once the array has been exhausted, the splitter worker appeals to its supplier for another value and the process repeats.
source = source_worker [
'A bold', 'move westward' ]
splitter = splitter_worker do |value|
value.split(' ')
end
pipeline = source | splitter
pipeline.shift #=> 'A'
pipeline.shift #=> 'bold'
pipeline.shift #=> 'move'
pipeline.shift #=> 'westward'
pipeline.shift #=> nil
The trailing worker accepts a value, and returns an array containing the last n values. This worker could be useful for things like rolling averages.
No values are returned until n values have been accumulated. New values are unshifted into the zero-th position in the array of trailing values, so a given value graduates through the array until it reaches the last position and is then removed.
Maybe it's easiest to just give an example:
source = source_worker (0..5)
trailer = trailing_worker 4
pipeline = source | trailer
pipeline.shift #=> [3, 2, 1, 0]
pipeline.shift #=> [4, 3, 2, 1]
pipeline.shift #=> [5, 4, 3, 2]
pipeline.shift #=> nil
Note that as soon as a nil is received from the supplying queue, the trailing worker simply provides a nil.
Stepladder grew out of experimentation with Ruby fibers, after reading Dave Thomas' demo of Ruby fibers, wherein he created a pipeline of fiber processes, emulating the style and syntax of the *nix command line. I noticed that, courtesy of fibers' extremely small surface area, fiber-to-fiber collaborators could operate with extremely low coupling. That was the original motivation for creating the framework.
After playing around with the new framework a bit, I began to notice other interesting characteristics of this paradigm.
Suppose we are performing several operations on the members of a largish collection. If we daisy-chain enumerable operators together (which is so easy and fun with Ruby), we will notice that if something goes awry with item number two during operation number seven, we nonetheless have to wait through a complete run of all items through operations one through six before we receive the bad news early in operation seven. Imagine if the first six operations each take a long time to complete. Furthermore, what if the operations on all incomplete items must be reversed in some way (e.g. cleaned up or rolled back)? It would be far less messy and far more expedient if each item could be processed through all operations before the next one is begun.
This is the design paradigm which Stepladder makes easy. Although all the workers in your assembly line can be coded in the same context (which is one of the big selling points of daisy-chaining enumerable methods, incidentally), you also get the advantage of passing each item through the entire op-chain before starting the next.
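Here's a small sketch, using only the DSL calls shown earlier, that makes the ordering visible (the puts calls exist only to show which item is where):

source = source_worker(1..3)
step_a = relay_worker { |n| puts "step A sees #{n}"; n }
step_b = relay_worker { |n| puts "step B sees #{n}"; n }

pipeline = source | step_a | step_b
pipeline.shift #=> 1
# prints "step A sees 1" and then "step B sees 1":
# item 1 travels the whole chain before item 2 is touched at all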
Because Stepladder workers use fibers as their basic API, they are almost unaware of the outside world. And they can pretty much be written as such. At the heart of each Stepladder worker is a task which you provide, which is a callable Ruby object (such as a Proc or a lambda).
The scope of the work is whatever scope existed in the task when you initially created the worker. And if you want a worker to maintain its own internal state, you can simply include a loop within the worker's task, and use the #handoff method to pass along the worker's product at the appropriate point in the loop.
For example:
realestate_maker = Stepladder::Worker.new do
  oceans = %w[Atlantic Pacific Indiana Arctic]
  previous_ocean = "Atlantic"
  while current_ocean = oceans.sample
    drain current_ocean # let's posit that this is a long-running async process!
    handoff previous_ocean
    previous_ocean = current_ocean
  end
end
Anything scoped to the outside of that loop but inside the worker's task will essentially become that worker's initial state. This means we often don't need to bother with instance variables, accessors, and other features of classes which deal with maintaining instances' state.
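As a minimal sketch of that idea, following the same Worker and #handoff pattern as the example above, here is a counter whose only state is a local variable inside its task:

counting_worker = Stepladder::Worker.new do
  count = 0            # initial state: just a local variable inside the task
  while count < 3
    count += 1
    handoff count      # pass the current count along to the next worker
  end
end

counting_worker.shift #=> 1
counting_worker.shift #=> 2
counting_worker.shift #=> 3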
Despite the fact that the work is done in separate fibers, the scope of the work is the scope of the coordinating body of code. This means that what little coupling remains in such systems (primarily coupling via the common objects being passed along) is mitigated by the possibility of having the coupled code live together. I call this feature the folded collaborators effect.
Consider the following code vaporware:
SUBJECT = 'kitteh'
include Stepladder::DSL
tweet_getter = source_worker do
twitter_api.fetch_my_tweets
end
about_me = filter_worker do |tweet|
tweet.referenced.include? SUBJECT
end
tweet_formatter = relay_worker do |tweet|
apply_format_to tweet
end
kitteh_tweets = tweet_getter | about_me | tweet_formatter
while tweet = kitteh_tweets.shift
display(tweet)
end
None of these tasks have hard coupling with each other. If we were to insert another worker between the filter and the formatter, neither of those workers would need changes in their code, assuming the inserted worker plays nicely with the objects they're all passing along.
Which brings me to the point: these workers do have a dependency upon the objects they're handing off and receiving. But we have the capability to coordinate those workers in a centralized location (such as in this code example).
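For example, a hypothetical de-duplicating step (assuming, purely for illustration, that the tweet objects expose an id method) could be dropped in without touching either neighbor:

seen = {}
deduper = filter_worker do |tweet|
  first_time = !seen[tweet.id]   # hypothetical: tweets are assumed to respond to #id
  seen[tweet.id] = true
  first_time                     # pass the tweet along only the first time we see it
end

kitteh_tweets = tweet_getter | about_me | deduper | tweet_formatter

Neither about_me nor tweet_formatter needs to know or care that deduper now sits between them.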
The Stepladder::Worker documentation has been moved here.