Lazy set operations on sorted streams
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.



StreamLogic can apply set operations to sorted finite streams represented by Enumerables.


Say you got three files, a, b and c that contains words, one on each line, sorted alphabetically. To get a stream with the words that appear in all three files, do this:

include StreamLogic

s1 = {'a') }
s2 = {'b') }
s3 = {'c') }
(s1 & s2 & s3).each do |line|
  puts line

It doesn't matter if the files have ten lines each, or ten million, StreamLogic will not gobble them into memory, but perform the & (logical and, set union) operation streamingly.

An example of the lazyness of StreamLogic is this example that extracts the first four numbers that are divisible by both 3 and 5 from the endless streams of all numbers divisible by 3 and 5, respectively:

def endless_stream_of_numbers_divisible_by(n) do |y|
    counter = 0
    loop do
      y << counter
      counter += n

s1 = { endless_stream_of_numbers_divisible_by(3) }
s2 = { endless_stream_of_numbers_divisible_by(5) }

(s1 & s2).take(4) # => [0, 15, 30, 45]

StreamLogic also supports the | (logical or, set union) operator:

people_a_to_m = {'big_list_of_people_a_through_m') }
people_n_to_z = {'big_list_of_people_n_through_z') }
some_other_people = {'another_big_list_of_people') }
((people_a_to_m | people_n_to_z) & some_other_people).each do |name|
  puts name

The code above will print all names in the file "another_big_list_of_people" that appear in either "big_list_of_people_a_through_m" or "big_list_of_people_n_through_z".

Sometimes logic operations sound like the opposite of natural language, so people_a_to_m | people_n_to_z is read as "people_a_to_m OR people_n_to_z", but means the items in "people_a_to_m" and the items in "people_n_to_z". Another way to say the same thing is as I did above, "[the items] that appear in either 'big_list_of_people_a_through_m' or 'big_list_of_people_n_through_z'", so the operation | can be described using either "or" or "and". Language is hard.

If you want to get a merged stream but not filter out duplicates you can use the + operator, and if you want to find the elements that appear only in one of the streams, use -:

s1 =[phil sam steve sue])
s2 =[anne phil sue vincent])
(s1 - s2).to_a # => ['anne', 'sam', 'steve', 'vincent']


  • Create a Stream by passing an Enumerable as parameter to new, or give a block that returns an Enumerable. The latter is useful when you need to restore some state before reusing the Enumerable (IOs like Files for example).
  • The Enumerables must respond to #to_enum.
  • It's very important that the Enumerables are sorted.
  • The resulting streams will contain only unique items, with the exception of the + operator.


StreamLogic relies heavily on external iteration. In Ruby external iteration is usually performed with Enumerable#next, but it's not very performant (see discussion here: Where possible StreamLogic will perform external iteration using mechanisms like IO#gets, Array#[]. If you need to pass in some Enumerable that does not provide an efficient external iteration mechanism, you can implement your own, just include StreamLogic::ExternalEnumeration and implement #next_element to return the next element on each invocation, and :stop_iteration when the end has been reached.