Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
tree: 29270ad99e
Fetching contributors…

Cannot retrieve contributors at this time

66 lines (45 sloc) 3.764 kb

README

StreamLogic can apply set operations to sorted finite streams represented by Enumerables.

Examples

Say you got three files, a, b and c that contains words, one on each line, sorted alphabetically. To get a stream with the words that appear in all three files, do this:

include StreamLogic

s1 = Stream.new { File.open('a') }
s2 = Stream.new { File.open('b') }
s3 = Stream.new { File.open('c') }
(s1 & s2 & s3).each do |line|
  puts line
end

It doesn't matter if the files have ten lines each, or ten million, StreamLogic will not gobble them into memory, but perform the & (logical and, set union) operation streamingly.

An example of the lazyness of StreamLogic is this example that extracts the first four numbers that are divisible by both 3 and 5 from the endless streams of all numbers divisible by 3 and 5, respectively:

def endless_stream_of_numbers_divisible_by(n)
  Enumerator.new do |y|
    counter = 0
    loop do
      y << counter
      counter += n
    end
  end
end

s1 = Stream.new { endless_stream_of_numbers_divisible_by(3) }
s2 = Stream.new { endless_stream_of_numbers_divisible_by(5) }

(s1 & s2).take(4) # => [0, 15, 30, 45]

StreamLogic also supports the | (logical or, set union) operator:

people_a_to_m = Stream.new { File.open('big_list_of_people_a_through_m') }
people_n_to_z = Stream.new { File.open('big_list_of_people_n_through_z') }
some_other_people = Stream.new { File.open('another_big_list_of_people') }
((people_a_to_m | people_n_to_z) & some_other_people).each do |name|
  puts name
end

The code above will print all names in the file "another_big_list_of_people" that appear in either "big_list_of_people_a_through_m" or "big_list_of_people_n_through_z".

Sometimes logic operations sound like the opposite of natural language, so people_a_to_m | people_n_to_z is read as "people_a_to_m OR people_n_to_z", but means the items in "people_a_to_m" and the items in "people_n_to_z". Another way to say the same thing is as I did above, "[the items] that appear in either 'big_list_of_people_a_through_m' or 'big_list_of_people_n_through_z'", so the operation | can be described using either "or" or "and". Language is hard.

If you want to get a merged stream but not filter out duplicates you can use the + operator, and if you want to find the elements that appear only in one of the streams, use -:

s1 = StreamLogic::Stream.new(%w[phil sam steve sue])
s2 = StreamLogic::Stream.new(%w[anne phil sue vincent])
(s1 - s2).to_a # => ['anne', 'sam', 'steve', 'vincent']

Notes

  • Create a Stream by passing an Enumerable as parameter to new, or give a block that returns an Enumerable. The latter is useful when you need to restore some state before reusing the Enumerable (IOs like Files for example).
  • The Enumerables must respond to #to_enum.
  • It's very important that the Enumerables are sorted.
  • The resulting streams will contain only unique items, with the exception of the + operator.

Performance

StreamLogic relies heavily on external iteration. In Ruby external iteration is usually performed with Enumerable#next, but it's not very performant (see discussion here: http://www.ruby-forum.com/topic/196086). Where possible StreamLogic will perform external iteration using mechanisms like IO#gets, Array#[]. If you need to pass in some Enumerable that does not provide an efficient external iteration mechanism, you can implement your own, just include StreamLogic::ExternalEnumeration and implement #next_element to return the next element on each invocation, and :stop_iteration when the end has been reached.

Jump to Line
Something went wrong with that request. Please try again.