Permalink
Find file
Fetching contributors…
Cannot retrieve contributors at this time
570 lines (380 sloc) 23.1 KB

pipe-iterators

Like underscore for Node streams (streams2 and up).

Functions for iterating over object mode streams:

Installation

npm install --save pipe-iterators

Preamble:

var pi = require('pipe-iterators');

Changelog

v1.3.0: Updated dependencies to more recent versions - thanks @asgoth!

v1.2.0: added pi.fromAsync(callable)

v1.1.0:

  • added the merge, forkMerge, matchMerge and parallel functions.
  • fixed a bug in pipeline.

Iteration functions

The iterator functions closely follow the native Array.* iteration API (e.g. forEach, map, filter), but the functions return object mode streams instead of operating on arrays.

forEach

pi.forEach(callback, [thisArg])

Returns a duplex stream which calls a function for each element in the stream. callback is invoked with two arguments - obj (the element value) and index (the element index). The return value from the callback is ignored.

If thisArg is provided, it is available as this within the callback.

pi.fromArray(['a', 'b', 'c'])
  .pipe(pi.forEach(function(obj) { console.log(obj); }));

map

pi.map(callback, [thisArg])

Returns a duplex stream which produces a new stream of values by mapping each value in the stream through a transformation callback. The callback is invoked with two arguments, obj (the element value) and index (the element index). The return value from the callback is written back to the stream.

If thisArg is provided, it is available as this within the callback.

Note: if you return null from your map function, core streams will interpret this as EOF for the stream.

pi.fromArray([{ a: 'a' }, { b: 'b' }, { c: 'c' }])
  .pipe(pi.map(function(obj) { return _.defaults(obj, { foo: 'bar' }); }));

reduce

pi.reduce(callback, [initialValue])

Reduce returns a duplex stream which boils down a stream of values into a single value. initialValue is the initial value of the reduction, and each successive step of it should be returned by the callback. The callback is called with three arguments: prev (the accumulator value), curr (the current value) and index (the index).

When the input stream ends, the stream emits the value in the accumulator.

If initialValue is not provided, then prev will be equal to the first value in the array and curr will be equal to the second on the first call.

pi.fromArray(['a', 'b', 'c'])
  pipe(pi.reduce(function(posts, post) { return posts.concat(post); }, []));

filter

pi.filter(callback, [thisArg])

Returns a duplex stream which writes all values that pass (return true for) the test implemented by the provided callback function.

The callback is invoked with two arguments, obj (the element value) and index (the element index). If the callback returns true, the element is written to the next stream, otherwise the element is filtered out.

pi.filter(function(post) { return !post.draft; })

mapKey

pi.mapKey(key, callback, [thisArg])
pi.mapKey(hash, [thisArg])

Returns a duplex stream which produces a new stream of values by mapping a single key (when given key and callback) or multiple keys (when given hash) through a transformation callback. The callback is invoked with three arguments: value (the value element[key]), obj (the element itself) and index (the element index). The return value from the callback is set on the element, and the element itself is written back to the stream.

If thisArg is provided, it is available as this within the callback.

pi.fromArray([{ path: '/a/a' }, { path: '/a/b' }, { path: '/a/c' }])
  .pipe(pi.mapKey('path', function(p) { return p.replace('/a/', '/some/'); }));

You can also call the mapKey with a hash:

pi.mapKey({
  a: function(value) { /* ... */},
  b: 'str',
  c: true
})

Each key in the hash is replaced with the return value of the function in the hash. When the value in the hash is not a function, it is simply assigned as the new value for that key.

Input and output

These utility functions make it easy to provide input into a stream or capture output from a stream.

fromArray

pi.fromArray(arr)

Returns a readable stream given an array. The stream will emit one item for each item in the array, and then emit end.

toArray

pi.toArray(callback)
pi.toArray(array)

Returns a writable stream which buffers the input it receives into an array. When the stream emits end, the callback is called with one parameter - the array which contains the input elements written to the the stream.

You can also pass an instance of an array instead of a callback. The array's contents will be updated with the elements from the stream when the writable stream emits finish.

fromAsync

pi.fromAsync(fn)

Returns a readable stream given an async function. (since v1.2.0)

The async function should accept one argument, onDone, which is a function(err, results). The function is called once - the first time someone reads from the stream. It should return either a single result, or an array of results.

The stream will emit one item for each item in the result (the single result, or each array item individually), and then emit end.

Constructing streams

These functions make creating readable, writable and transform streams a bit less boilerplatey.

thru & through

pi.thru([options], [transformFn], [flushFn]);
pi.thru.obj([transformFn], [flushFn]);
pi.thru.ctor([options], [transformFn], [flushFn]);

Returns a Transform stream given a set of options, a transformFn and flushFn. You can call this function as pi.through or pi.thru. This uses the through2 module, so you should take a look at the documentation for that module. In short:

  • The options hash is passed to stream.Transform to construct the stream. See the core docs.
  • The transformFn has the signature: function (chunk, encoding, onDone) {}. See the core docs for details.
  • The flushFn has the signature function(onDone). See the core docs for details.
  • thru.obj(fn) is a convenience wrapper around thru({ objectMode: true }, fn).
  • thru.ctor() returns a constructor for a custom Transform. This is useful when you want to use the same transform logic in multiple instances.

BTW, if you need parallel execution but with the same API as a thru stream, check out parallel in the control flow section.

writable

pi.writable([options], writeFn)
pi.writable.obj(writeFn)
pi.writable.ctor([options], writeFn)

Returns a Writable stream given a set of options and a writeFn.

Has the same options as thru:

  • The options hash is passed to stream.Writable to construct the stream. See the core docs.
  • The writeFn has the signature: function(chunk, encoding, callback) {}. See the core docs for details.
  • writable.obj() is a convenience wrapper for writable({ objectMode: true }).
  • writable.ctor() returns a constructor for the writable stream.

readable

pi.readable([options], [readFn])
pi.readable.obj([readFn])
pi.readable.ctor([options], [readFn])

Returns a Readable stream given a set of options and a readFn.

Has the same options as thru:

  • The options hash is passed to stream.Readable to construct the stream. See the core docs.
  • The readFn has the signature: function(size) {}. See the core docs for details.
  • readable.obj() is a convenience wrapper for readable({ objectMode: true }).
  • readable.ctor() returns a constructor for the readable stream.

duplex

pi.duplex([options], writeFn, readFn)
pi.duplex.obj(writeFn, readFn)
pi.duplex.ctor([options], writeFn, readFn)

Returns a Duplex stream given a set of options, a writeFn and a readFn.

Has the same options as thru:

  • The options hash is passed to stream.Duplex to construct the stream. See the core docs.
  • The writeFn has the signature: function(chunk, encoding, callback) {}. See the core docs for details.
  • The readFn has the signature: function(size) {}. See the core docs for details.
  • duplex.obj() is a convenience wrapper for duplex({ objectMode: true }).
  • duplex.ctor() returns a constructor for the duplex stream.

combine

pi.combine(writableStream, readableStream)

Takes a readable stream and a writable stream and returns a duplex stream.

Note: the two streams ARE NOT piped together. If you want to construct a pipeline with multiple streams, you can, but you need to perform the pipe operations yourself (or use the .pipeline function instead). This makes .combine work with streams where the connections is not via a pipe mechanism, like with child_process.spawn:

var child = require('child_process').spawn('wc', ['-c']);
pi.fromArray(['a', 'b', 'c'])
  .pipe(pi.combine(child.stdin, child.stdout))
  .pipe(process.stdout);

Listeners for the error event will receive errors that are emitted in either stream, or that are emitted as a result of piping into the duplex stream.

devnull

pi.devnull()

Returns a writable stream which consumes any input and produces no output. Useful for consuming output from duplex streams when prototyping or when you want to run the processing but discard the final output.

cap

pi.cap(duplex)

Returns a writable stream given a duplex stream. Any input written into the stream is written to the duplex stream.

clone

pi.clone()

Returns a duplex stream. Inputs written to the stream are cloned and then written out. This is useful if you need to ensure that concurrent modifications to objects written into multiple streams do not influence each other.

Control flow

These functions allow you to write more advanced streams, going from one linear sequence of transformation steps to multiple pipelines.

fork

pi.fork(stream1, [stream2], [...])
pi.fork([ stream1, stream2, ... ])

Returns a duplex stream. Inputs written to the stream are written to all of the streams passed as arguments to fork.

Every forked stream receives a clone of the original input object. Cloning prevents annoying issues that might occur when one fork stream modifies an object that is shared among multiple forked streams.

Also accepts a single array of streams as the first parameter.

Listeners for the error event on the stream returned from fork will receive errors that are emitted in any of the streams in passed to the function.

match

pi.match(condition1, stream1, [condition2], [stream2], [...], [rest])
pi.match([ condition1, stream1, condition2, stream2, ..., rest ])

Allows you to construct if-else style conditionals which split a stream into multiple substreams based on a condition.

Returns a writable stream given a series of condition function and stream pairs. When elements are written to the stream, they are matched against each condition function in order.

The condition function is called with two arguments - obj (the element value) and index (the element index). If the condition returns true, the element is written to the associated stream and no further matches are performed.

The last argument, rest is optional. It should be a writable stream (without a preceding condition function). Any elements not matching the other conditions will be written into it.

Listeners for the error event on the stream returned from match will receive errors that are emitted in any of the streams in passed to the function.

pi.fromArray([
  { url: '/people' },
  { url: '/posts/1' }, { url: '/posts' },
  { url: '/comments/2' }])
  .pipe(pi.match(
    function(req) { return /^\/people.*$/.test(req.url); },
    pi.pipeline(
        pi.forEach(function(obj) { console.log('person!', obj); }),
        pi.devNull()
    ),
    function(req) { return /^\/posts.*$/.test(req.url); },
    pi.pipeline(
        pi.forEach(function(obj) { console.log('post!', obj); }),
        pi.devNull()
    ),
    pi.pipeline(
        pi.forEach(function(obj) { console.log('other:', obj); }),
        pi.devNull()
    )
  ));

merge

pi.merge(stream1, [stream2], [...])
pi.merge([ stream1, stream2, ... ])

Takes multiple readable streams and merges them into one stream. Accepts any number of readable streams and returns a duplex stream.

forkMerge

pi.forkMerge(stream1, [stream2], [...])
pi.forkMerge([ stream1, stream2, ... ])

Fork followed by merge on a set of streams. Accepts any number of duplex streams; returns a duplex stream that:

  • forks each input, writes each input into the streams,
  • reads and merges the inputs from the streams and writes them out

Useful if you need to concurrently apply different operations on a single input but want to produce a single merged output.

           / to-html() \
read .md() - to-pdf()  - write-to-disk()
           \ to-rtf()  /

For example, imagine converting a set of Markdown files into the HTML, PDF and RTF formats - the same file goes in, each of the processing operations are applied, but at the end there are three objects (binary files in the different formats) that go into the same "write to disk" pipeline.

matchMerge

pi.matchMerge(condition1, stream1, [condition2], [stream2], [...], [rest])
pi.matchMerge([ condition1, stream1, condition2, stream2, ..., rest ])

Match followed by merge on a set of streams. Accepts any number of duplex streams; returns a duplex stream that:

  • matches conditions, selects the correct stream and writes to that stream
  • reads and merges the inputs from each of the streams and writes them out

Useful if you want to conditionally process some elements differently, while sharing the same downstream pipeline.

For example, if you want to first check a cache and skip some processing for items that hit in the cache, you could do something like pi.matchMerge(checkCache, getResultFromCache, performFullProcessing) (where checkCache is a function and the other two are through streams).

parallel

pi.parallel(limit, [transformFn], [flushFn])

Returns a object-mode Transform stream given a limit, a transformFn and flushFn. Works like a through.obj stream but:

  • the transformFn can be launched multiple times in parallel, with up to limit tasks running at the same time
  • the flushFn is only called after both 1) the thru-stream is instructed to end AND 2) all the tasks have been completed.
  • the stream emits the following events:
    • "done": emitted after each transformFn execution completes
    • "empty": emitted when the execution queue becomes empty

The usual thru-stream conventions apply:

  • The transformFn has the signature: function (chunk, encoding, onDone) {}. See the core docs for details.
  • The flushFn has the signature function(onDone). See the core docs for details.

Both transformFn and flushFn are optional. If the transformFn is not provided, then it defaults to:

function(task, enc, done) { task.call(this, done); }

which works nicely if the items in your stream are something like:

pi.fromArray([
    function(done) { this.push(1); done(); },
    function(done) { this.push(2); done(); }
  ])
  .pipe(pi.parallel(2))
  .pipe(pi.toArray(function(result) {
    assert.deepEqual(result.sort(), [1, 2]);
  }));

Note how each task runs with this set to the parallel stream, which means you can push results out. Similar to normal core streams, the done function can return one argument - err. If you need to process the other arguments, define your own transformFn.

Of course, you don't have to use callback functions just to get parallel processing - any task, even a basic thru stream like:

pi.parallel(16, function(filename, enc, done) {
  var self = this;
  fs.stat(filename, function(err, result) {
    self.push(result); done();
  })
});

will execute up to 16 stat calls at a time with parallel.

Note that you can safely call this.write() from within the transform function to add more tasks to run - this can be useful if your task processing causes more tasks to need to run. If you need the new payloads to go through some upstream processing, you can might consider writing to another stream that precedes parallel, provided you haven't ended that stream yet.

Constructing pipelines from individual elements

These functions apply pipe in various ways to make it easier to go from an array of streams to a pipeline.

pipe()

pi.pipe(stream1, [stream2], [...])
pi.pipe([ stream1, stream2, ...])

Given a series of streams, calls .pipe() for each stream in sequence and returns an array which contains all the streams. Used by head() and tail().

Also accepts a single array of streams as the first parameter.

head()

pi.head(stream1, [stream2], [...])
pi.head([ stream1, stream2, ... ])

Given a series of streams, calls .pipe() for each stream in sequence and returns the first stream in the series.

Also accepts a single array of streams as the first parameter.

Similar to a.pipe(b).pipe(c), but .head() returns the first stream (a) rather than the last stream (c).

tail()

pi.tail(stream1, [stream2], [...])
pi.tail([ stream1, stream2, ... ])

Given a series of streams, calls .pipe() for each stream in sequence and returns the last stream in the series.

Also accepts a single array of streams as the first parameter.

Just like calling a.pipe(b).pipe(c).

pipeline

pi.pipeline(stream1, stream2, ...)
pi.pipeline([ stream1, stream2, ... ])

Constructs a pipeline from a series of streams. Always returns a single stream object, which is either duplex or writable. Pipelines are series of streams that either:

  • start with a duplex stream and end with a duplex stream or
  • start with a duplex stream and end with a writable stream

Given a pipeline that starts with a duplex stream and ends with a duplex stream, pipeline returns a single duplex stream in which any writes go the first stream and any reads/pipes etc. are done from the last stream.

Normally, when just manually applying pipe you have to pick whether to return the first stream or the last stream in the pipeline. Returning the first stream has the benefit that writes to it will correctly go into the pipeline, but of course any reads/pipes from it will skip the rest of the pipeline. Returning the last stream has the opposite problem: you can read from the pipeline but cannot pipe to the first stream anymore. With pipeline you don't need to choose, since the return result works as you would expect.

Given a pipeline that starts with a duplex stream and ends with a writable stream, pipeline returns a single writable stream in which any writes go the first stream. Since the last stream in the pipeline is writable but not readable, the pipeline is also only writable but not readable. This helps stop errors where you accidentally pipe the first stream of a pipeline out (which will not work as expected since outputs do not pass through the whole pipeline).

With .pipeline(), writes into the return value go to the first stream but reads (and pipe calls) are applied to the last value in the stream:

module.exports = function() {
  return pi.pipeline(a, a2, a3);
};

works as expected and input.pipe(myPipeline).pipe(b) writes to a but reads from a3.

Listeners for the error event on the stream returned from pipeline will receive errors that are emitted in any of the streams in passed to the function.

Checking stream instances

These functions are like rvagg/isstream, but they work correctly on Node 0.8. The main differences are that 1) the 0.8 core streams from things like fs and child_process are correctly detected and 2) the functions use duck typing (checking for conformance to an API) rather than instanceof checks which can be problematic in a browser environment or when using modules that are compatible from an API perspective but do not descend from the native stream.

isStream

pi.isStream(obj)

Returns true if a stream provides either the Readable stream interface or the Writable stream interface.

isReadable

pi.isReadable(obj)

Returns true if a stream provides the Readable stream interface.

isWritable

pi.isWritable(obj)

Returns true if a stream provides the Writable stream interface.

isDuplex

pi.isDuplex(obj)

Returns true if a stream provides both the Readable and Writable stream interfaces.

What about asynchronous iteration?

Meh, through2 streams already make writing async iteration quite easy.

What about splitting strings?

Best handled by something that can do that in an efficient manner, such as binary-split.

Related