Home

levand edited this page Sep 13, 2010 · 2 revisions

Welcome to the tubes wiki!

This wiki covers the project's three kinds of components: inputters, transformers, and outputters.

Project Description

(This is a starting point for a project description by Keith):

The Clojure Pipes Project is similar to Yahoo Pipes, except that:

  1. All software can be self-hosted, which keeps proprietary data private and protected.
  2. Components are written in the Clojure language.

Components

A number of useful components will ship as standard with the product, and users will be able to create special-purpose components of their own with reasonably little effort. It should also be easy for users to share their custom components.

Every component is an inputter, a transformer, or an outputter. An inputter is the source of data in a given pipeline; an outputter is the destination of data in a given pipeline; and a transformer is any component that is neither an inputter nor an outputter.

Any component that is not an outputter will provide a source of data in some internal format, probably a Clojure lazy sequence of map objects.
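As a rough illustration of that internal format, here is a minimal sketch of an inputter that returns a lazy sequence of maps. The name `counter-inputter` and the `:id` key are made up for this example and are not part of any agreed design.

```clojure
;; Hypothetical inputter: an endless source of records, each a plain map.
;; `counter-inputter` and the :id key are illustrative names only.
(defn counter-inputter []
  (map (fn [n] {:id n}) (range)))

;; Because the sequence is lazy, only the records that are actually
;; requested get computed, even though the source is infinite.
(def first-three (doall (take 3 (counter-inputter))))
```

Downstream components can take as many or as few records as they need without the inputter having to know in advance.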

Where possible and practical, components will wrap existing high-quality open source Java libraries rather than reinvent the wheel. On the other hand, there may be cases where a Clojure solution would be superior enough to warrant the additional work of a rewrite.
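To give a feel for what wrapping Java code might look like, the sketch below wraps the JDK's own `java.text.SimpleDateFormat` in a transformer. The JDK class is only a stand-in for a third-party library, and `parse-date-ms`, `with-timestamp`, and the `:date` and `:timestamp` keys are assumptions made for the example.

```clojure
(import '(java.text SimpleDateFormat)
        '(java.util TimeZone))

(defn parse-date-ms
  "Parses a yyyy-MM-dd string as UTC and returns milliseconds since the epoch."
  [s]
  ;; A new formatter is created per call because SimpleDateFormat
  ;; is not thread-safe.
  (let [fmt (doto (SimpleDateFormat. "yyyy-MM-dd")
              (.setTimeZone (TimeZone/getTimeZone "UTC")))]
    (.getTime (.parse fmt s))))

;; Hypothetical transformer: adds a numeric :timestamp to each record,
;; delegating the actual parsing to the Java class.
(defn with-timestamp [records]
  (map #(assoc % :timestamp (parse-date-ms (:date %))) records))
```

The Clojure side stays a thin layer over the Java call, so the component still reads and returns ordinary maps.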

In general, software should be written in Clojure wherever possible, and Java in other cases. Only in exceptionally compelling cases should software be written in other languages such as JRuby.

A component’s implementation will probably be a function. Inputters and transformers will probably return a lazy sequence. Transformers and outputters will probably read from the lazy sequence provided by the previous component in the chain.
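The chain described above could be sketched roughly as follows, assuming each component is a plain function. All function names, record keys, and URLs here are hypothetical and exist only to show the shape of a pipeline.

```clojure
(require '[clojure.string :as str])

;; Inputter: takes no upstream sequence and returns a lazy sequence of records.
(defn feed-inputter []
  (map (fn [[title link]] {:title title :link link})
       [["First post" "http://example.com/1"]
        ["Second post" "http://example.com/2"]]))

;; Transformer: reads the previous component's sequence and returns
;; a new lazy sequence.
(defn upcase-title [records]
  (map #(update-in % [:title] str/upper-case) records))

;; Outputter: consumes the sequence and produces the final result,
;; in this case an HTML string.
(defn html-list-outputter [records]
  (str "<ul>"
       (apply str
              (map #(str "<li><a href=\"" (:link %) "\">" (:title %) "</a></li>")
                   records))
       "</ul>"))

;; Wiring the chain together is ordinary function composition.
(def page
  (-> (feed-inputter)
      upcase-title
      html-list-outputter))
```

With components as functions, a pipeline needs no special machinery: the threading macro `->` passes each component's sequence to the next one.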

Data Flow

Data is processed record by record. Many transformers and outputters will operate on a single record without needing the context of the others; some will require access to a subset of the records, or to the entire set, at once (e.g. a sorter component).
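The difference between the two kinds of component can be sketched like this. The function names and the `:title-length` key are invented for the example.

```clojure
;; Per-record transformer: each output record depends on exactly one
;; input record, so the result can stay lazy.
(defn add-title-length [records]
  (map #(assoc % :title-length (count (:title %))) records))

;; Whole-set transformer: a sorter has to see every record before it
;; can emit the first one, so it realizes the entire input.
(defn sort-by-key [k records]
  (sort-by k records))

(def sample [{:title "bb"} {:title "a"} {:title "ccc"}])

(def sorted-titles
  (map :title (sort-by-key :title-length (add-title-length sample))))
```

A sorter placed after an unbounded inputter would never finish, which is one reason the distinction may matter when pipelines are assembled.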

Records will be hashes (maps, in Clojure terms), at least conceptually and possibly in implementation as well.

Records will travel in only one direction, from the inputter towards the outputter.

Product Focus

Our focus is the 80% of use cases that require only simple processing, rather than complex data flows that are better served by existing open source and commercial products. The target user is someone who wants to deliver custom mashups quickly, rather than do heavy processing of large amounts of back-end data.

Beyond handling that 80% correctly, our goal is to encourage widespread adoption by providing a framework that is easy to use.

However, we will start with the simplest cases (far fewer than 80%) and expand outward from there.

Example Applications

  1. RSS to HTML generator
  2. RSS Aggregator with date sorting
  3. CSV to RSS Aggregator
  4. RFC 3164 (BSD syslog) to RSS 2.0
  5. Mashups combining data from proprietary and public sources
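
As one illustration, the date-sorting aggregator from the list above might reduce to something like the following sketch. It uses in-memory records in place of real RSS parsing, and it assumes dates are ISO-style strings so that string comparison matches chronological order. The feed data and the name `aggregate-newest-first` are hypothetical.

```clojure
;; Stand-ins for the output of two RSS inputters.
(def feed-a [{:title "A1" :date "2010-09-10"}
             {:title "A2" :date "2010-09-13"}])
(def feed-b [{:title "B1" :date "2010-09-12"}])

;; Merges any number of feeds and sorts the records newest first.
;; Swapping the comparator arguments reverses the sort order.
(defn aggregate-newest-first [& feeds]
  (sort-by :date #(compare %2 %1) (apply concat feeds)))

(def merged (aggregate-newest-first feed-a feed-b))
```

A real aggregator would need an inputter that fetches and parses each feed, plus an outputter that renders the merged records.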