Updated README and documentation
chriso committed Nov 17, 2010
1 parent 01eb509 commit 8f797fb
Showing 2 changed files with 24 additions and 21 deletions.
33 changes: 21 additions & 12 deletions README.md
@@ -1,27 +1,36 @@
# [node.io](http://node.io/)

A distributed data scraping and processing engine for [Node.js](http://nodejs.org/)

To install node.io, use [npm](http://github.com/isaacs/npm):

$ npm install node.io

For usage details, run

$ node.io --help

## What is [node.io](http://node.io/)?

node.io is a framework for scraping and processing data. A node.io job typically consists of a) taking some input, b) using or transforming it, and c) outputting something.

node.io can simplify the process of:

- Filtering / sanitizing a list
- MapReduce
- Loading a list of URLs and scraping and saving some data from each
- Parsing log files
- Transforming data from one format to another, e.g. from CSV to a database
- Recursively load all files in a directory and its subdirectories and execute a command on each file

## Why node.io?

- Create modular and extensible jobs for scraping and processing data
- Seamlessly distribute work among child processes and other servers (soon)
- Written in Node.js == FAST
- Handles a variety of input / output
- Written in Node.js and Javascript - jobs are concise, asynchronous and FAST
- Speed up execution by distributing work among child processes and other servers (soon)
- Easily handle a variety of input / output situations
* Reading / writing lines to and from files
* Reading all files in a directory (and recursing if specified)
* To / from a database
* Reading all files in a directory (and optionally recursing)
* Reading / writing rows to and from a database
* STDIN / STDOUT
* Piping between other node.io jobs
* Custom IO / any combination of the above
* Any combination of the above, or completely custom IO
- Includes a robust framework for scraping and selecting web data
- Support for a variety of proxies when making requests
- Includes a data validation and sanitization framework
@@ -31,7 +40,7 @@ For usage details, run

Initial documentation is [available here](https://github.com/chriso/node.io/tree/master/docs/).

Better documentation will be available once I have time to write it. See [http://node.io/](http://node.io/) for updates.
Better documentation will be available once I have time to write it.. See [http://node.io/](http://node.io/) for updates.

## Examples

12 changes: 3 additions & 9 deletions docs/README.md
@@ -1,20 +1,14 @@
node.io executes jobs in the following format.

job.js
A node.io job takes the following format

var Job = require('node.io').Job;

var options = {}, methods = {};

exports.job = new Job(options, methods);

To run job.js from the command line, run the following command in the same directory:
To run this job (e.g. saved as myjob.js) from the command line, run the following command in the same directory

$ node.io myjob

A typical node.io job typically consists of a) taking some input, b) using or transforming it, and c) outputting something.

A full list of available job methods and options is [available here](#). however jobs typically contain an input, run, and output method. If omitted, input and output default to STDIN and STDOUT.
A full list of available job methods and options is [available here](#), however jobs typically contain an input, run, and output method. If omitted, input and output default to STDIN and STDOUT.
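
The input, run, and output flow described above can be sketched in plain JavaScript. This is only an illustration of the pattern and does not use node.io itself: `runJob` is a hypothetical helper invented for this example, and the array passed to it stands in for lines read from STDIN.

```javascript
// Hypothetical stand-in for a job runner. This is NOT node.io's implementation;
// it only illustrates the input -> run -> output pattern.
function runJob(methods, defaultInput) {
    // input: supplies the items to process. If omitted, fall back to the
    // lines passed in (standing in for STDIN).
    var items = methods.input ? methods.input() : defaultInput;

    // run: uses or transforms each item in turn.
    var results = items.map(function (item) {
        return methods.run(item);
    });

    // output: receives the results. If omitted, print each one
    // (standing in for STDOUT).
    var output = methods.output || function (lines) {
        lines.forEach(function (line) {
            console.log(line);
        });
    };
    output(results);

    return results;
}

// A job that only defines run; input and output use the defaults.
var methods = {
    run: function (line) {
        return line.trim().toUpperCase();
    }
};

var result = runJob(methods, [' hello ', 'world']);
```

Here only `run` is defined, so the sketch falls back to its default input and output, mirroring the STDIN / STDOUT defaults mentioned above.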

## Getting started

