This repository has been archived by the owner on Aug 4, 2020. It is now read-only.

Added example of how scrape works in 0.0.8.
rsdoiel committed Jan 12, 2012
1 parent 1c8979c commit 4ac1ae3
Showing 1 changed file with 54 additions and 8 deletions.
62 changes: 54 additions & 8 deletions docs/Scrape.md
@@ -1,9 +1,9 @@
Scrape
======
revision 0.0.7c
---------------
revision 0.0.8
--------------

# Scrape(document_or_path, map, callback, cleaner, transformer)
# Scrape(document_or_path, map, options, callback)

The scrape method is used to extract content from HTML markup. It
has three required parameters - document_or_path, map, and callback. It has two
@@ -27,18 +27,20 @@ to the callback function's second parameter. E.g. var map = { title:'title',
article:'#article'} would pass an object with title extracted from the title
element of the HTML document and the element with the attribute id of article
in those respective properties.
* options - options for processing request (e.g. cleaner and transform functions).
* callback - this is a function to process the results of the page scraped. The
function should accept three parameters - error, data and pathname.
** error - is your typical error object passed in functions like fs.readFile()
** data is an object with property names corresponding to map's property names
but with the property's value containing the scraped results as a string (e.g.
the innerHTML of the tag)
** pathname - if the HTML markup was retrieved using FetchPage() then this is
the name of the path or URL used. Otherwise this is undefined.
** env - the environment used to process request (e.g. env.pathname would be the
path to the HTML source)
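
To illustrate the callback shape described above, here is a minimal sketch of a callback that reads the scraped values back out by the map's property names. It assumes the 0.0.8 form in which the third argument is `env`. The function name `summarize` and the sample values passed to it are invented for this illustration; nothing here calls the library itself.

```javascript
// The selector map from the text: property name -> CSS selector.
var map = { title: 'title', article: '#article' };

// Hypothetical callback (the name "summarize" is invented for this sketch).
// It follows the (err, data, env) shape described in the parameter list.
function summarize(err, data, env) {
    if (err) {
        return 'ERROR: ' + err;
    }
    // data is expected to carry the same property names as map.
    var lines = Object.keys(map).map(function (key) {
        return key + ': ' + data[key];
    });
    // Assumption: env.pathname holds the path or URL of the HTML source.
    if (env && env.pathname) {
        lines.push('source: ' + env.pathname);
    }
    return lines.join('\n');
}

// Made-up sample values standing in for a real scrape result.
var sample = summarize(
    null,
    { title: 'Hello', article: '<p>Body</p>' },
    { pathname: 'index.html' }
);
console.log(sample);
```

A function like this could be passed as the final argument to Scrape(); returning a string rather than logging directly just makes it easier to check in isolation.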

## optional

There are two optional parameters - cleaner and transformer. They must be
## options

There are two important option parameters - cleaner and transformer. They must be
JavaScript functions or they are ignored.

* cleaner - if present
@@ -57,4 +59,48 @@ font, spacing tags).

# Examples

[ EXAMPLE SHOULD GO HERE ]
```javascript
var extractor = require('extractor'), util = require('util');

var clean = function (source) {
    console.log("clean() would allow you to cleanup the markup before passing to jsdom.");
    console.log("EXAMPLE: Upcasing all the content");
    return source.toUpperCase();
};

var transform = function (ky, val) {
    var oddeven = 0;
    console.log("transform() process by ky/value pairs allowing modification of attributes found in val.");
    console.log("ky: " + util.inspect(ky));
    console.log("val (before): " + util.inspect(val));
    console.log("EXAMPLE: change values to Lower or Uppercase.");
    Object.keys(val).forEach(function (i) {
        // There is more than one div > h2, so we traverse an array of items that can be processed.
        if (typeof val[i] === 'object') {
            if (val[i].innerHTML !== undefined) {
                if (oddeven) {
                    val[i].innerHTML = String(val[i].innerHTML).toLowerCase();
                } else {
                    val[i].innerHTML = String(val[i].innerHTML).toUpperCase();
                }
            }
        }
        oddeven = (oddeven + 1) % 2;
    });
    console.log("val (after): " + util.inspect(val));
    return val;
};

extractor.Scrape("http://nodejs.org", { title: "title", div_h2: "div > h2" },
    { response: true, cleaner: clean, transformer: transform },
    function (err, data, env) {
        if (err) {
            console.error('ERROR: ' + err);
        }
        if (data) {
            console.log('data: ' + JSON.stringify(data));
        }
        if (env) {
            console.log('http return status code: ' + env.response.statusCode);
        }
    });
```
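
The cleaner in the example above upcases the whole document purely for demonstration. A more practical cleaner might strip presentational tags before parsing. The sketch below assumes, as the example's `clean()` suggests, that a cleaner receives the raw markup as a string and returns the cleaned string. The name `stripFontTags` is hypothetical and not part of the library.

```javascript
// Hypothetical cleaner: removes <font ...> and </font> tags but keeps the
// text they wrap. Assumes the cleaner receives markup as a string and
// returns a string, as clean() does in the example above.
function stripFontTags(source) {
    return String(source).replace(/<\/?font\b[^>]*>/gi, '');
}

// Sample markup made up for this sketch.
var cleaned = stripFontTags('<p><font color="red">Hi</font> there</p>');
console.log(cleaned);
```

It could be supplied as the `cleaner` option in place of `clean` in the example above. A regular expression is enough for a simple tag like this, but it is not a general HTML parser.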
