From 462f9e8f221c8a2fe7ddaeeb98533495d5526ad7 Mon Sep 17 00:00:00 2001
From: Dusko Jordanovski
Date: Tue, 28 Feb 2012 22:09:59 +0100
Subject: [PATCH] Updated README

---
 README.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 95d4946..19c5dbc 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,12 @@
 # Picksy
 
 Picksy is a scraper that will extract the relevant text from an HTML page like a blog post, a news article or anything that has a considerable chunk of text.
-I developed it to help me scrape articles from the web that will be further used for data mining where absolutely precise extraction is not essential. I wouldn't suggest using it for projects like [Readability](http://www.readability.com/) since it will often show an extra link or gobble up an occasional table of contents.
+
+I developed it to help me scrape articles from the web that will be further used for data mining where absolutely precise extraction is not essential.
+
+I wouldn't suggest using it for projects like [Readability](http://www.readability.com/) since it will often show an extra link or gobble up an occasional table of contents.
+
+You should expect nothing useful from homepages, navigation/category pages, forums and discussion thread web applications.
 
 Picksy depends on [node-htmlparser](https://github.com/tautologistics/node-htmlparser) to provide its input and works directly on the DOM tree constructed by htmlparser.
 