
Commit

Updated README
skid committed Feb 28, 2012
1 parent 25bee65 commit 462f9e8
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion README.md
@@ -1,7 +1,12 @@
# Picksy

Picksy is a scraper that extracts the relevant text from an HTML page such as a blog post, a news article, or anything else that has a considerable chunk of text.

I developed it to help me scrape articles from the web for later use in data mining, where absolutely precise extraction is not essential.

I wouldn't suggest using it for projects like [Readability](http://www.readability.com/), since its output will often include an extra link or gobble up an occasional table of contents.

Expect nothing useful from homepages, navigation/category pages, forums, and discussion-thread web applications.

Picksy depends on [node-htmlparser](https://github.com/tautologistics/node-htmlparser) to provide its input and works directly on the DOM tree constructed by htmlparser.
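
To give a rough idea of what "working directly on the DOM tree" can look like, here is a minimal sketch. It is not Picksy's actual algorithm or API: the `dom` literal is a hand-written, simplified stand-in for the node array that htmlparser's default handler builds (real output also carries fields such as `raw` and `attribs`, plus whitespace-only text nodes), and `ownTextLength` and `largestTextNode` are hypothetical helpers that simply pick the tag holding the most direct text.

```javascript
// Simplified stand-in for an htmlparser DOM: tags carry `name` and `children`,
// text nodes carry `data`. Assumed shape, written by hand for illustration.
var dom = [
  { type: 'tag', name: 'div', children: [
    { type: 'tag', name: 'ul', children: [
      { type: 'tag', name: 'li', children: [{ type: 'text', data: 'Home' }] },
      { type: 'tag', name: 'li', children: [{ type: 'text', data: 'About' }] }
    ] },
    { type: 'tag', name: 'p', children: [
      { type: 'text', data: 'A long paragraph of article text that a scraper would want to keep.' }
    ] }
  ] }
];

// Hypothetical helper: count the characters of text held directly by a node.
function ownTextLength(node) {
  return (node.children || []).reduce(function (sum, child) {
    return child.type === 'text' ? sum + child.data.trim().length : sum;
  }, 0);
}

// Hypothetical helper: walk the tree depth-first and track the tag
// whose direct text children are longest.
function largestTextNode(nodes, best) {
  best = best || { node: null, length: 0 };
  nodes.forEach(function (node) {
    if (node.type !== 'tag') return;
    var length = ownTextLength(node);
    if (length > best.length) {
      best.node = node;
      best.length = length;
    }
    largestTextNode(node.children || [], best);
  });
  return best;
}

var result = largestTextNode(dom);
console.log(result.node.name, result.length);
```

A heuristic this naive would favour the paragraph over the navigation list here, but it would also be easily fooled by real pages, which is in line with the caveats above.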

