Scraper

Scraper is a simple web scraper built using Node.js, PhantomJS, and phantom.

Scraper is based on code from the tutorial Screen Scraping with Node.js, which gives some background on web scraping and explains the foundations of the code line by line. Definitely give it a read before getting started.

Basically, more and more pages build their content dynamically with JavaScript, so the best way to scrape them is with tools written in JavaScript that can imitate the way a web browser renders the page.

Once you have Node.js installed, just run these commands:

$ git clone git@github.com:selbyk/scraper.git
$ cd scraper
$ npm install
$ node app.js

Install and/or debug until you get this output instead of a sea of errors:

$ node app.js
opened site?  success
{ h2: [ 'Article 1', 'Article 2', 'Article 3' ],
  p:
   [ 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
     'Ut sed nulla turpis, in faucibus ante. Vivamus ut malesuada est. Curabitur vel enim eget purus pharetra tempor id in tellus.',
     'Curabitur euismod hendrerit quam ut euismod. Ut leo sem, viverra nec gravida nec, tristique nec arcu.' ] }
$
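
The output above is an object keyed by tag name, with one array of text per tag. As a rough illustration of how such an object could be assembled, here is a minimal sketch. It is not the code in app.js: the helper name collectTextByTag, its signature, and the fakeDocument stub are all hypothetical, and the stub only stands in for a rendered page so the example runs under plain Node.js without PhantomJS.

```javascript
// Hypothetical helper (not taken from the repository): collect the text of
// every element matching each tag name into an object keyed by tag,
// e.g. { h2: [...], p: [...] }.
// `doc` is anything exposing querySelectorAll, such as a browser `document`
// or the stub defined below.
function collectTextByTag(doc, tags) {
  var result = {};
  tags.forEach(function (tag) {
    var nodes = doc.querySelectorAll(tag);
    result[tag] = [];
    // A NodeList is array-like but not always a real array, so index it.
    for (var i = 0; i < nodes.length; i++) {
      result[tag].push(nodes[i].textContent.trim());
    }
  });
  return result;
}

// Assumption: a tiny stand-in for a rendered page, used only for this demo.
var fakeDocument = {
  elements: [
    { tag: 'h2', textContent: 'Article 1' },
    { tag: 'p', textContent: ' Lorem ipsum dolor sit amet. ' },
    { tag: 'h2', textContent: 'Article 2' }
  ],
  querySelectorAll: function (tag) {
    return this.elements.filter(function (el) { return el.tag === tag; });
  }
};

var scraped = collectTextByTag(fakeDocument, ['h2', 'p']);
console.log(scraped);
```

With a headless browser, logic like this would have to run inside the page context, where `document` is the DOM after the page's own scripts have finished rendering it. That is the part a plain HTTP fetch of the HTML cannot give you.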

I don't know where this project is going, but I have some ideas if you're open to collaboration. You can find me in #sentiment on chat.freenode.net, reach me by e-mail, or track me down any other way.

Just don't show up at my apartment at 3 AM. Not cool, bro.

-Selby Kendrick
