Scraper

Scraper is a simple web scraper built with Node.js, PhantomJS, and the phantom npm package.

Scraper is based on the code from the tutorial Screen Scraping with Node.js, which gives some background on web scraping and explains the foundations of the code line by line. Definitely give it a read before getting started.

Basically, more and more pages build their content dynamically with JavaScript, so the best way to scrape them is with tools written in JavaScript that render a page the way a web browser does.

Once you have Node.js installed, just run these commands:

$ git clone git@github.com:selbyk/scraper.git
$ cd scraper
$ npm install
$ node app.js

Install and/or debug until you get this output instead of a sea of errors:

$ node app.js
opened site?  success
{ h2: [ 'Article 1', 'Article 2', 'Article 3' ],
  p:
   [ 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
     'Ut sed nulla turpis, in faucibus ante. Vivamus ut malesuada est. Curabitur vel enim eget purus pharetra tempor id in tellus.',
     'Curabitur euismod hendrerit quam ut euismod. Ut leo sem, viverra nec gravida nec, tristique nec arcu.' ] }
$
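The output above is an object keyed by tag name, with each key holding the text of every matching element. Here is a minimal sketch of how an object of that shape could be built. It is not taken from app.js: `collectText` and `fakeDoc` are hypothetical names, and the stand-in document exists only so the snippet runs in plain Node without PhantomJS. It sticks to ES5 syntax on the assumption that similar code would run inside the PhantomJS page context, which does not understand newer syntax.

```javascript
// Hypothetical helper (not from app.js): gather the trimmed text of every
// element matching each selector. `doc` is anything that exposes
// querySelectorAll, such as a browser `document`.
function collectText(doc, selectors) {
  var result = {};
  selectors.forEach(function (selector) {
    var nodes = doc.querySelectorAll(selector);
    var texts = [];
    for (var i = 0; i < nodes.length; i++) {
      texts.push(nodes[i].textContent.trim());
    }
    result[selector] = texts;
  });
  return result;
}

// Stand-in for a real document so the sketch runs outside a browser.
// The element text here is made up for illustration.
var fakeDoc = {
  querySelectorAll: function (selector) {
    var pages = {
      h2: [' Article 1 ', 'Article 2'],
      p: ['Lorem ipsum.']
    };
    return (pages[selector] || []).map(function (text) {
      return { textContent: text };
    });
  }
};

console.log(collectText(fakeDoc, ['h2', 'p']));
```

In a real run you would pass the page's own `document` instead of `fakeDoc`, from inside whatever evaluate hook your PhantomJS bridge provides.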

Don't know where this project is going, but I have some ideas if you're open to collaboration. You can find me in #sentiment on chat.freenode.net, by e-mail, or can stalk me down any other way.

Just don't show up at my apartment at 3 AM. Not cool, bro.

-Selby Kendrick
