A web-scraping framework written in JavaScript, using PhantomJS and jQuery

Homepage: http://nrabinowitz.github.com/pjscrape/

Overview

pjscrape is a framework for anyone who has ever wanted a command-line tool for web scraping with JavaScript and jQuery. Built to run with PhantomJS, it lets you scrape pages in a fully rendered, JavaScript-enabled context from the command line, with no browser required.

Dependencies

  • PhantomJS

Features

  • Client-side, JavaScript-based scraping environment with full access to jQuery functions
  • Easy, flexible syntax for setting up one or more scrapers
  • Recursive/crawl scraping
  • Delay scrape until a "ready" condition occurs
  • Load your own scripts on the page before scraping
  • Modular architecture for logging and writing/formatting scraped items
  • Client-side utilities for common tasks
  • Growing set of unit tests
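
To make a few of the features above concrete, here is a minimal sketch of what a scraper config file might look like. The URL and CSS selectors are hypothetical placeholders. The option names `ready`, `moreUrls`, and `maxDepth` are written from memory of the project documentation and may differ on this branch, so treat them as assumptions and check the homepage before relying on them.

```javascript
// Sketch of a pjscrape config file; URL and selectors are placeholders.
var suite = {
    // Page to start scraping from (hypothetical URL)
    url: 'http://example.com/articles',

    // Assumed option: delay scraping until this returns true in the page
    ready: function() {
        return $('#content').length > 0;
    },

    // Assumed options: follow "next" links one level deep (crawl scraping)
    moreUrls: function() {
        return $('a.next').map(function() {
            return this.href;
        }).toArray();
    },
    maxDepth: 1,

    // Runs in the page context, where jQuery is available as $;
    // returns one scraped item per matching heading
    scraper: function() {
        return $('h2.title').map(function() {
            return $(this).text();
        }).toArray();
    }
};

// `pjs` is the global provided by pjscrape at runtime; the guard lets
// this file load without error outside of pjscrape.
if (typeof pjs !== 'undefined') {
    pjs.addSuite(suite);
}
```

Assuming the file is saved as `my_config.js` (a hypothetical name), it would typically be passed to `pjscrape.js` when running it under PhantomJS. Because the functions run inside the scraped page, they should not rely on variables defined elsewhere in the config file.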

Please see http://nrabinowitz.github.com/pjscrape/ for usage, examples, and documentation.

Comments and questions are welcome at nick (at) nickrabinowitz (dot) com.
