Exact Editions Issue Pages Scraper

A simple scraper that downloads PDF pages from Exact Editions magazines, built with CasperJS and PhantomJS. It only works with a subscription.

Please only use this if you have a subscription to the magazine in question, and do not use it to scrape content for distribution. Respect intellectual property -- writers and artists gotta make a living too :)

Please read Exact Editions' terms of service if in doubt.

Installation

Dependencies

npm

To install the dependencies, run:

npm install -g

Usage

Fetch all pages of one issue

Clone this repo and cd into the folder.

Then run in a terminal:

casperjs getissue.js --username=<your EE username> --password=<your EE password> <issue_link_1> ... <issue_link_n>

Fetch specific pages of one issue

This is usually needed when a few pages fail to download with getissue.js.

Run in a terminal:

casperjs getpage.js --username=<your EE username> --password=<your EE password> --pages=<page 1>:<prefix 1>,<page 2>:<prefix 2> <issue_link>

Note that the prefix is used in file names as follows (to be consistent with the files named by getissue.js):

<issue title>-<prefix>-<page label number>.pdf

For example:

casperjs getpage.js --username=name@example.com --password=example --pages=OFC:001,11:ABC http://www.exacteditions.com/read/popshot/the-time-issue-40247

This will download the following files:

  • The Time Issue-001-OFC.pdf
  • The Time Issue-ABC-11.pdf

Files are downloaded to the 'download' child directory.
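The mapping from the --pages option to file names can be sketched in plain JavaScript. This is only an illustration of the naming scheme described above, not code taken from getpage.js; the function name pageFileNames and its error handling are hypothetical.

```javascript
// Hypothetical helper, not part of getpage.js. It only illustrates the
// documented scheme: <issue title>-<prefix>-<page label number>.pdf
function pageFileNames(issueTitle, pagesOption) {
  return pagesOption.split(',').map(function (entry) {
    // Each entry is expected to look like <page>:<prefix>
    var parts = entry.split(':');
    if (parts.length !== 2 || !parts[0] || !parts[1]) {
      throw new Error('Expected <page>:<prefix>, got "' + entry + '"');
    }
    var page = parts[0];
    var prefix = parts[1];
    return issueTitle + '-' + prefix + '-' + page + '.pdf';
  });
}

console.log(pageFileNames('The Time Issue', 'OFC:001,11:ABC').join('\n'));
// The Time Issue-001-OFC.pdf
// The Time Issue-ABC-11.pdf
```

Note that the page label comes first in the option but last in the file name, so the prefix leads the page-specific part of the name.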

Compiling to one PDF

I used PDFtk on the command line for this. You will have to install it.

pdftk *.pdf cat output output.pdf
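The shell expands *.pdf in sorted order, so the pages are merged in file-name order, and running the command a second time in the same folder would also pick up output.pdf as an input. A minimal sketch of building the same pdftk arguments from Node.js while skipping the output file follows. The helper name pdftkArgs is hypothetical, and its sort uses plain string comparison, which may differ slightly from your shell's locale-based ordering.

```javascript
// Hypothetical helper: builds the argument list for
//   pdftk <inputs...> cat output <outputName>
// from a list of file names, skipping non-PDFs and the output file itself.
function pdftkArgs(fileNames, outputName) {
  var inputs = fileNames
    .filter(function (name) {
      return /\.pdf$/i.test(name) && name !== outputName;
    })
    .sort(); // plain string order, so prefixes like 001, 002 decide page order
  if (inputs.length === 0) {
    throw new Error('No input PDFs found');
  }
  return inputs.concat(['cat', 'output', outputName]);
}

// Example usage (assumes Node.js and pdftk on the PATH; left commented out
// so the sketch has no side effects):
// var fs = require('fs');
// var execFileSync = require('child_process').execFileSync;
// execFileSync('pdftk', pdftkArgs(fs.readdirSync('.'), 'output.pdf'));
```

Passing the arguments as an array to execFileSync avoids shell quoting problems with issue titles that contain spaces.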