Scrape comic strips from a feed, embed the strip images, and create a new feed

node-comics-feed

RSS feeds of comics sites usually contain links to a webpage but not the strip images themselves.
This module iterates over the items in a feed and parses the linked webpages to create a new feed with embedded comic strips.
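
As a rough illustration of the embedding step, here is a minimal sketch of how a feed item could be rewritten once the strip image URL is known. The names `escapeAttr` and `embedStrip` and the `{ title, link, description }` item shape are assumptions made for this example, not the module's actual internals.

```javascript
// Hypothetical helpers: none of these names come from the module itself.

// Escapes characters that are unsafe inside an HTML attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Returns a copy of a feed item whose description embeds the strip image.
// `item` is assumed to look like { title, link, description }.
function embedStrip(item, imgUrl) {
  const img = '<img src="' + escapeAttr(imgUrl) + '" alt="' + escapeAttr(item.title) + '">';
  return Object.assign({}, item, { description: img });
}

const item = { title: 'Strip for Monday', link: 'http://example.test/strip/1' };
const rewritten = embedStrip(item, 'http://example.test/img/1.png');
console.log(rewritten.description);
// prints: <img src="http://example.test/img/1.png" alt="Strip for Monday">
```

Returning a copy rather than mutating the item keeps the original feed data available, for instance to fall back to the plain link when scraping fails.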

Supported websites:

  • GoComics
  • Dilbert.com
  • Explosm.net (credits to eguendelman)

The list of parsers is meant to be extensible; see Parsers.
PRs are welcome.

Inspired by gocomics-scrape and re-implemented using Node.

Usage

npm install comics-feed
comics-feed [.rss|url]

Turns this

Before

into this

After

(rendered by Firefox)

Parsers

As of 0.0.9, every parsers/*.js file is loaded automatically by parserFactory.

A parser should have this interface:

/**
 * Parser = {
 *   name,
 *   match(),
 *   scrape()
 * }
 *
 * match():
 * @param {Object}   siteUrl  parsed url for the comic strips site
 * @returns {Boolean} true if this parser can handle the site
 *
 * scrape():
 * @param {String}   baseUrl  url of the webpage containing the comic strip
 * @param {Object}   $        [cheerio](http://matthewmueller.github.io/cheerio/) object containing the parsed page
 * @param {Function} callback callback function to return the scraped info
 *
 * callback:
 * @param {Object}   error    error object if one occurs
 * @param {String}   imgUrl   URL for the strip's image 
 *
 */
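
For illustration, a parser following this interface might look like the sketch below. The site `example-comics.test`, the `img.strip` selector, and the `findParser` helper are all hypothetical. The sketch also assumes that `siteUrl` exposes a `hostname` property and that a parser file exports its object with `module.exports`; neither detail is confirmed here.

```javascript
// Hypothetical parser for an imaginary site; adjust the hostname and selector.
const exampleParser = {
  name: 'example-comics',

  // Assumes the parsed URL object exposes `hostname`.
  match(siteUrl) {
    return siteUrl.hostname === 'example-comics.test';
  },

  // `$` is the cheerio object for the page at `baseUrl`.
  scrape(baseUrl, $, callback) {
    const src = $('img.strip').attr('src');
    if (!src) {
      return callback(new Error('No strip image found at ' + baseUrl));
    }
    // Resolve relative image paths against the page URL.
    callback(null, new URL(src, baseUrl).href);
  }
};

// Hypothetical selection helper: first parser whose match() accepts the site.
function findParser(parsers, siteUrl) {
  return parsers.find((parser) => parser.match(siteUrl)) || null;
}

// Assumed export shape for a file placed in parsers/:
// module.exports = exampleParser;
```

Reporting a missing image through the callback's `error` argument, rather than throwing, lets the caller decide whether to skip the item or keep its original content.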

Tested on

See test/live.js

TODO

  • allow parsers to return custom description
  • error handling
    • invalid URL
    • malformed feed
    • scraping error
  • add pubDate for items
  • re-entrance
  • module globals cleanup

SaaS on Heroku

heroku-comics-feed uses this module to provide a subscribable RSS service.