Ingest a batch of URLs and output clean "source" domains.
Example:
http://www.cnn.com/2015/02/12/europe/ukraine-conflict/index.html
becomes cnn.com
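A minimal sketch of how this cleanup could work, using only the Python standard library. The function names `source_domain` and `source_domains` are illustrative assumptions, not names taken from the original script, and the rule here (take the host, drop a leading "www.") is only one plausible reading of "clean".

```python
from urllib.parse import urlparse


def source_domain(url):
    """Return the host part of a URL, lowercased, without a leading 'www.'.

    Hypothetical helper: the original script may apply different rules.
    """
    # urlparse(...).hostname is already lowercased; it is None when the
    # URL has no scheme/netloc, so fall back to an empty string.
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def source_domains(urls):
    """Apply source_domain to a batch of URLs, preserving order."""
    return [source_domain(u) for u in urls]
```

Note that this does not reduce other subdomains (for example "edition.cnn.com" stays as it is); doing that reliably would need a public-suffix list.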
Extract data from a news article using Diffbot.
Check a URL to see if it provides a clue about the date an article was published.
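One common clue is a /YYYY/MM/DD/ segment in the path, as in the CNN URL above. The sketch below checks only for that pattern; the name `date_from_url` and the choice of pattern are assumptions for illustration, and the original script may look for other clues.

```python
import re
from datetime import date

# Matches a /YYYY/M/D or /YYYY/MM/DD path segment followed by '/' or end of string.
_DATE_IN_PATH = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


def date_from_url(url):
    """Return a datetime.date found in the URL path, or None if there is no clue.

    Hypothetical helper: only the /YYYY/MM/DD/ layout is recognised.
    """
    match = _DATE_IN_PATH.search(url)
    if not match:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None
```

Returning None rather than raising keeps the check usable as a cheap first pass before falling back to the page content.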
Given a URL, extract data from a page using the Readability Parser API, then search it for geographic references using the Yahoo PlaceFinder API.
Ingest an RSS feed and push each entry into a database. Conditional handling covers the many different date formats that feeds use.
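The date handling could be approximated by trying a list of formats in order until one parses. This is a sketch under assumptions: the format list and the name `parse_entry_date` are illustrative, and the formats the original script actually handles are not known from this description.

```python
from datetime import datetime

# Assumed candidate formats, tried in order; extend as new feeds require.
CANDIDATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S %z",  # e.g. "Thu, 12 Feb 2015 10:30:00 +0000"
    "%Y-%m-%dT%H:%M:%S%z",       # ISO 8601 style with a UTC offset
    "%Y-%m-%d %H:%M:%S",         # naive timestamp
    "%Y-%m-%d",                  # date only
)


def parse_entry_date(text):
    """Return a datetime for the first format that matches, or None."""
    text = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # This format did not match; try the next one.
    return None
```

Results are timezone-aware only when the matching format includes an offset, so a caller storing these in a database would likely want to normalise them first.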
This is a general utility to wrap the database connection and handle other odds and ends.