Crawl is an experimental project that tracks popular Finnish news sites and reveals what changes have been made to articles since they were published.
Its modular structure also supports external site parsers.
Git is used to store the articles. Article changes can be browsed with Git tools, but the project also contains a web front-end built on Sinatra and Backbone.js.
Install the required gems (nokogiri, json, grit, thin, sinatra) with
bundle install
Initialize an empty Git repository. The default path is ./repository
git init repository
The Sinatra backend can be started with the command
bundle exec ruby sinatra-backend.rb
To detect changes, the crawlers should be run regularly, for example every hour (a cron task is recommended). Run all crawlers in the ./crawlers directory with the command
bundle exec ruby crawler.rb
crawler.rb also accepts a list of files as parameters to run specific crawlers.
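As a rough sketch of the hourly cron task mentioned above, a crontab entry could look like the following. The path /path/to/crawl and the log file name crawl.log are placeholders, not part of the project; adjust them to wherever the project is checked out.

```
# Hypothetical crontab entry: run all crawlers at the top of every hour.
# /path/to/crawl and crawl.log are assumed names, not defined by the project.
0 * * * * cd /path/to/crawl && bundle exec ruby crawler.rb >> crawl.log 2>&1
```

Cron usually runs with a minimal environment, so the bundle executable may need to be given by its full path or added to PATH in the crontab.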