scrape the best content from a page
Ruby
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
lib
spec
.gitignore
.rspec
.rvmrc
Gemfile
LICENSE
README.rdoc
Rakefile
mechanize_content.gemspec

README.rdoc

mechanize-content

Returns the most important pieces of content on a web page. Finds the best block of text, image and title by analysing the page content.

Usage

Pass in a URL on initialisation and then call the helpers to pull the best content out.

mc = MechanizeContent::Parser.new("http://www.joystiq.com/2010/03/19/mag-gets-free-trooper-gear-pack-dlc-next-week/")

mc.best_title

"MAG gets free 'Trooper Gear Pack' DLC next week -- Joystiq"

mc.best_text

"Ten-hut, soldiers! HQ has just sent word that some new gear will be shipping to the front lines of MAG next week, free of charge: the Trooper Gear Pack. In this parcel, we'll finally get access to the Flashbang grenade..."

mc.best_image

"http://www.blogcdn.com/www.joystiq.com/media/2010/03/580mage302.jpg"

The gem also supports multiple URLs and will find the best content between them. The order in which they are inserted determines priority.

Dependancies

  • Mechanize

  • imagesize

Note on Patches/Pull Requests

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add tests for it. This is important so I don't break it in a future version unintentionally.

  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)

  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright © 2010 John Griffin. See LICENSE for details.